# NLP (MOC)

## 📓Notes

NLP, or "natural language processing", is the set of methods by which we "teach" a computer to read text. Since computers don't understand words, we have to convert text into numbers. The challenge is to convert it in a way that captures not only the meaning of each word, but also its role within the sentence and its connection to the sentence's overall meaning.

A common way to do that is [[TF-IDF]], a method of converting text into a vector. Those vectors can then be fed to a model such as a [[Naive Bayes classifier]].

### Types of Analysis

With NLP, you can do:

1. [[Text Classification]] - To identify and categorize text, for example as either "spam" or "not spam"
2. [[Text Generation]] - The basis for chat AI models, which generate text based on a prompt
3. [[Sources/References/Sentiment Analysis]] - To analyze whether a text (perhaps a review) is positive, negative, or neutral
4. [[Topic Modeling]] - To group text by topic, for example news articles into politics, economics, etc.

### Techniques

Most common techniques for NLP:

1. [[Named Entity Recognition]] - To detect named entities, such as companies, within the text
2. [[Regex]] - To search for matches within the text based on a special pattern. Also see [[pattern matching]]
3. [[Tokenization]] - To break down a sentence into base components (which can then be converted into a vector)

### 📥Unsorted Notes

```dataview
LIST
FROM [[NLP (MOC)]] AND -outgoing([[NLP (MOC)]]) AND !#Type/MOC
SORT file.name ASC
```

## 📧Sources

### Courses

- [[natural language processing course]]
- [[NLP with python]]

### Websites

## 🌐Other MOC

### Overview
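
To make the Notes section's text-to-vector idea concrete, here is a minimal sketch of tokenization followed by TF-IDF weighting, using only the Python standard library. The function names `tokenize` and `tf_idf` and the sample sentences are hypothetical, chosen for illustration. The sketch assumes the plain unsmoothed formula (term frequency times `log(N / document frequency)`); libraries often use smoothed variants, so their numbers will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tf_idf(documents):
    """Return (vocabulary, vectors): one TF-IDF vector per document.

    Assumption: unsmoothed weighting, tf = count / document length and
    idf = log(N / number of documents containing the term).
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append([counts[term] / total * idf[term] for term in vocabulary])
    return vocabulary, vectors


docs = ["free prize inside", "meeting notes inside", "free free prize"]
vocabulary, vectors = tf_idf(docs)
for doc, vector in zip(docs, vectors):
    print(doc, [round(weight, 3) for weight in vector])
```

Each document becomes a list of numbers with one position per vocabulary word. A word that appears in every document gets a weight of zero, since it does not help tell the documents apart.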