Posts

Showing posts from July, 2017

Basic Workflows/Algorithms for Machine Learning in Text Processing

Web Extraction: Apache Tika (or Any Custom Crawler)
Concept Extraction
    Sentence Detection (maxent-3.0.0.jar, opennlp-tools-1.5.3.jar)
    NER (Named Entity Recognition)
    Concept Extraction (snowball-stemmer-1.3.0.581.1.jar)
    Multi phrase to list phrase
Concept Filtering
    Zipf Filtering
    ChiSq Filtering
    Low frequency Filtering
    Signal Filtering
String Indexer:
    StringIndexer (spark-mllib_2.10-2.1.0.jar)
Hashing TF
    HashingTF (spark-mllib_2.10-2.1.0.jar)
    IDF (spark-mllib_2.10-2.1.0.jar)
Classifier for Supervised Classification Algorithms: [Train & Predict]
    . Naive Bayes
    . Support Vector Machines (C Value, Gamma Value & Kernel)
    . Decision Trees (Entropy, Min-Samples-Split, Impurity, Information Gain)
    . K Nearest Neighbors
    . Random Forest
    . Linear Regression / Logistic Regression
    . AdaBoost (Sometimes Also Called Boosted Decision Tree)
Unsupervised Classification algorith
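To make the filtering and feature steps of the workflow more concrete, here is a minimal sketch in plain Python of low-frequency filtering followed by hashed term frequencies and IDF weighting. It is not the Spark implementation: the function names, the bucket count, and the use of CRC32 as the hash are assumptions made for illustration, and the IDF uses the smoothed form log((m + 1) / (df + 1)), where m is the number of documents.

```python
import math
import zlib
from collections import Counter

NUM_FEATURES = 16  # hypothetical bucket count, kept tiny for readability


def low_frequency_filter(docs, min_count=2):
    """Drop terms that occur fewer than min_count times across the corpus."""
    totals = Counter(term for doc in docs for term in doc)
    return [[t for t in doc if totals[t] >= min_count] for doc in docs]


def bucket(term, num_features=NUM_FEATURES):
    """Map a term to a fixed bucket (CRC32 is an illustrative stand-in hash)."""
    return zlib.crc32(term.encode("utf-8")) % num_features


def hashing_tf(doc, num_features=NUM_FEATURES):
    """Term-frequency vector: how many terms of the document land in each bucket."""
    vec = [0.0] * num_features
    for term in doc:
        vec[bucket(term, num_features)] += 1.0
    return vec


def idf_weights(tf_vectors):
    """Smoothed IDF per bucket: log((m + 1) / (df + 1))."""
    if not tf_vectors:
        return []
    m = len(tf_vectors)
    n = len(tf_vectors[0])
    # Document frequency: number of documents with a non-zero count in bucket i.
    df = [sum(1 for v in tf_vectors if v[i] > 0) for i in range(n)]
    return [math.log((m + 1) / (d + 1)) for d in df]


def tf_idf(docs, min_count=2, num_features=NUM_FEATURES):
    """Run filter -> hashed TF -> IDF and return one weighted vector per document."""
    filtered = low_frequency_filter(docs, min_count)
    tf = [hashing_tf(d, num_features) for d in filtered]
    idf = idf_weights(tf)
    return [[t * w for t, w in zip(vec, idf)] for vec in tf]


if __name__ == "__main__":
    corpus = [["spark", "nlp", "spark"], ["nlp", "tika"], ["rare"]]
    for row in tf_idf(corpus):
        print(row)
```

The resulting vectors are what a classifier from the supervised list would be trained on. Because terms are hashed into a fixed number of buckets, distinct terms can collide, so a real pipeline would normally use a much larger bucket count than the one shown here.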