Mahout Classification

Classification of unstructured data using Mahout classification

1. Problem

1.1. The volume of text content in digital form is growing explosively.

1.2. The Google search engine alone indexes over 20 billion web documents

1.3. The estimated size of text data (both public and private) could go well beyond the petabyte range

1.4. Machine learning algorithms such as clustering and classification help uncover structure and meaning in this unstructured data.

2. Steps

2.1. Collect data

2.2. Prepare the input data

2.3. Analyze the input data

2.3.1. Preprocessing raw data

2.3.1.1. Raw data is rearranged into records with identical fields

2.3.1.1.1. continuous

2.3.1.1.2. categorical

2.3.1.1.3. word-like

2.3.1.1.4. text-like
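
As an illustration of the preprocessing step above, the following minimal sketch rearranges raw rows into records with identical fields, one field per value type. The `EmailRecord` class, its field names, and the sample rows are hypothetical and chosen only for this example; it is plain Python, not Mahout code.

```python
from dataclasses import dataclass


@dataclass
class EmailRecord:
    """Hypothetical record layout: every record has the same four fields."""
    size_kb: float   # continuous: a numeric value
    priority: str    # categorical: one value from a small fixed set
    sender: str      # word-like: one value from an open-ended set
    body: str        # text-like: a sequence of word-like values


def to_record(raw):
    """Normalize one raw row (a dict of strings) into an EmailRecord."""
    return EmailRecord(
        size_kb=float(raw["size_kb"]),
        priority=raw["priority"].strip().lower(),
        sender=raw["sender"].strip().lower(),
        body=raw["body"].strip(),
    )


# Made-up raw rows with inconsistent casing and whitespace.
raw_rows = [
    {"size_kb": "12.5", "priority": " High ",
     "sender": "Alice@Example.com", "body": "Quarterly report attached"},
    {"size_kb": "3", "priority": "low",
     "sender": "bob@example.com", "body": " Lunch on Monday? "},
]
records = [to_record(row) for row in raw_rows]
```

After this step every record exposes the same typed fields, which is what the later vector-conversion step relies on.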

2.3.2. Converting data to vectors:

2.3.2.1. One cell per word

2.3.2.2. Vectors as bag of words

2.3.2.3. Feature Hashing
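
The vector-conversion options above can be sketched roughly as follows. This is a simplified illustration in plain Python rather than Mahout's encoder classes: `bag_of_words` keeps one count per distinct word, while `hashed_vector` folds words into a fixed number of cells by hashing. The function names, the whitespace tokenizer, and the choice of `zlib.crc32` as the hash are assumptions made for the example.

```python
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def bag_of_words(text):
    """Bag of words: map each distinct word to how often it occurs."""
    return Counter(tokenize(text))


def hashed_vector(text, num_features=16):
    """Feature hashing: fold words into a fixed-size vector.

    zlib.crc32 is used because it is deterministic across runs, unlike
    Python's built-in hash() for strings. Different words may collide
    in the same cell; that is the accepted trade-off of this technique.
    """
    vec = [0.0] * num_features
    for word in tokenize(text):
        index = zlib.crc32(word.encode("utf-8")) % num_features
        vec[index] += 1.0
    return vec


doc = "the quick brown fox jumps over the lazy dog"
bow = bag_of_words(doc)
vec = hashed_vector(doc)
```

The hashed vector has a fixed length no matter how large the vocabulary grows, so no dictionary of words has to be built or stored before training.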

2.4. Choose an algorithm to train the classifier

2.4.1. Stochastic gradient descent (SGD)

2.4.1.1. Small to medium data sets (fewer than tens of millions of training examples)

2.4.1.1.1. OnlineLogisticRegression

2.4.1.1.2. CrossFoldLearner

2.4.1.1.3. AdaptiveLogisticRegression
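
To make the idea behind SGD-trained logistic regression concrete, here is a minimal sketch that updates the weights one example at a time. `TinyOnlineLogisticRegression` is a hypothetical stand-in written for this outline; it does not reproduce the API or behavior of Mahout's `OnlineLogisticRegression`, and the toy data and learning rate are assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyOnlineLogisticRegression:
    """Hypothetical binary classifier trained by stochastic gradient descent."""

    def __init__(self, num_features, learning_rate=0.1):
        self.weights = [0.0] * num_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Estimated probability that x belongs to class 1."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def train(self, x, y):
        """One SGD step on a single example (x, y) with y in {0, 1}."""
        error = y - self.predict_proba(x)
        for i, xi in enumerate(x):
            self.weights[i] += self.learning_rate * error * xi
        self.bias += self.learning_rate * error


# Toy data: the label equals the first feature.
random.seed(0)
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
model = TinyOnlineLogisticRegression(num_features=2)
for _ in range(200):
    random.shuffle(data)
    for x, y in data:
        model.train(x, y)
```

Because each update touches only one example, the model never needs the whole training set in memory, which is the property that makes this family of learners online.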

2.4.2. Naive Bayes/Complementary naive Bayes

2.4.2.1. Medium to large data sets (millions to hundreds of millions of training examples)
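
The following minimal sketch shows the counting that a multinomial naive Bayes text classifier performs, with add-one (Laplace) smoothing. It runs in memory in plain Python and is not Mahout's implementation; the `TinyNaiveBayes` class and the sample messages are hypothetical, and the complementary variant is not shown.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical multinomial naive Bayes with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing strength
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocabulary = set()

    def train(self, words, label):
        """Add one labeled document, given as a list of words."""
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocabulary.update(words)

    def classify(self, words):
        """Return the label with the highest log posterior score."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + self.alpha * vocab_size
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + self.alpha) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = TinyNaiveBayes()
nb.train("win cash prize now".split(), "spam")
nb.train("cheap prize offer".split(), "spam")
nb.train("meeting agenda for monday".split(), "ham")
nb.train("lunch on monday".split(), "ham")
```

Training only accumulates counts, which is one reason this family of algorithms parallelizes well over large data sets.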

2.4.3. Random forests

2.4.3.1. Small to medium data sets (fewer than tens of millions of training examples)

2.4.3.1.1. Leo Breiman’s random forests

2.5. Test the model
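
Testing typically means comparing the model's predictions on held-out examples with their known labels. The sketch below computes two common summaries, accuracy and a confusion matrix, in plain Python; the function names and the label lists are made up for illustration and are not output from a real model.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of examples whose predicted label matches the actual one."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) label pair."""
    return Counter(zip(actual, predicted))


# Hypothetical held-out labels and the predictions made for them.
actual = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam"]

acc = accuracy(actual, predicted)
matrix = confusion_matrix(actual, predicted)
```

The confusion matrix shows which kinds of mistakes the model makes, for example spam passed through as ham, which a single accuracy figure hides.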

2.6. Use the model

3. Applications

3.1. Book classification

3.2. Email spam detection

3.3. Credit card fraud detection

3.4. Tweet categorization

3.5. Search-based classification

3.5.1. Classification of data related to firewall, switch, and router domains

3.6. News Classification

3.7. Sentiment analysis

3.8. Classification of URLs

3.9. Bank marketing

3.10. Handwritten digit recognition

3.11. Diabetes detector

3.12. Cancer Detection

3.13. Insurance or telecommunication

3.14. Movie classification