1. Problem
1.1. Classification
1.2. Clustering
1.3. Regression
1.4. Anomaly detection
1.5. Association rules
1.6. Reinforcement learning
1.7. Structure prediction
1.8. Feature learning
1.9. Online learning
1.10. Semi-supervised learning
1.11. Grammar induction
2. Approach
2.1. Decision tree learning
2.1.1. Use
2.1.1.1. Classification tree analysis
2.1.1.2. Regression tree analysis
2.1.2. Advantages
2.1.2.1. Simple
2.1.2.2. Requires little data preparation
2.1.2.3. Use a white box model
2.1.2.4. Able to handle both numerical and categorical data
2.1.2.5. Possible to validate with statistical tests
2.1.2.6. Robust
2.1.2.7. Performs well with large datasets
2.1.3. Limitations
2.1.3.1. Learning an optimal tree is NP-hard; practical greedy algorithms settle for locally optimal splits
2.1.3.2. Can create over-complex trees that overfit
2.1.3.3. Some concepts are hard to learn (e.g. XOR, parity)
2.1.3.4. Biased in favor of attributes with more levels
2.1.4. Algorithm
2.1.4.1. ID3 (Iterative Dichotomiser 3)
2.1.4.2. C4.5 (successor of ID3)
2.1.4.3. CART (Classification And Regression Tree)
2.1.4.4. CHAID (CHi-squared Automatic Interaction Detector)
2.1.4.5. MARS: extends decision trees to handle numerical data better
2.1.4.6. Conditional Inference Trees: a statistics-based approach
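The algorithms above (ID3 and its successors) all grow a tree by greedily picking the attribute whose split most reduces label impurity. A minimal sketch of the entropy/information-gain criterion ID3 uses; the toy "outlook" dataset below is illustrative, not from any standard source:

```python
# Information gain: the entropy-reduction criterion ID3 uses to choose
# which attribute to split on at each node of the tree.
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, labels):
    """Entropy reduction from splitting `rows` on attribute `attr`."""
    total = entropy(labels)
    n = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lbl for r, lbl in zip(rows, labels) if r[attr] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return total - remainder

# Toy example: the "outlook" attribute separates the classes perfectly,
# so splitting on it removes all of the initial 1-bit entropy.
rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "overcast"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, "outlook", labels))  # 1.0
```

C4.5 refines this by using the gain *ratio*, which counters the bias toward many-valued attributes noted in 2.1.3.4.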
2.2. Association rule learning
2.2.1. Discovers interesting relations between variables in large databases
2.2.2. Use
2.2.2.1. Web usage mining
2.2.2.2. Intrusion detection
2.2.2.3. Continuous production
2.2.2.4. Bioinformatics
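A minimal sketch of the frequent-itemset step behind association rule learning, in the style of the classic Apriori algorithm: count support for single items, then repeatedly join frequent k-itemsets into (k+1)-candidates. The market baskets and support threshold are toy values chosen for illustration:

```python
# Apriori-style frequent-itemset mining: an itemset can only be frequent
# if all of its subsets are, so candidates are built level by level.
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return itemsets appearing in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    result = {}
    k = 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets pairwise to form (k+1)-candidates.
        k += 1
        candidates = list({a | b for a, b in combinations(frequent, 2)
                           if len(a | b) == k})
    return result

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
result = frequent_itemsets(baskets, 2)
print(result[frozenset(["bread", "milk"])])  # 2
```

Association *rules* such as {bread} → {milk} are then read off the frequent itemsets by comparing their support counts (confidence = support(bread, milk) / support(bread)).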
2.3. Artificial neural network
2.3.1. Statistical learning models
2.3.2. Used to estimate or approximate functions that depend on a large number of inputs and are generally unknown
2.3.3. Neurons: adaptive weights & capable of approximating non-linear functions
2.3.4. An ANN is defined by three types of parameters
2.3.4.1. Interconnection pattern
2.3.4.2. Learning process to update weights
2.3.4.3. Activation functions to convert a neuron's weighted input to its output activation
2.3.5. Perspectives
2.3.5.1. Mathematical (composition of functions)
2.3.5.2. Probabilistic (graphical model)
2.3.6. Learning paradigms
2.3.6.1. Supervised learning
2.3.6.1.1. Cost function: mean squared error, minimized by gradient descent
2.3.6.1.2. Use: pattern recognition, sequence data
2.3.6.2. Unsupervised learning
2.3.6.2.1. Cost function: depends on prior knowledge of the task
2.3.6.2.2. Use: estimation, filtering, clustering
2.3.6.3. Reinforcement learning
2.3.6.3.1. Use: sequential decision making, control problems
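The supervised paradigm in 2.3.6.1 ties the three parameter types of 2.3.4 together: fixed connections, a sigmoid activation, and weights updated by gradient descent on a mean-squared-error cost. A minimal sketch with a single neuron; the OR task, learning rate, and epoch count are toy choices for illustration:

```python
# One sigmoid neuron trained by gradient descent on mean squared error,
# learning the logical OR of two binary inputs.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

random.seed(0)
w = [random.uniform(-1, 1) for _ in range(2)]
b = 0.0
lr = 1.0  # learning rate (toy value)

for epoch in range(2000):
    for (x1, x2), target in data:
        z = w[0] * x1 + w[1] * x2 + b
        y = sigmoid(z)               # activation function (2.3.4.3)
        err = y - target             # d/dy of 0.5 * (y - target)^2
        grad = err * y * (1 - y)     # chain rule through the sigmoid
        w[0] -= lr * grad * x1       # gradient-descent weight updates (2.3.4.2)
        w[1] -= lr * grad * x2
        b -= lr * grad

predictions = [round(sigmoid(w[0] * x1 + w[1] * x2 + b))
               for (x1, x2), _ in data]
print(predictions)
```

A multi-layer network repeats the same chain-rule step backwards through each layer (backpropagation); this single-neuron version shows the update rule without the bookkeeping.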
2.4. Inductive logic programming
2.5. Support vector machines
2.6. Clustering
2.7. Bayesian network
2.7.1. Probabilistic graphical model
2.7.1.1. Random variables
2.7.1.2. Conditional dependencies
2.7.2. Inference and learning
2.7.2.1. Unobserved variables
2.7.2.1.1. Variable elimination
2.7.2.1.2. Clique tree propagation
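Inference over unobserved variables can be sketched on the textbook sprinkler network (Rain → Sprinkler, both → GrassWet, with the usual toy CPT values): to query Rain given wet grass, the unobserved Sprinkler variable is summed out, the simplest instance of variable elimination:

```python
# Exact inference in a tiny Bayesian network: eliminate the unobserved
# Sprinkler variable by summing it out of the joint distribution.

p_rain = {True: 0.2, False: 0.8}
# P(Sprinkler | Rain), keyed as p_sprinkler[sprinkler][rain]
p_sprinkler = {True:  {True: 0.01, False: 0.4},
               False: {True: 0.99, False: 0.6}}
# P(GrassWet=true | Sprinkler, Rain), keyed by (sprinkler, rain)
p_wet = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """Joint probability factored along the network's edges."""
    p = p_rain[rain] * p_sprinkler[sprinkler][rain]
    p_w = p_wet[(sprinkler, rain)]
    return p * (p_w if wet else 1 - p_w)

# P(Rain=true | GrassWet=true): sum out Sprinkler, then normalize.
numer = sum(joint(True, s, True) for s in (True, False))
denom = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(round(numer / denom, 4))  # 0.3577
```

On larger networks, variable elimination chooses an ordering in which to sum out each hidden variable, and clique tree propagation caches those intermediate factors so many queries share the work.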