1. Conclusion
1.1. What is most interesting about your topic?
1.1.1. (-) weaknesses
1.1.1.1. greedy algorithm - makes only locally optimal decisions
1.1.2. (+) strengths
1.1.2.1. interpretability
1.1.2.2. handles multidimensional data
1.1.2.3. scalability
1.2. What configuration, if any, the user can apply to improve performance.
1.2.1. on the test data level
1.2.1.1. too much noise in either the training or the test data set
1.2.1.2. overtraining (overfitting)
1.2.1.3. training and/or test data sets are not representative samples
1.3. The type of data / patterns that algorithm is best suited to.
2. Decision tree
2.1. How your chosen algorithm works (chapter 7, page 7)
2.2. Types of decision tree algorithms
2.2.1. ID3, C4.5, C5.0
2.2.1.1. ID3 works only with categorical attributes (C4.5/C5.0 also handle continuous ones)
2.2.1.2. assess the locally best split at each node
2.2.1.2.1. information gain
2.2.1.2.2. impurity measures (see the sketch below)
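A minimal sketch (my own illustration, not from the cited chapters) of the two split criteria named above; the toy labels are made up for the example:

```python
# Sketch of the impurity measures an ID3/C4.5-style tree uses to pick
# the locally best (greedy) split at a node.
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy reduction achieved by splitting `parent` into `children`."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Greedy split selection: the candidate split with the highest gain wins.
parent = ["yes"] * 5 + ["no"] * 5                  # invented toy labels
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(information_gain(parent, split))             # ~0.278 bits
```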
2.2.2. Oblique decision tree
2.2.2.1. can split on a combination of two (or more) variables
2.2.2.1.1. impurity measured as in the C&RT (CART) algorithm (see the sketch below)
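A minimal sketch (again my own, with made-up data) of why an oblique test can beat any single-attribute test when the class depends on a combination of two variables:

```python
# Contrast an axis-parallel test (x1 <= t) with an oblique test on a
# linear combination (w1*x1 + w2*x2 <= t), scored by weighted Gini impurity.
def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_impurity(rows, labels, test):
    """Weighted Gini impurity of the two-way split induced by `test`."""
    left = [y for x, y in zip(rows, labels) if test(x)]
    right = [y for x, y in zip(rows, labels) if not test(x)]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Invented data: the class depends on the diagonal x1 < x2, which no
# single axis-parallel test can capture cleanly.
rows = [(1, 2), (5, 6), (3, 4), (2, 1), (6, 5), (4, 3)]
labels = ["a", "a", "a", "b", "b", "b"]

axis_parallel = lambda x: x[0] <= 3        # tests a single attribute
oblique = lambda x: x[0] - x[1] <= 0       # w = (1, -1), t = 0

print(split_impurity(rows, labels, axis_parallel))  # ~0.44: impure split
print(split_impurity(rows, labels, oblique))        # 0.0: perfect split
```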
2.2.3. CHAID
2.2.3.1. split quality measured with the chi-square test (see the sketch below)
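A minimal sketch of the statistic CHAID applies to judge a candidate split, assuming scipy is available; the contingency-table counts are invented for the example:

```python
# CHAID scores a split by how strongly its branches separate the classes,
# using a chi-square test of independence on the branch-vs-class table.
from scipy.stats import chi2_contingency

# Contingency table: rows = branches of the split, columns = class counts.
observed = [[30, 10],   # branch 1: 30 of class A, 10 of class B
            [ 5, 35]]   # branch 2:  5 of class A, 35 of class B

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)  # large chi2 / small p = statistically strong split
```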
2.2.4. tree pruning
2.2.4.1. two approaches
2.2.4.1.1. pre-pruning (stop growing the tree early)
2.2.4.1.2. post-pruning (grow the full tree, then cut it back; see the sketch below)
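A minimal sketch of both approaches, assuming scikit-learn; the pre-pruning thresholds and the choice of ccp_alpha here are arbitrary illustrations, not recommended values:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stopping criteria halt growth before the tree is full.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Post-pruning: grow the full tree, then cut it back with cost-complexity
# pruning (larger ccp_alpha = more aggressive pruning).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
post = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2],
                              random_state=0).fit(X, y)
print(pre.get_depth(), post.get_depth())
```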
2.2.5. type of the base algorithm
2.2.6. class imbalance problem (see the sketch at the end of this section)
2.2.7. disadvantages
2.2.7.1. the main disadvantage of decision trees is their inability to capture correlations between attributes without additional computation (cf. the oblique split sketch above)
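For item 2.2.6 above, a minimal sketch of one common remedy, assuming scikit-learn; reweighting is just one option (resampling is another):

```python
# Class imbalance: with 95% of samples in one class, an unweighted tree
# tends to favour the majority class; class_weight="balanced" reweights
# samples so the rarer class counts more during induction.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)

plain = DecisionTreeClassifier(random_state=0).fit(X, y)
weighted = DecisionTreeClassifier(class_weight="balanced",
                                  random_state=0).fit(X, y)
```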
3. Classification - general overview
3.1. Applications of classification
3.1.1. predictive
3.1.2. can also be descriptive
3.2. it is supervised learning
3.2.1. requires a class label
3.2.1.1. class labels are symbolic variables (distinct values; works best on categorical class labels)
3.2.1.2. can also be numeric (continuous)
3.2.1.2.1. prediction problem
3.3. building a classifier (classification model) by splitting data into training and test data sets
3.3.1. methods
3.3.1.1. holdout
3.3.1.1.1. limitations
3.3.1.1.2. advantage
3.3.1.2. cross validation
3.3.1.2.1. k=5
3.3.1.2.2. limitations
3.3.1.2.3. advantage
3.3.1.3. bootstrapping
3.3.1.3.1. technique to randomly resample the training data set with replacement (all three methods are sketched below)
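A minimal sketch of all three evaluation methods, assuming scikit-learn and using a decision tree as the classifier; k=5 matches the note above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# Holdout: one split, e.g. 2/3 training, 1/3 test.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
print("holdout:", clf.fit(X_tr, y_tr).score(X_te, y_te))

# Cross-validation: k=5 folds, each fold used once as the test set.
print("5-fold CV:", cross_val_score(clf, X, y, cv=5).mean())

# Bootstrap: draw n records with replacement to form a training set.
X_boot, y_boot = resample(X, y, replace=True, random_state=0)
print("bootstrap fit:", clf.fit(X_boot, y_boot).score(X, y))
```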
3.3.2. training data
3.3.2.1. training error
3.3.2.1.1. too little training data causes underfitting (undertraining)
3.3.2.1.2. minimising training error alone leads to overfitting (a model that is too specific): overtraining
3.3.2.1.3. the cost of each type of error can guide the choice between overtraining and undertraining
3.3.3. test data
3.3.3.1. generalisation error
3.3.3.1.1. how to improve the accuracy of the classifier? (see the sketch below)
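A minimal sketch, assuming scikit-learn, of how training error and generalisation error diverge as the tree grows (the flip_y label noise is added deliberately so the overfitting shows):

```python
# Training accuracy keeps climbing as the tree deepens, while test
# (generalisation) accuracy peaks and then falls back: overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, 5, 10, None):  # None = grow the tree fully
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(clf.score(X_tr, y_tr), 2), round(clf.score(X_te, y_te), 2))
```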
3.4. many algorithms
3.4.1. decision tree
3.4.2. neural networks
3.4.3. support vector machines
3.4.4. combined (ensemble) methods are becoming popular
3.4.5. etc.
3.5. current trends in classification
3.5.1. Strategic Advancements in Utilizing Data Mining and Warehousing Technologies: New Concepts and Developments
3.5.2. Trends in Data Mining and Knowledge Discovery
3.5.3. Modern Trends in Data Mining
4. Introduction
5. Abstract
6. Bibliography
6.1. unformatted
6.1.1. References
6.1.1.1. Books
6.1.1.1.1. Using CHAID for classification
6.1.1.2. papers
6.1.1.2.1. CHAID
6.1.1.2.2. pruning
6.1.1.2.3. Using Data Mining Techniques to Build a Classification Model for Predicting Employees' Performance
6.1.1.3. Web
6.1.1.3.1. http://www.decisiontrees.net/
6.1.1.4. Supporting sources
6.1.1.4.1. Find examples
6.1.2. Tan, Steinbach, Kumar, Introduction to Data Mining, Addison Wesley.
6.1.3. Han, Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann.
6.1.4. Freitas, Data Mining and Knowledge Discovery with Evolutionary Algorithms, Springer.