Data Science Terminologies


1. GenSVM:

1.1. The GenSVM classifier is a generalized multiclass support vector machine (SVM). This classifier aims to find decision boundaries that separate the classes with as wide a margin as possible.

1.2. In GenSVM, the loss function is very flexible in the way that misclassifications are penalized. This allows the user to tune the classifier to the dataset at hand and potentially obtain higher classification accuracy than alternative multiclass SVMs.

1.3. One of the other advantages of GenSVM is that it is trained in the primal space, allowing the use of warm starts during optimization.

1.4. Moreover, this flexibility means that GenSVM has many other multiclass SVMs as special cases.

1.5. This means that for common tasks such as cross-validation or repeated model fitting, GenSVM can be trained very quickly.
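
The warm-start idea above can be sketched in plain NumPy. This is NOT the actual GenSVM solver; it is a hypothetical illustration using ridge-regularized least squares solved by gradient descent in the primal space, where each hyperparameter setting is seeded with the previous solution instead of starting from zero.

```python
import numpy as np

def fit(X, y, lam, w0, lr=0.005, steps=100):
    """Gradient descent on ||Xw - y||^2 + lam*||w||^2, starting from w0 (warm start)."""
    w = w0.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) + 2 * lam * w
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Fit a path of regularization strengths. Each fit starts from the
# previous solution, so far fewer steps are needed per fit than when
# restarting from zero every time.
w = np.zeros(3)
for lam in [10.0, 1.0, 0.1]:
    w = fit(X, y, lam, w0=w)  # warm start: reuse previous weights
```

This primal warm-starting is what makes repeated fits, as in cross-validation over a grid of hyperparameters, cheap: consecutive solutions are close to each other, so each restart is already near its optimum.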

2. Regression

2.1. Linear regression (predict/forecast values)

2.1.1. Cost Function (Mean Squared Error): the quantity to be minimised; it measures the average squared difference between predicted and actual values

2.1.1.1. Gradient Descent: iteratively follows the downhill slope of the cost function toward a minimum. The MSE cost is convex (bowl-shaped) with a single global minimum; more general loss surfaces can have several local minima

2.1.1.1.1. Learning rate: the size of each step taken toward the minimum (bottom of the bowl). It determines how many steps are needed: too small and convergence is slow, too large and the descent overshoots the minimum
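
The nodes above can be put together in a minimal sketch (hypothetical one-feature data): linear regression fit by gradient descent on the MSE, with the learning rate as the step size.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0          # ground truth: w = 2, b = 1

w, b, lr = 0.0, 0.0, 0.05  # lr too large -> divergence; too small -> slow
for _ in range(2000):
    pred = w * x + b
    # Gradients of MSE = mean((pred - y)^2) w.r.t. w and b
    dw = 2 * np.mean((pred - y) * x)
    db = 2 * np.mean(pred - y)
    w -= lr * dw              # step size = learning rate * gradient
    b -= lr * db
```

After the loop, `w` and `b` have converged to the true slope and intercept; rerunning with, say, `lr = 1.0` makes the updates diverge, which is the overshooting failure mode described above.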

2.2. Logistic (Sigmoid) Regression (classification)

2.2.1. Model: takes the output of the linear function (as in linear regression) and squashes the value into the range [0, 1] using the sigmoid function; the cost function (log loss) then measures how far these outputs are from the true labels
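
The squashing step can be shown directly (illustrative scores, not a fitted model): the sigmoid maps any linear score into (0, 1), interpretable as a class probability.

```python
import numpy as np

def sigmoid(z):
    """Squash a linear score z = w.x + b into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])   # hypothetical linear-function outputs
p = sigmoid(z)                   # near 0, exactly 0.5, near 1
```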

3. Support Vector Machine (SVM) - both forecasting and classification possible

3.1. Hyperplane: a plane that separates/classifies the 2 classes. The objective is to place it so that it has the maximum distance to the nearest data points of both classes.

3.2. Support vectors: the data points nearest to either side of the hyperplane that support the construction of the hyperplane's "structure". Any change in these points can change the position of the hyperplane

3.3. Input Variables: directly determine the nature of the hyperplane. Eg. 2 variables = the hyperplane is a line, 3 variables = the hyperplane is a 2D plane

3.3.1. Linearly inseparable, high-dimensional data

3.3.1.1. KERNEL FUNCTIONS: map linearly inseparable data from a lower dimension into a higher dimension so as to make it linearly separable
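
A minimal sketch of that lifting idea, using the classic XOR-style toy data (an assumed example, written with an explicit feature map rather than a kernel function; kernels such as the RBF kernel perform an equivalent lift implicitly):

```python
import numpy as np

# XOR-style labels: no straight line in 2D separates the classes.
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])   # same-sign points vs opposite-sign points

# Lift into 3D with phi(x) = (x1, x2, x1*x2). In the lifted space the
# plane "third coordinate = 0" separates the classes perfectly.
phi = np.column_stack([X, X[:, 0] * X[:, 1]])
pred = np.sign(phi[:, 2])
```

The explicit map is shown for clarity; in practice a kernel computes inner products in the lifted space without ever constructing `phi`.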

3.4. Large Margin Intuition: In SVM, we take the output of the linear function; if that output is greater than 1, we identify the point with one class, and if the output is less than -1, we identify it with the other class. Since the threshold values are changed to 1 and -1 in SVM, we obtain this reinforcement range of values ([-1, 1]) which acts as the margin. (In sigmoid regression, the outputs lay between 0 and 1.)

3.4.1. Soft margins: dotted-line margins that tolerate misclassifications/error points falling inside the marginal distance; the regularization parameter C controls how heavily such violations are penalized (a small C tolerates more violations, a large C fewer)

3.4.2. Hard margins: classical dotted-line margins which permit no misclassification/error points; only possible when the data are linearly separable

3.5. OBJECTIVE: In the SVM algorithm, we are looking to maximize the margin between the data points and the hyperplane. The loss function that helps maximize the margin is the hinge loss. READ: Support Vector Machine — Introduction to Machine Learning Algorithms
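
The hinge loss named above is easy to sketch (hypothetical labels and scores): it is zero only when a point is correctly classified AND outside the margin (y * f(x) >= 1), and grows linearly for points inside the margin or on the wrong side.

```python
import numpy as np

def hinge(y, score):
    """Hinge loss max(0, 1 - y * f(x)) for labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * score)

y = np.array([1, 1, 1, -1])
score = np.array([2.0, 0.5, -1.0, -2.0])  # hypothetical outputs of w.x + b
loss = hinge(y, score)  # [0.0, 0.5, 2.0, 0.0]
# correct & outside margin -> 0; inside margin -> 0.5;
# misclassified -> 2.0; correct & outside margin -> 0
```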

3.6. TYPE OF DATA

3.6.1. 1. Linearly separable, low-dimensional

3.6.2. 2. Linearly inseparable