
# Machine Learning

## Why ML?

### Because we need to make machines ...

- Think like humans
- Notice similarities between things and generate new ideas
- Learn from mistakes
- Explain why things went wrong

... and because some problems demand it:

- Problems too difficult or impossible for humans to solve
- Phenomena that change rapidly
- Applications that must be customized for each user separately
- Domains with no human experts, or where experts are unable to explain their expertise

## Types

### Supervised

**Regression**

- Learning algorithms
  - Non-parametric: Locally Weighted Linear Regression (LWR)
    - Algorithm: weight w_i should be large for training points near the query point x; τ is the bandwidth of the exponential-decay weight function
      - τ large → wide bell shape → w falls slowly with distance from x → gives weight to more neighbouring points
      - τ small → narrow bell shape → w falls rapidly with distance from x → gives weight only to strictly close neighbours
      - |x_i − x| small (close to 0) → w_i close to 1 (maximum); |x_i − x| large (close to ∞) → w_i close to 0 (minimum)
    - Disadvantages: θ must be refit for every prediction → time-consuming (expensive), not practical for large data sets
  - Probabilistic interpretation of linear regression
    - y_i = θᵀx_i + ε_i, where the error term ε captures unmodelled effects (features) and random noise
    - ε has a Gaussian distribution with mean zero and variance σ², and the errors are IID (independently & identically distributed): independent of each other, same Gaussian distribution, same variance
    - The density p(ε) is a bell-shaped curve; p(y|x;θ) is Gaussian with mean θᵀx_i and variance σ²
    - How to fit a model: the likelihood L(θ) is the same expression as the probability, but viewed as a function of θ (with X and y fixed); estimate θ by maximizing L(θ), equivalently maximizing l(θ) = log L(θ), equivalently minimizing J(θ)
  - Parametric: Linear Regression
    - Batch Gradient Descent (e.g. ALVINN)
      - θ: parameter; α: learning rate
        - α too small → very tiny steps to convergence
        - α too large → may end up overshooting the minimum
      - How to test convergence: look at θ (did it change much over 2 iterations?) or look at J(θ) (has it stopped decreasing?)
      - Disadvantages: time-consuming (expensive), so used only on small training sets; result depends on the initial value of θ
    - Stochastic Gradient Descent (incremental)
      - Advantages: faster, usable on large training sets; each parameter update needs only a single training example, with no need to scan the entire data set before adapting the parameters
- Fitting models: order (linear, quadratic, higher order, ...)
  - Problems: underfitting, overfitting
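The algorithms above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's exact implementation; the helper names `batch_gd`, `stochastic_gd`, and `lwr_predict` are my own, not from any library.

```python
import numpy as np

def batch_gd(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for linear regression: every update
    scans the whole training set, so it suits small data sets."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / len(y)  # gradient of J(theta)
        theta -= alpha * grad                  # alpha = learning rate
    return theta

def stochastic_gd(X, y, alpha=0.05, epochs=500, seed=0):
    """Stochastic (incremental) gradient descent: each update looks at
    a single training example, so no full scan of the data is needed."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            theta -= alpha * (X[i] @ theta - y[i]) * X[i]
    return theta

def lwr_predict(X, y, x_query, tau=0.3):
    """Locally weighted linear regression: refit theta for every query,
    weighting training points by exponential decay with distance.
    tau is the bandwidth of the bell-shaped weight function."""
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    XtW = X.T * w                               # X^T W without forming diag(w)
    theta = np.linalg.solve(XtW @ X, XtW @ y)   # weighted normal equations
    return x_query @ theta
```

Note that `lwr_predict` solves the weighted normal equations from scratch for each query point, which is exactly why LWR is expensive on large data sets.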

**Classification**

- Fitting models
  - Linear regression: sometimes works and gives a reasonable fit
    - Disadvantage: changing the training set can completely change the hypothesis's predictions
    - Solution: logistic regression
  - Logistic regression
    - Sigmoid (logistic) function: g(z) = 1 / (1 + e^(−z))
      - z ≪ 0 → g(z) → 0; z ≫ 0 → g(z) → 1
    - hθ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))
    - Probabilistic interpretation: P(y|x;θ) = hθ(x)^y (1 − hθ(x))^(1−y); fit by maximizing the log-likelihood
    - Algorithms: gradient ascent, perceptron, Fisher scoring (depends on Newton–Raphson), Newton's method (depends on the Hessian)
- Multi-input case: 2 dimensions, 3 dimensions, ..., infinite dimensions, using SVMs (support vector machines)
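A minimal NumPy sketch of the sigmoid function and of fitting logistic regression by gradient ascent on the log-likelihood (the helper name `fit_logistic` is my own, not a library function):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, iters=1000):
    """Gradient *ascent* on the log-likelihood l(theta)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)          # h_theta(x) = g(theta^T x)
        theta += alpha * X.T @ (y - h)  # gradient of l(theta)
    return theta
```

The only change from the linear-regression update is that the hypothesis is passed through the sigmoid, which is why the two updates look so similar.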

### Unsupervised

Clustering — examples:

- Image processing
- Organizing computer clusters
- Social network analysis
- Market segmentation
- Astronomical data analysis
- Understanding gene data
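The notes only list applications, so as an illustration here is a minimal sketch of k-means, one standard clustering algorithm (the helper `kmeans` and its defaults are my own choices, not from the source):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means: alternate between assigning each point to its
    nearest centroid and moving each centroid to its cluster's mean."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move centroids; keep the old one if a cluster empties.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

No labels are ever supplied, which is what makes this unsupervised: the structure comes entirely from distances between points.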

### ReInforcement

Applied to:

- Web crawling
- Robotics
  - Teaching a robot to get over obstacles
  - Teaching a car to drive off-road while avoiding obstacles
  - Robotic snake
  - 4-legged robotic dog

Idea: a reward function — the agent learns behaviour that maximizes cumulative reward.
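A toy sketch of the reward-function idea, assuming a hypothetical grid world; the goal and obstacle coordinates and the `discounted_return` helper are illustrative, not from the source:

```python
def reward(state, goal=(3, 3), obstacles=frozenset({(1, 1)})):
    """Hypothetical grid-world reward: +1 at the goal, -1 on an
    obstacle, 0 everywhere else."""
    if state == goal:
        return 1.0
    if state in obstacles:
        return -1.0
    return 0.0

def discounted_return(rewards, gamma=0.9):
    """Total reward the agent tries to maximize, with later rewards
    discounted by a factor gamma per step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

The agent is never told *how* to reach the goal, only *that* reaching it is good; learning a policy that maximizes this discounted return is the reinforcement-learning problem.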