Data Science / AI

Road Map For Data Science

1. Statistics (4 Days)

1.1. Descriptive

1.1.1. Measure of Central Tendency

1.1.1.1. Mean

1.1.1.2. Mode

1.1.1.3. Median

1.1.2. Measure of Variation

1.1.2.1. Range

1.1.2.2. Variance

1.1.2.3. Standard Deviation

1.2. Inferential

1.2.1. Hypothesis Testing

1.2.1.1. Types

1.2.1.1.1. One-Tailed Test

1.2.1.1.2. Two-Tailed Test

1.2.1.2. Confidence Intervals

1.2.1.2.1. Alpha Value

1.2.1.2.2. P Value

1.2.1.3. Error Types

1.2.1.3.1. Type 1 Error

1.2.1.3.2. Type 2 Error

1.2.1.4. Test Statistic

1.2.1.4.1. Z-Test

1.2.1.4.2. T-Test

1.2.1.4.3. Chi-Square Test

1.2.1.4.4. ANOVA

1.2.2. Correlation

1.2.2.1. Multiple Correlation

1.2.2.2. Autocorrelation

1.2.3. Covariance

1.2.4. Regression Analysis
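
The descriptive measures listed under 1.1 can be tried out with Python's standard `statistics` module. This is only a minimal sketch: the `sample` list is a made-up illustration, and the population forms of variance and standard deviation are an assumption (use `statistics.variance` and `statistics.stdev` for sample estimates).

```python
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of variation
value_range = max(sample) - min(sample)
variance = statistics.pvariance(sample)  # population variance
std_dev = statistics.pstdev(sample)      # population standard deviation

print(mean, median, mode, value_range, variance, std_dev)
```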

2. Probability (4 Days)

2.1. Random Variables

2.1.1. Types

2.1.1.1. Numerical

2.1.1.1.1. Discrete

2.1.1.1.2. Continuous

2.1.1.2. Categorical

2.1.2. Levels of measurement

2.1.2.1. Qualitative

2.1.2.1.1. Nominal

2.1.2.1.2. Ordinal

2.1.2.2. Quantitative

2.1.2.2.1. Interval

2.1.2.2.2. Ratio

2.2. Probability Distributions (PMF / PDF)

2.2.1. Discrete

2.2.1.1. Binomial

2.2.1.2. Bernoulli

2.2.1.3. Poisson

2.2.2. Continuous

2.2.2.1. Data Distributions

2.2.2.1.1. Left Skewed

2.2.2.1.2. Normal Distribution

2.2.2.1.3. Right Skewed

2.2.2.2. Central Limit Theorem

2.2.2.3. Chi-Square Test

2.2.2.4. T-Test

2.3. Types of Probabilities

2.3.1. Joint probability

2.3.2. Conditional Probability

2.4. Bayes Theorem
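
As a small worked illustration of 2.4, Bayes' theorem combines a prior with conditional probabilities to give a posterior. The sketch below is hypothetical: the function name `posterior` and the numbers (1% prior, 95% true-positive rate, 5% false-positive rate) are assumptions made up for the example.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) using Bayes' theorem.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of the evidence, P(E)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: a rare condition and a fairly accurate test.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

With these assumed inputs the posterior stays well below the test's 95% hit rate, because the prior is so small.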

3. Python (15 Days)

3.1. Basics

3.1.1. Data Structures

3.1.2. Conditional Statements

3.1.3. Loops

3.1.4. Functions

3.1.5. OOP (Object-Oriented Programming)

3.1.6. Regular Expressions

3.2. Libraries

3.2.1. NumPy

3.2.2. Pandas

3.2.3. Beautiful Soup

3.2.4. scikit-learn

3.3. IDE

3.3.1. Anaconda

3.3.1.1. Jupyter

3.3.1.2. Spyder

3.3.2. Google Colab

3.3.3. PyCharm

3.3.4. Visual Studio

4. Data Preprocessing (10 Days)

4.1. Data Preparation

4.1.1. Data Cleaning

4.1.1.1. Handling Missing Values

4.1.1.2. Encoding

4.1.1.3. Handling Outliers

4.1.1.4. Binning

4.1.1.5. Data Deduplication

4.1.2. Feature Engineering

4.1.2.1. Feature Selection

4.1.2.1.1. Sampling

4.1.2.2. Feature Scaling

4.1.2.2.1. Normalization

4.1.2.2.2. Standardization

4.1.2.2.3. Robust Scaler

4.1.2.3. Feature Transformation

4.1.2.3.1. Box-Cox

4.1.2.3.2. log / square / cube

4.1.2.4. Feature Split

4.1.2.5. Feature Extraction

4.1.2.6. Feature Generation

4.2. Exploratory Data Analysis

4.2.1. Methods

4.2.1.1. Quantitative

4.2.1.2. Graphical

4.2.1.2.1. Python

4.2.1.2.2. Power BI

4.2.1.2.3. Tableau

4.2.2. Types

4.2.2.1. Univariate Analysis

4.2.2.2. Bivariate Analysis

4.2.2.3. Multivariate Analysis

5. Machine Learning (20 Days)

5.1. Model Building

5.1.1. Supervised Learning

5.1.1.1. Regression

5.1.1.1.1. Linear

5.1.1.1.2. Multiple Linear

5.1.1.1.3. Polynomial

5.1.1.1.4. Regularization

5.1.1.2. Classification

5.1.1.2.1. Logistic Regression

5.1.1.2.2. KNN

5.1.1.2.3. Naive Bayes

5.1.1.2.4. SVM

5.1.1.2.5. Decision Trees

5.1.1.2.6. Types

5.1.2. Unsupervised Learning

5.1.2.1. Clustering

5.1.2.1.1. K-Means

5.1.2.1.2. Hierarchical Clustering

5.1.2.1.3. Mean Shift

5.1.2.1.4. DBSCAN

5.1.2.1.5. Fuzzy C-Means

5.1.2.2. Dimensionality Reduction

5.1.2.2.1. PCA

5.1.2.2.2. SVD

5.1.2.2.3. LDA

5.1.3. Ensemble Learning

5.1.3.1. Stacking

5.1.3.2. Boosting

5.1.3.2.1. AdaBoost

5.1.3.2.2. Gradient Boosting

5.1.3.2.3. XGBoost

5.1.3.2.4. CatBoost, LightGBM

5.1.3.3. Bagging

5.1.3.3.1. Random Forest

5.1.3.3.2. Decision Trees

5.1.4. Reinforcement Learning

5.2. Model Evaluation

5.2.1. Bias-Variance Tradeoff

5.2.2. Performance Metrics

5.2.2.1. Regression

5.2.2.1.1. MAE

5.2.2.1.2. MSE

5.2.2.1.3. RMSE

5.2.2.1.4. R-Squared

5.2.2.1.5. Adjusted R-Squared

5.2.2.2. Classification (Confusion Matrix)

5.2.2.2.1. Precision

5.2.2.2.2. Accuracy

5.2.2.2.3. Recall

5.2.2.2.4. F-Score

5.2.2.2.5. Specificity

5.2.3. Model Fit

5.2.3.1. Underfitting

5.2.3.2. Overfitting

5.2.3.3. Best Fit

5.2.4. Hyperparameter Tuning

5.2.4.1. Grid Search

5.2.4.2. Random Search

5.2.5. Cross Validation

5.2.5.1. K-Fold

5.2.5.2. Stratified K-Fold
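
The K-Fold idea in 5.2.5 can be sketched in plain Python: split the sample indices into k folds and let each fold serve once as the test set. The helper `k_fold_indices` below is a hypothetical illustration, not a library API; it uses contiguous folds without shuffling or stratification, which a real project would usually get from a library such as scikit-learn.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical usage: 10 samples, 3 folds.
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Every index appears in exactly one test fold, so each sample is used for evaluation once and for training k - 1 times.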

6. Deep Learning (10 Days)

6.1. Terminologies

6.1.1. Perceptrons

6.1.2. Layers

6.1.2.1. Input

6.1.2.2. Output

6.1.2.3. Hidden

6.1.3. Weight Matrix

6.1.3.1. Weights

6.1.3.2. Bias

6.1.4. Learning Rate

6.1.5. Epoch

6.1.6. Local and Global Minima

6.1.7. Early Stopping

6.1.8. Dropout Layer

6.2. ANN

6.2.1. Forward Propagation

6.2.1.1. Activation Functions

6.2.1.1.1. Sigmoid

6.2.1.1.2. Tanh

6.2.1.1.3. ReLU

6.2.1.1.4. Leaky ReLU

6.2.1.1.5. Maxout

6.2.1.1.6. ELU

6.2.1.1.7. Softmax

6.2.1.1.8. Swish

6.2.2. Backpropagation

6.2.2.1. Chain Rule

6.2.2.2. Loss Functions

6.2.2.2.1. Regression

6.2.2.2.2. Binary Classification

6.2.2.2.3. Multi-Class Classification

6.2.2.3. Optimizers

6.2.2.3.1. Gradient Descent

6.2.2.3.2. SGD With Momentum

6.2.2.3.3. AdaGrad

6.2.2.3.4. NAG

6.2.2.3.5. AdaDelta

6.2.2.3.6. Adam

6.2.2.3.7. RMSprop

6.3. Frameworks

6.3.1. TensorFlow

6.3.2. PyTorch

6.3.3. Keras

6.4. Unsupervised DL

6.4.1. Autoencoders

6.4.1.1. Sparse Autoencoders

6.4.1.2. Denoising

6.4.1.3. Contractive

6.4.1.4. Generative Models

6.4.1.4.1. Variational Autoencoders

6.4.1.4.2. GANs

6.4.2. Boltzmann Machines

6.4.3. Data Augmentation

7. Time Series Analysis (7 Days)

7.1. Components of Time Series

7.1.1. Trend

7.1.2. Seasonality

7.1.3. Cycle

7.1.4. Stationary / Non-Stationary

7.2. Smoothing Techniques

7.2.1. AR

7.2.2. MA

7.3. Steps in Time Series

7.3.1. Check Stationarity

7.3.1.1. Dickey-Fuller Test

7.3.2. Stationarize

7.3.2.1. De-Trending

7.3.2.2. Differencing

7.3.2.3. Decomposition

7.3.2.4. Moving Average

7.3.2.5. Deseasonalization

7.3.2.6. Log Transformation

7.3.3. ACF / PACF Plots

7.3.4. Build ML Models

7.3.4.1. ARMA

7.3.4.2. ARIMA

7.3.4.3. SARIMA

7.3.4.4. SARIMAX

7.4. Error Measures

8. Recommendation Systems (4 Days)

8.1. Types

8.1.1. Content Based Filtering

8.1.2. Collaborative Filtering

8.1.2.1. Item-Item

8.1.2.2. User-User

8.2. Cold Start Problem

8.3. Error Measures

9. NLP (8 Days)

9.1. Types

9.1.1. NLU

9.1.2. NLG

9.2. Components of NLP

9.2.1. Morphological & Lexical Analysis

9.2.2. Syntactic Analysis

9.2.3. Semantic Analysis

9.2.4. Discourse Integration

9.2.5. Pragmatic Analysis

9.3. Terminology

9.3.1. Corpus

9.3.2. Parsing

9.3.3. Tokens

9.3.4. Tokenization

9.3.5. Lexicon

9.4. Text Cleaning

9.4.1. Tokenization

9.4.2. Noise Entities Removal

9.4.3. Removal of Stop Words

9.4.4. POS Tagging

9.4.5. Normalization

9.4.5.1. Stemming

9.4.5.2. Lemmatization

9.5. Text Representation in Vector Space

9.5.1. Bag of Words

9.5.2. Word Embeddings

9.5.2.1. Word2Vec

9.5.2.1.1. CBOW

9.5.2.1.2. Skip Gram

9.5.2.2. GloVe

9.5.3. SVD

9.5.4. TF-IDF

9.5.5. Count Vectorizer

9.6. Topic Modelling

9.6.1. LDA

9.7. Sequential Modeling

9.7.1. RNN

9.7.1.1. One To Many

9.7.1.2. Many To Many

9.7.1.3. Many To One

9.7.2. LSTM

9.7.3. GRU

9.8. Transfer Learning

9.8.1. BERT

9.8.2. GPT-2

9.9. Libraries

9.9.1. NLTK

9.9.2. spaCy

9.9.3. Gensim

9.9.4. CoreNLP

9.9.5. TextBlob

9.9.6. Hugging Face

9.10. Conversational AI

9.10.1. Text (Chat Bots)

9.10.1.1. Azure Bot Framework

9.10.1.2. Amazon Lex

9.10.1.3. Google Dialogflow

9.10.1.4. Rasa

9.10.1.5. kore.ai

9.10.2. Audio

10. Computer Vision (8 Days)

10.1. OpenCV

10.1.1. Reading / Storing / Writing Images

10.1.2. Resizing / Rotating / Cropping the Image

10.1.3. Drawing Functions

10.1.4. Changing Image Colors / Channels

10.1.5. Splitting / Merging Images

10.1.6. Accessing / Modifying Pixel Values

10.1.7. Accessing / Modifying Image Properties

10.1.8. Reading Edges

10.1.9. Image Filter Functions

10.1.10. Thresholding

10.1.11. Transformation

10.1.12. Extracting the Region of Interest (ROI)

10.1.13. HOG

10.2. Essentials

10.2.1. Bounding Boxes

10.2.2. IOU

10.2.3. Anchor Boxes

10.2.4. Regional Proposals

10.2.5. Non-Max Suppression

10.3. Functions

10.3.1. Classification

10.3.2. Segmentation

10.3.3. Localization

10.3.4. Object Detection

10.4. CNN

10.4.1. Terminologies

10.4.1.1. Padding

10.4.1.1.1. Valid

10.4.1.1.2. Same

10.4.1.2. Stride

10.4.1.3. Pooling layer

10.4.1.3.1. Max

10.4.1.3.2. Average

10.4.1.3.3. Sum

10.4.1.4. Convolution Layer

10.4.2. Architectures

10.4.2.1. VGG-16, VGG-19

10.4.2.2. R-CNN: Fast, Faster, Mask R-CNN

10.4.2.3. YOLO

10.4.2.4. LeNet-5

10.4.2.5. AlexNet

10.4.2.6. ResNet

10.4.2.7. Inception

11. Deployment (5 Days)

11.1. Frameworks

11.1.1. Flask

11.1.2. Django

11.2. Clouds

11.2.1. AWS SageMaker

11.2.2. Azure

11.2.3. GCP

11.2.4. Heroku

12. Resources

12.1. Websites

12.1.1. Towards Data Science

12.1.2. Math is Fun

12.1.3. DataCamp

12.1.4. Analytics Vidhya

12.1.5. Medium

12.1.6. KDnuggets

12.1.7. Machine Learning Mastery

12.2. YouTube

12.2.1. Krish Naik

12.2.2. Codebasics (Computer Vision)

12.2.3. FreeCodeCamp

12.2.4. StatQuest (Stats)

12.2.5. Telusko (Python)

12.3. Datasets

12.3.1. Scikit-learn datasets

12.3.2. Kaggle

12.3.3. UCI Machine Learning Repository

12.3.4. Government Datasets

12.3.5. Google's Dataset Search Engine

12.3.6. Registry of Open Data on AWS

12.3.7. Microsoft Datasets

12.3.8. Awesome Public Datasets Collection

12.3.9. Visualdata

12.4. Increase Coding Skills

12.4.1. HackerRank

12.4.2. LeetCode

12.4.3. HackerEarth

12.4.4. CodeChef

12.4.5. GeeksforGeeks

13. Profile Building (3 Days)

13.1. LinkedIn

13.2. GitHub

13.3. Writing Blogs

13.4. Portfolio

13.5. Resume Making

13.5.1. Jobscan

13.5.2. CakeResume

13.5.3. Resume Genius

13.5.4. Zety

13.5.5. Overleaf

13.5.6. My Perfect Resume

13.5.7. NovoResume

13.5.8. KickResume

13.6. Apply for Jobs

14. Further

14.1. Pipelines

14.2. MLOps

14.3. AIOps

14.4. Big Data

14.5. No Code ML

14.5.1. PyCaret

14.5.2. BigML

14.5.3. Create ML

14.5.4. Google Cloud AutoML

14.5.5. RunwayML

15. Tips

15.1. Never underestimate the role of data preprocessing

15.2. Just use one and stick to it

15.3. Focus on one course

15.4. Practice more

15.5. Don’t spend too much time on theory

15.6. Get engaged in Data Science communities

15.7. Narrow down your expertise

15.8. You don’t have to know everything before applying to jobs

15.9. Study Research Papers

15.10. Keep up to date with trends

15.11. Participate in Hackathons

15.12. Case Studies

16. Jobs

16.1. Roles

16.1.1. Data Analyst

16.1.2. Data Engineer

16.1.3. Data Scientist

16.1.4. ML Engineer / Developer

16.1.5. NLP Engineer

16.1.6. CV Engineer

16.1.7. AI Engineer / Developer

16.1.8. Business Analyst

16.1.9. BI Developers

16.1.10. Researchers

16.1.11. MLOps Engineers

16.2. Places To Hunt

16.2.1. Naukri

16.2.2. LinkedIn

16.2.3. AngelList

16.2.4. Cutshort

16.2.5. Hirist

16.2.6. Indeed

16.3. Companies

16.3.1. The Math Company

16.3.2. Genpact

16.3.3. Tredence Analytics

16.3.4. Tiger Analytics

16.3.5. Ugam

16.3.6. Fractal Analytics

16.3.7. GE HealthCare

16.3.8. BRIDGEi2i

16.3.9. LatentView

17. Introduction (2 Days)

17.1. Why

17.1.1. High Demand

17.1.2. High Pay

17.1.3. Respected Role

17.1.4. Versatile Career

17.2. Required

17.2.1. Domain Knowledge

17.2.2. Programming

17.2.2.1. Python/R

17.2.2.2. SQL

17.2.2.3. Web Scraping

17.2.2.4. Version Control

17.2.3. Mathematics

17.2.3.1. Linear Algebra

17.2.3.2. Statistics & Probability

17.2.3.3. Matrix Manipulation

17.2.3.4. Calculus

17.2.4. Communication Skills (Story Telling)

17.2.5. Analytical Mindset

17.3. Life Cycle

17.4. Basic Terminology

17.4.1. Data

17.4.1.1. Types

17.4.1.1.1. Structured

17.4.1.1.2. Semi-Structured

17.4.1.1.3. Unstructured

17.4.1.2. Analysis

17.4.1.2.1. Descriptive Analysis

17.4.1.2.2. Diagnostic Analysis

17.4.1.2.3. Predictive Analysis

17.4.1.2.4. Prescriptive Analysis

17.4.2. Population - Sample

17.4.3. Dependent - Independent Variables

17.4.4. DS vs ML vs DL vs AI