Data Science (Mind Map)

1. XAI

1.1. Model-agnostic

1.1.1. Global

1.1.1.1. Monotonicity

1.1.1.2. Feature Importance (or Permutation Importance)

1.1.1.3. Partial Dependence Plot

1.1.1.3.1. Individual Conditional Expectation

1.1.1.4. Global Surrogate Analysis

1.1.2. Local

1.1.2.1. LIME

1.1.2.2. SHAP
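
Permutation importance (1.1.1.2 above) measures how much a model's error grows when one feature's values are shuffled. A minimal pure-Python sketch on toy data, where only feature 0 carries signal and the "model" is assumed exact for illustration:

```python
import random

random.seed(0)
# Toy data: the target depends only on feature 0; feature 1 is pure noise.
X = [[random.random(), random.random()] for _ in range(200)]
y = [3.0 * row[0] for row in X]

def model(row):
    # Stand-in for a trained model; here it recovers the true relationship.
    return 3.0 * row[0]

def mse(X, y):
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y)

def permutation_importance(X, y, feature):
    # Importance = increase in error after shuffling one feature column.
    baseline = mse(X, y)
    col = [row[feature] for row in X]
    random.shuffle(col)
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, col)]
    return mse(X_perm, y) - baseline

imp0 = permutation_importance(X, y, 0)  # large: shuffling destroys the signal
imp1 = permutation_importance(X, y, 1)  # zero: the model never uses feature 1
```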

1.2. Model-specific

1.2.1. Neural Net Specific

1.2.1.1. Visualizing Weight/Bias Distribution

1.2.1.2. Filter/Layer Visualization (or Weight Visualization)

1.2.1.3. Activation Maximization

1.2.1.4. Saliency Map

1.2.1.5. Occlusion Map

1.2.1.6. Layer-Wise Relevance Propagation

1.2.1.7. Class Activation Map

1.2.1.7.1. Grad-CAM

1.3. Intrinsic models

1.4. Uncertainty Quantification

1.4.1. Uncertainty Type

1.4.1.1. Aleatoric Uncertainty

1.4.1.2. Epistemic Uncertainty

1.4.1.3. Out-of-Distribution Uncertainty

1.4.2. Estimation Methods

1.4.2.1. Gaussian Process

1.4.2.2. Bayesian Deep Learning

2. Deep learning

2.1. Backpropagation

2.2. Regularization

2.2.1. Dropout

2.2.2. Batch normalization

2.2.3. Weight decay

2.2.4. Early Stopping

2.3. Activation function

2.3.1. Softmax

2.3.2. Sigmoid

2.3.3. Tanh

2.3.4. ReLU

2.3.5. Leaky ReLU
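
The activation functions above all have short closed forms; a stdlib-only sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return math.tanh(x)                  # squashes to (-1, 1)

def relu(x):
    return max(0.0, x)                   # zero for negative inputs

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x     # small slope for negative inputs

def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]     # a probability distribution
```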

2.4. Optimizer

2.4.1. SGD

2.4.2. RMSprop

2.4.3. Adagrad

2.4.4. Adadelta

2.4.5. Adam

2.4.6. Adamax

2.4.7. Nadam

2.4.8. RAdam
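
All of these optimizers refine the same basic gradient-descent update. A vanilla SGD sketch minimizing a 1-D quadratic (the loss, learning rate, and step count are arbitrary toy choices):

```python
def grad(w):
    # Gradient of the toy loss f(w) = (w - 4)^2.
    return 2.0 * (w - 4.0)

def sgd(w, lr=0.1, steps=100):
    # Vanilla SGD: repeatedly step against the gradient.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_final = sgd(0.0)  # converges toward the minimizer w = 4
```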

2.5. Initialization

2.5.1. He

2.5.2. Xavier
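
Both schemes scale initial weights by the layer's fan-in/fan-out. A stdlib sketch; the uniform Xavier and normal He variants shown here are common choices, not the only ones:

```python
import math
import random

def xavier_uniform(fan_in, fan_out):
    # Glorot/Xavier: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[random.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def he_normal(fan_in, fan_out):
    # He: N(0, sqrt(2 / fan_in)), suited to ReLU activations.
    std = math.sqrt(2.0 / fan_in)
    return [[random.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]
```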

2.6. Transfer learning

2.7. DL model types by field

2.7.1. Vision

2.7.1.1. CNN

2.7.1.1.1. Classification

2.7.1.1.2. Object detection

2.7.1.1.3. Semantic segmentation

2.7.1.2. Vision Transformer (ViT)

2.7.2. Sequential data / NLP

2.7.2.1. Word embedding

2.7.2.1.1. NPLM

2.7.2.1.2. Word2Vec

2.7.2.1.3. FastText

2.7.2.1.4. LSA

2.7.2.1.5. GloVe

2.7.2.1.6. Swivel

2.7.2.2. Sentence embedding

2.7.2.2.1. LSA

2.7.2.2.2. Doc2Vec

2.7.2.2.3. LDA

2.7.2.2.4. Language Model

2.7.2.3. RNN

2.7.2.3.1. Vanilla RNN

2.7.2.3.2. LSTM

2.7.2.3.3. GRU

2.7.2.4. Seq2Seq

2.7.2.5. Attention

2.7.2.5.1. Transformer

2.7.3. RL

2.7.3.1. Deep RL

2.7.3.1.1. Key concept

2.7.3.1.2. Policy gradient

2.7.3.1.3. Value based

2.7.3.1.4. Model-based

2.7.3.1.5. Inverse RL

2.7.3.1.6. Meta-learning

2.7.4. Generative models

2.7.4.1. VAE

2.7.4.2. GAN

2.7.4.2.1. DCGAN

2.7.4.2.2. CGAN

2.7.4.2.3. LSGAN

2.7.4.2.4. WGAN

2.7.4.2.5. EBGAN

2.7.4.2.6. BEGAN

2.7.4.2.7. InfoGAN

2.7.4.2.8. CycleGAN

2.7.4.2.9. SGAN

2.7.4.2.10. SAGAN

2.7.4.3. Style transfer

2.8. Acceleration

2.8.1. GPGPU

2.8.1.1. CUDA

2.8.1.2. OpenCL

2.8.1.3. OpenACC

2.8.2. Accelerator

2.9. Neural architecture search

2.9.1. Random Search

2.9.2. RL

2.9.3. Gradient-based Methods

2.9.4. Evolutionary Methods

2.9.5. Bayesian Optimization

2.10. Framework

2.10.1. TensorFlow

2.10.1.1. 2.0

2.10.2. Keras

2.10.3. PyTorch

2.10.4. Torch

2.10.5. MXNet

2.10.5.1. Gluon

2.10.6. Caffe

2.10.7. CNTK

3. Machine Learning

3.1. Supervised Learning

3.1.1. Naive Bayes Classifier

3.1.1.1. MAP vs. MLE

3.1.2. k-Nearest Neighbours

3.1.3. Linear Regression

3.1.3.1. Lasso (L1 regularization)

3.1.3.2. Ridge (L2 regularization)

3.1.3.3. Elastic Net

3.1.4. Logistic Regression

3.1.5. Tree-based Models

3.1.5.1. Decision Tree

3.1.5.1.1. ID3

3.1.5.1.2. CART

3.1.5.2. Ensemble

3.1.5.2.1. Voting

3.1.5.2.2. Bagging

3.1.5.2.3. Boosting

3.1.5.2.4. Stacking
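
Hard voting, the simplest ensemble combiner, fits in a few lines (each inner list holds one model's predictions, one label per sample):

```python
from collections import Counter

def hard_vote(predictions):
    # predictions: one list of labels per model; majority vote per sample.
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]

# Three models, four samples each.
ensemble_pred = hard_vote([
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
])
```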

3.1.6. Artificial Neural Network

3.1.7. Support Vector Machine

3.2. Unsupervised Learning

3.2.1. Clustering

3.2.1.1. k-Means Clustering

3.2.1.2. Hierarchical Clustering

3.2.1.3. Mixture Model

3.2.1.3.1. Gaussian

3.2.1.3.2. Bernoulli

3.2.1.3.3. EM algorithm

3.2.1.4. DBSCAN
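
Lloyd's algorithm behind k-means fits in a short function. A pure-Python sketch for 2-D points, using a farthest-point initialization (a heuristic chosen here for determinism, not part of the classic algorithm):

```python
def kmeans(points, k, iters=20):
    # Farthest-point initialization: deterministic and well spread out.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(
            (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers)))
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: each center moves to its cluster's mean.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers

pts = [(0, 0), (0.1, 0.2), (-0.1, 0.1), (5, 5), (5.2, 4.9), (4.8, 5.1)]
centers = kmeans(pts, 2)  # one center near (0, 0.1), one near (5, 5)
```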

3.2.2. Dimensionality Reduction

3.2.2.1. Principal Component Analysis

3.2.2.2. Linear Discriminant Analysis

3.2.2.3. Singular Value Decomposition

3.2.2.4. t-distributed Stochastic Neighbor Embedding (t-SNE)

3.2.2.5. Non-Negative Matrix Factorization

3.2.2.6. Autoencoder
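
PCA's first principal component can be found by power iteration on the covariance matrix. A pure-Python sketch on exactly collinear toy data (real use would rely on SVD via a numerical library):

```python
import math

def first_principal_axis(data, iters=50):
    n, d = len(data), len(data[0])
    # Center each column.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix.
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1)
          for b in range(d)] for a in range(d)]
    # Power iteration converges to the top eigenvector, the principal axis.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Points on the line y = 2x: the principal axis should have slope 2.
axis = first_principal_axis([[float(i), 2.0 * i] for i in range(10)])
```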

3.2.3. Self-Supervised Learning

3.2.3.1. Pretext Task

3.2.3.1.1. Exemplar

3.2.3.1.2. Context Prediction

3.2.3.1.3. Jigsaw Puzzle

3.2.3.1.4. Autoencoder-based Approach

3.2.3.1.5. Count

3.2.3.1.6. Multi-task

3.2.3.1.7. Rotation

3.2.3.2. Contrastive Learning

3.2.3.2.1. NPID

3.2.3.2.2. CPC

3.2.3.2.3. SimCLR

3.2.3.2.4. MoCo

3.2.3.2.5. BYOL

3.2.3.3. Evaluation Method

3.2.3.3.1. Linear Evaluation (=Task Generalization)

3.2.3.3.2. Fine Tuning (=Dataset Generalization)

3.3. Recommendation System

3.3.1. Content-based filtering

3.3.2. Collaborative filtering

3.3.2.1. Nearest Neighbor

3.3.2.2. Latent Factor

3.3.2.2.1. Matrix Factorization

4. Visualization

4.1. R

4.1.1. Plot Generation Tools

4.1.1.1. ggplot2

4.1.2. Documents

4.1.2.1. R Markdown

4.1.3. Web App

4.1.3.1. Shiny

4.1.3.1.1. Shiny Dashboard

4.1.3.1.2. flexdashboard

4.2. Visualization Design

5. Data Analysis Process

5.1. Data Preprocessing

5.1.1. Missing data

5.1.1.1. Imputation

5.1.1.1.1. Single Imputation

5.1.1.1.2. Multiple Imputation

5.1.2. Outlier

5.1.2.1. Remove outliers with a box plot (1.5 × IQR)

5.1.3. Feature Rescaling

5.1.3.1. Normalization

5.1.3.2. Standardization
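
The two rescaling schemes above in one stdlib sketch: normalization maps values to [0, 1], standardization to zero mean and unit standard deviation:

```python
import statistics

def min_max_normalize(xs):
    # Normalization: rescale to the [0, 1] range.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    # Standardization: zero mean, unit (population) standard deviation.
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

normalized = min_max_normalize([2.0, 4.0, 6.0])   # [0.0, 0.5, 1.0]
standardized = standardize([2.0, 4.0, 6.0])
```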

5.2. Data Augmentation

5.2.1. Vision

5.2.1.1. Flip

5.2.1.2. Rotation

5.2.1.3. Scale

5.2.1.4. Crop

5.2.1.5. Translation

5.2.1.6. Gaussian noise

5.2.1.7. Shift

5.2.1.8. Shear

5.3. Evaluation

5.3.1. Cross Validation

5.3.1.1. k-fold Cross Validation

5.3.1.2. Leave-One-Out Cross Validation
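
The splitting behind k-fold CV is simple index bookkeeping; leave-one-out is the special case k = n. A stdlib sketch:

```python
def kfold_indices(n, k):
    # Split indices 0..n-1 into k folds whose sizes differ by at most one;
    # each fold serves once as the validation set.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]

splits = kfold_indices(10, 3)  # fold sizes 4, 3, 3
```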

5.3.2. Learning Curves

5.3.3. Bias-Variance Trade-off

5.3.3.1. Deep Double Descent

5.3.4. Evaluation Metrics

5.3.4.1. Confusion Matrix

5.3.4.1.1. Accuracy

5.3.4.1.2. Precision

5.3.4.1.3. Recall

5.3.4.1.4. F1 score (Dice coefficient)
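
All four confusion-matrix metrics follow directly from the raw counts; a minimal sketch:

```python
def confusion_metrics(tp, fp, fn, tn):
    # tp/fp/fn/tn: true/false positive and negative counts.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean (Dice)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = confusion_metrics(tp=8, fp=2, fn=2, tn=8)
```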

5.3.4.2. ROC curve

5.3.4.2.1. AUC score

5.3.4.3. RMSE

5.3.4.4. Silhouette coefficient (for clustering)

5.4. ML guideline

5.4.1. Google ML guides

6. Data Engineering

6.1. Hadoop

6.2. Spark

6.3. Docker

6.4. Kubernetes

6.4.1. Kubeflow

6.5. ELK stack

6.5.1. Elasticsearch

6.5.2. Logstash

6.5.3. Kibana

6.5.4. Filebeat

7. Mathematics

7.1. Linear Algebra

7.2. Statistics

7.2.1. Descriptive Statistics

7.2.1.1. Representative value

7.2.1.1.1. Mean

7.2.1.1.2. Median

7.2.1.1.3. Mode

7.2.1.1.4. Quartile

7.2.1.1.5. Percentile

7.2.1.2. Dispersion (variability)

7.2.1.2.1. Quartile

7.2.1.2.2. Deviation

7.2.1.2.3. Standard deviation

7.2.1.2.4. Coefficient of variation (CV)
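
The representative values and dispersion measures above map directly onto Python's statistics module; the values in the comments assume the small sample shown:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(data)     # 5.0
median = statistics.median(data)  # 4.5
mode = statistics.mode(data)      # 4 (most frequent value)
stdev = statistics.pstdev(data)   # 2.0 (population standard deviation)
cv = stdev / mean                 # 0.4, the coefficient of variation
```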

7.2.1.3. Correlation

7.2.1.3.1. Pearson's product-moment coefficient

7.2.1.3.2. Rank correlation coefficients

7.2.1.3.2.1. Spearman's rank correlation coefficient

7.2.1.4. Statistical graphics (EDA)

7.2.1.4.1. Bar chart & Histogram

7.2.1.4.2. Line chart

7.2.1.4.3. Box plot

7.2.1.4.4. Scatter plot

7.2.2. Inferential Statistics

7.2.2.1. Estimation

7.2.2.1.1. Point Estimation

7.2.2.1.2. Interval Estimation

7.2.2.2. Testing of statistical hypothesis

7.2.2.2.1. One sample test

7.2.2.2.2. Two sample test

7.2.2.2.3. Paired sample test

7.2.2.2.4. n sample test (n >= 3)

7.2.2.2.5. Correlation analysis

7.2.2.2.6. Correspondence analysis

7.3. Probability

7.3.1. Event

7.3.2. Random variable

7.3.3. 3 axioms of Probability

7.3.4. Probability Distribution

7.3.4.1. Uniform

7.3.4.2. Binomial

7.3.4.3. Normal(Gaussian)

7.3.4.3.1. Standard normal distribution (z-distribution)

7.3.4.3.2. Skewness

7.3.4.3.3. Kurtosis

7.3.4.4. Poisson

7.3.4.5. Chi-squared

7.3.4.6. F

7.3.4.7. Student's t

7.3.5. Random/Stochastic Process

7.4. Optimization Theory

8. Collaboration

8.1. Git

8.2. Code Review

8.3. Coding Style Guide

9. Study Materials

9.1. Short Introduction/Guide

9.1.1. Facebook Field Guide to ML

9.1.2. Made With ML - Topics

9.2. General

9.2.1. Andrew Ng ML(Coursera)

9.2.2. Deep Learning Book

9.2.3. PRML

9.2.4. Andrew Ng, Machine Learning Yearning

9.2.4.1. Korean translation

9.3. Statistics

9.3.1. Computer Age Statistical Inference

9.4. Vision

9.4.1. cs231n (Vision)

9.5. NLP

9.5.1. cs224d (NLP)

9.6. RL

9.6.1. cs294-112 (Deep RL)

9.6.2. David Silver RL

9.6.3. Sutton RL Book

9.6.4. Deep RL Bootcamp

9.7. CS294-158 (Deep Unsupervised Learning, Berkeley)

9.8. SOTA summary (2019 Jan)

9.9. Math

9.9.1. Linear Algebra

9.9.1.1. Essence of Linear Algebra (3Blue1Brown)

9.9.1.2. Computational Linear Algebra (Rachel Thomas)

9.10. Kaggle

9.10.1. Kaggle competition winner kernels

9.10.2. Presentation materials by 이유한

9.10.3. Kaggle kernel study curriculum

9.10.4. Kaggle Winning Solutions

9.10.5. How to Win a Data Science Competition: Learn from Top Kagglers | Coursera

9.11. Interview

9.11.1. ML interview preparation materials by 김태훈

9.11.2. Data science interview question collection by 변성윤

9.12. Advanced courses