ML Project Steps
by Yogesh Linganna
1. Requirements & Scope of Work
1.1. (1) Define project statement and objectives; (2) Define project milestones, technical stack, and deliverables
1.2. Tooling: Jira Software, Confluence, Word, Excel, etc.
2. Data Collection
2.1. (1) Data discovery and collection (internal, external); (2) Access control; (3) Compliance
2.2. Tooling: web scraping libraries (Beautiful Soup, Scrapy), API interaction libraries, database interaction libraries, data extraction tools, data storage and management tools (cloud storage: S3; data warehouses: Redshift, Snowflake, BigQuery; data lakes: HDFS)
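As a small illustration of the "database interaction libraries" item, the sketch below pulls rows from an internal source with a parameterized query. It uses Python's built-in sqlite3 module with an in-memory database; the `orders` table, its columns, and the `load_rows` helper are hypothetical stand-ins, not part of the original outline.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, min_amount: float) -> list[tuple]:
    """Return (id, amount) rows at or above min_amount, ordered by id."""
    # The "?" placeholder keeps the query parameterized (no string formatting).
    cur = conn.execute(
        "SELECT id, amount FROM orders WHERE amount >= ? ORDER BY id",
        (min_amount,),
    )
    return cur.fetchall()


# Hypothetical in-memory table standing in for an internal data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(1, 10.0), (2, 250.0), (3, 75.5)],
)
conn.commit()

rows = load_rows(conn, 50.0)
print(rows)
```

In a real project the connection would point at the actual warehouse or database, and access control and compliance checks would apply before any rows are read.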
3. Data Preparation
3.1. (1) Cleaning: remove invalid data, handle outliers; (2) Transformation; (3) Labelling
3.2. Tooling: Pandas, NumPy, Apache Spark
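The cleaning step can be sketched with pandas as follows. This is one possible approach, assuming missing values are dropped and outliers are removed with the common 1.5 x IQR rule; the `clean` helper and the `age` column are hypothetical, and other strategies (imputation, capping) may suit a given dataset better.

```python
import pandas as pd


def clean(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Drop rows with a missing value in `column`, then drop IQR outliers."""
    df = df.dropna(subset=[column])
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Keep only rows inside the [lower, upper] fence.
    return df[df[column].between(lower, upper)].reset_index(drop=True)


# Hypothetical raw data: one missing value and one implausible age.
raw = pd.DataFrame({"age": [25, 31, None, 28, 30, 27, 400]})
cleaned = clean(raw, "age")
print(cleaned)
```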
4. Exploratory Data Analysis (EDA)
4.1. (1) Data visualization; (2) Identify patterns and trends; (3) Univariate, bivariate, and multivariate analysis
4.2. Tooling: pandas, NumPy, matplotlib, seaborn, Apache Spark, etc.
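The three levels of analysis can be illustrated with pandas alone. The sketch below uses a small made-up housing table (column names and values are hypothetical) and skips plotting; in practice the same frame would typically be passed to matplotlib or seaborn for histograms and scatter plots.

```python
import pandas as pd

# Hypothetical dataset, for illustration only.
df = pd.DataFrame(
    {
        "area_sqm": [50, 65, 80, 95, 120],
        "rooms": [1, 2, 3, 3, 4],
        "price": [100, 130, 160, 190, 240],
    }
)

# Univariate: summary statistics for a single column.
summary = df["price"].describe()

# Bivariate: Pearson correlation between two columns.
corr_area_price = df["area_sqm"].corr(df["price"])

# Multivariate: pairwise correlation matrix across all numeric columns.
corr_matrix = df.corr()

print(summary)
print(corr_area_price)
print(corr_matrix)
```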
5. Feature Engineering
5.1. Feature creation, selection, and scaling, covering both raw and derived features.
5.2. Tooling: NumPy, scikit-learn, pandas, etc.
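A minimal sketch of a derived feature plus scaling, assuming scikit-learn's `StandardScaler` is the chosen scaler. The height/weight columns and the BMI feature are hypothetical examples of raw and derived features.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw features.
raw = pd.DataFrame(
    {
        "height_m": [1.60, 1.75, 1.82, 1.68],
        "weight_kg": [55.0, 72.0, 90.0, 61.0],
    }
)

# Derived feature computed from the raw columns.
features = raw.copy()
features["bmi"] = features["weight_kg"] / features["height_m"] ** 2

# Scale every column to zero mean and unit variance.
scaler = StandardScaler()
scaled = scaler.fit_transform(features)

print(np.round(scaled, 3))
```

In a full pipeline the scaler would normally be fitted on the training split only and then applied to the test split, so that test data does not leak into the scaling statistics.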
6. Model Selection and Training
6.1. (1) Algorithm selection: regression, classification, clustering, etc.; (2) Train-test split
6.2. Tooling: Jupyter, statistical ML (scikit-learn, XGBoost), deep learning (PyTorch, TensorFlow), etc.
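The train-test split and a first training run might look like the sketch below. The Iris dataset, the logistic regression model, and the 80/20 split are illustrative assumptions rather than choices made in the outline.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for testing; stratify keeps class ratios equal
# in both splits, and random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# A classification algorithm, chosen here only as an example.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```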
7. Model Evaluation
7.1. (1) Model evaluation metrics (accuracy, precision, recall, and F1 score); (2) K-fold cross-validation
7.2. Tooling: Jupyter, scikit-learn, XGBoost, PyTorch, TensorFlow, etc.
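The metrics and K-fold cross-validation named above can be sketched with scikit-learn. The label vectors are made up for illustration, and the Iris dataset with logistic regression is an assumed stand-in for a real model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score

# Hypothetical binary labels: 3 true positives, 3 true negatives,
# 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two

# 5-fold cross-validation: train on 4 folds, score on the 5th, rotate.
X, y = load_iris(return_X_y=True)
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(accuracy, precision, recall, f1)
print(cv_scores.mean())
```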
8. Model Fine-Tuning
8.1. (1) Hyperparameter tuning; (2) Transfer learning; (3) Grid search CV (cross-validation)
8.2. Tooling: scikit-learn, XGBoost, TensorFlow, PyTorch
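Grid search with cross-validation can be sketched using scikit-learn's `GridSearchCV`. The SVC estimator, the Iris dataset, and the parameter grid are assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every combination in the grid (3 x 2 = 6) is scored with 5-fold CV.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, best_score)
```

Grid search is exhaustive, so the cost grows with the product of the grid sizes; for larger search spaces a randomized search is a common alternative.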
9. Model Deployment
9.1. (1) Deploy model to environment; (2) API or web application; (3) Integrate with other systems
9.2. Tooling: Amazon SageMaker, Azure ML, Docker, FastAPI, etc.
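A rough sketch of the core of a deployment: serialize the trained model, load it in the serving process, and expose a prediction handler. The `handle_request` function and its payload shape are hypothetical; in practice this logic would sit behind a web framework route (for example a FastAPI endpoint), which is omitted here to keep the example dependency-light.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Training side: fit a model and serialize it to bytes.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
model_bytes = pickle.dumps(model)

# Serving side: load the artifact once at startup.
# Note: only unpickle artifacts from trusted sources.
loaded_model = pickle.loads(model_bytes)


def handle_request(payload: dict) -> dict:
    """Hypothetical handler: validate the input, then return a prediction."""
    features = payload.get("features", [])
    if len(features) != 4:
        return {"error": "expected 4 feature values"}
    prediction = loaded_model.predict([features])[0]
    return {"prediction": int(prediction)}


response = handle_request({"features": [5.1, 3.5, 1.4, 0.2]})
print(response)
```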
10. Monitoring & Feedback
10.1. (1) Track model performance; (2) Collect user feedback; (3) Iterate and improve
10.2. Tooling: Amazon SageMaker, MLflow, Azure ML, etc.
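Tracking model performance can be as simple as watching accuracy over a sliding window of recent predictions. The `AccuracyMonitor` class below is a hypothetical, standard-library-only sketch; the window size and threshold are arbitrary, and managed tools such as those listed above provide richer tracking.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent `window_size` predictions."""

    def __init__(self, window_size: int, threshold: float):
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, prediction, actual) -> None:
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self) -> bool:
        # True when recent accuracy falls below the threshold.
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


monitor = AccuracyMonitor(window_size=4, threshold=0.75)
# Hypothetical (prediction, actual) pairs arriving over time.
for pred, actual in [(1, 1), (0, 0), (1, 0), (0, 1), (1, 1)]:
    monitor.record(pred, actual)

print(monitor.accuracy(), monitor.needs_attention())
```

A flag from `needs_attention` would then feed the "iterate and improve" step, for example by triggering a review or a retraining run.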