Data Science for Everyone

Data Science for Everyone by Mind Map: Data Science for Everyone

1. Introduction to Python

2. Introduction to Importing in Python

3. Intermediate Importing in Python

4. Python Data Science Toolbox Part 1

5. Python Data Science Toolbox Part 2

6. Data Manipulation with pandas

7. Joining Data with pandas

8. Data Science for Everyone

8.1. What is data science?

8.1.1. Data Science Workflow: Data Collection and Storage -> Data Preparation -> Exploration and Visualization -> Experimentation and Prediction

8.2. Applications of Data Science

8.2.1. Traditional machine learning, Internet of Things (IoT), deep learning

8.3. Data Science Roles and Tools

8.3.1. Data Engineer

8.3.1.1. Data Collection and Storage

8.3.1.1.1. Tools: SQL, Java, Scala or Python, Shell, Cloud Computing

8.3.2. Data Analyst

8.3.2.1. Data Preparation, Exploration & Visualization

8.3.2.1.1. Tools: SQL, Spreadsheets (Excel or Google Sheets), BI Tools (Tableau, Power BI, Looker), sometimes Python or R

8.3.3. Data Scientist

8.3.3.1. Data Preparation, Exploration & Visualization, Experimentation & Prediction

8.3.3.1.1. Tools: SQL, Python and/or R, pandas

8.3.4. Machine Learning Scientist

8.3.4.1. Data Preparation, Exploration & Visualization, Experimentation & Prediction

8.3.4.1.1. Tools: Python and/or R, machine learning libraries (e.g. TensorFlow, Spark)

8.4. Data Sources

8.4.1. Company Data

8.4.1.1. Web data

8.4.1.2. Survey data

8.4.2. Open Data

8.4.2.1. Data APIs

8.4.2.1.1. Public data API

8.4.2.2. Public records

8.4.2.2.1. International organizations, e.g. World Bank, UN, WTO

8.4.2.2.2. National statistical offices, e.g. censuses, surveys

8.4.2.2.3. Government agencies, e.g. weather, environment, population

8.5. Data types

8.5.1. Quantitative data

8.5.1.1. Numbers, can be measured

8.5.2. Qualitative data

8.5.2.1. Descriptions, can be observed but not measured

8.5.3. Other data types

8.5.3.1. Image data, Text data, Geospatial data, Network data
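The quantitative/qualitative split above can be made concrete with a small check over records. This is a minimal sketch, assuming the rule of thumb "numeric values are quantitative, everything else is qualitative"; the records and the helper name `classify_fields` are hypothetical, invented for illustration.

```python
# Hypothetical survey records, invented for illustration
records = [
    {"age": 34, "height_cm": 172.5, "favorite_color": "blue", "comment": "Loved it"},
    {"age": 28, "height_cm": 165.0, "favorite_color": "green", "comment": "Too long"},
]


def classify_fields(rows):
    """Label each field 'quantitative' if every value is numeric, else 'qualitative'."""
    kinds = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        kinds[field] = "quantitative" if numeric else "qualitative"
    return kinds


print(classify_fields(records))
```

The type-based rule is only a heuristic: numeric codes such as ZIP codes or ID numbers are stored as numbers but are qualitative, since arithmetic on them is meaningless.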

8.6. Data storage and retrieval

8.6.1. Location

8.6.1.1. Parallel storage solutions (on-premises cluster)

8.6.1.2. The cloud: Microsoft Azure, AWS, Google Cloud, IBM Db2

8.6.2. Data type

8.6.2.1. Unstructured

8.6.2.1.1. Email, text, video and audio files, web pages, social media

8.6.2.2. Document database

8.6.2.2.1. NoSQL

8.6.2.3. Relational database

8.6.2.3.1. SQL
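A relational database stores data in tables with a fixed schema and is queried with SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a production system; the `customers` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Define the table schema, then insert rows with parameter placeholders
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ana", "Lisbon"), (2, "Ben", "Oslo"), (3, "Chen", "Lisbon")],
)
conn.commit()

# Retrieve: names of customers in one city, sorted alphabetically
rows = cur.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Lisbon",)
).fetchall()
print(rows)  # [('Ana',), ('Chen',)]
```

Using `?` placeholders rather than string formatting lets the driver handle quoting, which also protects against SQL injection.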

8.7. Data Pipelines

8.7.1. ETL

8.7.1.1. Extract

8.7.1.1.1. Sources: National Weather API (every 30 min), Twitter API (real-time stream), smart home thermostats, smart light bulbs, smart door locks, smart meters

8.7.1.2. Transform

8.7.1.2.1. Joining data sources into one data set; converting data structures to fit database schemas; removing irrelevant data

8.7.1.3. Load

8.7.1.3.1. Loading the transformed data into storage, ready for the data science steps
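The three ETL stages can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions: the "extracted" weather and thermostat readings are hypothetical in-memory lists standing in for API responses, the `readings` table is an invented schema, and `sqlite3` stands in for the target database.

```python
import sqlite3

# Extract: hypothetical raw readings, standing in for a weather API and a smart thermostat
weather = [
    {"ts": "2024-01-01T00:00", "temp_c": 3.5},
    {"ts": "2024-01-01T00:30", "temp_c": 3.1},
]
thermostat = [
    {"ts": "2024-01-01T00:00", "setpoint_c": 20.0, "battery": 0.9},
    {"ts": "2024-01-01T00:30", "setpoint_c": 19.5, "battery": 0.9},
]


def transform(weather_rows, thermo_rows):
    """Join both sources on timestamp and keep only the fields the schema needs."""
    setpoints = {r["ts"]: r["setpoint_c"] for r in thermo_rows}
    # "battery" is dropped here as irrelevant to the target schema
    return [
        (r["ts"], r["temp_c"], setpoints[r["ts"]])
        for r in weather_rows
        if r["ts"] in setpoints
    ]


def load(rows, conn):
    """Write the transformed rows into a table shaped for analysis."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(ts TEXT PRIMARY KEY, outdoor_c REAL, setpoint_c REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(weather, thermostat), conn)
print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # 2
```

Keeping each stage in its own function makes it easy to swap a real source or destination in later without touching the other stages.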

8.8. Data preparation

8.8.1. Data cleaning

8.8.2. Tidy data

8.8.3. Remove duplicates

8.8.4. Unique ID

8.8.5. Missing values
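Three of the preparation steps above (removing duplicates, assigning a unique ID, handling missing values) can be sketched on a few rows. The rows are hypothetical, `None` is assumed to mark a missing value, and filling gaps with the mean is just one simple imputation choice; dropping incomplete rows is another.

```python
# Hypothetical raw rows; None marks a missing value
raw = [
    {"name": "Ana", "age": 34},
    {"name": "Ben", "age": None},
    {"name": "Ana", "age": 34},  # exact duplicate of the first row
    {"name": "Chen", "age": 28},
]


def prepare(rows):
    """Drop exact duplicates, assign sequential IDs, and mean-impute missing ages."""
    # Remove exact duplicates while preserving the original order
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # Mean of the ages that are present
    known = [r["age"] for r in unique if r["age"] is not None]
    mean_age = sum(known) / len(known)

    for i, row in enumerate(unique, start=1):
        row["id"] = i  # unique ID per remaining row
        if row["age"] is None:
            row["age"] = mean_age
    return unique


for row in prepare(raw):
    print(row)
```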

8.9. Exploratory Data Analysis

8.9.1. Exploring the data

8.9.2. Formulating hypotheses

8.9.3. Assessing characteristics

8.9.4. Visualizing
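A first pass of exploratory analysis often means summary statistics plus a quick look at the distribution. This sketch uses Python's `statistics` module on a hypothetical week of order counts; the "more than 2 standard deviations" outlier rule is an assumed convention, not a universal threshold, and the text bars stand in for a real chart.

```python
import statistics

# Hypothetical daily order counts for one week
orders = [12, 15, 11, 30, 14, 13, 16]

mean = statistics.mean(orders)
stdev = statistics.stdev(orders)  # sample standard deviation
summary = {
    "mean": mean,
    "median": statistics.median(orders),
    "stdev": stdev,
    "min": min(orders),
    "max": max(orders),
}

# A mean well above the median hints at skew; flag points far from the mean
outliers = [x for x in orders if abs(x - mean) > 2 * stdev]

# Crude text "visualization": one bar per day
for day, count in enumerate(orders, start=1):
    print(f"day {day}: {'#' * count}")

print(summary["median"], outliers)  # 14 [30]
```

Here the mean (about 15.9) sits above the median (14), which suggests a hypothesis worth checking: what happened on day 4?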

8.10. A/B Testing

8.10.1. Experiments in data science

8.10.1.1. Form a question -> Form a hypothesis -> Collect data -> Test the hypothesis with a statistical test -> Interpret results

8.10.2. A/B Testing steps

8.10.2.1. Picking a metric to track -> Calculating sample size -> Running the experiment -> Checking for significance
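Two of the A/B testing steps, calculating sample size and checking for significance, can be sketched with the standard library. This assumes the tracked metric is a conversion rate, a two-sided test at 5% significance with 80% power, and the common normal-approximation formulas; other tests and formulas are equally valid. The conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_baseline, p_target, alpha=0.05, power=0.8):
    """Approximate users per group needed to detect p_baseline -> p_target (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_baseline) ** 2)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates, using a pooled proportion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Plan: detect a lift from 10% to 12% conversion
n = sample_size_per_group(0.10, 0.12)

# Check: hypothetical results, 200/2000 conversions in A vs 260/2000 in B
p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(n, p_value, p_value < 0.05)
```

The sample size is fixed before the experiment runs; checking significance repeatedly while data arrives and stopping early inflates the false-positive rate.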

8.11. Time series forecasting
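The mind map names time series forecasting without detail. As an illustration only, this sketch swaps in one of the simplest baselines, a moving-average forecast that predicts the next value as the mean of the last few observations; dedicated methods such as exponential smoothing or ARIMA models handle trend and seasonality far better. The sales figures are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


# Hypothetical monthly sales
monthly_sales = [120, 130, 125, 140, 150, 145]
print(moving_average_forecast(monthly_sales))  # 145.0
```

A baseline like this is still useful: a more complex forecasting model is only worth keeping if it beats it on held-out data.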

8.12. Supervised machine learning

8.12.1. Supervised machine learning: Predictions from data with labels and features
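Supervised learning fits a mapping from features to known labels, then applies it to new examples. This sketch uses k-nearest neighbors, chosen only because it fits in a few lines of standard-library Python (libraries such as scikit-learn provide this and many other algorithms); the study-hours data and labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled data: features = (hours_studied, hours_slept), label = exam outcome
training = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((1.5, 6.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((7.0, 8.0), "pass"),
    ((5.5, 6.5), "pass"),
]


def knn_predict(train, features, k=3):
    """Predict a label by majority vote among the k nearest training examples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(training, (6.5, 7.5)))  # pass
print(knn_predict(training, (1.2, 5.0)))  # fail
```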

8.13. Clustering

8.13.1. Divides unlabeled data into categories (clusters) of similar observations
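k-means is one common clustering algorithm: it repeatedly assigns each point to its nearest centroid and moves each centroid to the mean of its points. A minimal sketch, assuming 2-D points and hand-picked starting centroids (for reproducibility); the points are hypothetical, and no labels are used anywhere.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    labels = [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]
    return centroids, labels


# Hypothetical points forming two visible groups
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, centroids=[(1, 1), (9, 8)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Results depend on the starting centroids; in practice k-means is usually run several times from random starts and the tightest clustering is kept.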

9. Intermediate Python