# DataScience


## 3. Data Collection

### 3.1. Primary Research

3.1.1. Organizational Documents

### 3.2. Secondary Research

3.2.1. Searching the Internet

## 4. Data Types

### 4.1. Structured

4.1.1. Numerical

4.1.1.1. Continuous (Quantitative)

4.1.1.1.1. Interval

4.1.1.1.2. Ratio

4.1.1.2. Discrete (Quantitative)

4.1.1.2.1. Count

4.1.1.2.2. Cannot be represented with decimals

4.1.2. Categorical

4.1.2.1. Binary

4.1.2.1.1. Has only two values

4.1.2.1.2. Ex: True or False, Right or Wrong, Default or Not, Yes or No, etc.

4.1.2.2. More than two categories

4.1.2.2.1. Having more than two values

4.1.2.2.2. Ordinal

4.1.2.2.3. Nominal

### 4.2. Unstructured

4.2.1. Multimedia files and similar data that have no predefined structure.

4.2.2. We need to give this data a structure.
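The categorical types above differ in how they can be turned into numbers. The following is a minimal sketch, assuming hypothetical category values (`small`/`medium`/`large`, colour names, `yes`/`no`) and hypothetical helper names that do not come from the original notes: ordinal values keep their order as ranks, nominal values are one-hot encoded so no order is implied, and binary values map to 0 or 1.

```python
# Hypothetical example values; none of these come from the original notes.
SIZE_ORDER = ["small", "medium", "large"]  # ordinal: categories have a natural order
COLOURS = ["red", "green", "blue"]         # nominal: no natural order


def encode_ordinal(value, order):
    """Map an ordinal category to its rank (0 = lowest)."""
    return order.index(value)


def encode_nominal(value, categories):
    """One-hot encode a nominal category so that no order is implied."""
    return [1 if value == c else 0 for c in categories]


def encode_binary(value, positive="yes"):
    """Encode a two-valued category as 1 (positive) or 0 (anything else)."""
    return 1 if value == positive else 0


print(encode_ordinal("medium", SIZE_ORDER))  # 1
print(encode_nominal("green", COLOURS))      # [0, 1, 0]
print(encode_binary("yes"))                  # 1
```

Ranks are only meaningful for ordinal data; applying them to nominal categories would invent an order that does not exist.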

## 5. EDA (Exploratory Data Analysis)

### 5.1. Four Moments of Business Decisions

5.1.1. 1st Moment

5.1.1.1. Measures of Central Tendency

5.1.1.1.1. Mean

5.1.1.1.2. Median

5.1.1.1.3. Mode

5.1.2. 2nd Moment

5.1.2.1. Dispersion of Data

5.1.2.1.1. Variance

5.1.2.1.2. Standard deviation

5.1.2.1.3. Range

5.1.3. 3rd Moment

5.1.3.1. Skewness

5.1.3.1.1. Asymmetry in the probability distribution

5.1.3.1.2. Positive/Right skewed - Longer tail on the right side

5.1.3.1.3. Negative/Left skewed - Longer tail on the left side

5.1.3.1.4. If both tails are equal, the distribution is symmetric (skewness = 0), as in a normal distribution

5.1.4. 4th Moment

5.1.4.1. Kurtosis

5.1.4.1.1. Sharper peak / Heavier tails - Positive (excess) kurtosis

5.1.4.1.2. Broader peak / Lighter tails - Negative (excess) kurtosis
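The four moments can be computed with Python's standard library. This is a minimal sketch, assuming the population (divide-by-n) definitions of variance, skewness and excess kurtosis. Statistical packages often apply sample-size corrections, so their values may differ slightly. The function name `four_moments` and the sample data are hypothetical.

```python
import statistics


def four_moments(data):
    """Return summary statistics for the four moments of a numeric sample."""
    n = len(data)
    mean = statistics.fmean(data)
    # Population variance (divide by n); statistics.variance gives the sample version.
    variance = statistics.pvariance(data, mu=mean)
    std = variance ** 0.5
    # Standardised third and fourth central moments.
    skewness = sum((x - mean) ** 3 for x in data) / n / std ** 3
    # Subtract 3 so that a normal distribution has excess kurtosis 0.
    excess_kurtosis = sum((x - mean) ** 4 for x in data) / n / std ** 4 - 3
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": variance,
        "std": std,
        "range": max(data) - min(data),
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


sample = [2, 3, 3, 4, 5, 6, 20]  # hypothetical data with one large value on the right
for name, value in four_moments(sample).items():
    print(f"{name}: {value:.3f}")
```

For this sample the single large value pulls the mean above the median and gives a positive skewness, which matches the "longer tail on the right side" description.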

### 5.2. Graphical Representations

5.2.1. Barplot

5.2.1.1. No business inferences can be drawn

5.2.2. Histogram

5.2.2.1. Shape of Probability Distribution

5.2.2.2. Normal distribution - A perfect bell-shaped curve, symmetric on both sides of the center (mean = median = mode)

5.2.3. Boxplot

5.2.3.1. Identify Outliers

5.2.3.2. Lower Extreme - Min value after removing the Outliers.

5.2.3.3. Lower Quartile - Q1

5.2.3.4. Median

5.2.3.5. Upper Quartile - Q3

5.2.3.6. Upper extreme - Max value after removing outliers.

5.2.3.7. IQR = Q3 - Q1 (the middle 50% of the data)

5.2.3.8. Upper Fence = Q3 + 1.5(IQR)

5.2.3.9. Lower Fence = Q1 - 1.5(IQR)

5.2.3.10. Outer Upper Fence (extreme outliers) = Q3 + 3(IQR)

5.2.3.11. Outer Lower Fence (extreme outliers) = Q1 - 3(IQR)
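The boxplot quantities above can be computed directly. This is a minimal sketch using the standard library, assuming the "inclusive" quartile method of `statistics.quantiles`. Other tools use other quartile conventions and may give slightly different Q1 and Q3. The function name `boxplot_summary` and the sample data are hypothetical.

```python
import statistics


def boxplot_summary(data):
    """Return quartiles, IQR, fences, whisker ends and outliers for a sample."""
    # Quartile conventions differ between tools; "inclusive" is one common choice.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Points beyond the 1.5 * IQR fences are flagged as outliers.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    kept = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outer_lower_fence": q1 - 3 * iqr,
        "outer_upper_fence": q3 + 3 * iqr,
        # Extremes are the min and max after removing the outliers.
        "lower_extreme": min(kept),
        "upper_extreme": max(kept),
        "outliers": outliers,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data; 30 is far from the rest
summary = boxplot_summary(sample)
print(summary["outliers"])       # [30]
print(summary["upper_extreme"])  # 9
```

Here 30 lies beyond both the 1.5(IQR) fence and the 3(IQR) fence, so it is an outlier under either rule, and the upper whisker ends at 9.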

## 8. Probability Distribution

### 8.2. Continuous PD

8.2.1. Smooth curve

8.3.1. Bars
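The contrast between a smooth curve and bars can be illustrated with two standard distributions. This is a minimal sketch, assuming that the "Bars" entry refers to a discrete distribution. The binomial and normal distributions, and the parameters used, are illustrative choices that do not appear in the original notes.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials (discrete: one bar per k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x (continuous: a smooth curve)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Discrete: probability sits on individual values, and the bars sum to 1.
bars = [binomial_pmf(k, n=10, p=0.5) for k in range(11)]
print(sum(bars))

# Continuous: the density is defined at every real x; probabilities are
# areas under the curve rather than heights of individual bars.
curve = [normal_pdf(x / 10) for x in range(-30, 31)]
print(max(curve) == normal_pdf(0.0))  # the peak is at the mean
```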