7. PreProcessing - CLEANING - example

1. get the Data!

1.1. path

1.1.1. R:

1.1.2. Python

1.2. Call

1.2.1. Python

1.2.2. R:
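
The path-then-call pattern above might look like this in Python. This is a minimal sketch that assumes the data is a CSV; the names `link` and `raw` and the sample contents are hypothetical, and an in-memory buffer stands in for a real path or URL.

```python
import io
import pandas as pd

# Hypothetical stand-in for a file path or URL: an in-memory CSV.
link = io.StringIO("Country,Value\nA,1\nB,2\n")

# The "call": read_csv accepts a path, a URL, or a file-like object.
raw = pd.read_csv(link)
print(raw)
```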

2. See data first

2.1. R

2.1.1. use head / tail (maybe with more rows than the default)

2.2. Pandas

2.2.1. You can use head/tail too, but pandas shows both by default when printing a data frame
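
A short sketch of the pandas side, using a made-up 100-row frame (the column names are illustrative only):

```python
import pandas as pd

# Hypothetical data, just long enough to make truncation visible.
df = pd.DataFrame({"Country": [f"c{i}" for i in range(100)],
                   "Value": range(100)})

first_rows = df.head(10)  # the default is 5 rows
last_rows = df.tail(10)

# Printing a long frame shows its first and last rows with a gap in between.
print(df)
```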

3. clean data

3.1. bad

3.2. recover column names

3.2.1. Python

3.2.1.1. result: **better1**

3.2.2. R

3.3. subset the data to drop unneeded columns

3.3.1. filtering

3.3.1.1. new columns have issues

3.3.1.1.1. Python

3.3.1.1.2. R

3.3.2. filtering R
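
One way to subset columns in pandas, sketched on hypothetical data. The column names and the choice of which ones are "unneeded" are assumptions for illustration, not the original example.

```python
import pandas as pd

# Hypothetical frame with two columns we do not need.
df = pd.DataFrame({"Country": ["A", "B"],
                   "Rank": [1, 2],
                   "Score (2020)": [0.5, 0.7],
                   "Notes": ["x", "y"]})

# Option A: name the columns to keep.
keep = ["Country", "Score (2020)"]
subset = df.loc[:, keep].copy()

# Option B: name the columns to drop.
subset_alt = df.drop(columns=["Rank", "Notes"])
```

Both options give the same result here; naming the columns to keep tends to be safer when the source may gain extra columns later.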

3.4. Clean column names

3.4.1. currently

3.4.2. **FIRST**: No text in parentheses

3.4.2.1. Python

3.4.2.2. R
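
A possible Python version of this step, assuming the parenthesised text can simply be deleted. The column names below are hypothetical.

```python
import pandas as pd

# Hypothetical column names containing text in parentheses.
df = pd.DataFrame(columns=["Country",
                           "Human Development Index (HDI)",
                           "Gross National Income (GNI) per capita"])

# Remove "(...)" together with any whitespace right before it.
df.columns = (df.columns
                .str.replace(r"\s*\([^)]*\)", "", regex=True)
                .str.strip())
print(list(df.columns))
```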

3.4.3. **Option 1**: Underscores instead of _blank spaces_.

3.4.3.1. Python

3.4.3.2. R
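
Option 1 could be sketched like this in pandas (hypothetical column names; runs of whitespace collapse into a single underscore):

```python
import pandas as pd

# Hypothetical column names with blank spaces.
df = pd.DataFrame(columns=["Country", "Life expectancy",
                           "Mean years  of schooling"])

# Trim the ends, then turn each run of whitespace into one underscore.
df.columns = (df.columns
                .str.strip()
                .str.replace(r"\s+", "_", regex=True))
print(list(df.columns))
```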

3.4.4. **Option 2**: Using CamelCase.

3.4.4.1. Python

3.4.4.2. R
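
A sketch of Option 2. The helper `to_camel_case` is a hypothetical name introduced here, and the column names are illustrative.

```python
import pandas as pd

def to_camel_case(name):
    # Upper-case the first letter of each word, keep the rest as written.
    return "".join(word[:1].upper() + word[1:] for word in name.split())

# Hypothetical column names.
df = pd.DataFrame(columns=["Country", "Life expectancy",
                           "Mean years of schooling"])
df = df.rename(columns=to_camel_case)
print(list(df.columns))
```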

3.4.5. **Option 3**: Shortening using acronyms

3.4.5.1. Python

3.4.5.1.1. final result: **better3**

3.4.5.2. R
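
Option 3 might be implemented as below. The rule used (first letter of every word, single-word names left alone) is an assumption, and `to_acronym` is a hypothetical helper; the original example may have built its acronyms differently.

```python
import pandas as pd

def to_acronym(name):
    words = name.split()
    if len(words) == 1:
        return name  # keep single-word names such as the ID column
    return "".join(word[0].upper() for word in words)

# Hypothetical column names.
df = pd.DataFrame(columns=["Country", "Life expectancy",
                           "Mean years of schooling"])
df = df.rename(columns=to_acronym)
print(list(df.columns))
```

Acronyms built this way can collide, so it is worth checking that the new names are still unique.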

3.5. Fix Data Contents

3.5.1. Missing Values are useful

3.5.1.1. Explore missingness

3.5.1.1.1. Python

3.5.1.1.2. R
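
Exploring missingness in pandas usually starts with counts per column and per row. A minimal sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both the ID and the variables.
df = pd.DataFrame({"Country": ["A", "B", None, "D"],
                   "HDI": [0.9, np.nan, np.nan, np.nan],
                   "LE": [80.0, np.nan, 70.0, np.nan]})

missing_per_column = df.isna().sum()        # how many gaps in each column
missing_per_row = df.isna().sum(axis=1)     # how many gaps in each row
rows_with_missing = df[df.isna().any(axis=1)]  # rows with at least one gap
```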

3.5.1.2. Make decisions after exploration

3.5.1.2.1. 1) Drop rows with no data in any of the variables (the ones named with acronyms)

3.5.1.2.2. 2) Drop rows where the **ID** is missing (_Country_)

3.5.1.2.3. 3) Keep rows with some important values (for this case)
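
The three decisions above can be sketched with `dropna` and a boolean filter. The data, the acronym columns `HDI` and `LE`, and the choice of `HDI` as the "important" variable are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Country is the ID, HDI and LE are the variables.
df = pd.DataFrame({"Country": ["A", "B", None, "D", "E"],
                   "HDI": [0.9, np.nan, np.nan, 0.5, np.nan],
                   "LE": [80.0, np.nan, 70.0, np.nan, 60.0]})
variables = ["HDI", "LE"]

# 1) Drop rows where every variable is missing.
step1 = df.dropna(subset=variables, how="all")

# 2) Drop rows where the ID is missing.
step2 = step1.dropna(subset=["Country"])

# 3) Keep only rows that have the important value (assumed: HDI).
step3 = step2[step2["HDI"].notna()]
```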

3.5.2. Common / preventive

3.5.2.1. Remove leading and trailing spaces

3.5.2.1.1. R
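
This node only lists R, but the pandas equivalent is short. A sketch with a hypothetical text column:

```python
import pandas as pd

# Hypothetical data with stray spaces around the text values.
df = pd.DataFrame({"Country": ["  A", "B  ", " C "],
                   "HDI": [0.9, 0.8, 0.7]})

# .str.strip() removes whitespace at both ends of each string.
df["Country"] = df["Country"].str.strip()
```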

3.5.2.2. check if numeric values need cleaning

3.5.2.2.1. R
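
In pandas, one way to check whether a numeric column needs cleaning is to attempt the conversion and look at what fails. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical column that was read as text.
df = pd.DataFrame({"HDI": ["0.9", "0.8", "n/a", "1,2"]})

# Values that cannot be parsed become NaN instead of raising an error.
as_number = pd.to_numeric(df["HDI"], errors="coerce")

# Cells that were present but did not parse are the ones to clean.
problems = df.loc[as_number.isna() & df["HDI"].notna(), "HDI"]
print(problems.tolist())
```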

3.5.2.3. reset indexes

3.5.2.3.1. R
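
After rows are dropped, a pandas index keeps its old labels. A minimal sketch of resetting it, on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data; dropping rows leaves gaps in the index.
df = pd.DataFrame({"Country": ["A", "B", "C", "D"],
                   "HDI": [0.9, np.nan, np.nan, 0.5]})
cleaned = df.dropna(subset=["HDI"])

# drop=True discards the old index instead of adding it as a column.
cleaned = cleaned.reset_index(drop=True)
```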