1. Get the data
1.1. Path
1.1.1. R
1.1.2. Python
1.2. Call
1.2.1. Python
1.2.2. R
2. Look at the data first
2.1. R
2.1.1. Use head() / tail(), optionally with more rows than the default (n = 6)
2.2. Pandas
2.2.1. You can use head() / tail() too, but printing a large pandas data frame already shows its first and last rows by default
3. Clean the data
3.1. Bad
3.2. Recover column names
3.2.1. Python
3.2.1.1. Result: **better1**
3.2.2. R
3.3. Subset the data to drop unneeded columns
3.3.1. Filtering
3.3.1.1. The new columns have issues
3.3.1.1.1. Python
3.3.1.1.2. R
3.3.2. Filtering in R
3.4. Clean column names
3.4.1. Current column names
3.4.2. **FIRST**: Remove text in parentheses
3.4.2.1. Python
3.4.2.2. R
3.4.3. **Option 1**: Underscores instead of _blank spaces_.
3.4.3.1. Python
3.4.3.2. R
3.4.4. **Option 2**: Using CamelCase.
3.4.4.1. Python
3.4.4.2. R
3.4.5. **Option 3**: Shortening using acronyms
3.4.5.1. Python
3.4.5.1.1. Final result: **better3**
3.4.5.2. R
3.5. Fix data contents
3.5.1. Missing values are useful
3.5.1.1. Explore missingness
3.5.1.1.1. Python
3.5.1.1.2. R
3.5.1.2. Make decisions after exploring
3.5.1.2.1. 1) Drop rows with no data in any of the variables (the columns renamed with acronyms)
3.5.1.2.2. 2) Drop rows where the **ID** is missing (_Country_)
3.5.1.2.3. 3) Keep rows that have values in the columns that matter for this case
3.5.2. Common / preventive steps
3.5.2.1. Remove leading and trailing spaces
3.5.2.1.1. R
3.5.2.2. Check whether numeric values need cleaning
3.5.2.2.1. R
3.5.2.3. Reset indexes
3.5.2.3.1. R
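The column-renaming steps in section 3.4 and the missing-value and preventive steps in section 3.5 can be sketched in pandas as shown below. This is a minimal sketch under stated assumptions, not the notebook's own code. The data frame and its column names ("Country", "Human Development Index (HDI)", "Gross Domestic Product (USD)") are hypothetical, and so are the helpers drop_parentheses, to_underscores, to_camel_case and to_acronym.

```python
import re

import pandas as pd

# Hypothetical raw data, for illustration only (not the notebook's dataset).
df = pd.DataFrame({
    "Country": ["Peru ", " Chile", None, "Bolivia"],
    "Human Development Index (HDI)": ["0.76", "0.85", "0.70", None],
    "Gross Domestic Product (USD)": ["1,000", "2,500", None, None],
})


def drop_parentheses(name: str) -> str:
    # FIRST: remove any "(...)" text plus the whitespace before it.
    return re.sub(r"\s*\(.*?\)", "", name).strip()


def to_underscores(name: str) -> str:
    # Option 1: replace runs of blank spaces with underscores.
    return re.sub(r"\s+", "_", name)


def to_camel_case(name: str) -> str:
    # Option 2: capitalize each word and join them without spaces.
    return "".join(word.capitalize() for word in name.split())


def to_acronym(name: str) -> str:
    # Option 3: keep the uppercased first letter of each word.
    return "".join(word[0].upper() for word in name.split())


# Rename: keep the ID column as-is and shorten the variables to acronyms.
clean_names = [drop_parentheses(c) for c in df.columns]
df.columns = [clean_names[0]] + [to_acronym(c) for c in clean_names[1:]]
variables = ["HDI", "GDP"]  # the columns renamed with acronyms

# Explore missingness: count missing values per column.
missing = df.isna().sum()

# 1) Drop rows with no data in any of the variables.
df = df.dropna(subset=variables, how="all")
# 2) Drop rows where the ID (Country) is missing; copy to avoid chained-assignment warnings.
df = df.dropna(subset=["Country"]).copy()

# Preventive: remove leading and trailing spaces.
df["Country"] = df["Country"].str.strip()
# Preventive: remove thousands separators, then convert text to numbers.
for col in variables:
    df[col] = pd.to_numeric(df[col].str.replace(",", "", regex=False), errors="coerce")

# Reset the index so it runs 0..n-1 again after the dropped rows.
df = df.reset_index(drop=True)
```

Only one renaming option is applied above. to_underscores and to_camel_case are shown so the three options can be compared. For example, "Human Development Index" becomes "Human_Development_Index" with Option 1 and "HumanDevelopmentIndex" with Option 2. Setting errors="coerce" turns any value that still is not numeric into NaN rather than raising an error. With this choice, check the result again for new missing values.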