Mind Map: FORMATTING

1. You are **assuming** data is clean

1.1. See DATA TYPES

1.1.1. R: **str()**

1.1.2. Python: **.info()**
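
As a minimal sketch of the Python check above (the tiny CSV and its column names are made up for illustration), `.info()` shows which columns were not read with the type you expected:

```python
import io
import pandas as pd

# Hypothetical data: "age" holds a stray text value, so it is not read as a number
raw = io.StringIO("name,age,joined\nAna,34,2024-01-15\nLuis,unknown,2024-02-01\n")
df = pd.read_csv(raw)

df.info()          # prints column names, non-null counts and dtypes
print(df.dtypes)   # the same dtypes, returned as a Series

# Programmatic version of the same check
age_is_numeric = pd.api.types.is_numeric_dtype(df["age"])
joined_is_date = pd.api.types.is_datetime64_any_dtype(df["joined"])
```

Here both flags come out `False`: neither the numeric column nor the date column has been formatted yet.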

2. Format numeric data

2.1. R: **as.numeric()**

2.1.1. If NAs created, STOP, Explore and CLEAN

2.2. Python: **pd.to_numeric()**

2.2.1. always use **errors="raise"**

2.2.1.1. If an error is raised, STOP, Explore and CLEAN

2.3. when numeric values are clean...

2.3.1. formatting numeric data is the easiest!
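
A possible way to follow the "stop, explore, clean" advice in pandas is sketched below. The sample values are invented, and using `errors="coerce"` only to locate the bad entries is one option among several, not a prescribed workflow:

```python
import pandas as pd

# Hypothetical column with one dirty value
s = pd.Series(["10", "20.5", "n/a", "40"])

# Strict conversion: stops with an exception instead of silently creating NaN
failed = False
try:
    pd.to_numeric(s, errors="raise")
except ValueError as exc:
    failed = True
    print("Conversion failed:", exc)

# Explore: coerce only to find which original values could not be parsed
coerced = pd.to_numeric(s, errors="coerce")
bad = s[coerced.isna() & s.notna()]
print(bad)
```

Once the offending values in `bad` have been cleaned, the strict conversion should run without complaint.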

3. Format dates

3.1. avoid date inference

3.2. Be aware of the date/time symbols

3.2.1. Year

3.2.1.1. %y (24)

3.2.1.2. %Y (2024)

3.2.2. Month

3.2.2.1. %m (01-12)

3.2.2.2. %b (Jan, Dec)

3.2.2.3. %B (January, December)

3.2.3. Day

3.2.3.1. %d (01-31)

3.2.3.2. %a (Mon, Tue)

3.2.3.3. %A (Monday, Tuesday)
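
To illustrate avoiding date inference, here is a small sketch with `pd.to_datetime` and an explicit `format`. The day-first sample strings are an assumption chosen because inference could misread them:

```python
import pandas as pd

# Hypothetical day-first strings; guessing could read "03/04/2024" as March 4
s = pd.Series(["03/04/2024", "15/12/2024"])

# Explicit format: day, month, four-digit year
dates = pd.to_datetime(s, format="%d/%m/%Y")
months = dates.dt.month.tolist()

# Two-digit year uses %y instead of %Y
short = pd.to_datetime(pd.Series(["03/04/24"]), format="%d/%m/%y")

# Formatting back to text; names from %b, %B, %a and %A depend on the locale
print(dates.dt.strftime("%A %d %B %Y").tolist())
```

With the format spelled out, the first value is parsed as 3 April rather than 4 March.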

4. Format text

4.1. decide

4.1.1. capitalization

4.1.2. normalization

4.1.2.1. more advanced topic

4.2. column names are a particular case

4.2.1. simplicity

4.2.1.1. easy to reference

4.2.1.2. avoid your own mistypings
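
One way the text decisions above might look in pandas is sketched here. The column names and the chosen conventions (title case for values, lowercase with underscores for column names) are illustrative assumptions, not the only reasonable choice:

```python
import pandas as pd

# Hypothetical messy input
df = pd.DataFrame({" First Name": ["ana", "LUIS "], "Total Sales (USD)": [10, 20]})

# Values: strip stray whitespace and settle on one capitalization
df[" First Name"] = df[" First Name"].str.strip().str.title()

# Column names: trimmed, lowercase, underscores instead of spaces and symbols
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(r"[^a-z0-9]+", "_", regex=True)
    .str.strip("_")
)

print(list(df.columns))
```

Simple names such as `first_name` are easy to reference and leave less room for mistyping.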

5. Format categorical data

5.1. Nominal

5.1.1. you could keep them as they come

5.1.1.1. or just change the data type

5.1.2. never treat nominal as ordinal

5.2. Ordinal

5.2.1. verify ordering in categories

5.2.2. Homogenize range of ordinal levels

5.2.2.1. same min

5.2.2.2. same max
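
If "same min, same max" is read as putting ordinal scales of different lengths on a common range, a linear rescale is one simple option. This is a sketch under that assumption; the `rescale` helper and the column names are hypothetical, and it uses the theoretical limits of each scale rather than the observed ones:

```python
import pandas as pd

# Two hypothetical ordinal items measured on different scales
df = pd.DataFrame({"trust_1to5": [1, 3, 5], "trust_0to10": [0, 5, 10]})

def rescale(col, old_min, old_max, new_min=0, new_max=1):
    """Linearly map values from [old_min, old_max] to [new_min, new_max]."""
    return (col - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

df["trust_a"] = rescale(df["trust_1to5"], 1, 5)
df["trust_b"] = rescale(df["trust_0to10"], 0, 10)

print(df[["trust_a", "trust_b"]])
```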

5.2.3. recoding

5.2.3.1. integers in a column

5.2.3.2. levels as labels
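
The nominal and ordinal cases above could be handled in pandas roughly as follows. The data, the level labels and their order are assumptions made up for the example:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "south", "north"],   # nominal
    "satisfaction": [1, 3, 2],               # ordinal, stored as integers
})

# Nominal: only change the data type, no order is implied
df["region"] = df["region"].astype("category")

# Ordinal: recode integers to labels and state the order explicitly
levels = {1: "low", 2: "medium", 3: "high"}
df["satisfaction"] = pd.Categorical(
    df["satisfaction"].map(levels),
    categories=["low", "medium", "high"],
    ordered=True,
)

# Verify the ordering: comparisons only make sense for ordered categories
at_least_medium = (df["satisfaction"] >= "medium").tolist()
print(df["satisfaction"].cat.categories.tolist(), at_least_medium)
```

Listing `categories` by hand is what guards against an alphabetical order (high, low, medium) being applied by accident.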

5.3. Categorical formatting may be complicated to export for use in a different program