
7. PreProcessing - CLEANING


Profesor Magallanes



1. Planning

1.1. See data first

1.1.1. head

1.1.2. tail
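A first look at the data can be sketched with the standard library alone. The snippet below is a minimal illustration, not a prescribed method: the sample data and the helper name `head_and_tail` are hypothetical, and a dataframe library's own head/tail functions would serve the same purpose.

```python
import csv
import io
from collections import deque
from itertools import islice

# Hypothetical sample data standing in for a real file.
raw = io.StringIO(
    "id,city,amount\n"
    "1,Lima,10\n"
    "2,Cusco,20\n"
    "3,Piura,30\n"
    "4,Tacna,40\n"
    "5,Total,100\n"
)

def head_and_tail(lines, n=2):
    """Return the header, the first n rows, and the last n of the remaining rows."""
    reader = csv.reader(lines)
    header = next(reader)
    first = list(islice(reader, n))        # head: consumes the first n rows
    last = list(deque(reader, maxlen=n))   # tail: keeps only the last n of the rest
    return header, first, last

header, first_rows, last_rows = head_and_tail(raw, n=2)
print(header)
print(first_rows)
print(last_rows)  # the tail often reveals footers such as a "Total" row
```

Looking at the tail as well as the head matters because summary rows, notes, and footers tend to sit at the end of a file.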

1.2. Check data dictionary if available

1.3. Familiarity with the data?

1.3.1. Best guess on source of messiness

1.3.1.1. Human

1.3.1.1.1. mistyping

1.3.1.1.2. language

1.3.1.1.3. lack of standards

1.3.1.2. Computer

1.3.1.2.1. regional configuration

1.3.1.2.2. miscalibration of sensors

1.3.1.2.3. misuse of functions (defaults)

2. Pay attention

2.1. Identifier uniqueness

2.1.1. simple

2.1.2. composite
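One way to check identifier uniqueness, for both a simple key (one column) and a composite key (several columns together), is to count key tuples. This is a sketch under assumptions: the rows, the column names, and the helper `duplicated_keys` are invented for illustration.

```python
from collections import Counter

# Hypothetical rows; in practice these would come from the data file.
rows = [
    {"country": "PE", "year": 2020, "value": 1.0},
    {"country": "PE", "year": 2021, "value": 1.2},
    {"country": "CL", "year": 2020, "value": 2.0},
    {"country": "CL", "year": 2020, "value": 2.1},  # repeated composite key
]

def duplicated_keys(rows, key_columns):
    """Return the key tuples that occur more than once, with their counts."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Simple identifier: "country" alone does not identify a row here.
simple_dupes = duplicated_keys(rows, ["country"])

# Composite identifier: "country" + "year" should be unique, but one pair repeats.
composite_dupes = duplicated_keys(rows, ["country", "year"])

print(simple_dupes)
print(composite_dupes)
```

An empty result means the chosen column or combination of columns works as an identifier.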

2.2. Column names

2.2.1. need to shrink?

2.2.2. spaces between words?

2.2.2.1. no spaces

2.2.2.2. another character

2.2.3. leading and trailing spaces?
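The column-name checks above could be handled with a small helper like the one below. It is only a sketch: `clean_column_name` is a hypothetical name, lowercasing is an extra convention not stated in the outline, and shortening long names is left to manual judgment.

```python
import re

def clean_column_name(name, separator="_"):
    """Trim outer spaces and replace inner whitespace runs with a separator.

    Pass separator="" to remove the spaces entirely.
    Lowercasing is an added assumption, not something the outline requires.
    """
    name = name.strip()                      # leading and trailing spaces
    name = re.sub(r"\s+", separator, name)   # spaces between words
    return name.lower()

# Hypothetical column names as they might arrive from a spreadsheet.
raw_names = ["  Country Name ", "GDP per  capita", "Year"]
clean_names = [clean_column_name(n) for n in raw_names]
print(clean_names)
```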

2.3. Cell values

2.3.1. text

2.3.1.1. leading and trailing spaces?

2.3.1.2. characters outside the alphabet?

2.3.2. categories

2.3.2.1. ALWAYS verify with frequency table

2.3.2.1.1. possible mistypings

2.3.2.2. the representation of missing values
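A frequency table for a categorical column can be built with `collections.Counter`. The values and the set of missing-value markers below are hypothetical; the point is that mistypings, case variants, stray spaces, and missing-value codes all show up as separate low-count categories.

```python
from collections import Counter

# Hypothetical categorical column with typical problems mixed in.
values = ["Yes", "yes", "No", "Yes ", "N/A", "No", "Yse", "", "Yes"]

freq = Counter(values)

# repr() makes trailing spaces and empty strings visible in the printout.
for value, count in freq.most_common():
    print(f"{value!r:>8}  {count}")

# Assumed markers for missing values; adjust to what the data dictionary says.
MISSING = {"", "N/A", "NA", "-"}
missing_count = sum(n for v, n in freq.items() if v.strip() in MISSING)
print("missing:", missing_count)
```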

2.3.3. numbers

2.3.3.1. presence of non-numeric characters due to number formatting

2.3.3.1.1. currency?

2.3.3.1.2. units of measurement?

2.3.3.2. leading and trailing spaces when read as text

2.3.3.3. the representation of dates

2.3.3.4. the representation of missing values
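The number-related checks above might be combined as in the following sketch. Everything specific here is an assumption made for illustration: the helper names `to_number` and `to_date`, the missing-value markers, the convention that "." is the decimal separator and "," the thousands separator, and the two candidate date formats.

```python
import re
from datetime import datetime

# Assumed markers for missing values.
MISSING_MARKERS = {"", "NA", "N/A", "-", "."}

# Assumed candidate date formats, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def to_number(text):
    """Convert text such as '$1,200.50' or ' 35 kg ' to a float, or None if missing.

    Assumes '.' is the decimal separator. Units containing digits (e.g. 'm2')
    would need to be removed beforehand. Values that still do not parse raise
    ValueError so that they get noticed rather than silently dropped.
    """
    text = text.strip()
    if text in MISSING_MARKERS:
        return None
    cleaned = re.sub(r"[^0-9.\-]", "", text)  # keep only digits, dot, minus sign
    return float(cleaned)

def to_date(text):
    """Try each assumed format; return a date, or None if no format matches."""
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

print(to_number("$1,200.50"), to_number(" 35 kg "), to_number("N/A"))
print(to_date("31/12/2020"), to_date("2020-12-31"), to_date("Dec 2020"))
```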

3. Common operations

3.1. subsetting / filtering / skipping

3.2. basic exploration

3.2.1. look for characters that contaminate the real interpretation of the value

3.2.1.1. number

3.2.1.1.1. check whether anything other than a digit or a dot is present in the value

3.2.1.2. text

3.2.1.2.1. check whether anything outside your alphabet is present
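This kind of exploration can be sketched with two regular expressions that flag contaminated values without changing them. The patterns are assumptions: the numeric one allows only digits and a dot (so a minus sign would also be flagged), and the text one treats unaccented Latin letters and spaces as "your alphabet". The helper `contaminated` and the sample values are hypothetical.

```python
import re

NOT_NUMERIC = re.compile(r"[^0-9.]")   # anything other than a digit or a dot
NOT_ALPHA = re.compile(r"[^A-Za-z ]")  # assumed alphabet: unaccented letters and space

def contaminated(values, pattern):
    """Return the values that contain at least one unexpected character."""
    return [v for v in values if pattern.search(v)]

# Hypothetical columns read as text.
numbers = ["12.5", "1,200", "30%", "45"]
names = ["Lima", "Cusco*", "Sao Paulo", "Piura2"]

bad_numbers = contaminated(numbers, NOT_NUMERIC)
bad_names = contaminated(names, NOT_ALPHA)
print(bad_numbers)
print(bad_names)
```

Listing the offending values first, before replacing anything, helps decide which cleaning operation each case needs.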

3.3. ad-hoc programming

3.3.1. replace

3.3.2. extract

3.3.3. split

3.3.4. strip / trim

3.3.5. using regular expressions
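The operations listed above can be illustrated on a single messy cell. This is a sketch on a made-up value; the layout of the string (place, dash, amount, currency) is an assumption chosen to exercise each operation once.

```python
import re

# Hypothetical messy cell value.
raw = "  Lima, PE - 1,250 USD  "

# strip / trim: remove leading and trailing spaces.
trimmed = raw.strip()

# split: separate the place from the amount, then the city from the country.
place, amount = trimmed.split(" - ")
city, country = [part.strip() for part in place.split(",")]

# extract (regular expression): pull the run of digits and commas out of the amount.
match = re.search(r"[\d,]+", amount)

# replace: drop the thousands separator before converting to a number.
value = int(match.group().replace(",", ""))

# replace with a regular expression: remove digits, commas, and spaces to keep the unit.
currency = re.sub(r"[\d,\s]", "", amount)

print(city, country, value, currency)
```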
