Mind Map: 4. Cleaning data

1. Sources of messiness

1.1. Human

1.1.1. mistyping

1.1.2. language

1.1.3. lack of standards

1.2. Computer

1.2.1. regional configuration

1.2.2. miscalibration of sensors

1.2.3. misuse of functions (e.g., relying on default settings)
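Regional configuration is easy to illustrate: the same quantity may be written 1,234.5 under one convention and 1.234,5 under another. The sketch below assumes you already know which separators each source uses; the function name parse_regional_number and its parameters are hypothetical, chosen only for this example.

```python
def parse_regional_number(text: str, decimal_sep: str = ".", thousands_sep: str = ",") -> float:
    """Convert a number string written with the given separators into a float.

    Hypothetical helper: the caller must know the source's regional convention.
    """
    # Remove thousands separators first, so they cannot be confused with decimals.
    cleaned = text.strip().replace(thousands_sep, "")
    # Then turn the regional decimal separator into the '.' that float() expects.
    cleaned = cleaned.replace(decimal_sep, ".")
    return float(cleaned)


# The same quantity written under two regional conventions.
us_style = parse_regional_number("1,234.5")
eu_style = parse_regional_number("1.234,5", decimal_sep=",", thousands_sep=".")
print(us_style, eu_style)  # 1234.5 1234.5
```

Removing the thousands separator before swapping the decimal separator matters: doing it in the other order would turn 1.234,5 into an unparseable string.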

2. Planning

2.1. Look at the data first

2.2. Keep only the columns you need
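A minimal planning sketch with the standard csv module: preview a few rows, then keep only the columns the analysis needs. The inline RAW text, the column names, and the helpers preview and keep_columns are all hypothetical; in practice you would read the real file.

```python
import csv
import io
from itertools import islice

# Hypothetical raw file contents standing in for a real CSV file.
RAW = """id,name,age,notes,internal_code
1,Ana,34,ok,X1
2,Luis,29,,X2
3,Marta,41,check,X3
"""


def preview(text: str, n: int = 2) -> list:
    """Return the first n rows so the data can be inspected before cleaning."""
    reader = csv.DictReader(io.StringIO(text))
    return list(islice(reader, n))


def keep_columns(text: str, needed: list) -> list:
    """Read every row but keep only the requested columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(needed) - set(reader.fieldnames or [])
    if missing:
        # Fail loudly instead of silently producing empty columns.
        raise KeyError(f"columns not found: {sorted(missing)}")
    return [{col: row[col] for col in needed} for row in reader]


print(preview(RAW))
slim = keep_columns(RAW, ["id", "name", "age"])
print(slim[0])  # {'id': '1', 'name': 'Ana', 'age': '34'}
```

Note that csv returns every value as a string; converting types belongs to the later cell-value checks.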

3. Pay attention

3.1. Identifier uniqueness
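Identifier uniqueness can be checked by counting each value and reporting the ones that appear more than once. This sketch strips surrounding spaces first, because " 103" and "103" would otherwise hide a duplicate; the helper name duplicated_ids and the sample identifiers are hypothetical.

```python
from collections import Counter


def duplicated_ids(ids: list) -> dict:
    """Return each identifier that appears more than once, with its count."""
    # Strip whitespace so that ' 103' and '103' count as the same identifier.
    counts = Counter(i.strip() for i in ids)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical identifier column.
ids = ["101", "102", "103", "102", " 103"]
print(duplicated_ids(ids))  # {'102': 2, '103': 2}
```

An empty result means every identifier is unique.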

3.2. Column names

3.2.1. too many characters?

3.2.2. weird symbols

3.2.3. spaces between words

3.2.4. leading and trailing spaces
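The four column-name checks above can be combined into one normalizing function. This is a sketch of one reasonable convention (lowercase, underscores between words, symbols dropped), not the only one; the name clean_name, the 30-character limit, and the sample headers are hypothetical. Since the outline applies the same checks to text and category cells, the same function could be reused there.

```python
import re


def clean_name(name: str, max_len: int = 30) -> str:
    """Normalize a column name: trim, lowercase, drop odd symbols, join words with '_'."""
    name = name.strip().lower()             # leading and trailing spaces
    name = re.sub(r"[^\w\s]", "", name)     # weird symbols; \w keeps letters, digits, '_'
    name = re.sub(r"\s+", "_", name)        # spaces between words
    return name[:max_len].strip("_")        # too many characters (limit is arbitrary)


raw = ["  Country Name ", "GDP (US$)", "Population%  2020",
       "Life expectancy at birth, total (years)"]
print([clean_name(c) for c in raw])
# ['country_name', 'gdp_us', 'population_2020', 'life_expectancy_at_birth_total']
```

Because \w is Unicode-aware in Python 3, accented letters are kept; drop them explicitly if your convention requires plain ASCII names.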

3.3. Cell values

3.3.1. text

3.3.1.1. same as column names

3.3.2. categories

3.3.2.1. same as column names

3.3.2.2. ALWAYS verify with a frequency table

3.3.2.3. the representation of missing values
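A frequency table exposes both problems at once: variants of the same category and the different ways a missing value was typed. The sketch below uses collections.Counter as the frequency table; the MISSING_MARKERS set, the helper normalize_category, and the sample answers are hypothetical and should be built from what the frequency table of your own data actually shows.

```python
from collections import Counter
from typing import Optional

# Strings that, in this hypothetical dataset, all mean "no answer".
MISSING_MARKERS = {"", "na", "n/a", "none", "-", "?"}


def normalize_category(value: str) -> Optional[str]:
    """Trim and lowercase a category; map known missing markers to None."""
    v = value.strip().lower()
    return None if v in MISSING_MARKERS else v


answers = ["Yes", "yes ", "YES", "No", "no", "N/A", "", "Yse", "-"]

# Raw frequency table: reveals case/space variants, a typo, and mixed missing markers.
print(Counter(answers))

# Frequency table after normalizing; run it again to verify the result.
cleaned = Counter(normalize_category(a) for a in answers)
print(cleaned)
```

The second table still contains the mistyped "yse": normalization cannot guess intent, so a typo like that needs an explicit, documented recoding decision.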

3.3.3. numbers

3.3.3.1. presence of non-numeric characters due to number formatting (e.g., currency symbols, thousands separators, percent signs)

3.3.3.2. the representation of missing values
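For numeric columns, formatting characters and missing markers have to be handled before conversion. This sketch assumes '.' is the decimal separator (convert other regional formats first, as in 1.2.1) and treats a hypothetical set of strings as missing; the helper to_number is likewise hypothetical. Values that still cannot be parsed raise an error instead of being silently dropped.

```python
import re
from typing import Optional

# Hypothetical strings used for "missing" in this dataset.
MISSING_MARKERS = {"", "na", "n/a", "-", "..", "?"}


def to_number(value: str) -> Optional[float]:
    """Strip number-format characters and parse; return None for missing markers.

    Assumes '.' is the decimal separator. Percent signs are removed but the
    value is NOT divided by 100.
    """
    v = value.strip().lower()
    if v in MISSING_MARKERS:
        return None
    # Keep only digits, the decimal point, and the minus sign.
    digits = re.sub(r"[^0-9.\-]", "", v)
    if digits in {"", "-", "."}:
        raise ValueError(f"not a number: {value!r}")
    return float(digits)


values = ["1,234", "$ 2,500.75", "45%", " 7 ", "n/a", "", "-3.5"]
print([to_number(v) for v in values])
# [1234.0, 2500.75, 45.0, 7.0, None, None, -3.5]
```

Raising on leftovers such as "abc" is a deliberate choice: an unexpected value is a signal to go back to the frequency table, not something to convert quietly.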

4. CODE