1. Unit 1
1.1. Representation
1.1.1. Histogram
1.1.1.1. x: intervals
1.1.1.2. y: frequency
1.1.2. Box Plot
1.1.2.1. five-number summary (min, Q1, median, Q3, max)
1.1.3. Stem and Leaf
1.1.3.1. stem + leaf table
1.2. Analysis
1.2.1. center
1.2.1.1. mean
1.2.1.2. median
1.2.2. spread
1.2.2.1. IQR
1.2.2.2. Range
1.2.2.3. standard deviation (sample or population)
1.2.3. shape
1.2.3.1. symmetrical
1.2.3.2. skewed
1.2.3.3. uniform
1.2.3.4. single/double peak (unimodal/bimodal)
1.2.4. Outliers
1.2.4.1. 1.5 × IQR rule: a value is an outlier if it is below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR
1.3. Interpretation
1.3.1. Low IQR/range/standard deviation → consistent data (little spread)
1.3.2. Stating a subjective opinion based on the analysis of the data
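The Unit 1 analysis measures (center, spread, and the 1.5 × IQR outlier rule) can be sketched in Python as follows. This is a minimal sketch, not a definitive method: the function name `summarize` and the `scores` data are made up for illustration, and the quartiles use the "exclusive" convention of `statistics.quantiles`, which is an assumption because textbooks and calculators differ slightly in how they compute Q1 and Q3.

```python
import statistics

def summarize(data):
    """Return center, spread, and 1.5 x IQR outliers for a list of numbers."""
    values = sorted(data)
    # Assumption: "exclusive" quartile convention; other conventions can
    # give slightly different Q1 and Q3 for the same data.
    q1, med, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr    # values below this are outliers
    high_fence = q3 + 1.5 * iqr   # values above this are outliers
    return {
        "min": values[0], "q1": q1, "median": med, "q3": q3, "max": values[-1],
        "mean": statistics.mean(values),
        "range": values[-1] - values[0],
        "iqr": iqr,
        "sample_sd": statistics.stdev(values),
        "population_sd": statistics.pstdev(values),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data set
summary = summarize(scores)
for name, value in summary.items():
    print(name, value)
```

For this data the fences are -2.75 and 15.25, so 30 is flagged as an outlier. Note how the outlier pulls the mean (about 8.33) above the median (6), which is why the median and IQR are usually preferred for skewed data.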
2. Unit 2
2.1. Representation
2.1.1. Scatterplot
2.1.2. LSRL (least-squares regression line)
2.1.2.1. ŷ = a + bx (a: y-intercept, b: slope)
2.2. Analysis
2.2.1. Strength
2.2.1.1. correlation coefficient (r)
2.2.1.2. r^2 (coefficient of determination: share of the variability in y explained by the line)
2.2.2. Outliers
2.2.3. Direction
2.2.3.1. slope (its sign gives the direction: positive or negative)
2.2.3.2. y-intercept
2.2.4. Form
2.2.4.1. residual plot
2.3. Interpretation
2.3.1. r^2
2.3.1.1. express as a percent
2.3.1.2. __% of the variability in (y variable) can be explained by the linear relationship with (x variable); the other __% can be explained by other factors such as _______
2.3.1.3. r^2 close to 1 = strong linear relationship
2.3.2. slope
2.3.2.1. For every 1-unit increase in (x variable), we expect (y variable) to increase/decrease by (slope)
2.3.3. y-intercept
2.3.3.1. When (x variable) is 0, we expect (y variable) to be _____. This does/doesn't make sense because ______.
2.3.4. r
2.3.4.1. The correlation is (strong/weak) because r is/isn't close to 1 or -1.
2.3.5. residual plots
2.3.5.1. Based on the residual plot, a linear form is/isn't appropriate because the residuals do/don't show random scatter (no clear pattern).
2.3.5.2. Outliers (points with unusually large residuals)
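The Unit 2 quantities (LSRL slope and y-intercept, r, r^2, and residuals) can be sketched in plain Python as follows. This is a minimal sketch using the standard least-squares formulas; the function name `least_squares` and the `hours` and `score` data are made up for illustration and are not from the notes.

```python
import math

def least_squares(xs, ys):
    """Fit y-hat = a + b*x and return (a, b, r, residuals)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                     # slope
    a = y_bar - b * x_bar             # y-intercept
    r = sxy / math.sqrt(sxx * syy)    # correlation coefficient
    # Residual = observed y minus predicted y
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return a, b, r, residuals

hours = [1, 2, 3, 4, 5]   # hypothetical x variable
score = [2, 4, 5, 4, 5]   # hypothetical y variable
a, b, r, residuals = least_squares(hours, score)
r_squared = r ** 2

print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"r = {r:.3f}, r^2 = {r_squared:.0%}")
print("residuals:", [round(e, 2) for e in residuals])
```

For this data the line is ŷ = 2.2 + 0.6x with r^2 = 0.6, so the template reads: 60% of the variability in score can be explained by the linear relationship with hours. Plotting the residuals against x and looking for random scatter is the check for whether a linear form is appropriate.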