Validity and Reliability

1. Validity: Does the test measure what it is supposed to measure?

1.1. What is "validity evidence"? Validity evidence is just that: evidence demonstrating that a test measures what it says it measures.

1.2. Types of Validity Evidence

1.2.1. Content Validity Evidence

1.2.1.1. Content validity evidence is established by inspecting test questions to see whether they correspond to the content the test developer decides should be covered by the test (a small coverage-check sketch follows this subsection).

1.2.1.2. Problem associated with this type of validity evidence.

1.2.1.2.1. Content validity evidence is difficult to establish if the concept being tested is a personality or aptitude trait, because it can be hard to figure out what a relevant question would be.

1.2.1.2.2. It provides information about whether the test looks valid, but sometimes a test can look valid yet measure something completely different from what is intended.
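
One informal way to make such an inspection concrete is to map each test item to the objective it targets and tabulate coverage against the test blueprint. Below is a minimal sketch; the blueprint, item names, and item-to-objective mapping are all hypothetical, invented purely for illustration.

```python
# A minimal content-coverage check. The blueprint objectives and the
# mapping of items to objectives are hypothetical examples.
blueprint = {"fractions", "decimals", "percentages", "ratios"}

# Hypothetical mapping of each test item to the objective it targets.
item_objectives = {
    "item_1": "fractions",
    "item_2": "fractions",
    "item_3": "decimals",
    "item_4": "percentages",
}

covered = set(item_objectives.values())
missing = blueprint - covered

print(f"Coverage: {len(covered & blueprint)}/{len(blueprint)} objectives")
print(f"Not covered: {sorted(missing)}")  # -> ['ratios']
```

A tabulation like this only shows whether the test looks valid against the blueprint; as noted above, appearance alone does not guarantee the test measures what is intended.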

1.2.2. Criterion-Related Validity Evidence

1.2.2.1. Criterion-related validity evidence is established by correlating scores from a test with an external criterion. There are two types: concurrent criterion-related validity evidence and predictive validity evidence (a small correlation sketch follows this subsection).

1.2.2.1.1. Concurrent criterion-related validity evidence comes from correlating the test with a criterion measure that can be administered at the same time as the measure being validated.

1.2.2.1.2. Predictive validity evidence deals with how well the test predicts future behavior of the examinees (particularly useful for aptitude tests).

1.2.2.2. Problem associated with this type of validity evidence.

1.2.2.2.1. Criterion-related validity evidence assumes that there is some kind of external criterion by which an educator can anchor or validate the test.
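
As a concrete illustration of the concurrent approach, the sketch below correlates scores on a new test with scores on an external criterion (say, an already-established test of the same skill) taken at roughly the same time. All scores are invented, and it assumes Python 3.10+ for statistics.correlation.

```python
import statistics

# Invented scores for eight students on the new test and on an external
# criterion measure administered at roughly the same time.
test_scores = [78, 85, 62, 90, 71, 88, 95, 67]
criterion_scores = [74, 88, 65, 93, 70, 84, 97, 60]

# The Pearson correlation between the two score sets serves as the
# validity coefficient for concurrent criterion-related evidence.
r = statistics.correlation(test_scores, criterion_scores)
print(f"Validity coefficient r = {r:.2f}")
```

Predictive validity evidence is computed the same way, except the criterion (e.g., later course grades) is collected after the test, so the coefficient reflects how well the test forecasts future performance.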

1.2.3. Construct Validity Evidence

1.2.3.1. Construct validity evidence refers to demonstrating the validity of a test by showing that its relationship to other information corresponds well with a theory (a small sketch follows this subsection).

1.2.3.2. Problem associated with this type of validity evidence.

1.2.3.2.1. Unlike criterion-related evidence, this type of evidence does not involve comparing a second, concurrent measure with the measure being assessed for validity, nor does it indicate how well the test predicts future behavior.
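
One standard way to gather this kind of evidence is to check whether the test's correlations with other measures match what theory predicts: high with measures of related constructs (convergent evidence) and near zero with unrelated ones (discriminant evidence). The sketch below uses invented data and hypothetical measure names, and assumes Python 3.10+.

```python
import statistics

# Invented scores on a hypothetical new anxiety test and on two other
# measures. Theory predicts a strong correlation with a related
# construct and a weak one with an unrelated construct.
new_test = [12, 18, 9, 22, 15, 20, 7, 17]
related_measure = [14, 19, 8, 24, 13, 21, 9, 16]   # convergent prediction
unrelated_measure = [9, 7, 10, 8, 11, 9, 10, 8]    # discriminant prediction

r_convergent = statistics.correlation(new_test, related_measure)
r_discriminant = statistics.correlation(new_test, unrelated_measure)
print(f"Convergent r = {r_convergent:.2f} (theory predicts: high)")
print(f"Discriminant r = {r_discriminant:.2f} (theory predicts: near zero)")
```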

2. Reliability: Does the test yield the same or similar score rankings (all other factors being equal) consistently?

2.1. In other words, reliability refers to the consistency with which a test yields the same rank for individuals who take the test more than once.

2.2. Types of Reliability Methods

2.2.1. Test-Retest (or Stability)

2.2.1.1. Test-Retest is a method of estimating reliability that is exactly what it sounds like: an assessment is given twice, and the correlation between the first and second sets of scores is determined (a small sketch follows this subsection).

2.2.1.2. Problem with this method

2.2.1.2.1. Memory or experience is usually involved the second time a test is taken, which means a student may score differently because the student will have changed in some way (e.g., the student has looked up answers missed on the first test, or has forgotten answers that were correct the first time).
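
The sketch below shows the computation with invented scores from two administrations of the same test to the same eight students; because reliability is framed above as rank consistency, it also computes a rank-based (Spearman) coefficient. The identical correlation step applies to the alternate-forms method described next (correlate form A with form B). Assumes Python 3.10+.

```python
import statistics

# Invented scores from two administrations of the same test.
first = [88, 72, 95, 60, 81, 77, 90, 65]
second = [85, 75, 97, 58, 80, 74, 92, 70]

# The test-retest reliability coefficient is simply the correlation
# between the two sets of scores.
r = statistics.correlation(first, second)
print(f"Test-retest reliability r = {r:.2f}")

# A rank-based (Spearman) coefficient correlates the ranks instead.
# (This simple ranking assumes no tied scores, as in the toy data here.)
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

rho = statistics.correlation(ranks(first), ranks(second))
print(f"Rank-based (Spearman) rho = {rho:.2f}")
```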

2.2.2. Alternative Form

2.2.2.1. The alternative-form method uses two equivalent forms of a test; administering both forms and determining the correlation between the two sets of scores yields an estimate of the reliability of the test's scores.

2.2.2.2. Problem with this method

2.2.2.2.1. It takes a lot of time and effort to create one good test, let alone two!

2.2.3. Internal Consistency

2.2.3.1. If the test is designed to measure a single, basic concept, it's reasonable to assume that people who get one item right will be more likely to get other, similar items right. In other words, items ought to be correlated with each other; the test ought to be internally consistent (a small sketch follows this subsection).

2.2.3.2. Problem with this method

2.2.3.2.1. This method should only be used if the whole test consists of similar items assessing a single concept.
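
One widely used internal-consistency statistic (not named in the map above, but standard) is Cronbach's alpha: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below computes it over an invented matrix of right/wrong item responses.

```python
import statistics

# Rows = students, columns = items; 1 = correct, 0 = incorrect.
# The response matrix is invented for illustration.
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
]

k = len(responses[0])                 # number of items
items = list(zip(*responses))         # one column (tuple) per item
item_vars = [statistics.variance(item) for item in items]
total_scores = [sum(row) for row in responses]
total_var = statistics.variance(total_scores)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```

As the node above warns, a single alpha is only meaningful when every item targets the same concept; for a test with distinct sections, an estimate like this would be computed per section.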

3. Validity and reliability are vital components of learning and assessment because they provide the methods and tools necessary to vet various forms of measurement. Vetting these measures helps ensure that tests measure what they are supposed to measure and that they consistently yield the same (or similar) scores, all other factors being equal.