1. Content Validity Evidence is established by comparing test items with instructional objectives to determine whether the items match or measure the objectives.
2. Criterion-Related Validity Evidence is used to predict future or current performance: it correlates test results with another criterion of interest.
3. Concurrent Criterion-Related Validity Evidence refers to the degree to which the operationalization correlates with other measures of the same construct that are measured at the same time. Returning to the selection test example, this would mean that the tests are administered to current employees and then correlated with their scores on performance reviews.
4. Predictive Validity Evidence refers to the degree to which the operationalization can predict (or correlate with) other measures of the same construct that are measured at some time in the future. Again, with the selection test example, this would mean that the tests are administered to applicants, all applicants are hired, their performance is reviewed at a later time, and then their scores on the two measures are correlated, as illustrated in the sketch below.
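Both concurrent and predictive criterion-related evidence come down to the same computation: a correlation (the validity coefficient) between test scores and criterion scores. The following is a minimal sketch with made-up numbers; the data values and variable names are hypothetical, and numpy is assumed to be available.

```python
import numpy as np

# Hypothetical selection-test scores and criterion scores (e.g., performance ratings).
# For concurrent evidence, both columns come from current employees at the same time;
# for predictive evidence, the criterion is collected later for the same applicants.
test_scores = np.array([62, 75, 80, 55, 90, 68, 72, 84])
criterion = np.array([3.1, 3.8, 4.0, 2.9, 4.5, 3.4, 3.6, 4.2])

# The Pearson correlation between the two score sets is the validity coefficient.
r = np.corrcoef(test_scores, criterion)[0, 1]
print(f"Validity coefficient r = {r:.2f}")
```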
5. Construct Validity Evidence refers to the degree to which the test actually measures the theoretical construct or trait it is intended to measure. Returning to the selection test example, this would mean gathering evidence that scores behave as the underlying construct predicts, for example by correlating strongly with other measures of the same construct and weakly with measures of unrelated constructs.
6. Validity of assessment has to do with how closely the test measures what it was meant to measure. If a test does not show evidence of validity, then it is useless. This evidence of validity can be discussed in three categories: content, criterion-related, and construct.
7. Test-Retest or Stability is a measure of reliability obtained by administering the same test twice, over a period of time, to a group of individuals. The scores from Time 1 and Time 2 can then be correlated in order to evaluate the test for stability over time.
8. Alternate Forms or Equivalence. Like the retest method, this method also requires two testings with the same people. However, the same test is not given each time. Each of the two tests must be designed to measure the same thing and should not differ in any systematic way. One way to help ensure this is to use random procedures to select items for the different tests. The alternate-forms method is viewed as superior to the retest method because a respondent's memory of test items is less likely to influence the scores obtained. One drawback of this method is the practical difficulty of developing test items that are consistent in the measurement of a specific phenomenon.
9. Internal Consistency is a measure of reliability used to evaluate the degree to which different test items that probe the same construct produce similar results.
10. Split-Half Methods. This method is more practical in that it does not require two administrations of the same test or an alternate form. In the split-half method, the total set of items is divided into halves, and a correlation is computed between scores on the two halves. This correlation estimates the reliability of only half of the test, so a statistical correction is needed to estimate the reliability of the whole test. This correction is known as the Spearman-Brown prophecy formula (Carmines & Zeller, 1979), applied in the sketch below.
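A minimal sketch of the split-half procedure with the Spearman-Brown correction. The item responses, the odd/even split, and the use of numpy are all illustrative assumptions, not prescribed by the method itself.

```python
import numpy as np

# Hypothetical item scores: rows are examinees, columns are the 8 test items.
items = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 1, 1, 1],
])

# Divide the items into halves (an odd/even split is one common choice).
half_a = items[:, 0::2].sum(axis=1)
half_b = items[:, 1::2].sum(axis=1)

# Correlation between the two half-test scores: reliability of half the test.
r_half = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown prophecy formula steps the estimate up to full-test length.
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected r = {r_full:.2f}")
```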
11. Kuder-Richardson Methods determine the extent to which the entire test represents a single, fairly consistent measure of a concept; a computational sketch of the KR-20 formula follows.
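For dichotomously scored items (right/wrong), the Kuder-Richardson formula 20 (KR-20) is the usual coefficient. The sketch below uses hypothetical 0/1 responses and assumes numpy; it is an illustration of the formula, not a prescribed implementation.

```python
import numpy as np

# Hypothetical 0/1 item responses: rows are examinees, columns are items.
items = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
])

k = items.shape[1]                   # number of items
p = items.mean(axis=0)               # proportion answering each item correctly
q = 1 - p                            # proportion answering each item incorrectly
total_var = items.sum(axis=1).var()  # variance of total scores (population variance)

# KR-20: (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20 = {kr20:.2f}")
```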
12. Reliability refers to the stability of a test score over repeated administrations. A reliable test will yield stable scores over repeated administrations, assuming the trait being measured has not changed.