9.7 Reliability in Selection Testing
Both reliability and validity have been discussed in this book, but this chapter will take a deeper look at each. A hiring manager cannot use a test in the hiring process without first confirming the reliability and validity of the test. Tests used must be consistent and measure what they are intended to measure. Professionally developed tests should come with reports on validity evidence, including detailed explanations of how validation studies were conducted.
Reliability
Reliability refers to how dependably or consistently a test measures a characteristic. It means that we expect a test to provide approximately the same information each time it is given to the same person. Think of it as the test’s dependability or consistency. If you took a multiple-choice test on this chapter, your score would be expected to be approximately the same if you retook the test, provided you did no additional studying. The smaller the difference between the two scores, the more reliable the test.
There are two ways to estimate a test’s reliability:
- Consistency over time (test-retest reliability)
- Consistency among different raters (inter-rater reliability)
Test-retest reliability: If you take the same test multiple times under the same conditions, you should get similar scores each time.
Intelligence is generally thought to be consistent across time: a person who is highly intelligent today will be highly intelligent next week. This means that any reasonable measure of intelligence should produce roughly the same score for this individual next week as it does today.
Assessing test-retest reliability involves administering the test to a group of people at one time, administering it again to the same group later, and then computing the test-retest correlation between the two sets of scores. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.
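To make this concrete, here is a minimal sketch, assuming Python with NumPy, of how the test-retest correlation could be computed; the candidate scores are made up for illustration only.

```python
import numpy as np

# Hypothetical scores for the same five candidates on two administrations
# of the same test (illustrative numbers, not real data).
time_1 = np.array([78, 85, 62, 90, 71])
time_2 = np.array([80, 83, 65, 88, 74])

# Pearson correlation between the two sets of scores.
test_retest_r = np.corrcoef(time_1, time_2)[0, 1]
print(f"Test-retest correlation: {test_retest_r:.2f}")

# A correlation of +.80 or greater is generally taken to indicate
# good test-retest reliability.
if test_retest_r >= 0.80:
    print("The test shows good test-retest reliability.")
```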
Inter-rater reliability: Many behavioural measures involve significant judgment by an observer or rater. Inter-rater reliability is the extent to which different raters are consistent in their judgments: if several people score the same test, they should give similar scores.
Example
Imagine you and a friend are grading the same essay. If you both give similar scores, the grading method is reliable. Reliable tests give both employers and candidates confidence in the selection process. If you know a test is dependable, you can be confident the score accurately reflects the candidate’s abilities, and it ensures all candidates are evaluated consistently.
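As a simple illustration of the essay-grading example, here is a short sketch, assuming Python with NumPy and invented scores, that uses the correlation between two raters’ scores as one simple index of inter-rater reliability (other indices, such as agreement statistics, are also used in practice).

```python
import numpy as np

# Hypothetical scores two raters gave the same six essays (out of 10);
# purely illustrative numbers.
rater_a = np.array([7, 9, 5, 8, 6, 10])
rater_b = np.array([8, 9, 4, 8, 7, 9])

# Correlation between the two raters' scores as a rough indicator
# of how consistently they judged the same essays.
inter_rater_r = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"Inter-rater correlation: {inter_rater_r:.2f}")
```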
“Reliability and Validity of Measurement” from Research Methods in Psychology – 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted. Modifications: used Test-retest reliability, edited, summarized; used first two sentences of Interrater reliability, added example.