9.8 Validity in Selection Testing
Validity
Validity is the extent to which the scores from a measure represent the characteristic they are intended to measure. A measure can be highly reliable but have no validity whatsoever. As an oversimplified example, imagine someone who believes that index finger length reflects self-esteem and thus tries to measure self-esteem by holding a ruler up to the index finger. Although this measure would have excellent test-retest reliability, it would have no validity. The fact that one person’s index finger is a centimetre longer than another’s would indicate nothing about which one had higher self-esteem. Simply stated, a test is valid if it measures what it is supposed to measure.
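To make the contrast between reliability and validity concrete, the short Python sketch below (all numbers are invented for illustration and are not from the original text) simulates finger-length measurements taken on two occasions along with self-esteem scores from a questionnaire. The finger measure is almost perfectly consistent from one occasion to the next, yet it tells us nothing about self-esteem.

```python
# Toy illustration with invented data: a measure can be reliable but not valid.
import numpy as np

rng = np.random.default_rng(0)

finger_week1 = rng.normal(7.5, 0.5, 100)                # index finger length (cm), first measurement
finger_week2 = finger_week1 + rng.normal(0, 0.05, 100)  # re-measured a week later, with tiny error
self_esteem  = rng.normal(50, 10, 100)                  # scores from a self-esteem questionnaire

test_retest_r = np.corrcoef(finger_week1, finger_week2)[0, 1]
validity_r    = np.corrcoef(finger_week1, self_esteem)[0, 1]

print(f"Test-retest reliability: r = {test_retest_r:.2f}")    # close to 1.0: highly reliable
print(f"Correlation with self-esteem: r = {validity_r:.2f}")  # close to 0: no validity
```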
Types of validity
Here, we consider four primary kinds: face validity, content validity, construct validity, and criterion validity.
Face Validity
Face validity is the extent to which test takers view the content of the test as appropriate for its intended purpose.
Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities. Accordingly, a questionnaire that included these items would have good face validity. On the other hand, the finger-length method of measuring self-esteem seems to have nothing to do with self-esteem and, therefore, has poor face validity. Face validity is weak evidence that a measurement method measures what it is supposed to, because it is based on the opinions of test takers rather than those of experts.
In content validity, which we discuss next, subject matter experts judge the content of the test as appropriate for its intended purpose.
Content Validity
Content validity is the extent to which a measure “covers” the characteristic of interest. Is the test fully representative of what it aims to measure?
Imagine you’re taking a final exam in a history class. For the exam to have high content validity, it should include questions about all the essential topics covered in the course, not just one chapter. Think of content validity as a buffet. A buffet with high content validity would offer a variety of dishes that represent all the major food groups, not just desserts.
Content validity answers the question: does the test cover all the relevant parts of the subject or skill it is supposed to measure? In selection testing, test items should be appropriate to, and directly measure, the essential requirements and qualifications for the job.
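One informal way to check content coverage is simply to tally how the test items are distributed across the topics they are meant to represent. The hypothetical Python sketch below (the course units and item tags are made up) flags a history exam that draws almost entirely on a single unit and skips others, which would signal weak content validity.

```python
# Hypothetical sketch: how evenly does a final exam cover the course topics?
from collections import Counter

# Each exam item is tagged with the course unit it draws on (invented data).
exam_items = [
    "ww1", "ww1", "ww1", "ww1", "ww1", "ww1", "ww1", "ww1",
    "ww2", "ww2",
    # "cold_war" and "decolonization" were taught but have no items at all.
]
course_units = ["ww1", "ww2", "cold_war", "decolonization"]

coverage = Counter(exam_items)
for unit in course_units:
    share = coverage[unit] / len(exam_items)
    print(f"{unit:15s} {coverage[unit]:2d} items ({share:.0%})")
# Heavy weighting of one unit and untested units point to poor content validity.
```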
Construct Validity
Construct validity is the extent to which a test measures the underlying concept or trait it claims to assess. In selection testing, it also requires that this characteristic be essential to successful job performance.
Suppose you have a test designed to measure creativity. To have high construct validity, the tasks or questions should assess creativity, not something else like memory or vocabulary.
A concept or construct is what you want to measure. It could be intelligence, motivation, leadership ability or job-specific skills. So, if you have a test to measure creativity, the concept (or construct) is creativity.
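One common way to gather construct-validity evidence, sketched below in Python with invented data, is to check that scores on the new creativity test correlate strongly with an established creativity measure while correlating only weakly with a measure of an unrelated trait such as vocabulary.

```python
# Hypothetical sketch of construct-validity evidence for a new creativity test.
# Assumption (invented data): scores should track an established creativity
# measure and should not simply reflect vocabulary.
import numpy as np

rng = np.random.default_rng(1)
n = 80

true_creativity = rng.normal(0, 1, n)
new_test    = true_creativity + rng.normal(0, 0.5, n)  # the test being evaluated
established = true_creativity + rng.normal(0, 0.5, n)  # established creativity measure
vocabulary  = rng.normal(0, 1, n)                      # unrelated construct

print("r with established creativity measure:",
      round(np.corrcoef(new_test, established)[0, 1], 2))  # expected to be high
print("r with vocabulary test:",
      round(np.corrcoef(new_test, vocabulary)[0, 1], 2))   # expected to be near zero
```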
Criterion Validity
Criterion validity measures how well a test predicts or matches real-life success, such as job performance or academic success. It shows whether the test is useful for making accurate decisions about people.
Criterion validity is the extent to which scores on a measure correlate with other variables (known as criteria) to which one would expect them to be correlated.
Employees’ scores on a new sales aptitude test should positively correlate with their monthly sales figures. If the scores positively correlate with sales performance, this provides evidence that they accurately represent people’s sales aptitudes. But if people scored equally well on the sales aptitude test regardless of their monthly sales figures, this would cast doubt on the measure’s validity.
A criterion can be any variable one has reason to think should be correlated with the measured construct, and there will usually be many of them. For example, one would expect sales aptitude test scores to be positively correlated with customer satisfaction ratings and with the number of new clients acquired, and negatively correlated with the number of negative performance reviews from supervisors.
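In practice, criterion validity is usually summarized by the correlation between test scores and the criterion. The Python sketch below (the aptitude scores and sales figures are invented) computes that coefficient for the sales aptitude example; a strong positive correlation supports the test, while a value near zero would cast doubt on it.

```python
# Hypothetical sketch: criterion validity of a sales aptitude test,
# judged by how strongly scores correlate with monthly sales (invented data).
import numpy as np

aptitude_scores = np.array([62, 74, 81, 55, 90, 68, 77, 85, 59, 72])
monthly_sales   = np.array([21, 30, 35, 18, 41, 26, 31, 38, 20, 28])  # units sold

r = np.corrcoef(aptitude_scores, monthly_sales)[0, 1]
print(f"Criterion validity coefficient: r = {r:.2f}")
```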
Concurrent Validity
Concurrent validity determines how well a test correlates with current outcomes or performances. For example, if employees who score high on a sales aptitude test are already high-performing sales representatives, the test has high concurrent validity.
By contrast, suppose applicants who scored high on the sales aptitude test go on to become high-performing sales representatives. In that case, the test has high predictive validity (because scores on the measure have “predicted” a future outcome or performance).
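The difference between concurrent and predictive evidence comes down to when the criterion is measured. In the hypothetical Python sketch below (all scores and ratings are invented), the concurrent coefficient correlates current employees’ test scores with their current performance ratings, while the predictive coefficient correlates applicants’ scores with performance measured months after hiring.

```python
# Hypothetical sketch: concurrent vs. predictive evidence for the same test.
import numpy as np

# Current employees: test scores and current supervisor ratings (invented data)
employee_scores     = np.array([70, 82, 65, 90, 58, 75])
current_performance = np.array([3.4, 4.1, 3.0, 4.6, 2.7, 3.8])

# Applicants: test scores at hiring and performance measured later (invented data)
applicant_scores  = np.array([68, 88, 61, 79, 95, 72])
later_performance = np.array([3.1, 4.4, 2.9, 3.7, 4.8, 3.3])

concurrent_r = np.corrcoef(employee_scores, current_performance)[0, 1]
predictive_r = np.corrcoef(applicant_scores, later_performance)[0, 1]
print(f"Concurrent validity: r = {concurrent_r:.2f}")
print(f"Predictive validity: r = {predictive_r:.2f}")
```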
High criterion validity gives employers confidence that the test results can be trusted and used to make better hiring decisions.
“Reliability and Validity of Measurement” from Research Methods in Psychology – 2nd Canadian Edition, Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted. Modifications: used the section Validity, edited and summarized; changed the Content validity and Criterion validity examples; added Construct validity and an example for Concurrent validity; removed Discriminant validity.