Giving English Language Learners a Fair Shot at Vocabulary Testing

Fairness is at the core of the Ortiz Picture Vocabulary Acquisition Test™ (Ortiz PVAT™; Ortiz, 2018). Fairness, in the context of assessments, refers to a test’s capacity to provide reliable and valid scores for all relevant test-takers across all relevant subgroups. At MHS, we’re proud of our commitment to a Fairness Framework that guides our decisions as we develop our measures.

The inspiration for the Ortiz PVAT stemmed from Dr. Samuel O. Ortiz’s observation and experience that individuals with different language backgrounds are often mistakenly believed to have a language disorder when, in fact, the issue is one of difference: they are new to English and are still acquiring its vocabulary. Existing vocabulary and language comprehension tests tend to compare a student’s scores to those of native English speakers. This comparison is unfair if the student hasn’t yet had the same opportunity to learn English. Because the Ortiz PVAT was designed to measure the vocabulary acquisition of both native English speakers and English learners, fairness was both an overarching design consideration and a fundamental goal for these populations.

The Ortiz PVAT is a test of receptive vocabulary (that is, the ability to comprehend but not necessarily produce, or express, language). The test-taker sees four images on the screen and hears a word. Their task is to select the image that best matches the word. In the sample question below, test-takers would hear the word “spherical” spoken when these images appear on screen.

Figure 1. Sample Ortiz PVAT Item

A sample question from the Ortiz PVAT displaying four images of objects. Only one object can be seen as "spherical."

Evidence for the fairness of the Ortiz PVAT comes in several forms, and some of the highlights are described here. First, we’ll review the design considerations; then we’ll examine the psychometric results.

How was the Ortiz PVAT designed to maximize fairness?

  1. Flexible ways of responding. Test-takers can indicate the correct response in a variety of ways. They can touch the screen (if using a touchscreen device, like a tablet), click a mouse (if using a desktop or laptop computer), point to the image so that the test administrator can make the selection for them, or say the number of the response option they’d like the administrator to select. A verbal response (expressive language ability) is not required to complete the Ortiz PVAT.
  2. Culturally sensitive item content. All target words and images on the test underwent a thorough review by a panel of experts. Culturally and linguistically diverse subject-matter experts reviewed the test questions for clarity and appropriateness, with extra attention paid to the potential for bias. Images were removed or revised if experts identified concerns (e.g., the image was too specific to a region or culture).
  3. Standardized presentation. All target words are read aloud by a voice actor, carefully selected for clarity and enunciation. The test administrator does not have to read stimuli aloud to the test-takers; instead, all test-takers hear all words spoken by the same voice actor, ensuring the test experience is standardized and therefore comparable and equitable across students.
  4. Dynamic and custom administration. The Ortiz PVAT has a set of screening items at the beginning of every test session that helps determine the appropriate starting point for each test-taker. Additionally, the test has a ceiling rule and will automatically end the session once it has been determined that a test-taker has reached their limit. Together, these methods reduce the burden on test-takers and ensure an efficient test experience customized to their needs, which supports fair treatment during the testing process.
  5. Dual norms. The Ortiz PVAT offers two norms: (1) English Speaker norms, and (2) English Learner norms. Test-taker scores are calculated in comparison to same-age peers, as well as in comparison to peers with the same amount of exposure to English. This unique feature enables individuals with limited exposure to be compared against other individuals with the same amount of exposure, such that their scores reflect a fair comparison to true peers, rather than mistakenly comparing individuals with limited exposure to native speakers.
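To make the ceiling rule in item 4 more concrete, here is a minimal sketch of how such a rule could work in general. The actual Ortiz PVAT rule is not described in this post, so the function name `items_administered` and the stopping criterion (a run of `ceiling_run` consecutive incorrect responses) are purely hypothetical assumptions chosen for illustration.

```python
def items_administered(responses, ceiling_run=3):
    """Count how many items are given before a ceiling rule ends the session.

    responses: booleans in presentation order (True = correct response).
    ceiling_run: HYPOTHETICAL number of consecutive incorrect responses
                 that triggers the ceiling; not the real Ortiz PVAT value.
    """
    consecutive_misses = 0
    administered = 0
    for correct in responses:
        administered += 1
        # A correct response resets the run; a miss extends it.
        consecutive_misses = 0 if correct else consecutive_misses + 1
        if consecutive_misses >= ceiling_run:
            break  # ceiling reached: stop presenting items
    return administered


# Invented response pattern: the session stops after the third miss in a row,
# so the fifth item is never presented.
print(items_administered([True, False, False, False, True]))
```

The point of a rule like this is that a test-taker is not asked to sit through many items that are well beyond their current vocabulary.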

The development of dual norms is not merely a concession to the idea that English learners require their own true peer group but is also intended to recognize that English Learner norms are inherently different. The construction of dual norms for the Ortiz PVAT enables non-discriminatory evaluation of vocabulary acquisition in both native English speakers and English learners alike. This feature is a significant contribution to fairness in assessment.
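The effect of dual norms can be illustrated with a small sketch. It assumes a conventional standard-score scale with a mean of 100 and a standard deviation of 15 (the post mentions only the average of 100, so the 15 is an assumption), and the norm values below are invented for illustration; they are not taken from the Ortiz PVAT norm tables. The names `NormGroup` and `standard_score` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormGroup:
    """Raw-score mean and SD for one reference group (hypothetical values)."""
    mean_raw: float
    sd_raw: float


def standard_score(raw, norm, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to a standard score relative to a norm group."""
    z = (raw - norm.mean_raw) / norm.sd_raw
    return round(scale_mean + scale_sd * z)


# INVENTED norm values for a single age band, for illustration only.
english_speaker_norm = NormGroup(mean_raw=60.0, sd_raw=10.0)
# Same age band, but peers with a similar amount of exposure to English.
english_learner_norm = NormGroup(mean_raw=40.0, sd_raw=10.0)

raw_score = 42
print(standard_score(raw_score, english_speaker_norm))  # vs. native speakers
print(standard_score(raw_score, english_learner_norm))  # vs. true peers
```

With these made-up numbers, the same raw score lands well below average against the English Speaker reference group but close to average against peers with comparable English exposure, which is the kind of distinction dual norms are meant to surface.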

Do some groups score better on the Ortiz PVAT?

The Ortiz PVAT underwent psychometric investigation to ensure it met the standards for psychological and educational tests in terms of fairness. Here, we’ll discuss the test’s ability to generalize across various groups.

  • No differences were found when comparing male and female test-takers. Average scores were within 1 point of each other, and the way the test questions performed did not vary by gender.
  • No differences were found when comparing racial/ethnic groups. Average scores for White, Black, and Hispanic test-takers were within 1 point of one another, and the way the test questions performed did not vary by race/ethnicity. The likelihood that a test-taker would score in the Average range (90 or greater) was nearly equivalent across racial/ethnic groups.
  • No differences were found when comparing groups based on the language they speak at home. In the graph below, scores from the English Learner normative sample are presented for four groups with varied language backgrounds: (1) Spanish & Spanish Creole, (2) Indo-European languages, (3) Asian & Pacific Islander languages, and (4) Other languages. Differences in scores between these groups were minimal; all groups scored very close to the overall average (100), and the differences from one group to another were within 3 points. The scores are not statistically significantly different, and the size of the difference overall is negligible. The similarity of the scores reinforces that the Ortiz PVAT is a measure of English receptive vocabulary, and one’s native language does not influence it, as the acquisition of English follows a similar pattern for all learners.

Figure 2. Graph of Ortiz PVAT Scores

A bar graph displaying scores of Ortiz PVAT test-takers, comparing groups based on languages spoken at home.

Results from several analyses provide support for the generalizability of Ortiz PVAT scores. No evidence of test bias was found based on gender, race/ethnicity (for the English Speaker sample), or language spoken (for the English Learner sample). Together, these results provide strong evidence for the fairness of the Ortiz PVAT.
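For readers curious how the "size of the difference" between two groups is commonly quantified, the sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation). This post does not say which effect-size statistic was used in the Ortiz PVAT analyses, so treat this as a generic illustration; the scores are invented and the function name `cohens_d` is our own.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# INVENTED standard scores for two groups whose means differ by 1 point.
group_a = [85, 100, 115]
group_b = [86, 101, 116]

d = cohens_d(group_a, group_b)
# By Cohen's widely used guideline, |d| below 0.2 is smaller than a
# "small" effect.
print(abs(d) < 0.2)
```

A 1-point gap on a scale where scores typically spread about 15 points around the mean works out to a very small standardized difference, which is why gaps of this size are usually described as negligible.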

Learn more about MHS’ Ortiz PVAT.

References

Ortiz, S. (2018). Ortiz Picture Vocabulary Acquisition Test. Multi-Health Systems Inc.
