Is the EQ-i 2.0® a Fair Assessment?
For many years, people who use tests in the talent development industry have asked “Is this a reliable test?” and “Is this a valid test?” when deciding which tests to use. Recently, this discussion has expanded to include another important question: “Is this a fair test?” People want to ensure that the tests they use are built on a foundation of fairness and that the interpretation of scores is valid for all test-takers and free of bias. In practice, this means making sure that a person’s score on a test reflects the construct we are trying to measure (e.g., emotional intelligence), not characteristics of the person such as their gender, race, ethnicity, or age.
Our analysis examined possible sources of measurement bias
We analyzed data from the Emotional Quotient-Inventory 2.0® (EQ-i 2.0®) North American General Population normative sample and examined multiple statistical indicators of fairness. Specifically, we looked for possible sources of measurement bias by answering the following questions:
- Does the factor structure of the EQ-i 2.0 model (i.e., one total score, 5 composite scores, and 15 subscale scores) change when you look at different groups of people, such as older versus younger adults? If the structure does not differ across groups, the test shows measurement invariance, which indicates that it measures the same construct in the same way regardless of demographic group membership.
- Do people who have the same level of emotional intelligence obtain similar subscale scores and rate items similarly, regardless of their gender, age, race, and ethnicity? This question is investigated with a technique called differential test functioning. If the assessment operates, or functions, the same way for each group, there is no differential functioning and the test does not display measurement bias.
- Do average scores on the EQ-i 2.0 meaningfully differ depending on which group a person belongs to (e.g., men vs. women)? Some differences between groups’ average (or mean) scores are expected; for example, you might expect women to score higher than men on some subscales. We want to make sure the test only reveals differences that are documented in prior research and exist in the world, rather than differences that are a by-product of the test itself. If groups do not differ in unexpected ways, the test is suitable for diverse groups and scores can be interpreted with confidence.
Our key findings
Answering these three questions for the EQ-i 2.0, we found that:
- Gender: The EQ-i 2.0 model does not differ by gender (i.e., comparing men and women; samples of individuals who identify as non-binary were too small to permit analyses), and people who have the same EI level end up with similar scores regardless of their gender. When comparing average scores for men and women (the third question above), we see small differences on some subscales. The exception is the Empathy subscale, which displayed a moderately sized difference: on average, women scored 8 points higher than men. This finding is in line with previous research and has only a small impact on overall scores.
- Age: The EQ-i 2.0 model does not differ by age, and the EQ-i 2.0 test functions similarly for all ages. When comparing different age groups, we found that those over 30 years of age tended to score higher than individuals aged 18 to 29 years old, particularly in the areas of Independence and Problem Solving.
- Race/Ethnicity: The EQ-i 2.0 model does not differ by race/ethnicity (specifically, comparisons between Black and White, Hispanic and White, and Asian and White individuals). People who have the same EI level end up with similar scores regardless of their race/ethnicity. Only small to moderate differences were found between groups when comparing average scores. Black and Hispanic individuals tended to score slightly higher than White individuals, typically by about 5-6 points on some subscales. In comparison, Asian individuals tended to score about 3 points lower than White individuals on some subscales.
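The point differences reported above are easier to compare once they are expressed as standardized effect sizes. The sketch below is illustrative only and is not the analysis behind these findings: it assumes the subscales are reported on a standard-score metric with a standard deviation of 15, divides each raw point difference by that SD to approximate Cohen’s d, and labels the result using the conventional 0.2 / 0.5 / 0.8 benchmarks for small, moderate, and large effects. The helper names `cohens_d` and `label_effect` are hypothetical.

```python
# Illustrative sketch only, not the analysis used for the findings above.
# ASSUMPTION: subscale scores use a standard-score metric with SD = 15.
# The findings report raw point differences, not the exact group SDs,
# so dividing by this fixed SD only approximates Cohen's d.
SCORE_SD = 15.0


def cohens_d(point_difference: float, sd: float = SCORE_SD) -> float:
    """Hypothetical helper: convert a raw point difference to a standardized effect size."""
    return point_difference / sd


def label_effect(d: float) -> str:
    """Hypothetical helper: label |d| using the conventional 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


if __name__ == "__main__":
    # Raw point differences quoted in the key findings.
    reported = [
        ("Empathy, women vs. men", 8),
        ("Black/Hispanic vs. White, upper end", 6),
        ("Black/Hispanic vs. White, lower end", 5),
        ("Asian vs. White", -3),
    ]
    for name, diff in reported:
        d = cohens_d(diff)
        print(f"{name}: d = {d:.2f} ({label_effect(d)})")
```

Under these assumptions, the 8-point Empathy difference works out to roughly d = 0.53, a moderate effect, while the 3- to 6-point race/ethnicity differences fall between about d = 0.20 and d = 0.40, in the small range.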
EQ-i 2.0 meets or exceeds the empirical fairness requirements
Our results provide strong evidence that the EQ-i 2.0 meets or exceeds the empirical fairness requirements outlined in the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014). Fairness is a critical topic to consider when choosing a test for any talent management initiative. These results suggest that scores on the EQ-i 2.0 reflect a person’s emotional intelligence level rather than demographic characteristics (e.g., gender, race/ethnicity, or age). MHS is committed to developing products and solutions that are fair and equitable, and evidence that the EQ-i 2.0 meets or exceeds standards of fairness is a foundational step toward fulfilling this commitment.
Answering these questions about the statistical side of test bias is essential to making sure that a test is built to be as fair as possible. However, a fair test can still be administered or used in an unfair way. To use the EQ-i 2.0 fairly, follow the administration guidelines in the EQ-i 2.0 User’s Manual, and always consider how you are applying the results and whether there is an opportunity for bias to creep in.