Do the Level of Service tools provide long-term insights?
Eva Ribbers
Predictive validity refers to an assessment’s ability to accurately predict an outcome. In the context of risk/needs assessments, accuracy is determined through a validation process. Validation tests whether the risk/needs score an assessment estimates for an individual corresponds to that individual’s actual behavior or outcomes. To test the prediction, meaningful outcome data must be available (see the Bureau of Justice Assistance for a further explanation of the concept of risk validation). A common way to assess the predictive validity of risk/needs assessment tools is to look at recidivism data (that is, whether an individual re-offends), often within 1 to 2 years post-conviction.
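Validation studies often summarize this correspondence with the area under the receiver operating characteristic curve (AUC), which can be read as the probability that a randomly chosen person who re-offended received a higher score than a randomly chosen person who did not. The Python sketch below is a minimal illustration, assuming hypothetical total scores and 0/1 outcomes; the function name `auc` and the data are invented and do not come from any study discussed here.

```python
from itertools import product


def auc(scores, reoffended):
    """Probability that a randomly chosen recidivist scored higher than a
    randomly chosen non-recidivist (ties count as half a win)."""
    pos = [s for s, y in zip(scores, reoffended) if y == 1]
    neg = [s for s, y in zip(scores, reoffended) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one recidivist and one non-recidivist")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every recidivist/non-recidivist pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical total scores and outcomes (1 = reconvicted during follow-up).
scores = [8, 12, 15, 20, 25, 30, 33, 40]
reoffended = [0, 0, 1, 0, 0, 1, 1, 1]
print(auc(scores, reoffended))  # prints 0.875
```

An AUC of 0.5 corresponds to chance-level prediction and 1.0 to perfect separation. The pairwise loop grows quadratically with sample size, which is fine for illustration; for large samples, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.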
What do we know about the predictive validity of the Level of Service tools beyond this relatively short follow-up period? Long-term behavior may be harder to predict, and it can take a while for recidivism (e.g., a reconviction) to be reflected in official records. Below, we’ll examine the Level of Service tools’ ability to predict long-term outcomes across geographical regions, demographic groups, unique populations, and types of offending. But first, we’ll provide a brief introduction to the tools and relevant definitions.
What are the Level of Service Tools?
The Level of Service tools encompass several generations of risk/needs assessment tools. Throughout this blog post, we’ll reference the first revision of the original Level of Supervision Inventory (LSI)[1]; the Level of Service Inventory-Revised (LSI-R)[2]; its successor, the Level of Service/Case Management Inventory (LS/CMI)[3]; and the LS/CMI’s pilot version, the Level of Service Inventory-Ontario Revision (LSI-OR)[4]. All of the Level of Service tools assess, at a minimum, a composite of the “central eight” factors most closely related to criminal activity[5], and some of the tools include additional sections that assess (non-)criminogenic needs, responsivity, and other considerations that may influence how an individual’s case is managed. The general risk/needs score derived from the “central eight” domains is classified into levels ranging from “Very Low” to “Very High,” and these scores are used to predict future offending in the validation process.
Defining Recidivism, Follow-Up Periods, and Types of Convictions
Not all researchers operationalize recidivism in the same way. In general, however, recidivism can be defined as any identified reconviction during a specific follow-up period[6].
For those on probation, the follow-up period often begins as soon as the probation sentence starts, whereas for those in custody, it begins as soon as they are released from a detention center[7]. In other words, the follow-up period tends to begin as soon as someone has the opportunity to re-offend in the community[8].
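To make this concrete, the sketch below codes a binary recidivism outcome for one person. It assumes a hypothetical record layout (the `Case` class and its fields are invented for illustration): the follow-up clock starts on the probation start date or the release date, whichever applies, and a reconviction counts only if it falls inside a fixed window, approximated here as 365 days per year. Real studies differ in which events they count (reconviction, re-arrest, technical violation) and in how they handle calendar details.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Case:
    # Hypothetical record layout; field names are illustrative only.
    sentence_type: str                  # "probation" or "custody"
    probation_start: Optional[date]     # set for probation cases
    release_date: Optional[date]        # set for custody cases
    reconviction_dates: List[date] = field(default_factory=list)


def follow_up_start(case: Case) -> date:
    """Start the clock when the person can first re-offend in the community."""
    if case.sentence_type == "probation":
        start = case.probation_start
    else:
        start = case.release_date
    if start is None:
        raise ValueError("missing start date for the follow-up period")
    return start


def recidivated(case: Case, years: int = 4) -> bool:
    """True if any reconviction falls inside the follow-up window."""
    start = follow_up_start(case)
    end = start + timedelta(days=365 * years)  # approximate: ignores leap days
    return any(start <= d < end for d in case.reconviction_dates)


released = Case("custody", None, date(2004, 3, 1), [date(2007, 6, 15)])
print(recidivated(released))           # True: within four years of release
print(recidivated(released, years=2))  # False: outside a two-year window
```

Because the window is anchored to each person's own start date, a conviction recorded before release or before probation begins is never counted as recidivism.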
Many researchers further differentiate between convictions for different types of offenses, such as general, violent, and sexual offending. Others rank offenses by severity. Some studies have also assessed predictive validity at more than one point in time (e.g., across two different follow-up periods). We describe the findings of various studies in general terms below; you can consult the original research to understand the nuances within a given study.
Long-Term Predictive Validity of the Level of Service tools
Around the World (USA, Canada, Germany)
In Ontario, Canada, a four-year follow-up study included 26,450 individuals who had been assessed with the LSI-OR and were either released from custody or began a term of probation in 2004. The study found that higher LSI-OR scores were related to higher offense severity for all sentence types and for all individuals, regardless of ethnicity and gender[9]. Similarly, a four-year follow-up study of 154 White male offenders aged 16 to 18 from Northern Ontario found a significant relationship between the LSI-OR total score and both general and violent recidivism[10]. In other words, those who reoffended within the four-year period had significantly higher LSI-OR total scores than those who did not reoffend.
Internationally, the tools show a similarly strong ability to predict who will re-offend. For example, in Germany, Dahle[11] examined recidivism across short (two-year), medium (five-year), and long (ten-year) periods after release for 307 individuals who had been imprisoned in 1976 in former West Berlin. The study found that the predictive accuracy of the LSI-R was comparable to international findings, despite the vastly different political situation at the time.
Additionally, research on reassessment in the United States has shown that a reduction in risk score is associated with reduced recidivism. Vose and colleagues[12] conducted a study with 2,849 individuals on probation or parole in Iowa who had been administered the LSI-R on at least two occasions during a five-year follow-up period (on average, one year passed between the initial assessment and the reassessment). In general, the higher the risk score, the more likely a parolee or probationer was to re-offend. The researchers also found that the change in risk level from the initial assessment to the reassessment was correlated with recidivism. In other words, the study not only found the LSI-R to be an accurate predictor of recidivism but also showed that a reduction in the total risk/needs score was accompanied by lower rates of recidivism, whereas an increase was accompanied by higher rates of recidivism.
Reassessment over time is critical because dynamic factors can change. An initial assessment helps identify which criminogenic needs should be addressed; if someone is then placed in a program that targets those needs, one would expect their risk to decrease. A reassessment can then determine whether risk has indeed changed and, if so, whether the person’s treatment and supervision services need to be amended.
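A reassessment comparison of this kind can be sketched as follows, assuming hypothetical pairs of initial and reassessment total scores together with a 0/1 recidivism outcome. The function `recidivism_by_change` and the data are invented for illustration: it groups people by whether their score went down, stayed the same, or went up, and reports the share who re-offended in each group. This is a descriptive illustration only, not the analysis used by Vose and colleagues.

```python
from collections import defaultdict


def recidivism_by_change(records):
    """records: iterable of (initial_score, reassessment_score, reoffended).

    Returns the proportion who re-offended for each direction of score change.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [number reoffended, group size]
    for initial, reassessed, reoffended in records:
        change = reassessed - initial
        if change < 0:
            group = "decreased"
        elif change > 0:
            group = "increased"
        else:
            group = "unchanged"
        counts[group][0] += int(reoffended)
        counts[group][1] += 1
    return {group: r / n for group, (r, n) in counts.items()}


# Hypothetical data for illustration only; not drawn from any study.
records = [
    (30, 22, 0), (28, 20, 0), (25, 18, 1), (24, 19, 0),
    (20, 20, 0), (22, 22, 1),
    (18, 26, 1), (21, 27, 1), (19, 25, 0),
]
print(recidivism_by_change(records))
```

A real analysis would also account for how large the change was and for differences in time at risk; the three-way grouping is only the simplest way to show the direction of the relationship.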
Special Populations (Forensic Inpatient Populations)
The predictive validity of the LSI-R has been replicated in other settings and populations as well. For example, the LSI-R showed good predictive validity over an average follow-up period of eight years for 525 female forensic inpatients in Germany who had been discharged from forensic psychiatric care[13]. Additionally, the likelihood of reoffending decreased with higher age and longer treatment duration. Interestingly, including additional gender-responsive risk factors (i.e., parental stress, intimate relationship dysfunction, economic marginalization, mental health issues, and experience of violence during adulthood) improved classification accuracy by only 2.2%. Because these gender-responsive factors did not meaningfully improve accuracy, they may have more implications for clinical practice (e.g., identifying suitable treatment options) than for the initial risk/needs assessment.
Specific Demographics (Indigenous Individuals, Ethnic Minority Groups, Females)
The use of a general risk/needs assessment tool is often recommended because of ease of training, consistency, comparability of results, cost-efficiency, simplicity, and the resource limitations of the governing body. However, some researchers have expressed concerns that common risk/needs assessment instruments may not adequately measure risk or provide appropriate intervention targets for subpopulations that were not sufficiently considered when the tool was constructed[14]. To ensure a fair and unbiased assessment while maintaining the benefits listed above, it is critical that general risk/needs assessment tools are suitable for as many subgroups as possible.
Indigenous Individuals
Wormith and colleagues[14] compared the predictive validity of the LS/CMI in a sample of 1,692 Indigenous individuals and 24,758 non-Indigenous individuals convicted in Ontario, Canada, over an average follow-up period of four years. The LS/CMI general risk/needs score was highly correlated with general recidivism in both the Indigenous and the non-Indigenous samples. Individuals serving a custodial sentence were more likely to re-offend than those on probation or a conditional sentence. Notably, the assessment was equally effective at predicting general recidivism in both groups, but not violent recidivism: although its predictions of violent re-offending were significant for both groups, the LS/CMI was more accurate for the non-Indigenous group than for the Indigenous group.
Ethnic Minority Groups
In the United States, Jiminez and colleagues[15] conducted a five-and-a-half-year follow-up study of 19,344 individuals on probation who had been charged with a felony or serious misdemeanor in Nebraska and found that the LS/CMI total risk score demonstrated moderate predictive validity. To test whether prediction accuracy would differ by group, the authors assigned probationers who self-identified as White European Americans of non-Hispanic descent to a ‘nonminority status’ category and probationers who self-identified otherwise (Black African Americans, Asian Americans, Native Americans, or White but of Hispanic descent) to a ‘minority status’ category. The study found no significant difference in the predictive accuracy of the LS/CMI total risk/needs score between the minority and nonminority status groups. The authors concluded that the predictive validity of the LS/CMI resembled findings from other jurisdictions in the United States and that the LS/CMI predicted risk equally well for men and women.
Females
A large body of research has examined whether risk/needs scores are equally predictive for men and women, and findings have generally supported using traditional risk/needs instruments with women. In a Canadian study with a sample of 101 women and a post-release follow-up period of seven years, both the LSI-R and the LS/CMI performed well, with predictive validity replicated across several outcome measures, such as the prevalence and frequency of various types of offending, offense severity, and sentence length[16]. Similarly, in a study of 136 Canadian men and women supervised in the community, researchers collapsed the LS/CMI risk/needs levels into three categories (Low, Moderate, and High) and were able to predict general recidivism very well for both men and women over a three-to-four-year period; those with higher risk scores reoffended sooner and at a higher rate[17]. This trend was especially pronounced for women. Interestingly, reassessment was associated with a lower risk of general recidivism, which suggests that reassessment offers a good opportunity to improve risk/needs management based on changes in a person’s situation. Additionally, general recidivism was lower in cases with stronger adherence to the Risk/Need/Responsivity (RNR) model, underscoring the importance of adhering to these principles to improve client outcomes.
Similarly, in New Jersey, a three-year follow-up study of 450 female and 450 male parolees demonstrated that the LSI-R predicted recidivism significantly above chance for the combined sample as well as for each gender separately, with higher LSI-R risk classifications associated with higher proportions of each group experiencing a re-arrest, reconviction, or technical parole violation[18]. The authors expressed some concern about the overall strength of the relationship between LSI-R scores and recidivism and speculated that it might reflect local practices: parole officers did not participate in the assessment process themselves and lacked access to complete and up-to-date LSI-R information, so the results were not used to direct clients toward rehabilitative resources. The authors therefore recommended that parole officers be able to use an instrument such as the LS/CMI to better consider dynamic needs and responsivity during supervision[18].
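Findings like “higher risk classifications were associated with higher proportions re-offending” rest on a simple tabulation: the share of people in each risk category who experienced the outcome, checked for a rising pattern. The sketch below assumes three illustrative categories (Low, Moderate, High) and invented outcomes; the functions `rate_by_level` and `is_monotonic` are hypothetical helpers, not part of any Level of Service scoring material.

```python
from collections import Counter

LEVELS = ["Low", "Moderate", "High"]  # illustrative ordering, lowest to highest risk


def rate_by_level(cases):
    """cases: iterable of (risk_level, reoffended) pairs with reoffended as 0/1."""
    totals, events = Counter(), Counter()
    for level, reoffended in cases:
        totals[level] += 1
        events[level] += int(reoffended)
    # Proportion re-offending per level, skipping levels with no cases.
    return {lvl: events[lvl] / totals[lvl] for lvl in LEVELS if totals[lvl]}


def is_monotonic(rates):
    """True if the re-offense rate never drops as the risk level rises."""
    ordered = [rates[lvl] for lvl in LEVELS if lvl in rates]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# Hypothetical outcomes for illustration only.
cases = [("Low", 0), ("Low", 0), ("Low", 1), ("Low", 0),
         ("Moderate", 1), ("Moderate", 0), ("Moderate", 1), ("Moderate", 0),
         ("High", 1), ("High", 1), ("High", 0), ("High", 1)]
rates = rate_by_level(cases)
print(rates)                # {'Low': 0.25, 'Moderate': 0.5, 'High': 0.75}
print(is_monotonic(rates))  # True
```

A rising pattern like this is necessary but not sufficient evidence of predictive validity; studies typically add a significance test or a summary statistic such as the AUC.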
Type of Offending (General vs. Violent vs. Sexual Recidivism)
A common argument against using general risk/needs assessments with individuals who have sexually offended is that these tools do not measure sexual deviance, which is an important predictor of sexual recidivism. Further, risk/needs assessments tend to focus on criminogenic needs (e.g., criminal attitudes) that may not be relevant to sexual offending. Some research has shown that those who have sexually offended have lower average scores on risk/needs assessment tools than the general offender population. However, because recidivism rates are also lower among those who have sexually offended, these lower scores may simply be in line with the outcome[19]. In a four-and-a-half-year follow-up study, Wormith and colleagues[19] demonstrated that the LS/CMI predicted the general recidivism of individuals who have sexually offended with the same accuracy as their violent and sexual recidivism, and about as well as it predicts general recidivism in the general offender population.
The Level of Service Tools have strong evidence of long-term predictive validity
Research has generally demonstrated strong long-term predictive validity for the Level of Service tools’ risk/needs scores across different follow-up periods, populations, and offense types. The studies spanned follow-up periods of three to 20 years, and the relationship between risk score and re-offending remained strong. These findings provide critical evidence for the validity of the Level of Service tools. Interestingly, the risk of re-offending decreased with age, treatment duration, and reassessment. It is possible that more static items, such as offense history, are valid but time-dependent indicators of the likelihood to re-offend. These findings underline the importance of reassessment, because dynamic risk/needs factors are subject to change.
References
1. Andrews, D. A. (1982). The Level of Supervision Inventory (LSI): The first follow-up (Report). Ontario Ministry of Correctional Service.
2. Andrews, D. A., & Bonta, J. L. (1995). The Level of Service Inventory–Revised: User’s Manual. Multi-Health Systems Inc.
3. Andrews, D. A., Bonta, J. L., & Wormith, J. S. (2004). Level of Service/Case Management Inventory (LS/CMI™): An Offender Assessment System: User’s Manual. Multi-Health Systems Inc.
4. Andrews, D. A., Bonta, J., & Wormith, J. S. (1995). Level of Service Inventory–Ontario Revision (LSI-OR): Interview and scoring guide. Ontario Ministry of the Solicitor General and Correctional Services.
5. Andrews, D. A., Guzzo, L., Raynor, P., Rowe, R. C., Rettinger, L. J., Brews, A., & Wormith, J. S. (2012). Are the major risk/need factors predictive of both female and male reoffending?: A test with the eight domains of the Level of Service/Case Management Inventory. International Journal of Offender Therapy and Comparative Criminology, 56(1), 113–133. https://doi.org/10.1177/0306624X10395716
6. Girard, L., & Wormith, J. S. (2004). The predictive validity of the Level of Service Inventory-Ontario Revision on general and violent recidivism among various offender groups. Criminal Justice and Behavior, 31(2), 150–181. https://doi.org/10.1177/0093854803261335
7. Giguère, G., James, J., & Proulx, J. (2021). Validity of the LS/CMI for the prediction of recidivism among male and female offenders. Journal of Crime and Criminal Behavior, 1(1), 101–120.
8. Girard, L., & Wormith, J. S. (2004). The predictive validity of the Level of Service Inventory-Ontario Revision on general and violent recidivism among various offender groups. Criminal Justice and Behavior, 31(2), 150–181. https://doi.org/10.1177/0093854803261335
9. Hogg, S. M. (2011). The Level of Service Inventory (Ontario Revision) scale validation for gender and ethnicity: Addressing reliability and predictive validity. University of Saskatchewan.
10. Nowicka-Sroga, M. (2004). The Level of Service Inventory-Ontario Revision: A recidivism follow-up study within a sample of male young offenders. University of Ottawa.
11. Dahle, K.-P. (2006). Strengths and limitations of actuarial prediction of criminal reoffence in a German prison sample: A comparative study of LSI-R, HCR-20 and PCL-R. International Journal of Law and Psychiatry, 29(5), 431–442. https://doi.org/10.1016/j.ijlp.2006.03.001
12. Vose, B., Smith, P., & Cullen, F. T. (2013). Predictive validity and the impact of change in total LSI-R score on recidivism. Criminal Justice and Behavior, 40(12), 1383–1396. https://doi.org/10.1177/0093854813508916
13. Wolf, V., Mayer, J., Steiner, I., Franke, I., Klein, V., Streb, J., & Dudeck, M. (2023). The predictive accuracy of the LSI-R in female forensic inpatients—Assessing the utility of gender-responsive risk factors. International Journal of Environmental Research and Public Health, 20(5), 4380. https://doi.org/10.3390/ijerph20054380
14. Wormith, J. S., Hogg, S. M., & Guzzo, L. (2015). The predictive validity of the LS/CMI with Aboriginal offenders in Canada. Criminal Justice and Behavior, 42(5), 481–508. https://doi.org/10.1177/0093854814552843
15. Jiminez, A. C., Delgado, R. H., Vardsveen, T. C., & Wiener, R. L. (2018). Validation and application of the LS/CMI in Nebraska Probation. Criminal Justice and Behavior, 45(6), 863–884. https://doi.org/10.1177/00938548187632
16. Stewart, C. A. (2011). Risk assessment of federal female offenders. University of Saskatchewan.
17. Dyck, H. L., Campbell, M. A., & Wershler, J. L. (2018). Real-world use of the risk–need–responsivity model and the level of service/case management inventory with community-supervised offenders. Law and Human Behavior, 42(3), 258–268. https://doi.org/10.1037/lhb0000279
18. Ostermann, M., & Herrschaft, B. A. (2013). Validating the Level of Service Inventory-Revised: A gendered perspective. The Prison Journal, 93(3), 291–312. https://doi.org/10.1177/0032885513490278
19. Wormith, J. S., Hogg, S., & Guzzo, L. (2012). The predictive validity of a general risk/needs assessment inventory on sexual offender recidivism and an exploration of the professional override. Criminal Justice and Behavior, 39(12), 1511–1538. https://doi.org/10.1177/0093854812455741