In today’s criminal justice system, there are more than 400 algorithms on the market that inform important legal decisions like sentencing and parole. Much as insurance companies use algorithms to set premiums, judges use risk assessment algorithms to estimate the likelihood that someone will become a repeat offender when they impose prison sentences. Generally speaking, lower-risk offenders can and do receive shorter prison sentences than higher-risk offenders.
Scientists and legal advocates have criticized the use of these algorithms as racially biased, opaque in how they operate and too generic for a criminal justice system that is supposed to treat everyone individually. Yet few people are paying attention to how these algorithms get this way—how they are being developed and validated before use. In the case of child pornography offenders, one algorithm is widely used by psychological experts in the criminal justice system with little thought to its development and, more importantly, its accuracy. The use of an unvalidated algorithm with unknown accuracy is dangerous, given the serious consequences associated with child pornography offenses.
The algorithm is called the Child Pornography Offender Risk Tool (CPORT). The State of Georgia uses the CPORT to determine which convicted sexual offenders should be placed on the public sexual offender registry, and experts commonly testify at sentencing hearings across the country about the results of the CPORT risk assessment. One might assume there is robust scientific evidence validating the CPORT on offenders in the United States. That assumption is incorrect.
Last year, we published a detailed methodological critique of the CPORT. Among other things, we noted that the sample used to develop the instrument was extremely small. The CPORT was developed by studying 266 child pornography offenders from Ontario, Canada, who were released from custody between 1993 and 2006. Within five years of release, 29 of the offenders were charged with or convicted of a new sexual offense.
Developing an algorithm based on 29 recidivists is troubling because small sample sizes make statistical models unstable and not generalizable to the broader population of child pornography offenders. Other well-known risk factors, such as access to children or preoccupation with child pornography, were not predictive risk factors in this sample and thus were not included in the CPORT.
What’s more, the development data for the CPORT are potentially outdated given the enormous changes in the technology used to access, store and transmit child pornography since 2006, the last year covered by the CPORT developmental sample. Smartphones and other Internet technology did not come into widespread use until after 2006, significantly changing and expanding the way online child pornography offenses occur. Access to the Internet is a common characteristic of child pornography offenders, but it is not included in the CPORT.
By contrast, the Public Safety Assessment algorithm, which judges use to determine the risk that someone accused will commit another crime while awaiting trial, was created by analyzing data from thousands of defendants from more than 300 jurisdictions across America. Importantly, it was validated in the local jurisdiction before use. Such large-scale and diverse testing is a keystone of valid risk assessment: even the most promising and well-known models have been shown to break down when applied to a new dataset.
By comparison, the CPORT researchers conducted a “validation study” with just 80 offenders from the same jurisdiction in Ontario, Canada. This sample had only 12 recidivists! Its baffling results demonstrate the peril of relying on small samples: the CPORT scores were not predictive of recidivism when the analysis was limited to cases with complete information, but they were predictive when cases with missing information were included. In other words, the algorithm “worked” when relevant information was missing but not when every case had complete information.
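To get a feel for how little certainty samples of this size provide, here is a minimal sketch that computes an approximate 95 percent confidence interval for the observed recidivism rate in the development sample (29 of 266) and the validation sample (12 of 80). It is an illustration only, not an analysis from our critique or from the CPORT developers: the function name `wilson_interval` is hypothetical, and the choice of the Wilson score interval with z = 1.96 is an assumption made for this example.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumption
    chosen for this illustration).
    """
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the article: (recidivists, sample size)
samples = {
    "development sample": (29, 266),
    "validation sample": (12, 80),
}

for label, (events, n) in samples.items():
    low, high = wilson_interval(events, n)
    print(f"{label}: {events}/{n} = {events / n:.1%}, "
          f"interval {low:.1%} to {high:.1%}")
```

The interval for the 80-person sample comes out noticeably wider than the one for the 266-person sample. These intervals describe only the uncertainty in the base rate of reoffending; they say nothing directly about how well the CPORT scores separate recidivists from non-recidivists, which is estimated from even less information when there are so few recidivists.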
We also reviewed the studies conducted by other researchers—a vital step because studies conducted by test developers tend to have better results. Test developers have a vested interest in the promotion and success of their instrument, and this can consciously and unconsciously affect their results. But even these independent studies suffer from a lack of scientific rigor. For example, one study from Spain had only six recidivists, and the study was missing information in 97 percent of the cases. None of the studies had been conducted on U.S. offenders.
We concluded, based on an exhaustive and detailed analysis of the existing research base, that “it [is] inappropriate to use the CPORT on child-pornography-exclusive offenders in the United States at this time.” In contrast, despite noting that “it is unclear how well the scale will perform in different samples/settings, and there is as of yet insufficient data to produce reliable recidivism estimates,” the CPORT development team stated that “the scale is ready for use, [but] it should be used cautiously given the limited research base behind it.”
After the publication of our article, researchers at the Federal Probation and Pretrial Services Office (PPSO) tested the CPORT on a sample of 5,700 U.S. federal child pornography offenders who were released from custody between 2010 and 2016. Within five years, 5 percent were rearrested for a new sexual offense. When put to the test, the CPORT demonstrated “mediocre prediction” performance that “did not approach those [values] reported by the CPORT’s developers.” As a result, PPSO decided not to use the CPORT to inform decisions about the level of supervision necessary for child pornography offenders on parole.
Despite the PPSO findings, our critique, and the lack of validation in any U.S. sample, the CPORT development team maintains that “The CPORT is defensible to use for assessing risk” and is promoting its use.
The use of unvalidated algorithms—like the CPORT—poses a significant threat to public safety and defendants' liberty. Inaccurate predictive algorithms offer the appearance of scientifically based precision and accuracy. But that appearance is illusory, and, in actuality, legal decisions based upon them lead to significant errors with dire consequences: non-dangerous offenders are locked up longer than necessary while dangerous offenders are released to commit future offenses.
Continued use of unvalidated risk assessment instruments also stymies research on alternative algorithms. Evidence shows that “homegrown” risk assessment algorithms developed on local data can be more accurate in predicting recidivism for individuals from their jurisdiction than “off the shelf” algorithms like the CPORT. However, policymakers have little incentive to invest the time and resources required to create locally developed algorithms when they can take something already created and use it immediately.
Until and unless a risk assessment algorithm is developed and successfully validated with data in the jurisdiction in which it is to be applied, the use of risk assessment algorithms puts us all at risk.
This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.