
Chart representing the difference in sentencing (in days) between white and non-white defendants for offense code 3550 (marijuana possession < 1/2 ounce) in NC for the period 2008-2013.

How to interpret this chart: Judges on the far right are “harsher” on non-whites than on whites, while judges on the far left are more lenient on non-whites than on whites. The y-axis represents the difference, in days, between sentences for non-white and white defendants. This means that at the far right of the chart, judge CU in Iredell County gave non-white defendants sentences that were on average 14 days longer than those given to white defendants, while judge AEM in Durham County, at the far left, gave non-white defendants sentences that were on average 14 days shorter. Judges in the middle of the graph had virtually no difference in sentence length between whites and non-whites.


About the author of this article:

Melinda Thielbar, Ph.D. is a Research Statistician Developer in the JMP Division of SAS Institute, which means she both researches new statistical methods and develops statistical software. She currently specializes in consumer research data, though she has worked in many fields, including power systems optimization, fraud detection in government programs, and training evaluation. Melinda holds a PhD in statistics and a master’s in economics. The views represented here are hers alone and are not endorsed by SAS or JMP.


A lot of questions…

In a previous post, Jim Young showed an exploratory analysis of marijuana arrests (offense code 3550) in Mecklenburg County. We wanted to do a similar analysis of how people are sentenced once they’ve been arrested for this offense code. Are black defendants treated the same as white defendants? Do they receive similar sentences? Are they as likely to receive suspended sentences or community service in lieu of serving time?


We also wanted to know if some judges were harsher (or more lenient) with black defendants, and we wanted to be able to distinguish any “judge effect” from an overall discrepancy between white and black defendants. We used a statistical model so we could look at all of these effects together instead of viewing each effect in isolation (details at the end of the post).


Our goal for this analysis is to help start a conversation about race in the North Carolina court system—not to be the final word about whether our courts are perfectly fair. 


“Overall, white and non-white defendants receive similar sentences” 

Our statistical model showed that whites and non-whites received about the same sentence when averaged overall. An average white defendant in North Carolina can expect to receive an 8.2-day sentence. An average non-white defendant can expect to receive a 7.8-day sentence.



What we still don’t know

Our data did not contain a measure of the defendant’s prior record, and we would expect that to be very important in sentencing. There were two notable cases (both black defendants) where the sentences were more than 4 years for no discernible reason. The judges who saw those cases had very high estimated bias against black defendants. We excluded these cases from the analysis, a common practice in statistical analysis called excluding outliers. Outliers may create strange estimates in a statistical analysis, but they can also point to important patterns that can’t be discerned in a standard analysis. In the future, we would like to include the defendant’s record. This would help to explain some of the differences in sentences we can’t account for now, and allow us to produce more precise estimates.


Data Preparation and Models Used


Software used

The graphs and statistical analyses were made using a point-and-click software package called JMP. JMP was selected because:

  1. It is easy to use
  2. It can perform complicated analyses and make visuals based on the results quickly and easily
  3. Melinda Thielbar is a software developer who works on the JMP product 🙂

Any statistical package that can estimate a zero-inflated Poisson model and implements the LASSO (both discussed below) would have produced the same estimates. Other packages that could have been used include R, SAS, and Python.


In many instances, the judge did not see enough cases for us to produce a reliable estimate

Statistical analysis relies on making multiple measurements under the same circumstances. There were many judges who saw only a handful of cases related to this arrest code (code 3550), or saw only defendants of one race. This was not enough for a reliable estimate of average sentence. Those judges were all lumped together into an extra category called “Not Enough Data”. The Not Enough Data judge was associated with 10,000 cases. This category includes judges from all over North Carolina and could be treated as a proxy for an “average” judge, or considered a category that represents many of the cases that were present in the data. The difference in sentence length between white and non-white defendants for this category was practically 0 (about -0.41), which further supports our first finding: on average, white and non-white defendants are treated the same, at least when it comes to sentence length.
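The lumping step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually run in JMP: the cutoff `MIN_CASES`, the function name, and the judge labels are all hypothetical, since the post does not state the threshold that was used.

```python
from collections import defaultdict

# Hypothetical cutoff: the post does not say how many cases counted as "enough".
MIN_CASES = 30

def lump_sparse_judges(cases, min_cases=MIN_CASES):
    """Map each judge to itself, or to "Not Enough Data" if the judge saw
    too few cases or saw defendants of only one race.

    `cases` is a list of (judge, race) pairs, with race already coded
    as "white" or "non-white".
    """
    counts = defaultdict(int)
    races_seen = defaultdict(set)
    for judge, race in cases:
        counts[judge] += 1
        races_seen[judge].add(race)

    mapping = {}
    for judge in counts:
        too_few = counts[judge] < min_cases
        one_race_only = len(races_seen[judge]) < 2
        mapping[judge] = "Not Enough Data" if (too_few or one_race_only) else judge
    return mapping

# Toy data with made-up judge labels.
cases = (
    [("J1", "white")] * 20 + [("J1", "non-white")] * 20   # enough cases, both races
    + [("J2", "white")] * 3 + [("J2", "non-white")] * 2   # too few cases
    + [("J3", "non-white")] * 50                          # only one race seen
)
mapping = lump_sparse_judges(cases)
```

In this toy example only J1 keeps its own category; J2 and J3 are both folded into “Not Enough Data”.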


Most People Are Sentenced to 0 Days

Of the 44,000 cases in the database, a little over 20,000 were sentenced to 0 days (i.e. given a suspended sentence or allowed to perform community service). The standard models most people would use to predict sentence length wouldn’t do well with data like this, so we used a special kind of statistical model called a Zero-Inflated Poisson (or ZIP model).
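For readers who want to see what “zero-inflated” means, here is a minimal Python sketch of the textbook ZIP distribution. It mixes a guaranteed zero (with probability `pi_zero`) with an ordinary Poisson count (with mean `lam`). The parameter values below are made up for illustration and are not estimates from our data, and the function names are our own.

```python
import math

def zip_pmf(k, pi_zero, lam):
    """P(sentence = k days) under a zero-inflated Poisson model.

    pi_zero: probability of a "structural" zero (e.g. a suspended sentence)
    lam:     mean of the Poisson part (must be > 0)
    """
    # Poisson probability computed on the log scale to avoid overflow for large k.
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    if k == 0:
        # A zero can come from either the structural-zero part or the Poisson part.
        return pi_zero + (1 - pi_zero) * poisson
    return (1 - pi_zero) * poisson

def zip_mean(pi_zero, lam):
    """Expected value of the zero-inflated Poisson distribution."""
    return (1 - pi_zero) * lam

# Illustrative values only (not fitted to the court data).
p_zero_days = zip_pmf(0, pi_zero=0.45, lam=14.0)
expected_days = zip_mean(pi_zero=0.45, lam=14.0)
```

The point of the extra `pi_zero` term is that an ordinary Poisson model with a mean of several days would predict almost no 0-day sentences, which is not what the data look like.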


While 44,000 cases may seem like a lot, estimating separate effects for race by each judge uses a lot of information and can create some problems with standard statistical techniques. To combat this, we used a technique developed at Stanford to deal with data sets that have a lot of possible effects compared to the number of rows in the data. This technique, called the LASSO, shrinks estimates for unimportant effects to 0.
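To give a feel for how the LASSO “shrinks estimates to 0”, here is a small Python sketch of soft-thresholding. This is the form the LASSO solution takes for a single coefficient in the simplest textbook setting (squared-error loss with orthonormal predictors), not the full penalized ZIP fit we ran in JMP. The effect names, raw values, and penalty are made up.

```python
def soft_threshold(estimate, penalty):
    """Shrink an estimate toward 0 by `penalty`; return exactly 0 if the
    estimate is no larger than the penalty in absolute value."""
    if estimate > penalty:
        return estimate - penalty
    if estimate < -penalty:
        return estimate + penalty
    return 0.0

# Hypothetical unpenalized judge-by-race effects, in days.
raw_effects = {"judge_1": 14.0, "judge_2": -0.4, "judge_3": 0.9, "judge_4": -14.0}

# With a penalty of 1.0, small effects become exactly 0 and large ones
# are pulled 1.0 closer to 0.
shrunk = {name: soft_threshold(value, penalty=1.0)
          for name, value in raw_effects.items()}
```

Effects that are small relative to the penalty are set to exactly 0, so only judges with a clear difference keep a non-zero estimate.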


Most Defendants Are Either Black or White

Our description of the analysis refers to white and “non-white” defendants. Of the 44,000 cases analyzed, approximately 26,000 defendants were black, and about 16,000 defendants were white. Because of the small number of cases for other races, we re-coded the original race variable to two levels: white or non-white.
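The re-coding itself is simple. A sketch in Python might look like the following, where the raw race labels are hypothetical stand-ins because the actual codes in the court data are not listed in this post.

```python
def recode_race(race):
    """Collapse a raw race label to two levels: "white" or "non-white"."""
    return "white" if race.strip().lower() == "white" else "non-white"

# Hypothetical raw labels; the real data may use different codes.
raw_labels = ["White", "Black", "Hispanic", "white ", "Asian"]
recoded = [recode_race(label) for label in raw_labels]
```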



Want more information like this?

We are working on a much larger data set encompassing roughly 35 million court records. It’s our hope to run similar statistical analyses on that larger set and publish results right here, in a future post. Subscribe to our blog (enter your email on the right side of this post) to receive future articles like this.