An Examination of Fatal Force by Police in the US | by Navid Mashinchi | Nov, 2020


Background: Fatal shootings by police have been heavy in the news. For several years before the current wave of interest, the numbers had been rising only slightly; more importantly, that change seems small compared to the increasingly disproportionate attention the topic receives.

Image by author

Helping propel this issue to the national spotlight are the connections between race and mental illness that have been part of many of the reported instances. With so much attention on this topic, leaders have been looking for more data to help make sense of what is happening and make decisions.

Objective: Our team wanted to examine the factors that play into the horrible event of a fatal shooting. Which ones carry the most weight, and which are perhaps predictive in nature: race, state, mental illness? Based on the variables available in the dataset, we set out to predict the deceased’s race and mental illness status.

Methods: First, we needed a good dataset. We found that The Washington Post maintains a dataset on GitHub compiling all fatal police shootings from January 2015 to the present. According to their site it is actively updated, and they took (and continue to take) the time to research new entries to ensure accurate reporting.

We then loaded this dataset of over 5,700 records into our Jupyter/Python setup. Using the pandas module, we cleaned, counted, and visualized the data. Cleaning meant adjusting several columns’ types (e.g. the year had to be pulled out of the dates, and strings set as categories). We then ran a OneVsRestClassifier on race and a random forest model on mental illness to investigate the dataset’s capability to predict trends and suggest potential connections.

We included several features or independent variables like race, gender, state location and mental illness among a few others. We will use these to assess which ones may or may not be related to fatal police shootings.

Results: White, non-Hispanics made up 45.49% of the victims, Black, non-Hispanics were second at 23.72%, and Hispanics were third at 16.53% of all shootings from 2015 to mid-2020.

On mental health, a majority of victims across the board were not mentally ill: 30.33% of all victims were white males with no signs of illness and 19.50% were black males with no signs of illness.

Our random forest model showed relatively decent accuracy (77%) in predicting whether a victim was mentally ill.

Conclusions: Initially, it appears that race isn’t as crucial to this discussion as previously thought. White, non-Hispanics were killed the most, even by percentage (45%), with Black, non-Hispanics second (at 23%). Mental illness is not prevalent among those who were shot, yet based on our data we are able to predict whether a victim was mentally ill.

Perhaps most interestingly, we found that there aren’t enough data points to predict whether a given race was associated or correlated with the other feature variables we were investigating. Because white and black victims dominate the counts, some of the other racial groups had too few records for the model to learn from and predict. We conclude that predicting this would require more data.

This lack of data for correlating race with the other features also inhibits our ability to determine with confidence whether a victim was killed because of any one of those feature variables.

Also, this one dataset is not enough for a fully rigorous study of police bias in fatal shootings. Other variables would need to be accounted for, such as (but not limited to) economic level and the victim’s prior charges (including severity, i.e. violent vs. nonviolent).

Keywords: police, shooting, fatal, race, mental, health

With the nation currently so divided along several lines, we thought it would be helpful to look at the data around fatal police shootings and the associated factors. Those who follow media reports on the topic are often quick to point fingers in several directions and then draw conclusions. Some of those conclusions may or may not be warranted, given the anecdotal nature of live news feeds and the emotional pull this topic has on people.

In researching this topic, a common theme is that there isn’t enough data, research, and science in this area to help the country’s leaders formalize plans and strategies to address this problem. This is captured simply in a 2019 title: “Wanted: better data on police shootings to reduce mistrust between the police and the communities they serve”.

Photo by mana5280 on Unsplash

More recently, in the wake of the George Floyd riots: “It remains unclear which law-enforcement practices are actually best, largely because of a lack of data and science. ‘We’re operating in the dark about what are the most effective strategies, tactics and policies to move forward with,’ says Robin Engel, director of the Center for Police Research and Policy in Cincinnati, Ohio.”

That article, while noting the frustration of not having enough data, did reference a few studies done in this field in recent years, and these studies did lean toward showing racial bias. For instance, the ‘Answering the Call’ study found that white police officers responding to calls in black neighborhoods drew their weapons more often than black officers in similar neighborhoods.

This same article roughly corroborated our data when it reported that approximately 1,000 civilians a year are killed by law-enforcement officers; our dataset lists about 5,700 people shot in a little under five years.

Here we hope to shed some light onto this topic. While our findings are brief, perhaps they can help others to better understand the events that are unfolding around them.

  • id → int64
  • name → object
  • date → object
  • manner_of_death → object
  • armed → object
  • age → float64
  • gender → object
  • race → object
  • city → object
  • state → object
  • signs_of_mental_illness → bool
  • threat_level → object
  • flee → object
  • body_camera → bool
  • longitude → float64
  • latitude → float64
  • is_geocoding_exact → bool

Based on the dataset above, the following columns were converted to more suitable types:

  • date → Datetime type
  • manner_of_death → Category
  • armed → Category
  • gender → Category
  • race → Category
  • city → Category
  • state → Category
  • threat_level → Category
  • flee → Category
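
The conversions above can be sketched in pandas; the tiny frame here is a hypothetical stand-in for the real dataset:

```python
import pandas as pd

# Minimal sketch of the type conversions described above.
df = pd.DataFrame({
    "date": ["2015-01-02", "2015-01-03"],
    "manner_of_death": ["shot", "shot and Tasered"],
    "race": ["W", "B"],
})

# Parse the date strings into a proper datetime column.
df["date"] = pd.to_datetime(df["date"])

# Cast the string columns to pandas categorical dtype.
for col in ["manner_of_death", "race"]:
    df[col] = df[col].astype("category")
```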

Next, we checked the number of null values:

  • id → 0
  • name → 218
  • date → 0
  • manner_of_death → 0
  • armed → 213
  • age → 262
  • gender → 2
  • race → 597
  • city → 0
  • state → 0
  • signs_of_mental_illness → 0
  • threat_level → 0
  • flee → 302
  • body_camera → 0
  • longitude → 282
  • latitude → 282
  • is_geocoding_exact → 0

For each of the above columns with null values we had to think about how we would approach this. We had 3 options:

  1. Get rid of the particular row.
  2. Get rid of the entire column.
  3. Set the null value to some substitute: for a numeric column, perhaps zero, the mean, or the median; for a text column, a placeholder string.

For example, we saw two null values for ‘gender.’ In the first instance, the victim was listed as bisexual, nonbinary and intersex; since we were dealing with only female and male, we decided to drop that row. We also dropped the second instance because the row had a lot of missing values: name, age, race and gender were all unavailable, so it seemed unlikely we would get much information out of it.

To make the dataframe more readable, we replaced missing names with the string “Name not available”, missing armed values with “undetermined”, and missing race and flee values with “U”. Null ages were filled with the mean age of the matching gender and race.
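A hypothetical sketch of that null handling, with a made-up three-row frame standing in for the real data:

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the real dataset has ~5,700 records.
df = pd.DataFrame({
    "name": ["A. Doe", "B. Roe", None],
    "armed": ["gun", None, "knife"],
    "race": ["W", "W", None],
    "gender": ["M", "M", "M"],
    "age": [30.0, np.nan, 40.0],
})

# Placeholder strings for missing text fields.
df["name"] = df["name"].fillna("Name not available")
df["armed"] = df["armed"].fillna("undetermined")
df["race"] = df["race"].fillna("U")

# Fill missing ages with the mean age of the matching gender/race group.
df["age"] = df.groupby(["gender", "race"])["age"].transform(
    lambda s: s.fillna(s.mean())
)
```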

We replaced the race column’s single-letter codes with full strings to make the exploratory analysis visually clearer:

  • ‘W’ → ‘White, non-Hispanic’
  • ‘B’ → ‘Black, non-Hispanic’
  • ‘A’ → ‘Asian’
  • ‘N’ → ‘Native American’
  • ‘H’ → ‘Hispanic’
  • ‘O’ → ‘Other’
  • ‘U’ → ‘Unknown’
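
The relabeling above is a simple dictionary map; this sketch uses a toy Series in place of the real column:

```python
import pandas as pd

# Map the single-letter race codes to the full labels used in the plots.
race_labels = {
    "W": "White, non-Hispanic",
    "B": "Black, non-Hispanic",
    "A": "Asian",
    "N": "Native American",
    "H": "Hispanic",
    "O": "Other",
    "U": "Unknown",
}

races = pd.Series(["W", "B", "U"]).map(race_labels).astype("category")
```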

For the missing longitude and latitude, we used “Nominatim” to find the coordinates based on the city and state. There were two instances where we had to do a manual lookup.

We also added a new day-of-the-week column, derived from the date column, to see whether there was a trend in when individuals got shot throughout the week.
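Deriving that column is a one-liner with the pandas datetime accessor; the dates here are illustrative:

```python
import pandas as pd

# Derive a day-of-week column from the date column.
dates = pd.to_datetime(pd.Series(["2020-01-06", "2020-01-11"]))
day = dates.dt.day_name()  # e.g. "Monday", "Saturday"
```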

This left us with a cleaned-up dataset for analysis, with the following summary statistics:


  • We have a total of 5762 victims.
  • The mean age of the victim is 37.14657.
  • The minimum age is 6.
  • The maximum age is 91.
  • The 25th percentile of victims’ age is 27.
  • The 50th percentile of victims’ age is 36.
  • The 75th percentile of victims’ age is 45.

We also brought in a secondary dataset to help analyze shooting counts per state. Normalizing to a rate per 100,000 people seemed appropriate, since the raw counts showed, for example, California as very high simply because of its size. We pulled state populations from the ‘US States — Ranked by Population 2020’ website, divided each state’s shooting count by its population, and, because that number was so small, multiplied by 100,000 for readability. The result, shootings per 100k people, was stored in a new dataframe for later use.
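The normalization can be sketched with two Series; the counts and populations here are made-up round numbers, not the real figures:

```python
import pandas as pd

# Hypothetical shooting counts and rounded state populations.
shootings = pd.Series({"CA": 700, "AK": 40})
population = pd.Series({"CA": 39_500_000, "AK": 730_000})

# Shootings per 100,000 residents, aligned by state index.
per_100k = shootings / population * 100_000
```

Even with a far smaller raw count, Alaska's per-capita rate comes out well above California's, which is exactly the effect the normalization is meant to expose.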

Image by author

To the left, we see the ranking and percentage shot by race:

  • 45.49% were White
  • 23.72% were Black
  • 16.53% were Hispanic
  • 10.4% were unknown
  • 1.64% were Asian
  • 1.4% Native American
  • 0.81% Other

With race being a big question going into this project, looking at the number of victims by race seemed perfectly logical. Within the dataset, white victims were killed the most, with black victims at almost half the white percentage. The graph initially showed raw counts but was switched to percentages for clarity. Continuing this line of questioning, we looked at physical location by state with respect to the races of those shot:

Image by author

The catplot “Percentage by race” above already gives us the racial percentages.

Additionally, the United States plot above shows that the majority of Hispanic victims were shot by police in Texas, New Mexico and California, which becomes evident from the purple dots.

Looking at the pink dots, we can see that most are located in the central part of the country, which could suggest that Native Americans are most at risk there.

Unsurprisingly by now, the majority of the victims are white; the green dots are spread across the whole country.

What is really interesting is that the majority of the yellow dots are on the East Coast rather than the West. This could be an indicator that black people are more in danger on the East Coast when they interact with the police; there are noticeably fewer yellow markers on the West Coast.

Image by author
Image by author

Next, we wanted to see whether shootings might have some relation to state. Looking at the plot “Shooting By State”, a large majority of the victims were shot in California, Texas, and Florida, which makes sense as they are the three most populated states, while some of the smaller states show fewer victims. We therefore calculated the per-capita number of victims by dividing the fatal shooting count per state by the state population and multiplying by 100,000.

This produced the second plot (“Percent shot per 100k by state”). Surprisingly, the new top three, Alaska, New Mexico, and Oklahoma, are relatively small states: there are more shootings on a per-capita basis there than in larger states such as California. As a quick note, New York is the fourth most populous state, yet both its raw count and its per-capita rate are considerably lower than those of the top three, CA, TX, and FL.

Image by author

This heatmap shows concentration based on the shootings-per-100,000 rate from the previous bar graph, making it easy to see where victims are concentrated once we normalize by population. The mid-southwest around New Mexico and Oklahoma stands out as having high per-capita rates, and Alaska, given its sparse population, still looks like a possible outlier.

Interestingly, while many dots (from the racial map) showed up on the East Coast, its shading here is fairly light. So the East Coast being more densely populated may have skewed our earlier impression. Or perhaps there is a regional psyche at work?

Image by author

After the map-based analysis, we looked at mental illness. The plot above shows that the distributions of gender, race and signs of mental illness are unbalanced; as stated before, a majority of the victims are white, non-Hispanic males.

Within this dataset, 30.33% of all victims are white, non-Hispanic males with no signs of mental illness, 19.50% are black, non-Hispanic males with no signs of mental illness, and 13.19% are Hispanic males with no signs of mental illness.

Another variable we explored was body-camera footage and whether it might relate to an individual being shot. We ultimately scrapped this idea from our final conclusions because so many officers had the camera off that the data was too skewed to serve as a determining variable.

Image by author

Based on this plot, a majority of the shootings occur on the Southeast Coast, followed by the West Coast and the Northeast, with only a few occurrences in the central part of the country. In general, police shootings occur with the officer’s body camera turned off, shown in red.

Image by author

The age distribution by race is strongly represented across all groups between the ages of 22 and 47, with a mean of 37. Note that we replaced the 262 null values in the “age” column with mean values based on race and gender; the table below shows the values used:

Image by author

Our dataset contains ages ranging from 6 to 91. The plot also shows a considerable number of outliers in each racial group; the table below lists, for each race, the number of individuals, the min, max, and mean age, and the number of outliers:

Image by author

From this table, there are a total of 94 outliers in this dataset.

Modeling (Signs of Mental Illness) — Summary:

As most of our features are categorical variables, and because Machine Learning algorithms prefer to deal with numbers rather than strings, we had to convert our features to numerical values. We converted the following columns to binary numbers:

  • body_camera: False = 0, True = 1.
  • gender: Male = 1, Female = 0.
  • signs_of_mental_illness: False = 0, True = 1.
  • manner_of_death: Shot = 0, Shot and Tasered = 1.

Afterward, as we continued to feature engineer our data, the following multi-class columns were one-hot encoded using the pandas get_dummies method:

  • armed
  • race
  • threat_level
  • flee
  • day
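
A minimal sketch of that encoding step; the column values below are illustrative, not the full category sets from the dataset:

```python
import pandas as pd

# Toy rows standing in for the engineered dataframe.
df = pd.DataFrame({
    "flee": ["Foot", "Not fleeing"],
    "threat_level": ["attack", "other"],
    "day": ["Monday", "Saturday"],
})

# One-hot encode the multi-class columns: each category becomes
# its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=["flee", "threat_level", "day"])
```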

We one-hot encoded those particular columns because they have multiple classes; encoding them as single integers could introduce an artificial ordering for the machine to pick up as bias. Next, the following columns were dropped, since they didn’t help our modeling:

  • id
  • name
  • date
  • city
  • state
  • longitude
  • latitude

Once we finished massaging the data in order to prepare it for our machine learning algorithms, we needed to decide what kind of models to use. The goal was to figure out if the victim had signs of mental illness (True/False). We concluded that we were dealing with a binary classification problem, thus we decided to implement the following models:

  • Logistic Regression
  • SVC (Support Vector Classification)
  • SGD (Stochastic Gradient Descent)
  • Decision Tree
  • Random Forest

Before implementing the models, we split our data into training and test sets using an 80/20 ratio: 80% to the training set and 20% to the test set. After the split, we created the X and Y dataframes for both sets.
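The split itself is one call to scikit-learn; X and y here are synthetic placeholders for the engineered features and the signs_of_mental_illness target:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and binary target.
X = np.arange(100).reshape(50, 2)
y = np.array([0, 1] * 25)

# 80/20 split with a fixed random state for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```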

Next, we implemented each of the algorithms mentioned above. Using the default parameters and a random state of 42, each model provided the following accuracy scores:

Image by author

We knew our features needed to be scaled when using support vector machines, which we hadn’t done for SVC and SGD. We therefore scaled the features using sklearn’s StandardScaler and re-ran the models above to see how the accuracy changed, producing the following results:
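One sketch of that scaling step, on toy data: wrapping the scaler and classifier in a pipeline keeps the scaler fit on the training data only, which avoids leaking test-set statistics.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy training data with features on very different scales.
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])
y_train = np.array([0, 1, 0, 1])

# StandardScaler is fit inside the pipeline, then the SVC trains on
# the scaled features.
model = make_pipeline(StandardScaler(), SVC(random_state=42))
model.fit(X_train, y_train)
```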

Image by author

We didn’t see much difference in the accuracy scores, but we continued with the scaled results. We picked the top two performing models and ran cross-validation on them with a k-fold of 10 and a scoring value of ‘accuracy’, getting the following results:
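The cross-validation step looks roughly like this, with synthetic data standing in for our features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the engineered training data.
X, y = make_classification(n_samples=200, random_state=42)

# 10-fold cross-validation, scoring each fold by accuracy.
forest = RandomForestClassifier(random_state=42)
scores = cross_val_score(forest, X, y, cv=10, scoring="accuracy")
```

Comparing the mean of `scores` against the training-set accuracy is what reveals over-fitting: a large gap means the model memorizes rather than generalizes.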

Image by author

Clearly, our top-performing models were overfitting. We took the better of the two to fine-tune: the random forest outperformed the decision tree under cross-validation.

We then shifted our focus to tuning hyperparameters with sklearn’s RandomizedSearchCV, searching for the best values over the following parameters:

  • "bootstrap": [True, False]
  • "max_depth": [int(x) for x in np.linspace(start=10, stop=110, num=11)]
  • "max_features": ["auto", "sqrt"]
  • "min_samples_split": [2, 5, 10]
  • "min_samples_leaf": [1, 2, 4]
  • "n_estimators": [int(x) for x in np.linspace(start=200, stop=2000, num=10)]
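
A runnable sketch of the search on toy data (a smaller n_estimators range and n_iter than above, to keep the example fast; note newer sklearn versions replaced the "auto" option with "sqrt"/"log2"):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Reduced grid for speed; shaped like the one listed above.
param_distributions = {
    "bootstrap": [True, False],
    "max_depth": [int(x) for x in np.linspace(start=10, stop=110, num=11)],
    "max_features": ["sqrt", "log2"],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "n_estimators": [50, 100],
}

X, y = make_classification(n_samples=200, random_state=42)

# Randomized search samples n_iter parameter combinations and
# cross-validates each one.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=3,
    cv=3,
    random_state=42,
)
search.fit(X, y)
```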

The result of our RandomizedSearchCV returned the following output:

Best Params:

{'n_estimators': 200, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_features': 'sqrt', 'max_depth': 50, 'bootstrap': True}

Best Estimator:

RandomForestClassifier(max_depth=50, max_features='sqrt', min_samples_leaf=2, min_samples_split=10, n_estimators=200, random_state=42)



By using RandomizedSearchCV, we were able to improve our model. Last but not least, we evaluated the model with the best estimator on our test set, obtaining an accuracy score of 0.7715099155703908 (roughly 77%).

So, what does this mean? The value of ‘final_model_accuracy’ above is the percentage of correctly predicted labels. There is always room for improvement, such as collecting more data, removing potential outliers, and/or changing the number of features the model takes in. For now, however, we can be satisfied with the result.

Modeling (Race) — Summary:

For this model our target was race, which takes the following seven classes:

  • Asian
  • Black, non-Hispanic
  • Hispanic
  • Native American
  • Other
  • Unknown
  • White, non-Hispanic

We started feature engineering by reusing our cleaned source data. As in the first model, most of the features are categorical, so we converted them to numerical values. The following columns were converted to binary numbers:

  • body_camera: False = 0, True = 1.
  • gender: Male = 1, Female = 0.
  • signs_of_mental_illness: False = 0, True = 1.
  • manner_of_death: Shot = 0, Shot and Tasered = 1.

Afterward, the remaining multi-class columns were again one-hot encoded using the pandas get_dummies method to avoid introducing any biases for the machine.

Next, the following columns were dropped, since they did not help our modeling:

  • id
  • name
  • date
  • city
  • state
  • longitude
  • latitude

We also had to convert the race column to numerical values. In this modeling exercise our target was race, so we mapped each string to a number:

  • {'White, non-Hispanic': 0, 'Unknown': 1, 'Other': 2, 'Native American': 3, 'Hispanic': 4, 'Black, non-Hispanic': 5, 'Asian': 6}

Once we were done massaging the data, we split it into training and test sets, again assigning 80% to training and the remaining 20% to test. After splitting, we scaled the features using StandardScaler().

The algorithms we used for this multi-class classification problem were:

  • OneVsRestClassifier
  • K-Nearest Neighbors
  • Random Forest
  • Neural Network

We made predictions on the training set and evaluated each model with an accuracy score. We also used a classification report to see each model’s overall performance.
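A sketch of that evaluation step, with made-up labels following the race encoding (0 = White, non-Hispanic, ..., 6 = Asian):

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true and predicted labels for six victims.
y_true = [0, 5, 0, 4, 5, 0]
y_pred = [0, 5, 4, 4, 5, 0]

# Overall fraction of correct predictions.
acc = accuracy_score(y_true, y_pred)

# Per-class precision/recall/F1 table; zero_division=0 silences
# warnings for classes with no predicted samples.
report = classification_report(y_true, y_pred, zero_division=0)
```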

OneVsRestClassifier Classification Report:

Image by author

K-Nearest Neighbors Classification Report:

Image by author

Random Forest Classification Report:

Image by author

Neural Network Classification Report:

Image by author

Summary Round 1 — Scores:

Image by author

Based on the first-round results, we concluded that all the models performed poorly. The drop in accuracy under cross-validation suggested over-fitting on an unbalanced dataset. As a result, none of the models were useful for predicting race.

After multiple attempts, we decided to take a closer look at our data. A count plot of shootings per race shows that the data is unbalanced, with high counts for White, non-Hispanic (0) and Black, non-Hispanic (5) and low counts in the other categories, as shown in the race distribution plot:

Image by author

To deal with the unbalanced data we used an algorithm called SMOTE, which let us upsample by augmenting the data with artificial new points. SMOTE synthesizes new samples for the minority classes (1–6) by interpolating between existing ones, increasing their counts and balancing out the dataset. See the plot and the numbers below after upsampling.
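The core idea behind SMOTE can be sketched in plain NumPy: synthesize a new minority-class point by interpolating between a sample and its nearest minority-class neighbor. (Our project used the imbalanced-learn implementation, not this toy version.)

```python
import numpy as np

def smote_point(samples, rng):
    """Create one synthetic point from a 2-D array of minority samples."""
    i = rng.integers(len(samples))
    base = samples[i]
    # Find the nearest other minority sample.
    dists = np.linalg.norm(samples - base, axis=1)
    dists[i] = np.inf  # exclude the sample itself
    neighbor = samples[np.argmin(dists)]
    # Step a random fraction of the way toward that neighbor.
    return base + rng.random() * (neighbor - base)

rng = np.random.default_rng(42)
minority = np.array([[1.0, 1.0], [2.0, 2.0], [1.5, 1.0]])
synthetic = smote_point(minority, rng)
```

Because the synthetic point lies on a segment between two real minority samples, it always stays inside the region the minority class already occupies.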

Image by author

As shown above, the data distribution is balanced, and we reran our models to see if there was any improvement. Please note that SMOTE was only used on the training data, and not the test data. We got the following classification reports:

OneVsRestClassifier Balanced:

Image by author

K-Nearest Neighbors Classification Balanced:

Image by author

Random Forest Balanced:

Image by author

Neural Network Balanced:

Image by author

Summary Round 2 — Scores:

Image by author

From the classification reports and the table above, each model improves with the balanced dataset. Further, the cross-validation scores no longer change as drastically as in round 1.

Applying the best-performing model, OneVsRestClassifier, to our test data, we expected the good numbers in the table above to carry over. However, the test data produced a poor accuracy score of 0.37, which was not what we were hoping for.

We tried different upsampling strategies, such as increasing the minority classes by 0.15, but the model did not show any improvement. We also ran a grid search on the OneVsRestClassifier, with no difference. We therefore concluded that we would need to gather more data in order to build a better model for predicting race.

Another example of misleadingly set expectations appeared in the ‘Answering the Call’ study referred to earlier. The article described white police officers as firing five times more often than black officers in similarly black-populated areas, but on inspection of the data, the widest-apart data points were closer to a four-fold difference, not five. The figure was rounded for ease of writing, and more careful observation shows the fit line at about three times more.

Concerning race, our modeling results were inconclusive as to whether race can be determined or predicted from the other variables involved. As to our research question of whether race played into why a person was shot, our models could not tell, and ultimately this dataset is not large enough, in data points or features, to answer that question fully. This bolsters the earlier point that more data is needed. From this dataset of just 5,700+ points, perhaps race isn’t everything; maybe something else, like social class or income, matters, but unfortunately neither was recorded.

Mental illness was another variable we considered, and the results here were more promising for prediction. We found a strong connection between a victim suffering from mental illness and the other variables involved, and our scores showed this to be a more predictive feature than race. This seems consistent with other researchers’ findings.

In the article “People with Untreated Mental Illness 16 Times More Likely to Be Killed By Law Enforcement”, they say “individuals with untreated severe mental illness are involved in at least 1 in 4 (police encounters) and as many as half of all fatal police shootings”.

Perhaps regional state factors played into fatal shootings. To normalize, we compared US states on a per-100,000-person basis. Alaska had the most, though that is probably still its sparse population skewing the result. Even so, several states in the relative West were the next highest: New Mexico, Oklahoma, Arizona, Colorado, Nevada and Montana. Maybe this reflects sparse density along with a general difference in the psychology of police officers there, a “Wild West” mentality?

Now certainly a more rigorous study should account for other variables and considerations: the full population of the United States (with an eye on racial percentages), the economic levels of the deceased, further connections between mental illness and violent crime, regional studies of police-force resolution practices, and a larger set of interactions with police including (but not limited to) non-lethal, domestic and traffic situations.

Even with this limited look at fatal police force, more than just attention to police departments is necessary.

Seth Stoughton, a former police officer who is a law professor at the University of South Carolina in Columbia said, “I have become convinced that we do not have a race problem in policing. Rather, we have a race problem in society that is reflected in policing.”

However, despite the limited data and all the issues with race in the general populace, police forces are recognizing the need to do something. They can help set the tone for what the general public sees.

“A survey of 47 of the largest US law-enforcement agencies between 2015 and 2017 found that 39% changed their use-of-force policies in 2015–16 and revised their training to incorporate tactics such as de-escalation. Among the agencies surveyed, officer-involved shootings dropped by 21% during the study period.”

These kinds of results are the beginning of the real change that can help our country move into the future with hope. Despite our incomplete and misunderstood data, we can still recognize a very real and active need to help all our fellow Americans feel safe, and follow through on those actions.

Photo by Morvanic Lee on Unsplash
