Kernel Density Plots – Algobeans

[ad_1]


How do we make sense of the shape and distribution of data in a 2D space? In this two-part series, we look at how kernel density plots can be used to visualize football shots data, and how a random forest predictor can help us to predict the probability of a goal based on where the shot was taken on the field.

Association football (or soccer) is the world’s most popular sport. Two teams of 11 players compete to shoot the ball in the opposing goal, and the objective of the game is to outscore the opponent. Players may use any parts of their bodies except their arms to play the ball (goalkeepers may use their arms within the penalty area). Each game lasts 90 minutes, and each team typically takes 5 to 20 shots at the goal (“shots”), scoring a fraction of them.

Using data from Wyscout, a company that compiles match videos and tags match events, we depict all shots made by Liverpool Football Club in the 2017-18 English Premier League (38 games, 638 shots).

Scatterplot depicting all shots made by Liverpool in the 2017-18 English Premier League (638 shots). Each shot is represented by a grey dot, which denotes the position in the field in which the shot was taken towards the goal on the right.

From the scatterplot, we can see that most shots were taken within the penalty area, where players have the highest chance of scoring. In fact, the chances of scoring is affected by angle and distance from goal—we will get to that in part two of this series. Nonetheless, there seem to be no other observable characteristics or pattern about this data.

How can we then make sense of the shape and distribution of these shots?

We could perhaps differentiate the shots based on which players made them; this may be useful if we wish to analyze the performance of individual players. For example, in Liverpool’s case, we may depict the shots made by their top three attackers:

Scatterplot depicting all shots made by Liverpool in the 2017-18 English Premier League (638 shots), with the three players having the most number of shots highlighted: Mohamed Salah (144 shots), Roberto Firmino (84 shots), Sadio Mane (70 shots).

While individual visual representations are helpful, how can we summarize these insights to enable comparisons with other teams?

A kernel density plot is a good way to achieve this.

Kernel density plot of all shots made by Liverpool in the 2017-18 English Premier League (638 shots). The plot is divided into ten areas, with each area containing about a tenth of all shots. The red-most area is where shots are most concentrated.

Kernel density plots reveal several insights:

First, we can identify the centers of data distribution, i.e. areas where players made the most shots. In the plot above, we see that there are 3 centers, one outside the penalty box, two inside the penalty box to the left and right sides of the goal.

Second, we can identify the concentration of data points from the centers, i.e. how close or far apart the data points are. The plot above is divided into ten areas, with each area containing about a tenth of all shots. Areas (or lines) that are closer together depict a concentrated number of shots made within those areas. We can see that shots tend to be concentrated inside the box, and more spread out outside the box.

In fact, the centers and concentration of this kernel density plot correspond directly to the shot patterns of the three Liverpool players depicted in the earlier scatterplot:

  • Firmino, who makes infrequent shots but tends to shoot more from outside the box as compared to other attackers, is represented by the lighter colored center outside the box.
  • Mane, who mostly shoots inside the box at the left of goal, is represented by the dark-colored small center; and
  • Salah, who mostly shoots inside the box at the right of goal and who made twice as many shots as Mane, is represented by the dark-colored large center.

Think of kernel density plots as smoothed histograms. Going back to our analysis of Liverpool’s soccer play, we summarize the number of shots they made across 38 games in the 2017-18 Premier League season:

Histogram and kernel density plot of Liverpool’s shots across 38 games in the 2017-18 Premier League season.

To visualize the underlying goal probability distribution, this histogram can be smoothed into a kernel density plot. This is done using kernels, which transform each data point into curves, before summing these individual curves to produce an overall probability density plot. Typically, a Gaussian kernel is used (i.e. Gaussian bell curve).

One advantage of kernel density plots over histograms is how two density plots can easily be compared. For example, the figure below compares Liverpool and Manchester City—we can see that the latter had a shot distribution with a higher average and a lower variance:

Comparing shot frequency between Liverpool and Manchester City, across 38 games in the 2017-18 Premier League season.

To plot contoured kernel density plots, we take into account an additional dimension of position:

Renderings of a 3D histogram and contoured kernel density plot of all shots made by Liverpool in the 2017-18 English Premier League.

Hyperparameter tuning. Histograms can be rendered differently depending on the bin width, which specifies how big each bin will be. For example, in our first histogram, we used a bin width of 5, so the first bin holds all data points valued from 7.5 to 12.5. Smaller bin widths may make it difficult to see the overall trend, while larger bin widths may wash out features of the data. In kernel density plots, we must tune a bandwidth hyperparameter that works similarly to bin widths in histograms.

Interested to know how we can use these results to predict the probability of a goal based on where the shot was taken on the field? Follow us to be notified when part 2 of this series is published:

Free Data Science Tutorials

Bonus Graphs

Manchester City’s title-winning campaign was powered by key offensive players S. Agüero, R. Sterling, and K. De Bruyne, and the team made a whopping 665 shots in 38 games, scoring 106 of them. This set a record for most goals scored by a team in a single season, which still stands today.

Manchester United ranked 6th in total number of shots made in the season. Nonetheless, their shots tended to be closer to goal (the highest proportion of shots within the six-yard box among top six teams) and tended to convert into goals more often (2nd highest shots-to-goal ratio among top six teams), which drove them to runners-up of the 2017-18 Premier League.

Of Tottenham Hotspur’s trifecta, H. Kane was the most influential, making 184 shots compared to H. M. Son (75), and C. Eriksen (97), as represented by the over-sized centre in their offensive plot. H. Kane will eventually win the Golden Boot for most goals at the 2018 FIFA World Cup, the most-viewed competition in the world.

On top of Liverpool’s powerful offensive capabilities, defender V. van Dijk’s arrival to Liverpool in Jan 2018 for a word-record £75m also marked a key point for the club. With his presence as the central-left defender, shots made against Liverpool unsurprisingly came more from the right (as depicted). In the subsequent season, Van Dijk eventually won several Player of the Year awards.

While Chelsea managed 606 shots in the 2017-18 campaign, they scored only 62 of them (winners Manchester City made 665 shots and scored 106 goals). Striker A. Morata was rather ineffective, scoring only 11 goals through the season.

Data Source

Pappalardo et al., (2019) A public data set of spatio-temporal match events in soccer competitions, Nature Scientific Data 6:236, https://www.nature.com/articles/s41597-019-0247-7

We used data from the 2017/2018 season of five national first division football competitions in Europe: England, France, Germany, Italy, and Spain. Over 1,800 matches, 3 million events, and 4,000 players were analyzed.

Data was first collected in video format before events (e.g. shots, goals) were being tagged by operators (e.g. position of shot, player who made the shot) through a proprietary software.

Copyright © 2015-Present Algobeans.com. All rights reserved. Be a cool bean.

Read More …

[ad_2]


Write a comment