FYP Progress (Week 5)

Qamarul 'Ashraf

--

In my last discussion with my supervisor, we chose to use a multilayer perceptron (MLP) for predictive analytics. The accuracy of the results can then be compared with research by Kou-Yuan Huang and Kai-Ju Chen, who used an MLP to predict the results of the 2006 World Cup. Unlike their research, this project involves feature selection, which means that the accuracy of the algorithm will be tested with different numbers of features. However, the algorithm is not yet complete. Thus, in order to keep progressing, I decided to divert my attention for a while to statistical analysis, using the Poisson and Skellam distributions.

Statistical Analysis

Unlike the techniques I used for descriptive analytics, the Poisson distribution provides probabilities instead of averages. This is the distribution of goals scored by home and away teams, modelled with a Poisson distribution:

The Poisson distribution that we have now actually consists of two separate distributions, which means that they are not related to each other. Moreover, the numbers we can predict from this are based only on time, which is insufficient since we should include the most important feature, the team names. Regardless, with these numbers, we can calculate the probability of an event occurring, such as the probability of a draw or of the home team winning a match by a one-goal margin:

The probability mass function (PMF) enables us to link two features, which is why we can get the probability of events between two means (μ), that is, the mean goals scored by home and away teams. The calculation of the event will be discussed later. For now, we can use the Skellam distribution, which describes the difference between two independent Poisson variables, to relate these two Poisson distributions.
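To make this concrete, here is a minimal sketch in plain Python of the Poisson PMF and of the Skellam probability for a given goal difference. The means `mu_home` and `mu_away` are placeholder values I picked for illustration, not figures from the dataset, and the helper names are hypothetical. SciPy's `scipy.stats.poisson` and `scipy.stats.skellam` provide the same distributions if you prefer a library.

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def skellam_pmf(d, mu_home, mu_away, max_goals=50):
    """P(home goals - away goals = d) for two independent Poisson variables.

    Computed by summing over the away goal count, truncated at max_goals.
    """
    start = max(0, -d)  # keeps the home goal count (a + d) non-negative
    return sum(
        poisson_pmf(a + d, mu_home) * poisson_pmf(a, mu_away)
        for a in range(start, max_goals + 1)
    )


# Placeholder means (assumed values, not taken from the data).
mu_home, mu_away = 1.5, 1.2

p_draw = skellam_pmf(0, mu_home, mu_away)         # goal difference of 0
p_home_by_one = skellam_pmf(1, mu_home, mu_away)  # home team wins by exactly 1

print(f"P(draw) = {p_draw:.3f}")
print(f"P(home wins by 1) = {p_home_by_one:.3f}")
```

A goal difference of 0 is a draw, a positive difference is a home win, and a negative difference is an away win, so the same function covers every margin.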

After successfully combining these two Poisson distributions, we need to see whether we can include two more features, the home and away team names, because the teams themselves are obviously the reason for the difference in the number of goals. We can test this by combining the PMFs of two different teams.
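One simple way to bring team names in is to estimate a separate mean for each team at home and away. The sketch below is only an illustration of that idea and not necessarily how the charts in this post were produced. The column names `HomeTeam`, `AwayTeam`, `FTHG` and `FTAG` are assumed to follow the football-data.co.uk CSV convention, and the rows are made-up placeholders rather than real results.

```python
from collections import defaultdict
from statistics import mean

# Illustrative rows only: made-up teams and scores, not real results.
matches = [
    {"HomeTeam": "Team A", "AwayTeam": "Team B", "FTHG": 2, "FTAG": 1},
    {"HomeTeam": "Team A", "AwayTeam": "Team C", "FTHG": 3, "FTAG": 0},
    {"HomeTeam": "Team B", "AwayTeam": "Team A", "FTHG": 1, "FTAG": 1},
    {"HomeTeam": "Team C", "AwayTeam": "Team A", "FTHG": 0, "FTAG": 2},
]


def mean_goals(rows):
    """Return (home_means, away_means): mean goals scored per team, by venue."""
    home, away = defaultdict(list), defaultdict(list)
    for row in rows:
        home[row["HomeTeam"]].append(row["FTHG"])  # goals scored at home
        away[row["AwayTeam"]].append(row["FTAG"])  # goals scored away
    home_means = {team: mean(goals) for team, goals in home.items()}
    away_means = {team: mean(goals) for team, goals in away.items()}
    return home_means, away_means


home_mu, away_mu = mean_goals(matches)
for team in sorted(home_mu):
    print(team, "home mean:", home_mu[team], "away mean:", away_mu[team])
```

Each of these per-team means can then serve as the μ of that team's own Poisson distribution.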

The bar chart is a little messy, but fortunately we can still see the Poisson distribution. Red represents Liverpool, while blue represents Chelsea. The upper distribution shows them as home teams against every opponent in the English Premier League (EPL) 2019/20, while the lower one shows them as away teams. With this done, we can continue to extract more statistical information.

From what is gathered here, a lot of different information can be extracted, such as the number of goals likely to be scored in a match:

Based on these probabilities, a match between Chelsea (home team) and Liverpool (away team) is most likely to have only one goal. In addition, we can calculate the win-lose-draw probabilities up to a defined number of goals:

The matrix represents the two teams we used: the rows are home team goals and the columns are away team goals, both in the same order [0, 1, 2, 3]. The lower triangle of the array holds the probabilities of the home team winning the match, the upper triangle holds the probabilities of the away team winning, and the diagonal holds the probabilities of draws. With a defined maximum number of goals, we can also find the win-draw-lose probabilities between any two teams:

As we can see, Liverpool are expected to win a match against Chelsea. This is plausible because Liverpool were in fact the champions of the 2019/20 season, so we can say that this information is quite believable.
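The scoreline matrix and the win-draw-lose sums described above can be sketched as follows, assuming the two teams' goal counts are independent Poisson variables. The means used here are placeholders I chose for illustration, not the fitted Chelsea and Liverpool values, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def score_matrix(mu_home, mu_away, max_goals=3):
    """Rows are home goals, columns are away goals, both 0..max_goals."""
    return [
        [poisson_pmf(h, mu_home) * poisson_pmf(a, mu_away)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]


def outcome_probs(matrix):
    """Return (home win, draw, away win) from the lower triangle,
    the diagonal and the upper triangle respectively."""
    home = draw = away = 0.0
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            if h > a:
                home += p   # home scored more: lower triangle
            elif h == a:
                draw += p   # equal scores: diagonal
            else:
                away += p   # away scored more: upper triangle
    return home, draw, away


# Placeholder means (assumed values, not the fitted ones).
mu_home, mu_away = 1.4, 1.9

matrix = score_matrix(mu_home, mu_away)
home, draw, away = outcome_probs(matrix)
print(f"home {home:.3f}, draw {draw:.3f}, away {away:.3f}")
```

Because the matrix stops at a maximum number of goals, the three probabilities add up to slightly less than 1. Raising `max_goals` shrinks that gap.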

The techniques applied above can also be applied to the current season. A dataset from football-data.co.uk provides all the required information, and the best thing is that the dataset is updated weekly.

This is the dataframe for the current EPL season, 2020/21. The results are not very reliable because we are only at game week 7 (7 games played by every EPL team), which means that we might not have enough data to process. Still, that does not stop us from extracting similar information.

Aston Villa and Brighton are set to meet in game week 8, which means that we can use the data we have to predict the result of their next match. Aston Villa have performed well this season, having beaten the champions 7–2, and perhaps that is why our model gives them a high probability of winning the next one. We can also compare this to existing betting markets.

Source: https://www.oddsportal.com/soccer/england/premier-league/aston-villa-brighton-6DavxcEF/
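To compare model probabilities with a betting market, both need to be on the same scale. The sketch below assumes the market quotes decimal odds and converts in both directions. All numbers are hypothetical placeholders, not the model's output or the odds from the page above, and the function names are my own.

```python
def fair_decimal_odds(probability):
    """Decimal odds that correspond to a probability, with no bookmaker margin."""
    return 1.0 / probability


def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities, normalised so they sum to 1.

    The raw inverses usually sum to more than 1 because of the bookmaker's
    margin, so each one is divided by their total.
    """
    raw = [1.0 / odds for odds in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical values for illustration only.
model_probs = {"home": 0.50, "draw": 0.25, "away": 0.25}
market_odds = {"home": 2.10, "draw": 3.50, "away": 3.60}

market_probs = dict(zip(market_odds, implied_probabilities(list(market_odds.values()))))
for outcome in model_probs:
    print(
        outcome,
        f"model odds {fair_decimal_odds(model_probs[outcome]):.2f}",
        f"market odds {market_odds[outcome]:.2f}",
        f"market probability {market_probs[outcome]:.3f}",
    )
```

Dividing by the total is only one simple way to remove the margin. Other normalisation methods exist and can give slightly different probabilities.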

And that marks the end of this project progress update. Thank you for reading.
