How and why I built a Machine Learning model to predict table tennis match results
I am a Data Professional who loves building data products to solve problems. I am currently working with professionals from various backgrounds to provide new analytical insights in industry. I would love to combine this with my passion for open data and keep contributing to change people's lives in a better, more analytical world.
The problem I wanted to solve
A customer reached out to me for help building a profitable machine learning model to predict table tennis match results based on historical data. After starting the project I noticed that the challenge was bigger than expected, because the data provided, which had been collected earlier using web scraping, was not reliable enough to train a good model.
What is a Machine Learning model to predict sports results?
Given that, I suggested splitting the project into three main sprints:
- Collect the data again, but this time using a reliable API.
- Redo all the data transformation and cleaning.
- Deploy a reliable prescriptive model and automate it.
First of all, I chose Python as the language for the project, since Python offers many libraries and plenty of documentation to help with any challenges during this milestone.
I developed the project in Google Colab because it let me share and explain each step to my customer, giving him full transparency.
The main libraries used were:
For data visualization:
The process of building a Machine Learning model to predict sports results
For the data collection process I used the requests library, requesting the data needed based on the IDs of the leagues that my customer wanted to work with. These API calls returned, as expected, JSON data that I easily converted to a tabular format using Pandas.
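The collection step described above could look roughly like the sketch below. It is only an illustration: the endpoint URL, the `league_id` query parameter, the `results` key and the field names in the sample records are all assumptions, since the article does not name the API. `requests.get` and `pandas.json_normalize` are the actual library calls.

```python
import pandas as pd
import requests

# Hypothetical endpoint: the real API used in the project is not named.
API_URL = "https://api.example.com/v1/matches"


def fetch_league_matches(league_id, session=None, timeout=10):
    """Request the matches of one league and return the raw JSON records."""
    http = session or requests
    response = http.get(API_URL, params={"league_id": league_id}, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()["results"]  # "results" is an assumed key


def matches_to_frame(records):
    """Flatten a list of (possibly nested) JSON records into a table."""
    # sep="_" turns {"home": {"name": ...}} into a "home_name" column.
    return pd.json_normalize(records, sep="_")


# Invented toy records with the shape an API like this might return.
sample = [
    {"id": 1, "league_id": 10, "home": {"name": "Player A"},
     "away": {"name": "Player B"}, "score": "3-1"},
    {"id": 2, "league_id": 10, "home": {"name": "Player C"},
     "away": {"name": "Player A"}, "score": "2-3"},
]

frame = matches_to_frame(sample)
print(frame)
```

In a real run, the records returned by `fetch_league_matches` for each league ID would be passed to `matches_to_frame` instead of the toy `sample` list.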
By the end of the data collection I had over 200 thousand good-quality records to develop our predictive model for table tennis match results.
The problem was solved with a binary classification machine learning model, since each game has only two possible outcomes for each player: win or loss.
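One way to picture the binary target is to expand each match into one row per player, labelled 1 for a win and 0 for a loss. This is a minimal sketch and not necessarily how the project encoded its data; the field names `player_a`, `player_b` and `winner` are hypothetical.

```python
def to_player_rows(match):
    """Expand one match into two rows, one from each player's point of view."""
    a, b, winner = match["player_a"], match["player_b"], match["winner"]
    if winner not in (a, b):
        raise ValueError(f"winner {winner!r} did not play in this match")
    return [
        {"player": a, "opponent": b, "won": int(winner == a)},
        {"player": b, "opponent": a, "won": int(winner == b)},
    ]


# Invented example match.
rows = to_player_rows(
    {"player_a": "Player A", "player_b": "Player B", "winner": "Player B"}
)
for row in rows:
    print(row)
```

Exactly one of the two rows carries the label 1, which is what makes the task a two-class problem.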
Challenges I faced
The main challenge in this problem was not having good-quality data at the beginning, and it took me some days to realize that. Another interesting challenge was that I couldn't evaluate the model as a usual classification problem, using the main metrics such as accuracy, ROC curve, precision and recall.
The main metric that indicated the success of the model was ROI (Return on Investment) over the long term.
However, it helped me understand that some projects can't be solved using the well-known metrics most commonly used in machine learning problems.
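To illustrate why ROI can disagree with accuracy, here is a small sketch. It assumes decimal betting odds and a stake placed on each prediction, neither of which is detailed in this article; the `Bet` class and all the numbers are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    stake: float  # amount wagered on the predicted player
    odds: float   # decimal odds for that player (assumed format)
    won: bool     # whether the prediction turned out to be correct


def roi(bets):
    """Return on investment: net profit divided by the total amount staked."""
    total_staked = sum(b.stake for b in bets)
    if total_staked == 0:
        raise ValueError("no money was staked")
    profit = sum(b.stake * (b.odds - 1) if b.won else -b.stake for b in bets)
    return profit / total_staked


def accuracy(bets):
    """Share of predictions that were correct."""
    return sum(b.won for b in bets) / len(bets)


# Always backing heavy favourites: usually right, small payouts.
favourites = [Bet(1, 1.2, True), Bet(1, 1.2, True), Bet(1, 1.2, False)]
# Backing underdogs: usually wrong, large payout when right.
underdogs = [Bet(1, 3.5, True), Bet(1, 3.5, False), Bet(1, 3.5, False)]

for name, bets in [("favourites", favourites), ("underdogs", underdogs)]:
    print(f"{name}: accuracy={accuracy(bets):.2f}, roi={roi(bets):+.2f}")
```

In this toy case the favourites strategy is right two times out of three yet loses about 20% of the money staked, while the underdogs strategy is right only once in three and still returns about 17%. A model tuned purely for accuracy could therefore look good and still be unprofitable.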
Tips and advice
This project also helped me understand all the steps of a data science project, from data collection to model deployment and customer support. This is very important for any data professional who wants a good overall view of a data science project. You should consider solving a relevant problem end to end: not only building the model, but also working on the data collection, fully immersing yourself in the problem and the customer's needs, and treating and transforming the data so that you have good results to present.
Final thoughts and next steps
Now I'm exploring the world of image classification by taking the Deep Learning Specialization by Andrew Ng on Coursera, which I highly recommend. I expect to use these new skills to provide deep learning solutions to my customers.