Cyndi Kohashi
- UFC Historical Analysis -
The UFC (Ultimate Fighting Championship) is a mixed martial arts organization that was founded in 1993.
As someone with no knowledge of the sport, I chose to analyze match data to learn about the UFC and find what contributes to a win or loss.
Objective
To learn more about the UFC, what factors contribute to winning (age, height, weight, reach, etc.), and the most common way to win.
Data
- UFC-Historical Data from 1993-2021 by Rajeev Warrier via Kaggle
- Data includes fights from 1994-2021 and has fighter statistics
- World-countries.json by Kostya via Kaggle
- Project brief
Skills
- Sourcing open data
- Data cleaning, wrangling, and subsetting
- Performing exploratory visual analysis
- Creating geographical visualizations
- Supervised machine learning with linear regression
- Unsupervised machine learning with k-means clustering
- Sourcing and analyzing time-series data
- Creating a data dashboard
Tools
- Microsoft Excel
- Anaconda
- Jupyter Notebook
- Python
- Python libraries Pandas, NumPy, Seaborn, Matplotlib, Folium, and scikit-learn
- Tableau
Overview of the Fights
While the number of fights per year has increased since the UFC started, growth didn't really accelerate until 2004.
There were 456 fights in 2019, compared to the humble 26 fights in 1994.
The forecast indicates a decrease in fights in the future, but I believe this is influenced by the drop in fights during the COVID-19 pandemic.
Wins by Corner
Fighters in the Red corner account for 66.18% of all wins. The “favored” fighter is usually assigned the Red corner so that they enter the cage second, which adds to the entertainment value of the match.
I thought this might’ve been a factor in winning, as a fighter can be affected by this psychologically. Information on mental states, however, isn't included in this dataset.
The "favored" fighter can also be considered the more experienced fighter, and experience may be a factor in winning.
Wins by Type
The most common type of win is by Decision-Unanimous, making up 34.88% of all wins.
The potential for injuries is high in the UFC, but only 1.33% of all fights end in a doctor stoppage, where a doctor decides that a fighter cannot safely continue.
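Percentages like the win-type shares above (and the corner shares earlier) are simple category proportions. As a minimal sketch, assuming each fight carries one win-type label, they could be computed as below. The helper name `category_shares` and the toy labels are hypothetical and are not the project's data; in a pandas workflow, `Series.value_counts(normalize=True)` would give the same proportions.

```python
from collections import Counter


def category_shares(labels):
    """Return {category: percentage of all labels}, rounded to 2 decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: round(100 * n / total, 2) for category, n in counts.items()}


# Made-up toy labels for illustration only (NOT the project's data).
toy_win_types = [
    "Decision - Unanimous",
    "KO/TKO",
    "Submission",
    "Decision - Unanimous",
    "KO/TKO",
    "Decision - Unanimous",
    "Decision - Split",
    "TKO - Doctor's Stoppage",
]

shares = category_shares(toy_win_types)

# Print the categories from most to least common.
for category, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{category}: {pct}%")
```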
Linear Regression Analysis
When looking at the relationship between wins and other characteristics, the strongest correlation was between wins and total number of rounds fought. This makes sense as the more a person fights, the more chances they have to win and gain experience.
Based on this, I formed the following hypothesis to test: If a fighter is in more rounds, they will have more wins.
The analysis shows that linear regression is a fairly good fit for these variables: 87% of the variance in wins can be explained by the number of rounds fought.
This supports the hypothesis that fighting more rounds is related to having more wins, but many data points still fall far from the trend line.
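The "variance explained" figure is the R² of the fitted line. The following is a minimal sketch of how a straight-line fit and its R² can be computed. It uses NumPy's `polyfit` rather than whatever the project notebook used, and the function name `fit_line_with_r2` and the toy numbers are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_line_with_r2(x, y):
    """Least-squares line y = slope * x + intercept, plus its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)   # variation left unexplained by the line
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up toy values for illustration only (NOT the project's data).
toy_rounds_fought = [3, 8, 15, 22, 30, 41, 55]
toy_wins = [1, 2, 5, 6, 10, 12, 18]

slope, intercept, r2 = fit_line_with_r2(toy_rounds_fought, toy_wins)
print(f"wins = {slope:.3f} * rounds + {intercept:.3f}, R^2 = {r2:.3f}")
```

An R² near 1 means most points sit close to the line. A value like the 87% reported above still leaves room for individual fighters to fall well off the trend.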
K-means Cluster Analysis
The next test was k-means clustering, again looking at wins and total rounds fought. The clusters were not divided vertically or horizontally; instead, they appeared stacked on top of each other.
There didn’t seem to be one factor that influenced the clusters. This is better illustrated by the clusters’ descriptive statistics.
Cluster 1 in both the Blue and Red corners has the highest average rounds, while Cluster 3 has the lowest average rounds but the highest average wins and the highest average age. This suggests that the group with fewer rounds has more wins, which contradicts the hypothesis.
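To show how cluster labels and per-cluster averages like these can be produced, here is a minimal sketch of k-means (Lloyd's algorithm) written directly in NumPy. It is not the project's code, which may well have used scikit-learn's `KMeans`. The function name `kmeans`, the seed, and the toy (rounds, wins) pairs are hypothetical, and the features are left unscaled for simplicity.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if it has none).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


# Made-up toy rows of (total rounds fought, wins); NOT the project's data.
toy_fighters = np.array([
    [2, 1], [4, 2], [5, 1], [6, 3],
    [20, 8], [22, 9], [25, 7], [27, 10],
    [50, 15], [55, 18], [60, 16], [58, 20],
], dtype=float)

labels, centroids = kmeans(toy_fighters, k=3)

# Descriptive statistics per cluster: size, mean rounds, mean wins.
for j in range(3):
    members = toy_fighters[labels == j]
    if len(members):
        mean_rounds, mean_wins = members.mean(axis=0)
        print(f"cluster {j}: n={len(members)}, "
              f"mean rounds={mean_rounds:.1f}, mean wins={mean_wins:.1f}")
```

Comparing the per-cluster means side by side is what reveals patterns like the one described above, where the cluster with the fewest rounds is not necessarily the one with the fewest wins.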
UFC Fights Across the World From 1994-2021
While fighting in more rounds may contribute to more wins, this project has shown me that wins are multifaceted and there’s no way to guarantee a prediction. Humans are unpredictable, and UFC fighters should be celebrated as the sport continues to grow and thrive.
I was not a UFC fan, or even much of a sports fan, before this analysis project, but I'm glad to have learned about something new and surprisingly complex.
While the results of the analysis may have been predictable, I was able to use machine learning techniques that were new to me. I also gained experience picking my own data and developing my own questions and project objectives.
Donald Cerrone
Highest total wins at 23, and the most matches at 36
Charles Oliveira
Most submission wins at 14, and most Blue corner wins at 11
Derrick Lewis
Most KO/TKO wins at 12, which is 75% of his total wins
Anderson Silva
Longest win streak with 16 wins
Ron van Clief
Oldest fighter at 51, with 1 match in 1994
Randy Couture
Second oldest fighter at 47, with 16 wins
Jessica Andrade
11 wins, the most across all the women’s weight classes
Jim Miller
Most matches at 36, with 21 wins
Our data ends in March 2021. Many of these top records may have been surpassed since then.
A good example of this is Jim Miller, who now has the most fights at 46 and the most wins at 26 (as of January 2024).
Recommendations & Next Steps
Expanding Globally
The UFC should continue to host fights internationally to grow its fanbase and recruit new fighters.
Care for Injuries
Because of the nature of the sport, injuries can be severe. To increase career length and fighter wellbeing, the full effects of injuries should be researched.
Viewer Exploration
This data didn’t have viewership or fan metrics, but I would be interested in analyzing that data to look at how to increase the fanbase and interactivity.
Future Data
It would be interesting to see fight data from after March 2021, when this dataset ends. Because of the structure of the data, it may be more suited to a different machine learning model.
© 2019