martj42 / international_results

Home Page: https://www.kaggle.com/martj42/international-football-results-from-1872-to-2017

License: Creative Commons Zero v1.0 Universal

soccer football football-data soccer-data

international_results's Introduction

Context

Well, what happened was that I was looking for a semi-definitive, easy-to-read list of international football matches and couldn't find anything decent. So I took it upon myself to collect it for my own use. I might as well share it.

Content

This dataset includes 46,673 results of international football matches starting from the very first official match in 1872 up to 2024. The matches range from FIFA World Cup to FIFI Wild Cup to regular friendly matches. The matches are strictly men's full internationals and the data does not include Olympic Games or matches where at least one of the teams was the nation's B-team, U-23 or a league select team.

results.csv includes the following columns:

  • date - date of the match
  • home_team - the name of the home team
  • away_team - the name of the away team
  • home_score - full-time home team score including extra time, not including penalty-shootouts
  • away_score - full-time away team score including extra time, not including penalty-shootouts
  • tournament - the name of the tournament
  • city - the name of the city/town/administrative unit where the match was played
  • country - the name of the country where the match was played
  • neutral - TRUE/FALSE column indicating whether the match was played at a neutral venue
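
As a minimal sketch of reading this layout with the Python standard library, the snippet below parses rows into typed records. Only the column names come from the list above; the `Match` class, the `load_results` helper and the sample rows are made up for illustration, and the sketch assumes every row carries numeric scores.

```python
import csv
import datetime
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One row of results.csv (hypothetical helper type)."""
    date: datetime.date
    home_team: str
    away_team: str
    home_score: int
    away_score: int
    tournament: str
    city: str
    country: str
    neutral: bool


def load_results(handle):
    """Parse rows that use the results.csv columns into Match records."""
    matches = []
    for row in csv.DictReader(handle):
        matches.append(
            Match(
                date=datetime.date.fromisoformat(row["date"]),
                home_team=row["home_team"],
                away_team=row["away_team"],
                home_score=int(row["home_score"]),
                away_score=int(row["away_score"]),
                tournament=row["tournament"],
                city=row["city"],
                country=row["country"],
                # The column stores the strings TRUE/FALSE, not booleans.
                neutral=row["neutral"].strip().upper() == "TRUE",
            )
        )
    return matches


# Made-up sample rows, only to show the layout.
SAMPLE = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
2001-01-01,Team A,Team B,2,1,Friendly,City A,Country A,FALSE
2001-01-08,Team B,Team C,0,0,Friendly,City A,Country A,TRUE
"""

matches = load_results(io.StringIO(SAMPLE))
```

For the real file, the same helper could be called with `open("results.csv", newline="", encoding="utf-8")` in place of the in-memory sample.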

shootouts.csv includes the following columns:

  • date - date of the match
  • home_team - the name of the home team
  • away_team - the name of the away team
  • winner - winner of the penalty-shootout
  • first_shooter - the team that went first in the shootout

goalscorers.csv includes the following columns:

  • date - date of the match
  • home_team - the name of the home team
  • away_team - the name of the away team
  • team - name of the team scoring the goal
  • scorer - name of the player scoring the goal
  • own_goal - whether the goal was an own-goal
  • penalty - whether the goal was a penalty

Note on team and country names: For home and away teams the current name of the team has been used. For example, when in 1882 a team who called themselves Ireland played against England, in this dataset, it is called Northern Ireland because the current team of Northern Ireland is the successor of the 1882 Ireland team. This is done so it is easier to track the history and statistics of teams.

For country names, the name of the country at the time of the match is used. So when Ghana played in Accra, Gold Coast in the 1950s, even though the names of the home team and the country don't match, it was a home match for Ghana. This is indicated by the neutral column, which says FALSE for those matches, meaning it was not at a neutral venue.

Acknowledgements

The data is gathered from several sources including but not limited to Wikipedia, rsssf.com, and individual football associations' websites.

Inspiration

Some directions to take when exploring the data:

  • Who is the best team of all time
  • Which teams dominated different eras of football
  • What trends have there been in international football throughout the ages - home advantage, total goals scored, distribution of teams' strength etc
  • Can we say anything about geopolitics from football fixtures - how has the number of countries changed, which teams like to play each other
  • Which countries host the most matches in which they themselves are not participating
  • How much, if at all, does hosting a major tournament help a country's chances in the tournament
  • Which teams are the most active in playing friendlies and friendly tournaments - does it help or hurt them
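
As one possible starting point for the home-advantage question, here is a rough sketch that tallies outcomes for matches not played at a neutral venue. The `home_record` function and the sample rows are illustrative assumptions; rows are dictionaries keyed by the results.csv column names. Because scores exclude penalty shootouts, a match decided on penalties counts as a draw here.

```python
from collections import Counter


def home_record(rows):
    """Count home wins, draws and away wins, skipping neutral venues."""
    tally = Counter()
    for row in rows:
        if row["neutral"].strip().upper() == "TRUE":
            continue  # no real home side at a neutral venue
        home, away = int(row["home_score"]), int(row["away_score"])
        if home > away:
            tally["home_win"] += 1
        elif home < away:
            tally["away_win"] += 1
        else:
            tally["draw"] += 1
    return tally


# Made-up rows for illustration only.
rows = [
    {"home_score": "2", "away_score": "1", "neutral": "FALSE"},
    {"home_score": "0", "away_score": "0", "neutral": "FALSE"},
    {"home_score": "1", "away_score": "3", "neutral": "FALSE"},
    {"home_score": "4", "away_score": "0", "neutral": "TRUE"},
]

record = home_record(rows)
home_win_share = record["home_win"] / sum(record.values())
```

Grouping the same tally by decade or by tournament would be the natural next step for looking at trends.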

The world's your oyster, my friend.

Contribute

If you notice a mistake or the results are not updated fast enough for your liking, you can fix that by submitting a pull request.

international_results's Issues

One country with two names

In the city column, Teheran and Tehran both refer to the same city. Pick one spelling, either Teheran or Tehran, and use it consistently.

Ponder about awarded results

Some results have been awarded by tournament organizers rather than decided in an actual football match. There are probably 10-20 of these in the database. The score is normally 3-0 and there aren't any goalscorers listed - that's the easiest way to find them.

I wonder if it's appropriate to have them listed next to real results. If the purpose is historic record-keeping, then yes, if it's to calculate teams' strengths, then probably not.

Maybe move them to a separate file? Maybe one day it'll be a good idea to add a comments column?
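
The heuristic described above (a 3-0 score with no goalscorers listed) could be sketched roughly as follows. The function names and sample rows are made up for illustration, and the sketch assumes goalscorer rows are matched to results on date, home_team and away_team. It only yields candidates: a 3-0 match from a tournament whose goalscorers have not been entered yet would be flagged too.

```python
def match_key(row):
    """Columns shared by results.csv and goalscorers.csv."""
    return (row["date"], row["home_team"], row["away_team"])


def awarded_candidates(results, goalscorers):
    """Return 3-0 (or 0-3) results that have no goalscorer rows."""
    with_scorers = {match_key(g) for g in goalscorers}
    candidates = []
    for r in results:
        score = {int(r["home_score"]), int(r["away_score"])}
        if score == {0, 3} and match_key(r) not in with_scorers:
            candidates.append(r)
    return candidates


# Made-up rows for illustration only.
results = [
    {"date": "2001-01-01", "home_team": "Team A", "away_team": "Team B",
     "home_score": "3", "away_score": "0"},
    {"date": "2001-01-02", "home_team": "Team C", "away_team": "Team D",
     "home_score": "3", "away_score": "0"},
    {"date": "2001-01-03", "home_team": "Team E", "away_team": "Team F",
     "home_score": "1", "away_score": "0"},
]
goalscorers = [
    {"date": "2001-01-02", "home_team": "Team C", "away_team": "Team D",
     "team": "Team C", "scorer": "Player X"},
]

candidates = awarded_candidates(results, goalscorers)
```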

Check results vs federation websites

If anyone would like to help with the historic accuracy of the dataset, which would be much appreciated, here's a way to do it:

  1. Find a country that you like
  2. Hope that their football association website keeps good records online
  3. Check this database against theirs
  4. Find the differences - these may be extra matches in either database, different results, or different match locations
  5. Post the differences here or in a pull request
  6. Peace ✌

Note: A difference doesn't automatically mean the 'official' source is correct. They might count things like Olympic Games results or results against select or B-teams. On the other hand, they might not count matches that took place outside the specific FA's jurisdiction: for example, Argentina had, for a while, two football federations. The one that won out doesn't count the other's results, but I do. Or the England FA took longer to restart after WW2 than the England football team did, so there are a bunch of matches they don't count.

Add managers

Shouldn't be too hard. Just go through it team-by-team.

Penalty shootout without related game?

The shootouts file has the game between Saare County and the Åland Islands on 2011-06-29. However, results.csv does not contain a game between these two teams on that date.

Was this just a stand-alone shootout?
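
A check for this kind of mismatch can be sketched as an anti-join on the columns the two files share. The `orphan_shootouts` name and the sample rows are hypothetical. A reported orphan may also just mean the date differs or the home and away sides are swapped between the files, which this sketch does not try to detect.

```python
def orphan_shootouts(shootouts, results):
    """Return shootout rows with no result on the same date and teams."""
    played = {(r["date"], r["home_team"], r["away_team"]) for r in results}
    return [
        s for s in shootouts
        if (s["date"], s["home_team"], s["away_team"]) not in played
    ]


# Made-up rows for illustration only.
results = [
    {"date": "2001-01-01", "home_team": "Team A", "away_team": "Team B"},
]
shootouts = [
    {"date": "2001-01-01", "home_team": "Team A", "away_team": "Team B",
     "winner": "Team A"},
    {"date": "2001-02-01", "home_team": "Team C", "away_team": "Team D",
     "winner": "Team D"},
]

orphans = orphan_shootouts(shootouts, results)
```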

Clarify the first date

In the README it says

This dataset includes 44,066 results of international football matches starting from the very first official match in 1972 up to 2019.

The data in results.csv looks like it starts from 1872. Please can you clarify whether the README is wrong, or if there's been an error with the century part of the dates in the data file.

Last World Cup Wrong Results

Hi! Great resource. I noticed, though, that the pairings in the final stages of the 2018 World Cup are incorrect. For example, France and Croatia were in the final. Thanks!

Should I include Olympic Games football?

Oh, I don't know.

How can I easily separate them from "real" football? Just filtering the tournament "Olympic Games" out is easy enough but I'm guessing they play pre-tournament friendlies too. And, would a random person using the data know to filter these games out? They wouldn't.

Should I include them anyway if I don't even consider it "real" football?

Standardize city names

Currently, some cities exist under different spellings in the dataset e.g. Kiev and Kyïv or Cádiz and Cadiz. One needs to be picked. Possibly whichever one English Wikipedia uses. As a reminder to myself - the preferred spelling for the capital of Ukraine seems to be Kyiv.
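
One way to surface some of these pairs is to group city names by an accent-folded key, sketched below with the standard `unicodedata` module. The `fold` and `spelling_variants` helpers are made up for illustration. Folding catches Cádiz/Cadiz, but a transliteration difference such as Kiev/Kyïv yields two different keys, so pairs like that would still need a hand-maintained alias list.

```python
import unicodedata
from collections import defaultdict


def fold(name):
    """Strip diacritics and case so that 'Cádiz' and 'Cadiz' collide."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def spelling_variants(cities):
    """Map each folded key to its spellings, keeping keys with more than one."""
    groups = defaultdict(set)
    for city in cities:
        groups[fold(city)].add(city)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# City spellings taken from the examples above, plus one unaffected name.
cities = ["Kiev", "Kyïv", "Cádiz", "Cadiz", "London"]
variants = spelling_variants(cities)
```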

Think through non-FIFA football

Nothing wrong with doing some thinking.

So far I've added the following non-FIFA competitions:

  • Island Games
  • ConIFA World Football Cup
  • Viva World Cup
  • ConIFA European Football Cup
  • FIFI Wild Cup
  • ELF Cup
  • UNPO Cup
  • Coupe de l'Outre-Mer
  • KTFF 50th Anniversary Cup
  • Corsica Football Cup

These are all fine and easily excludable, but once I start adding friendly matches and teams that are actually good, e.g. Catalonia and the Basque Country, it might scare people.

Possibly add a column indicating FIFA/non-FIFA?
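
Until such a column exists, excluding the competitions listed above by name could look roughly like the sketch below. It assumes the tournament column uses exactly the spellings in that list, which has not been checked against the file; the `fifa_only` helper and the sample rows are hypothetical. It does nothing about non-FIFA friendlies, which is the case a dedicated column would solve.

```python
# Names copied from the list above; exact spellings in the CSV are assumed.
NON_FIFA_TOURNAMENTS = {
    "Island Games",
    "ConIFA World Football Cup",
    "Viva World Cup",
    "ConIFA European Football Cup",
    "FIFI Wild Cup",
    "ELF Cup",
    "UNPO Cup",
    "Coupe de l'Outre-Mer",
    "KTFF 50th Anniversary Cup",
    "Corsica Football Cup",
}


def fifa_only(rows):
    """Drop rows whose tournament is one of the listed non-FIFA competitions."""
    return [r for r in rows if r["tournament"] not in NON_FIFA_TOURNAMENTS]


# Made-up rows for illustration only.
rows = [
    {"home_team": "Team A", "away_team": "Team B", "tournament": "Friendly"},
    {"home_team": "Team C", "away_team": "Team D", "tournament": "FIFI Wild Cup"},
    {"home_team": "Team E", "away_team": "Team F", "tournament": "Island Games"},
]

kept = fifa_only(rows)
```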

Tournament check

The following tournaments need to be checked against Wikipedia and/or rsssf:

  • African Cup of Nations qualification
  • Caribbean Games
  • Merdeka Cup
  • CECAFA Championship
  • Asian Cup qualification
  • South East Asian Games
  • Palestine Cup of Nations
  • Beijing Friendship Tournament
  • Indian Ocean Island Games
  • South Asian Games
  • Asian Games
  • Copa Amilcar Cabral
  • African Games
  • African Games qualification
  • Panamerican Games
  • Matthews Cup
  • South Pacific Mini Games
  • CEDEAO Cup
  • Copa Mercosur
  • Dakar Tournament
  • Pan Arab Games
  • Black Stars Tournament
  • Carlsberg Cup
  • Shell Caribbean Cup
  • Shell Caribbean Cup qualification
  • Intercontinental Championship
  • Jeux de la Francophonie
  • Copa Ministerio de Vivienda
  • Cup of Ancient Civilizations
  • Afro-Asian Games
  • Torneio Vale do Tejo
  • LG Cup
  • Tiger Cup
  • TIFOCO
  • Coupe de la CEMAC
  • Bolivarian Games
  • King's Cup
  • COSAFA Cup
  • Baltic Cup
  • Arab Cup

Things to look out for:

  • Add missing matches
  • Correct tournament name if a match already exists as a friendly (many such cases)
  • Make sure no B-teams or U-21 etc get through
  • Matches between A-teams in a tournament with non-A-teams are acceptable
  • Add a shootout if it exists
  • Some of these tournaments might not be full A-tournaments at all; in that case, existing matches should be removed from the database; they might be listed as friendlies as well
  • Any other tournament not in the above list can be checked as well; major tournaments are all correct as are ones between non-FIFA nations (though those haven't been checked for penalty shootouts)

Data sources needed

In case anyone happens to read this - is there a decent source for current results? For whatever reason, it seems incredibly hard to find a complete list of games with the necessary information included.

Wrong Date in Matches

2022-19-22,Rwanda,Sudan,0,0,Friendly,Kigali,Rwanda,FALSE ---->>>> Is this match a real one? Both seem to be on the same date
2022-19-22,Rwanda,Sudan,1,0,Friendly,Kigali,Rwanda,FALSE

which I believe should be:
2022-11-19
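
Malformed dates like this can be caught mechanically. The sketch below runs the two rows quoted above through a strict `YYYY-MM-DD` parse and also counts fixtures that appear more than once; the helper names are made up, and the header line is taken from the README's column list.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# The two rows quoted in the report above, under the results.csv header.
REPORTED = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
2022-19-22,Rwanda,Sudan,0,0,Friendly,Kigali,Rwanda,FALSE
2022-19-22,Rwanda,Sudan,1,0,Friendly,Kigali,Rwanda,FALSE
"""


def invalid_dates(rows):
    """Return (file line number, raw date) for dates that are not YYYY-MM-DD."""
    bad = []
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        try:
            datetime.strptime(row["date"], "%Y-%m-%d")
        except ValueError:
            bad.append((line_number, row["date"]))
    return bad


def repeated_fixtures(rows):
    """Return (date, home_team, away_team) keys that occur more than once."""
    counts = Counter((r["date"], r["home_team"], r["away_team"]) for r in rows)
    return [key for key, n in counts.items() if n > 1]


rows = list(csv.DictReader(io.StringIO(REPORTED)))
bad = invalid_dates(rows)
repeats = repeated_fixtures(rows)
```

The line numbers assume one record per physical line, which holds as long as no field contains an embedded newline.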

Start thinking about adding goalscorers

Perhaps in a separate file. Perhaps creating some sort of a match id. Perhaps also a player id.

There are around 125k goals to account for so this might take some time. Will need to look into existing databases that are usable. Probably Wikipedia; rsssf might have most of the data but doesn't have consistent formatting. Neither does Wikipedia to be fair. I wonder if I can plug into any commercial feeds for at least the data from recent years...

First shooter in penalty shootouts

The penalty shootouts file now marks the first shooter. Currently, it's rather bare as only the major tournaments had easy-to-find info available. If anyone has a source for any of the missing data, feel free to link or put up a pull request.

Tracking goalscorers

So far all the goalscorers from the following tournaments are included in goalscorers.csv file:

  • FIFA World Cup
  • FIFA World Cup qualification
  • AFC Asian Cup
  • African Cup of Nations
  • Confederations Cup
  • CONMEBOL–UEFA Cup of Champions
  • Copa América
  • Gold Cup (but NOT the preceding CONCACAF Championship)
  • Oceania Nations Cup
  • UEFA Euro
  • UEFA Euro qualification
  • A handful of British Home Championship games that were used as qualifiers for the Euros one year

These total roughly 32% of all the goals in the results.csv file.
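
A coverage figure like this can be recomputed along the following lines, assuming goalscorers.csv holds one row per goal. The `goal_coverage` function and the sample rows are made up for illustration and do not reproduce the real numbers.

```python
def goal_coverage(results, goalscorers):
    """Share of all goals in results that have a goalscorer row."""
    total_goals = sum(
        int(r["home_score"]) + int(r["away_score"]) for r in results
    )
    if total_goals == 0:
        return 0.0
    return len(goalscorers) / total_goals


# Made-up rows for illustration only: 8 goals scored, 2 of them attributed.
results = [
    {"home_score": "2", "away_score": "1"},
    {"home_score": "0", "away_score": "0"},
    {"home_score": "3", "away_score": "2"},
]
goalscorers = [
    {"team": "Team A", "scorer": "Player X"},
    {"team": "Team B", "scorer": "Player Y"},
]

coverage = goal_coverage(results, goalscorers)
```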

Purpose of the city and country columns

Need to think through what these columns are for. If they are for ease of plotting maps, country names should be changed to some kind of standard format for easy joining with coordinate databases. FIFA names probably aren't it, e.g. London is in the UK, not England.

If plotting is the main purpose then historic names should also be removed. Belgrade being in Yugoslavia, then Serbia and Montenegro and now Serbia is true and a cool bit of trivia for us history buffs but does it help or hinder anyone using the dataset?
