transfers's Introduction

transfers

Data on European football clubs' player transfers, as found on Transfermarkt, since the 1992/93 season.

IMPORTANT: As of July 2022, transfer fees are in EUR (taken from the .com Transfermarkt site) instead of GBP (taken from the .co.uk Transfermarkt site), because older GBP fees were distorted by being converted with a recent exchange rate.
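
To make the distortion concrete, here is a minimal sketch in Python. The fee and both exchange rates are made-up illustrative values, not real Transfermarkt figures or historical rates; only the arithmetic is the point.

```python
def conversion_distortion(rate_then: float, rate_now: float) -> float:
    """Relative error from converting an old EUR fee at a recent rate.

    Rates are GBP per EUR. The fee amount cancels out, so only the
    ratio of the two rates matters.
    """
    return rate_now / rate_then - 1.0


# Hypothetical values for illustration only (not real historical rates).
fee_eur_m = 10.0   # fee in EUR millions
rate_then = 0.65   # assumed GBP per EUR when the transfer happened
rate_now = 0.85    # assumed GBP per EUR when the site converts the fee

gbp_at_the_time = fee_eur_m * rate_then
gbp_as_reported = fee_eur_m * rate_now
error = conversion_distortion(rate_then, rate_now)

print(f"GBP at the time: {gbp_at_the_time:.2f}m")
print(f"GBP as reported: {gbp_as_reported:.2f}m")
print(f"Relative error:  {error:.1%}")
```

Because the error depends only on the ratio of the two rates, every fee from the same period is skewed by the same factor, which is why storing the EUR values avoids the problem.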

Data

Transfers can be found in the data/ directory, in .csv format. There's a file for each of these leagues:

  • English Premier League (premier-league.csv)
  • English Championship (championship.csv)
  • French Ligue 1 (ligue-1.csv)
  • German 1.Bundesliga (1-bundesliga.csv)
  • Italian Serie A (serie-a.csv)
  • Spanish La Liga (primera-division.csv)
  • Portuguese Liga NOS (liga-nos.csv)
  • Dutch Eredivisie (eredivisie.csv)
  • Russian Premier Liga (premier-liga.csv)

Common variables:

Header             | Description                                       | Data Type
club_name          | name of club                                      | text
player_name        | name of player                                    | text
position           | position of player                                | text
club_involved_name | name of secondary club involved in transfer       | text
fee                | raw transfer fee information                      | text
transfer_movement  | transfer into club or out of club?                | text
transfer_period    | transfer window (summer or winter)                | text
fee_cleaned        | numeric transformation of fee, in EUR millions    | numeric
league_name        | name of league club_name belongs to               | text
year               | year of transfer                                  | text
season             | season of transfer (interpolated from year)       | text
country            | country of league                                 | text
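
As a rough illustration of how these columns fit together, the sketch below computes net spend per club from fee_cleaned and transfer_movement. It is a minimal example, not part of the repository: the sample rows are made up, and the "in"/"out" values for transfer_movement and "NA" for missing fees are assumptions that should be checked against the real CSVs.

```python
import csv
import io
import math
from collections import defaultdict

# Made-up sample rows using a subset of the documented columns.
# The "in"/"out" values and the "NA" marker are assumptions.
SAMPLE_CSV = """club_name,player_name,transfer_movement,fee_cleaned
Club A,Player One,in,12.5
Club A,Player Two,out,4.0
Club A,Player Three,in,NA
Club B,Player Four,out,7.25
"""


def parse_fee(value):
    """Return the fee as a float, or None when it is missing or non-numeric."""
    try:
        fee = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(fee) else fee


def net_spend_by_club(rows):
    """Sum fees paid (movement 'in') minus fees received (otherwise) per club."""
    totals = defaultdict(float)
    for row in rows:
        fee = parse_fee(row.get("fee_cleaned"))
        if fee is None:
            continue  # skip transfers with no usable fee
        sign = 1.0 if row["transfer_movement"] == "in" else -1.0
        totals[row["club_name"]] += sign * fee
    return dict(totals)


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
print(net_spend_by_club(rows))
```

To run this on real data, the StringIO source could be swapped for an opened file such as data/premier-league.csv; the file encoding to pass to open() is not stated in this README, so treat any choice as an assumption.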

Config

  • config/league-meta.csv - leagues to scrape

New leagues can be exported by adding rows to the config/league-meta.csv file. The required columns are league_name, league_id and country. To get league_name and league_id, extract them from the Transfermarkt URL of the league's transfer history page:

www.transfermarkt.com/{LEAGUE_NAME}/transfers/wettbewerb/{LEAGUE_ID}/

When scraping new leagues, it is recommended to remove every league whose data has already been downloaded (i.e. include only the new leagues). This cuts processing time and reduces the load on the servers.
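
The URL pattern above can be turned into a config row with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name league_meta_row, the placeholder URL and the column order are all assumptions, so compare the output with the existing config/league-meta.csv before appending anything.

```python
import csv
import io
import re

# Matches the documented pattern:
# www.transfermarkt.com/{LEAGUE_NAME}/transfers/wettbewerb/{LEAGUE_ID}/
LEAGUE_URL = re.compile(
    r"transfermarkt\.com/(?P<league_name>[^/]+)"
    r"/transfers/wettbewerb/(?P<league_id>[^/?#]+)"
)


def league_meta_row(url: str, country: str) -> dict:
    """Build a league-meta row from a league's transfer-history URL."""
    match = LEAGUE_URL.search(url)
    if match is None:
        raise ValueError(f"not a league transfer-history URL: {url!r}")
    return {
        "league_name": match.group("league_name"),
        "league_id": match.group("league_id"),
        "country": country,
    }


# Placeholder URL: "example-league" and "XX1" are not real identifiers.
row = league_meta_row(
    "https://www.transfermarkt.com/example-league/transfers/wettbewerb/XX1/",
    country="Exampleland",
)

# Render the row as CSV, with the three required columns as the header.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["league_name", "league_id", "country"])
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```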

Code

R:

  • src/scrape-summer.R: retrieves latest summer window's data and appends new observations to CSVs in data/
  • src/scrape-winter.R: retrieves latest winter window's data and appends new observations to CSVs in data/
  • src/scrape-history.R: retrieves transfer history by league and exports to CSVs in data/
  • src/functions.R: local R functions used elsewhere

Source

All squad data was scraped from Transfermarkt, in accordance with their terms of use.

transfers's People

Contributors

actions-user, ewenme, imgbotapp, tim-hy

transfers's Issues

Player dates of birth possible?

This is quite an amazing resource that I would like to use to fill a sports database I run (www.thesportsdb.com). We are trying to create an entire history of all sports.

Is it possible to scrape the players' dates of birth and insert them into the CSVs? It would really help me write some import scripts. Position and country would be even more awesome ;)

Thanks again for your work on this.

Encoding Issue?

Hi!

First of all, thanks for this amazing data.

I'm having a problem with the encoding of this data when reading the raw files from their URLs: I can't read them as UTF-8 or Latin-1. Am I doing something wrong, or is there a specific encoding for this data (2019 files)?

Thanks in advance!

Forbidden 403 error

Hi, nice little tool you've built here.

I get the following error when GET() is called:

Error in read_html.response(.) : Forbidden (HTTP 403).

I've tried passing my transfermarkt credentials to GET using authenticate() but I still get the 403. I get this error no matter what league I try to pull from.

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6.8

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] httr_1.4.4       glue_1.6.2       stringr_1.5.0    readr_2.1.4      janitor_2.2.0    purrr_1.0.2.9000
[7] dplyr_1.1.2      rvest_1.0.3     

loaded via a namespace (and not attached):
 [1] xml2_1.3.3       magrittr_2.0.3   hms_1.1.2        bit_4.0.5        tidyselect_1.2.0
 [6] timechange_0.2.0 R6_2.5.1         rlang_1.1.1      fansi_1.0.4      tools_4.2.1     
[11] parallel_4.2.1   vroom_1.6.1      utf8_1.2.3       cli_3.6.1        withr_2.5.0     
[16] ellipsis_0.3.2   bit64_4.0.5      tibble_3.2.1     lifecycle_1.0.3  crayon_1.5.2    
[21] tzdb_0.3.0       vctrs_0.6.3      curl_5.0.0       snakecase_0.11.0 stringi_1.7.12  
[26] compiler_4.2.1   pillar_1.9.0     generics_0.1.3   lubridate_1.9.2  pkgconfig_2.0.3 

Let `club_involved_name` be easily linkable

First of all, thanks a lot for this incredible dataset.
However, I find a small flaw in it: the club_involved_name feature contains club names as written in the text of the corresponding Transfermarkt entry, and these names are often inconsistent with the names in club_name. Having the same names in both columns would ease the analysis of the data, e.g. by allowing the involved club's name to be joined with its own league to study flows between leagues.
On Transfermarkt, the name to use is the title attribute of the same <a> HTML tag, so it should be an easy fix. I'd love to help with a pull request, but I had a look at the source code and R is out of my league. In the future I could think of proposing a Python alternative to scrape the data.
Again, congratulations on such a useful repo.

Updated Data

Do you happen to know how to get the JSON link for the newer Guardian football transfer collections? There doesn't seem to be one nice, easily collectable set like the new Guardian ones, and I was hoping you could help me out with that.

Pound exchange rate issue on Transfermarkt

Hi! I was using the data to try to visualise the market growth based on transfers.

I posted the project on Reddit, and it turns out that the pound fees reported by Transfermarkt are converted back from euros at a recent exchange rate, which puts the older fees off by over 30%.

I would like to finish my project but have no skills in scraping websites. It would be sweet if the same values could be extracted as euros from the .com site, as they seem far more accurate than the pound values on the UK site!
