Introducing fcscrapR

The goal of fcscrapR is to allow R users quick access to the commentary for each soccer game available on ESPN. The commentary data includes basic events such as shot attempts, substitutions, fouls, cards, corners, and video reviews along with information about the players involved. The data can be accessed in-game as ESPN updates their match commentary. This package was created to help get data in the hands of soccer fans to do their own analysis and contribute to reproducible metrics.

Installation

You can install fcscrapR from GitHub with:

# install.packages("devtools")
devtools::install_github("ryurko/fcscrapR")

Example game scraping

Here’s an example of how to scrape a game using fcscrapR. The workhorse function of the package is scrape_commentary(), which takes a game id. The game id appears in the URL of a game's commentary page on ESPN, such as the one for the group stage match between Serbia and Costa Rica in the 2018 World Cup: http://www.espn.com/soccer/commentary?gameId=498194
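If you are starting from a URL rather than a bare id, a small helper can pull the id out for you. The sketch below is not part of fcscrapR; extract_game_id() is a hypothetical name, and it assumes the id always appears as the numeric value of a gameId query parameter, as in the URL above.

```r
# Hypothetical helper (not provided by fcscrapR): extract the numeric
# value of the gameId query parameter from an ESPN commentary URL.
extract_game_id <- function(url) {
  # Look-behind keeps only the digits that follow "gameId="
  found <- regmatches(url, regexpr("(?<=gameId=)[0-9]+", url, perl = TRUE))
  if (length(found) == 0) {
    return(NA_integer_)  # no gameId parameter in this URL
  }
  as.integer(found)
}

srb_crc_id <- extract_game_id("http://www.espn.com/soccer/commentary?gameId=498194")
```

The resulting value could then be passed to scrape_commentary() in place of a hand-typed id.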

Using this game id, we can easily grab the commentary data frame:

library(fcscrapR)
#> Loading required package: magrittr
srb_crc_commentary <- scrape_commentary(498194)

Check out the documentation for scrape_commentary() for a description of all of the columns in the commentary data:

colnames(srb_crc_commentary)
#>  [1] "game_id"                 "commentary"             
#>  [3] "match_time"              "team_one"               
#>  [5] "team_two"                "team_one_score"         
#>  [7] "team_two_score"          "half_end"               
#>  [9] "match_end"               "half_begins"            
#> [11] "shot_attempt"            "penalty_shot"           
#> [13] "shot_result"             "shot_by_player"         
#> [15] "shot_by_team"            "shot_with"              
#> [17] "shot_where"              "net_location"           
#> [19] "assist_by_player"        "foul"                   
#> [21] "foul_by_player"          "foul_by_team"           
#> [23] "follow_set_piece"        "assist_type"            
#> [25] "follow_corner"           "offside"                
#> [27] "offside_team"            "offside_player"         
#> [29] "offside_pass_from"       "shown_card"             
#> [31] "card_type"               "card_player"            
#> [33] "card_team"               "video_review"           
#> [35] "video_review_event"      "video_review_result"    
#> [37] "delay_in_match"          "delay_team"             
#> [39] "free_kick_won"           "free_kick_player"       
#> [41] "free_kick_team"          "free_kick_where"        
#> [43] "corner"                  "corner_team"            
#> [45] "corner_conceded_by"      "substitution"           
#> [47] "sub_injury"              "sub_team"               
#> [49] "sub_player"              "replaced_player"        
#> [51] "penalty"                 "team_drew_penalty"      
#> [53] "player_drew_penalty"     "player_conceded_penalty"
#> [55] "team_conceded_penalty"   "half"                   
#> [57] "comment_id"              "stoppage_time"          
#> [59] "team_one_penalty_score"  "team_two_penalty_score" 
#> [61] "match_time_numeric"

We can quickly make a chart comparing the number of shot attempts by each team, broken down by outcome:

# install.packages(c("ggplot2", "dplyr"))
library(ggplot2)
srb_crc_commentary %>%
  # Keep only the commentary rows that describe a shot
  dplyr::filter(!is.na(shot_result)) %>%
  ggplot(aes(x = shot_by_team, fill = shot_result)) +
  geom_bar() +
  labs(x = "Team", y = "Count",
       fill = "Shot result",
       title = "Distribution of shot attempts for Costa Rica vs Serbia by result",
       caption = "Data from ESPN, accessed with fcscrapR") +
  scale_fill_manual(values = c("darkorange", "darkblue", "darkred", "darkcyan")) +
  theme_bw()
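The same comparison can also be read off as a plain count table. The sketch below uses base R on a small made-up data frame so that it runs without a network connection; only the column names shot_by_team and shot_result come from the commentary data, while the rows and the result labels ("goal", "missed", "saved") are invented for illustration and may not match the labels ESPN uses.

```r
# Toy stand-in for a scraped commentary data frame (values are made up).
toy_commentary <- data.frame(
  shot_by_team = c("Serbia", "Serbia", "Costa Rica", NA, "Costa Rica"),
  shot_result  = c("goal", "missed", "saved", NA, "missed"),
  stringsAsFactors = FALSE
)

# Keep only rows that describe a shot, mirroring the filter used for the chart.
shots <- toy_commentary[!is.na(toy_commentary$shot_result), ]

# Cross-tabulate shot attempts by team and result.
shot_counts <- table(team = shots$shot_by_team, result = shots$shot_result)
print(shot_counts)
```

Swapping toy_commentary for srb_crc_commentary should give the counts behind the bar chart, assuming the scrape succeeded.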

Gather game ids

Currently, the only function available for getting game ids is scrape_scoreboard_ids(), which pulls the game ids for all matches on ESPN’s soccer scoreboard for a given league or tournament. The league or tournament must have an associated URL in the league_url_data table provided in fcscrapR:

# install.packages("pander")
league_url_data %>%
  head() %>%
  pander::pander()
name                       url
show all leagues           http://www.espn.com/soccer/scoreboard/_/league/all
fifa world cup             http://www.espn.com/soccer/scoreboard/_/league/fifa.world
uefa champions league      http://www.espn.com/soccer/scoreboard/_/league/uefa.champions
uefa europa league         http://www.espn.com/soccer/scoreboard/_/league/uefa.europa
english premier league     http://www.espn.com/soccer/scoreboard/_/league/eng.1
spanish primera división   http://www.espn.com/soccer/scoreboard/_/league/esp.1

Here’s an example of grabbing the World Cup games from June 20th, 2018:

scrape_scoreboard_ids(scoreboard_name = "fifa world cup", 
                      game_date = "2018-06-20") %>%
  pander::pander()
#> Loading required package: XML
#> Loading required package: RCurl
#> Loading required package: bitops
game_id   team_one   team_two
498185    Portugal   Morocco
498184    Uruguay    Saudi Arabia
498183    Iran       Spain
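Once you have a set of game ids, a natural next step is to scrape each game and stack the results. The sketch below is one way to do that, not something fcscrapR provides: scrape_many() is a hypothetical helper that takes the scraping function as an argument, skips games that fail, and assumes every successful scrape returns a data frame with the same columns. A stand-in scraper is used here so the example runs offline.

```r
# Hypothetical helper (not part of fcscrapR): apply a scraper to several
# game ids, skip the ones that error, and row-bind the rest.
scrape_many <- function(game_ids, scraper) {
  results <- lapply(game_ids, function(id) {
    tryCatch(scraper(id), error = function(e) {
      message("Skipping game ", id, ": ", conditionMessage(e))
      NULL
    })
  })
  # Drop failed games, then stack the remaining data frames.
  do.call(rbind, Filter(Negate(is.null), results))
}

# Stand-in scraper for illustration only; it fails for id 2.
fake_scraper <- function(id) {
  if (id == 2) stop("page not found")
  data.frame(game_id = id, commentary = "kick-off", stringsAsFactors = FALSE)
}

combined <- scrape_many(c(1, 2, 3), fake_scraper)

# With the real package, the call might look like this (requires network):
# ids <- scrape_scoreboard_ids(scoreboard_name = "fifa world cup",
#                              game_date = "2018-06-20")
# all_commentary <- scrape_many(ids$game_id, scrape_commentary)
```

Passing the scraper as an argument keeps the helper easy to test and lets one bad game id fail without losing the rest of the batch.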

Acknowledgements

Many thanks to the sports analytics community on Twitter for guiding me to various resources of soccer data. Big thanks to Brendan Kent for pointing me to the commentary data.
