dynastyprocess / data

An open-data fantasy football repository, maintained by DynastyProcess.com

Home Page: https://dynastyprocess.com

License: GNU General Public License v3.0

fantasy-football fantasy-football-database sports-data

data's Introduction

DynastyProcess.com Data Repository

This repository contains resources and data to support apps and developers, and is updated weekly via GitHub Actions.

Status

weekly-playerids weekly-fantasypros weekly-playervalues

Description

The main files available are:

  • db_playerids.csv
  • db_fpecr.csv.gz and db_fpecr.parquet - use parquet for python/R, it's faster/better!
  • values.csv (and sibling files values-players.csv and values-picks.csv)
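As a rough illustration of consuming these files from Python, here is a minimal sketch using only the standard library. The URL follows the pattern quoted in an issue report further down this page, and it is an assumption that the file is still served from that path. The helper names `parse_csv_text` and `load_playerids` are hypothetical, and treating "NA" as the missing-value marker is also an assumption.

```python
import csv
import io
import urllib.request

# URL pattern as quoted in an issue report on this repository;
# assumed (not verified) to still be the live location of the file.
PLAYERIDS_URL = "https://github.com/DynastyProcess/data/raw/master/files/db_playerids.csv"


def parse_csv_text(text):
    """Parse CSV text into a list of dicts, mapping "" and "NA" to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({k: (None if v in ("", "NA") else v) for k, v in row.items()})
    return rows


def load_playerids(url=PLAYERIDS_URL):
    """Download the CSV and parse it (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return parse_csv_text(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the download makes it easy to run the same logic against a local copy of the file.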

Archived

The old database.csv was getting a little unwieldy (80+ columns) so it's been broken down into smaller, more specific pieces.

Other dataframes may be available (and if not, we can direct you to potential sources of the data - please open an issue!)

A number of older files were moved into the archives/ folder and are out of date.

Please note: the old db_fpecr.csv file has been gzipped to the csv.gz file due to GitHub size restrictions.
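For readers who stay with the gzipped CSV rather than the parquet file, the following is a small sketch of reading it with Python's standard library. The function name `read_gzipped_csv` is hypothetical, and the sketch makes no assumption about which columns the file contains.

```python
import csv
import gzip
import io


def read_gzipped_csv(raw_bytes):
    """Decompress gzip bytes and parse the CSV inside into a list of dicts."""
    text = gzip.decompress(raw_bytes).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Usage with a local copy of the file (the path is an assumption):
# with open("db_fpecr.csv.gz", "rb") as fh:
#     rows = read_gzipped_csv(fh.read())
```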

Contributing

Many hands make light work, especially when maintaining open data! Here are some ways you can contribute to this project:

  • You can open an issue if you'd like to request specific data or report a bug/error.

  • While the main files (as listed above) are maintained by an automated script, you can also make pull requests to supporting files (e.g. missing_playerids.csv) which are used to help fill in any gaps.

  • You can buy me a coffee or sponsor this project by donating to help with server costs!


Please note that this project is released with a Contributor Code of Conduct - by participating, you agree to abide by these terms.

data's Issues

old ecr scrapes

There is some older FantasyPros scrape data that could/should be manipulated into the current structure.

[Data Bug] Missing gsis_ids for Tanner Conner and Michael Woods

Describe the data bug
Using load_ff_playerids(), Tanner Conner and Michael Woods don't have gsis_ids; it returns NA for both of them.

Which file is having trouble?
db_playerids.csv

Expected data
Tanner Conner should have a gsis_id of 00-0037348 and Michael Woods should have a gsis_id of 00-0037300.

Key between database.csv and value<X>.csv files.

Recently, the structure of the various value.csv files changed. It used to be the case that they had a "mergename" column that matched the same column in database.csv. However, it is now just "player", and the name used doesn't match the mergename from database.csv or the joined first_name + last_name columns.

I had been using the data in database.csv to link the player values to the mfl_ids. If there is an alternative way to do that, please let me know.

gsis_id 00-0029435 misattributed (Damari/Dennis Johnson)

In the FF player IDs, gsis_id 00-0029435 appears to be duplicated and misattributed.

load_ff_playerids() %>% filter(gsis_id == "00-0029435") %>% select(name,gsis_id)
name gsis_id

1 Dennis Johnson 00-0029435
2 Dennis Johnson 00-0029435

You can see here that it shows up twice. But when you look at, for example, the 2014 week 1 game for Houston, you can see that they attribute this ID to D.Johnson.

load_pbp(2014) %>% filter(game_id=="2014_01_WAS_HOU",play_id==83) %>% select(receiver,receiver_id)
receiver receiver_id

1 D.Johnson 00-0029435

Looking at PFR, you can see that all the targets were to Damaris Johnson, not Dennis:
https://www.pro-football-reference.com/boxscores/201409070htx.htm

[Data Bug]

I found a duplicate gsis_id

Both MFL ids 11724 & 14683 gave a gsis_id of 00-0034641

I think gsis_id of 00-0034641 goes with MFL ID 14683 (Chris Jones)

ras_id and otc_id need to be added to nflreadr::load_ff_playerids()

I noticed that ras_id and otc_id live as variables in the missing_ids.csv file here on dynastyprocess, but are not currently variables in nflreadr::load_ff_playerids(). This file and the function it feeds should mirror each other, yes? Looks like the csv has useful data for the function.

Adding missing ESPN ID's to Player ID's map

remotes::install_github("dfs-with-r/ffespn")

library(ffscrapr)
library(ffespn)
library(tidyverse)

# Build df listing all ESPN position designations at current moment.
espn_positions <- c("QB", "RB", "WR", "TE", "K", "P", "DT", "DE", "LB", "CB", "S")

# Fetch one projections table per position, keep id through position, and stack them.
espnlist_2022 <- espn_positions %>%
  map(function(pos) {
    ffespn_projections(2022, 0, pos, league_id = espn_league_id_2022) %>%
      select(id:position)
  }) %>%
  bind_rows() %>%
  rename(espn_id = id,
         espn_pos = position,
         espn_team = team)

#Update ffscrapr's ESPN ID's by building df w/ missing IDs (usually rookies)
espn_id_adds <- dp_playerids() %>%
  rename(dp_espn_id = espn_id,
         player = name) %>%
  full_join(., y = espnlist_2022,
            by = c("player")) %>%
  select(player,
         position,
         espn_pos,
         team,
         espn_team,
         dp_espn_id,
         espn_id) %>%
  filter(espn_id %in% dp_espn_id == FALSE)

yields 228 updates to ESPN ID's ("OAK" Sam Williams needs no updating). Please add!

Add ESPN ID's for 2023 rookies

Hi guys,

I'd like to give a try at a pull request updating ESPN ID's for the 2023 rookies. I'm noticing that the missing ID's file doesn't have 2023 rookies. Should I submit a Pull Request for db_playerids.csv instead?

[Data Bug] db_playerids.csv 503 error

Describe the data bug
This code chunk:

ffscrapr::dp_playerids()

Returns the following:

Request failed [503]. Retrying in 1.3 seconds...
Request failed [503]. Retrying in 1.7 seconds...
Error: GitHub request failed with error: <503> 

while calling <https://github.com/DynastyProcess/data/raw/master/files/db_playerids.csv>

Which file is having trouble?
Which file?
db_playerids.csv

Expected data
What should this look like?
This should return the csv as a dataframe.
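A transient 503 like the one above can often be ridden out on the client side. The following is a minimal sketch of retrying with exponential backoff, assuming the failure is temporary. It is not how ffscrapr handles retries; `fetch_with_retry` and its parameters are hypothetical names for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=4, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url and return the body bytes, retrying HTTP 5xx responses.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    Client errors (4xx) and the final failed attempt are re-raised.
    """
    for attempt in range(attempts):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Retry only server-side errors, and give up after the last attempt.
            if err.code < 500 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `opener` and `sleep` parameters exist only so the function can be exercised without network access or real waiting.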

[Data Bug] Chris Herndon incorrect Sleeper ID

Describe the data bug
Chris Herndon's Sleeper ID should be 5009, not 5755

Which file is having trouble?
db_playerids.csv

There is an old, deprecated Chris Herndon in Sleeper's player database:

  "5755": {
    "hashtag": "#ChrisHerndon-NFL-FA-0",
    "sport": "nfl",
    "practice_description": null,
    "birth_state": null,
    "team": null,
    "injury_notes": null,
    "last_name": "Herndon",
    "injury_start_date": null,
    "weight": "",
    "fantasy_data_id": 19914,
    "search_first_name": "chris",
    "college": "Miami (Fla.)",
    "age": 22,
    "first_name": "Chris",
    "full_name": "Chris Herndon",
    "search_rank": 9999999,
    "yahoo_id": 900358,
    "birth_date": "1996-02-23",
    "injury_status": null,
    "news_updated": null,
    "position": null,
    "metadata": null,
    "birth_city": null,
    "rotowire_id": null,
    "high_school": null,
    "rotoworld_id": 13228,
    "player_id": "5755",
    "number": 0,
    "birth_country": null,
    "pandascore_id": null,
    "search_last_name": "herndon",
    "years_exp": 0,
    "height": "",
    "active": true,
    "stats_id": null,
    "search_full_name": "chrisherndon",
    "depth_chart_position": null,
    "depth_chart_order": null,
    "sportradar_id": "780a48de-d092-4e87-9c34-8d1b45a154cc",
    "gsis_id": "00-0034766",
    "injury_body_part": null,
    "fantasy_positions": null,
    "status": "Inactive",
    "espn_id": null,
    "practice_participation": null
  }

The Chris Herndon that is actually on rosters is this one:

  "5009": {
    "hashtag": "#ChrisHerndon-NFL-NYJ-89",
    "sport": "nfl",
    "practice_description": null,
    "birth_state": null,
    "team": "NYJ",
    "injury_notes": null,
    "last_name": "Herndon",
    "injury_start_date": null,
    "weight": "253",
    "fantasy_data_id": 19947,
    "search_first_name": "chris",
    "college": "Miami (FL)",
    "age": 24,
    "first_name": "Chris",
    "full_name": "Chris Herndon",
    "search_rank": 290,
    "yahoo_id": 31077,
    "birth_date": "1996-02-23",
    "injury_status": null,
    "news_updated": 1593620408253,
    "position": "TE",
    "metadata": null,
    "birth_city": null,
    "rotowire_id": 12899,
    "high_school": null,
    "rotoworld_id": 13228,
    "player_id": "5009",
    "number": 89,
    "birth_country": null,
    "pandascore_id": null,
    "search_last_name": "herndon",
    "years_exp": 2,
    "height": "6'4\"",
    "active": true,
    "stats_id": 832080,
    "search_full_name": "chrisherndon",
    "depth_chart_position": "TE",
    "depth_chart_order": 1,
    "sportradar_id": "780a48de-d092-4e87-9c34-8d1b45a154cc",
    "gsis_id": " 00-0034766",
    "injury_body_part": null,
    "fantasy_positions": [
      "TE"
    ],
    "status": "Active",
    "espn_id": 3123050,
    "practice_participation": null
  }
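One way to resolve this kind of duplicate automatically is to group Sleeper records by a shared external ID and keep the most "live" one. The sketch below is a heuristic under stated assumptions, not the maintainers' method: it groups by `sportradar_id` and prefers records with status "Active", then records with a team, then the lower `search_rank`. The function names are hypothetical; the field names come from the JSON above.

```python
def _liveness(rec):
    """Sort key: active status first, then having a team, then lower search_rank."""
    rank = rec.get("search_rank")
    return (
        rec.get("status") == "Active",
        rec.get("team") is not None,
        -(rank if rank is not None else 9999999),
    )


def pick_preferred_sleeper_ids(players):
    """Map each sportradar_id to the Sleeper player_id judged most current.

    players: dict of Sleeper player_id -> record dict (as in Sleeper's player dump).
    Records without a sportradar_id are skipped.
    """
    best = {}
    for sleeper_id, rec in players.items():
        key = rec.get("sportradar_id")
        if key is None:
            continue
        current = best.get(key)
        if current is None or _liveness(rec) > _liveness(players[current]):
            best[key] = sleeper_id
    return best
```

Applied to the two records above, this keeps 5009 and drops 5755, since only 5009 is marked Active and has a team.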

db_playerids.csv missing Sleeper IDs - contributing?

I'm not sure how you collate the player IDs, but I've got some contributions to make if that's possible:


Chris Herndon seems to have duplicate IDs in Sleeper's database: 5009 and 5755 - 5755 is a free agent, 5009 looks to be the real Chris Herndon

Anthony Gordon (MFL ID 14787) missing Sleeper ID: 6898

Marquez Callaway (MFL ID 15034) missing Sleeper ID: 6989

Mike Warren (MFL ID 14816) missing Sleeper ID: 6992

Benny Snell (MFL ID 14072) missing Sleeper ID: 6156

Quartney Davis (MFL ID 14856) missing Sleeper ID: 6879

Salvon Ahmed (MFL ID 14811) missing Sleeper ID: 6918

Jeff Thomas (MFL ID 14866) missing Sleeper ID: 7076

JaMycal Hasty (MFL ID 14821) missing Sleeper ID: 6996

Patrick Taylor (MFL ID 14817) missing Sleeper ID: 6963

Thaddeus Moss (MFL ID 14869) missing Sleeper ID: 6919


I have an app that's using the awesome .csv to help analyse some rosters. Depending on how you create the csv, maybe I could contribute these in a more automated way - or maybe not! Have a chat on Twitter DMs if you want?

Adding player IDs from nflscrapR, nflfastR, and Pro Football Focus

I'm loving the map of player IDs in the mfl_players() function. Wondering if a few ID columns could get added/mapped:

nflscrapR / (I think this is old NFL API). Example: Kyle Orton = 00-0023541
nflfastR / (new NFL API). Example: Kirk Cousins = 32013030-2d30-3032-3936-303492e9d55e

You can find their code easily on GitHub

fantasypros history csv

Hey there, found my way to your git via https://www.reddit.com/r/DynastyFF/comments/gzzaa1/question_for_all_you_data_driven_folks_do_any_of/

I looked through your database file (https://github.com/DynastyProcess/data/blob/master/files/database.csv) and that's honestly pretty darn close...

  • how often is this refreshed and can we get the 2020 draft class in this data set?

  • are you able to get ADP from different sites under different constraints? (aka league size/league scoring filters ... espn, mfl, yahoo, sleeper, etc???) and are they broken down by site?

  • It looks like you have aggregated consensus rankings from fantasy pros... are you able to extract the individual rankings that create those overall/info from each site?

Here's what I'm trying to do:
I don't need any of the statistics, because I'm just building a draft board that has my own rankings, but the 40-times and similar data are great. One of the things I'd like to do is build out an aggregate ADP using info from all of the sites and look at consensus rankings for both redraft and dynasty (where possible). I have my own weighting system (I weight different sites differently, for example), so I have my draft board sheet in the workbook. I then want to build out a data source page with this information. I would either grab the data once (a week before my draft) or, hopefully, be able to update that data over time as it changes.

Let me know if that makes sense and if what I'm looking for is possible from your data set.

Nick Williams & Randy Gregory espn_id contradiction

I think the Randy Gregory and Nick Williams espn_id variables might be wrong in dp_playerids(). I'm working on updating the espn_id values using the ffespn package, and the only IDs that don't match are for these two players. Not sure which is right/wrong.

remotes::install_github("dfs-with-r/ffespn")
remotes::install_github("ffverse/ffscrapr", ref = "dev")
remotes::install_github("nflverse/nflreadr")

library(tidyverse)
library(ffespn)
library(ffscrapr)
library(nflreadr)

# Pull ESPN projections for every position and stack them into one table.
espn_positions <- c("QB", "RB", "WR", "TE", "K", "P", "DT", "DE", "LB", "CB", "S")

espn_list <- espn_positions %>%
  map(function(pos) {
    ffespn_projections(2021, 0, pos) %>% select(-notes)
  }) %>%
  bind_rows() %>%
  select(new_espn_id = id,
         name = player,
         team) %>%
  mutate(name = clean_player_names(name),
         team = clean_team_abbrs(team))

# Join the ESPN list onto the DynastyProcess IDs by cleaned name and team.
updated_playerids <- dp_playerids() %>%
  mutate(name = clean_player_names(name),
         team = clean_team_abbrs(team)) %>%
  left_join(espn_list, by = c("name", "team"))

espn_match <- updated_playerids %>%
  select(name,
         team,
         position,
         espn_id,
         new_espn_id)
view(espn_match)

check <- espn_match %>%
  filter(!is.na(espn_id) & !is.na(new_espn_id) & espn_id != new_espn_id)

view(check)

Rotoworld uses three types of IDs

Thanks for creating this database, it's very helpful. I've only looked into this a little, but it looks like Rotoworld uses different IDs for the player's page URL, the API calls to get news about the player, and the player's profile image. The ID listed in this database appears to be the profile image ID. It would be helpful to have all three, depending on what the user's goal is. If you're going to choose only one, I might recommend the player URL ID, because you can get the other two from there.

For example, for Aaron Rodgers the URL is: https://www.rotoworld.com/football/nfl/player/7815/aaron-rodgers
The ID for his profile image (the one used in DynastyProcess) is 3118
The ID used for searching news articles is 39071
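If the player-page ID were adopted, it could be pulled straight out of the URL. Here is a small sketch that assumes every Rotoworld player URL follows the same shape as the Aaron Rodgers example above. The pattern is inferred from that single example, and `rotoworld_url_id` is a hypothetical helper name.

```python
import re

# Pattern inferred from one example URL; other Rotoworld URLs may differ.
_PLAYER_URL = re.compile(r"/football/nfl/player/(\d+)/[a-z0-9-]+")


def rotoworld_url_id(url):
    """Return the numeric player-page ID from a Rotoworld URL, or None if absent."""
    match = _PLAYER_URL.search(url)
    return int(match.group(1)) if match else None
```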

[Data Bug]

Describe the data bug
Weekly FantasyPros (FP) ranks not updated since January

Which file is having trouble?
fp_latest_weekly.csv
fp_latest_weekly.rds

Expected data
Updated with the fantasypros week 1 ranks for 2022

[Data Bug] gsis_id not updating automatically

Describe the data bug
There are many gsis_id values missing for new players like Ahmad Gardner in nflreadr::load_ff_playerids(), but these same gsis_id values are available via nflreadr::load_player_stats(). It seems like nflreadr::load_ff_playerids() should fill in gsis_id automatically from nflreadr::load_player_stats() so as to allow for joins to other ID variables.
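The backfill requested here amounts to a keyed lookup that only touches rows where the ID is missing. The sketch below illustrates the idea on plain lists of dicts. The join key (name plus team) is an assumption made for illustration, since the two tables may not share those column names, and `fill_missing_ids` is a hypothetical helper, not part of nflreadr.

```python
def fill_missing_ids(id_rows, stats_rows, key=("name", "team"), target="gsis_id"):
    """Return copies of id_rows with a missing target ID filled from stats_rows.

    Rows are matched on the fields in `key`. Existing IDs are never overwritten,
    and when several stats rows share a key the first one seen wins.
    """
    lookup = {}
    for row in stats_rows:
        k = tuple(row.get(field) for field in key)
        if row.get(target) and None not in k:
            lookup.setdefault(k, row[target])

    filled = []
    for row in id_rows:
        new_row = dict(row)
        if not new_row.get(target):
            new_row[target] = lookup.get(tuple(row.get(field) for field in key))
        filled.append(new_row)
    return filled
```

Matching on names is fragile (suffixes, nicknames, shared names), so any real backfill would likely need name cleaning and a manual review of ambiguous matches.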

[Data Bug] Weekly playerids failing

Describe the data bug
The weekly job for updating playerids is failing. It appears not to have completed successfully since last month. The effect is that some players are associated with the wrong team.

Which file is having trouble?
Which file?
db_playerids.csv

Expected data
What should this look like?
db_playerids.csv updates weekly and gets accurate team info linked for all players.

Duplicate gsis_id in load_ff_playerids

In load_ff_playerids, there are four cases of duplicate gsis_id's. The one I have researched the most is two rows where gsis_id == '00-0016098', both listed as Fred Taylor, with different birthdays and some other data.

I can find almost no information on the 2005 Fred Taylor (for example, this table says he was drafted with the 32nd pick of the 7th round, but he was not, according to the 2005 draft I looked at). The one thing I found using the listed mfl_id was this page on MFL, which matches the given information (including the incorrect draft spot): https://www72.myfantasyleague.com/2000/player?L=0&P=8058. It appears I got lucky by randomly entering 2000 as the year while checking whether you could look up players by ID this way on MFL, as it's the only year where I was able to get it to show up, at -4 years of experience.

I don't know how this list is maintained, or whether it's done automatically such that the above page existing means this row will exist, but I thought I'd bring attention to it. At the very least, the 00-0016098 gsis_id probably shouldn't belong to this row.

The other duplicates are for 00-0019641, 00-0020270, and 00-0029435
Maybe not super useful, but a reproducible list of the duplicates:

nflreadr::load_ff_playerids() %>% 
    filter(!is.na(gsis_id)) %>% 
    group_by(gsis_id) %>% 
    summarize(n = n()) %>% 
    filter(n > 1)
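For anyone checking the CSV outside of R, the same duplicate check can be sketched in Python with the standard library. This assumes the rows have already been read into dicts (for example with `csv.DictReader`) and that missing IDs appear as an empty string, "NA", or None. `duplicate_ids` is a hypothetical helper name.

```python
from collections import Counter


def duplicate_ids(rows, field="gsis_id"):
    """Return {id_value: count} for every non-missing value that appears more than once."""
    counts = Counter(
        row[field] for row in rows if row.get(field) not in (None, "", "NA")
    )
    return {value: n for value, n in counts.items() if n > 1}
```

Passing `field="mfl_id"` or another ID column runs the same check on that column.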

00-0019641: I believe the incorrect one is the 1990 draft year row
00-0020270: I believe the incorrect one is the 1985 draft year row
Like the Fred Taylor case, both rows above have less info and don't show up in the draft for the year they were supposedly drafted.

00-0029435 is the only one of these where both rows seem to be real players. Both have PFR pages, but I think that they were incorrectly given the same gsis_id and pff_id. I believe the 2013 draft_year row should be gsis_id '00-0030236', based on there being rushing play-by-play data for a RB Dennis Johnson in 2013 with that gsis_id.

[Data Request] Missing Tony Jones Jr.

Love your trade calculator for use with my Dynasty league. I was looking at a trade involving the new Saints #2 RB, Tony Jones Jr., and couldn't find him in the system. Is that an oversight or am I doing something wrong? Thanks!

[Data Bug] Players missing from the mapping

Describe the data bug
Several players are not joined in the ID mapping file

Which file is having trouble?
https://github.com/dynastyprocess/data/blob/master/files/db_playerids.csv

Expected data
What should this look like?

player               pos  team  age  draft_year
Saquon Barkley       RB   NYG   NA   NA
Courtland Sutton     WR   DEN   NA   NA
Dak Prescott         QB   DAL   NA   NA
Blake Jarwin         TE   DAL   NA   NA
O.J. Howard          TE   TBB   NA   NA
Tarik Cohen          RB   CHI   NA   NA
Tyrell Williams      WR   LVR   NA   NA
JJ Arcega-Whiteside  WR   PHI   NA   NA
Marlon Mack          RB   IND   NA   NA
Adam Trautman        TE   NOS   NA   NA
Cole Kmet            TE   CHI   NA   NA
James Proche         WR   BAL   NA   NA
Jalen Hurts          QB   PHI   NA   NA
Albert Okwuegbunam   TE   DEN   NA   NA
Jordan Love          QB   GBP   NA   NA
Damien Williams      RB   KCC   NA   NA
Harrison Bryant      TE   CLE   NA   NA

[Data Request]

Describe the data you'd like to have
I'm looking for historical ECR or ADP values. I know how to find the current year's, but is there a way to get last year? Or the year before? Do you keep that kind of archive and if so how far back does it go?

A snapshot at some point before the start of the season would be fine.

Thanks!

[Data Bug] FantasyPros mixing in old rankings

Describe the data bug
The default view of FantasyPros dynasty rankings mixes in some really old rankings (from this summer or farther), "take locking" value

Which file is having trouble?
Player-Values.csv

Expected data
Taylor and CEH were booted from top 15 dynasty assets as an example. Michael Gallup is now worth overall double what he was two days ago etc
