chadwick's People

Contributors

jerryword, tturocy

chadwick's Issues

cwevent does not correctly mark PIT_ID when the starting pitcher is removed partway through the first batter's plate appearance

For Steven Matz's start on 22 May 2022 (PIT202205220), cwevent generates a PIT_ID and RESP_PIT_ID for Angel Rondon for Event 7, when Ke'Bryan Hayes strikes out. The event data shows Matz throwing several pitches before being removed from the game.

id,PIT202205220
version,2
.. removed some lines...
start,edmat001,"Tommy Edman",0,1,6
start,gormn001,"Nolan Gorman",0,2,4
start,arenn001,"Nolan Arenado",0,3,10
start,yepej001,"Juan Yepez",0,4,3
start,donob001,"Brendan Donovan",0,5,9
start,sosae001,"Edmundo Sosa",0,6,5
start,dickc002,"Corey Dickerson",0,7,7
start,kniza001,"Andrew Knizner",0,8,2
start,badeh001,"Harrison Bader",0,9,8
start,matzs001,"Steven Matz",0,0,1
start,hayek001,"Ke'Bryan Hayes",1,1,5
start,reynb001,"Bryan Reynolds",1,2,8
start,chavm001,"Michael Chavis",1,3,3
start,gameb001,"Ben Gamel",1,4,7
start,castd004,"Diego Castillo",1,5,9
start,tsuty001,"Yoshi Tsutsugo",1,6,10
start,castr006,"Rodolfo Castro",1,7,6
start,vanmj001,"Josh VanMeter",1,8,4
start,heint001,"Tyler Heineman",1,9,2
start,wilsb003,"Bryse Wilson",1,0,1
play,1,0,edmat001,12,BFFX,S9/G4
play,1,0,gormn001,12,CBS>B,SB2
play,1,0,gormn001,22,CBS>B.FFX,D8/L8XD+.2-H
play,1,0,arenn001,00,X,9/F89XD
play,1,0,yepej001,12,CBSS,K
play,1,0,donob001,01,CX,7/F7L
play,1,1,hayek001,22,CFBB,NP
sub,ronda001,"Angel Rondon",0,0,1
play,1,1,hayek001,22,CFBB.T,K
play,1,1,reynb001,32,BBBCFB,W

No output for any year using cwevent

When I run the following command from the Command Prompt on Windows 11:

start cwevent -y 2023 -f 0-96 2023*.EV* > all2023.csv

I get the file I requested and no errors, but the file is 0 KB and contains nothing. I have all the .ROS and .EV* files for 2023 unzipped in the working directory, along with the cwevent executable. Any suggestions or insights on how to get output would be appreciated.

batter/runner fate code 7 for scoreless inning

I think I found one for real this time. chadwick-0.10.0. Events 79-81 of game ATL202205150

Bottom of the 10th. The automatic runner is out on a leadoff FC. After a strikeout, Olson singles, goes to 2nd on a defensive indifference, and is stranded when Riley lines out. No runs scored in the inning. The batter/runner fate code is 7 for Olson for all 3 events he is involved in.

Error-checking in boxscore generation from boxscore event files

In cwlib/box.c, no check is made that the expected number of data items is actually present on each record. This leads to crashes on the rare occasions when boxscore event files contain formatting errors.

Checks should be added to confirm that the number of data items is as expected, raising an error and terminating if it is not.
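
As a rough illustration of such a check, the sketch below counts the comma-separated fields on a record line before any of them are read. It is not the code in cwlib/box.c: the function names `count_fields` and `record_has_fields` are hypothetical, and the sketch assumes that commas inside double-quoted strings should not count as separators.

```c
#include <stddef.h>

/* Hypothetical helper: count comma-separated fields in one record line.
 * Commas inside double-quoted strings are not treated as separators. */
int count_fields(const char *line)
{
    int fields = 1;
    int in_quotes = 0;

    if (line == NULL || *line == '\0') {
        return 0;
    }
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == '"') {
            in_quotes = !in_quotes;
        } else if (*p == ',' && !in_quotes) {
            fields++;
        }
    }
    return fields;
}

/* Returns 1 if the record has exactly the expected number of fields. */
int record_has_fields(const char *line, int expected)
{
    return count_fields(line) == expected;
}
```

A caller would report an error and terminate when `record_has_fields()` returns 0, instead of indexing into fields that are not there.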

Clean up data model for boxscores

The boxscore data structure (see src/cwlib/box.h) has one collection of records for batting and fielding, and a separate collection for pitching. While having the separate linked list of pitching stints in order is useful, it would also be useful to have a direct link between CWBoxPlayer entries and corresponding CWBoxPitching entries.

Note that a bit of care may need to be taken in the case of a player who pitches, moves to another position, and then returns to pitcher.

FEAT: Command line switch for path to data

All of the command-line tools assume the input data are all in the current working directory.

For some workflows, it could be convenient to have the tools instead access data from a different directory than cwd.

This would be implemented as a command-line option. It should apply across all of the tools (so the place to start is cwtools.c).

Consider minimising copying of runner data

From 6f0f040 the implementation of tracking runner data has been reorganised, with a separate data structure collecting the data by base.

At present a lot of copying is done:

  • Copying of the IDs of runner, pitcher, and catcher into the structure;
  • Copying of data between bases (instead of manipulation).

In principle, the game state could instead use pointers that refer back to strings allocated in the original game, rather than allocating and copying strings. The game state only makes sense in reference to a game, and using a game state after its game has been deallocated would already be an error.

This would also require changing lineups from copying to pointer manipulation.

This should be straightforward to accomplish, but will require some testing to be careful of memory management. It would also have as a side effect the benefit of removing the fixed-width buffers (of 50 characters) for baserunner/pitcher/catcher IDs, which is one of the few places there's any sort of fixed-width restriction in the codebase.

This might also have some performance benefits, as it would eliminate a fair amount of copying as well as in some places allocation/deallocation, although the library is fast enough as is that the benefits of being slightly faster ought to be marginal. So this is very much a rainy-day sort of project.

Enhancement for a field to indicate the BAT_ID and PIT_ID at the start of the event

Both will have value, and the PIT_ID version would address a situation where it is impossible to derive the correct Games Started for a player using the resulting data from cwevent.

Create a field that identifies the pitcher at the start of an event, so that one can distinguish a game like Blake Snell's (ARI202204100), where MLB did not credit him with a start, from Steven Matz's start on 22 May 2022 (PIT202205220), where MLB did credit him with a start.

cwevent generates a PIT_ID and RESP_PIT_ID for Angel Rondon for Event 7 of PIT202205220, when Ke'Bryan Hayes strikes out. The event data shows Matz throwing several pitches before being removed from the game. The generated Event and Game data for PIT202205220 and ARI202204100 make it impossible to derive Games Started correctly for both Snell and Matz.

Currently, there are two ways of reading the data that cwevent and cwgame provide, and each gives an incorrect result:

  • If you assume that the Game data indicates the scheduled starter pitched, then Snell will incorrectly be given a start.
  • If you assume that the pitcher in the first event against the home or away batters is the starting pitcher, then Matz will incorrectly not be credited with a start.

id,PIT202205220
version,2
.. removed some lines...
start,edmat001,"Tommy Edman",0,1,6
start,gormn001,"Nolan Gorman",0,2,4
start,arenn001,"Nolan Arenado",0,3,10
start,yepej001,"Juan Yepez",0,4,3
start,donob001,"Brendan Donovan",0,5,9
start,sosae001,"Edmundo Sosa",0,6,5
start,dickc002,"Corey Dickerson",0,7,7
start,kniza001,"Andrew Knizner",0,8,2
start,badeh001,"Harrison Bader",0,9,8
start,matzs001,"Steven Matz",0,0,1
start,hayek001,"Ke'Bryan Hayes",1,1,5
start,reynb001,"Bryan Reynolds",1,2,8
start,chavm001,"Michael Chavis",1,3,3
start,gameb001,"Ben Gamel",1,4,7
start,castd004,"Diego Castillo",1,5,9
start,tsuty001,"Yoshi Tsutsugo",1,6,10
start,castr006,"Rodolfo Castro",1,7,6
start,vanmj001,"Josh VanMeter",1,8,4
start,heint001,"Tyler Heineman",1,9,2
start,wilsb003,"Bryse Wilson",1,0,1
play,1,0,edmat001,12,BFFX,S9/G4
play,1,0,gormn001,12,CBS>B,SB2
play,1,0,gormn001,22,CBS>B.FFX,D8/L8XD+.2-H
play,1,0,arenn001,00,X,9/F89XD
play,1,0,yepej001,12,CBSS,K
play,1,0,donob001,01,CX,7/F7L
play,1,1,hayek001,22,CFBB,NP
sub,ronda001,"Angel Rondon",0,0,1
play,1,1,hayek001,22,CFBB.T,K
play,1,1,reynb001,32,BBBCFB,W

Incorrect position for pinch-runners in extended cwevent output

The extended cwevent fields reporting the position of baserunners are sometimes incorrect for pinch-runners, reporting a position of 0 instead of 12. This is caused by an interaction with the logic to avoid crediting PH or PR stats when a player comes up again after the team has batted around.

Remove unsafe uses of atoi()

The file-reading routines use atoi() to convert text to integers where integers are expected. This is not totally safe: atoi() gives no way to detect errors, returning 0 when the argument is not numeric, and its behavior is undefined if the value is out of range for an int.

Replace calls to atoi() with a safer approach that raises an error if the argument is not a valid integer.
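
One common way to do this in standard C is to wrap strtol() and check both the end pointer and errno. The sketch below is only an illustration of that approach; the name `parse_int` and its return convention are assumptions, not part of Chadwick.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical replacement for atoi(): returns 1 and stores the value in
 * *out if text is a complete, in-range decimal integer; returns 0 otherwise.
 * Like strtol(), it accepts leading whitespace and an optional sign. */
int parse_int(const char *text, int *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') {
        return 0;
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        return 0;   /* no digits at all, or trailing characters */
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return 0;   /* out of range for int */
    }
    *out = (int) value;
    return 1;
}
```

With this shape, a field such as `unknown` is rejected explicitly and the caller decides whether to warn, substitute a default, or stop.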

extra-inning runners missing from runner on base and runner fate fields

XIPR (as Tango calls them) are missing from the Runner on base ID fields, and only show up in the runner fate fields on the event where they actually scored. They show up fine in all the runner destination fields, as far as I can tell.

I haven't seen any complaints about this so I'm suspicious it might be a me problem. I only ran cwevent.

Support for manual reassignment for pitcher responsibility

There are ambiguous cases for assigning pitcher responsibility, where official statistics do not match what BEVENT/cwevent generate.

Implement a system for manually changing responsibility. Presumably this should be done using comments so as not to interfere with possible future DiamondWare extensions.

Extend cwgame to report gametype field

Retrosheet have introduced a new info,gametype field. This will be reported by BGAME as field 84, and the value will be the same text string as in the info record. It will not be included in the default fields for BGAME (becoming the first such field for BGAME).

Pitcher Wins in cwdaily

Hi! Thanks for this awesome library. I'm really enjoying the cwdaily tool - it's super fast and the CSV output is great. Is it possible to output pitcher wins (e.g. "P_WIN") as one of the fields? Or is there another tool I should be using for this stat?

Process new game metadata

The 2020 Retrosheet release includes new metadata for scheduled game length and tiebreaker rules.

Propose adding these to cwgame:
innings: SCHED_INN_CT
tiebreaker: TIEBREAK_CD

README.md: installation instructions

The installation instructions are incomplete. They should look like:

  1. sudo apt install libtool-bin
  2. libtoolize
  3. ./ltconfig ltmain.sh
  4. aclocal
  5. autoconf
  6. automake
  7. ./configure
  8. make
  9. sudo make install

Force flag on non-GDP ground double plays

On ground double plays that are not charged as GDP due to runner interference, the force flag in the cwevent extended flags is not being set.

As an example, PHI201205070 has 46(1)3/G/DP/RINT

cwevent not working in ubuntu in WSL

Hello,

I downloaded and installed chadwick-0.1.0 exactly as described in the instructions. However, when I run

cwevent -y 2023 -f 0-96 2023*.EV* > all2023.csv

I get the following error:

cwevent: error while loading shared libraries: libchadwick.so.0: cannot open shared object file: No such file or directory

There is no information on this type of error in the Chadwick documentation.

Suppressing warning messages

Is there a way to suppress warning messages generated by the cwtools? cwgame is emitting "Warning: Invalid integer value 'unknown'", which causes my downstream process to fail.

Proper CSV output for tools by escaping quotes

The DiamondWare file format does not permit embedded quotes (") in text strings. However, occasionally one does slip through in files.

Chadwick takes the position that output is undefined for invalid input files, although attempts are made to muddle through as best as possible. In particular, the parser does tolerate such embedded quotes.

However, when there are strings with embedded quotes, these are output by the command-line tools with the quotes not escaped. As a result, the output files are not valid in standard CSV dialects.

It would be nice to do something sensible that resulted in clean CSV files in this case. The most straightforward solution would be to escape any embedded double-quotes on output. However this would require changing code in a substantial number of places. Further, one of the objectives of the command-line tools is that they are fast - any change to implement escaping of embedded double-quotes should not have a significant overhead that would slow output down in the case in which there aren't any - because after all there "shouldn't" be any in well-formatted input files.
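
To make the trade-off concrete, here is a minimal sketch of escaping on output in the usual CSV way, by doubling embedded double quotes. The function name `write_csv_field` is hypothetical, and the sketch assumes the tools write quoted text fields to a `FILE *`. When a string contains no quotes, the only extra work is a single strchr() scan before the string is written in one call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write text as a double-quoted CSV field, doubling
 * any embedded double quotes. */
void write_csv_field(FILE *out, const char *text)
{
    const char *quote;

    fputc('"', out);
    /* Each pass writes the run before the next embedded quote, then the
     * doubled quote. Strings with no quotes skip the loop entirely. */
    while ((quote = strchr(text, '"')) != NULL) {
        fwrite(text, 1, (size_t) (quote - text), out);
        fputs("\"\"", out);
        text = quote + 1;
    }
    fputs(text, out);
    fputc('"', out);
}
```

Centralising output in one helper like this would still mean touching every place that currently prints a quoted string, which is the cost noted above.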

Add clear indication of game-starting pitcher in cwbox -S

SportsML (as of 2.2) lacks a mechanism for reporting games started by pitchers - this is an unfortunate omission in that spec.

Once this is resolved, cwbox -S should report the identity of the starting pitchers in the appropriate way. At present one can try to infer this from the player status and player position metadata fields - but this is fragile and may not work in edge cases where pitchers are replaced before facing a batter.

Additional range checking when parsing boxscore event files

When parsing boxscore event files in box.c:cw_box_process_boxscore_file(), team numbers (0 or 1), batting order slot numbers, and positions should be checked to confirm they are in the valid range. It would be acceptable to terminate the program on an error, as these should not happen.
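
A small generic helper is probably enough for this. The sketch below is illustrative only: `in_range` is a hypothetical name, and the slot and position limits shown in the usage note are assumptions rather than values taken from box.c.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if lo <= value <= hi; otherwise prints a
 * message naming the offending field to stderr and returns 0. */
int in_range(const char *what, int value, int lo, int hi)
{
    if (value < lo || value > hi) {
        fprintf(stderr, "Error: %s %d outside valid range %d-%d\n",
                what, value, lo, hi);
        return 0;
    }
    return 1;
}
```

A parser could then call, for example, `in_range("team", team, 0, 1)`, `in_range("slot", slot, 0, 9)`, and `in_range("position", pos, 1, 12)`, and exit if any of them returns 0.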

Extend cwbox -S with fielding stats by position

The SportsML output from cwbox only gives aggregate fielding statistics. This is silly from a baseball perspective, but at the time it was originally written, scoped fielding statistics weren't properly fleshed out in the spec.

Now that there is by-position scoping, the SportsML output should be updated to report fielding stats by position.

generalizing for manual retrosheets

I am looking to use this tool for aggregating data for a non-MLB team.

This may be tricky: how does one record an “extra hitter” position in a sheet? It is equivalent to a fielding position, but the player only bats. If I give this position an unused position number, does it break Chadwick? Does a lineup of 10 batters break it?

Need Help with Install

I am on Windows and get this when I run baseballr::chadwick_ld_library_path():
Warning: running command '"find" C:/Users/jense/OneDrive/Documents/R/Baseball Analytics/Projects -name "libchadwick*"' had status 1
Error in if (!chadwick_find_lib() %in% old_ld_library_paths) { :
the condition has length > 1

baseballr::chadwick_is_installed is TRUE so I don't know what I've done wrong.

Add extended linescore data to boxscore generation

The boxscore data structure should be extended to capture a full linescore, not just with runs per inning but also hits, errors, double plays, and LOB, as well as a function to report the length of the game in innings.

Refactor some functions in cwevent/cwgame for DRY

There are some functions for generating the columns in cwevent and cwgame which arguably violate DRY - for example, one function which applies to each of the three runners on base, which differ only in a parameter (1, 2, 3). These could be re-factored so the logic appears once, and is suitably parameterised.

Because there are only three bases, the benefit would be minimal: it would not save much code, and it does not open up the kind of generalisation that DRY usually would.
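
The shape of such a refactoring might look like the sketch below. Everything in it is hypothetical: the `GameState` struct is a simplified stand-in rather than the library's real type, and the function names are invented for illustration. It assumes each output column is produced by its own function, so thin wrappers are kept around a single parameterised function.

```c
#include <stddef.h>

/* Hypothetical, simplified game state: runner IDs indexed by base 1-3. */
typedef struct {
    const char *runners[4];   /* index 0 unused */
} GameState;

/* Shared logic, written once and parameterised by base. */
const char *runner_on_base(const GameState *state, int base)
{
    if (base < 1 || base > 3 || state->runners[base] == NULL) {
        return "";
    }
    return state->runners[base];
}

/* Thin per-column wrappers, so code that expects one function per field
 * does not need to change. */
const char *first_runner(const GameState *s)  { return runner_on_base(s, 1); }
const char *second_runner(const GameState *s) { return runner_on_base(s, 2); }
const char *third_runner(const GameState *s)  { return runner_on_base(s, 3); }
```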

This could be a good mini-project for someone who wants a bit of practice refactoring C code and working within GitHub, to make a contribution to the implementation.

Make -y switch optional in command-line tools

It appears the DiamondWare tools operate correctly even when the -y switch is not specified (presumably because they manage to infer the year from the game date).

It would be useful to investigate implementing such a facility, where year would be inferred if -y is not specified.
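
One possible source for the year is the game ID itself, which in the examples on this page (such as PIT202205220) consists of a three-character team code, an eight-digit date, and a game number. The sketch below assumes that 12-character layout; `year_from_game_id` is a hypothetical name, and reading the year from the game's date record instead would be an equally plausible approach.

```c
#include <ctype.h>
#include <string.h>

/* Returns the four-digit year embedded in a game ID such as "PIT202205220"
 * (team code, then YYYYMMDD, then a game number), or -1 if the ID does not
 * have that shape. */
int year_from_game_id(const char *game_id)
{
    int year = 0;

    if (game_id == NULL || strlen(game_id) != 12) {
        return -1;
    }
    for (int i = 3; i < 7; i++) {
        if (!isdigit((unsigned char) game_id[i])) {
            return -1;
        }
        year = year * 10 + (game_id[i] - '0');
    }
    return year;
}
```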

Problems compiling chadwick 0.9.1 on Linux

I downloaded the source code for 0.9.1 to my Ubuntu system, but it doesn't include a configure file to build the makefiles. I tried generating one with autoconf, but get an error beginning with "configure.ac:28: error: possibly undefined macro: AM_INIT_AUTOMAKE".

I can't seem to locate any documentation for building the package that goes deeper than "./configure; make; make install". Pointers to docs or assistance in building this version would be greatly appreciated.

Fall 2013 Release Update

I saw that the 2013 fall release came out earlier this week.

@tturocy is there anything I can do to help the release make its way over here?

cwbox team LOB issue for auto runner in extras (2020/2021)

Hey there, I'm a developer over at Sports Reference and was looking into an issue on our boxes where team LOB did not equal what we were seeing on the MLB website for games that have an auto runner on second in extras.

From the looks of it, when running cwbox (or any of the tools, I suppose) I don't think we are incrementing gameiter->state->num_auto_runners for either team. I could be wrong, though, as I have never worked with C before and could be missing something. But I do know that the resulting LOB differs from MLB's box when running cwbox.

MLB
bref (You can ignore the PBP LOB as it's not related to this issue)

This is the retrosheet ID of the game I've been testing on - ANA202008150

Missing detail in cwbox output for POCS plays

For some plays coded like POCS2(136), cwbox generates a line like this:

<caughtstealing runner="sosas001" pitcher="ruscg001" catcher="osikk001" inning="3" half="1" base="2" pickoff="0"/>

Primarily, pickoff="0" is inaccurate, but we also don't get any information about the other fielders involved.

I've noticed that this does not happen with every single play coded as POCS2(136) so there must be another factor involved, but I'm not sure what it is. CHN200305070 is a good example game to test with to see the issue with this line (produces the above sample output):

play,6,1,bellm002,00,1,POCS2(136)

Steps to reproduce:

wget https://www.retrosheet.org/events/2003eve.zip
unzip 2003eve.zip CHN2003.ROS MIL2003.ROS 2003CHN.EVN TEAM2003
cwbox -i CHN200305070 -y 2003 -X 2003CHN.EVN

Statistics credits when batter out advancing after strikeout, putout by catcher

By chance I happened upon this play string in a game: K+WP.BX3(E2/TH)(2). The parser does not award a putout to the catcher or a third of an inning to the pitcher. The reason is that the parser specifically suppresses putouts by the catcher on a dropped third strike - except of course in very rare circumstances the catcher winds up making the putout anyway!

I am guessing this may never have happened in the Retrosheet corpus; otherwise I assume I would have noticed it in regression testing...

Distinguish 0-0 counts from missing count data

There are 150,000+ plays for which Retrosheet has count data, but no pitch sequence data. For at bats that are at least two pitches long, it is easy to distinguish between completely missing pitch data and partial pitch data. However, because missing ball/strike counts are interpolated with 0, it doesn't seem to be possible to distinguish the ?? case from the 00 case once the data has been processed by cwevent. (See BOS191210090 for a few examples). Would it be possible to add a flag or code indicating the presence/absence of count data?
