rippinrobr / baseball-stats-db
Tools to help create and maintain databases based on the Baseball Databank Files
License: Apache License 2.0
Currently, the different databases are loaded serially: the files are read and parsed multiple times, and the data is loaded into each db type one at a time. The new process should be
This has a list of all people who have played baseball, along with others who have had something to do with baseball.
This project should contain only backups, schemas, and the SQLite db, with no binaries. I will move it to https://github.com/rippinrobr/sports-stats-utilities
There was an error while attempting to parse and storethe file /Users/robertrowe/src/baseballdatabank/core/FieldingPost.csv Error: line 105, column 9: strconv.ParseInt: parsing "33.0": invalid syntax
Column 9 is InnOuts, which should be an int.
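This failure, and the similar putouts and SB errors reported for FieldingOFsplit.csv and Teams.csv, all come from integer columns exported as whole-number floats such as "33.0". Below is a minimal sketch of a tolerant conversion. It assumes the loader reads each CSV field as a string before converting it. The approach is to try strconv.ParseInt first, fall back to strconv.ParseFloat, and accept the value only if it is a whole number. The helper name parseWholeInt is hypothetical. Mapping empty fields to 0 is also an assumption made to keep the sketch short; a real loader may prefer NULL.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseWholeInt (hypothetical helper) parses an integer CSV field that may
// have been written as a whole-number float, e.g. "33.0".
// Assumption: an empty field is treated as 0.
func parseWholeInt(field string) (int64, error) {
	s := strings.TrimSpace(field)
	if s == "" {
		return 0, nil
	}
	// Fast path: a plain integer such as "33".
	if n, err := strconv.ParseInt(s, 10, 64); err == nil {
		return n, nil
	}
	// Fallback: accept a float only when it has no fractional part.
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("parsing %q as int: %w", field, err)
	}
	if f != math.Trunc(f) { // also rejects NaN, since NaN != NaN
		return 0, fmt.Errorf("parsing %q as int: not a whole number", field)
	}
	if math.Abs(f) > 1<<53 { // also rejects +Inf and -Inf
		return 0, fmt.Errorf("parsing %q as int: value out of range", field)
	}
	return int64(f), nil
}

func main() {
	for _, in := range []string{"33", "33.0", "29.0", "53.5"} {
		n, err := parseWholeInt(in)
		fmt.Println(in, n, err)
	}
}
```

A value like "53.5" is still reported as an error, so a genuinely bad value is not silently truncated.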
Instead of one big release file, I want a separate tgz file for each of the db types.
Hey Rob, great work on this!
I've been trying to make releases locally, but I'm having a few issues.
I'm new to Go and had a bit of trouble getting everything installed properly (go get ./internal/... ./cmd/databank-dbloader/... ./cmd/retrosched-dbloader/... ./cmd/retrogl-dbloader/... finally got me where I needed to be).
Additionally, I've had to alter the -inputdir arguments in the Makefile to use environment variables so I can customize where data sources live on my machine.
I still haven't successfully compiled a release yet, but the further I get into this, the more I think that Dockerizing this tool might be useful for getting anyone up and running quickly. This would allow for a portable Go installation, -inputdir values could all be relative to installation paths within the Docker container, and db types could even have their own Docker image (sqlite, postgres, mysql, mongodb).
I'm willing to put some time into this and work on a PR, but wanted to get your thoughts first.
Since I am now using my csv-to project to load the data, I want this project to contain just the schema files, database backups, and docker files. The proposed structure is:
/baseballdatabank
  /schemas
  /backups
  /dockerfiles
/retrosheet
  /schemas
  /backups
  /dockerfiles
Any new data sources I add would have a similar structure. Each supported database will have its own schema and Dockerfile.
I want this repo to be concerned only with schemas, database backups, and perhaps helpful scripts.
Truncate calls are erring out due to the foreign key relationships.
Need to automate the following processes
The first step in processing the data in parallel is to add the ability to pass in more than one db to load per run, plus the ability to pass in all as an option. This just requires changing the code around db connections.
relates to #23
There was an error while attempting to parse and storethe file /Users/robertrowe/src/baseballdatabank/core/FieldingOFsplit.csv Error: line 25848, column 10: strconv.ParseInt: parsing "29.0": invalid syntax
Column 10 is the putouts column, which is an int, not a float.
A comma delimited list of file names to be parsed and stored
If someone runs the db_loader without running the schema scripts first, they receive an error similar to this:
2018/06/10 04:48:21 Insert error: no such table: people
[followed by 19477 more "Insert error: no such table: people" errors]
I need to write out an error like the ones the Rust compiler creates: something that prints the error above, followed by 'Have you loaded the schema file for your database? Schemas can be found in the schemas directory or in the release files.'
Just downloaded postgres_databank_backup_2018.01.tgz from the releases page and ran the contained SQL. The latest yearID I was able to find in the batting and pitching tables is 2017, but the post on the releases page says it had been updated to include 2018 stats. Any chance you can update the release to include 2018 stats?
Thanks.
Schema-only files haven't been updated in a while. Need to get them updated.
2018/03/30 10:30:20 There was an error while attempting to parse and storethe file /Users/robertrowe/src/baseballdatabank/core/Teams.csv Error: line 11, column 23: strconv.ParseInt: parsing "53.0": invalid syntax
That is the SB column, which should be an integer; there is no way to steal a fraction of a base.