rippinrobr / baseball-stats-db

Tools to help create and maintain databases based on the Baseball Databank Files

License: Apache License 2.0

Makefile 5.44% Go 94.56%
baseball-databank baseball-statistics baseball-players baseball-data

baseball-stats-db's People

Contributors

rippinrobr

Forkers

dtpoole cbwinslow

baseball-stats-db's Issues

Parallelize the DB Loading

Currently, the different databases are loaded serially: the files are read and parsed multiple times, and the data is loaded into each db type one at a time. The new process should:

  • Read the files once
  • Pass the data to each DB type's worker, which is responsible for loading the data into its DB
  • Add a verbose flag so the progress output I print now only appears when I need it

Build in Docker Environment

Hey Rob, great work on this!

I've been trying to make releases locally, but I'm having a few issues.

I'm new to Go, and had a bit of trouble getting everything installed properly (go get ./internal/... ./cmd/databank-dbloader/... ./cmd/retrosched-dbloader/... ./cmd/retrogl-dbloader/... finally got me where I needed to be).

Additionally, I've had to alter -inputdir arguments in the Makefile to utilize environment variables so I can customize where data sources live on my machine.

I still haven't successfully compiled a release, but the further I get into this, the more I think Dockerizing this tool would help anyone get up and running quickly. It would allow for a portable Go installation, -inputdir values could all be relative to installation paths within the Docker container, and db types could even have their own Docker image (sqlite, postgres, mysql, mongodb).

I'm willing to put some time into this and work on a PR, but wanted to get your thoughts first.
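One way to make -inputdir configurable per machine without editing the Makefile is to let the flag's default come from an environment variable. The sketch below is only an illustration: the variable name DATABANK_INPUTDIR and the helper defaultInputDir are assumptions, not something the loaders are known to support.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// defaultInputDir returns the value of the (hypothetical) DATABANK_INPUTDIR
// environment variable, or fallback when the variable is unset or empty.
// getenv is injected so the lookup is easy to test.
func defaultInputDir(getenv func(string) string, fallback string) string {
	if v := getenv("DATABANK_INPUTDIR"); v != "" {
		return v
	}
	return fallback
}

func main() {
	// An explicit -inputdir on the command line still wins over the environment.
	inputDir := flag.String("inputdir", defaultInputDir(os.Getenv, "."),
		"directory containing the CSV files")
	flag.Parse()
	fmt.Println("reading CSV files from", *inputDir)
}
```

Inside a container the same variable could be set once in the image, so every loader command picks up the same path.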

Restructure the project

Since I am now using my csv-to project to load the data, I want this project to contain just the schema files, database backups, and docker files. The proposed structure is:

/baseballdatabank
           /schemas 
           /backups
           /dockerfiles
/retrosheet 
           /schemas
           /backups
           /dockerfiles

Any new data sources I add would have a similar structure. Each supported database will have its own schema and dockerfile.

Create a friendlier error when schema isn't present.

If someone runs the db_loader without running the schema scripts first, they receive an error similar to this:

2018/06/10 04:48:21 Insert error: no such table: people
[followed by 19477 more "Insert error: no such table: people" errors]

I need to write out an error like the ones the Rust compiler creates: something that prints the error above once, followed by "Have you loaded the schema file for your database? Schemas can be found in the 'schemas' directory or in the release files."

latest baseball databank release doesn't include 2018 stats

I just downloaded postgres_databank_backup_2018.01.tgz from the releases page and ran the SQL it contains. The latest yearID I was able to find in the batting and pitching tables is 2017, but the post on the releases page says it had been updated to include 2018 stats. Any chance you can update the release to include 2018 stats?

Thanks.

Update Schema files

The schema-only files haven't been updated for a while and need to be brought up to date.

Loading 2017 baseball databank Teams.csv file throws an error

2018/03/30 10:30:20 There was an error while attempting to parse and storethe file /Users/robertrowe/src/baseballdatabank/core/Teams.csv Error: line 11, column 23: strconv.ParseInt: parsing "53.0": invalid syntax

That is the SB (stolen bases) column, which should be an integer: there is no way to steal a fraction of a base.
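One possible workaround, sketched below, is a more lenient parser for integer columns: try strconv.ParseInt first, then fall back to strconv.ParseFloat and accept the value only if it is a whole number. The function name parseWholeNumber is hypothetical, and this is an illustration rather than the fix the project actually adopted.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseWholeNumber parses s as an integer. It also accepts float notation
// such as "53.0" as long as the value has no fractional part.
func parseWholeNumber(s string) (int64, error) {
	s = strings.TrimSpace(s)
	if n, err := strconv.ParseInt(s, 10, 64); err == nil {
		return n, nil
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, err
	}
	// Reject NaN, infinities, fractions, and values too large for float64
	// to represent exactly as integers (beyond 2^53).
	if math.IsNaN(f) || math.IsInf(f, 0) || f != math.Trunc(f) || math.Abs(f) > 1<<53 {
		return 0, fmt.Errorf("parsing %q: not a whole number", s)
	}
	return int64(f), nil
}

func main() {
	for _, s := range []string{"53", "53.0", "53.5"} {
		n, err := parseWholeNumber(s)
		fmt.Println(s, n, err)
	}
}
```

An empty field still returns an error here, so columns that can be blank would need their own handling.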
