matsonj / nba-monte-carlo

Monte Carlo simulation of the NBA season, leveraging dbt, duckdb and evidence.dev

Home Page: https://www.mdsinabox.com

License: MIT License

Makefile 8.33% Dockerfile 3.76% Python 52.84% Svelte 4.75% PLpgSQL 30.33%

nba-monte-carlo's Introduction

Current progress: "Serverless BI"

The latest version of the project is available at mdsinabox.com. The website embraces the notion of "Serverless BI" - the pages are built asynchronously with open source software on commodity hardware and then pushed to a static site. The github action that automatically deploys the site upon PR can be found here.

MDS-in-a-box

This project serves as an end-to-end example of running the "Modern Data Stack" on a single node. The components are designed to be "hot swappable", using a Makefile to create clearly defined interfaces between discrete components in the stack. It runs in many environments with many visualization options. In addition, the data transformation documentation is self-hosted on GitHub Pages.

Many Environments

It runs practically anywhere, and has been tested in the environments below.

Tested configurations (Local, Docker, Devcontainer, and Docker in Devcontainer):

  • Windows (w/WSL) - Docker in Devcontainer is n/a
  • Mac (Ventura)
  • Linux (Ubuntu 20.04)

Beautiful serving layer

(screenshots of the serving layer omitted)

It can also be explored live at mdsinabox.com.

Getting Started

Building MDS-in-a-box in Github Codespaces

Want to try MDS-in-a-box right away? Create a Codespace:


You can run the project in the Codespace with the following command:

make build run

You will need to wait for the pipeline to run and the Evidence configuration to complete. The 4-core Codespace performed significantly better in testing, and is recommended for a better experience.

Once the build completes, you can access the Evidence dashboard by clicking the Open in Browser button on the Ports tab, and log in with the username "admin" and the password "password".

Codespaces also supports "Docker-in-docker", so you can run docker inside the codespace with the following command:

make docker-build docker-run-evidence

Building MDS-in-a-box in Windows

  1. Create your WSL environment. Open a PowerShell terminal running as an administrator and execute:
wsl --install
  • If this was the first time WSL has been installed, restart your machine.
  2. Open Ubuntu in your terminal and update your packages.
sudo apt-get update
  3. Install Python 3.
sudo apt-get install python3.9 python3-pip python3.9-venv
  4. Clone this repo.
mkdir my_projects
cd my_projects
git clone https://github.com/matsonj/nba-monte-carlo.git
# Go one folder level down into the folder that git just created
cd nba-monte-carlo
  5. Build the project.
make build run

Make sure to open up Evidence when prompted (default location is 127.0.0.1:8088). The username is "admin" and the password is "password".

Using Docker

You can build a docker container by running:

make docker-build

Then run the container using

make docker-run-evidence

These are both aliases defined in the Makefile:

docker-build:
	docker build -t mdsbox .

docker-run-evidence:
	docker run \
		--publish 8088:8088 \
		--env MDS_SCENARIOS=10000 \
		--env MDS_INCLUDE_ACTUALS=true \
		--env MDS_LATEST_RATINGS=true \
		--env MDS_ENABLE_EXPORT=true \
		--env ENVIRONMENT=docker \
		mdsbox make run serve

Notes on Design Choices

DuckDB as compute engine

Using DuckDB keeps install and config very simple - it's a single command and runs everywhere. It also, frankly, covers for the sin of building a Monte Carlo simulation in SQL - it would be quite slow without the kind of compute that DuckDB provides.
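For intuition, the core of such a simulation can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the project's actual dbt/SQL model: it assumes the standard Elo win-probability formula and ignores home-court advantage and in-run rating updates.

```python
import random


def elo_win_prob(elo_a: float, elo_b: float) -> float:
    """Probability that team A beats team B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def simulate_matchup(elo_a: float, elo_b: float,
                     scenarios: int = 10_000, seed: int = 42) -> float:
    """Fraction of simulated games that team A wins."""
    rng = random.Random(seed)
    p = elo_win_prob(elo_a, elo_b)
    wins = sum(1 for _ in range(scenarios) if rng.random() < p)
    return wins / scenarios
```

For two teams 200 Elo points apart, the simulated win rate converges toward elo_win_prob(1700, 1500) ≈ 0.76 as the scenario count grows - which is also why a SQL implementation needs a fast engine: the real model repeats this across every game of the season for thousands of scenarios.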

Postgres was also considered for this project, but running Postgres on the same node as the rest of the data stack is not a great pattern.

Using Parquet instead of a database

This project leverages Parquet in addition to the DuckDB database for file storage. This is experimental, and the implementation will evolve over time - especially as the DuckDB format continues to evolve and Iceberg/Delta support is added to DuckDB.

External Tables

dbt-duckdb supports external tables, which are parquet files exported to the data_catalog folder. This allows easier integration with Rill, for example, which can read the parquet files and transform them directly with its own DuckDB implementation.

What's next?

To-dos

  • clean up env vars + implement incremental builds
  • submit your PR or open an issue!

Source Data

The data contained within this project comes from Pro Football Reference, Sports Reference (CFB), Basketball Reference, and DraftKings.

nba-monte-carlo's People

Contributors

aaronsteers, alex-monahan, archiewood, charliermarsh, gregwdata, matsonj, mermelstein, mycaule, pedramnavid, richardscottoz, tayloramurphy, ud3sh


nba-monte-carlo's Issues

duckdb version issue on superset using docker

Environment: MacOS Ventura 13.4.1

Command: make docker-run-superset

Description: When I run the docker container for the superset visuals, I get the error shown below in the dashboard. I tried setting the duckdb-engine version to 0.7.1 in the meltano.yml file, but it still didn't work.
(error screenshot omitted)

IOError from DuckDB when run with Superset

First of all, this is a great project! However, I ran into a minor error on my local machine.

I cloned the project and ran it with make docker-run-superset. Everything went well until I visited the dashboard in Superset. Then I got the following error:

Error: (duckdb.IOException) IO Error: Trying to read a database file with version number 51, but we can only read version 43.
The database file was created with an newer version of DuckDB.

The storage of DuckDB is not yet stable; newer versions of DuckDB cannot read old database files and vice versa.
The storage will be stabilized when version 1.0 releases.

For now, we recommend that you load the database file in a supported version of DuckDB, and use the EXPORT DATABASE command followed by IMPORT DATABASE on the current version of DuckDB.

See the storage page for more information: https://duckdb.org/internals/storage
(Background on this error at: http://sqlalche.me/e/13/e3q8)

Definitely a backward incompatibility with DuckDB but I don't know what DuckDB version I should use.
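One way to find out which storage version wrote a given file, without trial-and-error across DuckDB builds, is to read the file's main header directly. The sketch below assumes the pre-1.0 storage layout described on the DuckDB storage page (an 8-byte checksum, the magic bytes DUCK, then a little-endian 8-byte storage version number); treat the offsets as an assumption and verify them against the storage docs for your file.

```python
import struct


def duckdb_storage_version(path: str) -> int:
    """Read the storage version from a DuckDB database file.

    Assumed pre-1.0 main-header layout: 8-byte checksum,
    b'DUCK' magic, then a little-endian uint64 storage version.
    """
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[8:12] != b"DUCK":
        raise ValueError(f"{path} does not look like a DuckDB database file")
    return struct.unpack("<Q", header[12:20])[0]
```

A file reporting version 51 here matches the "version number 51" in the error above; the fix is still the EXPORT DATABASE / IMPORT DATABASE round-trip the message recommends, run from a DuckDB build new enough to open the file.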

Rework how parquet export is orchestrated.

Today there is a pretty janky parquet integration. What I think is a better way to do this is something like the following:

  1. use target-parquet in meltano, and remove target-duckdb. This allows a cleaner interface into duckdb, and also hedges against stability problems with the duckdb database format.
  2. add a dbt tag (?) to identify models to export.
  3. add a var to dbt_project.yml for export_to_parquet: true (or false) if you don't want to export.
  4. run the entire project inside of duckdb.
  5. run a macro to export files, either as a run-operation or via on-run-end. The macro would grab a list of all models with a certain tag, then loop through exporting them one at a time.

This allows for a much cleaner set of dbt models. It is also a clear hand-off into some other system to handle the parquet files. As an example, you could orchestrate this pretty cleanly with meltano invoking the run-operation after dbt build completes.
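The export macro in step 5 boils down to: collect the models carrying a given tag, then emit one DuckDB COPY ... (FORMAT PARQUET) statement per model. A pure-Python sketch of that loop is below; the model names, tag name, and output directory are hypothetical stand-ins, and the real version would be a dbt macro iterating over the project graph.

```python
def export_statements(models: dict[str, set[str]],
                      tag: str = "export",
                      out_dir: str = "data/data_catalog") -> list[str]:
    """Build one DuckDB COPY statement per model carrying `tag`.

    `models` maps model name -> set of dbt tags (an illustrative
    stand-in for walking the dbt graph inside a macro).
    """
    return [
        f"COPY {name} TO '{out_dir}/{name}.parquet' (FORMAT PARQUET)"
        for name, tags in sorted(models.items())
        if tag in tags
    ]
```

Calling export_statements({"reg_season_end": {"export"}, "scratch": {"tmp"}}) yields a single COPY statement for reg_season_end and skips the untagged model.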

automate ingestion

Currently I am updating by hand. Need to figure out a way to get the CSV sources updated w/ Python.

meltano pipeline error

I'm attempting to run on a Mac within pyenv (Python version 3.8.15). No problem running 'make build'. However, I'm running into an error running 'make pipeline'. See the Meltano logs below. It seems like it might relate to a Python 3.8 multiprocessing issue; multiprocessing is a dependency of target-parquet. Any tips?

Meltano Logs

2022-11-15T22:36:12.680915Z [info ] Performing full refresh, ignoring state left behind by any previous runs.
2022-11-15T22:36:13.985106Z [info ] INFO Using supplied catalog /Users//dev/meltano-projects/nba-monte-carlo/.meltano/run/tap-spreadsheets-anywhere/tap.properties.json. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:13.985599Z [info ] INFO Processing 4 selected streams from Catalog cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:13.985788Z [info ] INFO Syncing stream:nba_schedule_2023 cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:13.995274Z [info ] INFO Walking ./data. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:13.995459Z [info ] INFO Found 3 files. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:13.995631Z [info ] INFO Checking 3 resolved objects for any that match regular expression "nba_schedule_2023.csv" and were modified since 2001-01-01 00:00:00+00:00 cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:13.998637Z [info ] INFO Processing 1 resolved objects that met our criteria. Enable debug verbosity logging for more details. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:13.998865Z [info ] INFO Syncing file "nba_schedule_2023.csv". cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.109560Z [info ] INFO Sending version information to singer.io. To disable sending anonymous usage data, set the config parameter "disable_collection" to true cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.110472Z [info ] INFO writing streams in separate folders cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.126474Z [info ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.126637Z [info ] File "/Users//dev/meltano-projects/nba-monte-carlo/.meltano/loaders/target-parquet/venv/bin/target-parquet", line 8, in cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.126707Z [info ] sys.exit(main()) cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.126767Z [info ] File "/Users//dev/meltano-projects/nba-monte-carlo/.meltano/loaders/target-parquet/venv/lib/python3.8/site-packages/target_parquet/init.py", line 274, in main cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.126826Z [info ] state = persist_messages( cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.126882Z [info ] File "/Users//dev/meltano-projects/nba-monte-carlo/.meltano/loaders/target-parquet/venv/lib/python3.8/site-packages/target_parquet/init.py", line 225, in persist_messages cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.126936Z [info ] t2.start() cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.126989Z [info ] File "/Users//.pyenv/versions/3.8.15/lib/python3.8/multiprocessing/process.py", line 121, in start cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127063Z [info ] self._popen = self._Popen(self) cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127114Z [info ] File "/Users//.pyenv/versions/3.8.15/lib/python3.8/multiprocessing/context.py", line 224, in _Popen cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127167Z [info ] return _default_context.get_context().Process._Popen(process_obj) cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127219Z [info ] File "/Users//.pyenv/versions/3.8.15/lib/python3.8/multiprocessing/context.py", line 284, in _Popen cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127268Z [info ] return Popen(process_obj) cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127317Z [info ] File "/Users//.pyenv/versions/3.8.15/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127366Z [info ] super().init(process_obj) cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127415Z [info ] File "/Users//.pyenv/versions/3.8.15/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127466Z [info ] self._launch(process_obj) cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127514Z [info ] File "/Users//.pyenv/versions/3.8.15/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127565Z [info ] reduction.dump(process_obj, fp) cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127614Z [info ] File "/Users//.pyenv/versions/3.8.15/lib/python3.8/multiprocessing/reduction.py", line 60, in dump cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127663Z [info ] ForkingPickler(file, protocol).dump(obj) cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.127713Z [info ] AttributeError: Can't pickle local object 'persist_messages..consumer' cmd_type=elb consumer=True name=target-parquet producer=False stdio=stderr string_id=target-parquet
2022-11-15T22:36:14.142130Z [info ] INFO Wrote 1341 records for stream "nba_schedule_2023". cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.142359Z [info ] INFO Syncing stream:team_ratings cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.142598Z [info ] INFO Walking ./data. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.142795Z [info ] INFO Found 3 files. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.143263Z [info ] INFO Checking 3 resolved objects for any that match regular expression "team_ratings.csv" and were modified since 2001-01-01 00:00:00+00:00 cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.143363Z [info ] INFO Processing 1 resolved objects that met our criteria. Enable debug verbosity logging for more details. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.143426Z [info ] INFO Syncing file "team_ratings.csv". cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.144764Z [info ] INFO Wrote 30 records for stream "team_ratings". cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.144874Z [info ] INFO Syncing stream:xf_series_to_seed cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.145051Z [info ] INFO Walking ./data. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.145115Z [info ] INFO Found 3 files. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.145438Z [info ] INFO Checking 3 resolved objects for any that match regular expression "xf_series_to_seed.csv" and were modified since 2001-01-01 00:00:00+00:00 cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.145510Z [info ] INFO Processing 1 resolved objects that met our criteria. Enable debug verbosity logging for more details. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.145568Z [info ] INFO Syncing file "xf_series_to_seed.csv". cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.145835Z [info ] INFO Wrote 14 records for stream "xf_series_to_seed". cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.145935Z [info ] INFO Syncing stream:nba_elo_latest cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.146058Z [info ] INFO Assembled https://projects.fivethirtyeight.com/nba-model/nba_elo_latest.csv as the URL to a source file. cmd_type=elb consumer=False name=tap-spreadsheets-anywhere producer=True stdio=stderr string_id=tap-spreadsheets-anywhere
2022-11-15T22:36:14.337744Z [error ] Loader failed
Traceback (most recent call last):
File "/Users//.pyenv/versions/3.8.15/envs/meltano/lib/python3.8/site-packages/meltano/core/logging/output_logger.py", line 201, in redirect_logging
yield
File "/Users//.pyenv/versions/3.8.15/envs/meltano/lib/python3.8/site-packages/meltano/core/block/extract_load.py", line 457, in run
await self.run_with_job()
File "/Users//.pyenv/versions/3.8.15/envs/meltano/lib/python3.8/site-packages/meltano/core/block/extract_load.py", line 483, in run_with_job
await self.execute()
File "/Users//.pyenv/versions/3.8.15/envs/meltano/lib/python3.8/site-packages/meltano/core/block/extract_load.py", line 449, in execute
await manager.run()
File "/Users//.pyenv/versions/3.8.15/envs/meltano/lib/python3.8/site-packages/meltano/core/block/extract_load.py", line 647, in run
_check_exit_codes(
File "/Users//.pyenv/versions/3.8.15/envs/meltano/lib/python3.8/site-packages/meltano/core/block/extract_load.py", line 799, in _check_exit_codes
raise RunnerError("Loader failed", {PluginType.LOADERS: consumer_code})
meltano.core.runner.RunnerError: Loader failed
2022-11-15T22:36:14.338209Z [error ] Block run completed. block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False

Split docker actions based on visualization target

Now that Superset, Evidence, and Rill can be visualization options, I want to split the makefile to run the docker container based on which tool is handling the viz.

Superset works now with:

make docker-build
make docker-run-pipeline
make docker-run-superset

I want to continue the same pattern with Rill & Evidence, but I am stuck on how to get it to work. Debugging docker is hell!

make docker-build
make docker-run-pipeline
make docker-run-<viz_tool>

Persist parquet files from target-parquet

Since the meltano pipeline state is persisted as part of the container, running meltano run tap-spreadsheets-anywhere target-parquet in an existing container a second (or really, any subsequent) time will fail because the parquet files are written to /tmp. It looks like the logic inside the .devcontainer drops /tmp when the container is stopped.

The best thing to do here is re-route the data created by the meltano run to a directory that is a part of the repo, but added to .gitignore. This will keep both the meltano run state and the data "in sync" and allow better behavior when re-running the pipeline.
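The re-routing described above can be sketched as a small helper that creates a repo-local output directory and makes sure it is gitignored. The relative path data/data_catalog is an assumption borrowed from the external-tables section; point it at wherever the pipeline actually writes.

```python
from pathlib import Path


def ensure_output_dir(repo_root: Path, rel_dir: str = "data/data_catalog") -> Path:
    """Create a repo-local output dir and add it to .gitignore (idempotent).

    `rel_dir` is an assumed location; adjust to the pipeline's real target.
    """
    out = repo_root / rel_dir
    out.mkdir(parents=True, exist_ok=True)
    gitignore = repo_root / ".gitignore"
    entry = rel_dir.rstrip("/") + "/"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if entry not in lines:
        with gitignore.open("a") as f:
            f.write(entry + "\n")
    return out
```

Because the directory lives inside the repo, the meltano state and the data it describes stop drifting apart when the container is rebuilt.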

Superset Dashboard Files

I ran the entire workflow within a VSCode devcontainer. Cool stuff! Do you have the Superset dashboard definitions (which produce the images in the readme) uploaded to the repo?

Playoff calcs seem off

Have quite a few issues to sort out based on this.

  1. Division winners cannot be play-in eligible (Hawks/Heat), i.e. they get the 6 seed at worst.
  2. Need to apply tiebreakers.
  3. Would expect approx 50/50 East vs West odds, but instead it is more like 75/25. Need to investigate why this is, and whether it is a code issue.

Missing LICENSE

Would love to use this as a base for some of my own projects, can you add a license file? I'd recommend MIT or Apache

Add snapshotting to allow comparison between points in time.

Since ELO rating is re-forecasted after every game, it would be interesting to see how predicted outcomes change over time.

The chart below from 538 is an example of the kind of visual that is possible if we have snapshotting enabled.

(example 538 chart omitted)

It should be noted that this is where persistence comes into play since current models are NOT persisted (by design).
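The idea amounts to persisting each forecast run keyed by an as-of date, so later runs can be compared. A minimal pure-Python sketch follows; the store shape and field names are hypothetical, and in the project this would more likely be a snapshotted dbt model or a dated parquet export.

```python
from dataclasses import dataclass, field


@dataclass
class ForecastStore:
    """Append-only store of (as_of_date, team, win_probability) snapshots."""
    rows: list[tuple[str, str, float]] = field(default_factory=list)

    def snapshot(self, as_of: str, forecasts: dict[str, float]) -> None:
        """Record one run's forecasts under an ISO as-of date."""
        for team, prob in forecasts.items():
            self.rows.append((as_of, team, prob))

    def history(self, team: str) -> list[tuple[str, float]]:
        """Win probability over time for one team, ordered by as-of date."""
        return sorted((d, p) for d, t, p in self.rows if t == team)
```

Each pipeline run appends a dated snapshot instead of overwriting the model, which is exactly the data shape the 538-style chart needs.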

Rework external materializations to allow persistence

Currently, external materializations are basically a toy - they write parquet files into folders, but they only exist inside the environment during execution and then are flushed into the void. Furthermore, not all models that should be persisted this way are persisted!

Options:

  • use Motherduck to handle persistence between runs. CC @Alex-Monahan, this is a very interesting use case for DuckDB + serverless analytics. Would love some thoughts about this. I am thinking "every job run outputs a unique file to Motherduck" or something. My biggest concern is coupling to the DuckDB version, since database files are not backwards compatible.
  • use serverless postgres (i.e. Neon) and write out resulting datasets to a persisted cache.
  • write to S3/GCS/Azure Blob after each job run.
  • I tend to favor this approach because it is the most "generic" but the affordances of a real database (i.e. a catalog) are very nice.

Bug: 'make pipeline' fails - loader fails

When running 'make pipeline', the loader fails. There is an open issue for this here.

The workaround, annoyingly, is to skip the duckdb loader entirely and instead run 'make parquet'.

test unique_xf_series_to_seed_series_id fails

Environment: Ubuntu 20.04.6 via WSL

Command: make docker-run-evidence and make docker-run-superset

Description: When meltano invoke dbt-duckdb build is called while running the Docker container, the test unique_xf_series_to_seed_series_id fails. The same test passes without using Docker. Here are the messages which show up in the terminal after the 75 tests are completed:

23:19:50  Running 1 on-run-end hook
23:19:51  Statements to run:
23:19:51  1 of 1 START hook: nba_monte_carlo.on-run-end.0 ................................ [RUN]
23:19:51  1 of 1 OK hook: nba_monte_carlo.on-run-end.0 ................................... [OK in 0.00s]
23:19:51
23:19:51
23:19:51  Finished running 7 table models, 10 external models, 41 tests, 17 view models, 1 hook in 0 hours 0 minutes and 32.05 seconds (32.05s).
23:19:51
23:19:51  Completed with 1 error and 0 warnings:
23:19:51
23:19:51  Failure in test unique_xf_series_to_seed_series_id (models/_docs.yml)
23:19:51    Got 14 results, configured to fail if != 0
23:19:51
23:19:51    compiled Code at ../docs/compiled/nba_monte_carlo/models/_docs.yml/unique_xf_series_to_seed_series_id.sql
23:19:51
23:19:51  Done. PASS=68 WARN=0 ERROR=1 SKIP=6 TOTAL=75

Superset fails to run with ```ModuleNotFoundError: No module named 'cryptography.hazmat.backends.openssl.x509'```

After cloning the repository and trying to run in docker on master branch, I get this error:

meltano invoke superset fab create-admin --username admin --firstname lebron --lastname james --email [email protected] --password password
2023-01-03T10:30:34.244420Z [error    ] Traceback (most recent call last):
  File "/workspaces/nba-monte-carlo/.meltano/utilities/superset/venv/bin/superset", line 5, in <module>
    from superset.cli.main import superset
  File "/workspaces/nba-monte-carlo/.meltano/utilities/superset/venv/lib/python3.9/site-packages/superset/__init__.py", line 21, in <module>
    from superset.app import create_app
  File "/workspaces/nba-monte-carlo/.meltano/utilities/superset/venv/lib/python3.9/site-packages/superset/app.py", line 23, in <module>
    from superset.initialization import SupersetAppInitializer
  File "/workspaces/nba-monte-carlo/.meltano/utilities/superset/venv/lib/python3.9/site-packages/superset/initialization/__init__.py", line 33, in <module>
    from superset.extensions import (
  File "/workspaces/nba-monte-carlo/.meltano/utilities/superset/venv/lib/python3.9/site-packages/superset/extensions/__init__.py", line 32, in <module>
    from superset.utils.cache_manager import CacheManager
  File "/workspaces/nba-monte-carlo/.meltano/utilities/superset/venv/lib/python3.9/site-packages/superset/utils/cache_manager.py", line 24, in <module>
    from superset.utils.core import DatasourceType
  File "/workspaces/nba-monte-carlo/.meltano/utilities/superset/venv/lib/python3.9/site-packages/superset/utils/core.py", line 76, in <module>
    from cryptography.hazmat.backends.openssl.x509 import _Certificate
ModuleNotFoundError: No module named 'cryptography.hazmat.backends.openssl.x509'

Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
join our friendly Slack community.

Superset metadata database could not be initialized: `superset db upgrade` failed
make: *** [Makefile:18: superset-visuals] Error 1
make: *** [docker-run-superset] Error 2

It seems to be similar to the error mentioned in #54, but the superset-test branch seems to be gone now. I'm happy to provide a PR if I can be pointed in the right direction on where this might be happening. I am running on macOS Catalina 10.15.7.

`make docker-run` command fails with "plugin 'superset' is not known to meltano" error

Steps to replicate:

  1. clone repository
  2. make docker-build
  3. make docker-run

Error message:

16:45:43  Done. PASS=74 WARN=0 ERROR=0 SKIP=0 TOTAL=74


meltano invoke superset fab create-admin --username admin --firstname lebron --lastname james --email [email protected] --password password
Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
join our friendly Slack community.

Plugin 'superset' is not known to Meltano
make: *** [Makefile:14: superset-visuals] Error 1
make: *** [Makefile:23: docker-run] Error 2

Evaluate which code can be consolidated into macros across NBA, NFL, NCAAF models.

The following models should in theory be generic:

  • inputs
    • teams
    • schedule
    • actual results
  • outputs
    • elo over time
    • predictions
    • end of season standings (blending predictions + actuals) AKA game log

This will allow additional sports to be added very quickly and easily - NHL, MLB, Premier League (?), College bball

non-generic models + reasons

  • in-season tournament / "champions cups": the games are not fixed, so we have to compute projected winners to slot teams into subsequent games.
  • end of season seeding: tiebreaking methodology depends on specific league rules
  • playoffs: for leagues with playoffs, subsequent games take a dependency on projected outcomes. Additionally, some leagues have different criteria for wins (NFL is 1 game, NBA is best of 7, MLB is hybrid, and so on)

However, the most interesting analysis depends on "end of season seeding" (argh), so we will need to figure out how to build end-of-season seeding models for each sport. Playoff models I am less certain about, because I am not confident that "regular season win totals", which currently drives the ELO ratings, is necessarily predictive of playoff success. For all intents and purposes, regular season and post-season models should probably behave independently.

db file not created

Running from the main branch in Codespaces, I followed the directions in readme.md and was not able to connect to the db from Superset.

Upon investigation, realized the file was not present at /tmp/mdsbox.db after running make pipeline.

(screenshot omitted)

It looks like either the deletion of the path keys in profiles.yml in #33, or that in combination with the switch to use external materialization in #36, is the cause. I have a PR incoming that I will link to this issue to add the path key back in, which seems to fix it.

After doing that, the mdsbox.db file shows up for me:

(screenshot omitted)

Clean up cosmetic issues

Have a pile of cosmetic issues I need to fix; I've been delaying them as I focus more on modeling. Going back to this now since the modeling is mostly stable.

NBA

  • main page
    • upcoming games filter should be sorted alphabetically on team name
    • upcoming games filter only applies to home games
    • upcoming games data table is too wide
    • in-season tournament should be removed
    • standings table should show records as integer, not decimal.
    • add 3 columns to standing table: make playoffs, win championship & elo vs vegas.
  • Historical Matchups
    • playoff wins / losses not populated
      • check underlying data to see if i need to fix the input csv file (likely)
    • elo change table - label the x axis
    • elo change table - label the y axis, make sure lines show up
    • #143
    • #142
    • data is stopping at 2016, why?
  • IST - add championship summary for LAL
  • IST - hide upcoming games
  • Matchup calcs
    • scores should show as integers, not decimals
    • #141
    • table with elo value is in wrong order (fixed in historical matchups page, so replicate that logic here)
  • predictions
    • game id should be int not decimal (marking complete - looks like an issue with evidence sources)
    • table is too wide
    • replace search with a team filter
  • prediction details
    • add a line for avg team (1600)
    • table with elo value is in wrong order (fixed in historical matchups page, so replicate that logic here)
    • last 5 games - scores should be int not decimal
    • team matchups - record should be int not decimal
  • teams
    • records should be int, not dec
    • add elo vs vegas w/red+green indicators
  • team details
    • seed range should be int, not dec
    • win range should be int, not dec
    • recent games - should be int, not dec
    • upcoming schedule - use this table format anywhere we are surfacing predictions (its the right columns)

NCAAF

  • Lock this as of end of regular season (incl army/navy game)

NFL

  • End of season seeding out of order on the bar chart (works in NBA section, so replicate that component)
  • most recent games - sort by date DESC (currently date ASC), and limit it to 5 games
  • scores should render as int, not dec

What to do about Rill & Superset?

Rill & Superset are in the project for legacy purposes but certainly make it a little heavier to maintain. It does seem that many users find value in the instructions on how to configure Superset, and it is a nice tool! Perhaps we just leave them in as options installed with config flags and focus the core workflow on Evidence.

Add NCAAF Forecast

Once the NFL team pages are built out and sorted, add pages for NCAAF teams. The model should be roughly appropriate, although the post-season stuff is much more complex.

Execute a python model

This way of structuring a data stack in a box is just brilliant. I can get everything up and running except making a Python model instead of a SQL model.

I have this very basic py model to test:

import pandas as pd
def model(dbt, session):
    dbt.config(packages=["pandas"])
    data = dbt.ref("train_test_split")
    data['test_column'] = 1

    return data

However, upon a dbt run I get:

Python model failed:
No module named 'pandas'

I consulted the author of dbt-duckdb here, but I am still not able to run it. Is it possible on your end?

misc model updates

nfl

  • add tiebreakers
  • add playoffs

nba

  • add in-season tournament analytics page
  • move evidence queries to sources
  • add tiebreakers for in-season tournament
  • fix h2h tiebreakers for in-season tournament
  • add in-season tournament predictive model
  • handle plug for the 22 teams that didn't make the in-season tournament
  • add end of season tiebreakers
  • fix playoffs (currently showing 0% for all teams to win finals)
