```shell
cd datamart
```

If you are using Linux, refer here for the two lines to remove: conda/conda#6073 (comment)

```shell
conda env create -f environment.yml
source activate datamart_env
git update-index --assume-unchanged datamart/resources/index_info.json
python -W ignore -m unittest discover
```
If you run into problems with leveldb, please try the following commands.

If you have Homebrew installed:

```shell
brew install leveldb
```

otherwise:

```shell
pip install leveldb
```

Then install `plyvel` (the include path below assumes leveldb 1.20_2 from Homebrew; adjust it to match your installed version):

```shell
CFLAGS='-mmacosx-version-min=10.7 -stdlib=libc++' pip install plyvel --no-cache-dir --global-option=build_ext --global-option="-I/usr/local/Cellar/leveldb/1.20_2/include/" --global-option="-L/usr/local/lib"
```

```shell
pip install rltk
```
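After installing, it can help to confirm that a LevelDB binding is actually importable from Python. This is a small sketch (not part of the repo) that probes for either `plyvel` or `leveldb` and reports which one was found:

```python
# Sanity check (a sketch, not part of datamart): confirm that a LevelDB
# binding installed above is importable. Neither package is in the stdlib.
def leveldb_binding():
    """Return the name of the first importable LevelDB binding, or None."""
    for name in ("plyvel", "leveldb"):
        try:
            __import__(name)
            return name
        except ImportError:
            continue
    return None

print(leveldb_binding())
```

If this prints `None`, neither binding installed correctly and the commands above need to be re-run.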
Before committing, please run the following commands to update the dependency config:

```shell
conda env export --no-build > environment.yml
pip freeze > requirements.txt
```

After pulling, please run the following to update your dependencies:

```shell
conda env update -f environment.yml
```
Dataset providers should validate their dataset schema against our JSON schema with:

```shell
python scripts/validate_schema.py --validate_json {path_to_json}
```

e.g.

```shell
$ python scripts/validate_schema.py --validate_json test/tmp/tmp.json
Valid json
```
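To illustrate the kind of check the validator performs, here is a minimal standalone sketch using only the standard library. The required-key set and messages are hypothetical stand-ins; the real checks live in `scripts/validate_schema.py` and use the full datamart index schema:

```python
import json

# Hypothetical subset of required fields; the real schema requires more.
# The doc does require materialization.python_path, so that check is kept.
REQUIRED_KEYS = {"materialization"}

def looks_like_dataset_schema(path):
    """Return (ok, message) after a shallow structural check of a dataset JSON."""
    with open(path) as f:
        doc = json.load(f)
    missing = REQUIRED_KEYS - doc.keys()
    if missing:
        return False, "missing keys: %s" % sorted(missing)
    if "python_path" not in doc.get("materialization", {}):
        return False, "materialization.python_path not set"
    return True, "Valid json"
```

This is only a sketch of the idea; always run the provided script for the authoritative result.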
- Prepare your dataset schema following the datamart index schema and validate it as in the previous step.
- Create your materialization method as a subclass of `materializer_base.py` and put it in `datamart/materializers`. See the README there.
- Have your dataset schema JSON's `materialization.python_path` point to the materialization method. Take a look at `tmp.json`.
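The subclassing pattern above can be sketched as follows. This is a self-contained illustration, not the real API: the stand-in `MaterializerBase` and its `get` method are hypothetical names defined here only so the example runs; the actual base class and method signature are in `datamart/materializers/materializer_base.py`:

```python
from abc import ABC, abstractmethod

class MaterializerBase(ABC):
    """Stand-in for datamart's real base class (see materializer_base.py)."""

    @abstractmethod
    def get(self, metadata=None):
        """Fetch the dataset described by metadata and return its rows."""

class MyDatasetMaterializer(MaterializerBase):
    """Example provider materializer: real code would download and parse
    the source named in metadata; this one returns a fixed row to show
    the contract a subclass must fulfill."""

    def get(self, metadata=None):
        return [{"city": "Los Angeles", "temperature": 25}]
```

Your dataset schema's `materialization.python_path` would then name the module containing a class like `MyDatasetMaterializer`, so datamart can import and call it at materialization time.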
Play with the following:

- Create metadata and index it on Elasticsearch, following: Indexing demo
- Query datamart, following: Query demo
- Work through the TAXI example, following: taxi_example
- Work through the FIFA example, following: fifa_example
- Work through the Hall of Fame example, following: hof_example
- Python API documentation: python api
- REST API documentation: rest_example

(more information can be found under the wiki)
Note: launch the notebook with:

```shell
jupyter notebook test/index.ipynb
```