- json_parser is a pipeline that maps a given JSON input to a target format and saves the result to a Postgres table.
First, create the database using the provided Makefile:
make run-db
To run this successfully you need Docker installed in your environment. More information can be found at https://docs.docker.com/engine/install/.
Then, install the dependencies:
pip install -r requirements.txt
Then run models.py to create the database schema, using SQLAlchemy as the ORM:
python -m src.models
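A minimal sketch of what such a models module might contain. The table name `retriever_table` and the JSON column `obj` match what is queried later in this README; the class name, the `id` primary key, and the in-memory SQLite engine (used here instead of the Dockerised Postgres so the sketch is self-contained) are illustrative assumptions:

```python
from sqlalchemy import Column, Integer, JSON, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class RetrieverRecord(Base):
    # Table and column names follow the queries shown later in this README;
    # the class name and primary key are assumptions for illustration.
    __tablename__ = "retriever_table"
    id = Column(Integer, primary_key=True)
    obj = Column(JSON)  # the parsed JSON payload, stored as a json column

if __name__ == "__main__":
    # The real pipeline would point this at the Dockerised Postgres instead.
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
```

Running the module then only needs to call `Base.metadata.create_all(engine)` against the configured engine.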
We can now run the parser to parse the data and save it in Postgres:
python -m src.parser
The parser could be written in pure Python to reduce the package size, but Pandas offers many out-of-the-box methods for rapid development.
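A sketch of the kind of Pandas convenience this refers to. The input data and the flattening step are assumptions, not the pipeline's actual logic; the commented `to_sql` call shows how the result would reach Postgres given a running database:

```python
import json
import pandas as pd

# Hypothetical raw input; the real pipeline reads its own JSON source.
raw = json.loads('[{"id": 1, "payload": {"name": "a"}},'
                 ' {"id": 2, "payload": {"name": "b"}}]')

# json_normalize flattens nested objects into columns in a single call,
# one of the out-of-the-box methods mentioned above.
df = pd.json_normalize(raw)

# Re-pack each row as a JSON string before writing it to the obj column.
records = [{"obj": json.dumps(row)} for row in df.to_dict(orient="records")]

# With a running Postgres and a SQLAlchemy engine, the write would be e.g.:
# df.to_sql("retriever_table", engine, if_exists="append", index=False)
```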
- In the current format only one requirement is tested; this is open to improvement. For example, we could also validate the datetime fields before parsing to make sure they are datetime objects. Run the tests with:
python -m unittest tests.test_pipeline
- Connect to the Postgres Docker container:
docker exec -it {container_id} bash
- Connect to psql:
psql -U postgres
- Switch to the retriever database:
\c retriever;
- To check the records:
select * from retriever_table;
- To check the column type and confirm the data is stored as JSON:
select pg_typeof("obj") from retriever_table;