Input:
- mixtape.json: data file including users, songs and playlists
- changes.json: changes to apply, including:
  - add_new_playlist
  - remove_playlist
  - add_song_to_playlist
Output:
- output.json: result file with the changes listed in changes.json applied to mixtape.json
Prerequisites: Python 3.x and pip 20.x
git clone https://github.com/jameswang2015/file_processing.git
cd file_processing/
python -m venv .venv
source ./.venv/bin/activate
pip install pydantic
python main.py # or python main.py -i mixtape.json -c changes.json
The rules below govern how each change is applied. Some of them reflect the author's best understanding of the intended processing logic and can be adjusted on request.
add_new_playlist:
- if user_id doesn't exist in users, the new playlist won't be created
- the new playlist_id is generated as max_existing_playlist_id + 1
remove_playlist:
- if playlist_id does not exist, print the info and don't perform the removal
add_song_to_playlist:
- if the song or the playlist does not exist, print the info and don't perform the addition
- if the song is already in the playlist, print the info and don't perform the addition
- Pydantic models are used for data validation. If a given change_detail does not satisfy its model, a ValidationError traceback is printed and the application terminates, so none of the remaining changes are applied. This behaviour can be changed on request; for example, the error can be caught so that the application continues with the remaining changes. See the Pydantic documentation for more information.
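The three change rules above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in main.py: the key names (users, songs, playlists, id, user_id, song_ids), the use of integer ids, the function signatures, and the choice to start ids at 1 for an empty playlist list are all hypothetical, and Pydantic validation is left out.

```python
def _ids(items):
    """Collect the id of every record in a list of dicts."""
    return {item["id"] for item in items}


def add_new_playlist(mixtape, user_id, song_ids):
    """Create a playlist for user_id; skip if the user does not exist."""
    if user_id not in _ids(mixtape["users"]):
        print(f"user {user_id} does not exist; playlist not created")
        return False
    # max_existing_playlist_id + 1; starting at 1 when no playlist exists
    # is an assumption of this sketch.
    new_id = max(_ids(mixtape["playlists"]), default=0) + 1
    mixtape["playlists"].append(
        {"id": new_id, "user_id": user_id, "song_ids": list(song_ids)}
    )
    return True


def remove_playlist(mixtape, playlist_id):
    """Remove a playlist; skip if it does not exist."""
    if playlist_id not in _ids(mixtape["playlists"]):
        print(f"playlist {playlist_id} does not exist; nothing removed")
        return False
    mixtape["playlists"] = [
        p for p in mixtape["playlists"] if p["id"] != playlist_id
    ]
    return True


def add_song_to_playlist(mixtape, playlist_id, song_id):
    """Append a song to a playlist; skip on missing ids or duplicates."""
    playlist = next(
        (p for p in mixtape["playlists"] if p["id"] == playlist_id), None
    )
    if playlist is None or song_id not in _ids(mixtape["songs"]):
        print(f"song {song_id} or playlist {playlist_id} does not exist; song not added")
        return False
    if song_id in playlist["song_ids"]:
        print(f"song {song_id} is already in playlist {playlist_id}; song not added")
        return False
    playlist["song_ids"].append(song_id)
    return True
```

Each function returns True when the change was applied and False when it was skipped, which makes it easy for a caller to log or count skipped changes.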
- Very large mixtape.json input file
If the JSON input file is too large to fit in memory, we can't use json.load() to load it whole. Instead, we need a streaming parser such as ijson to read the JSON incrementally.
We could also load the data into a database with three tables, users, songs and playlists. We can then use SQL and primary/foreign key constraints to handle the changes. Alternatively, we can convert the data into three Hive tables and use HQL to process them.
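The database option could look roughly like the sketch below. It uses SQLite from the Python standard library purely for illustration (the text does not name a database), and the column names, the helper names open_db and try_execute, and the extra playlist_songs junction table are assumptions made here to model which songs belong to which playlist.

```python
import sqlite3

# Hypothetical schema: table layout and column names are assumptions.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE playlists (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id)
);
-- Junction table (an addition of this sketch): one row per song in a playlist.
CREATE TABLE playlist_songs (
    playlist_id INTEGER NOT NULL REFERENCES playlists(id) ON DELETE CASCADE,
    song_id INTEGER NOT NULL REFERENCES songs(id),
    PRIMARY KEY (playlist_id, song_id)
);
"""


def open_db(path=":memory:"):
    """Open a database and create the tables."""
    conn = sqlite3.connect(path)
    # SQLite does not enforce foreign keys unless asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_execute(conn, sql, params=()):
    """Run one change; return False and print why if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError as exc:
        print(f"change skipped: {exc}")
        return False


conn = open_db()
try_execute(conn, "INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))
# Rejected by the foreign key: user 99 does not exist.
try_execute(conn, "INSERT INTO playlists (user_id) VALUES (?)", (99,))
```

With this layout the constraints do the work of the rules: a playlist for an unknown user violates the foreign key, and adding the same song to a playlist twice violates the composite primary key. Because id is an INTEGER PRIMARY KEY, SQLite assigns a new playlist one more than the largest id currently in the table.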
- Very large changes.json file
We can use a generator to read changes.json as a stream, yielding one change at a time and handling the changes one by one.
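A minimal standard-library sketch of the generator idea is shown below. It assumes changes.json is a top-level JSON array whose elements are objects; the real layout may differ, in which case a streaming parser such as ijson is likely a better fit. The names iter_changes and chunk_size, and the "type" key in the sample data, are hypothetical.

```python
import io
import json


def iter_changes(stream, chunk_size=65536):
    """Yield the objects of a top-level JSON array one at a time.

    Only the current chunk and the not-yet-parsed remainder are held in memory.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    started = False  # True once the opening "[" has been consumed
    while True:
        buffer = buffer.lstrip()
        if buffer:
            if not started:
                if buffer[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                buffer, started = buffer[1:], True
                continue
            if buffer[0] == ",":  # separator between elements
                buffer = buffer[1:]
                continue
            if buffer[0] == "]":  # end of the array
                return
            try:
                change, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                pass  # element is incomplete: fall through and read more
            else:
                yield change
                buffer = buffer[end:]
                continue
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("unexpected end of input")
        buffer += chunk


# A tiny chunk size forces elements to span several reads.
sample = io.StringIO('[{"type": "remove_playlist"}, {"type": "add_new_playlist"}]')
for change in iter_changes(sample, chunk_size=8):
    print(change)
```

In practice the stream would be an open file, for example open("changes.json", encoding="utf-8"), and each yielded change would be applied before the next one is read. Malformed input is only detected at end of file in this sketch, since a parse failure is treated as "need more data".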