File_Processing application

Input:

mixtape.json: data file including users, songs and playlists
changes.json: changes including:
- add_new_playlist
- remove_playlist
- add_song_to_playlist

Output:

output.json: result file with the changes listed in changes.json applied to mixtape.json

How to run it

Prerequisites: Python 3.x and pip 20.x

git clone https://github.com/jameswang2015/file_processing.git
cd file_processing/
python -m venv .venv
source ./.venv/bin/activate
pip install pydantic
python main.py #  or python main.py -i mixtape.json -c changes.json

Rules for each function

The rules below govern each change type. Some of them reflect the author's best understanding of the processing logic and can be adjusted on request.

add_new_playlist:
- if user_id doesn't exist in users, the new playlist won't be created
- the new playlist_id is generated as max_existing_playlist_id + 1
remove_playlist:
- if playlist_id does not exist, print the info and don't perform removal
add_song_to_playlist:
- if song or playlist does not exist, print the info and don't perform addition
- if song is already in playlist, print the info and don't perform addition
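The rules above can be sketched as three small functions operating on the loaded mixtape dict. This is a minimal illustration, not the code in main.py: the key names ("users", "songs", "playlists", "id", "user_id", "song_ids") and the use of numeric strings for ids are assumptions about the layout of mixtape.json.

```python
def add_new_playlist(mixtape, user_id, song_ids):
    """Create a playlist for an existing user; return the new id, or None."""
    if not any(u["id"] == user_id for u in mixtape["users"]):
        print(f"user {user_id} does not exist; playlist not created")
        return None
    # New id = max existing playlist id + 1 (ids assumed to be numeric strings).
    highest = max((int(p["id"]) for p in mixtape["playlists"]), default=0)
    new_id = str(highest + 1)
    mixtape["playlists"].append(
        {"id": new_id, "user_id": user_id, "song_ids": list(song_ids)}
    )
    return new_id


def remove_playlist(mixtape, playlist_id):
    """Remove a playlist; return False (and print) if it does not exist."""
    remaining = [p for p in mixtape["playlists"] if p["id"] != playlist_id]
    if len(remaining) == len(mixtape["playlists"]):
        print(f"playlist {playlist_id} does not exist; nothing removed")
        return False
    mixtape["playlists"] = remaining
    return True


def add_song_to_playlist(mixtape, song_id, playlist_id):
    """Append a song to a playlist unless either is missing or it is a duplicate."""
    playlist = next(
        (p for p in mixtape["playlists"] if p["id"] == playlist_id), None
    )
    song_exists = any(s["id"] == song_id for s in mixtape["songs"])
    if playlist is None or not song_exists:
        print(f"song {song_id} or playlist {playlist_id} does not exist; nothing added")
        return False
    if song_id in playlist["song_ids"]:
        print(f"song {song_id} is already in playlist {playlist_id}; nothing added")
        return False
    playlist["song_ids"].append(song_id)
    return True
```

Each function returns a value in addition to printing, so a caller can tell whether the change was applied.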
Pydantic models are used for data validation. If a given change_detail does not satisfy its model, a ValidationError traceback is printed and the application terminates, so none of the following changes are applied. This behaviour can be changed on request: for example, the error could be caught so that the application continues with the following changes. See the Pydantic documentation for more information.
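The "catch the error and continue" alternative might look like the sketch below. The model name AddSongToPlaylist and its fields are hypothetical; the real models in this repository may have a different shape. Only BaseModel and ValidationError are taken from Pydantic itself.

```python
from pydantic import BaseModel, ValidationError


class AddSongToPlaylist(BaseModel):
    # Hypothetical shape of one change_detail; field names are assumptions.
    song_id: str
    playlist_id: str


def validate_changes(raw_changes):
    """Return (valid, skipped): validated models and the raw items that failed."""
    valid, skipped = [], []
    for raw in raw_changes:
        try:
            valid.append(AddSongToPlaylist(**raw))
        except ValidationError as err:
            # Report the problem and carry on with the following changes.
            print(f"skipping invalid change {raw!r}: {err}")
            skipped.append(raw)
    return valid, skipped
```

Returning the skipped items keeps the failures visible to the caller instead of silently dropping them.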

How to scale up

Very large mixtape.json input file
If the JSON input file is too large to fit in memory, json.load() cannot load it in one go. Instead, a streaming parser such as ijson can read the JSON incrementally.

Alternatively, the data could be loaded into a database with three tables for users, songs and playlists. SQL and primary/foreign key constraints can then enforce the rules.
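One possible shape for the database option, using SQLite from the Python standard library. The column names are assumptions, and the sketch adds a fourth link table (playlist_songs) to hold playlist membership, which goes beyond the three tables mentioned above. The key constraints then reject a missing song, a missing playlist, or a duplicate entry without any hand-written checks.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE songs (id INTEGER PRIMARY KEY, artist TEXT, title TEXT);
CREATE TABLE playlists (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id)
);
-- Link table (an addition for this sketch): one row per song in a playlist.
CREATE TABLE playlist_songs (
    playlist_id INTEGER NOT NULL REFERENCES playlists(id) ON DELETE CASCADE,
    song_id INTEGER NOT NULL REFERENCES songs(id),
    PRIMARY KEY (playlist_id, song_id)
);
"""


def open_db(path=":memory:"):
    """Create the schema and return a connection with foreign keys enforced."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def link_song(conn, song_id, playlist_id):
    """Add a song to a playlist; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO playlist_songs (playlist_id, song_id) VALUES (?, ?)",
                (playlist_id, song_id),
            )
        return True
    except sqlite3.IntegrityError as err:
        print(f"song {song_id} not added to playlist {playlist_id}: {err}")
        return False
```

The composite primary key covers the "song is already in playlist" rule, and the two foreign keys cover the "song or playlist does not exist" rule.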

Or the data could be converted to three Hive tables and processed with HiveQL.

Very large changes.json file
A generator can read changes.json as a stream, yielding one change at a time, so that the changes are handled one by one.
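A minimal generator sketch using only the standard library. It assumes the changes are stored one JSON object per line (JSON Lines), which is not necessarily how changes.json is laid out; a single large JSON array would need a streaming parser such as ijson instead. The "type" key and the handlers mapping are hypothetical.

```python
import json


def iter_changes(path):
    """Yield one change dict at a time without loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def apply_changes(path, handlers):
    """Send each streamed change to its handler; return how many were applied."""
    applied = 0
    for change in iter_changes(path):
        handler = handlers.get(change.get("type"))
        if handler is None:
            print(f"unknown change type, skipped: {change!r}")
            continue
        handler(change)
        applied += 1
    return applied
```

Because iter_changes is lazy, memory use stays flat regardless of how many changes the file contains.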
