file_processing's Introduction

File_Processing application

Input:

  • mixtape.json: data file including users, songs and playlists
  • changes.json: changes including:
    • add_new_playlist
    • remove_playlist
    • add_song_to_playlist

Output:

  • output.json: result file with the changes listed in changes.json applied to mixtape.json

How to run it

Prerequisites: Python 3.x and pip 20.x

git clone https://github.com/jameswang2015/file_processing.git
cd file_processing/
python -m venv .venv
source ./.venv/bin/activate
pip install pydantic
python main.py #  or python main.py -i mixtape.json -c changes.json
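
The two invocations above suggest that -i and -c default to mixtape.json and changes.json. A minimal sketch of how such options could be parsed with argparse follows; the long option names (--input, --changes) and the parse_args helper are assumptions for illustration and are not taken from main.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; with no options, fall back to the default file names."""
    parser = argparse.ArgumentParser(
        description="Apply the changes in changes.json to mixtape.json."
    )
    # -i and -c appear in the README; the long names are hypothetical.
    parser.add_argument("-i", "--input", default="mixtape.json",
                        help="data file with users, songs and playlists")
    parser.add_argument("-c", "--changes", default="changes.json",
                        help="file listing the changes to apply")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"input={args.input} changes={args.changes}")
```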

Rules for each function

Each function follows the rules below. Note that some of these rules reflect the author's best understanding of the processing logic and can be adjusted on request.

  • add_new_playlist:
    • if user_id doesn't exist in users, the new playlist won't be created
    • the new playlist_id is generated as max_existing_playlist_id + 1
  • remove_playlist:
    • if playlist_id does not exist, print the info and don't perform removal
  • add_song_to_playlist:
    • if song or playlist does not exist, print the info and don't perform addition
    • if song is already in playlist, print the info and don't perform addition
  • Pydantic models are used for data validation. If a given change_detail does not satisfy its model, a ValidationError traceback is printed and the application terminates, so none of the remaining changes are applied. This behaviour can be changed on request; for example, the error could be caught so that the application continues with the remaining changes. See the Pydantic documentation for more information.
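
The three change rules above can be sketched in plain Python. This is an illustration only, not the code in main.py: it assumes mixtape.json is loaded as a dict with "users", "songs" and "playlists" lists, that every item has a string "id", and that a playlist has "user_id" and "song_ids" fields. Those field names and the function signatures are assumptions, and Pydantic validation is left out.

```python
def _ids(items):
    """Collect the ids of a list of items (assumed shape: {"id": "..."})."""
    return {item["id"] for item in items}


def add_new_playlist(mixtape, user_id, song_ids):
    """Create a playlist for an existing user; return True if it was created."""
    if user_id not in _ids(mixtape["users"]):
        print(f"user {user_id} does not exist; playlist not created")
        return False
    # New id is max_existing_playlist_id + 1 (1 if there are no playlists yet).
    next_id = max((int(p["id"]) for p in mixtape["playlists"]), default=0) + 1
    mixtape["playlists"].append(
        {"id": str(next_id), "user_id": user_id, "song_ids": list(song_ids)}
    )
    return True


def remove_playlist(mixtape, playlist_id):
    """Remove a playlist by id; return True if something was removed."""
    if playlist_id not in _ids(mixtape["playlists"]):
        print(f"playlist {playlist_id} does not exist; nothing removed")
        return False
    mixtape["playlists"] = [
        p for p in mixtape["playlists"] if p["id"] != playlist_id
    ]
    return True


def add_song_to_playlist(mixtape, song_id, playlist_id):
    """Append a song to a playlist; return True if the song was added."""
    playlist = next(
        (p for p in mixtape["playlists"] if p["id"] == playlist_id), None
    )
    if playlist is None or song_id not in _ids(mixtape["songs"]):
        print("song or playlist does not exist; nothing added")
        return False
    if song_id in playlist["song_ids"]:
        print(f"song {song_id} is already in playlist {playlist_id}")
        return False
    playlist["song_ids"].append(song_id)
    return True
```

Each function returns a boolean so that a caller can count or log skipped changes; the rules themselves only require printing the info and skipping the change.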

How to scale up

  • very large mixtape.json input file
    If the JSON input file is too large to fit in memory, json.load() cannot be used, because it loads the whole document at once. Instead, a streaming parser such as ijson is needed to read the JSON incrementally.

    We could also load the data into a database with three tables for users, songs and playlists, respectively. Then we can rely on SQL and primary/foreign key constraints to enforce the rules.

    Or, we can convert the data to three Hive tables and use HQL to process them.

  • very large changes.json file
    We can use a generator to read changes.json as a stream, yielding one change at a time, and apply the changes one by one.
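
The generator idea for a large changes.json can be sketched with the standard library alone. This is a minimal sketch, not the project's code: it assumes changes.json is a top-level JSON array whose elements are JSON objects, and the name iter_changes and the chunk_size parameter are hypothetical. The array layout is an assumption as well, since the real file structure is not described here.

```python
import json


def iter_changes(stream, chunk_size=65536):
    """Yield the elements of a top-level JSON array one at a time.

    Only the text of the element currently being parsed is kept in memory.
    Assumes every element is a JSON object (a dict).
    """
    decoder = json.JSONDecoder()
    buffer = ""
    started = False  # True once the opening "[" has been consumed
    while True:
        # Skip whitespace and the commas that separate array elements.
        buffer = buffer.lstrip(" \t\r\n,")
        if buffer and not started:
            if buffer[0] != "[":
                raise ValueError("expected a top-level JSON array")
            started = True
            buffer = buffer[1:]
            continue
        if buffer.startswith("]"):
            return  # end of the array
        if buffer:
            try:
                change, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                pass  # the element is incomplete; read more text below
            else:
                yield change
                buffer = buffer[end:]
                continue
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("unexpected end of input")
        buffer += chunk
```

A caller would open the file and loop over iter_changes(f), applying each change as it arrives. A malformed element is only reported when the end of the input is reached, which is acceptable for a sketch; a dedicated streaming parser such as ijson is the more robust choice.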
