benjokek / henry-individual-project-1

Video showing API and model: https://www.youtube.com/watch?v=jDqP0YQ24PM

Home Page: https://benjokek-henry-individual-project-1-streamlit-qc9jn6.streamlit.app/

Jupyter Notebook 71.34% Python 28.66%

henry-individual-project-1's Introduction

INDIVIDUAL PROJECT Nº1

Machine Learning Operations (MLOps)

Welcome to the first individual project of the labs stage! For this project, I take on the role of an MLOps Engineer.


Context

Role to develop: I recently started working as a Data Scientist at a startup that provides a streaming platform aggregation service. My first project is to create a recommendation system, which has not been implemented yet. When examining the available data, I realized that its maturity is very low: it is nested and untransformed, and there are no automated processes for adding new movies or series, among other problems.

I must start from scratch and quickly become a Data Engineer to have an MVP (Minimum Viable Product) ready for next week.

ETL

First, I import the dirty data from the datasets directory and clean it. Then I export it as a ".csv" file so the model can use it later to test the recommendation system. I also upload it to my PostgreSQL database hosted on Render (create your own here: https://dashboard.render.com/new/database) so that my API can use it. The connection uses the credentials saved in a ".env" file, which must be created following the format shown in ".env.example".
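The cleaning step can be sketched as follows. This is only an illustration, not the code from the notebook: it assumes a nested column (hypothetically named "genres") stored as a stringified list of dictionaries with a "name" key, and a "title" column. The sample rows are made up, and only the standard library is used.

```python
import ast
import csv
import io


def extract_names(raw_value):
    """Turn a stringified list of dicts into a list of their 'name' values."""
    if not raw_value:
        return []
    try:
        parsed = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        return []  # malformed cell: treat it as empty instead of failing
    if not isinstance(parsed, list):
        return []
    return [item["name"] for item in parsed
            if isinstance(item, dict) and "name" in item]


def clean_rows(rows, nested_column="genres"):
    """Yield cleaned copies of the rows, skipping rows without a title."""
    for row in rows:
        title = (row.get("title") or "").strip()
        if not title:
            continue
        cleaned = dict(row)
        cleaned["title"] = title
        # Flatten the nested column into a "|"-separated string.
        cleaned[nested_column] = "|".join(extract_names(row.get(nested_column)))
        yield cleaned


def export_csv(rows, fieldnames, out_file):
    """Write the cleaned rows as CSV to an open text file object."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample rows (hypothetical column names).
dirty_rows = [
    {"title": " Toy Story ",
     "genres": "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"},
    {"title": "", "genres": "[]"},
    {"title": "Heat", "genres": "not a list"},
]

buffer = io.StringIO()
export_csv(clean_rows(dirty_rows), ["title", "genres"], buffer)
print(buffer.getvalue())
```

In a real run, the in-memory buffer would be replaced by a file opened with open(path, "w", newline="").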

DEA

The DEA (exploratory data analysis) uses ydata_profiling, dataprep, sweetviz, autoviz, missingno and wordcloud to analyse the cleaned data produced by the ETL.

FastAPI

I used the FastAPI framework to create an app with several endpoints. The application communicates with my PostgreSQL database through SQLAlchemy to perform various operations on the movie data. I host the API on Render (the same website where I created the database), where I can also configure the environment variables with the database credentials. When creating the "Web Service" server on Render, the settings must be configured so that the "Build Command" is "pip install -r requirements.txt" and the "Start Command" is "uvicorn main:app --host 0.0.0.0 --port 10000". This is why it is important to have an environment for the project: the dependencies can be written with "pip freeze > requirements.txt", and the server uses that file to install everything it needs (https://github.com/HX-FNegrete/render-fastapi-tutorial).
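As an illustration of how the database credentials can be read from environment variables, here is a minimal sketch that assembles a PostgreSQL connection URL. The variable names (DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME) and the example values are assumptions; the real names are the ones listed in ".env.example".

```python
import os
from urllib.parse import quote_plus


def build_database_url(env=os.environ):
    """Assemble a PostgreSQL URL from environment variables.

    The variable names are hypothetical; use the ones from ".env.example".
    """
    required = ["DB_USER", "DB_PASSWORD", "DB_HOST", "DB_NAME"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Percent-encode user and password so special characters do not break the URL.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    port = env.get("DB_PORT", "5432")  # default PostgreSQL port
    return f"postgresql://{user}:{password}@{env['DB_HOST']}:{port}/{env['DB_NAME']}"


# Made-up credentials, for illustration only.
example_env = {
    "DB_USER": "movies_user",
    "DB_PASSWORD": "p@ss",
    "DB_HOST": "db.example.com",
    "DB_NAME": "movies",
}
print(build_database_url(example_env))
```

The resulting string has the form expected by SQLAlchemy's create_engine. Failing early when a variable is missing makes a misconfigured Render service easier to diagnose than a connection error at the first request.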

API goals:

  • Enter a month (its name as a str, for example 'January') and the function returns the number of movies historically released in that month: return {'month': month, 'amount': response}
  • Enter a day of the week (as a str, for example 'Monday') and the function returns the number of movies historically released on that day: return {'day': day, 'amount': response}
  • Enter a franchise and the function returns its number of movies, total profit and average profit: return {'franchise': franchise, 'quantity': response, 'total_profit': response, 'average_profit': response}
  • Enter a country and the function returns the number of movies produced in it: return {'country': country, 'quantity': response}
  • Enter a producer and the function returns their total profit and the number of movies they produced: return {'producer': producer, 'total_profit': response, 'quantity': response}
  • Enter a movie and the function returns its investment, profit, return, and release year: return {'movie': movie, 'investment': response, 'profit': response, 'return': response, 'year': response}
  • Enter a movie name and the function recommends similar movies as a list of 5 values: return {'recommended list': response}
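The first two goals in the list above can be sketched in plain Python as shown below. This is a simplified illustration that counts over an in-memory list of dates instead of querying the database; the function names and the sample dates are made up.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday", "Sunday"]


def movies_per_month(release_dates, month):
    """Count the release dates that fall in the given month name."""
    # list.index raises ValueError for an unknown month name.
    month_number = MONTHS.index(month.strip().capitalize()) + 1
    amount = sum(1 for d in release_dates if d.month == month_number)
    return {"month": month, "amount": amount}


def movies_per_day(release_dates, day):
    """Count the release dates that fall on the given day of the week."""
    # date.weekday() returns 0 for Monday, matching the order of DAYS.
    day_number = DAYS.index(day.strip().capitalize())
    amount = sum(1 for d in release_dates if d.weekday() == day_number)
    return {"day": day, "amount": amount}


# Made-up release dates: two Mondays in January 2024 and one Friday in February.
sample_dates = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 2, 2)]

print(movies_per_month(sample_dates, "January"))
print(movies_per_day(sample_dates, "Monday"))
```

In the API, each function would be wrapped in an endpoint and the counting would typically be delegated to a database query.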

Model - Movie Description Based Recommender

The model looks for similarity between movies. This is known as Content-Based Filtering (a content-based recommender) because it is built from movie metadata, specifically the movie "Overviews" and "Taglines". I use only a subset of all the available movies because of limited computing power.

The FastAPI app retrieves all movies from my PostgreSQL database, performs some data preprocessing on them, and then computes the cosine similarity between their descriptions to get a list of recommended movies for a specified title. It is limited to 500 movies because of the very limited memory on the free server. Locally I could run it with more than 20,000 movies (more than 30,000 becomes too much for my machine), and the recommendations seem good. The notebook is available for testing.
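The idea of ranking movies by the cosine similarity of their descriptions can be sketched as follows. This is a deliberately simplified illustration using raw word counts and only the standard library; it is not the preprocessing or vectorization used in the notebook, and the titles and descriptions are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(title, descriptions, top_n=5):
    """Return the titles whose descriptions are most similar to the given one."""
    if title not in descriptions:
        return {"recommended list": []}
    vectors = {name: Counter(tokenize(text)) for name, text in descriptions.items()}
    target = vectors[title]
    scores = [(cosine_similarity(target, vec), name)
              for name, vec in vectors.items() if name != title]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return {"recommended list": [name for _, name in scores[:top_n]]}


# Made-up titles and descriptions, for illustration only.
descriptions = {
    "Space Quest": "astronauts explore a distant planet in space",
    "Star Voyage": "a crew of astronauts travels through space",
    "Love in Paris": "a romantic story set in paris",
    "Cooking Show": "chefs compete to bake the best cake",
}

print(recommend("Space Quest", descriptions, top_n=2))
```

With raw counts, common words such as "a" and "in" contribute to the score, so removing stop words or weighting the terms would likely improve the ranking. Comparing every movie against every other one also explains why memory becomes the limiting factor as the number of movies grows.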

Streamlit

I created a simple Streamlit app that makes requests to the API.
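On the client side, a request to the recommendation endpoint could be prepared and decoded roughly as shown below. The base URL and the "/recommendation/" path are hypothetical; only the 'recommended list' key comes from the API goals described in this README.

```python
import json
from urllib.parse import quote


def recommendation_url(base_url, title):
    """Build the request URL; the '/recommendation/' path is a hypothetical example."""
    # Percent-encode the title so spaces and punctuation are safe in the path.
    return f"{base_url.rstrip('/')}/recommendation/{quote(title, safe='')}"


def parse_recommendations(body):
    """Extract the list of titles from the JSON body returned by the API."""
    payload = json.loads(body)
    return payload.get("recommended list", [])


# Made-up base URL and response body, for illustration only.
print(recommendation_url("https://api.example.com/", "Toy Story"))
print(parse_recommendations('{"recommended list": ["Movie A", "Movie B"]}'))
```

In the Streamlit app, the body would come from an HTTP response, and the resulting list could be shown with a call such as st.write.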

henry-individual-project-1's People

Contributors

benjz2
