This app was made using Streamlit for the Codenation Aceleradev Challenge 2020.
It uses a dataset of business information to recommend similar businesses based on a portfolio of existing clients.
A TF-IDF matrix is built from the dataset features.
With this matrix, the cosine similarity is calculated between the businesses with the given ids (the portfolio) and every business in the dataset.
Then, each business in the dataset is assigned a score equal to the mean of its similarity scores against the portfolio.
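The scoring steps above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration and not the project's own code: the function names (`tfidf_vectors`, `cosine`, `recommend`), the toy feature strings, the smoothed IDF variant, and the choice to leave portfolio members out of the ranking are all assumptions made for the example. In practice a library such as scikit-learn (`TfidfVectorizer`, `cosine_similarity`) covers the same steps.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF; one common variant, assumed here for illustration.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(docs, portfolio_ids, top_n=3):
    """Rank non-portfolio rows by mean cosine similarity to the portfolio."""
    vectors = tfidf_vectors(docs)
    scores = {}
    for i, vec in enumerate(vectors):
        if i in portfolio_ids:
            continue  # assumption: existing clients are not recommended again
        sims = [cosine(vec, vectors[p]) for p in portfolio_ids]
        scores[i] = sum(sims) / len(sims)  # mean of the similarity scores
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data (hypothetical): each business's features joined into one string.
businesses = [
    "retail sp small",    # id 0
    "retail sp medium",   # id 1
    "industry rj large",  # id 2
    "retail rj small",    # id 3
    "services mg large",  # id 4
]
ranking = recommend(businesses, portfolio_ids=[0, 1], top_n=2)
print(ranking)
```

Here the business with id 3 ranks first because it is the only candidate sharing terms with the portfolio rows 0 and 1. Averaging over the portfolio favors businesses that resemble the client base as a whole over those that match a single client.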
Create a virtual environment and install the necessary libraries from requirements.txt
Download the data using misc/download.sh
Go to the src directory
Train the model by executing src/train_model.py
Create the geolocations by executing src/geolocations.py
To start the app, run streamlit run app.py
In the notebook folder there is a Jupyter notebook exploring how the model was built.
estaticos_market.zip → Market data
estaticos_portifolio{1, 2, 3}.csv → Test sets
geo.zip → Geolocation of each address from estaticos_market.csv
recommender.pkl → Trained Recommender model
features_dictionary.pdf → Description of the features
download.sh → Bash script to download the CSV data
links.txt → File containing the URLs of the files to download
README.md → Description of the challenge
app.py → Streamlit app file
geolocations.py → Script created to extract the geolocation from the location in the estaticos_market.csv file
preprocessor.py → Class responsible for preprocessing the estaticos_market.csv data to be used in the Recommender class
recommender.py → Class implementing a recommendation system based on text similarity using TF-IDF and cosine distance
SessionState.py → Class used for persisting user session data on Streamlit
train_model.py → Script used to train the recommender model
download.sh → Downloads the data
setup.sh → Configures Streamlit for deployment