lttkgp / c-3po
The metadata overlord and API server for LTTKGP
Home Page: https://api.lttkgp.com
License: MIT License
Right now, we use a NAT Gateway in our infrastructure to allow resources inside a public subnet to access resources inside a private subnet (all part of the same VPC). AWS mandates creating ECS services inside a private subnet, which is why we need a NAT Gateway to route connections to resources in private subnets (this needs to be looked into again to confirm).
The issue with having a NAT Gateway is that it is the major contributor to our infrastructure costs. The NAT seems expendable if the routing issue is fixed, so we need a solution that either removes the need for private subnets (and hence the need for a NAT) or accesses resources in private subnets without a NAT (which is unlikely, if not impossible; what would be the point of a private subnet otherwise?).
The first heading says "Why, hello there!". I am not sure if you wrote "Why" there intentionally or if it is a typo.
If it is a typo, please let me correct it.
An environment variable GOOGLE_APPLICATION_CREDENTIALS needs to be added to the .env file, as required by metadata-extractor.
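For illustration, the entry might look like this (the path is a placeholder; GOOGLE_APPLICATION_CREDENTIALS should point at your own Google service-account key JSON):

```
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```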
Currently, we are not using the embeddable attribute made available by metadata-extractor. As a result, there are unavailable or blocked videos on the frontend. We need to modify the logic in insert_link so that it only inserts a link if it is embeddable.
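A minimal sketch of the guard, assuming the metadata arrives as a dict with an embeddable flag (the function signature and metadata shape here are assumptions, not the project's actual API):

```python
def insert_link(link, metadata, db):
    """Insert a link only if its video can be embedded."""
    # Skip videos that are reported as non-embeddable; these would
    # otherwise show up as unavailable/blocked players on the frontend.
    if not metadata.get("embeddable", False):
        return None
    db.append({"url": link, "meta": metadata})
    return link
```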
Add a search endpoint that accepts a search parameter and returns a list of posts that match the searched phrase.
The preferred way would be a fuzzy search.
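One way to prototype the fuzzy matching before reaching for a dedicated search backend is Python's stdlib difflib; the function name, post shape, and threshold below are illustrative assumptions:

```python
from difflib import SequenceMatcher

def fuzzy_search(posts, phrase, threshold=0.6):
    """Return posts whose message loosely matches the search phrase."""
    phrase = phrase.lower()
    results = []
    for post in posts:
        message = post["message"].lower()
        score = SequenceMatcher(None, phrase, message).ratio()
        # Treat direct substring hits as matches too.
        if score >= threshold or phrase in message:
            results.append(post)
    return results
```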
I think an update in the documentation is due.
We need to update docs about:
This involves:
Use pymongo to communicate with MongoDB. The script should be able to create a new DB (named 'c3po') and initialize a collection named 'posts', if they don't exist already. If they do, it should be able to load them (i.e. use them).
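A minimal sketch with pymongo. MongoDB creates databases and collections lazily on first write and reuses them if they already exist, so "create if missing, else load" falls out of simply referencing them (the connection URI is an assumption):

```python
from pymongo import MongoClient

# MongoDB creates 'c3po' and 'posts' on the first write,
# and simply reuses them if they already exist.
client = MongoClient("mongodb://localhost:27017/")
db = client["c3po"]
posts = db["posts"]

# Example write; this materializes the DB and collection if they are new.
posts.insert_one({"author": "Naresh", "message": "Awesome song 1!"})
```

This requires a running MongoDB instance, so treat it as a sketch rather than a drop-in script.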
A pre-commit hook can be set up to run any command/script before changes are committed. Running sanity.sh as a pre-commit hook would ensure all commits/PRs are properly formatted, as expected by the project.
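A sketch of such a hook, assuming sanity.sh lives at the repo root and exits non-zero on failure:

```shell
#!/bin/sh
# Save as .git/hooks/pre-commit and make it executable (chmod +x).
# Abort the commit if the project's sanity checks fail.
./sanity.sh || {
    echo "sanity.sh failed; commit aborted." >&2
    exit 1
}
```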
As of now, since we're deploying both the Slack application and the Postgres DB on the server, it might be worth using docker-compose to manage the deployment of both of them.
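A hedged docker-compose sketch of that layout; the service names, image tag, and credentials are placeholders, not the project's actual configuration:

```yaml
version: "3.8"
services:
  slack-app:
    build: .
    env_file: .env
    depends_on:
      - db
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: c3po
      POSTGRES_PASSWORD: changeme
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```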
One of the best features we can implement is to show posts with songs that are largely undiscovered.
The endpoint could be named discovered or something similar. Open for discussion.
Currently, get_feed fetches only the first page. To learn more about how pagination works in the Graph API, check out the documentation at https://developers.facebook.com/docs/graph-api/using-graph-api/#paging.
To start with, fetch all posts back to the very first one by following the pagination links returned by the API. This is needed to populate our DB (when we have one).
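The Graph API returns a paging.next URL with each page, and following it until it disappears walks the whole history. A sketch of that loop, with the HTTP fetch abstracted behind an assumed fetch_page callable so the pagination logic stands alone:

```python
def fetch_all_posts(first_url, fetch_page):
    """Follow Graph-API-style paging links until they run out.

    fetch_page(url) -> dict with 'data' (a list of posts) and, while
    more pages remain, 'paging': {'next': <url of the next page>}.
    """
    posts = []
    url = first_url
    while url:
        page = fetch_page(url)
        posts.extend(page.get("data", []))
        # A missing 'paging' or 'next' key means we reached the first post.
        url = page.get("paging", {}).get("next")
    return posts
```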
The ideal setup experience for a new contributor should take fewer than 3 steps. So, we should ensure that they don't have to search around for how to fill in all the environment variables.
The default endpoint can do nothing but return a static string (like "Hello LTTKGP!").
It might be a good idea to adopt a code style guideline for our work. This might enforce some good coding and documentation practices within the project.
We could use Dodgy, isort, pycodestyle, and pydocstyle, and set up a script to run them before pushing or sending a PR. I believe setting up Travis would also be a good idea to make sure everything is fine in terms of coding style.
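A sketch of such a script; all four tools install from PyPI, though the exact invocations may need tuning to the project layout:

```shell
#!/bin/sh
# Run the style checkers over the codebase; exit non-zero on any failure.
set -e
dodgy                  # scan for secrets and suspicious code
isort --check-only .   # verify import ordering
pycodestyle .          # PEP 8 style checks
pydocstyle .           # PEP 257 docstring checks
echo "All style checks passed."
```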
Add a random endpoint that returns random tracks from the entire post history.
This would help in serving different/fresh content without forcing people to use specific filters once the filters in feed are implemented.
Also, it will be a fun jukebox experience and would help revive old posts that are almost impossible to find in the group.
We are setting the timeframe for popular posts to one week before the current date. We should change this to one week prior to the latest post in the DB.
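A sketch of the adjusted query against a link-like table, with sqlite3 standing in for the real database (the table and column names are assumptions); the point is anchoring the window to MAX(date) rather than to today:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (url TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO link VALUES (?, ?)",
    [("a", "2020-01-01"), ("b", "2020-03-01"), ("c", "2020-03-05")],
)

# Window: one week before the most recent post, not one week before today.
rows = conn.execute(
    """
    SELECT url FROM link
    WHERE date >= date((SELECT MAX(date) FROM link), '-7 days')
    ORDER BY date
    """
).fetchall()
```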
We could use Elasticsearch (ES) to index posts for the /search endpoint.
This would require:
As of now, insert_metadata fails with an Invalid Client error when the Spotify key expires / becomes invalid (when?). Handle this scenario gracefully so that parsing still succeeds without metadata.
For now, we cache Python pip dependencies using:

- name: Cache Python Dependencies
  uses: actions/cache@v2
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
      ${{ runner.os }}-

(Ref -> test.yml)
This doesn't seem to work for our use case, and we need to fix it to properly cache dependencies.
Add a method get_post that uses the id from each post in get_feed to fetch the following information:
Information regarding these fields can be found here: https://developers.facebook.com/docs/graph-api/reference/v2.11/post
We're using AWS Secrets Manager to manage our API keys and prod environment variables. AWS charges per secret per month, so we can remove some (not-so-private) keys from there.
We could include the following keys directly in our task configuration:
SPOTIFY_REDIRECT_URI
SPOTIPY_CLIENT_ID
(The client secret, of course, still stays in Secrets Manager.)

Currently, we are sorting on the custom_popularity column while fetching the data from the DB. However, this is not producing the desired results. We need to display songs that are underrated and recently posted.
I imagine a modification to the logic as follows:
- Filter on custom_popularity, or more conveniently on YouTube views, since we are currently supporting only YouTube links (for example, views < 300000 and views > 500).
- Join user_posts and link with the above conditions and sort the resulting posts by date.

Currently, there is no uniformity in YouTube links: we store both https://www.youtube.com/watch?v=bR8sE9ubyTI and youtu.be/bR8sE9ubyTI. It would help to have a field that stores just YouTube's video ID (bR8sE9ubyTI) as a separate string.
This video ID rarely changes across YouTube's URLs and APIs, so it is worth keeping separately.
This can be fixed with a function that extracts the video ID from both types of URL, but it makes the most sense to do this in the backend.
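A sketch of such a helper using only the stdlib. It handles the watch?v= form, the youtu.be short form, and music.youtube.com links (which share the same v parameter):

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the YouTube video ID from any of the common URL forms."""
    # Tolerate scheme-less links like "youtu.be/bR8sE9ubyTI".
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.netloc.lower().endswith("youtu.be"):
        # Short form: the ID is the path itself.
        return parsed.path.lstrip("/")
    # Long form (youtube.com, www.youtube.com, music.youtube.com):
    # the ID lives in the ?v= query parameter.
    return parse_qs(parsed.query).get("v", [None])[0]
```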
Once #2 is complete, extend get_post to also fetch the users who have reacted to the post (and their reactions) and the comments on the post.
From the comments on each post, we only need to find any links posted by other users; if present, keep them in the object (storing this data can be handled later). It is not necessary to maintain the 'comment' -> 'comment reply' hierarchy. It is okay to keep all links simply as a flat list of 'children' (say).
To start with, the structure of 'posts' can be like this:
{'_id': ObjectId('...'),
'author': 'Naresh',
'created_at': '',
'reactions': [...],
'message': 'Awesome song 1!',
...}
{'_id': ObjectId('...'),
'author': 'Mayank',
'created_at': '',
'reactions': [...],
'message': 'Awesome song 2!',
...}
The list of fields to include is in #2 and #3 (although those two issues are not a prerequisite for this one).
Once #4 is done, write a new method that adds a new post to the DB with the above fields as parameters.
NOTE: Do not pick this up yet until we finalize a plan on how to solve it.
To start with, we should investigate what happens when we send links that are deleted or blocked. If the player can automatically skip them, we should start there rather than having to poll YouTube at regular intervals to check whether links are active. Such a change would also require a change to the database schema.
Currently, metadata-extractor raises an exception when no data is found on Spotify, and the link is then not inserted into PostgreSQL from the dump in Mongo. Many links do not have data on Spotify, resulting in a loss of songs that were posted. We can catch the Spotify exception, create a UserPost and a Link object with no corresponding Song, and simply not show the Spotify data on the frontend.
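A sketch of that fallback. The exception class and fetch_spotify_metadata helper are hypothetical stand-ins for metadata-extractor's actual API, and the dicts stand in for the UserPost/Link/Song ORM objects:

```python
class SpotifyNotFound(Exception):
    """Stand-in for the exception metadata-extractor raises."""

def fetch_spotify_metadata(url):
    # Hypothetical extractor call; raises when Spotify has no data.
    raise SpotifyNotFound(url)

def build_records(url):
    """Always produce UserPost/Link records; a Song only when Spotify has data."""
    try:
        song = fetch_spotify_metadata(url)
    except SpotifyNotFound:
        # No Spotify data: keep the post and link, just without a Song.
        song = None
    return {"user_post": {"url": url}, "link": {"url": url}, "song": song}
```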
For now, we've set up the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET environment variables in our AWS Secrets Manager. Since these tokens expire, and as good practice, it'd be nice to set up automatic rotation every fixed number of days. AWS Secrets Manager allows writing a Lambda function that rotates the keys and hands the new values back, which it then stores automatically.
For Spotify, here's their guide to the authorization workflow: https://developer.spotify.com/documentation/general/guides/authorization-guide/#client-credentials-flow
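The client-credentials flow itself is a POST to Spotify's token endpoint with the client ID and secret in a Basic auth header. A sketch of building that request (the actual HTTP call and the Lambda rotation wiring are omitted; only the request construction is shown):

```python
import base64

TOKEN_URL = "https://accounts.spotify.com/api/token"

def client_credentials_request(client_id, client_secret):
    """Build the pieces of a Spotify client-credentials token request."""
    # Basic auth: base64("client_id:client_secret"), per Spotify's guide.
    creds = f"{client_id}:{client_secret}".encode()
    auth_header = "Basic " + base64.b64encode(creds).decode()
    return {
        "url": TOKEN_URL,
        "headers": {"Authorization": auth_header},
        "data": {"grant_type": "client_credentials"},
    }
```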
Currently, during development, the Docker image needs to be rebuilt every time there is a change, which slows down testing. Hot reloading would make this a lot smoother. This is quite trivial, and a fun fix for first-timers.
Tutorial: https://stackoverflow.com/a/44344442/4396392
Set up AWS CloudWatch alerts to notify us when something is wrong, and integrate them with the Slack workspace.
The table link does not have a date column, which will be needed to query posts by date. The data can be taken from the created_time field of the corresponding document in MongoDB.
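A sketch of the migration plus backfill, with sqlite3 standing in for PostgreSQL and a plain list standing in for the Mongo dump (the table and column names follow the issue; the document shape and matching key are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (url TEXT PRIMARY KEY)")
conn.execute("INSERT INTO link VALUES ('https://youtu.be/x')")

# 1. Add the missing column.
conn.execute("ALTER TABLE link ADD COLUMN date TEXT")

# 2. Backfill from the Mongo documents' created_time field.
mongo_docs = [{"link": "https://youtu.be/x", "created_time": "2020-03-01"}]
for doc in mongo_docs:
    conn.execute(
        "UPDATE link SET date = ? WHERE url = ?",
        (doc["created_time"], doc["link"]),
    )

row = conn.execute("SELECT date FROM link").fetchone()
```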
Luckily, the v parameter is the same for YouTube Music and YouTube. This means we can handle YouTube Music links simply by extracting this ID and treating it as a YouTube link.
Currently, configuration management for different parts of the codebase is scattered all over the place, making it hard to track down what lives where. Clean up the code to unify all configuration in one place.
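One common shape for this, sketched with a stdlib dataclass read from the environment. The variable names come from elsewhere in these issues; the class itself is an assumption, not existing project code:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    """Single home for all runtime configuration."""
    spotipy_client_id: str
    spotify_redirect_uri: str
    mongo_db: str = "c3po"

    @classmethod
    def from_env(cls):
        # Fail fast with a KeyError if a required variable is missing.
        return cls(
            spotipy_client_id=os.environ["SPOTIPY_CLIENT_ID"],
            spotify_redirect_uri=os.environ["SPOTIFY_REDIRECT_URI"],
        )
```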
When FB_LONG_ACCESS_TOKEN is not available, the Graph API request fails and throws an exception, even though the long access token is fetched as part of the flow. Change this to retry automatically after the failure.
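A sketch of the retry, with the token refresh and the request represented by hypothetical callables (the names are illustrative, not actual project functions):

```python
def request_with_retry(make_request, refresh_token, retries=1):
    """Call make_request; on failure, refresh the token and retry."""
    for attempt in range(retries + 1):
        try:
            return make_request()
        except Exception:
            if attempt == retries:
                raise
            # The long access token is fetched as part of the flow,
            # so refreshing it and retrying should succeed.
            refresh_token()
```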