c-3po's People

Contributors

2000yeshu, cipherlord, damienjacinto, dependabot[bot], ghostwriternr, parth-paradkar

c-3po's Issues

Remove the NAT Gateway on AWS Infrastructure

Right now, we use a NAT Gateway in our infrastructure to allow resources inside a public subnet to access resources inside a private subnet (all part of the same VPC). AWS mandates creating ECS services inside a private subnet, which is why we need a NAT Gateway to route connections to resources inside private subnets (this needs to be looked into again to confirm).

The problem with having a NAT Gateway is that it is the major contributor to our infrastructure costs. The NAT seems expendable if the routing issue is fixed, so we need to either remove the need for private subnets (and hence the need for a NAT) or access resources in private subnets without a NAT (which is unlikely, if not impossible; what's the point of a private subnet otherwise?).

Minor change in KWOC.md

The first heading says "Why, hello there!". I am not sure whether "Why" is intentional or a typo.
If it is a typo, please let me correct it.

Add Google API key to .env file

An environment variable GOOGLE_APPLICATION_CREDENTIALS needs to be added to the .env file as required by metadata-extractor.

Use the `embeddable` attribute from metadata-extractor

Currently, we are not using the embeddable attribute made available by metadata-extractor. As a result, there are unavailable or blocked videos on the frontend. We need to modify the logic in insert_link so that it only inserts if the link is embeddable.
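
A minimal sketch of the guard, assuming metadata-extractor returns a dict with an `embeddable` key. `insert_link_if_embeddable` and its `insert` callback are hypothetical names, not the current `insert_link` signature:

```python
from typing import Callable


def insert_link_if_embeddable(
    url: str,
    metadata: dict,
    insert: Callable[[str, dict], None],
) -> bool:
    """Insert `url` only when the extracted metadata marks it embeddable.

    A missing or falsy `embeddable` value is treated as "not embeddable",
    so blocked or unavailable videos never reach the frontend.
    """
    if not metadata.get("embeddable", False):
        return False
    insert(url, metadata)
    return True
```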

Add search endpoint

Add a search endpoint that accepts a search parameter and returns a list of posts that match the searched phrase.

The preferred way would be a fuzzy search.
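
As a rough illustration of fuzzy matching, here is an in-memory sketch using Python's standard `difflib` (a swapped-in technique; a real endpoint would more likely lean on the database or a search engine). It assumes each post is a dict with a `message` field, as in the post structure described further down; the function name and the 0.6 threshold are arbitrary starting points:

```python
from difflib import SequenceMatcher


def fuzzy_search(posts, query, threshold=0.6):
    """Return posts whose message fuzzily matches `query`, best match first."""
    query = query.lower()
    n = max(1, len(query.split()))
    scored = []
    for post in posts:
        words = post.get("message", "").lower().split()
        best = 0.0
        # Compare the query against every window of n consecutive words,
        # so a short query can match inside a longer message.
        for i in range(max(1, len(words) - n + 1)):
            window = " ".join(words[i:i + n])
            best = max(best, SequenceMatcher(None, query, window).ratio())
        if best >= threshold:
            scored.append((best, post))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored]
```

A typo such as "pink floid" still scores about 0.9 against the words "pink floyd", so the post is returned.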

Update docs

I think a documentation update is due 😄
We need to update the docs on:

  1. How to set up the project and install dependencies
  2. Populating the databases
  3. Launching the API
  4. Code guidelines

Set up CodeDeploy to automate deployment to ECS

This involves:

  1. Run a GitHub Action on every push to master to build the latest Docker image
  2. Push the new image to ECR
  3. Update the ECS task definition to use the latest image
  4. Ensure the new definition gets deployed to the cluster

Initialise MongoDB

Use pymongo to communicate with MongoDB. The script should create a new db (named 'c3po') and a collection named 'posts' if they don't already exist. If they do, it should load them (i.e. use them).
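
A possible shape for this, assuming a pymongo `MongoClient` is passed in (the function name is hypothetical). MongoDB creates a database lazily when its first collection is created, so only the collection needs an explicit existence check:

```python
def get_posts_collection(client, db_name="c3po", collection_name="posts"):
    """Return the 'posts' collection, creating it first if it does not exist.

    `client` is expected to behave like a pymongo MongoClient.
    """
    db = client[db_name]
    # create_collection raises if the collection already exists,
    # so check the existing names before creating it.
    if collection_name not in db.list_collection_names():
        db.create_collection(collection_name)
    return db[collection_name]
```

Usage would look like `get_posts_collection(MongoClient("mongodb://localhost:27017"))`, with the connection string depending on the deployment.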

Add a pre-commit hook to run sanity.sh

A pre-commit hook can be set up to run any command/script before changes are committed. Running sanity.sh as a pre-commit hook would ensure all commits/PRs are properly formatted, as expected by the project.
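
A sketch of such a hook written in Python, assuming sanity.sh lives at the repository root and is run with bash. Git runs hooks from the root of the working tree, so the relative path resolves there. `run_sanity` is a hypothetical name:

```python
import subprocess
import sys
from pathlib import Path


def run_sanity(script: Path) -> int:
    """Run the sanity script with bash and return its exit code.

    A non-zero return value should abort the commit.
    """
    if not script.is_file():
        # Design choice: warn and let the commit through if the script is absent.
        print(f"pre-commit: {script} not found, skipping checks", file=sys.stderr)
        return 0
    return subprocess.run(["bash", str(script)]).returncode
```

To install it, save the function in `.git/hooks/pre-commit` behind a `#!/usr/bin/env python3` shebang, end the file with `sys.exit(run_sanity(Path("sanity.sh")))`, and make it executable with `chmod +x`. A non-zero exit status from the hook aborts the commit; failing instead of skipping when sanity.sh is missing would be the stricter choice.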

Dockerize the C3PO setup

As of now, since we're deploying both the Slack application and the Postgres DB on the server, it might be worth using docker-compose to manage the deployment of both of them.

Sanitize Python code using Pycodestyle

It might be a good idea to adopt a code style guideline for our work. This might enforce some good coding and documentation practices within the project.

We could use Dodgy, Isort, Pycodestyle and Pydocstyle and set up a script to run them before pushing or sending a PR. I believe setting up Travis would also be a good idea to make sure everything is fine in terms of coding style.

Add random endpoint

Add a random endpoint that returns random tracks from the entire post history.

This would help surface different/fresh content without forcing people to use specific filters once the feed filters are implemented.
It would also be a fun jukebox experience and would help revive old posts that are almost impossible to find in the group.
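
A minimal sketch of the selection logic, assuming the posts are already loaded into memory (`random_posts` is a hypothetical name). On a large table it may be cheaper to let PostgreSQL pick rows with `ORDER BY random() LIMIT n` instead:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def random_posts(
    posts: Sequence[T], count: int = 10, rng: Optional[random.Random] = None
) -> List[T]:
    """Pick up to `count` distinct posts uniformly at random from the full history."""
    rng = rng or random.Random()
    # sample() never repeats an element; cap k so short histories still work.
    return rng.sample(list(posts), min(count, len(posts)))
```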

Fix /popular logic

We currently set the time frame for popular posts to the week before the current date. We should change this to the week before the latest post in the DB.
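
The fix amounts to anchoring the window on the newest post instead of on the current time. A sketch, assuming post dates are available as `datetime` objects (`popular_window` is a hypothetical helper):

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def popular_window(post_dates: Iterable[datetime], days: int = 7) -> Tuple[datetime, datetime]:
    """Return (start, end) of the popularity window, ending at the newest post."""
    latest = max(post_dates)  # raises ValueError if there are no posts
    return latest - timedelta(days=days), latest
```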

Enhance logging across the codebase

  1. Right now, we have very limited visibility into the functionality via logs. We should add useful (only essential, not excessive) logs wherever needed.
  2. Use module-level loggers instead of the root logger so logs explicitly show which module they come from, which also simplifies querying.
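
A minimal sketch of the module-level pattern with the standard `logging` module; `fetch_popular` is a hypothetical function used only to show a log call. Including `%(name)s` in the format string is what makes the emitting module visible:

```python
import logging

# One logger per module; its name is the module's dotted import path.
logger = logging.getLogger(__name__)


def configure_logging(level: int = logging.INFO) -> None:
    """Configure the root handler once at startup; %(name)s shows the module."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def fetch_popular(limit: int) -> list:
    # Hypothetical example of an essential log line.
    logger.info("Fetching %d popular posts", limit)
    return []
```

`logging.basicConfig` does nothing if the root logger already has handlers, so calling it once at startup is enough.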

Add Elasticsearch functionality to /search endpoint

We could use ES to index posts and serve queries from the /search endpoint.
This would require:

  • Wrapping the SQLAlchemy model with a SearchMixin class
  • Adding a DB event that is triggered (to add to the ES index) whenever a new item is added to the DB
  • Creating a search function that allows custom queries with fields as a parameter (to search only specific fields in the ES index)
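
For the search function, a sketch that builds an Elasticsearch query body with the `multi_match` query, which accepts a list of fields to search. The function name, the field names and the `fuzziness: AUTO` setting are assumptions; the SearchMixin and DB-event wiring are not shown:

```python
from typing import Dict, List, Optional


def build_search_query(text: str, fields: Optional[List[str]] = None, size: int = 20) -> Dict:
    """Build an Elasticsearch query body restricted to `fields`.

    fuzziness "AUTO" lets small typos still match; with no fields given,
    the wildcard "*" searches all eligible indexed fields.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields or ["*"],
                "fuzziness": "AUTO",
            }
        },
    }
```

The resulting dict would be sent as the body of a `_search` request against the posts index.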

Fix Python dependency caching in our Actions workflow

For now, we cache Python pip dependencies using:

    - name: Cache Python Dependencies
      uses: actions/cache@v2
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
        restore-keys: |
          ${{ runner.os }}-pip-
          ${{ runner.os }}-

(Ref -> test.yml)

This doesn't seem to work for our use case, and we need to fix it so dependencies are cached properly.

Remove unnecessary keys from AWS Secrets Manager

We're using AWS Secrets Manager to manage our API keys and prod environment variables. AWS charges monthly per secret, so we can remove some (not-so-private) keys from there.

We could include the following keys directly in our task configuration:

  • SPOTIFY_REDIRECT_URI
  • SPOTIPY_CLIENT_ID (The client secret still stays on secrets manager of course)

Improve logic of underrated endpoints

Currently, we are sorting on the custom_popularity column while fetching the data from the DB. However, this is not producing the desired results. We need to display songs that are underrated and recently posted.
I imagine a modification to the logic as follows:

  1. We decide on lower and upper thresholds within which a song may be considered underrated. These thresholds may be applied to custom_popularity or, more conveniently, to YouTube views, since we currently support only YouTube links. For example, views < 300000 and views > 500.
  2. We join user_posts and link with the above conditions and sort the resulting posts by date.
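
A sketch of the proposed query, using SQLite from the standard library as a stand-in for PostgreSQL. The column names (`views`, `link_id`, `created_at`) and the function name are assumptions about the schema, and the thresholds are the example values above:

```python
import sqlite3

# Example thresholds from the issue (hypothetical until agreed on).
MIN_VIEWS = 500
MAX_VIEWS = 300_000

UNDERRATED_SQL = """
SELECT user_posts.id, link.url, link.views, user_posts.created_at
FROM user_posts
JOIN link ON link.id = user_posts.link_id
WHERE link.views > ? AND link.views < ?
ORDER BY user_posts.created_at DESC
LIMIT ?
"""


def underrated_posts(conn: sqlite3.Connection, limit: int = 20):
    """Return the most recent posts whose links fall inside the view-count band."""
    return conn.execute(UNDERRATED_SQL, (MIN_VIEWS, MAX_VIEWS, limit)).fetchall()
```

With psycopg2 the placeholders would be `%s` instead of `?`.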

Add youtube-id parameter wherever applicable

Currently, YouTube links are not uniform: there are both https://www.youtube.com/watch?v=bR8sE9ubyTI and youtu.be/bR8sE9ubyTI. It would help to also have a parameter that stores just YouTube's video ID (bR8sE9ubyTI) as a separate string.

This video ID rarely changes across YouTube's URLs and APIs, so it is worth storing separately.

This could be fixed with a function that extracts the video ID from both types of URL, but it makes more sense to do it in the backend.
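
A sketch of such a function using `urllib.parse` from the standard library (`youtube_video_id` is a hypothetical name). It covers the two URL forms above and, because it reads the `v` parameter on any youtube.com subdomain, also `m.` and `music.` links:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video ID from youtube.com/watch?v=... or youtu.be/... URLs."""
    # urlparse only fills in netloc when a scheme is present (e.g. "youtu.be/ID").
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch links carry the ID in the `v` query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    return None
```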

Extend get_post to fetch likes and comments

Once #2 is complete, extend get_post to get information about users who have reacted to the post (and with which reaction) and the comments on the post.

From the comments on each post, we only need to find links posted by other users and, if there are any, keep them in the object. (Storing this data can be handled later.) There is no need to maintain the 'comment' -> 'comment reply' hierarchy; it is okay to keep all links simply as a list of 'children' (say).
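
A sketch of the flattening step, assuming each comment is a dict with a `message` string and an optional nested `comments` list of replies (the real API response may nest replies differently). The URL regex is deliberately simple, and `collect_child_links` is a hypothetical name:

```python
import re
from collections import deque
from typing import Dict, Iterable, List

# Simple pattern: anything starting with http(s):// up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def collect_child_links(comments: Iterable[Dict]) -> List[str]:
    """Flatten comments and their replies into a single list of posted links."""
    links: List[str] = []
    queue = deque(comments)
    while queue:
        comment = queue.popleft()
        links.extend(URL_RE.findall(comment.get("message", "")))
        # Replies are queued after their siblings; the hierarchy is dropped.
        queue.extend(comment.get("comments", []))
    return links
```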

New method for saving a post to db

To start with, the structure of 'posts' can be like this:

{'_id': ObjectId('...'),
 'author': 'Naresh',
 'created_at': '',
 'reactions': [...],
 'message': 'Awesome song 1!',
 ...}
{'_id': ObjectId('...'),
 'author': 'Mayank',
 'created_at': '',
 'reactions': [...],
 'message': 'Awesome song 2!',
 ...}

The list of fields to include is in #2 and #3 (although those two issues are not prerequisites for this one).

Once #4 is done, write a new method that adds a new post to the DB with the above fields as parameters.
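
A sketch of that method, assuming a pymongo `Collection` is passed in; `save_post` is a hypothetical name. `insert_one` returns a result whose `inserted_id` holds the generated `_id`. Storing `created_at` as a `datetime` (rather than the empty string in the example above) is a suggestion, not part of the issue:

```python
from datetime import datetime, timezone
from typing import Any, List, Optional


def save_post(
    collection,
    author: str,
    message: str,
    reactions: Optional[List[Any]] = None,
    created_at: Optional[datetime] = None,
    **extra: Any,
):
    """Insert one post document and return its generated _id.

    Extra keyword arguments become additional fields on the document.
    """
    document = {
        "author": author,
        "created_at": created_at or datetime.now(timezone.utc),
        "reactions": reactions or [],
        "message": message,
        **extra,
    }
    return collection.insert_one(document).inserted_id
```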

Set up a filter so we only return active YouTube links

NOTE: Do not pick this up until we finalize a plan for solving it.

To start with, we should investigate what happens when we send links that are deleted or blocked. If the player can skip them automatically, we should start there rather than polling YouTube at regular intervals to check whether links are active. Such a change would also require a change to the database schema.

Handle cases with no Spotify data

Currently, metadata-extractor raises an exception when no data is found on Spotify, so the link is not inserted into PostgreSQL from the Mongo dump. Many links have no data on Spotify, so we lose many of the songs that are posted. We can catch the Spotify exception, create a UserPost and a Link object with no corresponding Song, and not show the Spotify data on the frontend.
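
A sketch of the fallback, with simplified dataclasses standing in for the real `UserPost`, `Link` and `Song` models. `SpotifyDataNotFound` and `build_user_post` are hypothetical names; the actual exception is whatever metadata-extractor raises:

```python
from dataclasses import dataclass
from typing import Callable, Optional


class SpotifyDataNotFound(Exception):
    """Hypothetical stand-in for the exception metadata-extractor raises."""


@dataclass
class Song:
    title: str


@dataclass
class Link:
    url: str
    song: Optional[Song] = None  # None when Spotify has no match


@dataclass
class UserPost:
    author: str
    link: Link


def build_user_post(author: str, url: str, fetch_spotify: Callable[[str], Song]) -> UserPost:
    """Create a UserPost/Link pair, attaching a Song only if Spotify knows it."""
    try:
        song: Optional[Song] = fetch_spotify(url)
    except SpotifyDataNotFound:
        song = None  # keep the post; the frontend simply hides Spotify data
    return UserPost(author=author, link=Link(url=url, song=song))
```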

Write a lambda function to rotate Spotify Access Token

For now, we've set up the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET environment variables in our AWS Secrets Manager. Since these tokens expire, and as a good practice, it'd be nice to set up automatic rotation every fixed number of days. AWS Secrets Manager supports a Lambda function that rotates keys and returns the new keys to it, which it then updates automatically.

For Spotify, here's their guide to the authorization workflow: https://developer.spotify.com/documentation/general/guides/authorization-guide/#client-credentials-flow
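
A sketch of the token request from the Client Credentials Flow in that guide: a POST to `https://accounts.spotify.com/api/token` with `grant_type=client_credentials` and a Basic auth header built from the client ID and secret. The function names are hypothetical, and the Lambda handler and Secrets Manager rotation steps are left out. Note that this flow issues short-lived access tokens; it does not rotate the client secret itself:

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> Request:
    """Build the Client Credentials token request (not sent yet)."""
    credentials = f"{client_id}:{client_secret}".encode()
    return Request(
        TOKEN_URL,
        data=urlencode({"grant_type": "client_credentials"}).encode(),
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode(),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_access_token(client_id: str, client_secret: str) -> dict:
    """Send the request; the JSON reply contains access_token and expires_in."""
    with urlopen(build_token_request(client_id, client_secret), timeout=10) as resp:
        return json.load(resp)
```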

Add 'Date' column to Link

The link table does not have a date column, which is needed to query posts by date. The data can be taken from the created_time field of the corresponding document in MongoDB.
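
A sketch of the migration and backfill, using SQLite as a stand-in for PostgreSQL. It assumes `created_time` strings look like `2020-05-01T12:34:56+0000` and that a `{link_id: created_time}` mapping has already been read from MongoDB; both helper names are hypothetical:

```python
import sqlite3
from datetime import datetime
from typing import Mapping


def parse_created_time(value: str) -> str:
    """Convert an assumed '2020-05-01T12:34:56+0000' string to an ISO date."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z").date().isoformat()


def add_date_column(conn: sqlite3.Connection, created_times: Mapping[int, str]) -> None:
    """Add link.date and backfill it from {link_id: created_time}."""
    conn.execute("ALTER TABLE link ADD COLUMN date TEXT")
    conn.executemany(
        "UPDATE link SET date = ? WHERE id = ?",
        [(parse_created_time(ts), link_id) for link_id, ts in created_times.items()],
    )
    conn.commit()
```

Links with no matching Mongo document keep a NULL date.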

Handle YouTube Music links

Luckily, the v parameter for YouTube Music and YouTube is the same. This way, we can handle YouTube Music links just by extracting this id and passing it as a YouTube link.
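
A sketch of that conversion with `urllib.parse` (`music_to_youtube` is a hypothetical name); it only rewrites `music.youtube.com` URLs that carry a `v` parameter:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def music_to_youtube(url: str) -> Optional[str]:
    """Rewrite a music.youtube.com watch URL as a regular YouTube watch URL."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "music.youtube.com":
        return None
    video_id = parse_qs(parsed.query).get("v", [None])[0]
    return f"https://www.youtube.com/watch?v={video_id}" if video_id else None
```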

Unify all the config management in one place

Currently, config management for different parts of the codebase is scattered all over the place, making it harder to track down what's where. Clean up the code to unify all configuration in one place.
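
One possible shape for a single config module: a frozen dataclass built from environment variables. Apart from `SPOTIPY_CLIENT_ID`, the variable names and defaults below are illustrative, not the project's current settings:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    """Single source of truth for settings (field names are illustrative)."""

    database_url: str
    mongo_uri: str
    spotipy_client_id: Optional[str]
    log_level: str = "INFO"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Config":
        # Required values raise KeyError early; optional ones fall back to defaults.
        return cls(
            database_url=env["DATABASE_URL"],
            mongo_uri=env.get("MONGO_URI", "mongodb://localhost:27017"),
            spotipy_client_id=env.get("SPOTIPY_CLIENT_ID"),
            log_level=env.get("LOG_LEVEL", "INFO"),
        )
```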

Fix the Facebook token refresh logic

  • On first run, when FB_LONG_ACCESS_TOKEN is not available, the GraphQL request fails and throws an exception, even though the long access token is fetched as part of the flow. Change this to retry automatically after failure.
  • Occasionally, the long access token is not written back to the environment file correctly (gets written without a newline, corrupting the previous property too). Fix this.
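
A sketch of both fixes (hypothetical helper names): a small retry wrapper for the first-run failure, and an upsert for the env file that rewrites the file line by line and always ends it with a newline:

```python
import time
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(fn: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call fn, retrying on any exception; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)


def set_env_var(path: Path, key: str, value: str) -> None:
    """Insert or replace KEY=value in a .env file, one entry per line."""
    lines = path.read_text().splitlines() if path.exists() else []
    lines = [line for line in lines if not line.startswith(f"{key}=")]
    lines.append(f"{key}={value}")
    # Always end with a newline so a later append starts on a fresh line.
    path.write_text("\n".join(lines) + "\n")
```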
