lttkgp / c-3po
The metadata overlord and API server for LTTKGP
Home Page: https://api.lttkgp.com
License: MIT License
Right now, we use a NAT Gateway in our infrastructure to allow resources inside a public subnet to access resources inside a private subnet (all part of the same VPC). AWS mandates creating ECS services inside a private subnet, which is why we need a NAT Gateway to route connections to resources in private subnets (this needs to be looked into again to confirm).
The issue with having a NAT Gateway is that it is the major contributor to our infrastructure costs. The NAT seems expendable if the routing issue is fixed, so we need a solution that either removes the need for private subnets (and hence the need for a NAT) or accesses resources in private subnets without a NAT (which is unlikely, if not impossible; what would be the point of a private subnet otherwise?).
The first heading says "Why, hello there!". I am not sure if you wrote "Why" there intentionally or if it is a typo.
If it is a typo, please let me correct it.
An environment variable GOOGLE_APPLICATION_CREDENTIALS needs to be added to the .env file, as required by metadata-extractor.
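For illustration, the entry might look like this (the path is a placeholder; GOOGLE_APPLICATION_CREDENTIALS should point at your own Google service-account key JSON):

```
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```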
Currently, we are not using the embeddable attribute made available by metadata-extractor. As a result, there are unavailable or blocked videos on the frontend. We need to modify the logic in insert_link so that it only inserts a link if it is embeddable.
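A minimal sketch of the guard, assuming the metadata arrives as a dict with an embeddable flag (the function signature and metadata shape here are assumptions, not the project's actual API):

```python
def insert_link(link, metadata, db):
    """Insert a link only if its video can be embedded."""
    # Skip videos that are reported as non-embeddable; these would
    # otherwise show up as unavailable/blocked players on the frontend.
    if not metadata.get("embeddable", False):
        return None
    db.append({"url": link, "meta": metadata})
    return link
```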
Add a search endpoint that accepts a search parameter and returns a list of posts that match the searched phrase.
The preferred way would be a fuzzy search.
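One way to prototype the fuzzy matching before reaching for a dedicated search backend is Python's stdlib difflib; the function name, post shape, and threshold below are illustrative assumptions:

```python
from difflib import SequenceMatcher

def fuzzy_search(posts, phrase, threshold=0.6):
    """Return posts whose message loosely matches the search phrase."""
    phrase = phrase.lower()
    results = []
    for post in posts:
        message = post["message"].lower()
        score = SequenceMatcher(None, phrase, message).ratio()
        # Treat direct substring hits as matches too.
        if score >= threshold or phrase in message:
            results.append(post)
    return results
```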
I think an update in the documentation is due.
We need to update docs about:
This involves:
Use pymongo to communicate with MongoDB. The script should be able to create a new DB (named 'c3po') and initialize a collection named 'posts', if they don't exist already. If they do, it should be able to load them (i.e. use them).
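A minimal sketch with pymongo. MongoDB creates databases and collections lazily on first write and reuses them if they already exist, so "create if missing, else load" falls out of simply referencing them (the connection URI is an assumption):

```python
from pymongo import MongoClient

# MongoDB creates 'c3po' and 'posts' on the first write,
# and simply reuses them if they already exist.
client = MongoClient("mongodb://localhost:27017/")
db = client["c3po"]
posts = db["posts"]

# Example write; this materializes the DB and collection if they are new.
posts.insert_one({"author": "Naresh", "message": "Awesome song 1!"})
```

This requires a running MongoDB instance, so treat it as a sketch rather than a drop-in script.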
A pre-commit hook can be set up to run any command/script before changes are committed. Running sanity.sh as a pre-commit hook would ensure all commits/PRs are properly formatted, as expected by the project.
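A sketch of such a hook, assuming sanity.sh lives at the repo root and exits non-zero on failure:

```shell
#!/bin/sh
# Save as .git/hooks/pre-commit and make it executable (chmod +x).
# Abort the commit if the project's sanity checks fail.
./sanity.sh || {
    echo "sanity.sh failed; commit aborted." >&2
    exit 1
}
```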
As of now, since we're deploying both the Slack application and the Postgres DB on the server, it might be worth using docker-compose to manage the deployment of both of them.
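A hedged docker-compose sketch of that layout; the service names, image tag, and credentials are placeholders, not the project's actual configuration:

```yaml
version: "3.8"
services:
  slack-app:
    build: .
    env_file: .env
    depends_on:
      - db
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: c3po
      POSTGRES_PASSWORD: changeme
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```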
One of the best features we can implement is to show posts with songs that are largely undiscovered.
The endpoint could be named discovered or something similar. Open for discussion.
Currently, get_feed fetches only the first page. To learn more about how pagination works in the Graph API, check out the documentation at https://developers.facebook.com/docs/graph-api/using-graph-api/#paging.
To start with, fetch all posts back to the very first one by following the pagination links returned by the API. This is needed to populate our DB (when we have one).
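The Graph API returns a paging.next URL with each page, and following it until it disappears walks the whole history. A sketch of that loop, with the HTTP fetch abstracted behind an assumed fetch_page callable so the pagination logic stands alone:

```python
def fetch_all_posts(first_url, fetch_page):
    """Follow Graph-API-style paging links until they run out.

    fetch_page(url) -> dict with 'data' (a list of posts) and, while
    more pages remain, 'paging': {'next': <url of the next page>}.
    """
    posts = []
    url = first_url
    while url:
        page = fetch_page(url)
        posts.extend(page.get("data", []))
        # A missing 'paging' or 'next' key means we reached the first post.
        url = page.get("paging", {}).get("next")
    return posts
```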
The ideal setup experience for a new contributor should take fewer than 3 steps. So, we should ensure that they don't have to search around for how to fill in all the environment variables.
The default endpoint can do nothing but return a static string (like "Hello LTTKGP!").
It might be a good idea to adopt a code style guideline for our work. This might enforce some good coding and documentation practices within the project.
We could use Dodgy, isort, pycodestyle, and pydocstyle, and set up a script to run them before pushing or sending a PR. I believe setting up Travis would also be a good idea to make sure everything is fine in terms of coding style.
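A sketch of such a script; all four tools install from PyPI, though the exact invocations may need tuning to the project layout:

```shell
#!/bin/sh
# Run the style checkers over the codebase; exit non-zero on any failure.
set -e
dodgy                  # scan for secrets and suspicious code
isort --check-only .   # verify import ordering
pycodestyle .          # PEP 8 style checks
pydocstyle .           # PEP 257 docstring checks
echo "All style checks passed."
```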
Add a random endpoint that returns random tracks from the entire post history.
This would help in serving different/fresh content without forcing people to use specific filters once the filters in feed are implemented.
Also, it will be a fun jukebox experience and would help revive old posts that are almost impossible to find in the group.
We are setting the timeframe for popular posts to one week before the current date. We should change this to one week prior to the latest post in the DB.
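A sketch of the adjusted query against a link-like table, with sqlite3 standing in for the real database (the table and column names are assumptions); the point is anchoring the window to MAX(date) rather than to today:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (url TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO link VALUES (?, ?)",
    [("a", "2020-01-01"), ("b", "2020-03-01"), ("c", "2020-03-05")],
)

# Window: one week before the most recent post, not one week before today.
rows = conn.execute(
    """
    SELECT url FROM link
    WHERE date >= date((SELECT MAX(date) FROM link), '-7 days')
    ORDER BY date
    """
).fetchall()
```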
We could use Elasticsearch (ES) to index posts for the /search endpoint.
This would require:
As of now, insert_metadata fails with an Invalid Client error when the Spotify key expires / becomes invalid (when?). Handle this scenario gracefully so that parsing still succeeds without metadata.
For now, we cache Python pip dependencies using:

- name: Cache Python Dependencies
  uses: actions/cache@v2
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
      ${{ runner.os }}-

(Ref -> test.yml)
This doesn't seem to work for our use case, and we need to fix it to properly cache dependencies.
Add a method get_post that uses the id from each post in get_feed to fetch the following information:
Information regarding these fields can be found here: https://developers.facebook.com/docs/graph-api/reference/v2.11/post
We're using AWS Secrets Manager to manage our API keys and prod environment variables. AWS charges per secret per month, so we can remove some (not-so-private) keys from there.
We could include the following keys directly in our task configuration:
SPOTIFY_REDIRECT_URI
SPOTIPY_CLIENT_ID
(The client secret, of course, still stays in Secrets Manager.)

Currently, we are sorting on the custom_popularity column while fetching the data from the DB. However, this is not producing the desired results. We need to display songs that are underrated and recently posted.
I imagine a modification to the logic as follows:
- Filter on custom_popularity, or more conveniently on YouTube views, since we are currently supporting only YouTube links (for example, views < 300000 and views > 500).
- Join user_posts and link with the above conditions and sort the resulting posts by date.

Currently, there is no uniformity in YouTube links: we store both https://www.youtube.com/watch?v=bR8sE9ubyTI and youtu.be/bR8sE9ubyTI. It would help to have a field that stores just YouTube's video ID (bR8sE9ubyTI) as a separate string.
This video ID rarely changes across YouTube's URLs and APIs, so it is worth keeping separately.
This can be fixed with a function that extracts the video ID from both types of URL, but it makes the most sense to do this in the backend.
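A sketch of such a helper using only the stdlib. It handles the watch?v= form, the youtu.be short form, and music.youtube.com links (which share the same v parameter):

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the YouTube video ID from any of the common URL forms."""
    # Tolerate scheme-less links like "youtu.be/bR8sE9ubyTI".
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.netloc.lower().endswith("youtu.be"):
        # Short form: the ID is the path itself.
        return parsed.path.lstrip("/")
    # Long form (youtube.com, www.youtube.com, music.youtube.com):
    # the ID lives in the ?v= query parameter.
    return parse_qs(parsed.query).get("v", [None])[0]
```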
Once #2 is complete, extend get_post to also fetch the users who have reacted to the post (and their reactions) and the comments on the post.
From the comments on each post, we only need to find any links posted by other users; if present, keep them in the object (storing this data can be handled later). It is not necessary to maintain the 'comment' -> 'comment reply' hierarchy. It is okay to keep all links simply as a flat list of 'children' (say).
To start with, the structure of 'posts' can be like this:
{'_id': ObjectId('...'),
'author': 'Naresh',
'created_at': '',
'reactions': [...],
'message': 'Awesome song 1!',
...}
{'_id': ObjectId('...'),
'author': 'Mayank',
'created_at': '',
'reactions': [...],
'message': 'Awesome song 2!',
...}
The list of fields to include is in #2 and #3 (although those two issues are not a prerequisite for this one).
Once #4 is done, write a new method that adds a new post to the DB with the above fields as parameters.
NOTE: Do not pick this up yet until we finalize a plan on how to solve it.
To start with, we should investigate what happens when we send links that are deleted or blocked. If the player can automatically skip them, we should start there rather than having to poll YouTube at regular intervals to check whether links are active. Such a change would also require a change to the database schema.
Currently, metadata-extractor raises an exception when no data is found on Spotify, and the link is then not inserted into PostgreSQL from the dump in Mongo. Many links do not have data on Spotify, resulting in a loss of songs that were posted. We can catch the Spotify exception, create a UserPost and a Link object with no corresponding Song, and simply not show the Spotify data on the frontend.
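A sketch of that fallback. The exception class and fetch_spotify_metadata helper are hypothetical stand-ins for metadata-extractor's actual API, and the dicts stand in for the UserPost/Link/Song ORM objects:

```python
class SpotifyNotFound(Exception):
    """Stand-in for the exception metadata-extractor raises."""

def fetch_spotify_metadata(url):
    # Hypothetical extractor call; raises when Spotify has no data.
    raise SpotifyNotFound(url)

def build_records(url):
    """Always produce UserPost/Link records; a Song only when Spotify has data."""
    try:
        song = fetch_spotify_metadata(url)
    except SpotifyNotFound:
        # No Spotify data: keep the post and link, just without a Song.
        song = None
    return {"user_post": {"url": url}, "link": {"url": url}, "song": song}
```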
For now, we've set up the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET environment variables in our AWS Secrets Manager. Since these tokens expire, and as good practice, it'd be nice to set up automatic rotation every fixed number of days. AWS Secrets Manager allows writing a Lambda function that rotates the keys and hands the new values back, which it then stores automatically.
For Spotify, here's their guide to the authorization workflow: https://developer.spotify.com/documentation/general/guides/authorization-guide/#client-credentials-flow
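The client-credentials flow itself is a POST to Spotify's token endpoint with the client ID and secret in a Basic auth header. A sketch of building that request (the actual HTTP call and the Lambda rotation wiring are omitted; only the request construction is shown):

```python
import base64

TOKEN_URL = "https://accounts.spotify.com/api/token"

def client_credentials_request(client_id, client_secret):
    """Build the pieces of a Spotify client-credentials token request."""
    # Basic auth: base64("client_id:client_secret"), per Spotify's guide.
    creds = f"{client_id}:{client_secret}".encode()
    auth_header = "Basic " + base64.b64encode(creds).decode()
    return {
        "url": TOKEN_URL,
        "headers": {"Authorization": auth_header},
        "data": {"grant_type": "client_credentials"},
    }
```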
Currently, during development, the Docker image needs to be rebuilt every time there is a change, which slows down testing. Hot reloading would make this a lot smoother. This is quite trivial, and a fun fix for first-timers.
Tutorial: https://stackoverflow.com/a/44344442/4396392
Set up AWS CloudWatch alerts to notify us when something is wrong, and integrate them with the Slack workspace.
The table link does not have a date column, which will be needed to query posts by date. The data can be taken from the created_time field of the corresponding document in MongoDB.
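A sketch of the migration plus backfill, with sqlite3 standing in for PostgreSQL and a plain list standing in for the Mongo dump (the table and column names follow the issue; the document shape and matching key are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (url TEXT PRIMARY KEY)")
conn.execute("INSERT INTO link VALUES ('https://youtu.be/x')")

# 1. Add the missing column.
conn.execute("ALTER TABLE link ADD COLUMN date TEXT")

# 2. Backfill from the Mongo documents' created_time field.
mongo_docs = [{"link": "https://youtu.be/x", "created_time": "2020-03-01"}]
for doc in mongo_docs:
    conn.execute(
        "UPDATE link SET date = ? WHERE url = ?",
        (doc["created_time"], doc["link"]),
    )

row = conn.execute("SELECT date FROM link").fetchone()
```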
Luckily, the v parameter is the same for YouTube Music and YouTube. This means we can handle YouTube Music links simply by extracting this ID and treating it as a YouTube link.
Currently, configuration management for different parts of the codebase is scattered all over the place, making it hard to track down what lives where. Clean up the code to unify all configuration in one place.
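One common shape for this, sketched with a stdlib dataclass read from the environment. The variable names come from elsewhere in these issues; the class itself is an assumption, not existing project code:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    """Single home for all runtime configuration."""
    spotipy_client_id: str
    spotify_redirect_uri: str
    mongo_db: str = "c3po"

    @classmethod
    def from_env(cls):
        # Fail fast with a KeyError if a required variable is missing.
        return cls(
            spotipy_client_id=os.environ["SPOTIPY_CLIENT_ID"],
            spotify_redirect_uri=os.environ["SPOTIFY_REDIRECT_URI"],
        )
```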
When FB_LONG_ACCESS_TOKEN is not available, the Graph API request fails and throws an exception, even though the long access token is fetched as part of the flow. Change this to retry automatically after the failure.
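A sketch of the retry, with the token refresh and the request represented by hypothetical callables (the names are illustrative, not actual project functions):

```python
def request_with_retry(make_request, refresh_token, retries=1):
    """Call make_request; on failure, refresh the token and retry."""
    for attempt in range(retries + 1):
        try:
            return make_request()
        except Exception:
            if attempt == retries:
                raise
            # The long access token is fetched as part of the flow,
            # so refreshing it and retrying should succeed.
            refresh_token()
```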