emcf / thepipe

Feed PDFs, URLs, Slides, YouTube, GitHub, and more into Vision-Language models with one line of code ⚡

Home Page: https://thepi.pe

License: MIT License

Python 64.38% C++ 0.15% CSS 0.23% C 0.08% Jupyter Notebook 34.51% TypeScript 0.65%

gpt-4 large-language-models multimodal pdf scrapers vision-transformer web youtube

thepipe's Issues

Feature requests 🔨

Accepting feature requests in this thread; please feel free to suggest!
The roadmap so far includes:

Cloud storage extraction (Google Drive, OneDrive)
E-commerce platform extraction (Amazon, eBay)
Alternative version control platforms (GitLab, Atlassian)

Video frame + transcript extraction

Looking to support extraction of MP4, MOV, WebM, and AVI files, as well as YouTube videos, for Vision-Language models (not video models).

Video files vary widely in duration, and LLM context windows (as of 2024) are limited. This implementation would therefore likely extract a constant number of frames per video rather than sample at a constant frame rate, which would produce a number of frames that grows with the video's length.
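The trade-off between a fixed frame count and a fixed frame rate can be sketched as below. This is a hypothetical illustration, not thepipe's implementation: the helper names, the midpoint-of-segment sampling strategy, and the 30 fps example values are assumptions. It only computes which frame indices would be extracted. Decoding the frames themselves would need a video library.

```python
def fixed_count_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Hypothetical helper: pick `num_frames` indices spread evenly over the video.

    The output length does not depend on video duration, so the number of
    images sent to the model stays constant.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    # Never ask for more frames than the video contains.
    num_frames = min(num_frames, total_frames)
    # Split the video into equal-width segments and take each segment's midpoint.
    step = total_frames / num_frames
    return [int(step * i + step / 2) for i in range(num_frames)]


def fixed_rate_frame_indices(total_frames: int, fps: float, sample_fps: float) -> list[int]:
    """Hypothetical helper: take one frame every 1/sample_fps seconds.

    The output length grows linearly with video duration.
    """
    if total_frames <= 0 or fps <= 0 or sample_fps <= 0:
        return []
    # Number of source frames between consecutive samples (at least 1).
    stride = max(1, round(fps / sample_fps))
    return list(range(0, total_frames, stride))


if __name__ == "__main__":
    # Assumed example: 1-minute and 10-minute clips recorded at 30 fps.
    for minutes in (1, 10):
        total = minutes * 60 * 30
        print(minutes, "min:",
              len(fixed_count_frame_indices(total, 10)), "frames (fixed count) vs",
              len(fixed_rate_frame_indices(total, 30, 1)), "frames (1 per second)")
```

With a fixed count, both clips yield 10 images. Sampling one frame per second yields 60 images for the short clip and 600 for the long one, which is why a constant count is easier to fit into a bounded context window.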

Audio input is not standard in commercial multimodal models today, so I am also looking to provide an option to transcribe the audio from the video.

Make docker image

Audio transcript extraction

No longer working after addition of THEPIPE_API_KEY