transitive-bullshit / yt-semantic-search

511.0 9.0 45.0 2.18 MB

OpenAI-powered semantic search for any YouTube playlist – featuring the All-In Podcast. 💪

Home Page: https://all-in-on-ai.vercel.app

License: MIT License

Shell 0.07% JavaScript 1.69% TypeScript 80.43% CSS 17.81%

openai pinecone podcast search youtube

yt-semantic-search's Introduction

transitivebullsh.it

Shortcut to my personal portfolio site.

Usage

npx transitive-bullshit

transitivebullsh.it - Source for the portfolio site itself.

License

MIT © Travis Fischer

Support my OSS work by following me on Twitter.

yt-semantic-search's People

Contributors

Stargazers

Watchers

Forkers

leoncoe fangd123 jimmc414 rmarji separac shervinr saientropy zavier-sanders mysticaltech unbanksytv akshay5995 neocybereth crywas indianappguy starascendin leroyg joshdance hwasiti ayoubeth alexbehrens sanjayk0508 harish-garg vat99 opxnai amcleay abhisheksuresh2 ihanif weolopez kozakroman rogervaas mammarai wdshin rossman22590 suryasinghmv kn-neeraj vanhoof12 gpt-col hkgill amzamani dylan-albertazzi demircancelebi justaigithub postpcera khadherinc mdroidian

yt-semantic-search's Issues

Error Code: 429

After running the command: npx tsx src/bin/process-yt-playlist.ts

I am getting this error:

error upserting video's embeddings, server/pinecone.ts
error upserting transcripts for video bXLZ8I7s8tw How to Start Your First Business - The CASTLE Method Error: Request failed with status code 429
at createError (/Users/amzamani/Desktop/project/ai-powered-search/yt-semantic-search/node_modules/axios/lib/core/createError.js:16:15)
at settle (/Users/amzamani/Desktop/project/ai-powered-search/yt-semantic-search/node_modules/axios/lib/core/settle.js:17:12)
at IncomingMessage.handleStreamEnd (/Users/amzamani/Desktop/project/ai-powered-search/yt-semantic-search/node_modules/axios/lib/adapters/http.js:322:11)
at IncomingMessage.emit (node:events:525:35)
at IncomingMessage.emit (node:domain:489:12)
at endReadableNT (node:internal/streams/readable:1359:12)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
config: {
transitional: {
silentJSONParsing: true,
forcedJSONParsing: true,
clarifyTimeoutError: false
},
adapter: [Function: httpAdapter],
transformRequest: [ [Function: transformRequest] ],
transformResponse: [ [Function: transformResponse] ],
timeout: 0,
xsrfCookieName: 'XSRF-TOKEN',
xsrfHeaderName: 'X-XSRF-TOKEN',
maxContentLength: -1,
maxBodyLength: -1,
validateStatus: [Function: validateStatus],
headers: {
Accept: 'application/json, text/plain, */*',
'Content-Type': 'application/json',
'User-Agent': 'OpenAI/NodeJS/3.2.1',
Authorization: 'Bearer sk-[REDACTED]',
'Content-Length': 562
},
method: 'post',
data: {"input":"resources that'll help you because this video is going to be ridiculously long and so we're just going to pile all the other information straight into that template again it's completely free just hit the link down below and you can download it if you like so let's start with step one of the castle method and that is conceptualize and to illustrate this I want to tell you a little bit about Cedric the strapping young rad must be Cedric am I right Cedric is an 18 year old who has just graduated from school and he","model":"text-embedding-ada-002"},
url: 'https://api.openai.com/v1/embeddings'
},
request: <ref *1> ClientRequest {
_events: [Object: null prototype] {
abort: [Function (anonymous)],
aborted: [Function (anonymous)],
connect: [Function (anonymous)],
error: [Function (anonymous)],
socket: [Function (anonymous)],
timeout: [Function (anonymous)],
finish: [Function: requestOnFinish]
},
_eventsCount: 7,
_maxListeners: undefined,
outputData: [],
outputSize: 0,
writable: true,
destroyed: false,
_last: true,
chunkedEncoding: false,
shouldKeepAlive: false,
maxRequestsOnConnectionReached: false,
_defaultKeepAlive: true,
useChunkedEncodingByDefault: true,
sendDate: false,
_removedConnection: false,
_removedContLen: false,
_removedTE: false,
strictContentLength: false,
_contentLength: 562,
_hasBody: true,
_trailer: '',
finished: true,
_headerSent: true,
_closed: false,
socket: TLSSocket {
_tlsOptions: [Object],
_secureEstablished: true,
_securePending: false,
_newSessionPending: false,
_controlReleased: true,
secureConnecting: false,
_SNICallback: null,
servername: 'api.openai.com',
alpnProtocol: false,
authorized: true,
authorizationError: null,
encrypted: true,
_events: [Object: null prototype],
_eventsCount: 10,
connecting: false,
_hadError: false,
_parent: null,
_host: 'api.openai.com',
_closeAfterHandlingError: false,
_readableState: [ReadableState],
_maxListeners: undefined,
_writableState: [WritableState],
allowHalfOpen: false,
_sockname: null,
_pendingData: null,
_pendingEncoding: '',
server: undefined,
_server: null,
ssl: [TLSWrap],
_requestCert: true,
_rejectUnauthorized: true,
parser: null,
_httpMessage: [Circular *1],
[Symbol(res)]: [TLSWrap],
[Symbol(verified)]: true,
[Symbol(pendingSession)]: null,
[Symbol(async_id_symbol)]: 140,
[Symbol(kHandle)]: [TLSWrap],
[Symbol(lastWriteQueueSize)]: 0,
[Symbol(timeout)]: null,
[Symbol(kBuffer)]: null,
[Symbol(kBufferCb)]: null,
[Symbol(kBufferGen)]: null,
[Symbol(kCapture)]: false,
[Symbol(kSetNoDelay)]: false,
[Symbol(kSetKeepAlive)]: true,
[Symbol(kSetKeepAliveInitialDelay)]: 60,
[Symbol(kBytesRead)]: 0,
[Symbol(kBytesWritten)]: 0,
[Symbol(connect-options)]: [Object]
},
_header: 'POST /v1/embeddings HTTP/1.1\r\n' +
'Accept: application/json, text/plain, */*\r\n' +
'Content-Type: application/json\r\n' +
'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
'Authorization: Bearer sk-[REDACTED]\r\n' +
'Content-Length: 562\r\n' +
'Host: api.openai.com\r\n' +
'Connection: close\r\n' +
'\r\n',
_keepAliveTimeout: 0,
_onPendingData: [Function: nop],
agent: Agent {
_events: [Object: null prototype],
_eventsCount: 2,
_maxListeners: undefined,
defaultPort: 443,
protocol: 'https:',
options: [Object: null prototype],
requests: [Object: null prototype] {},
sockets: [Object: null prototype],
freeSockets: [Object: null prototype] {},
keepAliveMsecs: 1000,
keepAlive: false,
maxSockets: Infinity,
maxFreeSockets: 256,
scheduling: 'lifo',
maxTotalSockets: Infinity,
totalSocketCount: 4,
maxCachedSessions: 100,
_sessionCache: [Object],
[Symbol(kCapture)]: false
},
socketPath: undefined,
method: 'POST',
maxHeaderSize: undefined,
insecureHTTPParser: undefined,
path: '/v1/embeddings',
_ended: true,
res: IncomingMessage {
_readableState: [ReadableState],
_events: [Object: null prototype],
_eventsCount: 4,
_maxListeners: undefined,
socket: [TLSSocket],
httpVersionMajor: 1,
httpVersionMinor: 1,
httpVersion: '1.1',
complete: true,
rawHeaders: [Array],
rawTrailers: [],
aborted: false,
upgrade: false,
url: '',
method: null,
statusCode: 429,
statusMessage: 'Too Many Requests',
client: [TLSSocket],
_consuming: false,
_dumped: false,
req: [Circular *1],
responseUrl: 'https://api.openai.com/v1/embeddings',
redirects: [],
[Symbol(kCapture)]: false,
[Symbol(kHeaders)]: [Object],
[Symbol(kHeadersCount)]: 22,
[Symbol(kTrailers)]: null,
[Symbol(kTrailersCount)]: 0
},
aborted: false,
timeoutCb: null,
upgradeOrConnect: false,
parser: null,
maxHeadersCount: null,
reusedSocket: false,
host: 'api.openai.com',
protocol: 'https:',
_redirectable: Writable {
_writableState: [WritableState],
_events: [Object: null prototype],
_eventsCount: 3,
_maxListeners: undefined,
_options: [Object],
_ended: true,
_ending: true,
_redirectCount: 0,
_redirects: [],
_requestBodyLength: 562,
_requestBodyBuffers: [],
_onNativeResponse: [Function (anonymous)],
_currentRequest: [Circular *1],
_currentUrl: 'https://api.openai.com/v1/embeddings',
[Symbol(kCapture)]: false
},
[Symbol(kCapture)]: false,
[Symbol(kBytesWritten)]: 0,
[Symbol(kEndCalled)]: true,
[Symbol(kNeedDrain)]: false,
[Symbol(corked)]: 0,
[Symbol(kOutHeaders)]: [Object: null prototype] {
accept: [Array],
'content-type': [Array],
'user-agent': [Array],
authorization: [Array],
'content-length': [Array],
host: [Array]
},
[Symbol(kUniqueHeaders)]: null
},
response: {
status: 429,
statusText: 'Too Many Requests',
headers: {
date: 'Thu, 11 May 2023 08:17:01 GMT',
'content-type': 'application/json; charset=utf-8',
'content-length': '206',
connection: 'close',
vary: 'Origin',
'x-request-id': '1ff19f3ed973180354c754c5a0147026',
'strict-transport-security': 'max-age=15724800; includeSubDomains',
'cf-cache-status': 'DYNAMIC',
server: 'cloudflare',
'cf-ray': '7c58fbaeaec3f4da-BOM',
'alt-svc': 'h3=":443"; ma=86400, h3-29=":443"; ma=86400'
},
config: {
transitional: [Object],
adapter: [Function: httpAdapter],
transformRequest: [Array],
transformResponse: [Array],
timeout: 0,
xsrfCookieName: 'XSRF-TOKEN',
xsrfHeaderName: 'X-XSRF-TOKEN',
maxContentLength: -1,
maxBodyLength: -1,
validateStatus: [Function: validateStatus],
headers: [Object],
method: 'post',
data: {"input":"resources that'll help you because this video is going to be ridiculously long and so we're just going to pile all the other information straight into that template again it's completely free just hit the link down below and you can download it if you like so let's start with step one of the castle method and that is conceptualize and to illustrate this I want to tell you a little bit about Cedric the strapping young rad must be Cedric am I right Cedric is an 18 year old who has just graduated from school and he","model":"text-embedding-ada-002"},
url: 'https://api.openai.com/v1/embeddings'
},
request: <ref *1> ClientRequest {
_events: [Object: null prototype],
_eventsCount: 7,
_maxListeners: undefined,
outputData: [],
outputSize: 0,
writable: true,
destroyed: false,
_last: true,
chunkedEncoding: false,
shouldKeepAlive: false,
maxRequestsOnConnectionReached: false,
_defaultKeepAlive: true,
useChunkedEncodingByDefault: true,
sendDate: false,
_removedConnection: false,
_removedContLen: false,
_removedTE: false,
strictContentLength: false,
_contentLength: 562,
_hasBody: true,
_trailer: '',
finished: true,
_headerSent: true,
_closed: false,
socket: [TLSSocket],
_header: 'POST /v1/embeddings HTTP/1.1\r\n' +
'Accept: application/json, text/plain, */*\r\n' +
'Content-Type: application/json\r\n' +
'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
'Authorization: Bearer sk-[REDACTED]\r\n' +
'Content-Length: 562\r\n' +
'Host: api.openai.com\r\n' +
'Connection: close\r\n' +
'\r\n',
_keepAliveTimeout: 0,
_onPendingData: [Function: nop],
agent: [Agent],
socketPath: undefined,
method: 'POST',
maxHeaderSize: undefined,
insecureHTTPParser: undefined,
path: '/v1/embeddings',
_ended: true,
res: [IncomingMessage],
aborted: false,
timeoutCb: null,
upgradeOrConnect: false,
parser: null,
maxHeadersCount: null,
reusedSocket: false,
host: 'api.openai.com',
protocol: 'https:',
_redirectable: [Writable],
[Symbol(kCapture)]: false,
[Symbol(kBytesWritten)]: 0,
[Symbol(kEndCalled)]: true,
[Symbol(kNeedDrain)]: false,
[Symbol(corked)]: 0,
[Symbol(kOutHeaders)]: [Object: null prototype],
[Symbol(kUniqueHeaders)]: null
},
data: { error: [Object] }
},
isAxiosError: true,
toJSON: [Function: toJSON]
}

@transitive-bullshit @sanjayk0508 can you suggest a fix?
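
A 429 from https://api.openai.com/v1/embeddings means the request was rejected as "Too Many Requests". One common way to cope with this is to retry with exponential backoff. The sketch below is a hypothetical helper, not code from yt-semantic-search; the names withRetry, maxRetries, and baseDelayMs are assumptions made for illustration. It relies only on the error shape visible in the log above (an error object with response.status).

```typescript
// Hypothetical helper (not part of yt-semantic-search): retries an async call
// when it fails with HTTP 429, waiting longer after each failed attempt.
type HttpError = { response?: { status?: number } };

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as HttpError | undefined)?.response?.status;
      // Only rate-limit errors are retried; anything else is rethrown as-is.
      if (status !== 429 || attempt >= maxRetries) throw err;
      // Exponential backoff: baseDelayMs, then 2x, 4x, 8x, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

The embedding request would then be wrapped as withRetry(() => someEmbeddingCall()), where someEmbeddingCall stands in for whatever function issues the request. Note that OpenAI also returns 429 when an account has exhausted its quota, and retrying will not help in that case; the collapsed `data: { error: [Object] }` in the log would need to be printed in full to tell the two apart.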

Question (not an issue): How are you splitting transcript text?

Hi there, love what you've built. Very cool use case for a great podcast :)

I was wondering, how did you split up the transcript text? Did you experiment with sentences, paragraphs, or just text blocks?

I'm starting to play with this, but I keep finding different best practices for text splitting.
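
For readers comparing splitting strategies, here is a minimal sketch of one common option: fixed-size word windows with overlap. This is not confirmed to be how yt-semantic-search splits transcripts; the function name chunkWords and the default size and overlap values are assumptions chosen for illustration.

```typescript
// Hypothetical sketch: split text into windows of `size` words, where each
// window shares `overlap` words with the previous one.
function chunkWords(text: string, size = 100, overlap = 20): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    // Stop once a window has reached the end of the text.
    if (start + size >= words.length) break;
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a boundary from being cut off in both neighbouring chunks, at the cost of embedding some words twice. Auto-generated transcripts often lack punctuation, which is one reason word windows can be easier to apply than sentence or paragraph splitting.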

Question to help with understanding

Hey, this project looks great! Thanks for releasing it. I'm probably being slow here, but I'm trying to understand: what is the unique selling point of semantic search in relation to the example queries? Some of the example queries seem to return only exact phrase matches or partial word matches, so I'm not sure what role semantic search plays there. Other queries return results that don't appear to have specific word matches, so presumably that's the benefit of this approach? E.g. searching for 'space' returns a result that contains 'universe'. If that's intentional, that's pretty cool!
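
The 'space' and 'universe' behaviour is what embedding-based search is generally expected to produce: texts are mapped to vectors, and related meanings end up with a high cosine similarity even when no words match. The sketch below illustrates only the comparison step. The three-dimensional vectors are invented for the example and are not real OpenAI embeddings (text-embedding-ada-002 vectors are much longer).

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up toy vectors, standing in for embeddings of three texts.
const space = [0.9, 0.1, 0.2];
const universe = [0.8, 0.2, 0.3];
const wages = [0.1, 0.9, 0.1];

// With these toy values, "space" scores closer to "universe" than to "wages".
console.log(cosineSimilarity(space, universe) > cosineSimilarity(space, wages));
```

A vector index such as Pinecone performs this kind of nearest-neighbour comparison at scale, which is why a query can match a passage without sharing any words with it.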

Curious about a few other things:

If we set this up ourselves, can you share some example OpenAI costs you've incurred for an example playlist?
How easy would it be to adapt this so we can write queries that use playlist video transcripts as context? My basic understanding is that you can inject transcripts into the prompt itself, which OpenAI can use to derive an answer, but I guess it's not practical to send a ton of transcripts with each query. For example:
- User asks: "Why are wages suffering?"
- This web app responds: "People are seeing wages destroyed by inflation, food and gas prices."
- Which it hopefully learnt from: 'E86: Macro outlook': "...down they're seeing their wages get destroyed by inflation food and gas prices being much higher so there's a really good reason why people are sentiment is so negative out there..."
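
One way to avoid sending every transcript is to send only the few chunks that the semantic search ranks highest for the question. The sketch below shows just the prompt-assembly step under that assumption. The TranscriptChunk shape, the buildPrompt name, and the character budget are hypothetical and do not come from yt-semantic-search.

```typescript
// Hypothetical shape of a search hit: video title, transcript excerpt, and
// the similarity score returned by the vector search.
interface TranscriptChunk {
  title: string;
  text: string;
  score: number;
}

// Builds a prompt from the best-scoring chunks that fit in a character budget.
function buildPrompt(
  question: string,
  chunks: TranscriptChunk[],
  maxContextChars = 2000
): string {
  const sorted = [...chunks].sort((a, b) => b.score - a.score);
  const context: string[] = [];
  let used = 0;
  for (const chunk of sorted) {
    const entry = `[${chunk.title}] ${chunk.text}`;
    // Stop at the first chunk that would exceed the budget.
    if (used + entry.length > maxContextChars) break;
    context.push(entry);
    used += entry.length;
  }
  return [
    "Answer the question using only the transcript excerpts below.",
    "",
    ...context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

The resulting string would then be sent to a completion model. A character budget is a rough stand-in for a token budget; a real implementation would likely count tokens instead.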

Such a great project, thanks again.

Doubt in setting up the project

Is running the command npx tsx src/bin/generate-thumbnails.ts optional when setting up the project? If yes, would my project run without setting GOOGLE_STORAGE_BUCKET?
In the .env file, what value should I put in PINECONE_NAMESPACE? I can't find any namespace in my Pinecone instance. Can I use an empty string ("")? Will that work?

Instructions for setting up the project

Thanks for working on this project.

Can you please document how to set up this project?

Yt semantics
