Giter Site home page Giter Site logo

transitive-bullshit / yt-semantic-search Goto Github PK

View Code? Open in Web Editor NEW
511.0 9.0 45.0 2.18 MB

OpenAI-powered semantic search for any YouTube playlist โ€“ featuring the All-In Podcast. ๐Ÿ’ช

Home Page: https://all-in-on-ai.vercel.app

License: MIT License

Shell 0.07% JavaScript 1.69% TypeScript 80.43% CSS 17.81%
openai pinecone podcast search youtube

yt-semantic-search's Introduction

yt-semantic-search's People

Contributors

sanjayk0508 avatar transitive-bullshit avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

yt-semantic-search's Issues

Error Code :429

After running the command : npx tsx src/bin/process-yt-playlist.ts

I am getting this error :

error upserting video's embeddings, server/pinecone.ts
error upserting transcripts for video bXLZ8I7s8tw How to Start Your First Business - The CASTLE Method Error: Request failed with status code 429
at createError (/Users/amzamani/Desktop/project/ai-powered-search/yt-semantic-search/node_modules/axios/lib/core/createError.js:16:15)
at settle (/Users/amzamani/Desktop/project/ai-powered-search/yt-semantic-search/node_modules/axios/lib/core/settle.js:17:12)
at IncomingMessage.handleStreamEnd (/Users/amzamani/Desktop/project/ai-powered-search/yt-semantic-search/node_modules/axios/lib/adapters/http.js:322:11)
at IncomingMessage.emit (node:events:525:35)
at IncomingMessage.emit (node:domain:489:12)
at endReadableNT (node:internal/streams/readable:1359:12)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
config: {
transitional: {
silentJSONParsing: true,
forcedJSONParsing: true,
clarifyTimeoutError: false
},
adapter: [Function: httpAdapter],
transformRequest: [ [Function: transformRequest] ],
transformResponse: [ [Function: transformResponse] ],
timeout: 0,
xsrfCookieName: 'XSRF-TOKEN',
xsrfHeaderName: 'X-XSRF-TOKEN',
maxContentLength: -1,
maxBodyLength: -1,
validateStatus: [Function: validateStatus],
headers: {
Accept: 'application/json, text/plain, /',
'Content-Type': 'application/json',
'User-Agent': 'OpenAI/NodeJS/3.2.1',
Authorization: 'Bearer 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',
'Content-Length': 562
},
method: 'post',
data: {"input":"resources that'll help you because this video is going to be ridiculously long and so we're just going to pile all the other information straight into that template again it's completely free just hit the link down below and you can download it if you like so let's start with step one of the castle method and that is conceptualize and to illustrate this I want to tell you a little bit about Cedric the strapping young rad must be Cedric am I right Cedric is an 18 year old who has just graduated from school and he","model":"text-embedding-ada-002"},
url: 'https://api.openai.com/v1/embeddings'
},
request: <ref *1> ClientRequest {
_events: [Object: null prototype] {
abort: [Function (anonymous)],
aborted: [Function (anonymous)],
connect: [Function (anonymous)],
error: [Function (anonymous)],
socket: [Function (anonymous)],
timeout: [Function (anonymous)],
finish: [Function: requestOnFinish]
},
_eventsCount: 7,
_maxListeners: undefined,
outputData: [],
outputSize: 0,
writable: true,
destroyed: false,
_last: true,
chunkedEncoding: false,
shouldKeepAlive: false,
maxRequestsOnConnectionReached: false,
_defaultKeepAlive: true,
useChunkedEncodingByDefault: true,
sendDate: false,
_removedConnection: false,
_removedContLen: false,
_removedTE: false,
strictContentLength: false,
_contentLength: 562,
_hasBody: true,
_trailer: '',
finished: true,
_headerSent: true,
_closed: false,
socket: TLSSocket {
_tlsOptions: [Object],
_secureEstablished: true,
_securePending: false,
_newSessionPending: false,
_controlReleased: true,
secureConnecting: false,
_SNICallback: null,
servername: 'api.openai.com',
alpnProtocol: false,
authorized: true,
authorizationError: null,
encrypted: true,
_events: [Object: null prototype],
_eventsCount: 10,
connecting: false,
_hadError: false,
_parent: null,
_host: 'api.openai.com',
_closeAfterHandlingError: false,
_readableState: [ReadableState],
_maxListeners: undefined,
_writableState: [WritableState],
allowHalfOpen: false,
_sockname: null,
_pendingData: null,
_pendingEncoding: '',
server: undefined,
_server: null,
ssl: [TLSWrap],
_requestCert: true,
_rejectUnauthorized: true,
parser: null,
_httpMessage: [Circular *1],
[Symbol(res)]: [TLSWrap],
[Symbol(verified)]: true,
[Symbol(pendingSession)]: null,
[Symbol(async_id_symbol)]: 140,
[Symbol(kHandle)]: [TLSWrap],
[Symbol(lastWriteQueueSize)]: 0,
[Symbol(timeout)]: null,
[Symbol(kBuffer)]: null,
[Symbol(kBufferCb)]: null,
[Symbol(kBufferGen)]: null,
[Symbol(kCapture)]: false,
[Symbol(kSetNoDelay)]: false,
[Symbol(kSetKeepAlive)]: true,
[Symbol(kSetKeepAliveInitialDelay)]: 60,
[Symbol(kBytesRead)]: 0,
[Symbol(kBytesWritten)]: 0,
[Symbol(connect-options)]: [Object]
},
_header: 'POST /v1/embeddings HTTP/1.1\r\n' +
'Accept: application/json, text/plain, /\r\n' +
'Content-Type: application/json\r\n' +
'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
'Authorization: Bearer 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\r\n' +
'Content-Length: 562\r\n' +
'Host: api.openai.com\r\n' +
'Connection: close\r\n' +
'\r\n',
_keepAliveTimeout: 0,
_onPendingData: [Function: nop],
agent: Agent {
_events: [Object: null prototype],
_eventsCount: 2,
_maxListeners: undefined,
defaultPort: 443,
protocol: 'https:',
options: [Object: null prototype],
requests: [Object: null prototype] {},
sockets: [Object: null prototype],
freeSockets: [Object: null prototype] {},
keepAliveMsecs: 1000,
keepAlive: false,
maxSockets: Infinity,
maxFreeSockets: 256,
scheduling: 'lifo',
maxTotalSockets: Infinity,
totalSocketCount: 4,
maxCachedSessions: 100,
_sessionCache: [Object],
[Symbol(kCapture)]: false
},
socketPath: undefined,
method: 'POST',
maxHeaderSize: undefined,
insecureHTTPParser: undefined,
path: '/v1/embeddings',
_ended: true,
res: IncomingMessage {
_readableState: [ReadableState],
_events: [Object: null prototype],
_eventsCount: 4,
_maxListeners: undefined,
socket: [TLSSocket],
httpVersionMajor: 1,
httpVersionMinor: 1,
httpVersion: '1.1',
complete: true,
rawHeaders: [Array],
rawTrailers: [],
aborted: false,
upgrade: false,
url: '',
method: null,
statusCode: 429,
statusMessage: 'Too Many Requests',
client: [TLSSocket],
_consuming: false,
_dumped: false,
req: [Circular *1],
responseUrl: 'https://api.openai.com/v1/embeddings',
redirects: [],
[Symbol(kCapture)]: false,
[Symbol(kHeaders)]: [Object],
[Symbol(kHeadersCount)]: 22,
[Symbol(kTrailers)]: null,
[Symbol(kTrailersCount)]: 0
},
aborted: false,
timeoutCb: null,
upgradeOrConnect: false,
parser: null,
maxHeadersCount: null,
reusedSocket: false,
host: 'api.openai.com',
protocol: 'https:',
_redirectable: Writable {
_writableState: [WritableState],
_events: [Object: null prototype],
_eventsCount: 3,
_maxListeners: undefined,
_options: [Object],
_ended: true,
_ending: true,
_redirectCount: 0,
_redirects: [],
_requestBodyLength: 562,
_requestBodyBuffers: [],
_onNativeResponse: [Function (anonymous)],
_currentRequest: [Circular *1],
_currentUrl: 'https://api.openai.com/v1/embeddings',
[Symbol(kCapture)]: false
},
[Symbol(kCapture)]: false,
[Symbol(kBytesWritten)]: 0,
[Symbol(kEndCalled)]: true,
[Symbol(kNeedDrain)]: false,
[Symbol(corked)]: 0,
[Symbol(kOutHeaders)]: [Object: null prototype] {
accept: [Array],
'content-type': [Array],
'user-agent': [Array],
authorization: [Array],
'content-length': [Array],
host: [Array]
},
[Symbol(kUniqueHeaders)]: null
},
response: {
status: 429,
statusText: 'Too Many Requests',
headers: {
date: 'Thu, 11 May 2023 08:17:01 GMT',
'content-type': 'application/json; charset=utf-8',
'content-length': '206',
connection: 'close',
vary: 'Origin',
'x-request-id': '1ff19f3ed973180354c754c5a0147026',
'strict-transport-security': 'max-age=15724800; includeSubDomains',
'cf-cache-status': 'DYNAMIC',
server: 'cloudflare',
'cf-ray': '7c58fbaeaec3f4da-BOM',
'alt-svc': 'h3=":443"; ma=86400, h3-29=":443"; ma=86400'
},
config: {
transitional: [Object],
adapter: [Function: httpAdapter],
transformRequest: [Array],
transformResponse: [Array],
timeout: 0,
xsrfCookieName: 'XSRF-TOKEN',
xsrfHeaderName: 'X-XSRF-TOKEN',
maxContentLength: -1,
maxBodyLength: -1,
validateStatus: [Function: validateStatus],
headers: [Object],
method: 'post',
data: {"input":"resources that'll help you because this video is going to be ridiculously long and so we're just going to pile all the other information straight into that template again it's completely free just hit the link down below and you can download it if you like so let's start with step one of the castle method and that is conceptualize and to illustrate this I want to tell you a little bit about Cedric the strapping young rad must be Cedric am I right Cedric is an 18 year old who has just graduated from school and he","model":"text-embedding-ada-002"},
url: 'https://api.openai.com/v1/embeddings'
},
request: <ref *1> ClientRequest {
_events: [Object: null prototype],
_eventsCount: 7,
_maxListeners: undefined,
outputData: [],
outputSize: 0,
writable: true,
destroyed: false,
_last: true,
chunkedEncoding: false,
shouldKeepAlive: false,
maxRequestsOnConnectionReached: false,
_defaultKeepAlive: true,
useChunkedEncodingByDefault: true,
sendDate: false,
_removedConnection: false,
_removedContLen: false,
_removedTE: false,
strictContentLength: false,
_contentLength: 562,
_hasBody: true,
_trailer: '',
finished: true,
_headerSent: true,
_closed: false,
socket: [TLSSocket],
_header: 'POST /v1/embeddings HTTP/1.1\r\n' +
'Accept: application/json, text/plain, /\r\n' +
'Content-Type: application/json\r\n' +
'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
'Authorization: Bearer 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\r\n' +
'Content-Length: 562\r\n' +
'Host: api.openai.com\r\n' +
'Connection: close\r\n' +
'\r\n',
_keepAliveTimeout: 0,
_onPendingData: [Function: nop],
agent: [Agent],
socketPath: undefined,
method: 'POST',
maxHeaderSize: undefined,
insecureHTTPParser: undefined,
path: '/v1/embeddings',
_ended: true,
res: [IncomingMessage],
aborted: false,
timeoutCb: null,
upgradeOrConnect: false,
parser: null,
maxHeadersCount: null,
reusedSocket: false,
host: 'api.openai.com',
protocol: 'https:',
_redirectable: [Writable],
[Symbol(kCapture)]: false,
[Symbol(kBytesWritten)]: 0,
[Symbol(kEndCalled)]: true,
[Symbol(kNeedDrain)]: false,
[Symbol(corked)]: 0,
[Symbol(kOutHeaders)]: [Object: null prototype],
[Symbol(kUniqueHeaders)]: null
},
data: { error: [Object] }
},
isAxiosError: true,
toJSON: [Function: toJSON]
}

@transitive-bullshit @sanjayk0508 can you guys suggest any fix ?

Question (not an issue): How are you splitting transcript text?

Hi there, love what you've built. Very cool use case for a great podcast :)

I was wondering, how did you split up the transcripts text? Did you experiment with sentences, paragraphs or just text blocks?

Starting to play with this, but keep finding different best practise on text splitting.

Question to help with understanding

Hey this project looks great! Thanks for releasing it. I'm probably being slow here, but trying to understand, what is the unique selling point of semantic search in relation to the example queries? Some of the example queries seem to just return exact phrase matches, or partial word matches - but not sure how semantic search plays a role here. I see some queries return results that don't appear to have specific word matches, so presumably, that's the benefit of this approach? E.g. searching for 'space' returns a result that has 'universe'. If that's intentional, that's pretty cool!

Curious about a few other things:

  • If we set this up ourselves, can you share some example OpenAI costs you've incurred for an example playlist?
  • How easy would it be to adapt this so we can write queries that use playlist video transcripts as context. My basic understanding is you can inject transcripts into the prompt itself, which OpenAI can use to derive an answer from, but I guess that's not practical to send a ton of transcripts in each query e.g.
    • User asks: "Why are wages suffering?"
    • This web app responds: "People are seeing wages destroyed by inflation, food and gas prices."
    • Which it hopefully learnt from: 'E86: Macro outlook': "...down they're seeing their wages get destroyed by inflation food and gas prices being much higher so there's a really good reason why people are sentiment is so negative out there..."

Such a great project, thanks again.

Doubt in setting up the project

  1. Is running the command npx tsx src/bin/generate-thumbnails.ts optional during setting up of the project ? if yes, would my project run without putting the GOOGLE_STORAGE_BUCKET ?
  2. In .env file , what value to put in PINECONE_NAMESPACE, in my pinecone instance I am unable to find any namespace to put in, can I put in "" empty strings, will that work ?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.