
pashpashpash / vault-ai

3.2K stars · 48 watchers · 306 forks · 1.59 MB

OP Vault ChatGPT: Give ChatGPT long-term memory using the OP Stack (OpenAI + Pinecone Vector Database). Upload your own custom knowledge base files (PDF, txt, epub, etc.) using a simple React frontend.

Home Page: https://vault.pash.city

License: MIT License

JavaScript 33.92% Less 29.57% Go 33.66% Shell 1.11% HTML 1.74%
chatgpt go golang knowledge-base long-term-memory machine-learning openai pdf-support pinecone question-answering

vault-ai's Introduction

OP Vault

OP Vault uses the OP Stack (OpenAI + Pinecone Vector Database) to enable users to upload their own custom knowledge base files and ask questions about their contents.

vault.pash.city


With quick setup, you can launch your own version of this Golang server along with a user-friendly React frontend that allows users to ask OpenAI questions about the specific knowledge base provided. The primary focus is on human-readable content like books, letters, and other documents, making it a practical and valuable tool for knowledge extraction and question-answering. You can upload an entire library's worth of books and documents and receive pointed answers along with the name of the file and the specific section within the file that the answer is based on!


What can you do with OP Vault?

With The Vault, you can:

  • Upload a variety of popular document types via a simple React frontend to create a custom knowledge base
  • Retrieve accurate and relevant answers based on the content of your uploaded documents
  • See the filenames and specific context snippets that inform the answer
  • Explore the power of the OP Stack (OpenAI + Pinecone Vector Database) in a user-friendly interface
  • Load entire libraries' worth of books into The Vault

Manual Dependencies

  • node: v19
  • go: v1.18.9 darwin/arm64
  • poppler

Setup

Install manual dependencies

  1. Install Go:

Follow the Go docs here.

  2. Install Node v19:

I recommend installing nvm and using it to install Node v19.

  3. Install poppler:

sudo apt-get install -y poppler-utils on Ubuntu, or brew install poppler on Mac.

Set up your API keys and endpoints in the secret folder

  1. Create a new file secret/openai_api_key and paste your OpenAI API key into it:

echo "your_openai_api_key_here" > secret/openai_api_key

  2. Create a new file secret/pinecone_api_key and paste your Pinecone API key into it:

echo "your_pinecone_api_key_here" > secret/pinecone_api_key

When setting up your Pinecone index, use a vector size of 1536 and keep all the default settings the same.

  3. Create a new file secret/pinecone_api_endpoint and paste your Pinecone API endpoint into it:

echo "https://example-50709b5.svc.asia-southeast1-gcp.pinecone.io" > secret/pinecone_api_endpoint

Running the development environment

  1. Install javascript package dependencies:

    npm install

  2. Run the Golang webserver (default port :8100):

    npm start

  3. In another terminal window, run webpack to compile the js code and create a bundle.js file:

    npm run dev

  4. Visit the local version of the site at http://localhost:8100

Screenshots:

In the example screenshots, I uploaded a couple of books by Plato and some letters by Alexander Hamilton, showcasing the ability of OP Vault to answer questions based on the uploaded content.

Uploading files

(screenshots of the file upload flow)

Asking questions

(screenshots of asking questions and receiving answers)

Under the hood

The Golang server exposes two POST endpoints to process incoming uploads and answer questions:

  1. /upload for uploading files

  2. /api/questions for answering questions

All API endpoints are declared in the vault-web-server/main.go file.
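For orientation, here is a minimal sketch of how those two routes could be wired up. It is an assumption-laden illustration, not the repo's actual main.go: gorilla/mux and negroni appear in stack traces elsewhere on this page, but the exact registration code may differ.

    package main

    import (
        "log"
        "net/http"

        "github.com/codegangsta/negroni"
        "github.com/gorilla/mux"
        "github.com/pashpashpash/vault/vault-web-server/postapi"
    )

    func main() {
        // The two POST endpoints described above
        router := mux.NewRouter()
        router.HandleFunc("/upload", postapi.UploadHandler).Methods("POST")
        router.HandleFunc("/api/questions", postapi.QuestionHandler).Methods("POST")

        // negroni provides the logging/recovery middleware visible in the server logs
        n := negroni.Classic()
        n.UseHandler(router)
        log.Fatal(http.ListenAndServe(":8100", n)) // default port :8100
    }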

Uploading files and processing them into embeddings

The vault-web-server/postapi/fileupload.go file contains the UploadHandler logic for handling incoming uploads on the backend. The UploadHandler function in the postapi package accepts PDF, epub, .docx, and plain text files (with a maximum total upload size of 300 MB), extracts text from them, and divides the content into chunks. Using the OpenAI API, it obtains embeddings for each chunk and upserts (inserts or updates) the embeddings into Pinecone. The function returns a JSON response containing information about the uploaded files and their processing status. Step by step (a condensed sketch follows the list):

  1. Limit the size of the request body to MAX_TOTAL_UPLOAD_SIZE (300 MB).
  2. Parse the incoming multipart form data with a maximum allowed size of 300 MB.
  3. Initialize response data with fields for successful and failed file uploads.
  4. Iterate over the uploaded files, and for each file:
     a. Check if the file size is within the allowed limit (MAX_FILE_SIZE, 300 MB).
     b. Read the file into memory.
     c. If the file is a PDF, extract the text from it; otherwise, read the contents as plain text.
     d. Divide the file contents into chunks.
     e. Use the OpenAI API to obtain embeddings for each chunk.
     f. Upsert (insert or update) the embeddings into Pinecone.
     g. Update the response data with information about successful and failed uploads.
  5. Return a JSON response containing information about the uploaded files and their processing status.
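Here is the condensed sketch of that flow. The helper functions, their signatures, the form field name "files", and the chunking parameters are illustrative stand-ins; the real logic lives in fileupload.go.

    package postapi

    import (
        "encoding/json"
        "mime/multipart"
        "net/http"
    )

    const (
        MAX_TOTAL_UPLOAD_SIZE = 300 << 20 // 300 MB per request
        MAX_FILE_SIZE         = 300 << 20 // per-file limit
    )

    // Stand-ins for the real helpers in the repo:
    type Chunk struct {
        Start, End  int
        Title, Text string
    }

    func extractText(fh *multipart.FileHeader) (string, error) { return "", nil } // pdftotext for PDFs, raw read otherwise
    func CreateChunks(content string, window, stride int, title string) []Chunk { return nil }
    func getEmbeddings(chunks []Chunk) ([][]float32, error) { return nil, nil } // OpenAI embeddings API
    func upsertEmbeddingsToPinecone(emb [][]float32, chunks []Chunk) error { return nil }

    func UploadHandler(w http.ResponseWriter, r *http.Request) {
        // 1-2. Cap the request body size and parse the multipart form
        r.Body = http.MaxBytesReader(w, r.Body, MAX_TOTAL_UPLOAD_SIZE)
        if err := r.ParseMultipartForm(MAX_TOTAL_UPLOAD_SIZE); err != nil {
            http.Error(w, "upload too large", http.StatusBadRequest)
            return
        }

        // 3. Response data tracking successful and failed uploads
        response := struct {
            Successful []string `json:"successful"`
            Failed     []string `json:"failed"`
        }{}

        // 4. Process each uploaded file
        for _, fh := range r.MultipartForm.File["files"] {
            if fh.Size > MAX_FILE_SIZE { // 4a. size check
                response.Failed = append(response.Failed, fh.Filename)
                continue
            }
            text, err := extractText(fh) // 4b-c. read file, extract text
            if err != nil {
                response.Failed = append(response.Failed, fh.Filename)
                continue
            }
            chunks := CreateChunks(text, 20, 4, fh.Filename) // 4d. chunk (window/stride illustrative)
            embeddings, err := getEmbeddings(chunks)         // 4e. OpenAI embeddings
            if err == nil {
                err = upsertEmbeddingsToPinecone(embeddings, chunks) // 4f. store in Pinecone
            }
            if err != nil { // 4g. record outcome
                response.Failed = append(response.Failed, fh.Filename)
            } else {
                response.Successful = append(response.Successful, fh.Filename)
            }
        }

        // 5. Return a JSON summary
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(response)
    }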

Storing embeddings into Pinecone DB

After getting OpenAI embeddings for each chunk of an uploaded file, the server stores all of the embeddings, along with the metadata associated with each embedding, in Pinecone DB. The metadata for each embedding is created in the upsertEmbeddingsToPinecone function, with the following keys and values:

  • file_name: The name of the file from which the text chunk was extracted.
  • start: The starting character position of the text chunk in the original file.
  • end: The ending character position of the text chunk in the original file.
  • title: The title of the chunk, which is also the file name in this case.
  • text: The text of the chunk.

This metadata is useful for providing context to the embeddings and is used to display additional information about the matched embeddings when retrieving results from the Pinecone database.

Answering questions

The QuestionHandler function in vault-web-server/postapi/questions.go is responsible for handling all incoming questions. When a question is entered on the frontend and the user presses "search" (or enter), the server uses the OpenAI embeddings API once again to get an embedding for the question (a.k.a. the query vector). This query vector is used to query Pinecone DB for the most relevant context for the question. Finally, a prompt is built by packing the most relevant context plus the question into a prompt string that adheres to OpenAI token limits (the Go tiktoken library is used to estimate token counts).
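A simplified sketch of that flow, under stated assumptions: the helper functions, their signatures, the form field name, and the topK/token-limit values are illustrative stand-ins, not the repo's exact code.

    package postapi

    import (
        "encoding/json"
        "net/http"
    )

    // Illustrative stand-ins for the real helpers in questions.go:
    func getEmbedding(text string) ([]float32, error)                { return nil, nil } // OpenAI embeddings API
    func queryPinecone(vector []float32, topK int) ([]string, error) { return nil, nil } // matched context chunks
    func buildPrompt(contexts []string, question string, maxTokens int) string { return "" }
    func askOpenAI(prompt string) (string, error)                    { return "", nil }

    func QuestionHandler(w http.ResponseWriter, r *http.Request) {
        question := r.FormValue("question") // form field name is an assumption

        // 1. Embed the question to get the query vector
        queryVector, err := getEmbedding(question)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        // 2. Query Pinecone for the most relevant context chunks
        contexts, err := queryPinecone(queryVector, 5)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        // 3. Pack context + question into a prompt that fits OpenAI's token
        //    limit (tiktoken estimates the count), then ask OpenAI
        prompt := buildPrompt(contexts, question, 4000)
        answer, err := askOpenAI(prompt)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        json.NewEncoder(w).Encode(map[string]string{"answer": answer})
    }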

Frontend info

The frontend is built using React.js, with Less for styling.

Generative question-answering with long-term memory

If you'd like to read more about this topic, I recommend this post from the Pinecone blog.

I hope you enjoy it (:

Uploading larger files

I currently have the max individual file size set to 3 MB. If you want to increase this limit, edit the MAX_FILE_SIZE and MAX_TOTAL_UPLOAD_SIZE constants in fileupload.go.
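For example, to allow 50 MB per file, the constants could look something like this (the values and exact declarations here are illustrative; check the actual constants and their types in fileupload.go):

    const (
        MAX_FILE_SIZE         = 50 << 20  // 50 MB per individual file
        MAX_TOTAL_UPLOAD_SIZE = 300 << 20 // 300 MB per upload request
    )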

Supported Filetypes

PDF, .txt, .rtf, .docx, .epub, and plain text.

New Pinecone Free Tier Restrictions

Recently, Pinecone limited the use of namespaces for free tier users. If you're on a newly created free tier, these restrictions will apply to you.

vault-ai's People

Contributors

devedevans · mhketbi · mnutt · pashpashpash · phi-line


vault-ai's Issues

Increase token length of 8191

After increasing MAX_FILE_SIZE and MAX_TOTAL_UPLOAD_SIZE in fileupload.go, I uploaded an epub and after a while ran into this error.

The book is 45 MB.

2023/04/20 09:01:19 [UploadHandler ERR] Error getting embeddings: error, status code: 400, message: This model's maximum context length is 8191 tokens, however you requested 8230 tokens (8230 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
[negroni] Apr 20 08:54:39 | 200 | 6m40.613778887s
          POST /upload

Is there a way to increase the number of tokens, or to chunk them appropriately for larger files?

How to update the OpenAI API key?

After filling out the form, the server automatically uses the API key from the form (and no longer the one from the secret/openai_api_key file), but how can the OpenAI API key be changed after that? The form only shows one time.
Many thanks

Am I supposed to see "Enter your OpenAI API key here.." field on http://localhost:8100/ ?

After some problems I managed to run the server, but not as described in the README.md: the page at localhost has an additional field for an OpenAI API key. Is this supposed to happen? The pictures in the instructions do not have that field.

I could not get the server to start with "npm start", so I used "go run vault-web-server/main.go", which seemed to work fine.

With or without providing the API key, the memory is not functioning; the response is always just "Great, what can I assist you with?".
Here is output from the server:
...
2023/05/19 22:09:14 [upsertEmbeddingsToPinecone] Created pinecone upsert request with namespace = (redacted)
2023/05/19 22:09:14 Successfully added pinecone embeddings!
[negroni] May 19 22:02:54 | 200 | 6m19.876298409s
POST /upload
2023/05/19 22:19:10 [QuestionForm] Validated: What is vault-ai?
2023/05/19 22:19:10 [QuestionHandler] Question: What is vault-ai?
2023/05/19 22:19:10 [QuestionHandler] Model: GPT Turbo
2023/05/19 22:19:10 [QuestionHandler] UUID: (redacted)
2023/05/19 22:19:10 [QuestionHandler] ApiKey: (redacted)
2023/05/19 22:19:10 [QuestionHandler] Using provided custom API key: (redacted)
2023/05/19 22:19:12 [retrieve] Querying pinecone namespace: 1bd4e..(redacted)
2023/05/19 22:19:13 [QuestionHandler] Got matches from Pinecone: []
2023/05/19 22:19:13 [QuestionHandler] Retrieved context from Pinecone:
[]
2023/05/19 22:19:13 [QuestionHandler] Sending OpenAI api request...
Prompt:
2023/05/19 22:19:15 [QuestionHandler] OpenAI response:
Great, what can I assist you with?
[negroni] May 19 22:19:10 | 200 | 4.770990359s
POST /api/questions
[negroni] May 19 22:19:40 | 200 | 381.056µs
GET /

Failed Files - Reason: Error extracting text from PDF

Hi! I am running the code locally, and I keep getting the following error when trying to upload a PDF file:

Failed Files: Reason: Error extracting text from PDF

Has anyone encountered this problem and knows how to solve it?

Thanks!

Context Does Not Provide Information?

Hello,

How might I improve the reasoning quality of responses? Is there a setting, prompt technique, or limitation I need to know more about? Does this solution lack some sort of access to ChatGPT's training data as context and reasoning capability?

I've attached screenshots. There seems to be a notable difference in response quality when I compare this solution's responses with ChatGPT responses.

Whereas ChatGPT has capacity to reason and improvise within its responses, this solution's responses seem more like web search results.

When I provide the same content as reference and use the same prompt, ChatGPT offers what I consider to be a much higher quality response. ChatGPT understood what 'urgent vs important' framework meant, without my assistance, and applied that framework to the reference material.

By contrast, this solution seems to be focused on where literal relevance in uploaded documents might be found.

Thank you for sharing this solution with the community. I am encouraged by its potential.

(attached screenshots: vaultop-response, chatgpt-response)

Error - MODULE_NOT_FOUND

When I run the command npm run dev, I get the error below.

FBT@pcadmins-MacBook-Pro vault-ai % sudo npm run dev

[email protected] dev
webpack --progress --watch

[webpack-cli] Failed to load '/Users/FBT/Desktop/Projects/Chatdoc/vault-ai/webpack.config.js' config
[webpack-cli] Error: Cannot find module 'os-browserify/browser'
Require stack:

  • /Users/FBT/Desktop/Projects/Chatdoc/vault-ai/webpack.config.js
  • /Users/FBT/Desktop/Projects/Chatdoc/vault-ai/node_modules/webpack-cli/lib/webpack-cli.js
  • /Users/FBT/Desktop/Projects/Chatdoc/vault-ai/node_modules/webpack-cli/lib/bootstrap.js
  • /Users/FBT/Desktop/Projects/Chatdoc/vault-ai/node_modules/webpack-cli/bin/cli.js
  • /Users/FBT/Desktop/Projects/Chatdoc/vault-ai/node_modules/webpack/bin/webpack.js
    at Module._resolveFilename (node:internal/modules/cjs/loader:1075:15)
    at Function.resolve (node:internal/modules/cjs/helpers:116:19)
    at Object.<anonymous> (/Users/FBT/Desktop/Projects/Chatdoc/vault-ai/webpack.config.js:101:25)
    at Module._compile (node:internal/modules/cjs/loader:1254:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
    at Module.load (node:internal/modules/cjs/loader:1117:32)
    at Module._load (node:internal/modules/cjs/loader:958:12)
    at Module.require (node:internal/modules/cjs/loader:1141:19)
    at require (node:internal/modules/cjs/helpers:110:18)
    at WebpackCLI.tryRequireThenImport (/Users/FBT/Desktop/Projects/Chatdoc/vault-ai/node_modules/webpack-cli/lib/webpack-cli.js:216:22) {
    code: 'MODULE_NOT_FOUND',
    requireStack: [
    '/Users/FBT/Desktop/Projects/Chatdoc/vault-ai/webpack.config.js',
    '/Users/FBT/Desktop/Projects/Chatdoc/vault-ai/node_modules/webpack-cli/lib/webpack-cli.js',
    '/Users/FBT/Desktop/Projects/Chatdoc/vault-ai/node_modules/webpack-cli/lib/bootstrap.js',
    '/Users/FBT/Desktop/Projects/Chatdoc/vault-ai/node_modules/webpack-cli/bin/cli.js',
    '/Users/FBT/Desktop/Projects/Chatdoc/vault-ai/node_modules/webpack/bin/webpack.js'
    ]
    }

Can anyone please suggest how to resolve this error?

Error upserting embeddings to Pinecone

Just got the site up and running locally on Windows. Whenever I try to upload a file, I either get a timeout after ~5 minutes or I receive the following error after a couple of seconds. I've tried both txt and pdf files but am getting the same result.

Total chunks: 144
Total embeddings: 144
Embeddings length: 1536
2023/04/22 11:01:23 [upsertEmbeddingsToPinecone] Created pinecone upsert request with namespace =  78de054e-37fd-4669-9922-8dc1c8267755
2023/04/22 11:01:24 [UploadHandler ERR] Error upserting embeddings to Pinecone:
[negroni] Apr 22 11:01:21 | 200 | 3.0222837s
          POST /upload

bin/vault-web-server does not exist

=> Environment Variables Loaded
-> Installing './vault-web-server' dependencies
-> Compiling './vault-web-server'
... done

sh: 1: ./bin/vault-web-server: not found

Any idea?

Responses appear to contain erroneous and inconsistent counts

Issue: When prompting OP Vault to count the number of times a given word appears in the uploaded 1.7 MB XML data file, OP Vault returns an erroneous and inconsistent count.

Steps to reproduce:

  1. Upload a document.
  2. Prompt OP Vault: "How many times does the word interoperability appear?"
    2.1 OP Vault responds: The word "interoperability" appears 6 times in the given context
  3. Prompt OP Vault: "Count the number of times the word interoperability appears"
    3.1 OP Vault responds: The word "interoperability" appears 9 times in the given context

In fact, the word interoperability appears 281 times in the given context.

Reason: Error upserting embeddings to Pinecone...unsupported protocol scheme ""

Error:

Failed Files:

    <foo>.pdf
    Reason: Error upserting embeddings to Pinecone: Post "<url>.pinecone.io/vectors/upsert": unsupported protocol scheme ""

To reproduce:
Follow the setup instructions as per the README on 20/4/23, upload a small file (mine was ~4 MB after changing the allowed file size), then try to ask a question; this error then shows up in the UI.

Multiple file uploads overwriting previous embeddings

	vectors := make([]PineconeVector, len(embeddings))
	for i, embedding := range embeddings {
		chunk := chunks[i]
		vectors[i] = PineconeVector{
			ID:     fmt.Sprintf("id-%d", i),
			Values: embedding,
			Metadata: map[string]string{
				"file_name": chunk.Title,
				"start":     strconv.Itoa(chunk.Start),
				"end":       strconv.Itoa(chunk.End),
				"title":     chunk.Title,
				"text":      chunk.Text,
			},
		}
	}

This works well for batched uploads, though the previous embeddings are overwritten by subsequent uploads. UUIDs would allow for multiple uploads, unless this was intentional to prevent your Pinecone instance from becoming massive. If that's the case, perhaps you could have a public flag, which would use some other ID scheme for private instances? (A sketch of the UUID approach follows.)
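For what it's worth, a minimal sketch of the UUID idea, modifying the loop quoted above (using github.com/google/uuid as an example library; this is hypothetical, not the repo's code):

    // Give each upload a unique prefix for its vector IDs so later
    // uploads can't overwrite earlier ones (hypothetical fix).
    uploadID := uuid.New().String()
    for i, embedding := range embeddings {
        chunk := chunks[i]
        vectors[i] = PineconeVector{
            ID:     fmt.Sprintf("%s-%d", uploadID, i), // was: fmt.Sprintf("id-%d", i)
            Values: embedding,
            Metadata: map[string]string{
                "file_name": chunk.Title,
                "start":     strconv.Itoa(chunk.Start),
                "end":       strconv.Itoa(chunk.End),
                "title":     chunk.Title,
                "text":      chunk.Text,
            },
        }
    }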

Deploy to Azure

Is it possible to deploy the app on Azure? If so can you provide instructions?

Error 401

Getting the following error when trying to upload documents:

[UploadHandler ERR] Error getting embeddings: error, status code: 401, message: 
[negroni]

Running it locally.

EDIT: Looks like this error is related to outdated or expired OpenAI API credentials.

Chunking code makes odd assumption

func CreateChunks(fileContent string, window int, stride int, title string) []Chunk {
	sentences := strings.Split(fileContent, ".") // assuming sentences end with a period
	newData := make([]Chunk, 0)

	for i := 0; i < len(sentences)-window; i += stride {
		iEnd := i + window
		text := strings.Join(sentences[i:iEnd], ". ")
		start := 0
		end := 0

		if i > 0 {
			start = len(strings.Join(sentences[:i], ". ")) + 2 // +2 for the period and space
		}

		end = len(strings.Join(sentences[:iEnd], ". "))

		newData = append(newData, Chunk{
			Start: start,
			End:   end,
			Title: title,
			Text:  text,
		})
	}

	return newData
}

Based on the source, this seems to assume that a document has at least 20 sentences; anything with fewer than 20 sentences does not appear to create any embeddings. This probably isn't the desired result. It would probably be better to chunk based on token count rather than sentence count.
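A sketch of what token-based chunking could look like, reusing the Chunk type from the snippet above. This uses github.com/pkoukk/tiktoken-go (one Go port of tiktoken) and is illustrative only, not the repo's code:

    // Token-based chunking: split on token boundaries so every chunk
    // fits under the embedding model's 8191-token limit, and even very
    // short documents still produce at least one chunk.
    func CreateTokenChunks(fileContent string, maxTokens int, title string) ([]Chunk, error) {
        enc, err := tiktoken.GetEncoding("cl100k_base")
        if err != nil {
            return nil, err
        }
        tokens := enc.Encode(fileContent, nil, nil)

        var chunks []Chunk
        start := 0
        for i := 0; i < len(tokens); i += maxTokens {
            end := i + maxTokens
            if end > len(tokens) {
                end = len(tokens)
            }
            text := enc.Decode(tokens[i:end])
            chunks = append(chunks, Chunk{
                Start: start,              // character offsets, tracked cumulatively
                End:   start + len(text),
                Title: title,
                Text:  text,
            })
            start += len(text)
        }
        return chunks, nil
    }

This would address both the 20-sentence minimum noted here and the 8191-token errors reported in the first issue above.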

Delete vector

Thank you so much for sharing. Any chance of adding a delete vector call from the front end? (Unless I'm missing something).

localhost

Everything looks fine in my terminal, but I definitely can't see anything on localhost.

"apple@appurunoiMac vault-ai-master % npm run dev

[email protected] dev
webpack --progress --watch

assets by chunk 169 KiB (id hint: vendors)
asset vendors-node_modules_react-dropzone_dist_es_index_js-node_modules_uuid_dist_esm-browser_v4_js.bundle.js 133 KiB [emitted] (id hint: vendors)
asset vendors-node_modules_react-router-dom_es_Link_js-node_modules_url-parse_index_js-node_modules-b6a711.bundle.js 35.9 KiB [emitted] (id hint: vendors)
asset bundle.js 1.18 MiB [emitted] (name: app)
asset components_Pages_LandingPage_index_jsx.bundle.js 105 KiB [emitted]
asset components_Header_index_jsx.bundle.js 24 KiB [emitted]
orphan modules 42 KiB [orphan] 27 modules
runtime modules 6.98 KiB 10 modules
modules by path ./node_modules/ 1.19 MiB 68 modules
modules by path ./components/ 93.9 KiB
modules by path ./components/.less 22.8 KiB 6 modules
modules by path ./components/Util/
.jsx 7.81 KiB 4 modules
modules by path ./components/Pages/LandingPage/ 20.7 KiB 3 modules
modules by path ./components/Header/ 8.88 KiB 3 modules
modules by path ./components/Page/ 6.87 KiB 3 modules
modules by path ./components/Footer/ 19.2 KiB 3 modules
modules by path ./components/*.jsx 5.14 KiB
./components/index.jsx 430 bytes [built] [code generated]
./components/routes.jsx 4.72 KiB [built] [code generated]
./components/Go/index.jsx 2.48 KiB [built] [code generated]
webpack 5.64.0 compiled successfully in 3558 ms"

In

Seems to forget about earlier documents

If I upload a few documents, then it seems to forget about ones that I uploaded earlier. Is there a limit to the number of documents or tokens it will store per user?

"Problem loading page"

I've been trying to set up vault-ai within a Docker container, and I'm able to install everything with the latest Go, Node v19, and poppler.

After resolving the "source" issue mentioned here, I progressed to the actual install, start, and run commands. With two docker terminals running, one is listening on :8100 and the other is running npm run dev. However when I visit http://localhost:8100, I get a problem loading the page.

This seems like a similar issue from this comment.

This is the output of my npm run dev:

> [email protected] dev
> webpack --progress --watch

assets by status 1.31 MiB [emitted]
  asset bundle.js 1.18 MiB [emitted] (name: app)
  asset vendors-node_modules_react-dropzone_dist_es_index_js-node_modules_uuid_dist_esm-browser_v4_js.bundle.js 133 KiB [emitted] [compared for emit] (id hint: vendors)
asset components_Pages_LandingPage_index_jsx.bundle.js 105 KiB [compared for emit]
asset vendors-node_modules_react-router-dom_es_Link_js-node_modules_url-parse_index_js-node_modules-b6a711.bundle.js 35.9 KiB [compared for emit] (id hint: vendors)
asset components_Header_index_jsx.bundle.js 24 KiB [compared for emit]
orphan modules 42 KiB [orphan] 27 modules
runtime modules 7.01 KiB 11 modules
modules by path ./node_modules/ 1.19 MiB 80 modules
modules by path ./components/ 93.9 KiB
  modules by path ./components/*.less 22.8 KiB 6 modules
  modules by path ./components/Util/*.jsx 7.81 KiB 4 modules
  modules by path ./components/Pages/LandingPage/ 20.7 KiB 3 modules
  modules by path ./components/Header/ 8.88 KiB 3 modules
  modules by path ./components/Page/ 6.87 KiB 3 modules
  modules by path ./components/Footer/ 19.2 KiB 3 modules
  modules by path ./components/*.jsx 5.14 KiB
    ./components/index.jsx 430 bytes [built] [code generated]
    ./components/routes.jsx 4.72 KiB [built] [code generated]
  ./components/Go/index.jsx 2.48 KiB [built] [code generated]
webpack 5.80.0 compiled successfully in 1530 ms

Anyone have any tips?

Documents not saving to knowledge base?

I have gotten the project up and running and the functionality works correctly, but when I upload multiple documents, it does not appear to reference them in their entirety, only the last document that was uploaded.

It also appears not to be saving the data that was stored.

It's entirely likely I am doing something wrong - but I was curious if there's anything I can check to ensure it's working properly.

Plans for Fully OpenSource/Free Tools

Do you have any plans to use fully free and open-source tools/APIs instead of OpenAI?
Also, to retrieve data from a private corpus, like Google Drive?

References to github.com/pashpashpash/vault cut over to https://github.com/pashpashpash/vault-ai

Hello!

Super interested in playing around with this. There are many references to the "vault" repository, but it seems like it has been renamed to vault-ai.

This leads to
fatal: repository 'https://github.com/pashpashpash/vault/' not found

I am working through these but ran into other versioning issues with resty/v2. I am playing around and will make a PR if I can resolve, but figured I would document the issue.

Issue: pdftotext not found in %PATH%

I see no mention of it in this repo, but I can't upload files, as I get this error:

          GET /js/components_Pages_LandingPage_index_jsx.bundle.js
2023/04/18 14:44:58 [UploadHandler] UUID= b9258226-d99b-4925-a100-c6c065555aa1
2023/04/18 14:44:58 [UploadHandler ERR] Error extracting text from PDF exec: "pdftotext": executable file not found in %PATH%
[negroni] Apr 18 14:44:58 | 200 | 26.1222ms
          POST /upload

I've copied pdftotext.exe and updated the path. The problem is that on Windows it needs to be called with ./pdftotext.exe. Would it be plausible to switch to https://pypi.org/project/poppler-utils/ to simplify deployment? This would help with containerizing the app.

Looks like pdfinfo.exe is needed as well.

For a quick fix I added import "os" and added the path to the upload function:

    func UploadHandler(w http.ResponseWriter, r *http.Request) {
        os.Setenv("PATH", os.Getenv("PATH")+";C:\\Users\\[username]")

Document Size Issue

I'm having an issue persisting my embeddings to my vector DB. Whenever I upload a PDF or text file I hit a quota error:
Reason: Error getting embeddings: error, status code: 429, message: You exceeded your current quota, please check your plan and billing details.

I tried a 61 MB PDF and then a 500 KB txt file and both yielded the same error message. What size documents do you recommend? Do you prepare your data in any way before upload? Or is a paid plan for Pinecone needed to persist book-sized (or larger) text files?

Uploadhandler Error: API key is incorrect

When I tried to upload a PDF for testing, I got an error message saying my API key was wrong, but I'm pretty confident that the API key I was using is correct. Any advice on this error? Thanks a lot!

"""
2023/04/23 16:52:36 [UploadHandler ERR] Error getting embeddings: error, status code: 401, message: Incorrect API key provided: echo "sk********************************************************************_key. You can find your API key at https://platform.openai.com/account/api-keys.
"""

I uploaded sonnet 18 by Shakespeare and broke it ;)

I uploaded Shall I compare thee to a summer’s .txt
and got...

Error: 500 | PANIC: runtime error: index out of range [0] with length 0

goroutine 11080 [running]:
github.com/codegangsta/negroni.(*Recovery).ServeHTTP.func1()
    /home/nik/go/pkg/mod/github.com/codegangsta/[email protected]/recovery.go:159 +0xc7
panic({0x9446c0, 0xc02a4fcca8})
    /usr/local/go/src/runtime/panic.go:838 +0x207
github.com/pashpashpash/vault/vault-web-server/postapi.UploadHandler({0x7f9d49f90508?, 0xc0002030d0}, 0xc0139eb200)
    /home/nik/vault-ai/vault-web-server/postapi/fileupload.go:125 +0x12bf
net/http.HandlerFunc.ServeHTTP(0xc0139eb100?, {0x7f9d49f90508?, 0xc0002030d0?}, 0x2190?)
    /usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000298000, {0x7f9d49f90508, 0xc0002030d0}, 0xc0139eb000)
    /home/nik/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0x1cf
github.com/codegangsta/negroni.Wrap.func1({0x7f9d49f90508, 0xc0002030d0}, 0x907640?, 0xc0003ce940)
    /home/nik/go/pkg/mod/github.com/codegangsta/[email protected]/negroni.go:46 +0x4b
github.com/codegangsta/negroni.HandlerFunc.ServeHTTP(0xab6?, {0x7f9d49f90508?, 0xc0002030d0?}, 0x4315?, 0x795b?)
    /home/nik/go/pkg/mod/github.com/codegangsta/[email protected]/negroni.go:29 +0x33
github.com/codegangsta/negroni.middleware.ServeHTTP({{0xa506c0?, 0xc00000e120?}, 0xc00000e168?}, {0x7f9d49f90508, 0xc0002030d0}, 0x4b7717?)
    /home/nik/go/pkg/mod/github.com/codegangsta/[email protected]/negroni.go:38 +0xb6
github.com/codegangsta/negroni.(*Logger).ServeHTTP(0xc0000205a0, {0x7f9d49f90508?, 0xc0002030d0}, 0xc0139eb000, 0xc0003ce920)
    /home/nik/go/pkg/mod/github.com/codegangsta/[email protected]/logger.go:62 +0x92
github.com/codegangsta/negroni.middleware.ServeHTTP({{0xa4f9c0?, 0xc0000205a0?}, 0xc00000e150?}, {0x7f9d49f90508, 0xc0002030d0}, 0x7f9d49f8cc18?)
    /home/nik/go/pkg/mod/github.com/codegangsta/[email protected]/negroni.go:38 +0xb6
github.com/codegangsta/negroni.(*Recovery).ServeHTTP(0xc00004ac00?, {0x7f9d49f90508?, 0xc0002030d0?}, 0xc004c43c18?, 0xc004c43c18?)
    /home/nik/go/pkg/mod/github.com/codegangsta/[email protected]/recovery.go:193 +0x86
github.com/codegangsta/negroni.middleware.ServeHTTP({{0xa4fa00?, 0xc000290050?}, 0xc00000e138?}, {0x7f9d49f90508, 0xc0002030d0}, 0xc001743368?)
    /home/nik/go/pkg/mod/github.com/codegangsta/[email protected]/negroni.go:38 +0xb6
github.com/codegangsta/negroni.(*Negroni).ServeHTTP(0xc000020570, {0xa52f80?, 0xc0002030c8}, 0xc001743368?)
    /home/nik/go/pkg/mod/github.com/codegangsta/[email protected]/negroni.go:96 +0x125
net/http.serverHandler.ServeHTTP({0x652700?}, {0xa52f80, 0xc0002030c8}, 0xc0139eb000)
    /usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.initALPNRequest.ServeHTTP({{0xa53808?, 0xc00724e4b0?}, 0xc001743180?, {0xc0000220e0?}}, {0xa52f80, 0xc0002030c8}, 0xc0139eb000)
    /usr/local/go/src/net/http/server.go:3523 +0x245
net/http.(*http2serverConn).runHandler(0xa527c0?, 0xe3fd48?, 0x0?, 0x0?)
    /usr/local/go/src/net/http/h2_bundle.go:5911 +0x78
created by net/http.(*http2serverConn).processHeaders
    /usr/local/go/src/net/http/h2_bundle.go:5641 +0x59b

MISSING OPENAI API KEY ENV VARIABLE

Getting the following error. Running it on Ubuntu.

root@localhost:~/vault-ai/secret# npm start

[email protected] start
bash -c 'source ./scripts/source-me.sh && ./scripts/go-compile.sh ./vault-web-server' && echo && ./bin/vault-web-server

=> Environment Variables Loaded
-> Installing './vault-web-server' dependencies
-> Compiling './vault-web-server'
... done

2023/04/19 16:24:50 [Config] Loaded ./config/ files
2023/04/19 16:24:50 MISSING OPENAI API KEY ENV VARIABLE
root@localhost:~/vault-ai/secret#

Feature Request: Use Chroma instead of Pinecone

Rationale: I would like to set this up locally, and Pinecone only has a paid subscription model if you are not whitelisted through the waiting list.

I would like to set this up using a local running version of Chroma.

Stuck in the step Installing './vault-web-server' dependencies

Hi, there
I was stuck at the very first step, Installing './vault-web-server' dependencies.
I have tried many times and switched between a few different VPNs, but none of them worked.
I am using go version go1.20.4 darwin/amd64 and node version v18.16.0, which is not the same version (v19) used by the developer pashpashpash.
Could the node version have caused this error?

Every time, it ends with the following errors:
=> Environment Variables Loaded
-> Installing './vault-web-server' dependencies
go: code.sajari.com/[email protected]: Get "https://proxy.golang.org/code.sajari.com/docconv/@v/v1.3.5.mod": 
dial tcp 172.217.160.113:443: i/o timeout
   ... error
npm ERR! code 1

Looking for help, I would be very grateful!

npm install Error "sh: 1: source: not found"

I'm trying to install the packages on a fresh WSL Ubuntu.

$ uname -a
Linux Ms6RB 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ npm -v
9.6.5

node -v
v19.0.0

$ go version
go version go1.20.3 linux/amd64

when I try npm install:

> [email protected] postinstall
> source ./scripts/source-me.sh && ./scripts/go-compile.sh ./vault-web-server

sh: 1: source: not found
npm ERR! code 127
npm ERR! path /home/ms6rb/vault-ai
npm ERR! command failed
npm ERR! command sh -c source ./scripts/source-me.sh && ./scripts/go-compile.sh ./vault-web-server

npm ERR! A complete log of this run can be found in: /home/ms6rb/.npm/_logs/2023-04-22T10_52_12_207Z-debug-0.log

Log:

110 timing auditReport:init Completed in 105ms
111 timing reify:audit Completed in 671ms
112 timing reify Completed in 1344ms
113 timing command:install Completed in 1411ms
114 verbose stack Error: command failed
114 verbose stack     at ChildProcess.<anonymous> (/home/ms6rb/.nvm/versions/node/v19.0.0/lib/node_modules/npm/node_modules/@npmcli/promise-spawn/lib/index.js:53:27)
114 verbose stack     at ChildProcess.emit (node:events:513:28)
114 verbose stack     at maybeClose (node:internal/child_process:1098:16)
114 verbose stack     at ChildProcess._handle.onexit (node:internal/child_process:304:5)
115 verbose pkgid [email protected]
116 verbose cwd /home/ms6rb/vault-ai
117 verbose Linux 5.15.90.1-microsoft-standard-WSL2
118 verbose node v19.0.0
119 verbose npm  v9.6.5
120 error code 127
121 error path /home/ms6rb/vault-ai
122 error command failed
123 error command sh -c source ./scripts/source-me.sh && ./scripts/go-compile.sh ./vault-web-server
124 verbose exit 127
125 timing npm Completed in 1560ms
126 verbose code 127
127 error A complete log of this run can be found in: /home/ms6rb/.npm/_logs/2023-04-22T10_52_12_207Z-debug-0.log

How accurate is this?

How accurate will this be? Will it just do a search on the data and return the closest possible match? Or can I upload a large amount of data and have it analyse the data to give me inferences from one possible document in the data?

Feature Request: Make UUID optional

I see the database is now compartmentalized by UUID. This makes sense for your public demo, however if I am running my own private copy, I don't need this functionality (i.e. I'd like everyone to see the same DB).

pdfinfo not found

Hi, I am trying to run it on Windows and I get an error that pdfinfo is not found in %PATH%. I set up my env variable in Windows correctly, like this: C:\Program Files\poppler-0.68.0\bin under the system variable Path.

Error regarding "source"

I attempted to run vault-ai locally, but I'm getting the following error on both my Ubuntu and Windows installs:

'source' is not recognized as an internal or external command,
operable program or batch file.
npm ERR! code 1
npm ERR! path C:\Users\user\Documents\vault-ai\vault-ai
npm ERR! command failed
npm ERR! command C:\WINDOWS\system32\cmd.exe /d /s /c source ./scripts/source-me.sh && ./scripts/go-compile.sh ./vault-web-server

I do have Node 19.2.0 installed and the latest Golang.

I was thinking of making a Dockerfile, but I'm getting the same error when attempting that as well, regardless of the base image.

npm unable to find file

npm install fails; the logs show:

117 verbose Linux 5.10.0-5mx-amd64
118 verbose node v19.9.0
119 verbose npm  v9.6.5
120 error code ENOENT
121 error syscall spawn "/bin/bash"
122 error path /home/jthancoc/git/llms/vault-ai
123 error errno -2
124 error enoent spawn "/bin/bash" ENOENT
125 error enoent This is related to npm not being able to find a file.
125 error enoent
