10-kGPT

IMPORTANT: this project is very early in development. Expect issues!!!

Background

I started this as a TypeScript app but got a CORS block from the SEC, so I moved it to Node. The script scrapes the SEC site for Palantir's 10-Qs, asks GPT to summarize them, and saves the output to the responses directory.

It's important to note you can't just pass the entire filing or you will hit GPT token limits. This is why I parse the tables from the page HTML, which is the most dogshit HTML I have ever seen.
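To make the token limit concrete, here is a minimal sketch of grouping extracted table text into chunks that each fit a token budget. It is not the project's actual code: chunkTables, estimateTokens, and the 4-characters-per-token estimate are assumptions for illustration, and a real tokenizer would give more accurate counts.

```javascript
// Rough heuristic (assumption): about 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: estimate the token cost of a piece of text.
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical helper: group table texts into chunks whose estimated
// token cost stays at or under maxTokens. A single table that exceeds
// the budget on its own still becomes its own (oversized) chunk.
// The "\n\n" separators are not counted toward the estimate.
function chunkTables(tables, maxTokens) {
  const chunks = [];
  let current = [];
  let used = 0;
  for (const table of tables) {
    const cost = estimateTokens(table);
    // Flush the current chunk before it would go over budget.
    if (current.length > 0 && used + cost > maxTokens) {
      chunks.push(current.join("\n\n"));
      current = [];
      used = 0;
    }
    current.push(table);
    used += cost;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}

// Three tables of about 10 tokens each with a budget of 25 tokens.
const demoChunks = chunkTables(
  ["a".repeat(40), "b".repeat(40), "c".repeat(40)],
  25
);
console.log(demoChunks.length);
```

Each chunk can then be sent to the model as a separate request instead of passing the whole filing at once.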

Help Wanted

I'd like to:

Figure out how to summarize the responses and get a final score on whether to invest
Store SEC filings, the inputs to OpenAI, and responses in a vector database
Implement LangChain or Haystack to generate and retrieve summaries using a supported LLM
Process all the returned filings

The responses need to be improved as well. We need to track the URL of the 10-Q that each page comes from. Also, I have seen the model outputting weird large numbers at the bottom of the page in some cases. This prompt may need improvement: Indicate the page number using ${fileName}. The ultimate goal is to audit the responses and rate them 0-7 for quality after a manual review of the 10-Q.
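One possible way to track the source URL and the manual rating is to store them next to each response. The sketch below is only an idea, not the current output format: buildResponseRecord, rateResponse, the record fields, and the example URL are all hypothetical, and treating the 0-7 rating as an integer is an assumption.

```javascript
// Hypothetical record shape; the project's real output format may differ.
function buildResponseRecord({ filingUrl, fileName, summary }) {
  if (!filingUrl) {
    throw new Error("filingUrl is required so the response can be traced to its 10-Q");
  }
  // rating stays null until someone has manually reviewed the 10-Q.
  return { filingUrl, fileName, summary, rating: null };
}

// Attach a manual review rating on the 0-7 scale.
// Assumption: ratings are whole numbers.
function rateResponse(record, rating) {
  if (!Number.isInteger(rating) || rating < 0 || rating > 7) {
    throw new RangeError("rating must be an integer from 0 to 7");
  }
  // Return a copy so the unrated record is left untouched.
  return { ...record, rating };
}

// Placeholder values for illustration only.
const demoRecord = buildResponseRecord({
  filingUrl: "https://example.com/10-q.htm",
  fileName: "page-1",
  summary: "Example summary text",
});
const demoRated = rateResponse(demoRecord, 5);
console.log(demoRated.filingUrl, demoRated.rating);
```

Keeping the URL in the record means a reviewer can open the exact 10-Q while rating the response.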

Setup

npm install

Run

Create the pages directory. This is where pages are saved for research purposes. I ignore this directory since it will result in a large number of files.
Create your .env file and add OPENAI_API_KEY=<YOUR_KEY>, SEC_API_KEY=<YOUR_KEY> and SEC_API_ENDPOINT=https://api.sec-api.io/
Run npm run start $TICKER, passing the ticker symbol of the company you want to analyze.
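A missing key in .env is an easy mistake to make, so a small check before the run can help. This is a sketch, not part of the script: missingEnv and REQUIRED_ENV are hypothetical names, and it assumes the .env values have already been loaded into an object such as process.env (how the project loads them is not stated here).

```javascript
// The three variables named in the setup steps above.
const REQUIRED_ENV = ["OPENAI_API_KEY", "SEC_API_KEY", "SEC_API_ENDPOINT"];

// Hypothetical helper: return the names of required variables that are
// absent or blank in the given environment object.
function missingEnv(env) {
  return REQUIRED_ENV.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Example with a literal object; in the script you would pass process.env.
const demoMissing = missingEnv({
  OPENAI_API_KEY: "<YOUR_KEY>",
  SEC_API_ENDPOINT: "https://api.sec-api.io/",
});
if (demoMissing.length > 0) {
  console.error(`Missing in .env: ${demoMissing.join(", ")}`);
}
```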

This will run node node-index, passing the ticker symbol you specify, and save responses to the responses directory. The script currently only analyzes the most recent 10-Q.

You can view output in the responses directory. The chunks used to generate each response are under responses/secAPI. This can be useful for auditing which inputs were used to generate each response.

Please note that all output is overwritten on each run. This is to reduce file buildup.
