
haifengchui / 10-kgpt


This project forked from doriansmiley/10-kgpt


Analyze 10-Q and 10-K filings with GPT

Home Page: https://github.com/doriansmiley/10-kGPT



10-kGPT

IMPORTANT: this project is very early in development. Expect issues!!!

Background

I started this as a TypeScript app, but I hit a CORS block from the SEC, so I moved it to Node. The script scrapes the SEC site for Palantir's 10-Qs, asks GPT to summarize them, and saves the output to the responses directory.

It's important to note that you can't just pass the entire filing, or you will hit GPT token limits. This is why I parse the tables out of the page HTML, which is the most dogshit HTML I have ever seen.
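Because a whole filing won't fit in one request, the extracted tables have to be packed into requests that stay under the model's token limit. The block below is a minimal sketch of that packing step, not the project's actual parsing code. It assumes a rough heuristic of about 4 characters per token for English text (a real tokenizer such as OpenAI's tiktoken would be more precise), and `chunkPieces` and `estimateTokens` are hypothetical helper names.

```typescript
// Rough heuristic: about 4 characters per token for English text.
// This is an approximation, not a real tokenizer.
const CHARS_PER_TOKEN = 4;

// Estimates the token count of a string using the heuristic above.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

/**
 * Packs pieces of text (for example, tables extracted from a filing's HTML)
 * into chunks whose estimated token count stays within maxTokens.
 * A single piece larger than the budget is split on character boundaries.
 */
function chunkPieces(pieces: string[], maxTokens: number): string[] {
  if (maxTokens <= 0) throw new Error("maxTokens must be positive");
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  let current = "";

  for (const piece of pieces) {
    // Break oversized pieces into budget-sized slices first.
    const slices: string[] = [];
    for (let i = 0; i < piece.length; i += maxChars) {
      slices.push(piece.slice(i, i + maxChars));
    }
    for (const slice of slices) {
      // Try to append the slice to the current chunk, separated by a blank line.
      const candidate = current ? current + "\n\n" + slice : slice;
      if (candidate.length <= maxChars) {
        current = candidate;
      } else {
        // Budget exceeded: close the current chunk and start a new one.
        if (current) chunks.push(current);
        current = slice;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Packing whole tables together, rather than cutting at a fixed character offset, keeps each table intact in a single request whenever it fits, which should make the model's summaries easier to audit.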

Help Wanted

I'd like to:

  • Figure out how to summarize the responses and get a final score on whether to invest
  • Store SEC filings, the inputs to OpenAI, and responses in a vector database
  • Implement LangChain or Haystack to generate and retrieve summaries using a supported LLM
  • Process all the returned filings

The responses need to be improved as well. We need to track the URL of the 10-Q that each page comes from. I have also seen the model output weird large numbers at the bottom of the page in some cases. This part of the prompt may need improvement: Indicate the page number using ${fileName}. The ultimate goal is to audit the responses and rate them 0-7 for quality after a manual review of the 10-Q.
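One way to approach both the URL tracking and the 0-7 rating is to store each response alongside its source and leave room for a manual score. The sketch below is illustrative only: `ResponseRecord`, `buildPrompt`, and `rateResponse` are hypothetical names, and apart from the sentence "Indicate the page number using ${fileName}." the prompt wording is invented for the example.

```typescript
// Hypothetical record shape for auditing a model response.
interface ResponseRecord {
  sourceUrl: string;      // URL of the 10-Q the chunk came from
  fileName: string;       // page/chunk identifier passed into the prompt
  prompt: string;
  response: string;
  qualityRating?: number; // 0-7, filled in after manual review of the 10-Q
}

// Hypothetical prompt builder. Only the page-number sentence comes from the
// project; the rest of the wording is illustrative.
function buildPrompt(chunk: string, fileName: string, sourceUrl: string): string {
  return [
    "Summarize the following table from a 10-Q filing.",
    `Source filing: ${sourceUrl}`,
    `Indicate the page number using ${fileName}.`,
    "",
    chunk,
  ].join("\n");
}

// Returns a copy of the record with a validated integer rating from 0 to 7.
function rateResponse(record: ResponseRecord, rating: number): ResponseRecord {
  if (Math.floor(rating) !== rating || rating < 0 || rating > 7) {
    throw new RangeError(`rating must be an integer from 0 to 7, got ${rating}`);
  }
  return { ...record, qualityRating: rating };
}
```

Embedding the source URL in the prompt as well as in the record means the URL also appears in the saved input chunk, so a reviewer can trace any response back to its 10-Q.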


Setup

npm install

Run

  • Create the pages directory. This is where pages are saved for research purposes. I ignore this directory since it will accumulate a large number of files.
  • After creating the pages directory, create your .env file and add OPENAI_API_KEY=<YOUR_KEY>, SEC_API_KEY=<YOUR_KEY> and SEC_API_ENDPOINT=https://api.sec-api.io/
  • Run npm run start $TICKER, passing the ticker symbol of the company you want to analyze.
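A script like this can fail in confusing ways if a key is missing or the ticker is omitted, so it helps to validate both up front. The following is a minimal sketch under stated assumptions: `loadConfig` and `parseTicker` are hypothetical helpers, and the ticker is assumed to be the first positional argument, i.e. `argv[2]` in Node's `process.argv` layout. The three variable names are the ones listed above.

```typescript
interface Config {
  openaiApiKey: string;
  secApiKey: string;
  secApiEndpoint: string;
}

type Env = { [name: string]: string | undefined };

// Reads the three variables from the setup steps and fails fast if any is missing.
function loadConfig(env: Env): Config {
  const required = ["OPENAI_API_KEY", "SEC_API_KEY", "SEC_API_ENDPOINT"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY as string,
    secApiKey: env.SEC_API_KEY as string,
    secApiEndpoint: env.SEC_API_ENDPOINT as string,
  };
}

// Assumes the ticker is the first positional argument after the script path
// (argv[2] in Node's process.argv). Normalizes it to upper case.
function parseTicker(argv: string[]): string {
  const ticker = (argv[2] || "").trim().toUpperCase();
  if (!/^[A-Z][A-Z.-]{0,9}$/.test(ticker)) {
    throw new Error("Usage: npm run start <TICKER>");
  }
  return ticker;
}
```

In the script itself these would be called as `loadConfig(process.env)` and `parseTicker(process.argv)`, after the `.env` file has been loaded (for example with the `dotenv` package; whether the project uses it is an assumption).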

This will run node node-index, passing the ticker symbol you specify, and save responses to the responses directory. The script currently analyzes only the most recent 10-Q.
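Selecting only the most recent 10-Q amounts to filtering the returned filings by form type and keeping the latest filing date. The sketch below is hypothetical: the `Filing` shape and its field names are assumptions for illustration, and the real fields depend on the sec-api.io response the script receives.

```typescript
// Hypothetical shape for a filing returned by the SEC search step; the real
// field names depend on the sec-api.io response and may differ.
interface Filing {
  formType: string; // e.g. "10-Q" or "10-K"
  filedAt: string;  // ISO 8601 timestamp
  url: string;      // link to the filing document
}

// Returns the most recently filed filing of the given form type, or undefined
// if none match. Unparseable dates compare as NaN and are never preferred.
function mostRecentFiling(filings: Filing[], formType: string): Filing | undefined {
  let latest: Filing | undefined;
  for (const filing of filings) {
    if (filing.formType !== formType) continue;
    if (!latest || Date.parse(filing.filedAt) > Date.parse(latest.filedAt)) {
      latest = filing;
    }
  }
  return latest;
}
```

Processing all the returned filings (one of the Help Wanted items) would mean keeping every matching filing and looping over them instead of keeping only the latest.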

You can view the output in the responses directory. The chunks used to generate each response are under responses/secAPI, which is useful for auditing which inputs were used to generate a given response.

Please note all output will be overwritten with each run. This is to reduce file build up.

Contributors

doriansmiley
