Giter Site home page Giter Site logo

gptpdf's Introduction

gptpdf

CN doc EN doc

Using VLLM (like GPT-4o) to parse PDF into markdown.

Our approach is very simple (only 293 lines of code), but can almost perfectly parse typography, math formulas, tables, pictures, charts, etc.

Average cost per page: $0.013

This package use GeneralAgent lib to interact with OpenAI API.

Process steps

  1. Use the PyMuPDF library to parse the PDF to find all non-text areas and mark them, for example:

  1. Use a large visual model (such as GPT-4o) to parse and get a markdown file.

DEMO

See examples/attention_is_all_you_need/output.md for PDF examples/attention_is_all_you_need.pdf.

Installation

pip install gptpdf

Usage

from gptpdf import parse_pdf
api_key = 'Your OpenAI API Key'
content, image_paths = parse_pdf(pdf_path, api_key=api_key)
print(content)

See more in test/test.py

API

parse_pdf(pdf_path, output_dir='./', api_key=None, base_url=None, model='gpt-4o', verbose=False)

parse pdf file to markdown file, and return markdown content and all image paths.

  • pdf_path: pdf file path

  • output_dir: output directory. store all images and markdown file

  • api_key: OpenAI API Key (optional). If not provided, Use OPENAI_API_KEY environment variable.

  • base_url: OpenAI Base URL. (optional). If not provided, Use OPENAI_BASE_URL environment variable.

  • model: OpenAI Vision Large Model, default is 'gpt-4o'. You also can use qwen-vl-max, GLM-4V by change the OPENAI_API_BASE or specify base_url.

  • verbose: verbose mode

  • gpt_worker: gpt parse worker number. default is 1. If your machine performance is good, you can increase it appropriately to improve parsing speed.

gptpdf's People

Contributors

cosmosshadow avatar zrzrzrzrzrzrzr avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.