lazzzer / llm-structurizer

LLM-Structurizer is an API that allows you to structure your data with the power of Large Language Models.

License: MIT License

JavaScript 0.65% TypeScript 98.71% Dockerfile 0.65%
langchain-js large-language-models llm nestjs structured-data typescript

llm-structurizer's Introduction

Hi there, I'm Lazar 👋

I'm a Software Engineer from Switzerland 🇨🇭. I'm a graduate of HEIG-VD with a Bachelor's degree in Computer Science 🎓. I'm passionate about every aspect of web development and have a growing interest in iOS development.

Feel free to check out my repositories and connect with me on LinkedIn!

🌱 I'm currently learning

Spring

Swift



🔥 Tech Stack

Languages

Typescript

PHP

Java

Kotlin

Go

Rust

C++


Frameworks

React

Vue

Next.js

Nuxt

Laravel

Adonis

Nest

TailwindCSS



🎓 HEIG-VD Projects


llm-structurizer's People

Contributors

dependabot[bot], lazzzer

llm-structurizer's Issues

Create an authentication guard

Description

As a developer, I would like to protect my endpoints with an auth guard so that only registered applications can use the available resources.

Acceptance criteria

  • The API has an auth guard requiring an API key to continue processing a request.
  • The controllers can use the guard to refuse unauthenticated requests.
  • (Nice to have) the API keys are stored in a SQL table.
  • The guard contains appropriate tests.

Requirements

  • None.

Notes

The first acceptable implementation can have hard-coded API keys. If there is time, the implementation will be improved to verify the given keys in the database.
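
A minimal, framework-agnostic sketch of that first implementation with hard-coded keys. The `x-api-key` header name, the `ApiKeyGuard` class and the sample keys are assumptions made for illustration; in a NestJS application the same check would typically live in a guard's `canActivate` method.

```typescript
// Hypothetical hard-coded keys; a later version could look them up in a SQL table.
const HARD_CODED_KEYS: ReadonlySet<string> = new Set(["demo-key-1", "demo-key-2"]);

type RequestHeaders = Record<string, string | undefined>;

class ApiKeyGuard {
  private readonly validKeys: ReadonlySet<string>;

  constructor(validKeys: ReadonlySet<string> = HARD_CODED_KEYS) {
    this.validKeys = validKeys;
  }

  // Returns true only when the request carries a known API key.
  canActivate(headers: RequestHeaders): boolean {
    const key = headers["x-api-key"]; // assumed header name
    return typeof key === "string" && this.validKeys.has(key);
  }
}
```

Passing the key set through the constructor keeps the guard easy to test and leaves room for a database-backed lookup later.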

Links :

Create a llm service using LangChain tools

Description

As a developer, I would like to have a service capable of communicating with LLMs through the LangChain library so I can use it for my data structuring endpoints.

Acceptance criteria

  • The service is designed to accept multiple configured LLMs. Only OpenAI models are required.
  • The service is designed to be agnostic about its data format outputs.
  • The service uses Chains and PromptTemplates to generate its requests to the LLMs.
  • The service should be able to process text in multiple steps if needed. The Refine technique is required.
  • The service implementation contains appropriate tests.
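
The Refine technique named in the criteria can be sketched as a loop: the first chunk produces an initial answer, and each later chunk is sent together with the previous answer so the model can improve it. This is only an illustration of the idea, not the LangChain implementation; the `LlmCall` type, the `refineOverChunks` function and the prompt wording are hypothetical.

```typescript
// Stand-in for a real model call (for example through LangChain); hypothetical signature.
type LlmCall = (prompt: string) => Promise<string>;

async function refineOverChunks(chunks: string[], callLlm: LlmCall): Promise<string> {
  if (chunks.length === 0) return "";
  // First pass: produce an initial answer from the first chunk only.
  let answer = await callLlm(`Extract the data from this text:\n${chunks[0]}`);
  // Later passes: show the model its previous answer plus the next chunk.
  for (const chunk of chunks.slice(1)) {
    answer = await callLlm(
      `Existing answer:\n${answer}\n\nRefine it with this additional text:\n${chunk}`
    );
  }
  return answer;
}
```

Because the model call is injected, the loop can be unit-tested with a fake function and reused with any configured LLM.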

Requirements

  • None.

Notes

Link :

Create structured-data resource

Description

As a user, I would like to use endpoints of the API to process my unstructured texts so I can obtain processable data in specific formats such as JSON.

Acceptance criteria

  • The structured-data resource exposes REST endpoints that accept text inputs.
  • The structured-data resource is correctly documented in the OpenAPI specification.
  • The structured-data resource must at least provide JSON objects as structured data.
  • The structured-data POST endpoints expose relevant fields to configure the text structuring.
  • The structured-data resource exposes endpoint(s) that can verify previous text structuring results.
  • The structured-data resource implementation contains appropriate tests.

Requirements

Notes

Links :

Implement modifications discussed in sprint 1 review

Description

As a developer, I would like to have the implemented modifications from the last sprint review so my API has the desired features and specifications.

Acceptance criteria

  • The global app rate-limiter can be increased.
  • The PDF validation doesn't need to check if the content starts with the magic number.
  • The 422 error messages for PDF text parsing need better wording.
  • Users give their own API key for the language model they want to use. They can choose any available model.
  • The refine option in JSON text structuring is a nullable object used to specify the wanted chunking size and overlap.
  • The refine option should display the number of calls made to the LLM in the result.
  • The text structuring endpoints accept a nullable debug field to append the generation logs to the result.
  • 422 errors for text structuring should be more explanatory (request failed because of context size?) if possible.
  • The analysis endpoint can also provide an explanation field.
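
A sketch of how a nullable refine option with a chunk size and an overlap could drive the splitting. The `RefineOptions` shape, its field names and the character-based splitting are assumptions; the real API may count tokens instead. If one LLM call is made per chunk, the length of the returned array is the number of calls to report in the result.

```typescript
// Hypothetical shape of the nullable refine option: null means a single one-shot call.
interface RefineOptions {
  chunkSize: number; // window length in characters (assumed unit)
  overlap: number;   // characters shared with the previous window
}

// Splits text into windows of `chunkSize` characters, each sharing `overlap`
// characters with the previous window.
function splitWithOverlap(text: string, options: RefineOptions | null): string[] {
  if (options === null) return [text];
  const { chunkSize, overlap } = options;
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("chunkSize must be positive and overlap must be in [0, chunkSize)");
  }
  if (text.length <= chunkSize) return [text];
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

For example, a 10-character text with `chunkSize: 4` and `overlap: 1` yields three windows, each starting with the last character of the previous one.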

Requirements

  • None.

Create an endpoint to verify a json output

Description

As a user, I would like to have an endpoint that analyses a generated output and gives suggestions for potential corrections so I can speed up the verification of the generated structured data.

Acceptance criteria

  • The controller provides an endpoint that accepts the JSON output and the original text and returns an analysis.
  • The controller implementation contains appropriate tests.

Requirements

Handle PDF load from urls

Description

As a user, I would like to have an endpoint accepting URLs of my PDFs so I can easily provide my files and process them afterwards for text extraction.

Acceptance criteria

  • The API exposes an endpoint that accepts the URL of a PDF.
  • The API should handle invalid URLs, wrong file formats and files too big to process.
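
A small sketch of the checks these criteria imply before and during a download, using the standard `URL` class. The function names, the 10 MiB limit and the decision to accept only `http` and `https` are assumptions.

```typescript
const MAX_PDF_BYTES = 10 * 1024 * 1024; // hypothetical 10 MiB limit

type UrlCheck = { ok: true; url: URL } | { ok: false; reason: string };

// Rejects strings that are not absolute http(s) URLs before any download is attempted.
function checkPdfUrl(raw: string): UrlCheck {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "not a valid URL" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { ok: false, reason: "only http and https are supported" };
  }
  return { ok: true, url };
}

// Uses the Content-Length response header, when present, to refuse oversized files early.
function isTooLarge(contentLength: string | null, maxBytes: number = MAX_PDF_BYTES): boolean {
  if (contentLength === null) return false; // unknown size: enforce the limit while reading instead
  const size = Number(contentLength);
  return Number.isFinite(size) && size > maxBytes;
}
```

A missing `Content-Length` header is not treated as an error here, so the size limit would still need to be enforced while streaming the body.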

Requirements

Create generic prompt endpoint

Description

As a user, I would like to have a generic endpoint that accepts any kind of prompt and returns a specific output so I can use LLM functionalities that are not available with current endpoints.

Acceptance criteria

  • The controller provides an endpoint that accepts a prompt and returns a correctly formatted response.
  • The controller implementation contains appropriate tests.

Requirements

Notes

This functionality is the chosen simplified implementation of the planned SQL query processing capability of the service. It will not be part of the retrievers module; instead, it will be a simplified, generic option dealing with JSON objects.

Write documentation in readme

Description

As a developer, I would like to have a clear and concise readme for this application so that new contributors can easily understand the project, run it and contribute to it.

Acceptance criteria

  • The readme should clearly describe the project's purpose and features.
  • The readme should include instructions on how to set up the development environment.
  • The readme should include clear and concise examples of how to use the project and run the tests.

Requirements

  • None.

Create parsers resource

Description

As a user, I would like to use endpoints of the API to process my files so I can extract the unstructured text from them.

Acceptance criteria

  • The parsers resource exposes REST endpoints that accept file uploads or URLs.
  • The parsers resource is correctly documented in the OpenAPI specification.
  • The parsers resource must at least be able to process searchable PDF files.
  • The parsers resource processes the files safely and returns the generated text in its responses.
  • The parsers resource implementation contains appropriate tests.

Requirements

Create endpoints to output json structured data

Description

As a user, I would like to have endpoints producing JSON objects so I can extract structured data in this format.

Acceptance criteria

  • The controller provides fields to configure which available LLM to choose.
  • The controller accepts valid JSON schemas for its extractions.
  • The controller provides an endpoint to make one-shot extractions.
  • The controller implementation contains appropriate tests.
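
To make the criteria more concrete, here is a sketch of what a one-shot extraction request and a first sanity check on the schema might look like. The `JsonExtractRequest` shape, its field names and `parseSchema` are hypothetical; the check only confirms that the schema parses to a JSON object, and full JSON Schema validation would need a dedicated validator.

```typescript
// Hypothetical request body for a one-shot extraction endpoint.
interface JsonExtractRequest {
  text: string;       // unstructured input text
  model: string;      // which available LLM to use
  jsonSchema: string; // JSON schema describing the wanted output, sent as a string
}

// Minimal sanity check: the schema must parse to a JSON object.
function parseSchema(raw: string): Record<string, unknown> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("jsonSchema is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("jsonSchema must be a JSON object");
  }
  return parsed as Record<string, unknown>;
}

// Example request with placeholder values.
const exampleRequest: JsonExtractRequest = {
  text: "Invoice 42, total 100 CHF",
  model: "some-model-name",
  jsonSchema: '{"type":"object","properties":{"total":{"type":"number"}}}',
};
const exampleSchema = parseSchema(exampleRequest.jsonSchema);
```

Rejecting a malformed schema before calling the model avoids spending an LLM request on input that cannot succeed.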

Requirements

Create nice to have REST resources

Description

As a user, I would like to have additional endpoints and REST resources so I can register and authorise my application and enjoy additional functionalities provided by LLMs such as question answering over my data.

Acceptance criteria

  • The API exposes a retrievers resource that can query external data stores and return responses in natural language.
  • The API exposes an applications resource that handles the registration and authorisation of applications.
  • The API exposes an auth resource that handles API key generation and management.

Requirements

Notes:

Update [2023-07-11] :

The criteria will not be fulfilled for the last sprint of this project. A generic prompt implementation will be available instead as a nice-to-have feature.

Create must-have REST resources

Description

As a user, I would like to use the resources of the API at my disposal so I can process my unstructured data and extract meaningful information from it.

Acceptance criteria

  • The API exposes a parsers resource that handles text extraction from files such as PDF.
  • The API exposes a structured-data resource that processes unstructured text and returns structured objects such as JSON.

Requirements

Create docker-compose file for local deployment

Description

As a developer, I would like to have a docker-compose file ready to run so I can easily run the project locally with all its dependencies configured.

Acceptance criteria

  • The docker-compose file has a custom Docker image of the API.
  • The Dockerfile of the custom image is based on an Ubuntu image.

Requirements

  • None.

Handle PDF uploads

Description

As a developer, I would like to have a service handling the uploads of PDF files so I can process them easily afterwards for the text extraction.

Acceptance criteria

  • The API exposes an endpoint allowing PDF uploads.
  • The API should handle invalid uploads, such as wrong file formats and files too big to process.
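
A sketch of the upload checks, assuming the multipart parser exposes the declared MIME type and the size in bytes. The `UploadedPdf` shape, the 5 MiB limit and the error messages are hypothetical.

```typescript
// Hypothetical minimal view of an uploaded file (multipart parsers expose similar fields).
interface UploadedPdf {
  mimetype: string;
  size: number; // bytes
}

const MAX_UPLOAD_BYTES = 5 * 1024 * 1024; // hypothetical 5 MiB limit

// Returns an error message for an invalid upload, or null when the file is acceptable.
function validateUpload(
  file: UploadedPdf | undefined,
  maxBytes: number = MAX_UPLOAD_BYTES
): string | null {
  if (!file) return "no file provided";
  if (file.mimetype !== "application/pdf") return "file must be a PDF";
  if (file.size > maxBytes) return `file exceeds ${maxBytes} bytes`;
  return null;
}
```

The declared MIME type comes from the client, so this check filters honest mistakes rather than hostile input; the parser still has to handle files that are not real PDFs.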

Requirements

Notes

Link :

Set up GitHub CI

Description

As a developer, I would like to implement a CI pipeline so I can automate my testing workflow as much as possible.

Acceptance criteria

  • The CI should run all the unit tests on pull requests.
  • The CI should run a build of the project on pull requests.
  • The setup of the CI should follow the recommendations of the blog post in the notes.

Requirements

  • None.

Notes

Link :

Configure a logger

Description

As a developer, I would like to have a logger so I can easily track and monitor the execution of my API.

Acceptance criteria

  • The logger is configured with different log levels depending on the environment.
  • The logger should never display sensitive data such as the user's API key in production.
  • The logs are displayed on the console.
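
One possible way to meet the first two criteria is sketched below: pick the enabled levels from the environment name and mask sensitive fields before anything reaches the console. The level names, the `"production"` environment string and the list of sensitive keys are assumptions.

```typescript
type LogLevel = "error" | "warn" | "log" | "debug";

// Hypothetical mapping: verbose in development, quieter in production.
function levelsFor(env: string): LogLevel[] {
  return env === "production"
    ? ["error", "warn", "log"]
    : ["error", "warn", "log", "debug"];
}

const SENSITIVE_KEYS = new Set(["apikey", "authorization"]); // assumed field names

// Returns a copy of the payload with sensitive values masked before it is logged.
function redact(payload: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(payload)) {
    safe[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : value;
  }
  return safe;
}
```

This sketch only masks top-level fields; nested objects would need a recursive version.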

Requirements

  • None.

Notes

Link :

Handle PDF text extraction and post-processing

Description

As a developer, I would like to use a service or a package to extract the text from uploaded PDFs so I can post-process the text and add it to the endpoint response.

Acceptance criteria

  • The API should extract text from searchable PDF while preserving the layout of the file as much as possible.
  • The API handles gracefully any error of PDF extraction.
  • The API post-processes the initial text to reduce its size, with a minimum of layout configuration.
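
As an illustration of size-reducing post-processing that keeps the layout, the sketch below trims trailing whitespace and collapses runs of blank lines while leaving the spacing inside each line untouched. The `postProcess` function and its exact rules are assumptions, not the project's implementation.

```typescript
// Shrinks extracted text while keeping the line structure that carries the layout.
function postProcess(text: string): string {
  return text
    .replace(/\r\n/g, "\n")     // normalize Windows line endings
    .replace(/[ \t]+$/gm, "")   // drop trailing spaces and tabs on each line
    .replace(/\n{3,}/g, "\n\n") // keep at most one blank line in a row
    .trim();
}
```

Spaces inside a line are preserved on purpose, since column alignment in extracted PDF text often relies on them.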

Requirements

Notes

Link :

Set up and configure the API

Description

As a developer, I need to set up multiple packages such as an ORM or OpenAPI and create configurations so I can easily develop my API by leveraging the tools at my disposal.

Acceptance criteria

  • The API handles its database with an ORM.
  • The API generates an OpenAPI specification.
  • The API is versioned and configured with environment variables.
  • The API has security features such as entity validation, CORS and rate limiters.
  • The developers can easily write tests.

Requirements

  • None.

Notes

Links :
