
doctran's People

Contributors

ayan-bandyopadhyay, jasonwcfan, maciejwie, mikeg0, shuyttr, zym66

doctran's Issues

Get Token Usage

LangChain and the OpenAI API both report token usage. Does Doctran have any equivalent functionality, or is there another way to get the token usage?
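One possible workaround, sketched under assumptions: OpenAI chat completion responses carry a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. If the raw responses can be reached (nothing here confirms that Doctran exposes them), they could be summed with a small helper. `UsageTracker` is a hypothetical name and is not part of Doctran.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class UsageTracker:
    """Accumulates token counts from dict-like, OpenAI-style responses (hypothetical helper)."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

    def record(self, response: Mapping[str, Any]) -> None:
        # A response without a "usage" field contributes nothing.
        usage = response.get("usage") or {}
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.total_tokens += usage.get("total_tokens", 0)


# Stand-in payloads shaped like chat completion responses; no API call is made.
tracker = UsageTracker()
tracker.record({"usage": {"prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 150}})
tracker.record({"usage": {"prompt_tokens": 80, "completion_tokens": 20, "total_tokens": 100}})
tracker.record({})  # no usage information
```

The tracker would have to be called wherever the completion is created, which is why this likely needs a change inside the library rather than in user code.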

Add more documentation to onboard people

** Requirements

  • Python is required (upgrade to 3.10)
  • Poetry is required (using the same version of Python)
  • An OpenAI key for testing

** Installation

  • poetry install && poetry shell

** Tests

  • Show how to run the tests (for example, getting your emails into a folder)
  • Show the expected results from the tests, with examples; add a README for each test
  • Maybe add notebooks for each test

** More

  • Add links that explain the library hierarchy
  • Add links to OpenAI function-calling tutorials and related material

Missing [] around `properties` in README

This code block from the README didn't work for me:

from doctran import ExtractProperty

properties = ExtractProperty(
    name="millenial_or_boomer", 
    description="A prediction of whether this document was written by a millenial or boomer",
    type="string",
    enum=["millenial", "boomer"],
    required=True
)
document = await document.extract(properties=properties).execute()

I had to wrap `properties` in a list instead:

document = await document.extract(properties=[properties]).execute()

asynchronous support

The code on GitHub is out of sync with the version on PyPI, but it seems that the call to openai.ChatCompletion.create is not asynchronous:

# doctran/transformers/transformers.py
completion = self.config.openai.ChatCompletion.create(**function_call.dict())
arguments = completion.choices[0].message["function_call"]["arguments"]

Maybe it should use the async variant instead?

# doctran/transformers/transformers.py
completion = await self.config.openai.ChatCompletion.acreate(**function_call.dict())
arguments = completion.choices[0].message["function_call"]["arguments"]
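Why this matters can be illustrated with a small, self-contained sketch that makes no OpenAI calls. `blocking_create` and `async_create` are hypothetical stand-ins for the synchronous and awaited variants; the event log shows that a blocking call inside a coroutine serializes the work, while an awaited call lets other tasks start in the meantime.

```python
import asyncio
import time
from typing import List


def blocking_create(label: str, log: List[str]) -> None:
    # Stand-in for a synchronous network call: it blocks the whole event loop.
    log.append(f"{label} start")
    time.sleep(0.05)
    log.append(f"{label} end")


async def async_create(label: str, log: List[str]) -> None:
    # Stand-in for an awaited call: control returns to the loop while waiting.
    log.append(f"{label} start")
    await asyncio.sleep(0.05)
    log.append(f"{label} end")


async def run_blocking() -> List[str]:
    log: List[str] = []

    async def task(label: str) -> None:
        blocking_create(label, log)  # nothing is awaited inside the coroutine

    await asyncio.gather(task("a"), task("b"))
    return log


async def run_async() -> List[str]:
    log: List[str] = []
    await asyncio.gather(async_create("a", log), async_create("b", log))
    return log
```

With `run_blocking`, task "b" cannot start until "a" has finished; with `run_async`, both tasks start before either ends.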

AttributeError: 'coroutine' object has no attribute 'extracted_properties'

I'm trying to replicate the tutorial but got this error in the Extract Properties section:

AttributeError                            Traceback (most recent call last)
Cell In[37], line 34
      4 properties = [ExtractProperty(
      5             name="contact_info", 
      6             description="A list of each person mentioned and their contact information",
   (...)
     30             required=True
     31         )]
     33 transformed_document = document.extract(properties=properties).execute()
---> 34 print(json.dumps(transformed_document.extracted_properties, indent=2))

AttributeError: 'coroutine' object has no attribute 'extracted_properties'
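The traceback is consistent with `execute()` being a coroutine function that was called without `await`, so line 33 stores a coroutine object rather than a document. A minimal sketch of the same situation, using a hypothetical `FakeDocument` and `execute` instead of Doctran itself:

```python
import asyncio


class FakeDocument:
    # Hypothetical stand-in for the transformed document.
    extracted_properties = {"contact_info": []}


async def execute() -> FakeDocument:
    # Stand-in for an async execute() method.
    return FakeDocument()


async def main():
    pending = execute()  # a coroutine object, not a document yet
    has_attr = hasattr(pending, "extracted_properties")  # False: the cause of the error
    document = await pending  # awaiting yields the actual document
    return has_attr, document.extracted_properties
```

In a notebook cell the fix would usually be `transformed_document = await document.extract(properties=properties).execute()`, matching the README snippet quoted in the earlier issue; in a plain script the coroutine can be driven with `asyncio.run(...)`.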

"poetry add doctran" does not work

Using version ^0.0.14 for doctran

Updating dependencies
Resolving dependencies... (0.0s)

Because no versions of doctran match >0.0.14,<0.0.15
 and doctran (0.0.14) depends on openai (>=0.27.8,<0.28.0), doctran (>=0.0.14,<0.0.15) requires openai (>=0.27.8,<0.28.0).
So, because [project name] depends on both openai (^1.12.0) and doctran (^0.0.14), version solving failed.

Please fix this so that Doctran works with the latest openai version (^1.12.0).
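The solver failure can be read as two version ranges that do not intersect. The sketch below is an illustration only: it treats each constraint as a half-open range and assumes Poetry's caret rule, under which `^1.12.0` means `>=1.12.0,<2.0.0`. The helper names are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    # "0.27.8" -> (0, 27, 8); tuples compare component by component.
    return tuple(int(part) for part in version.split("."))


def ranges_overlap(a_low: str, a_high: str, b_low: str, b_high: str) -> bool:
    # Each range is half-open: low <= version < high.
    return max(parse(a_low), parse(b_low)) < min(parse(a_high), parse(b_high))


# doctran 0.0.14 requires openai >=0.27.8,<0.28.0
# the project requires openai ^1.12.0, i.e. >=1.12.0,<2.0.0
conflict = not ranges_overlap("0.27.8", "0.28.0", "1.12.0", "2.0.0")

# Constraining the project to ^0.27.8 (>=0.27.8,<0.28.0) would intersect.
compatible = ranges_overlap("0.27.8", "0.28.0", "0.27.8", "0.28.0")
```

Until Doctran's own constraint is relaxed, the only ranges that resolve are those inside `>=0.27.8,<0.28.0`.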

Refactor the transformation methods into classes

Refactor the transformation methods into classes that inherit from a DocumentTransformer abstract class. This would make it easier to extend the library without making potentially breaking changes to the Doctran class.
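A rough sketch of what such a hierarchy could look like, assuming a single `transform` method per transformer. Only the `DocumentTransformer` name comes from the issue; the `Document` dataclass, the two concrete transformers, and `run_pipeline` are hypothetical and do not reflect the library's actual code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
from typing import Iterable


@dataclass(frozen=True)
class Document:
    # Hypothetical minimal document type for the sketch.
    content: str


class DocumentTransformer(ABC):
    """Base class: each transformation lives in its own subclass."""

    @abstractmethod
    def transform(self, document: Document) -> Document:
        ...


class WhitespaceCollapser(DocumentTransformer):
    def transform(self, document: Document) -> Document:
        # Collapse runs of whitespace into single spaces.
        return replace(document, content=" ".join(document.content.split()))


class UppercaseTransformer(DocumentTransformer):
    def transform(self, document: Document) -> Document:
        return replace(document, content=document.content.upper())


def run_pipeline(document: Document, transformers: Iterable[DocumentTransformer]) -> Document:
    # Apply each transformer in order; new ones can be added without touching this loop.
    for transformer in transformers:
        document = transformer.transform(document)
    return document


result = run_pipeline(Document("hello   world"), [WhitespaceCollapser(), UppercaseTransformer()])
```

Adding a new transformation then means adding a subclass, while the code that runs the pipeline stays unchanged.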

Error w/ Py version < 3.10

Related to this PR: running the notebook on Python 3.9.16, I see an import error that appears to affect Python < 3.10:

│ ❱ 8 from doctran import Doctran │
│ 9 │
│ 10 │
│ 11 class DocumentInterrogator(BaseDocumentTransformer, BaseModel): │
│ │
│ /Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/doctran/__init__.py:1 in │
│ │
│ ❱ 1 from .doctran import Doctran, Document, DoctranConfig, ContentType, ExtractProperty, Rec │
│ 2 │
│ │
│ /Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/doctran/doctran.py:14 in │
│ │
│ 11 from typing import List, Optional, Dict, Any, Literal │
│ 12 from pydantic import BaseModel │
│ 13 │
│ ❱ 14 class ExtractProperty(BaseModel): │
│ 15 │ name: str │
│ 16 │ description: str │
│ 17 │ type: Literal["string", "number", "boolean", "array", "object"] │
│ │
│ /Users/rlm/anaconda3/envs/lcn2/lib/python3.9/site-packages/doctran/doctran.py:18 in │
│ ExtractProperty │
│ │
│ 15 │ name: str │
│ 16 │ description: str │
│ 17 │ type: Literal["string", "number", "boolean", "array", "object"] │
│ ❱ 18 │ items: Optional[List | Dict[str, Any]] │
│ 19 │ enum: Optional[List[str]] │
│ 20 │ required: bool = True │
│ 21
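The marked line uses the `X | Y` union syntax from PEP 604, which is only supported from Python 3.10 onward. A minimal sketch of an equivalent annotation that also works on Python 3.9, using `typing.Union`; the alias name `ItemsType` is hypothetical and this is not necessarily the fix the maintainers chose.

```python
from typing import Any, Dict, List, Optional, Union, get_args

# `List | Dict[str, Any]` fails on Python < 3.10 because typing generics
# do not support the `|` operator there. `Union[...]` expresses the same type.
ItemsType = Optional[Union[List, Dict[str, Any]]]

# Optional[Union[A, B]] flattens to Union[A, B, None].
members = get_args(ItemsType)
```

In the model from the traceback, the field would then read `items: Optional[Union[List, Dict[str, Any]]]`.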

Transformers.py no longer compatible with openai >= 1.0.0

My transformer stopped working with the latest version of openai. The error message spells it out pretty clearly:

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run openai migrate to automatically upgrade your codebase to use the 1.0.0 interface.

from langchain.schema import Document
from langchain_community.document_transformers import DoctranQATransformer

documents = [Document(page_content=input_text)]
qa_transformer = DoctranQATransformer()
transformed_document = qa_transformer.transform_documents(documents)
python transform.py
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/doctran/transformers/transformers.py", line 70, in executeOpenAICall
    completion = self.config.openai.ChatCompletion.create(**function_call.dict())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/openai/lib/_old_api.py", line 39, in __call__
    raise APIRemovedInV1(symbol=self._symbol)
openai.lib._old_api.APIRemovedInV1: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/doctran/doctran.py", line 136, in execute
    transformed_document = transformer.transform(transformed_document)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/doctran/transformers/transformers.py", line 55, in transform
    return self.executeOpenAICall(document)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/doctran/transformers/transformers.py", line 87, in executeOpenAICall
    raise Exception(f"OpenAI function call failed: {e}")
Exception: OpenAI function call failed: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/joe/Code/chunker/transform.py", line 40, in <module>
    transformed_document = qa_transformer.transform_documents(documents)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/langchain_community/document_transformers/doctran_text_qa.py", line 56, in transform_documents
    doctran_doc = doctran.parse(content=d.page_content).interrogate().execute()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/doctran/doctran.py", line 140, in execute
    raise Exception(f"Error executing transformation {transformation}: {e}")
Exception: Error executing transformation (<Transformation.interrogate: 'DocumentInterrogator'>, {}): OpenAI function call failed: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742
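For orientation, here is a hedged sketch of how the failing call might look against the openai>=1.0.0 interface, where requests go through a client object and responses are typed objects rather than dicts. It is not Doctran's actual fix. `execute_openai_call` is a hypothetical free function, and the fake client below only mimics the response shape so the example runs without network access or an API key.

```python
from types import SimpleNamespace
from typing import Any, Dict


def execute_openai_call(client: Any, function_call: Dict[str, Any]) -> str:
    """Return the raw JSON argument string from a function-calling completion."""
    # openai>=1.0.0: client.chat.completions.create replaces openai.ChatCompletion.create.
    completion = client.chat.completions.create(**function_call)
    # Attribute access replaces message["function_call"]["arguments"].
    return completion.choices[0].message.function_call.arguments


# Fake client that mimics the response shape (illustration only).
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                function_call=SimpleNamespace(arguments='{"answer": 42}')
            )
        )
    ]
)
fake_client = SimpleNamespace(
    chat=SimpleNamespace(
        completions=SimpleNamespace(create=lambda **kwargs: fake_completion)
    )
)

arguments = execute_openai_call(fake_client, {"model": "gpt-4", "messages": []})
```

With the real library, the client would be created with `openai.OpenAI()`, or `openai.AsyncOpenAI()` for awaited calls. Until the library is updated, pinning `openai==0.28` as the error message suggests remains the simpler route.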

Support for Azure OpenAI

A previous issue seems to have added this functionality, but it's not clear to me how to use it.

How can I use Doctran with Azure OpenAI models?
