jxnl / instructor

structured outputs for llms

Home Page: https://python.useinstructor.com/

License: MIT License

Python 97.52% TypeScript 2.44% Shell 0.04%
openai python pydantic-v2 openai-functions validation openai-function-calli

instructor's Introduction

Instructor: Structured LLM Outputs

Instructor is a Python library that makes it a breeze to work with structured outputs from large language models (LLMs). Built on top of Pydantic, it provides a simple, transparent, and user-friendly API to manage validation, retries, and streaming responses. Get ready to supercharge your LLM workflows!


Key Features

  • Response Models: Specify Pydantic models to define the structure of your LLM outputs
  • Retry Management: Easily configure the number of retry attempts for your requests
  • Validation: Ensure LLM responses conform to your expectations with Pydantic validation (see the sketch after this list)
  • Streaming Support: Work with Lists and Partial responses effortlessly
  • Flexible Backends: Seamlessly integrate with various LLM providers beyond OpenAI
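
To see how validation and retries fit together, here is a minimal sketch (the validator and its age rule are illustrative assumptions, not part of the library; max_retries re-asks the model when validation fails):

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator


class UserInfo(BaseModel):
    name: str
    age: int

    # Illustrative rule: reject negative ages so the model gets re-asked.
    @field_validator("age")
    @classmethod
    def age_must_be_non_negative(cls, v: int) -> int:
        if v < 0:
            raise ValueError("age must be non-negative")
        return v


client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserInfo,
    max_retries=3,  # up to 3 attempts if validation fails
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)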

Get Started in Minutes

Install Instructor with a single command:

pip install -U instructor

Now, let's see Instructor in action with a simple example:

import instructor
from pydantic import BaseModel
from openai import OpenAI


# Define your desired output structure
class UserInfo(BaseModel):
    name: str
    age: int


# Patch the OpenAI client
client = instructor.from_openai(OpenAI())

# Extract structured data from natural language
user_info = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)

print(user_info.name)
#> John Doe
print(user_info.age)
#> 30

Using Anthropic Models

import instructor
from anthropic import Anthropic
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_anthropic(Anthropic())

# note that client.chat.completions.create will also work
resp = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

assert isinstance(resp, User)
assert resp.name == "Jason"
assert resp.age == 25

Using Cohere Models

Make sure to install cohere and set your system environment variable with export CO_API_KEY=<YOUR_COHERE_API_KEY>.

pip install cohere

import instructor
import cohere
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_cohere(cohere.Client())

# note that client.chat.completions.create will also work
resp = client.chat.completions.create(
    model="command-r-plus",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

assert isinstance(resp, User)
assert resp.name == "Jason"
assert resp.age == 25

Using Litellm

import instructor
from litellm import completion
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_litellm(completion)

resp = client.chat.completions.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

assert isinstance(resp, User)
assert resp.name == "Jason"
assert resp.age == 25

Types are inferred correctly

This was always the dream for instructor, but because we patched the openai client, it wasn't possible to get typing to work well. Now, with the new client, typing works well! We've also added a few create_* methods to make it easier to create iterables and partials, and to access the original completion.

Calling create

import openai
import instructor
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_openai(openai.OpenAI())

user = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create a user"},
    ],
    response_model=User,
)

Now if you use an IDE, you can see the type is correctly inferred.


Handling async: await create

This will also work correctly with asynchronous clients.

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.AsyncOpenAI())


class User(BaseModel):
    name: str
    age: int


async def extract():
    return await client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "user", "content": "Create a user"},
        ],
        response_model=User,
    )

Notice that simply because we return the result of the create call, the extract() function is correctly typed as returning a User.
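
As a quick usage sketch (assuming the snippet above is in scope), the coroutine can be driven with asyncio:

import asyncio

user = asyncio.run(extract())
print(user)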


Returning the original completion: create_with_completion

You can also return the original completion object alongside the parsed model:

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


user, completion = client.chat.completions.create_with_completion(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create a user"},
    ],
    response_model=User,
)


Streaming Partial Objects: create_partial

To handle streams, we still support Iterable[T] and Partial[T], but to simplify type inference, we've added create_iterable and create_partial methods as well!

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


user_stream = client.chat.completions.create_partial(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create a user"},
    ],
    response_model=User,
)

for user in user_stream:
    print(user)
    #> name=None age=None
    #> name='' age=None
    #> name='John' age=None
    #> name='John Doe' age=None
    #> name='John Doe' age=30

Notice that the inferred type is now Generator[User, None, None].


Streaming Iterables: create_iterable

We get an iterable of objects when we want to extract multiple objects.

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


users = client.chat.completions.create_iterable(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create 2 users"},
    ],
    response_model=User,
)

for user in users:
    print(user)
    #> name='John' age=30
    #> name='Jane' age=25


We invite you to contribute evals in pytest as a way to monitor the quality of the OpenAI models and the instructor library. To get started, check out the evals for Anthropic and OpenAI and contribute your own in the form of pytest tests. These evals will be run once a week and the results will be posted.

Contributing

If you want to help, check out some of the issues marked as good-first-issue or help-wanted found here. They could be anything from code improvements, a guest blog post, or a new cookbook.

CLI

We also provide some added CLI functionality for convenience:

  • instructor jobs : This helps with the creation of fine-tuning jobs with OpenAI. Simply use instructor jobs create-from-file --help to get started creating your first fine-tuned GPT-3.5 model

  • instructor files : Manage your uploaded files with ease. You'll be able to create, delete and upload files all from the command line

  • instructor usage : Instead of heading to the OpenAI site each time, you can monitor your usage from the CLI and filter by date and time period. Note that usage often takes ~5-10 minutes to update on OpenAI's side

License

This project is licensed under the terms of the MIT License.


instructor's People

Contributors

anmol6, bllchmbrs, cristobalcl, cruppelt, daaniyaan, daveokpare, dhruv-anand-aintech, ethanleifer, fpingham, gao-hongnan, inn-0, ivanleomk, jlondonobo, jpetrantoni, jxnl, lakshyaag, lazyhope, leobeeson, medott29, phibrandon, phodaie, rgbkrk, ryanhalliday, savarin, shanktt, shreya-51, tedfulk, toolittlecakes, zboyles, zby


instructor's Issues

Bug - cannot import name 'FieldValidationInfo' from 'pydantic'

Describe the bug
I get the following error: cannot import name 'FieldValidationInfo' from 'pydantic'.

When doing:

from instructor import OpenAISchema

To Reproduce

from instructor import OpenAISchema

Expected behavior

Expected it to not crash


Desktop (please complete the following information):
Version 0.2.8
Macbook Pro - Intel
Chrome

The multi classification does not actually work as intended.

Describe the bug
The multi classification does not actually work as intended.

To Reproduce
I copy-pasted the example for multi prediction, and the outputs always include every label. No matter which classes are declared or which prompt is used, the result is the same: all classes are predicted.

examples/classification/multi_prediction.py

Where to inject the few shot examples?

Where should I put few-shot examples in the prompt to improve accuracy? Should they go in the model docstring or somewhere else? Can you provide an example?

Thanks.
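
For what it's worth, here is a hedged sketch of one common placement: because the model's docstring and field descriptions are included in the schema sent to the LLM, few-shot examples can be embedded there. The model below is hypothetical:

from pydantic import BaseModel, Field


class UserDetail(BaseModel):
    """Extract user details from the text.

    Examples:
    - "Anna is 29 years old" -> name="Anna", age=29
    - "Bob, aged 42" -> name="Bob", age=42
    """

    name: str = Field(..., description="Full name, e.g. 'Anna Smith'")
    age: int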

Bug: openai_schema removes properties named title

openai_schema removes properties/fields named title from json schema

Example:

from instructor import OpenAISchema
from pydantic import Field


class Author(OpenAISchema):
    """Class representing an author.
    This class is used to extract author's name and
    poem's title from a text"""

    name: str = Field(..., description="Name of the author")
    title: str = Field(..., description="Title of the article")


Author.openai_schema

# output:
"""
{'name': 'Author',
 'description': "Class representing an author. \nThis class is used to extract author's name and
\npoem's title from a text",
 'parameters': {'type': 'object',
  'properties': {'name': {'description': 'Name of the author',
    'type': 'string'}},
  'required': ['name']}}
"""

pip install instructor has dependency conflicts in Colab

Describe the bug

Running !pip install instructor in Colab creates the following dependency conflicts:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires tiktoken, which is not installed.
tensorflow-probability 0.22.0 requires typing-extensions<4.6.0, but you have typing-extensions 4.8.0 which is incompatible.

To Reproduce
Steps to reproduce the behavior:

  1. Create a new colab
  2. Run !pip install instructor

Expected behavior
Clean install without dependency conflicts.

Adding a lightweight prompt abstraction to the SchemaClass

Sure! Here's the updated proposal where PromptConfig has the model as a required argument and all other attributes as optional. The default model is set to "gpt3.5-turbo-0613":

import openai
from pydantic import BaseModel
from typing import Optional

class OpenAISchema(BaseModel):
    class PromptConfig:
        model: str = "gpt3.5-turbo-0613"
        system: Optional[str]
        message: Optional[str]
        temperature: Optional[float]
        max_tokens: Optional[int]

    @classmethod
    def from_response(cls, response):
        # Implementation based on the actual response format.
        ...

    @classmethod
    def create(cls, message=None, *args, force_function=False, **kwargs):
        messages = kwargs.get("messages", [])

        if not messages and hasattr(cls, "PromptConfig"):
            if cls.PromptConfig.system:
                messages.append({
                    "role": "system",
                    "content": cls.PromptConfig.system
                })
            if cls.PromptConfig.message:
                messages.append({
                    "role": "user",
                    "content": cls.PromptConfig.message
                })

        if message:
            messages.append({
                "role": "user",
                "content": message
            })

        if force_function:
            kwargs['function_call'] = {"name": cls.openai_schema["name"]}

        kwargs['messages'] = messages

        if hasattr(cls, "PromptConfig"):
            kwargs.setdefault('model', cls.PromptConfig.model)
            kwargs.setdefault('temperature', cls.PromptConfig.temperature)
            kwargs.setdefault('max_tokens', cls.PromptConfig.max_tokens)

        completion = openai.ChatCompletion.create(
            functions=[cls.openai_schema],
            **kwargs
        )
        return cls.from_response(completion)

class Search(OpenAISchema):
    # Implementation remains the same
    ...

class MultiSearch(OpenAISchema):
    class PromptConfig:
        system = "You are a capable algorithm designed to correctly segment search requests."
        message = "Correctly segment the following search request"
        model = "gpt3.5-turbo-0613"
        temperature = 0.5
        max_tokens = 1000

    # Implementation remains the same

# Example of usage:
queries = MultiSearch.create(
    "Please send me the video from last week about the investment case study and also documents about your GPDR policy."
)
queries.execute()

This revision makes the PromptConfig more flexible and easier to use with the default model set and all other parameters as optional. This configuration can be overridden on a per-class basis, as shown in the MultiSearch.PromptConfig example.

Bounty: Streaming function calls

To be considered checkout : https://replit.com/bounties/@jxnl/streaming-json-parse

I'd like to have the capability of parsing functions calls as they stream out for MultiTask when doing streaming function calls. You can use any existing python library. Must work for nested and deep objects.

Below is some code that won't work, since there's no good way of doing this:

from typing import Generator

from pydantic import BaseModel

class Task(BaseModel):
    id: int
    title: str

# This is your existing generator that yields chunks of JSON string
def json_chunks(json_string):
    for i in range(0, len(json_string), 5):  # replace 5 with the chunk size you want
         chunk = json_string[i:i+5]
         print("yield chunk:", chunk)
         yield chunk

def tasks_from_chunks(json_chunks: Generator[str, None, None]):
     # do something to get a single task_json
     task = Task.parse_raw(**task_json)
     print("yield task", task)
     yield task
     
json_string = '{"tasks":[{"id":1,"title":"task1"},{"id":2,"title":"task2"},{"id":3,"title":"task3"}]}'

for task in tasks_from_chunks(json_chunks(json_string)):
     print(task)

Success criteria

  1. tasks are yielded as soon as they are parsed; therefore task 1 should be yielded before all JSON chunks have been consumed
  2. must contain a few examples to show it works correctly.
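
For reference, a minimal sketch of one possible approach (an assumption, not an accepted solution): track brace depth and string state character by character, and validate each task object the moment its closing brace arrives.

from typing import Generator, Iterator

from pydantic import BaseModel


class Task(BaseModel):
    id: int
    title: str


def tasks_from_chunks(chunks: Iterator[str]) -> Generator[Task, None, None]:
    buffer = ""
    depth = 0          # brace nesting depth seen so far
    in_string = False  # inside a JSON string literal?
    escape = False     # was the previous character a backslash?
    start = None       # buffer index where the current task object began

    for chunk in chunks:
        for ch in chunk:
            buffer += ch
            if escape:
                escape = False
            elif in_string and ch == "\\":
                escape = True
            elif ch == '"':
                in_string = not in_string
            elif not in_string and ch == "{":
                depth += 1
                if depth == 2:  # depth 1 is the wrapper; depth 2 starts a task
                    start = len(buffer) - 1
            elif not in_string and ch == "}":
                depth -= 1
                if depth == 1 and start is not None:
                    yield Task.model_validate_json(buffer[start:])
                    buffer, start = "", None


json_string = '{"tasks":[{"id":1,"title":"task1"},{"id":2,"title":"task2"}]}'
chunks = (json_string[i : i + 5] for i in range(0, len(json_string), 5))
for task in tasks_from_chunks(chunks):
    print(task)  # each Task prints before later chunks are consumed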

[Bounty] Instructor finetuning CLI needs to support validation_file and hyperparameters

Is your feature request related to a problem? Please describe.

We need to be able to pass in the hyperparameters and validation file here:
https://github.com/jxnl/instructor/blob/main/instructor/cli/jobs.py#L135

It should basically look like: https://platform.openai.com/docs/api-reference/fine-tuning/create#fine-tuning-create-hyperparameters

Describe the solution you'd like

  1. make a PR to add it into the cli
  2. update the documentation in the finetune docs page here: https://github.com/jxnl/instructor/blob/main/docs/cli/finetune.md

Support parameters docstring for `@openai_function` annotation

Is your feature request related to a problem? Please describe.
I want to be able to define a good old Python function and use it both for the schema and for execution, but right now, if I want to add descriptions to the parameters, I have to use a class definition. This could be solved by supporting standard parameter parsing from docstrings.

Describe the solution you'd like
E.g., this should work:

@openai_function
def get_current_weather(
    location: str, format: Literal["celsius", "fahrenheit"] = "celsius"
) -> WeatherReturn:
    """
    Gets the current weather in a given location, use this function for any questions related to the weather

    Parameters
    ----------
    location
        The city to get the weather, e.g. San Francisco. Guess the location from user messages

    format
        A string with the full content of what the given role said
    """

    return WeatherReturn(
        location=location,
        forecast="sunny",
        temperature="25 C" if format == "celsius" else "77 F",
    )

But right now the description of the parameters goes into the function description, not into the parameters description.

How it is right now:

{
    'name': 'get_current_weather',
    'description': '\n    Gets the current weather in a given location, use this function for any questions related to the weather\n\n    Parameters\n    ----------\n    location\n        The city to get the weather, e.g. San Francisco. Guess the location from user messages\n\n    format\n        A string with the full content of what the given role said\n    ',
    'parameters': {
        'properties': {
            'location': {'type': 'string'},
            'format': {
                'default': 'celsius',
                'enum': ['celsius', 'fahrenheit'],
                'type': 'string'
            }
        },
        'required': ['format', 'location'],
        'type': 'object'
    }
}

How I expect it:

{
  'name': 'get_current_weather',
  'description': 'Gets the current weather in a given location, use this function for any questions related to the weather',
  'parameters': {
      'properties': {
          'location': {
              'description': 'The city to get the weather, e.g. San Francisco. Guess the location from user messages',
              'type': 'string'
          },
          'format': {
              'description': 'A string with the full content of what the given role said',
              'default': 'celsius',
              'enum': ['celsius', 'fahrenheit'],
              'type': 'string'
          }
      },
      'required': ['location'],
      'type': 'object'
  }
}
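
For what it's worth, a minimal sketch of how parameter descriptions could be pulled from such a docstring using the third-party docstring_parser package (one possible approach, not necessarily how the decorator should be implemented):

from docstring_parser import parse


def param_descriptions(func) -> dict:
    """Map each documented parameter name to its description."""
    doc = parse(func.__doc__ or "")
    return {p.arg_name: p.description for p in doc.params}


# e.g. param_descriptions(get_current_weather)
# would yield descriptions for 'location' and 'format'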

Weird usecase where pydantic model has field that represents code but gets invalid json characters, failing model_validate_json

Is your feature request related to a problem? Please describe.
I have a weirdish use case, where one of the fields of the pydantic model represents code.
The code is often returned with a bunch of invalid json characters in it, like control characters (\u0000-\u001F).

This makes instructor fail on errors like this:
File "/opt/homebrew/lib/python3.11/site-packages/pydantic/main.py", line 530, in model_validate_json return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ pydantic_core._pydantic_core.ValidationError: 1 validation error for RustCode Invalid JSON: control character (\u0000-\u001F) found while parsing a string at line 4 column 0 [type=json_invalid, input_value='\n{\n"generated_code": "...xample_output": "37"\n}', input_type=str] For further information visit https://errors.pydantic.dev/2.4/v/json_invalid joelkronander@MacBook-Pro-5 swissknife %

Describe the solution you'd like
Maybe one could handle cases like this with some form of "pre-validators" that could, for example, run base64 encoding on those non-JSON-compatible strings? Not sure how it would fit in exactly.
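
As a stopgap outside the library (a hedged sketch, not a built-in instructor feature): the standard library's json.loads accepts strict=False, which tolerates raw control characters inside strings, so the payload can be parsed leniently and then validated:

import json

from pydantic import BaseModel


class RustCode(BaseModel):  # stand-in for the model from the report
    generated_code: str
    example_output: str


def lenient_parse(raw: str) -> RustCode:
    # strict=False lets json.loads accept raw control characters
    # (\x00-\x1f) inside strings, which model_validate_json rejects.
    return RustCode.model_validate(json.loads(raw, strict=False))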

Additional context
Instructor is nice.

Small error in `openai_function`

Was getting error:
AttributeError: 'openai_function' object has no attribute 'schema'

Fixed by changing line 30 to:
assert message["function_call"]["name"] == self.openai_schema["name"], "Function name does not match"

Thanks for putting this up, this code is super useful.

Add LLM based citation

It would be nice to have Fact generated with semantic citations (not the regex-based ones in the cookbook). We can do this with a custom validation function that invokes an LLM call.

Doc improvement: *why* would one use distillation?

I was reading through the docs and saw https://jxnl.github.io/instructor/distillation/. The page explains the "what" and the "how", but not the "why". I assume this feature caters to some use cases, but it's not clear to me at all what those would be. The examples given seem like a ridiculously bad idea: replacing instantaneous, deterministic on-device calculations with slow, hallucination-prone API calls. Why would I ever want to use an LLM to perform simple math? I get they're just examples, but maybe it would be nice to have a paragraph explaining real use cases for this.

Upcoming openai-python 1.0.0 release

Hello. Thanks for your great work on Instructor. Really appreciate that it's thoughtfully constructed for use in production.

I wanted to check what your plans are for the upcoming openai-python 1.0.0 release (openai/openai-python#631). Instructor currently has a dependency on <0.28.

Thanks!

Async might not be properly handled in latest instructor/openai versions?

I am using instructor = "^0.3.1" and openai = "^1.2.0".

I initialize my client as:

client = instructor.patch(AsyncOpenAI(
    api_key=OPENAI_API_KEY,
))

And then call it as:

async def myfunc():
    ...
                response = await client.chat.completions.create(
                    model=model_name,
                    messages=messages,
                    response_model=response_model, # type: ignore
                    max_retries=2
                )

This gives me an error: Error in getting response from model: 'coroutine' object has no attribute 'choices'.
I stepped through the code in a debugger and it seems like wrap_chatcomplete wraps the AsyncOpenAI().chat.completion.create as a sync function, not an async one?


Exclude properties with defaults from required

Suggestion:

parameters["required"] = sorted(
    k for k, v in parameters.get("properties", {}).items() if "default" not in v
)

instead of

parameters["required"] = sorted(parameters["properties"])

That would allow us to write:

data: Any = Field(None, description="Optional data attached")
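
To make the effect concrete, a small sketch over a hypothetical schema fragment showing what the proposed computation would produce:

# Hypothetical schema fragment before the "required" computation
parameters = {
    "properties": {
        "name": {"type": "string"},
        "data": {"type": "string", "default": None},
    }
}

# Proposed computation: fields with defaults are no longer required
parameters["required"] = sorted(
    k for k, v in parameters["properties"].items() if "default" not in v
)
assert parameters["required"] == ["name"]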

Decoupling the llm backend

Is your feature request related to a problem? Please describe.
I see the library is tightly coupled with OpenAI function calling, but it would be good to decouple the model from the Pydantic way of doing things and use any model (e.g. LLMs from LangChain); that way we can experiment with smaller/self-hosted/other cloud models.

Describe the solution you'd like
The ability to pass Pydantic structures to any LLM and get results back. For example, something like using LangChain tools, where function calling is isolated from the LLM.

Describe alternatives you've considered
custom tools in langchain implementation for function calling

Additional context
Not sure if it's already possible. I haven't experimented yet, but it looks like it's coupled, based on the repo subtitles/examples.

Compatibility with Langchain

Is your feature request related to a problem? Please describe.
Would like to resolve dependency incompatibility between langchain and openai_function_call

Describe the solution you'd like
langchain and openai_function_call to be compatible

Describe alternatives you've considered
None

Additional context

  Because no versions of openai-function-call match >0.2.0,<0.3.0
   and openai-function-call (0.2.0) depends on pydantic (>=2.0.2,<3.0.0), openai-function-call (>=0.2.0,<0.3.0) requires pydantic (>=2.0.2,<3.0.0).
  And because langchain (0.0.238) depends on pydantic (>=1,<2)
   and no versions of langchain match >0.0.238,<0.0.239, openai-function-call (>=0.2.0,<0.3.0) is incompatible with langchain (>=0.0.238,<0.0.239).
  So, because nira-ai depends on both langchain (^0.0.238) and openai-function-call (^0.2.0), version solving failed.

support for completions endpoint

Is your feature request related to a problem? Please describe.
The recent -instruct models are instruction tuned rather than dialogue tuned and should be very useful for most use cases of this library.

class UserDetail(BaseModel):
    name: str
    age: int

user: UserDetail = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    response_model=UserDetail,
    messages=[
        {"role": "user", "content": "Extract Jason is 25 years old"},
    ]
)

This should work.

Describe the solution you'd like
Patch should also patch the openai.Completion.create method.

Logic error in ChatCompletion __or__

For class ChatCompletion(BaseModel):

def __or__(self, other: Union[Message, OpenAISchema]) -> "ChatCompletion":
    if isinstance(other, Message):
        if isinstance(other, SystemMessage):
            if self.system_message:
                self.system_message.content += "\n\n" + other.content
            self.system_message = other

should be

if isinstance(other, SystemMessage):
    if self.system_message:
        self.system_message.content += "\n\n" + other.content
    else:
        self.system_message = other

Typer version too old

Describe the bug
Is there a reason why Typer version ^0.4.0 is used while the latest version is 0.9.0?
It might conflict with other packages that require a more recent version of Typer.

JsonDecoderError at the specific place

Describe the bug
When using instructor, certain inputs raise a JSON decoding error.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A way to fix the bug


Desktop (please complete the following information):
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core

Default description for generated schema

When getting the .openai_schema from an OpenAISchema (BaseModel) class, if the class has a docstring, then that is used as the description. If there is no docstring, one is automatically added. The current default description (no docstring) is shown below: it describes the extraction process rather than the object itself.

For example, if I define an Address as

class Address(City):
    country: str
    state: str
    city: str
    street: str

Then the .openai_schema is

{'name': 'Address',
 'description': 'Correctly extracted `Address` with all the required parameters with correct types',
 'parameters': {'properties': {'country': {'type': 'string'},
   'state': {'type': 'string'},
   'city': {'type': 'string'},
   'street': {'type': 'string'}},
  'required': ['city', 'country', 'state', 'street'],
  'type': 'object'}}

However, if I add a docstring to the type, like

class Address(City):
    """An address"""
    country: str
    state: str
    city: str
    street: str

then the .openai_schema is

{'name': 'Address',
 'description': 'An address',
 'parameters': {'properties': {'country': {'type': 'string'},
   'state': {'type': 'string'},
   'city': {'type': 'string'},
   'street': {'type': 'string'}},
  'required': ['city', 'country', 'state', 'street'],
  'type': 'object'}}

The current default string doesn't really have the same use case as the description when a docstring is present.

I think a better default description would be the empty string ("") or maybe just the class name. In most cases, I think it would be preferable that the language model is given no description of the type than one about the schema generation process.

Help: Reorganize module structure

It would be nice to have a structure with a directory per example, so we can have a README.md for each example and a list of evals to run.

create func of ChatCompletion does not return completion if self.function is None

In dsl/completion.py, shouldn't create return the completion?

def create(self):
    """
    Create a chat response from the OpenAI API

    Returns:
        response (OpenAISchema): The response from the OpenAI API
    """
    kwargs = self.kwargs
    completion = openai.ChatCompletion.create(**kwargs)
    if self.function:
        return self.function.from_response(completion)
    return completion  # <-- proposed addition

Base example doesn't work?

Hi Jason, I watched your Pydantic talk and thought I'd check it out. Seems like a fantastic idea, but on openai==1.1.0 and instructor==0.3.0 it raises a TypeError. This of course does not arise when using the "unpatched" openai client and sending the request without the response_model kwarg:

user = client.chat.completions.create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'classmethod' object is not callable

thanks! and great talk

Default parameters to pydantic model

Is your feature request related to a problem? Please describe.
I'm always frustrated when I need to send default parameters to the Pydantic response_model.

Describe the solution you'd like
I want to send, for example, a default sex to the model (not extract it with ChatCompletion), because I know Jason's sex 😄:

class UserDetail(BaseModel, sex):
    weight: int
    sex: str
    def is_obese(self):
        if self.sex=='female' and self.weight>100:
            return True
        if self.sex=='male' and self.weight>120:
            return True
        return False


user: UserDetail = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    parameters={'sex': 'male'},
    messages=[
        {"role": "user", "content": "Extract Jason 200kg"},
    ]
)

Describe alternatives you've considered
I considered creating another Pydantic class to complete the user's properties, but that is not the correct way, because UserDetail should have all the user properties: some extracted by ChatCompletion and others sent by me.

Maybe I have missed something; I'm not an expert with Pydantic. If you can share another option, I would be grateful.

Function does not obey Enums

Describe the bug
I set an enum for one of the function inputs and have a Pydantic class that refers to it.
The output args show that the enum is not followed.

Expected behavior
I would expect that the generated args obey the enum I set for that field.

Bugs in `Example 2: Schema Extraction`

There are two bugs in Example 2: Schema Extraction.

  1. There's a missing comma character after functions=[UserDetails.openai_schema]
  2. Missing import, from pydantic import Field

Does instructor support Azure OpenAI API ?

When I use Azure OpenAI, I often encounter errors, but occasionally it succeeds. I am not sure whether the current instructor can use the Azure OpenAI API. Below are the call and the frequent error message.

new_updates = openai.ChatCompletion.create(
        response_model=Report,
        deployment_id= dep.GPT_4,
        max_retries=2,
        messages=[
                {
                    "role": "system",
                    "content": SYSTEM_PROMPT_KG_SYT
                },
                {
                    "role": "user",
                    "content": f"""Extract any new events from the following:
                    # Part {i}/{num_iterations} of the input:

                    {inp}"""
                },
                {
                    "role": "user",
                    "content": f"""Here is the current state of the report:
                    {cur_state.model_dump_json(indent=2)}"""
                }
            ],
        
    )  # type: ignore

Describe the bug
openai.error.InvalidRequestError: 'content' is a required property - 'messages.3'


Screenshots
Traceback (most recent call last):
File "C:\Users\yubo.he\Desktop\LLM_AE_Extrator\run.py", line 92, in
ade_report: Report = generate_report(text_chunks)
File "C:\Users\yubo.he\Desktop\LLM_AE_Extrator\run.py", line 47, in generate_report
new_updates = openai.ChatCompletion.create(
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\instructor\patch.py", line 162, in new_chatcompletion_sync
response, error = retry_sync(
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\instructor\patch.py", line 117, in retry_sync
response = func(*args, **kwargs)
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 155, in create
response, _, api_key = requestor.request(
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 299, in request
resp, got_stream = self._interpret_response(result, stream)
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 710, in _interpret_response
self._interpret_response_line(
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 775, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: 'content' is a required property - 'messages.3'

Desktop (please complete the following information):

  • OS: Windows

Additional context
Azure OpenAI version : 2023-08-01-preview

Changing Patch behavior

I think there are a few ways to add the response_model and other capabilities.

Monkey Patch Global

import instructor

instructor.patch()

resp = openai.ChatComplete.create(..., response_model=Model)
assert isinstance(resp, Model)

Monkey Patch Context

with instructor.patch():
      resp = openai.ChatComplete.create(..., response_model=Model)
      assert isinstance(resp, Model)

Import Custom SDK

from instructor import client  # as openai

resp = client.ChatComplete.create(..., response_model=Model)
assert isinstance(resp, Model)

I think we need to be conscious of how other tools also patch the client.

