jxnl / instructor

structured outputs for llms

Home Page: https://python.useinstructor.com/

License: MIT License

Python 97.52% TypeScript 2.44% Shell 0.04%
openai python pydantic-v2 openai-functions validation openai-function-calli

instructor's Introduction

Instructor: Structured LLM Outputs

Instructor is a Python library that makes it a breeze to work with structured outputs from large language models (LLMs). Built on top of Pydantic, it provides a simple, transparent, and user-friendly API to manage validation, retries, and streaming responses. Get ready to supercharge your LLM workflows!


Key Features

  • Response Models: Specify Pydantic models to define the structure of your LLM outputs
  • Retry Management: Easily configure the number of retry attempts for your requests
  • Validation: Ensure LLM responses conform to your expectations with Pydantic validation (see the sketch after this list)
  • Streaming Support: Work with Lists and Partial responses effortlessly
  • Flexible Backends: Seamlessly integrate with various LLM providers beyond OpenAI
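
To see how validation and retries fit together, here is a minimal sketch (the validator and its age rule are illustrative assumptions, not part of the library; max_retries re-asks the model when validation fails):

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator


class UserInfo(BaseModel):
    name: str
    age: int

    # Illustrative rule: reject negative ages so the model gets re-asked.
    @field_validator("age")
    @classmethod
    def age_must_be_non_negative(cls, v: int) -> int:
        if v < 0:
            raise ValueError("age must be non-negative")
        return v


client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserInfo,
    max_retries=3,  # up to 3 attempts if validation fails
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)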

Get Started in Minutes

Install Instructor with a single command:

pip install -U instructor

Now, let's see Instructor in action with a simple example:

import instructor
from pydantic import BaseModel
from openai import OpenAI


# Define your desired output structure
class UserInfo(BaseModel):
    name: str
    age: int


# Patch the OpenAI client
client = instructor.from_openai(OpenAI())

# Extract structured data from natural language
user_info = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)

print(user_info.name)
#> John Doe
print(user_info.age)
#> 30

Using Anthropic Models

import instructor
from anthropic import Anthropic
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_anthropic(Anthropic())

# note that client.chat.completions.create will also work
resp = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

assert isinstance(resp, User)
assert resp.name == "Jason"
assert resp.age == 25

Using Cohere Models

Make sure to install cohere and set your system environment variable with export CO_API_KEY=<YOUR_COHERE_API_KEY>.

pip install cohere

import instructor
import cohere
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_cohere(cohere.Client())

# note that client.chat.completions.create will also work
resp = client.chat.completions.create(
    model="command-r-plus",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

assert isinstance(resp, User)
assert resp.name == "Jason"
assert resp.age == 25

Using Litellm

import instructor
from litellm import completion
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_litellm(completion)

resp = client.chat.completions.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

assert isinstance(resp, User)
assert resp.name == "Jason"
assert resp.age == 25

Types are inferred correctly

This was always the dream for instructor, but because we patched the openai client, it wasn't possible to get typing to work well. Now, with the new client, typing works well! We've also added a few create_* methods to make it easier to create iterables and partials, and to access the original completion.

Calling create

import openai
import instructor
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_openai(openai.OpenAI())

user = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create a user"},
    ],
    response_model=User,
)

Now if you use an IDE, you can see the type is correctly inferred.


Handling async: await create

This will also work correctly with asynchronous clients.

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.AsyncOpenAI())


class User(BaseModel):
    name: str
    age: int


async def extract():
    return await client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "user", "content": "Create a user"},
        ],
        response_model=User,
    )

Notice that simply because we return the result of the create call, the extract() function is correctly typed as returning a User.
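
As a quick usage sketch (assuming the snippet above is in scope), the coroutine can be driven with asyncio:

import asyncio

user = asyncio.run(extract())
print(user)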


Returning the original completion: create_with_completion

You can also return the original completion object alongside the parsed model:

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


user, completion = client.chat.completions.create_with_completion(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create a user"},
    ],
    response_model=User,
)


Streaming Partial Objects: create_partial

To handle streams, we still support Iterable[T] and Partial[T], but to simplify type inference, we've added create_iterable and create_partial methods as well!

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


user_stream = client.chat.completions.create_partial(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create a user"},
    ],
    response_model=User,
)

for user in user_stream:
    print(user)
    #> name=None age=None
    #> name='' age=None
    #> name='John' age=None
    #> name='John Doe' age=None
    #> name='John Doe' age=30

Notice that the inferred type is now Generator[User, None, None].


Streaming Iterables: create_iterable

We get an iterable of objects when we want to extract multiple objects.

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


users = client.chat.completions.create_iterable(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create 2 users"},
    ],
    response_model=User,
)

for user in users:
    print(user)
    #> name='John' age=30
    #> name='Jane' age=25


We invite you to contribute evals in pytest as a way to monitor the quality of the OpenAI models and the instructor library. To get started, check out the evals for Anthropic and OpenAI and contribute your own in the form of pytest tests. These evals will be run once a week and the results will be posted.

Contributing

If you want to help, check out some of the issues marked as good-first-issue or help-wanted found here. They could be anything from code improvements, a guest blog post, or a new cookbook.

CLI

We also provide some added CLI functionality for convenience:

  • instructor jobs : This helps with the creation of fine-tuning jobs with OpenAI. Simply use instructor jobs create-from-file --help to get started creating your first fine-tuned GPT-3.5 model

  • instructor files : Manage your uploaded files with ease. You'll be able to create, delete and upload files all from the command line

  • instructor usage : Instead of heading to the OpenAI site each time, you can monitor your usage from the CLI and filter by date and time period. Note that usage often takes ~5-10 minutes to update on OpenAI's side

License

This project is licensed under the terms of the MIT License.


instructor's People

Contributors

anmol6, bllchmbrs, cristobalcl, cruppelt, daaniyaan, daveokpare, dhruv-anand-aintech, ethanleifer, fpingham, gao-hongnan, inn-0, ivanleomk, jlondonobo, jpetrantoni, jxnl, lakshyaag, lazyhope, leobeeson, medott29, phibrandon, phodaie, rgbkrk, ryanhalliday, savarin, shanktt, shreya-51, tedfulk, toolittlecakes, zboyles, zby


instructor's Issues

Bug - cannot import name 'FieldValidationInfo' from 'pydantic'

Describe the bug
I get the following error: cannot import name 'FieldValidationInfo' from 'pydantic'.

When doing:

from instructor import OpenAISchema

To Reproduce

from instructor import OpenAISchema

Expected behavior

Expected it to not crash


Desktop (please complete the following information):
Version 0.2.8
Macbook Pro - Intel
Chrome

The multi classification does not actually work as intended.

Describe the bug
The multi classification does not actually work as intended.

To Reproduce
I copy-pasted the example for multi prediction, and the outputs always include every label. No matter which classes are declared or which prompt is used, the result is the same: all classes are predicted.

examples/classification/multi_prediction.py

Where to inject the few shot examples?

Where should I put few-shot examples in the prompt to improve accuracy? Should they go in the model docstring or somewhere else? Can you provide an example?

Thanks.
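
For what it's worth, here is a hedged sketch of one common placement: because the model's docstring and field descriptions are included in the schema sent to the LLM, few-shot examples can be embedded there. The model below is hypothetical:

from pydantic import BaseModel, Field


class UserDetail(BaseModel):
    """Extract user details from the text.

    Examples:
    - "Anna is 29 years old" -> name="Anna", age=29
    - "Bob, aged 42" -> name="Bob", age=42
    """

    name: str = Field(..., description="Full name, e.g. 'Anna Smith'")
    age: int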

Bug: openai_schema removes properties named title

openai_schema removes properties/fields named title from json schema

Example:

from instructor import OpenAISchema
from pydantic import Field


class Author(OpenAISchema):
    """Class representing an author.
    This class is used to extract author's name and
    poem's title from a text"""

    name: str = Field(..., description="Name of the author")
    title: str = Field(..., description="Title of the article")


Author.openai_schema

# output:
"""
{'name': 'Author',
 'description': "Class representing an author. \nThis class is used to extract author's name and
\npoem's title from a text",
 'parameters': {'type': 'object',
  'properties': {'name': {'description': 'Name of the author',
    'type': 'string'}},
  'required': ['name']}}
"""

pip install instructor has dependency conflicts in Colab

Describe the bug

Running !pip install instructor in Colab creates the following dependency conflicts:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires tiktoken, which is not installed.
tensorflow-probability 0.22.0 requires typing-extensions<4.6.0, but you have typing-extensions 4.8.0 which is incompatible.

To Reproduce
Steps to reproduce the behavior:

  1. Create a new colab
  2. Run !pip install instructor

Expected behavior
Clean install without dependency conflicts.

Adding a lightweight prompt abstraction to the SchemaClass

Sure! Here's the updated proposal where PromptConfig has the model as a required argument and all other attributes as optional. The default model is set to "gpt3.5-turbo-0613":

import openai
from pydantic import BaseModel
from typing import Optional

class OpenAISchema(BaseModel):
    class PromptConfig:
        model: str = "gpt3.5-turbo-0613"
        system: Optional[str]
        message: Optional[str]
        temperature: Optional[float]
        max_tokens: Optional[int]

    @classmethod
    def from_response(cls, response):
        # Implementation based on the actual response format.
        ...

    @classmethod
    def create(cls, message=None, *args, force_function=False, **kwargs):
        messages = kwargs.get("messages", [])

        if not messages and hasattr(cls, "PromptConfig"):
            if cls.PromptConfig.system:
                messages.append({
                    "role": "system",
                    "content": cls.PromptConfig.system
                })
            if cls.PromptConfig.message:
                messages.append({
                    "role": "user",
                    "content": cls.PromptConfig.message
                })

        if message:
            messages.append({
                "role": "user",
                "content": message
            })

        if force_function:
            kwargs['function_call'] = {"name": cls.openai_schema["name"]}

        kwargs['messages'] = messages

        if hasattr(cls, "PromptConfig"):
            kwargs.setdefault('model', cls.PromptConfig.model)
            kwargs.setdefault('temperature', cls.PromptConfig.temperature)
            kwargs.setdefault('max_tokens', cls.PromptConfig.max_tokens)

        completion = openai.ChatCompletion.create(
            functions=[cls.openai_schema],
            **kwargs
        )
        return cls.from_response(completion)

class Search(OpenAISchema):
    # Implementation remains the same
    ...

class MultiSearch(OpenAISchema):
    class PromptConfig:
        system = "You are a capable algorithm designed to correctly segment search requests."
        message = "Correctly segment the following search request"
        model = "gpt3.5-turbo-0613"
        temperature = 0.5
        max_tokens = 1000

    # Implementation remains the same

# Example of usage:
queries = MultiSearch.create(
    "Please send me the video from last week about the investment case study and also documents about your GPDR policy."
)
queries.execute()

This revision makes the PromptConfig more flexible and easier to use with the default model set and all other parameters as optional. This configuration can be overridden on a per-class basis, as shown in the MultiSearch.PromptConfig example.

Bounty: Streaming function calls

To be considered checkout : https://replit.com/bounties/@jxnl/streaming-json-parse

I'd like to have the capability of parsing functions calls as they stream out for MultiTask when doing streaming function calls. You can use any existing python library. Must work for nested and deep objects.

Below is some code that won't work, since there's no good way of doing this:

from typing import Generator

from pydantic import BaseModel

class Task(BaseModel):
    id: int
    title: str

# This is your existing generator that yields chunks of JSON string
def json_chunks(json_string):
    for i in range(0, len(json_string), 5):  # replace 5 with the chunk size you want
         chunk = json_string[i:i+5]
         print("yield chunk:", chunk)
         yield chunk

def tasks_from_chunks(json_chunks: Generator[str, None, None]):
     # do something to get a single task_json
     task = Task.parse_raw(**task_json)
     print("yield task", task)
     yield task
     
json_string = '{"tasks":[{"id":1,"title":"task1"},{"id":2,"title":"task2"},{"id":3,"title":"task3"}]}'

for task in tasks_from_chunks(json_chunks(json_string)):
     print(task)

Success criteria

  1. tasks are yielded as soon as they are parsed; therefore task 1 should be yielded before all JSON chunks have been consumed
  2. must contain a few examples to show it works correctly.
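
For reference, a minimal sketch of one possible approach (an assumption, not an accepted solution): track brace depth and string state character by character, and validate each task object the moment its closing brace arrives.

from typing import Generator, Iterator

from pydantic import BaseModel


class Task(BaseModel):
    id: int
    title: str


def tasks_from_chunks(chunks: Iterator[str]) -> Generator[Task, None, None]:
    buffer = ""
    depth = 0          # brace nesting depth seen so far
    in_string = False  # inside a JSON string literal?
    escape = False     # was the previous character a backslash?
    start = None       # buffer index where the current task object began

    for chunk in chunks:
        for ch in chunk:
            buffer += ch
            if escape:
                escape = False
            elif in_string and ch == "\\":
                escape = True
            elif ch == '"':
                in_string = not in_string
            elif not in_string and ch == "{":
                depth += 1
                if depth == 2:  # depth 1 is the wrapper; depth 2 starts a task
                    start = len(buffer) - 1
            elif not in_string and ch == "}":
                depth -= 1
                if depth == 1 and start is not None:
                    yield Task.model_validate_json(buffer[start:])
                    buffer, start = "", None


json_string = '{"tasks":[{"id":1,"title":"task1"},{"id":2,"title":"task2"}]}'
chunks = (json_string[i : i + 5] for i in range(0, len(json_string), 5))
for task in tasks_from_chunks(chunks):
    print(task)  # each Task prints before later chunks are consumed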

[Bounty] Instructor finetuning CLI needs to support validation_file and hyperparameters

Is your feature request related to a problem? Please describe.

We need to be able to pass in the hyperparameters and validation file here:
https://github.com/jxnl/instructor/blob/main/instructor/cli/jobs.py#L135

It should basically look like: https://platform.openai.com/docs/api-reference/fine-tuning/create#fine-tuning-create-hyperparameters

Describe the solution you'd like

  1. make a PR to add it into the cli
  2. update the documentation in the finetune docs page here: https://github.com/jxnl/instructor/blob/main/docs/cli/finetune.md

Support parameters docstring for `@openai_function` annotation

Is your feature request related to a problem? Please describe.
I want to be able to define a good old Python function and use it both for the schema and for execution, but right now, if I want to add descriptions to the parameters, I have to use a class definition. This could be solved by supporting standard parameter parsing from docstrings.

Describe the solution you'd like
E.g., this should work:

@openai_function
def get_current_weather(
    location: str, format: Literal["celsius", "fahrenheit"] = "celsius"
) -> WeatherReturn:
    """
    Gets the current weather in a given location, use this function for any questions related to the weather

    Parameters
    ----------
    location
        The city to get the weather, e.g. San Francisco. Guess the location from user messages

    format
        A string with the full content of what the given role said
    """

    return WeatherReturn(
        location=location,
        forecast="sunny",
        temperature="25 C" if format == "celsius" else "77 F",
    )

But right now the description of the parameters goes into the function description, not into the parameters description.

How it is right now:

{
    'name': 'get_current_weather',
    'description': '\n    Gets the current weather in a given location, use this function for any questions related to the weather\n\n    Parameters\n    ----------\n    location\n        The city to get the weather, e.g. San Francisco. Guess the location from user messages\n\n    format\n        A string with the full content of what the given role said\n    ',
    'parameters': {
        'properties': {
            'location': {'type': 'string'},
            'format': {
                'default': 'celsius',
                'enum': ['celsius', 'fahrenheit'],
                'type': 'string'
            }
        },
        'required': ['format', 'location'],
        'type': 'object'
    }
}

How I expect it:

{
  'name': 'get_current_weather',
  'description': 'Gets the current weather in a given location, use this function for any questions related to the weather',
  'parameters': {
      'properties': {
          'location': {
              'description': 'The city to get the weather, e.g. San Francisco. Guess the location from user messages',
              'type': 'string'
          },
          'format': {
              'description': 'A string with the full content of what the given role said',
              'default': 'celsius',
              'enum': ['celsius', 'fahrenheit'],
              'type': 'string'
          }
      },
      'required': ['location'],
      'type': 'object'
  }
}
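
For what it's worth, a minimal sketch of how parameter descriptions could be pulled from such a docstring using the third-party docstring_parser package (one possible approach, not necessarily how the decorator should be implemented):

from docstring_parser import parse


def param_descriptions(func) -> dict:
    """Map each documented parameter name to its description."""
    doc = parse(func.__doc__ or "")
    return {p.arg_name: p.description for p in doc.params}


# e.g. param_descriptions(get_current_weather)
# would yield descriptions for 'location' and 'format'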

Weird usecase where pydantic model has field that represents code but gets invalid json characters, failing model_validate_json

Is your feature request related to a problem? Please describe.
I have a weirdish use case, where one of the fields of the pydantic model represents code.
The code is often returned with a bunch of invalid json characters in it, like control characters (\u0000-\u001F).

This makes instructor fail on errors like this:
File "/opt/homebrew/lib/python3.11/site-packages/pydantic/main.py", line 530, in model_validate_json return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ pydantic_core._pydantic_core.ValidationError: 1 validation error for RustCode Invalid JSON: control character (\u0000-\u001F) found while parsing a string at line 4 column 0 [type=json_invalid, input_value='\n{\n"generated_code": "...xample_output": "37"\n}', input_type=str] For further information visit https://errors.pydantic.dev/2.4/v/json_invalid joelkronander@MacBook-Pro-5 swissknife %

Describe the solution you'd like
Maybe one could handle cases like this with some form of "pre-validators" that could, for example, run base64 encoding on those non-JSON-compatible strings? Not sure how it would fit in exactly.
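
As a stopgap outside the library (a hedged sketch, not a built-in instructor feature): the standard library's json.loads accepts strict=False, which tolerates raw control characters inside strings, so the payload can be parsed leniently and then validated:

import json

from pydantic import BaseModel


class RustCode(BaseModel):  # stand-in for the model from the report
    generated_code: str
    example_output: str


def lenient_parse(raw: str) -> RustCode:
    # strict=False lets json.loads accept raw control characters
    # (\x00-\x1f) inside strings, which model_validate_json rejects.
    return RustCode.model_validate(json.loads(raw, strict=False))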

Additional context
Instructor is nice.

Small error in `openai_function`

Was getting error:
AttributeError: 'openai_function' object has no attribute 'schema'

Fixed by changing line 30 to:
assert message["function_call"]["name"] == self.openai_schema["name"], "Function name does not match"

Thanks for putting this up, this code is super useful.

Add LLM based citation

It would be nice to have Fact generated with semantic citations (not the regex-based ones in the cookbook). We can do this with a custom validation function that invokes an LLM call.

Doc improvement: *why* would one use distillation?

I was reading through the docs and saw https://jxnl.github.io/instructor/distillation/. The page explains the "what" and the "how", but not the "why". I assume this feature caters to some use cases, but it's not clear to me at all what those would be. The examples given seem like a ridiculously bad idea: replacing instantaneous, deterministic on-device calculations with slow, hallucination-prone API calls. Why would I ever want to use an LLM to perform simple math? I get they're just examples, but maybe it would be nice to have a paragraph explaining real use cases for this.

Upcoming openai-python 1.0.0 release

Hello. Thanks for your great work on Instructor. Really appreciate that it's thoughtfully constructed for use in production.

I wanted to check what your plans are for the upcoming openai-python 1.0.0 release (openai/openai-python#631). Instructor currently has a dependency on <0.28.

Thanks!

Async might not be properly handled in latest instructor/openai versions?

I am using instructor = "^0.3.1" and openai = "^1.2.0".

I initialize my client as:

client = instructor.patch(AsyncOpenAI(
    api_key=OPENAI_API_KEY,
))

And then call it as:

async def myfunc():
    ...
                response = await client.chat.completions.create(
                    model=model_name,
                    messages=messages,
                    response_model=response_model, # type: ignore
                    max_retries=2
                )

This gives me an error: Error in getting response from model: 'coroutine' object has no attribute 'choices'.
I stepped through the code in a debugger and it seems like wrap_chatcomplete wraps the AsyncOpenAI().chat.completion.create as a sync function, not an async one?


Exclude properties with defaults from required

Suggestion:

parameters["required"] = sorted(
    k for k, v in parameters.get("properties", {}).items() if "default" not in v
)

instead of

parameters["required"] = sorted(parameters["properties"])

That would allow us to write:

data: Any = Field(None, description="Optional data attached")
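
To make the effect concrete, a small sketch over a hypothetical schema fragment showing what the proposed computation would produce:

# Hypothetical schema fragment before the "required" computation
parameters = {
    "properties": {
        "name": {"type": "string"},
        "data": {"type": "string", "default": None},
    }
}

# Proposed computation: fields with defaults are no longer required
parameters["required"] = sorted(
    k for k, v in parameters["properties"].items() if "default" not in v
)
assert parameters["required"] == ["name"]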

Decoupling the llm backend

Is your feature request related to a problem? Please describe.
I see the library is tightly coupled with OpenAI function calling, but it would be good to decouple the model from the Pydantic way of doing things and use any model (e.g. LLMs from LangChain); that way we can experiment with smaller/self-hosted/other cloud models.

Describe the solution you'd like
The ability to pass Pydantic structures to any LLM and get results back. For example, something like using LangChain tools, where function calling is isolated from the LLM.

Describe alternatives you've considered
custom tools in langchain implementation for function calling

Additional context
Not sure if it's already possible. I haven't experimented yet, but it looks like it's coupled, based on the repo subtitles/examples.

Compatibility with Langchain

Is your feature request related to a problem? Please describe.
Would like to resolve dependency incompatibility between langchain and openai_function_call

Describe the solution you'd like
langchain and openai_function_call to be compatible

Describe alternatives you've considered
None

Additional context

  Because no versions of openai-function-call match >0.2.0,<0.3.0
   and openai-function-call (0.2.0) depends on pydantic (>=2.0.2,<3.0.0), openai-function-call (>=0.2.0,<0.3.0) requires pydantic (>=2.0.2,<3.0.0).
  And because langchain (0.0.238) depends on pydantic (>=1,<2)
   and no versions of langchain match >0.0.238,<0.0.239, openai-function-call (>=0.2.0,<0.3.0) is incompatible with langchain (>=0.0.238,<0.0.239).
  So, because nira-ai depends on both langchain (^0.0.238) and openai-function-call (^0.2.0), version solving failed.

support for completions endpoint

Is your feature request related to a problem? Please describe.
The recent -instruct models are instruction tuned rather than dialogue tuned and should be very useful for most use cases of this library.

class UserDetail(BaseModel):
    name: str
    age: int

user: UserDetail = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    response_model=UserDetail,
    messages=[
        {"role": "user", "content": "Extract Jason is 25 years old"},
    ]
)

This should work.

Describe the solution you'd like
Patch should also patch the openai.Completion.create method.

Logic error in ChatCompletion __or__

For class ChatCompletion(BaseModel):

def __or__(self, other: Union[Message, OpenAISchema]) -> "ChatCompletion":
    if isinstance(other, Message):
        if isinstance(other, SystemMessage):
            if self.system_message:
                self.system_message.content += "\n\n" + other.content
            self.system_message = other

should be

if isinstance(other, SystemMessage):
    if self.system_message:
        self.system_message.content += "\n\n" + other.content
    else:
        self.system_message = other

Typer version too old

Describe the bug
Is there a reason why Typer version ^0.4.0 is used while the latest version is 0.9.0?
It might conflict with other packages that require a more recent version of Typer.

JsonDecoderError at the specific place

Describe the bug
When using instructor, certain inputs raise a JSON decoding error.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A way to fix the bug


Desktop (please complete the following information):
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core

Default description for generated schema

When getting the .openai_schema from an OpenAISchema (BaseModel) class, if the class has a docstring, then that is used as the description. If there is no docstring, one is automatically added. The current default description (no docstring) is shown below: it describes the extraction process rather than the object itself.

For example, if I define an Address as

class Address(City):
    country: str
    state: str
    city: str
    street: str

Then the .openai_schema is

{'name': 'Address',
 'description': 'Correctly extracted `Address` with all the required parameters with correct types',
 'parameters': {'properties': {'country': {'type': 'string'},
   'state': {'type': 'string'},
   'city': {'type': 'string'},
   'street': {'type': 'string'}},
  'required': ['city', 'country', 'state', 'street'],
  'type': 'object'}}

However, if I add a docstring to the type, like

class Address(City):
    """An address"""
    country: str
    state: str
    city: str
    street: str

then the .openai_schema is

{'name': 'Address',
 'description': 'An address',
 'parameters': {'properties': {'country': {'type': 'string'},
   'state': {'type': 'string'},
   'city': {'type': 'string'},
   'street': {'type': 'string'}},
  'required': ['city', 'country', 'state', 'street'],
  'type': 'object'}}

The current default string doesn't really have the same use case as the description when a docstring is present.

I think a better default description would be the empty string ("") or maybe just the class name. In most cases, I think it would be preferable that the language model is given no description of the type than one about the schema generation process.

Help: Reorganize module structure

It would be nice to have a structure with a directory per example, so we can have a README.md for each example and a list of evals to run.

create func of ChatCompletion does not return completion if self.function is None

In dsl/completion.py, shouldn't create return the completion?

def create(self):
    """
    Create a chat response from the OpenAI API

    Returns:
        response (OpenAISchema): The response from the OpenAI API
    """
    kwargs = self.kwargs
    completion = openai.ChatCompletion.create(**kwargs)
    if self.function:
        return self.function.from_response(completion)
    return completion  # <-- proposed addition

Base example doesn't work?

Hi Jason, I watched your Pydantic talk and thought I'd check it out. Seems like a fantastic idea, but on openai==1.1.0 and instructor==0.3.0 it raises a TypeError. This of course does not arise when using the "unpatched" openai client and sending the request without the response_model kwarg:

user = client.chat.completions.create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'classmethod' object is not callable

thanks! and great talk

Default parameters to pydantic model

Is your feature request related to a problem? Please describe.
I'm always frustrated when I need to send default parameters to the Pydantic response_model.

Describe the solution you'd like
I want to send, for example, a default sex to the model (not extract it with ChatCompletion), because I know Jason's sex 😄:

class UserDetail(BaseModel, sex):
    weight: int
    sex: str
    def is_obese(self):
        if self.sex=='female' and self.weight>100:
            return True
        if self.sex=='male' and self.weight>120:
            return True
        return False


user: UserDetail = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    parameters={'sex': 'male'},
    messages=[
        {"role": "user", "content": "Extract Jason 200kg"},
    ]
)

Describe alternatives you've considered
I considered creating another Pydantic class to complete the user's properties, but that is not the correct way, because UserDetail should have all the user properties: some extracted by ChatCompletion and others sent by me.

Maybe I have missed something; I'm not an expert with Pydantic. If you can share another option, I would be grateful.

Function does not obey Enums

Describe the bug
I set an enum for one of the function inputs and have a Pydantic class that refers to it.
The output args show that the enum is not followed.

Expected behavior
I would expect that the generated args obey the enum I set for that field.

Bugs in `Example 2: Schema Extraction`

There are two bugs in Example 2: Schema Extraction.

  1. There's a missing comma character after functions=[UserDetails.openai_schema]
  2. Missing import, from pydantic import Field

Does instructor support Azure OpenAI API ?

When I use Azure OpenAI, I often encounter errors, but occasionally it succeeds. I am not sure whether the current instructor can use the Azure OpenAI API. Below are the call and the frequent error message.

new_updates = openai.ChatCompletion.create(
        response_model=Report,
        deployment_id= dep.GPT_4,
        max_retries=2,
        messages=[
                {
                    "role": "system",
                    "content": SYSTEM_PROMPT_KG_SYT
                },
                {
                    "role": "user",
                    "content": f"""Extract any new events from the following:
                    # Part {i}/{num_iterations} of the input:

                    {inp}"""
                },
                {
                    "role": "user",
                    "content": f"""Here is the current state of the report:
                    {cur_state.model_dump_json(indent=2)}"""
                }
            ],
        
    )  # type: ignore

Describe the bug
openai.error.InvalidRequestError: 'content' is a required property - 'messages.3'


Screenshots
Traceback (most recent call last):
File "C:\Users\yubo.he\Desktop\LLM_AE_Extrator\run.py", line 92, in
ade_report: Report = generate_report(text_chunks)
File "C:\Users\yubo.he\Desktop\LLM_AE_Extrator\run.py", line 47, in generate_report
new_updates = openai.ChatCompletion.create(
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\instructor\patch.py", line 162, in new_chatcompletion_sync
response, error = retry_sync(
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\instructor\patch.py", line 117, in retry_sync
response = func(*args, **kwargs)
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 155, in create
response, _, api_key = requestor.request(
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 299, in request
resp, got_stream = self._interpret_response(result, stream)
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 710, in _interpret_response
self._interpret_response_line(
File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 775, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: 'content' is a required property - 'messages.3'

Desktop (please complete the following information):

  • OS: Windows

Additional context
Azure OpenAI version : 2023-08-01-preview

Changing Patch behavior

I think there are a few ways to add the response_model and other capabilities.

Monkey Patch Global

import instructor

instructor.patch()

resp = openai.ChatComplete.create(..., response_model=Model)
assert isinstance(resp, Model)

Monkey Patch Context

with instructor.patch():
      resp = openai.ChatComplete.create(..., response_model=Model)
      assert isinstance(resp, Model)

Import Custom SDK

from instructor import client  # as openai

resp = client.ChatComplete.create(..., response_model=Model)
assert isinstance(resp, Model)

I think we need to be conscious of how other tools also patch the client.

