jxnl / instructor
Structured outputs for LLMs
Home Page: https://python.useinstructor.com/
License: MIT License
Where should I put the few-shot examples in the prompt to improve accuracy? Should I put them in the model docstring or somewhere else? Can you provide an example?
Thanks.
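One common pattern (a hedged answer, not an official one): the class docstring is sent to the model as the function description, so few-shot examples can live there or in Field descriptions. A minimal sketch, assuming the 0.2.x-era global patch:

    import openai
    import instructor
    from pydantic import BaseModel, Field

    instructor.patch()  # assumes the pre-1.0 patch of openai.ChatCompletion

    class UserDetail(BaseModel):
        """Extracts user details.

        Examples:
          "Jason is 25 years old" -> name="Jason", age=25
          "Sarah, aged thirty"    -> name="Sarah", age=30
        """
        name: str = Field(..., description="Given name, e.g. 'Jason'")
        age: int

    user = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        response_model=UserDetail,
        messages=[{"role": "user", "content": "Extract: Anna just turned 41"}],
    )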
Would be cool to create an example where a FastAPI endpoint and a function call share the same object type, with a standardized way of allowing OpenAI to call the endpoint as code.
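A rough sketch of the shape this could take (all names illustrative, assuming the patched pre-1.0 client):

    from fastapi import FastAPI
    from pydantic import BaseModel

    import instructor
    import openai

    instructor.patch()  # assumes the pre-1.0 global patch

    class SearchQuery(BaseModel):
        query: str
        limit: int = 10

    app = FastAPI()

    @app.post("/search")
    def search(q: SearchQuery):
        ...  # real endpoint logic

    # The same model doubles as the function-call schema, so the LLM's
    # output can be POSTed straight to /search:
    q = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        response_model=SearchQuery,
        messages=[{"role": "user", "content": "find the 5 newest posts"}],
    )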
It would be nice to have Fact objects generated with semantic citations (not the regex-based ones in the cookbook). We can do this with a custom validation function that invokes an LLM call.
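A hedged sketch of the idea: a field validator that defers to an LLM judgment instead of a regex match (llm_says_quote_supports_fact is a hypothetical helper that would wrap the LLM call):

    from pydantic import BaseModel, field_validator

    class Fact(BaseModel):
        fact: str
        substring_quote: str

        @field_validator("substring_quote")
        @classmethod
        def quote_supports_fact(cls, v: str, info):
            fact = info.data.get("fact")  # fields validated earlier are available here
            # Hypothetical helper: ask an LLM whether the quote
            # semantically supports the fact.
            if not llm_says_quote_supports_fact(fact, v):
                raise ValueError("quote does not semantically support the fact")
            return v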
Describe the bug
I get the following error: cannot import name 'FieldValidationInfo' from 'pydantic'.
When doing:
from instructor import OpenAISchema
To Reproduce
from instructor import OpenAISchema
Expected behavior
Expected it to not crash
Screenshots
Desktop (please complete the following information):
Version 0.2.8
Macbook Pro - Intel
Chrome
Is your feature request related to a problem? Please describe.
We need to be able to pass in the hyperparameters and validation file here:
https://github.com/jxnl/instructor/blob/main/instructor/cli/jobs.py#L135
It should basically look like: https://platform.openai.com/docs/api-reference/fine-tuning/create#fine-tuning-create-hyperparameters
Describe the solution you'd like
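A minimal sketch of what forwarding these might look like with the openai 0.28-era SDK (file IDs and values illustrative):

    import openai

    job = openai.FineTuningJob.create(
        training_file="file-abc123",      # illustrative file ID
        validation_file="file-def456",    # illustrative file ID
        model="gpt-3.5-turbo",
        hyperparameters={"n_epochs": 3},  # shape per the API reference above
    )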
I was reading through the docs and saw https://jxnl.github.io/instructor/distillation/. The page explains the "what" and the "how", but not the "why". I assume this feature caters to some use cases, but it's not clear to me what those would be. The examples given seem like a ridiculously bad idea: replacing instantaneous, deterministic on-device calculations with slow, hallucination-prone API calls. Why would I ever want to use an LLM to perform simple math? I get that they're just examples, but it would be nice to have a paragraph explaining real use cases for this.
Is your feature request related to a problem? Please describe.
Would like to resolve the dependency incompatibility between langchain and openai_function_call.
Describe the solution you'd like
langchain and openai_function_call to be compatible.
Describe alternatives you've considered
None
Additional context
Because no versions of openai-function-call match >0.2.0,<0.3.0
and openai-function-call (0.2.0) depends on pydantic (>=2.0.2,<3.0.0), openai-function-call (>=0.2.0,<0.3.0) requires pydantic (>=2.0.2,<3.0.0).
And because langchain (0.0.238) depends on pydantic (>=1,<2)
and no versions of langchain match >0.0.238,<0.0.239, openai-function-call (>=0.2.0,<0.3.0) is incompatible with langchain (>=0.0.238,<0.0.239).
So, because nira-ai depends on both langchain (^0.0.238) and openai-function-call (^0.2.0), version solving failed.
https://jxnl.github.io/instructor/#section-2-adding-additional-prompting Both options are shown; it's unclear whether they can be used interchangeably, or whether OpenAISchemas can only be used with the function_call parameters of ChatCompletions.
MY BAD :I(
Was getting error:
AttributeError: 'openai_function' object has no attribute 'schema'
Fixed by changing line 30 to:
assert message["function_call"]["name"] == self.openai_schema["name"], "Function name does not match"
Thanks for putting this up, this code is super useful.
openai_schema removes properties/fields named title from the JSON schema.
Example:
class Author(OpenAISchema):
    """Class representing an author.
    This class is used to extract author's name and
    poem's title from a text"""
    name: str = Field(..., description="Name of the author")
    title: str = Field(..., description="Title of the article")

Author.openai_schema
# output:
"""
{'name': 'Author',
 'description': "Class representing an author. \nThis class is used to extract author's name and \npoem's title from a text",
 'parameters': {'type': 'object',
  'properties': {'name': {'description': 'Name of the author', 'type': 'string'}},
  'required': ['name']}}
"""
Describe the bug
The multi classification does not actually work as intended.
To Reproduce
I copy-pasted the example for multi prediction and the outputs result in all the labels always being predicted. No matter the classes declared or the prompt used, the result is the same: all classes are predicted.
examples/classification/multi_prediction.py
Hi Jason, watched your Pydantic talk and thought I'd check it out. Seems like a fantastic idea, but on openai==1.1.0 and instructor==0.3.0 it raises a TypeError. This of course does not arise when using the "unpatched" openai client and sending the request without the response_model kwarg.
user = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'classmethod' object is not callable
thanks! and great talk
I think there are a few ways to add the response_model and other capabilities.

import instructor

# Option 1: patch the module globally
instructor.patch()
resp = openai.ChatCompletion.create(..., response_model=Model)
assert isinstance(resp, Model)

# Option 2: patch within a context manager
with instructor.patch():
    resp = openai.ChatCompletion.create(..., response_model=Model)
    assert isinstance(resp, Model)

# Option 3: import a pre-patched client
from instructor import client  # as openai
resp = client.ChatCompletion.create(..., response_model=Model)
assert isinstance(resp, Model)

I think we need to be conscious of how other tools also patch the client.
To be considered, check out: https://replit.com/bounties/@jxnl/streaming-json-parse
I'd like to have the capability of parsing function calls as they stream out for MultiTask when doing streaming function calls. You can use any existing Python library. It must work for nested and deep objects.
Below is some code that won't work, since there's no good way of doing this:
from typing import Generator

from pydantic import BaseModel

class Task(BaseModel):
    id: int
    title: str

# This is your existing generator that yields chunks of JSON string
def json_chunks(json_string):
    for i in range(0, len(json_string), 5):  # replace 5 with the chunk size you want
        chunk = json_string[i:i+5]
        print("yield chunk:", chunk)
        yield chunk

def tasks_from_chunks(json_chunks: Generator[str, None, None]):
    # do something to get a single task_json
    task = Task.parse_raw(task_json)
    print("yield task", task)
    yield task

json_string = '{"tasks":[{"id":1,"title":"task1"},{"id":2,"title":"task2"},{"id":3,"title":"task3"}]}'

for task in tasks_from_chunks(json_chunks(json_string)):
    print(task)
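For comparison, a rough sketch of one workable approach, not a library implementation: track brace depth and string state while consuming chunks, and emit each completed object found inside the wrapper, reusing the Task model above:

    import json
    from typing import Generator, Iterable

    def tasks_from_stream(chunks: Iterable[str]) -> Generator[Task, None, None]:
        buf = ""
        depth = 0
        start = None          # index where the current task object opened
        in_string = False
        escaped = False
        for chunk in chunks:
            pos = len(buf)    # resume scanning where the last chunk ended
            buf += chunk      # note: buffer grows unboundedly in this sketch
            for i in range(pos, len(buf)):
                c = buf[i]
                if in_string:
                    if escaped:
                        escaped = False
                    elif c == "\\":
                        escaped = True
                    elif c == '"':
                        in_string = False
                elif c == '"':
                    in_string = True
                elif c == "{":
                    if depth == 1:  # depth 1 = inside the outer {"tasks": [...]} wrapper
                        start = i
                    depth += 1
                elif c == "}":
                    depth -= 1
                    if depth == 1 and start is not None:
                        yield Task(**json.loads(buf[start:i + 1]))
                        start = None

    for task in tasks_from_stream(json_chunks(json_string)):
        print(task)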
Running the example on Azure OpenAI, the following error occurs:
openai.error.InvalidRequestError: Unrecognized request argument supplied: functions
Can someone give some opinions on this? Thanks in advance.
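A hedged guess: Azure only accepts the functions argument on a recent enough api_version; with the 0.x SDK that is configured globally, roughly like this (the version string is an assumption based on Azure's docs at the time):

    import openai

    openai.api_type = "azure"
    openai.api_base = "https://<your-resource>.openai.azure.com/"
    openai.api_version = "2023-07-01-preview"  # assumption: first api_version with function calling
    openai.api_key = "..."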
Describe the bug
Running !pip install instructor
in Colab creates the following dependency conflicts:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires tiktoken, which is not installed.
tensorflow-probability 0.22.0 requires typing-extensions<4.6.0, but you have typing-extensions 4.8.0 which is incompatible.
To Reproduce
Steps to reproduce the behavior:
!pip install instructor
Expected behavior
Clean install without dependency conflicts.
There are two bugs in Example 2: Schema Extraction. The corrected lines:
functions=[UserDetails.openai_schema]
from pydantic import Field
When getting the .openai_schema from an OpenAISchema (BaseModel) class, if the class has a docstring, that is used as the description. If there is no docstring, one is automatically added. The current default description (no docstring) is a description of the extraction process rather than a description of the object.
For example, if I define an Address as

class Address(City):
    country: str
    state: str
    city: str
    street: str

then the .openai_schema is

{'name': 'Address',
 'description': 'Correctly extracted `Address` with all the required parameters with correct types',
 'parameters': {'properties': {'country': {'type': 'string'},
   'state': {'type': 'string'},
   'city': {'type': 'string'},
   'street': {'type': 'string'}},
  'required': ['city', 'country', 'state', 'street'],
  'type': 'object'}}
However, if I add a docstring to the type, like

class Address(City):
    """An address"""
    country: str
    state: str
    city: str
    street: str

then the .openai_schema is

{'name': 'Address',
 'description': 'An address',
 'parameters': {'properties': {'country': {'type': 'string'},
   'state': {'type': 'string'},
   'city': {'type': 'string'},
   'street': {'type': 'string'}},
  'required': ['city', 'country', 'state', 'street'],
  'type': 'object'}}
The current default string doesn't really have the same use case as the description when a docstring is present.
I think a better default description would be the empty string ("") or maybe just the class name. In most cases, it would be preferable to give the language model no description of the type rather than one about the schema generation process.
When I use Azure OpenAI, I often encounter errors, but occasionally it succeeds. I am not sure whether the current instructor can use the Azure OpenAI API. Below are the function and the frequent error message.
new_updates = openai.ChatCompletion.create(
    response_model=Report,
    deployment_id=dep.GPT_4,
    max_retries=2,
    messages=[
        {
            "role": "system",
            "content": SYSTEM_PROMPT_KG_SYT,
        },
        {
            "role": "user",
            "content": f"""Extract any new events from the following:
# Part {i}/{num_iterations} of the input:
{inp}""",
        },
        {
            "role": "user",
            "content": f"""Here is the current state of the report:
{cur_state.model_dump_json(indent=2)}""",
        },
    ],
)  # type: ignore
Describe the bug
openai.error.InvalidRequestError: 'content' is a required property - 'messages.3'
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
Traceback (most recent call last):
  File "C:\Users\yubo.he\Desktop\LLM_AE_Extrator\run.py", line 92, in <module>
    ade_report: Report = generate_report(text_chunks)
  File "C:\Users\yubo.he\Desktop\LLM_AE_Extrator\run.py", line 47, in generate_report
    new_updates = openai.ChatCompletion.create(
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\instructor\patch.py", line 162, in new_chatcompletion_sync
    response, error = retry_sync(
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\instructor\patch.py", line 117, in retry_sync
    response = func(*args, **kwargs)
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 155, in create
    response, _, api_key = requestor.request(
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 299, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 710, in _interpret_response
    self._interpret_response_line(
  File "C:\Users\yubo.he\AppData\Local\Continuum\anaconda3\envs\syngenta\lib\site-packages\openai\api_requestor.py", line 775, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: 'content' is a required property - 'messages.3'
Desktop (please complete the following information):
Additional context
Azure OpenAI version : 2023-08-01-preview
I am using instructor = "^0.3.1" and openai = "^1.2.0".
I initialize my client as:
client = instructor.patch(AsyncOpenAI(
    api_key=OPENAI_API_KEY,
))
And then call it as:
async def myfunc():
    ...
    response = await client.chat.completions.create(
        model=model_name,
        messages=messages,
        response_model=response_model,  # type: ignore
        max_retries=2,
    )
This gives me an error: Error in getting response from model: 'coroutine' object has no attribute 'choices'.
I stepped through the code in a debugger, and it seems like wrap_chatcomplete wraps AsyncOpenAI().chat.completions.create as a sync function, not an async one?
In class ChatCompletion(BaseModel), the __or__ method:

def __or__(self, other: Union[Message, OpenAISchema]) -> "ChatCompletion":
    if isinstance(other, Message):
        if isinstance(other, SystemMessage):
            if self.system_message:
                self.system_message.content += "\n\n" + other.content
            self.system_message = other

should be

        if isinstance(other, SystemMessage):
            if self.system_message:
                self.system_message.content += "\n\n" + other.content
            else:
                self.system_message = other
Sure! Here's the updated proposal where PromptConfig has the model as a required argument and all other attributes optional. The default model is set to "gpt-3.5-turbo-0613":
from typing import Optional

import openai
from pydantic import BaseModel

class OpenAISchema(BaseModel):
    class PromptConfig:
        model: str = "gpt-3.5-turbo-0613"
        system: Optional[str] = None
        message: Optional[str] = None
        temperature: Optional[float] = None
        max_tokens: Optional[int] = None

    @classmethod
    def from_response(cls, response):
        ...  # Implementation based on the actual response format.

    @classmethod
    def create(cls, message=None, *args, force_function=False, **kwargs):
        messages = kwargs.get("messages", [])
        if not messages and hasattr(cls, "PromptConfig"):
            if cls.PromptConfig.system:
                messages.append({
                    "role": "system",
                    "content": cls.PromptConfig.system,
                })
            if cls.PromptConfig.message:
                messages.append({
                    "role": "user",
                    "content": cls.PromptConfig.message,
                })
        if message:
            messages.append({
                "role": "user",
                "content": message,
            })
        if force_function:
            kwargs["function_call"] = {"name": cls.openai_schema["name"]}
        kwargs["messages"] = messages
        if hasattr(cls, "PromptConfig"):
            kwargs.setdefault("model", cls.PromptConfig.model)
            kwargs.setdefault("temperature", cls.PromptConfig.temperature)
            kwargs.setdefault("max_tokens", cls.PromptConfig.max_tokens)
        completion = openai.ChatCompletion.create(
            functions=[cls.openai_schema],
            **kwargs,
        )
        return cls.from_response(completion)

class Search(OpenAISchema):
    ...  # Implementation remains the same

class MultiSearch(OpenAISchema):
    class PromptConfig:
        system = "You are a capable algorithm designed to correctly segment search requests."
        message = "Correctly segment the following search request"
        model = "gpt-3.5-turbo-0613"
        temperature = 0.5
        max_tokens = 1000

    ...  # Implementation remains the same

# Example of usage:
queries = MultiSearch.create(
    "Please send me the video from last week about the investment case study and also documents about your GDPR policy."
)
queries.execute()
This revision makes the PromptConfig more flexible and easier to use, with the default model set and all other parameters optional. The configuration can be overridden on a per-class basis, as shown in the MultiSearch.PromptConfig example.
Describe the bug
I set an enum for one of the function inputs.
I have a pydantic class that refers to the enum.
The output args show that the enum is not followed.
Expected behavior
I would expect that the generated args obey the enum I set for that field.
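One hedged mitigation: model the field as a Python Enum (or a Literal) so pydantic rejects out-of-enum values, and lean on retries to re-ask the model:

    from enum import Enum

    from pydantic import BaseModel

    class Label(str, Enum):
        SPAM = "spam"
        NOT_SPAM = "not_spam"

    class Prediction(BaseModel):
        label: Label  # pydantic rejects anything outside the enum

    # With instructor's max_retries, a rejected value is fed back to the
    # model for another attempt, e.g.:
    #   openai.ChatCompletion.create(..., response_model=Prediction, max_retries=2)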
Adding the links from files to the README would be helpful, as would adding more code snippets.
Is your feature request related to a problem? Please describe.
I want to be able to define a good old Python function and use it both for the schema and for execution, but if I want to add descriptions to the parameters, right now I have to use a class definition. This could be solved by supporting standard parameter parsing from docstrings.
Describe the solution you'd like
E.g., this should work:
@openai_function
def get_current_weather(
    location: str, format: Literal["celsius", "fahrenheit"] = "celsius"
) -> WeatherReturn:
    """
    Gets the current weather in a given location, use this function for any questions related to the weather

    Parameters
    ----------
    location
        The city to get the weather, e.g. San Francisco. Guess the location from user messages
    format
        A string with the full content of what the given role said
    """
    return WeatherReturn(
        location=location,
        forecast="sunny",
        temperature="25 C" if format == "celsius" else "77 F",
    )
But right now the description of the parameters goes into the function description, not into the parameters description.
{
  'name': 'get_current_weather',
  'description': '\n Gets the current weather in a given location, use this function for any questions related to the weather\n\n Parameters\n ----------\n location\n The city to get the weather, e.g. San Francisco. Guess the location from user messages\n\n format\n A string with the full content of what the given role said\n ',
  'parameters': {
    'properties': {
      'location': {'type': 'string'},
      'format': {
        'default': 'celsius',
        'enum': ['celsius', 'fahrenheit'],
        'type': 'string'
      }
    },
    'required': ['format', 'location'],
    'type': 'object'
  }
}
What I'd like instead:

{
  'name': 'get_current_weather',
  'description': 'Gets the current weather in a given location, use this function for any questions related to the weather',
  'parameters': {
    'properties': {
      'location': {
        'description': 'The city to get the weather, e.g. San Francisco. Guess the location from user messages',
        'type': 'string'
      },
      'format': {
        'description': 'A string with the full content of what the given role said',
        'default': 'celsius',
        'enum': ['celsius', 'fahrenheit'],
        'type': 'string'
      }
    },
    'required': ['location'],
    'type': 'object'
  }
}
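One hedged direction for implementing this: the docstring_parser package can extract numpydoc parameter descriptions, which could then be merged into the generated schema. A sketch, assuming a schema dict shaped like the examples above:

    from docstring_parser import parse

    def add_param_descriptions(func, schema: dict) -> dict:
        """Move numpydoc parameter descriptions into the JSON schema."""
        doc = parse(func.__doc__ or "")
        if doc.short_description:
            schema["description"] = doc.short_description
        props = schema.get("parameters", {}).get("properties", {})
        for param in doc.params:
            if param.arg_name in props and param.description:
                props[param.arg_name]["description"] = param.description
        return schema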
Is your feature request related to a problem? Please describe.
I see the library is tightly coupled with OpenAI function calling, but it would be good to decouple the model from the pydantic way of doing things and use any model (LLMs from langchain), so we can experiment with smaller/self-hosted/other cloud models.
Describe the solution you'd like
The ability to pass pydantic structures to any LLM and get results back. For example, something like using langchain tools, where function calling is isolated from the LLM.
Describe alternatives you've considered
custom tools in langchain implementation for function calling
Additional context
Not sure if it's already possible. I haven't experimented yet, but it looks like it's coupled, based on the repo subtitle/examples.
For those that need to add additional information, for example things like LLMObs, it would be good to be able to add headers.
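With the 1.x SDK this can likely already be done on the client before patching; a minimal sketch (header name illustrative):

    import instructor
    from openai import OpenAI

    client = instructor.patch(OpenAI(
        default_headers={"x-run-id": "abc123"},  # illustrative observability header
    ))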
It would allow us to make a diagram per example and show how modeling gets us these nice-to-haves.
Hello. Thanks for your great work on Instructor. Really appreciate that it's thoughtfully constructed for use in production.
I wanted to check what your plans are for the upcoming openai-python 1.0.0 release (openai/openai-python#631). Instructor currently has a dependency on <0.28.
Thanks!
Re: tweet at https://twitter.com/jxnlco/status/1677907692122259456?s=20
A number of people on Cisco-administered networks may see some version of this error when attempting to access the docs.
May want to publish to a different domain / GitHub Pages?
We should upgrade to pydantic v2
Is your feature request related to a problem? Please describe.
I'm always frustrated when I need to send default parameters to a pydantic response_model.
Describe the solution you'd like
I want to send, for example, a default sex to the model (not extract that data with ChatCompletion), because I know Jason's sex:

class UserDetail(BaseModel):
    weight: int
    sex: str

    def is_obese(self):
        if self.sex == 'female' and self.weight > 100:
            return True
        if self.sex == 'male' and self.weight > 120:
            return True
        return False

user: UserDetail = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    parameters={'sex': 'male'},  # proposed kwarg for caller-supplied fields
    messages=[
        {"role": "user", "content": "Extract Jason 200kg"},
    ],
)
Describe alternatives you've considered
I've considered creating another pydantic class to complete the properties for the user, but that is not the correct way, because UserDetail should have all the user properties: some extracted from ChatCompletion and others supplied by me.
Maybe I have missed something; I'm not an expert with pydantic. If you can share another option I would be grateful.
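One hedged workaround under the current API: extract only the LLM-visible fields into a base model, then construct the full model yourself, so UserDetail still carries every property. A sketch:

    from pydantic import BaseModel

    class ExtractedDetail(BaseModel):
        weight: int  # only what the LLM should extract

    class UserDetail(ExtractedDetail):
        sex: str     # supplied by the caller, not the LLM

    extracted = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        response_model=ExtractedDetail,
        messages=[{"role": "user", "content": "Extract Jason 200kg"}],
    )
    user = UserDetail(**extracted.model_dump(), sex="male")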
Would be nice to have a structure where there's a directory per example, so we can have a README.md for each example and a list of evals to run.
There should be some tools that can get pydantic models from openapi.json.
Would love to see an example like:

endpoints = Endpoints.from("www.website.com/openapi.json")
completion = openai.ChatCompletion(
    function_call=endpoints,
    ...
)
Just to know stuff isn't breaking
Is your feature request related to a problem? Please describe.
I have a weirdish use case, where one of the fields of the pydantic model represents code.
The code is often returned with a bunch of invalid json characters in it, like control characters (\u0000-\u001F).
This makes instructor fail on errors like this:
File "/opt/homebrew/lib/python3.11/site-packages/pydantic/main.py", line 530, in model_validate_json return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ pydantic_core._pydantic_core.ValidationError: 1 validation error for RustCode Invalid JSON: control character (\u0000-\u001F) found while parsing a string at line 4 column 0 [type=json_invalid, input_value='\n{\n"generated_code": "...xample_output": "37"\n}', input_type=str] For further information visit https://errors.pydantic.dev/2.4/v/json_invalid joelkronander@MacBook-Pro-5 swissknife %
Describe the solution you'd like
Maybe one could handle cases like this with some form of "pre-validator" that could, for example, base64-encode those non-JSON-compatible strings? Not sure how it would fit in exactly.
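Another hedged option, since the failure happens during JSON parsing itself (before any pydantic validator runs): parse leniently with the standard library, then validate the resulting dict as usual. A sketch:

    import json

    def lenient_validate(model_cls, raw: str):
        # strict=False lets json accept bare control characters inside strings
        data = json.loads(raw, strict=False)
        return model_cls.model_validate(data)

    # e.g. report = lenient_validate(RustCode, raw_completion)  # RustCode as in the traceback above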
Additional context
Instructor is nice.
We should add some automation in GitHub Actions to run 'mkdocs gh-deploy'; I want it to run whenever we tag a new version.
Describe the bug
When using instructor, some inputs will raise a JSON parsing error.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A way to fix the bug
Desktop (please complete the following information):
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core
Is your feature request related to a problem? Please describe.
The recent -instruct models are instruction-tuned rather than dialogue-tuned and should be very useful for most use cases of this library.
class UserDetail(BaseModel):
    name: str
    age: int

user: UserDetail = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    response_model=UserDetail,
    messages=[
        {"role": "user", "content": "Extract Jason is 25 years old"},
    ],
)
This should work.
Describe the solution you'd like
Patch should also patch the openai.Completion.create method.
Describe the bug
Is there a reason why Typer version ^0.4.0 is used while the latest version is 0.9.0?
It might conflict with other packages that require a more recent version of Typer.
So we can get started on building our examples and dsl docs.
In dsl/completion.py, shouldn't create return completion?

def create(self):
    """
    Create a chat response from the OpenAI API

    Returns:
        response (OpenAISchema): The response from the OpenAI API
    """
    kwargs = self.kwargs
    completion = openai.ChatCompletion.create(**kwargs)
    if self.function:
        return self.function.from_response(completion)
    return completion  # <-- the suggested addition
Suggestion:

parameters["required"] = sorted(k for k, v in parameters.get("properties", {}).items() if "default" not in v)

instead of

parameters["required"] = sorted(parameters["properties"])

That would allow us to:

data: Any = Field(None, description="Optional data attached")
Describe the bug
Link to examples in README is currently broken.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Links to https://jxnl.github.io/instructor/examples/
I think the 24 game or a crossword example would be awesome.