berriai / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)

Home Page: https://docs.litellm.ai/docs/

License: Other

Python 91.60% Dockerfile 0.11% Shell 0.12% HTML 1.89% JavaScript 0.15% CSS 0.01% TypeScript 6.07% Smarty 0.06%
anthropic langchain langchain-python llm llmops openai

litellm's People

Contributors

bufferoverflow, canada4663, clarkbenham, coconut49, dependabot[bot], dragosmc91, geekyayush, ishaan-jaff, krrishdholakia, kylehh, manouchehri, mateocamara, maxdeichmann, mc-marcocheng, n1lanjan, nirga, psu3d0, renalu, samyxdev, shaunmaher, ti3x, toniengelhardt, udit-001, ushuz, vincelwt, vivek-athina, wallies, williamespegren, yujonglee, zakhar-kogan


litellm's Issues

expose error types

Expose error types so people can try/except based on them, instead of needing to import them from openai.

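A minimal sketch of what this could look like, assuming litellm re-exports a provider-agnostic error type (the AuthenticationError name here is an assumption, not a confirmed API):

import litellm

messages = [{"role": "user", "content": "Hey, how's it going?"}]

try:
    response = litellm.completion(model="gpt-3.5-turbo", messages=messages)
except litellm.AuthenticationError as e:  # hypothetical re-exported error type
    print(f"invalid key: {e}")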

return usage for all providers - as OpenAI does

OpenAI returns usage for requests, but other providers - e.g. Cohere - do not.

{
  "id": "chatcmpl-7hPIZUYBst5jQA4odnYgxgZ9iPZ51",
  "object": "chat.completion",
  "created": 1690579707,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm an AI language model, so I don't have feelings, but I'm here to help you with any questions or conversations you have. How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 38,
    "total_tokens": 51
  }
}

Current Cohere return value from litellm - it's missing usage, which I need in order to calculate cost ($):

{'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': cohere.Generation {
	id: 3369b9a7-5755-4a4b-9da6-1ebab7c7924a
	prompt: Hello, how are you?
	text:  I am doing well, thank you. How can I help you today?
	likelihood: None
	finish_reason: None
	token_likelihoods: None
}, 'role': 'assistant'}}]}

Add the ability to set the key based on the model

Usage

litellm.set_key(model)

Reason:
Not everyone sets their keys in OPENAI_API_KEY, e.g. openai.api_key = get_api_key("OPENAI")

litellm should handle setting the appropriate env variables for a given model
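
A minimal sketch of what set_key might do under the hood, assuming a simple model-prefix → env-var mapping; the mapping and the (model, api_key) signature are illustrative:

import os

# illustrative model-prefix -> env-var mapping
MODEL_KEY_ENV = {
    "gpt-": "OPENAI_API_KEY",
    "claude-": "ANTHROPIC_API_KEY",
    "command-": "COHERE_API_KEY",
}

def set_key(model: str, api_key: str) -> None:
    # set the provider env var that matches the model's prefix
    for prefix, env_var in MODEL_KEY_ENV.items():
        if model.startswith(prefix):
            os.environ[env_var] = api_key
            return
    raise ValueError(f"unknown provider for model: {model}")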

Map timeout exceptions to something readable

cc @krrishdholakia since you were working on exceptions

It's really hard to detect if there was a timeout exception when I use litellm.

I want something like this - just an easy way to detect if there was a timeout error.

    except Exception as e:
        print(f"in replicate llama, got error {e}")
        # assumes: from func_timeout import FunctionTimedOut
        # comparing the exception to a string never matches; check the type instead
        if isinstance(e, FunctionTimedOut):
            pass
        else:
            pytest.fail(f"Error occurred: {e}")

Currently I need to go and understand how func_timeout works just to catch the timeout error 😭
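
One possible shape for the fix - wrap func_timeout and re-raise a single readable error type (the Timeout name is an assumption):

from func_timeout import FunctionTimedOut, func_timeout

class Timeout(Exception):
    """Readable, provider-agnostic timeout error; hypothetical name."""

def call_with_timeout(fn, timeout_s, **kwargs):
    # run fn with a hard timeout and surface one catchable error type
    try:
        return func_timeout(timeout_s, fn, kwargs=kwargs)
    except FunctionTimedOut as e:
        raise Timeout(f"LLM call timed out after {timeout_s}s") from e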

update docs on new way of calling llm_client

import main
from main import completion

main.success_callback = ["posthog"]
main.failure_callback = ["posthog", "sentry", "slack"]

response = completion(model="gpt-3.5-turbo", messages=messages)

Guarantee format of exceptions

Each LLM provider has a different exception format, so most builders need to implement their own try/except handling for each provider they use.
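
A sketch of the normalization this implies, assuming the pre-1.0 openai SDK's error classes; the unified RateLimitError type is hypothetical:

import openai

class RateLimitError(Exception):
    """Unified, provider-agnostic error; hypothetical name."""

def map_exception(provider: str, e: Exception):
    # translate each provider's native error into one shared type
    if provider == "openai" and isinstance(e, openai.error.RateLimitError):
        raise RateLimitError(str(e)) from e
    # ...map other providers' errors the same way...
    raise e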

handle max tokens

Anthropic has:

completion = anthropic.completions.create(
    model="claude-2",
    max_tokens_to_sample=300,
    prompt=f"{HUMAN_PROMPT} how does a court case get to the Supreme Court? {AI_PROMPT}",
)

Replicate has:

    output = replicate.run(
      "replicate/flan-t5-xl:eec2f71c986dfa3b7a5d842d22e1130550f015720966bec48beaae059b19ef4c",
      # "replicate/llama-7b:2014ee1247354f2e81c0b3650d71ca715bc1e610189855f134c30ecb841fae21",
      input={
        "prompt": prompt,
        "max_length": 256
      })

I believe they both achieve similar purposes.

Links to docs: Anthropic, Replicate
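
Both parameters cap output length; only the name differs. A sketch of the translation litellm could do (the function and mapping are illustrative):

def translate_max_tokens(provider: str, max_tokens: int) -> dict:
    # map a unified max_tokens argument onto each provider's parameter name
    if provider == "anthropic":
        return {"max_tokens_to_sample": max_tokens}
    if provider == "replicate":
        return {"max_length": max_tokens}
    return {"max_tokens": max_tokens}  # OpenAI-style default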

fix response format

A common way to parse the OpenAI response is

response.choices[0].message.content

This doesn't work with litellm. Only this works:

response['choices'][0]['message']['content']
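
One common way to support both access styles is a dict subclass with attribute access; a minimal, illustrative sketch:

class AttrDict(dict):
    # dict that also allows response.choices[0].message.content style access
    def __getattr__(self, name):
        value = self[name]
        if isinstance(value, dict):
            return AttrDict(value)
        if isinstance(value, list):
            return [AttrDict(v) if isinstance(v, dict) else v for v in value]
        return value

response = AttrDict({"choices": [{"message": {"role": "assistant", "content": "hi"}}]})
assert response.choices[0].message.content == response['choices'][0]['message']['content']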

check valid api key

Someone can use the openai SDK to validate an access key with something like this. This is something we should be able to support:

def verify_access_key(self):
    """
    Verify the access key is valid.

    Returns:
        bool: True if the access key is valid, False otherwise.
    """
    try:
        openai.Model.list()
        return True
    except Exception as exception:
        # log with a placeholder; passing the exception as a bare extra arg is a bug
        logger.info("OpenAI exception: %s", exception)
        return False
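
A litellm-level equivalent could ping the provider with the cheapest possible request; sketch only - check_valid_key is a hypothetical name, and it assumes completion accepts an api_key override:

from litellm import completion

def check_valid_key(model: str, api_key: str) -> bool:
    # an auth failure on a 1-token request means a bad key
    try:
        completion(model=model, api_key=api_key, max_tokens=1,
                   messages=[{"role": "user", "content": "ping"}])
        return True
    except Exception:
        return False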

Integrate with my secret manager

When I use litellm, I don't want to always read in my secrets. It should just check my secret manager for keys.

Magic wand moment:

import litellm

litellm.secret_manager = InfisicalClient(token="good_morning ")

response = litellm.completion(model="gpt-3.5-turbo", messages=messages)
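
Under the hood this implies a lookup order; a sketch assuming Infisical's get_secret method and an environment fallback (the get_secret helper is illustrative):

import os
import litellm

def get_secret(name: str):
    # prefer the configured secret manager, fall back to env vars
    if getattr(litellm, "secret_manager", None) is not None:
        secret = litellm.secret_manager.get_secret(name)  # Infisical SDK call; method name assumed
        if secret:
            return secret
    return os.environ.get(name)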

Build out async functionality

Return native acreate functions instead of a wrapped create function, for the providers that support async calls.
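
A sketch of the resulting user-facing API, assuming an acompletion coroutine (the name is an assumption):

import asyncio
import litellm

async def main():
    # awaits the provider's native async API where one exists
    response = await litellm.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey"}],
    )
    print(response['choices'][0]['message']['content'])

asyncio.run(main())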

Handle added args for non-OpenAI models

For example, if a user makes a call like

response = completion(
    model='command-nightly',
    messages=messages,
    max_tokens=self.config['max_tokens'],
    temperature=self.config['temperature'],
    top_p=self.config['top_p'],
    request_timeout=self.config['request_timeout'],
)

this could potentially fail; ensure it does not. One approach is sketched below.
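
One way to guarantee that: filter kwargs down to what each provider supports before dispatching; the supported-params map below is illustrative:

# illustrative map of which OpenAI-style params each model accepts
SUPPORTED_PARAMS = {
    "command-nightly": {"max_tokens", "temperature", "top_p"},
}

def prune_params(model: str, **kwargs) -> dict:
    # drop kwargs the target provider doesn't understand
    allowed = SUPPORTED_PARAMS.get(model)
    if allowed is None:
        return kwargs
    return {k: v for k, v in kwargs.items() if k in allowed}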

modify additional_details for llm_client

I don't really want to create and maintain separate dictionaries across my code as I'm calling llm_client.completion. Can I not just set identifiers in one place and be done with it?

I like how Sentry does this:

from sentry_sdk import set_user

set_user({"email": "[email protected]"})
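
A litellm analogue could be a module-level attribute set once; purely a sketch - attaching it to every call is the requested behavior, not a confirmed feature:

import litellm

# set identifiers once...
litellm.additional_details = {"email": "[email protected]"}

# ...and every subsequent call could attach them to its logs/callbacks
messages = [{"role": "user", "content": "Hi"}]
response = litellm.completion(model="gpt-3.5-turbo", messages=messages)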

Track pricing per request

I need to map a completion/embedding request to dollars - a simple way to get the cost in $ and write it to my DB.

Currently doing


############### MODEL Cost Mapping ##################
# prices in USD per 1K tokens (OpenAI's published pricing)
input_tokens_cost_map = {
    'gpt-3.5-turbo': 0.0015,
    'gpt-4': 0.03,
    'chatgpt-test': 0.0015,
    'chatgpt-v-2': 0.0015,
}

output_tokens_cost_map = {
    'gpt-3.5-turbo': 0.002,
    'gpt-4': 0.06,
    'chatgpt-test': 0.002,
    'chatgpt-v-2': 0.002,
}
#####################################################

input_text = " ".join([message["content"] for message in messages])
input_tokens = count_tokens(input_text)

response_text = response['choices'][0]['message']['content']
response_tokens = count_tokens(response_text)

input_tokens_cost = input_tokens_cost_map[model]
output_tokens_cost = output_tokens_cost_map[model]

# divide by 1000 since the prices above are per 1K tokens
total_cost = (input_tokens * input_tokens_cost + response_tokens * output_tokens_cost) / 1000
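
What this issue effectively asks litellm for is a one-call mapping from a response to dollars; completion_cost below is a hypothetical helper name:

import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi"}],
)
cost_usd = litellm.completion_cost(response)  # hypothetical helper
print(f"request cost: ${cost_usd:.6f}")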

Do you address token counting?

Not sure if litellm is the right layer for counting tokens. But if we switch to litellm rather than implementing our own support for multiple models, it is not as simple as replacing the completion calls: we also need to count the tokens used by different models, and those models rely on different tokenizers.

In any event, please keep litellm as thin a layer as possible. Don't take this on if you believe it is not the right job for litellm; that would ruin litellm faster than any lack of functionality would. Consider adding some util functions? Happy to discuss.
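
One possible thin util, assuming tiktoken for OpenAI-family models (other providers would need their own tokenizers wired in the same way):

import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    # resolve the model's tokenizer, falling back to cl100k_base
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))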

Note: we are building DevChat, a potential user of litellm. Thanks for your great work!

Simplify llm_client

Critiques of existing approach:

  • maintaining 2 functions with the same args where 1 calls the other is difficult → it leads to mapping errors, etc.
    • since litellm_client.completion is essentially a wrapper around the raw completion function → why not treat it like one?

v1?

  • how do you simplify the function calling but keep the ease of letting me send my data to other places?
import main
from main import completion

main.success_callback = ["posthog"]
main.failure_callback = ["posthog", "sentry", "slack"]

response = completion(model="gpt-3.5-turbo", messages=messages)

Just create a wrapper around the completion and embedding endpoints, which should help simplify the core calling logic
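
A sketch of that wrapper approach - one decorator around the raw completion function that fires the registered callbacks (all names illustrative):

success_callback, failure_callback = [], []

def with_callbacks(fn):
    # wrap the raw call; report the outcome to whichever sinks are registered
    def wrapped(*args, **kwargs):
        try:
            result = fn(*args, **kwargs)
            for cb in success_callback:
                cb(result)
            return result
        except Exception as e:
            for cb in failure_callback:
                cb(e)
            raise
    return wrapped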

Add state handlers / observability

When I call litellm there's always just 2 things I need to know - did the query succeed or fail?

If it succeeds I'll log it to PostHog

If it fails I'll log it to PostHog + Slack

Wonder if litellm can help simplify this for me.

add aispend integration

get the API token from here: https://aispend.io/profile

then call https://aispend.io/api/v1/accounts to get the account ID, then POST to https://aispend.io/api/v1/accounts/YOUR-ACCOUNT-ID/data to push cost data. I've included an example structure in the Postman request:

[
    {
        "requests": 10,
        "requests_context": 10,
        "context_tokens": 102,
        "requests_generated": 10,
        "generated_tokens": 1053,
        "recorded_date": "2023-07-31",
        "model_id": "gpt-3.5-turbo",
        "generated_tokens_cost_usd_cent": 23,
        "context_tokens_cost_usd_cent": 20
    },
    {
        "requests": 20,
        "requests_context": 20,
        "context_tokens": 1234,
        "requests_generated": 10,
        "generated_tokens": 1234,
        "recorded_date": "2023-07-30",
        "model_id": "gpt-3.5-turbo",
        "generated_tokens_cost_usd_cent": 13,
        "context_tokens_cost_usd_cent": 30
    }
]
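
A rough sketch of that flow with requests; the bearer-token auth scheme and the response shape are assumptions - only the URLs come from this issue:

import requests

TOKEN = "..."  # from https://aispend.io/profile
HEADERS = {"Authorization": f"Bearer {TOKEN}"}  # auth scheme is an assumption

accounts = requests.get("https://aispend.io/api/v1/accounts", headers=HEADERS).json()
account_id = accounts[0]["id"]  # response shape is an assumption

payload = [{"requests": 10, "model_id": "gpt-3.5-turbo", "recorded_date": "2023-07-31"}]  # abbreviated from the structure above
requests.post(f"https://aispend.io/api/v1/accounts/{account_id}/data",
              headers=HEADERS, json=payload)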

Add support for replicate [code in ticket]

Here's my code for replicate:

elif "replicate" in model: 
    prompt = " ".join([message["content"] for message in messages])
    output = replicate.run(
      model,
      input={
        "prompt": prompt,
      })
    print(f"output: {output}")
    response = ""
    for item in output: 
      print(f"item: {item}")
      response += item
    new_response = {
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "message": {
              "content": response,
              "role": "assistant"
          }
        }
      ]
    }
    print(f"new response: {new_response}")
    response = new_response
