gkamradt / langchain-tutorials

Overview and tutorial of the LangChain Library

Jupyter Notebook 94.25% Python 5.29% Shell 0.46%

langchain-tutorials's Introduction

Learn LangChain

Overview, Tutorial, and Examples of LangChain

See the accompanying tutorials on YouTube

If you want to get updated when new tutorials are out, get them delivered to your inbox

If you're new to Jupyter Notebooks or Colab, check out this video

New To LangChain?

Recommended Learning Path:

LangChain CookBook Part 1: 7 Core Concepts - Code, Video
LangChain CookBook Part 2: 9 Use Cases - Code, Video
Explore the projects below and jump into the deep dives

Prompt Engineering (my favorite resources):

Prompt Engineering Overview by Elvis Saravia
ChatGPT Prompt Engineering for Developers - Prompt engineering basics straight from OpenAI
Brex's Prompt Engineering Guide

🤖 Project Gallery

🐇 Beginner = Entry level projects to practice LangChain

🐒 Intermediate = In depth use of LangChain

🦈 Advanced = Advanced or custom implementations of LangChain

📝 Summarization - Deep Dive: Code, Video

Project	Contact	Difficulty	Open Sourced?	Notes
SummarizePaper.com	Quentin Kral	🐒 Intermediate	✅ Code	Summarize arXiv papers

❓ Question and Answering Over Documents

Project	Contact	Difficulty	Open Sourced?	Notes
ChatPDF	Ashish Talati	🐒 Intermediate	✅ Code	Chat and Ask on your own data

📦 Extraction

Project	Contact	Difficulty	Open Sourced?	Notes
Kor	Eugene Yurtsev	🐒 Intermediate	✅ Code	This is a half-baked prototype that “helps” you extract structured data from text using large language models (LLMs) 🧩.
OpeningAttributes	@gregkamradt	🐇 Beginner	✅ Code	Extract technologies & tools from job descriptions

🔍 Evaluation

Project	Contact	Difficulty	Open Sourced?	Notes
Auto-Evaluator	@RLanceMartin	🦈 Advanced	✅ Code	Evaluate Q&A Chains

📊 Querying Tabular Data

Project	Contact	Difficulty	Open Sourced?	Notes
TBD

💻 Code Understanding

Project	Contact	Difficulty	Open Sourced?	Notes
TBD

🌐 Interacting with APIs

Project	Contact	Difficulty	Open Sourced?	Notes
TBD

💬 Chatbots

Project	Contact	Difficulty	Open Sourced?	Notes
LangChain ChatBot	David Peterson	🐒 Intermediate	✅ Code	Input your PDF documents and analyze, ask questions, or do calculations on the data.

🤖 Agents

Project	Contact	Difficulty	Open Sourced?	Notes
Agents Via Vocode	@vocode	🐒 Intermediate	✅ Code	Agents making phone calls to order pizza
NexusGPT	@achammah1	🐒 Intermediate		AI Freelancer Platform. Discord

👽 Other 👽

Project	Contact	Difficulty	Open Sourced?	Notes
Slack-GPT	@martinseanhunt	🐒 Intermediate	✅ Code	A simple starter for a Slack app / chatbot that uses the Bolt.js Slack app framework, Langchain, openAI and a Pinecone vectorstore to provide LLM generated answers to user questions based on a custom data set.

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether in the form of updated code, better documentation, or projects to feature.

Submit a PR with notes.

This repo and series is provided by DataIndependent and run by Greg Kamradt

langchain-tutorials's Issues

Pinecone now closed to free users - can you show how to use this with an alternative system?

Pinecone just moved its free tier to a waiting list (and the next level is $70/month). Can someone explain how to get this working with an alternative to Pinecone?

Running the level 3 map_reduce code from 5 Levels Of Summarization raises an error

When I run the level 3 map_reduce code, I get an error like:

ValueError: OpenAIChat currently only supports single prompt, got

I think it comes from this code:

output = summary_chain.run(docs)

I have searched the internet and still have not found a solution. How can I solve this?

my local environment:

Python 3.9.5 (default, May 18 2021, 12:31:01)
langchain 0.0.167
openai 0.27.6

Embed_with_retry in 4.0 seconds as it raised RateLimitError following "Ask A Book Questions.ipynb"

Running the exact same code as "Ask A Book Questions.ipynb", I ran into the following error:

Diagnosis (they all return the above error):

I topped up my OpenAI account for $20. The request didn't charge a single dollar from my account. I guess it's not my account's problem.
I reduced the size of the PDF to one sentence "This is a stupidly small file". I guess it's not due to the size of the texts.

What is the problem?

TypeError: issubclass() arg 1 must be a class

I tried to run Cookbook Part 1 and got this error: TypeError: issubclass() arg 1 must be a class

Any idea how to solve this?

Ask question that can retrieve answer from multiple chunks

Suppose I have multiple chunks and want to build an application where answering a question requires fetching across several of them. For example, I have detailed experience reports of a trek from 100 people, and I want to query how many of them went prepared with a first aid kit and how many needed to use it. What type of chunking and retrieval is most appropriate for this?

Add 'pip install' step to top of notebooks

It would be helpful to explicitly add a !pip install langchain openai cell to the top of the LangChain Cookbooks. Otherwise, users have to play whack-a-mole with installing some packages as they work down the notebook.

'UnstructuredPDFLoader' is not defined

Hello,
When I run the tutorial, either on Colab or locally, I always get the following error:
NameError: name 'UnstructuredPDFLoader' is not defined
even if I install all packages as shown on https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/unstructured_file.html

Code Understanding Use case, running in limit of max tokens

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4114 tokens (3858 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

Not sure how to reduce the max_tokens, or prompt size.
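
The figures in the error message can be checked directly: the prompt tokens plus the tokens reserved for the completion must fit within the model's context length. A minimal sketch of that arithmetic follows. The helper name `completion_budget` is hypothetical, and the numbers are the ones quoted in the error above.

```python
# Hypothetical helper: how many completion tokens are left once the prompt
# has been counted against the model's context window.
def completion_budget(context_limit: int, prompt_tokens: int) -> int:
    """Return the number of tokens still available for the completion."""
    return context_limit - prompt_tokens


context_limit = 4097         # maximum context length quoted in the error
prompt_tokens = 3858         # tokens in the prompt
requested_completion = 256   # tokens reserved for the completion

requested_total = prompt_tokens + requested_completion
overflow = requested_total - context_limit
remaining = completion_budget(context_limit, prompt_tokens)

print(requested_total)  # 4114
print(overflow)         # 17
print(remaining)        # 239
```

With these figures, either trimming at least 17 tokens from the prompt or capping the completion at 239 tokens or fewer would fit the request inside the limit. In LangChain's OpenAI wrapper the completion allowance is the `max_tokens` argument; shortening the prompt usually means passing smaller or fewer chunks.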

Langchain Cookbook Part 1: The VectorStore object not used in the VectorStores section

Thanks for the cookbook. Pretty insightful.

In the VectorStores section (under Indexes), the text embeddings are created using
embeddings.embed_documents()
but the FAISS vectorstore class is imported and never used, for example as:
db = FAISS.from_documents(texts, embeddings)

Maybe the section should include creating the vectorstore and using it.

ValidationError: 1 validation error for SQLDatabaseToolkit

I got this error. The version of langchain is 0.0.169. I want to know how to fix this error.

ValidationError Traceback (most recent call last)
Cell In[2], line 3
1 # db = SQLDatabase.from_uri("sqlite:///../../../../notebooks/Chinook.db")
2 db = SQLDatabase.from_uri("sqlite:///../../notebooks/Chinook.db")
----> 3 toolkit = SQLDatabaseToolkit(db=db)
4 agent_executor = create_sql_agent(
5 llm = OpenAI(temperature=0, openai_api_key="sk-xxx"),
6 toolkit=toolkit,
7 verbose=True
8 )

File ~/miniconda3/envs/aigc/lib/python3.10/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for SQLDatabaseToolkit
llm
field required (type=value_error.missing)

Mongo Loader

Hi. I have a quick question. Do you have an example using the SimpleMongo DB loader? Have you tested this connector?
I am having difficulty connecting to MongoDB with a username and password, since the documentation uses only host and port. Also, can I use the HTML reader to read a website and use a LangChain agent to store the content locally?

How to control the length of final summary using map_reduce?

Sometimes I find the final summary is too short. How can I make it longer?
Thanks!

AttributeError: 'tuple' object has no attribute 'page_content' when running a `load_summarize_chain` on a Document generated from PyPDFLoader

Code:

loader_book = PyPDFLoader("D:/PaperPal/langchain-tutorials/data/The Attention Merchants_ The Epic Scramble to Get Inside Our Heads ( PDFDrive ) (1).pdf")
test = loader_book.load()
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(test[0])

I get the following error even when the test[0] is a Document object

> Entering new MapReduceDocumentsChain chain...
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
d:\PaperPal\langchain-tutorials\chains\Chain Types.ipynb Cell 19 in ()
----> 1 chain.run(test[0])

File c:\Users\mail2\anaconda3\lib\site-packages\langchain\chains\base.py:213, in Chain.run(self, *args, **kwargs)
    211     if len(args) != 1:
    212         raise ValueError("`run` supports only one positional argument.")
--> 213     return self(args[0])[self.output_keys[0]]
    215 if kwargs and not args:
    216     return self(kwargs)[self.output_keys[0]]

File c:\Users\mail2\anaconda3\lib\site-packages\langchain\chains\base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
    114 except (KeyboardInterrupt, Exception) as e:
    115     self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116     raise e
    117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
    118 return self.prep_outputs(inputs, outputs, return_only_outputs)

File c:\Users\mail2\anaconda3\lib\site-packages\langchain\chains\base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
    107 self.callback_manager.on_chain_start(
    108     {"name": self.__class__.__name__},
    109     inputs,
    110     verbose=self.verbose,
    111 )
...
--> 141         [{**{self.document_variable_name: d.page_content}, **kwargs} for d in docs]
    142     )
    143     return self._process_results(results, docs, token_max, **kwargs)

AttributeError: 'tuple' object has no attribute 'page_content'
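
One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the chain iterates over whatever it receives (`for d in docs`), and a Document in this LangChain version is a pydantic model, which yields `(field, value)` tuples when iterated. Passing a single Document instead of a list would then hand the chain tuples that have no `page_content`. The sketch below imitates that behaviour with a stand-in class so it runs without any dependencies. `FakeDocument` and `collect_contents` are hypothetical names, not LangChain's.

```python
# Stand-in for a pydantic-style Document. FakeDocument is hypothetical and
# exists only to illustrate the iteration behaviour.
class FakeDocument:
    def __init__(self, page_content, metadata=None):
        self.page_content = page_content
        self.metadata = metadata or {}

    def __iter__(self):
        # Mimics pydantic v1 BaseModel.__iter__, which yields (name, value) pairs.
        yield ("page_content", self.page_content)
        yield ("metadata", self.metadata)


def collect_contents(docs):
    """Mirrors the failing comprehension: reads d.page_content for each d in docs."""
    return [d.page_content for d in docs]


doc = FakeDocument("first page text")

error_message = None
try:
    collect_contents(doc)            # single document: iteration yields tuples
except AttributeError as err:
    error_message = str(err)

contents = collect_contents([doc])   # list of documents: works as intended

print(error_message)
print(contents)
```

If this is indeed the cause, calling `chain.run([test[0]])` or `chain.run(test)` would give the chain the list of documents it expects.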

NameError: name 'chain' is not defined

Hello! I am receiving this NameError after this line: chain.run(input_documents=docs, question=query)

NameError Traceback (most recent call last)
Cell In[32], line 1
----> 1 chain.run(input_documents=docs, question=query)

NameError: name 'chain' is not defined

In GPT4 I get this answer:

The NameError indicates that the interpreter is unable to find a defined variable or function named chain. This error occurs when the name is not defined in the current scope, or there is a typo in the name.

To fix the error, you need to ensure that the variable chain is defined in the current scope. Check to see if you have defined chain earlier in the code or in a different module that you may have forgotten to import.

In some cases, running the cells from the beginning after selecting "Restart & Clear Output" may resolve the issue [1]. It may also be helpful to review the documentation for the package or module you are using to ensure that you are using it correctly. Checking for any typos in the variable name may also help resolve the issue.

Overall, the NameError indicates that the interpreter is unable to find a defined variable or function. To fix the error, ensure that the name is defined in the current scope or imported from another module, and check for any typos in the name.

Question on Pinecone index in 'Ask A Book Questions.ipynb'

Hey Gregory,

Thank you for the great series on YouTube. I have a question regarding the notebook 'Ask A Book Questions.ipynb' that you used to demonstrate querying some custom knowledge from PDF files.

In the 11th cell, you used a code to load the vectors into Pinecone:

docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)

Subsequently, you used docsearch again in your query:

query = "What are examples of good data science teams?"
docs = docsearch.similarity_search(query, include_metadata=True)

My question is would this be using the index from Pinecone? In your example here, you've loaded the vector into Pinecone earlier so the data is already in docsearch but for a use case where you would want to read the index directly without loading any documents from Pinecone, would you use from_existing_index instead? E.g.:

docsearch = Pinecone.from_existing_index(pinecone_index_name, embeddings)

UnstructuredPDFLoader zipfile.BadZipFile: File is not a zip file

Hi there, I was trying the Ask A Book Questions tutorial, but I got stuck on the third line:
data = loader.load()
Do you have any idea why it says my document is not a zip file? It is actually loading a PDF.
Here is the stack trace:

Traceback (most recent call last):
  File "/Users/serena/Documents/langchain-tutorials/data_generation/chatPDF.py", line 5, in <module>
    data = loader.load()
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/document_loaders/unstructured.py", line 61, in load
    elements = self._get_elements()
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/document_loaders/pdf.py", line 27, in _get_elements
    from unstructured.partition.pdf import partition_pdf
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/partition/pdf.py", line 19, in <module>
    from unstructured.partition.text import partition_text
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/partition/text.py", line 16, in <module>
    from unstructured.partition.text_type import (
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/partition/text_type.py", line 21, in <module>
    from unstructured.nlp.tokenize import pos_tag, sent_tokenize, word_tokenize
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/nlp/tokenize.py", line 32, in <module>
    _download_nltk_package_if_not_present(package_name, package_category)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/nlp/tokenize.py", line 21, in _download_nltk_package_if_not_present
    nltk.find(f"{package_category}/{package_name}")
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 555, in find
    return find(modified_name, paths)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 542, in find
    return ZipFilePathPointer(p, zipentry)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/compat.py", line 41, in _decorator
    return init_func(*args, **kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 394, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/compat.py", line 41, in _decorator
    return init_func(*args, **kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 935, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Unclear input and passed variables

Hi, I am going through your email tutorial and I like it a lot. However, one thing remains unclear and is left without any comment. Perhaps it would be good to clarify?

The input variables are 'input_documents', 'company', etc., but the map template uses 'text' and the reduce template also uses 'text'.

Which text is it? Is it the same value in both? I guess not, but it is not mentioned, and this is the only part of your code that leaves me with questions.

Is it recognized by position?

Which Python version do these programs run on?

It would be nice to lock the versions in the requirements file, e.g. via pip freeze, because LangChain is changing so much and sometimes breaks.

NewConnectionError

MaxRetryError: HTTPSConnectionPool(host='controller.pinecone_api_env.pinecone.io', port=443): Max retries exceeded with url: /databases (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe2b3ed8250>: Failed to establish a new connection: [Errno -2] Name or service not known'))

Ask a Book Questions - errors in Pinecone integration

https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/Ask%20A%20Book%20Questions.ipynb

The notebook is out of sync with the current version of Pinecone. Here are some thoughts:

from langchain.vectorstores import Chroma, Pinecone: I think it's better to install langchain_pinecone and use from langchain_pinecone import PineconeVectorStore: https://python.langchain.com/docs/integrations/vectorstores/pinecone
Using the latest Pinecone client installed by pip install pinecone-client, you'll need to change import pinecone to from pinecone import Pinecone. When creating an index, you'll also need to import the ServerlessSpec or PodSpec class.
With an up-to-date client, initialization is a little different and doesn't include environment. Instead of:

pinecone.init(
     api_key=PINECONE_API_KEY,  # find at app.pinecone.io
     environment=PINECONE_API_ENV  # next to api key in console
)

you use:

pc = Pinecone(api_key=PINECONE_API_KEY)

Also, the notebook seems to assume a Pinecone index already exists. If you want to point people at guidance on creating indexes, you could use https://docs.pinecone.io/docs/manage-indexes#create-a-serverless-index.

How to report a mistake in documentation?

Hello lang-chain devs,

How do I report a mistake in documentation? I was not sure if this is the right forum for pointing this out.

The sentence in documentation should probably read:

A text embedding model takes a piece of text as input and returns a numerical representation of that text in the form of a list of floats.

The original documentation is missing "returns a".

fix spelling mistake in README.md

Feature Request: Add a source/citation

I know we can print the documents that match the question via Pinecone, but it would be great to be able to print a citation or source that was used to determine the final answer if this can be added as a feature? Great work btw.

How to use map_reduce in RetrievalQA chain

I want to use the RetrievalQA chain for question answering over a PDF. However, I don't know why my response is always cut off, even though the total token count is only about 1800 or 2500.

So I want to use map_reduce, but it gives me an error.

My code is:
`
top_matches = vector_db.similarity_search_with_score(query=question, k=int(top_k))
top_matches_contexts = " "
print("top_matches",top_matches)

print("top_k",top_k)
for i, val in enumerate(top_matches):
    top_matches_contexts += "{}、{}\n".format(i+1, val[0].page_content)
if top_matches_contexts == []:
    return "Please describe your question in detail"

top_matches_contexts = remove_spaces_and_newlines(top_matches_contexts)



global language
query = prompt.format(query=question, reference=top_matches_contexts,key_word=key_word,language=language)


# chat_history,top_matches_contexts = deal_total_token(chat_history,query)


qa_chain = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0.0,openai_api_key="XXXXX"), chain_type="map_reduce",
                                       retriever=vector_db.as_retriever())

result = qa_chain({"query": query,"chat_history": chat_history})
count_tokens(qa_chain,query)

predict_and_parse is deprecated

chain = create_extraction_chain(llm, schema, encoder_or_encoder_class='json')
output = chain.predict_and_parse(text="please add 15 more units sold to 2023")['data']

printOutput(output)

Running this code block throws TypeError: initial_value must be str or None, not dict.

I was able to get output in json format using:

chain.predict(text=text)["data"]

Could not get the answer 61, the largest prime number that is smaller than their age

When I run the notebook: https://github.com/gkamradt/langchain-tutorials/blob/main/getting_started/Quickstart%20Guide.ipynb

Chains: Combine LLMs and prompts in multi-step workflows

`
#!pip install google-search-results
import os

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI

# Load the model
llm = OpenAI(temperature=0)

# Load in some tools to use
os.environ["SERPAPI_API_KEY"] = ""
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# Finally, let's initialize an agent with:
# 1. The tools
# 2. The language model
# 3. The type of agent we want to use.
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# See list of agents types here

# Now let's test it out!
agent.run("Who is the current leader of Japan? What is the largest prime number that is smaller than their age?")

Here is the output:

Entering new AgentExecutor chain...
I need to find out who the leader of Japan is and then calculate the largest prime number that is smaller than their age.
Action: Search
Action Input: "current leader of Japan"
Observation: Fumio Kishida
Thought: I need to find out the age of the leader of Japan
Action: Search
Action Input: "age of Fumio Kishida"
Observation: 65 years
Thought: I need to calculate the largest prime number that is smaller than 65
Action: Calculator
Action Input: 65

ValueError Traceback (most recent call last)
Cell In[18], line 2
1 # Now let's test it out!
----> 2 agent.run("Who is the current leader of Japan? What is the largest prime number that is smaller than their age?")

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:213, in Chain.run(self, *args, **kwargs)
211 if len(args) != 1:
212 raise ValueError("run supports only one positional argument.")
--> 213 return self(args[0])[self.output_keys[0]]
215 if kwargs and not args:
216 return self(kwargs)[self.output_keys[0]]

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116 raise e
117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
118 return self.prep_outputs(inputs, outputs, return_only_outputs)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
107 self.callback_manager.on_chain_start(
108 {"name": self.__class__.__name__},
109 inputs,
110 verbose=self.verbose,
111 )
112 try:
--> 113 outputs = self._call(inputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/agents/agent.py:792, in AgentExecutor._call(self, inputs)
790 # We now enter the agent loop (until it returns something).
791 while self._should_continue(iterations, time_elapsed):
--> 792 next_step_output = self._take_next_step(
793 name_to_tool_map, color_mapping, inputs, intermediate_steps
794 )
795 if isinstance(next_step_output, AgentFinish):
796 return self._return(next_step_output, intermediate_steps)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/agents/agent.py:695, in AgentExecutor._take_next_step(self, name_to_tool_map, color_mapping, inputs, intermediate_steps)
693 tool_run_kwargs["llm_prefix"] = ""
694 # We then call the tool on the tool input to get an observation
--> 695 observation = tool.run(
696 agent_action.tool_input,
697 verbose=self.verbose,
698 color=color,
699 **tool_run_kwargs,
700 )
701 else:
702 tool_run_kwargs = self.agent.tool_run_logging_kwargs()

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/tools/base.py:107, in BaseTool.run(self, tool_input, verbose, start_color, color, **kwargs)
105 except (Exception, KeyboardInterrupt) as e:
106 self.callback_manager.on_tool_error(e, verbose=verbose_)
--> 107 raise e
108 self.callback_manager.on_tool_end(
109 observation, verbose=verbose_, color=color, name=self.name, **kwargs
110 )
111 return observation

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/tools/base.py:104, in BaseTool.run(self, tool_input, verbose, start_color, color, **kwargs)
102 try:
103 tool_args, tool_kwargs = _to_args_and_kwargs(tool_input)
--> 104 observation = self._run(*tool_args, **tool_kwargs)
105 except (Exception, KeyboardInterrupt) as e:
106 self.callback_manager.on_tool_error(e, verbose=verbose_)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/agents/tools.py:31, in Tool._run(self, *args, **kwargs)
29 def _run(self, *args: Any, **kwargs: Any) -> str:
30 """Use the tool."""
---> 31 return self.func(*args, **kwargs)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/llm_math/base.py:130, in LLMMathChain._call(self, inputs)
126 self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
127 llm_output = llm_executor.predict(
128 question=inputs[self.input_key], stop=["```output"]
129 )
--> 130 return self._process_llm_result(llm_output)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/llm_math/base.py:86, in LLMMathChain._process_llm_result(self, llm_output)
84 answer = "Answer: " + llm_output.split("Answer:")[-1]
85 else:
---> 86 raise ValueError(f"unknown format from LLM: {llm_output}")
87 return {self.output_key: answer}

ValueError: unknown format from LLM: This is not a math problem and cannot be translated into an expression that can be executed using Python's numexpr library.`
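
The trace shows the Calculator tool receiving only `65`, which the math chain rejected as not being a math problem. The expected answer itself can be checked without an LLM. The sketch below is a plain trial-division helper; the function names are illustrative and are not part of LangChain.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small numbers such as ages."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_below(n: int):
    """Return the largest prime strictly smaller than n, or None if there is none."""
    for candidate in range(n - 1, 1, -1):
        if is_prime(candidate):
            return candidate
    return None


answer = largest_prime_below(65)
print(answer)  # 61
```

A function like this could in principle be exposed to the agent as a custom tool, so the result would not depend on the math chain interpreting a bare number. That wiring is left out here.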

Setup "Ask A Book Questions" for Google Colab

Hello,
let me first of all say, you have created a great tutorial on how to create a Q&A engine for any pdf-document based knowledge base. I love it!!!
I set up a Google Colab notebook to replicate your tutorial and came across quite a few issues during the environment setup.
The following screenshot shows all setup tasks needed to make it run successfully on Google Colab. I hope other readers find this useful.

If I already have some mapping data, how do I let the LLM know about it and use it?

This is about the Clean and Standardize Data section. Suppose I already have some mapping data, and the amount exceeds the token limit, so I cannot pass it as prompt examples. Is there a way to let the LLM know about this data and use it?

TextSplitter in different languages

https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/5%20Levels%20Of%20Summarization%20-%20Novice%20To%20Expert.ipynb

For summarization methods above level 3, the best practice is to use TokenTextSplitter rather than RecursiveCharacterTextSplitter, because the number of tokens in a string of a given character length varies greatly from language to language.

text_splitter_by_char = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=10000, chunk_overlap=500)
text_splitter_by_token = TokenTextSplitter(chunk_size=3000, chunk_overlap=100)

If this is not taken into account, errors exceeding the max token count are likely to occur when processing text in multiple languages.

I have tested the number of tokens used for the same family of patents, in different languages:

English (US10901237B2)=21823 (100%)
Simplified Chinese (CN112904591A)=30901 (142%)
Traditional Chinese (TW201940135A)=36530 (167%)
Korean (KR20190089752A)=42644 (195%)
Japanese (JP2019128599A)=51430 (236%)
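
The percentages above are each language's token count relative to the English count. A small sketch that reproduces them from the reported figures follows. It does not call a tokenizer; the counts are taken as given.

```python
# Token counts reported above for the same patent family in five languages.
token_counts = {
    "English (US10901237B2)": 21823,
    "Simplified Chinese (CN112904591A)": 30901,
    "Traditional Chinese (TW201940135A)": 36530,
    "Korean (KR20190089752A)": 42644,
    "Japanese (JP2019128599A)": 51430,
}

baseline = token_counts["English (US10901237B2)"]

# Each count as a percentage of the English count, rounded to a whole percent.
relative = {
    name: round(100 * count / baseline)
    for name, count in token_counts.items()
}

for name, pct in relative.items():
    print(f"{name}: {pct}%")
```

By these figures the same document costs more than twice as many tokens in Japanese as in English, which is why a splitter that counts tokens gives more predictable chunk sizes across languages than one that counts characters.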

Minor markdown change in Twitter Reply Bot ipynb file

Additional questions on the summarisation tutorial

Hey there

Thanks for putting this together. I had the same conclusion regarding the summarisation of a large document, in terms of splitting, then embedding, and then ranking the sections and choosing the most relevant for a map_reduce.

However, I've been scouring the net and racking my brains to find a splitter that would work according to theme (eg. keyword density) or being able to identify chapter/section breaks without having to pre-define what the markup would look like.

Is there a Python tool or form of analysis that can segment a text document into smaller parts more intelligently than a character-length breakpoint?

Thanks :)

The class CallbackManager has moved from langchain.callbacks.base to langchain.callbacks.manager

When I ran the code for 'With Streaming' in ChatAPI + LangChain Basics.ipynb, I encountered an error: 'cannot import name 'CallbackManager' from 'langchain.callbacks.base'.'
Upon further investigation in the LangChain documentation, I discovered that the package containing CallbackManager has been modified.

ValidationError: 1 validation error for LLMChain

I have had this model working and it is great, but now I'm getting different error messages.

ValidationError Traceback (most recent call last)
Cell In[72], line 2
1 llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
----> 2 chain = load_qa_chain(llm, chain_type="stuff")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py:218, in load_qa_chain(llm, chain_type, verbose, callback_manager, **kwargs)
213 if chain_type not in loader_mapping:
214 raise ValueError(
215 f"Got unsupported chain type: {chain_type}. "
216 f"Should be one of {loader_mapping.keys()}"
217 )
--> 218 return loader_mapping[chain_type](
219 llm, verbose=verbose, callback_manager=callback_manager, **kwargs
220 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py:63, in _load_stuff_chain(llm, prompt, document_variable_name, verbose, callback_manager, **kwargs)
54 def _load_stuff_chain(
55 llm: BaseLanguageModel,
56 prompt: Optional[BasePromptTemplate] = None,
(...)
60 **kwargs: Any,
61 ) -> StuffDocumentsChain:
62 _prompt = prompt or stuff_prompt.PROMPT_SELECTOR.get_prompt(llm)
---> 63 llm_chain = LLMChain(
64 llm=llm, prompt=prompt, verbose=verbose, callback_manager=callback_manager
65 )
66 # TODO: document prompt
67 return StuffDocumentsChain(
68 llm_chain=llm_chain,
69 document_variable_name=document_variable_name,
(...)
72 **kwargs,
73 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for LLMChain
prompt
none is not an allowed value (type=type_error.none.not_allowed)

ChatGPT4 response:

The error message indicates a ValidationError due to an invalid value for the prompt argument in the LLMChain constructor. Specifically, the error message states that None is not an allowed value for prompt. This error can occur if the prompt argument is not properly specified when creating an instance of the LLMChain class.

To fix this error, pass a valid, non-None value for the prompt argument when creating the LLMChain instance. Consult the documentation or source code of the LLMChain class to see which values it accepts.
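
One way to act on that advice is to pass a prompt to load_qa_chain explicitly instead of relying on the default. The sketch below is a possible workaround, not a confirmed fix for the root cause (which may be a version mismatch). It assumes the installed LangChain exposes `PromptTemplate` in `langchain.prompts` and that the "stuff" chain expects the variables `context` and `question`; the template text and the helper names `template_variables` and `build_qa_chain` are hypothetical.

```python
from string import Formatter

# Hypothetical template text; the variable names `context` and `question`
# are assumed to match what the "stuff" chain expects.
QA_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def template_variables(template):
    """List the placeholder names that appear in a format-style template."""
    return sorted({field for _, field, _, _ in Formatter().parse(template) if field})


def build_qa_chain(llm):
    """Build a "stuff" QA chain with an explicit prompt, so `prompt` is never None."""
    # Imported lazily so the rest of this sketch runs without LangChain installed.
    from langchain.chains.question_answering import load_qa_chain
    from langchain.prompts import PromptTemplate

    prompt = PromptTemplate(
        template=QA_TEMPLATE,
        input_variables=template_variables(QA_TEMPLATE),
    )
    return load_qa_chain(llm, chain_type="stuff", prompt=prompt)
```

In the failing cell, `chain = build_qa_chain(llm)` would replace the `load_qa_chain(llm, chain_type="stuff")` call.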

SSL Error in example Ask A Book Questions

Hi there, thanks for solving my issue with loading PDFs. I ran into another issue and suspect it is related to the version of some Python package.

I am trying the Ask A Book Questions tutorial and get the error below when executing this line: docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)

Traceback (most recent call last):
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
    conn.connect()
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connection.py", line 362, in connect
    self.sock = ssl_wrap_socket(
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/util/ssl_.py", line 386, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: UNEXPECTED_RECORD] unexpected record (_ssl.c:1129)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by SSLError(SSLError(1, '[SSL: UNEXPECTED_RECORD] unexpected record (_ssl.c:1129)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/serena/Documents/langchain-tutorials/data_generation/chatPDF.py", line 33, in <module>
    docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/vectorstores/pinecone.py", line 235, in from_texts
    embeds = embedding.embed_documents(lines_batch)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/embeddings/openai.py", line 269, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=self.deployment)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/embeddings/openai.py", line 188, in _get_len_safe_embeddings
    encoding = tiktoken.model.encoding_for_model(self.model)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/model.py", line 75, in encoding_for_model
    return get_encoding(encoding_name)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/registry.py", line 63, in get_encoding
    enc = Encoding(**constructor())
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken_ext/openai_public.py", line 64, in cl100k_base
    mergeable_ranks = load_tiktoken_bpe(
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/load.py", line 114, in load_tiktoken_bpe
    contents = read_file_cached(tiktoken_bpe_file)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/load.py", line 46, in read_file_cached
    contents = read_file(blobpath)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/load.py", line 24, in read_file
    return requests.get(blobpath).content
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/adapters.py", line 563, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by SSLError(SSLError(1, '[SSL: UNEXPECTED_RECORD] unexpected record (_ssl.c:1129)')))

Appreciate your help in advance!
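
The traceback shows the failure happening while tiktoken downloads cl100k_base.tiktoken from openaipublic.blob.core.windows.net, before any request reaches Pinecone, which suggests a problem on the network path (for example a proxy or VPN) and not in the tutorial code. One possible workaround is to download that file over a connection that works and place it in tiktoken's cache. The sketch below only computes where the file would go. It assumes the installed tiktoken honors the `TIKTOKEN_CACHE_DIR` environment variable and names cached files by the SHA-1 hex digest of the URL; the cache directory and the helper name `cached_blob_path` are hypothetical.

```python
import hashlib
import os

# URL taken from the traceback above.
BLOB_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def cached_blob_path(cache_dir, url=BLOB_URL):
    """Return where a pre-downloaded copy of `url` is assumed to be looked up."""
    # Assumption: the cache file name is the SHA-1 hex digest of the URL.
    cache_key = hashlib.sha1(url.encode()).hexdigest()
    return os.path.join(cache_dir, cache_key)


# Hypothetical cache location; set this before tiktoken is first used.
cache_dir = os.path.join(os.path.expanduser("~"), ".tiktoken_cache")
os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir
print(cached_blob_path(cache_dir))
```

Saving the downloaded file under the printed path should let the encoding load without a network request, if the assumptions above hold for the installed version.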

ValidationError: 1 validation error for FewShotPromptTemplate example_selector instance of BaseExampleSelector expected (type=type_error.arbitrary_type; expected_arbitrary_type=BaseExampleSelector)

Does anyone know how to fix this error?

ValidationError: 1 validation error for FewShotPromptTemplate
example_selector
instance of BaseExampleSelector expected (type=type_error.arbitrary_type; expected_arbitrary_type=BaseExampleSelector)

Typo in the Cookbook part 1

Your vectorstore store your embeddings (☝️) and make "the" easily searchable
I guess it should be: "Your vectorstore store your embeddings (☝️) and make "them" easily searchable" :)
Thanks