langchain-tutorials's Introduction

Learn LangChain

Overview, Tutorial, and Examples of LangChain

See the accompanying tutorials on YouTube

If you want to get updated when new tutorials are out, get them delivered to your inbox

If you're new to Jupyter Notebooks or Colab, check out this video

New To LangChain?

Recommended Learning Path:

  1. LangChain CookBook Part 1: 7 Core Concepts - Code, Video
  2. LangChain CookBook Part 2: 9 Use Cases - Code, Video
  3. Explore the projects below and jump into the deep dives

Prompt Engineering (my favorite resources):

  1. Prompt Engineering Overview by Elvis Saravia
  2. ChatGPT Prompt Engineering for Developers - Prompt engineering basics straight from OpenAI
  3. Brex's Prompt Engineering Guide

🤖 Project Gallery

🐇 Beginner = Entry level projects to practice LangChain

🐒 Intermediate = In depth use of LangChain

🦈 Advanced = Advanced or custom implementations of LangChain

๐Ÿ“ Summarization - Deep Dive: Code, Video

Project Contact Difficulty Open Sourced? Notes
SummarizePaper.com Quentin Kral ๐Ÿ’ Intermediate โœ… Code Summarize arXiv papers

โ“ Question and Answering Over Documents

Project Contact Difficulty Open Sourced? Notes
ChatPDF Ashish Talati ๐Ÿ’ Intermediate โœ… Code Chat and Ask on your own data

📦 Extraction

| Project | Contact | Difficulty | Open Sourced? | Notes |
| --- | --- | --- | --- | --- |
| Kor | Eugene Yurtsev | 🐒 Intermediate | ✅ Code | This is a half-baked prototype that "helps" you extract structured data from text using large language models (LLMs) 🧩. |
| OpeningAttributes | @gregkamradt | 🐇 Beginner | ✅ Code | Extract technologies & tools from job descriptions |

๐Ÿ” Evaluation

Project Contact Difficulty Open Sourced? Notes
Auto-Evaluator @RLanceMartin ๐Ÿฆˆ Advanced โœ… Code Evaluate Q&A Chains

📊 Querying Tabular Data

| Project | Contact | Difficulty | Open Sourced? | Notes |
| --- | --- | --- | --- | --- |
| TBD | | | | |

💻 Code Understanding

| Project | Contact | Difficulty | Open Sourced? | Notes |
| --- | --- | --- | --- | --- |
| TBD | | | | |

๐ŸŒ Interacting with APIs

Project Contact Difficulty Open Sourced? Notes
TBD

💬 Chatbots

| Project | Contact | Difficulty | Open Sourced? | Notes |
| --- | --- | --- | --- | --- |
| LangChain ChatBot | David Peterson | 🐒 Intermediate | ✅ Code | Input your PDF documents and analyze, ask questions, or do calculations on the data. |

🤖 Agents

| Project | Contact | Difficulty | Open Sourced? | Notes |
| --- | --- | --- | --- | --- |
| Agents Via Vocode | @vocode | 🐒 Intermediate | ✅ Code | Agents making phone calls to order pizza |
| NexusGPT | @achammah1 | 🐒 Intermediate | | AI Freelancer Platform. Discord |

👽 Other 👽

| Project | Contact | Difficulty | Open Sourced? | Notes |
| --- | --- | --- | --- | --- |
| Slack-GPT | @martinseanhunt | 🐒 Intermediate | ✅ Code | A simple starter for a Slack app / chatbot that uses the Bolt.js Slack app framework, LangChain, OpenAI and a Pinecone vectorstore to provide LLM-generated answers to user questions based on a custom data set. |

๐Ÿ’ Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of updating code, better documentation, or project to feature.

Submit a PR with notes.

This repo and series is provided by DataIndependent and run by Greg Kamradt

langchain-tutorials's People

Contributors

adithyan777, eltociear, franz101, gkamradt, ipsorakis, jakolo121, jeremypeters, jpigla, mrwadams, petergoldstein, rjain15, sachelsout, smargoli2, vackuzn, yidahu

langchain-tutorials's Issues

Run 5 levels of summarization-level 3 mapreduce code got an error

When I run the level 3 map reduce code, I get an error like:

ValueError: OpenAIChat currently only supports single prompt, got 

I think it comes from this code:

output = summary_chain.run(docs)

I have searched the internet and still have not found a solution. How can I solve this?

my local environment:

Python 3.9.5 (default, May 18 2021, 12:31:01)
langchain 0.0.167
openai 0.27.6

Embed_with_retry in 4.0 seconds as it raised RateLimitError following "Ask A Book Questions.ipynb"

Running the exact same code as "Ask A Book Questions.ipynb", I ran into the following error:

[image: screenshot of the error]

Diagnosis (they all return the above error):

  1. I topped up my OpenAI account for $20. The request didn't charge a single dollar from my account. I guess it's not my account's problem.
  2. I reduced the size of the PDF to one sentence "This is a stupidly small file". I guess it's not due to the size of the texts.

What is the problem?

Ask question that can retrieve answer from multiple chunks

Suppose I have multiple chunks and I want to build an application where I ask questions that require fetching across multiple chunks. For example, I have detailed experience reports of a trek from 100 people, and I want to query how many of them went prepared with a first aid kit and how many of them needed to use it. What type of chunking and retrieval is most appropriate for this?

Add 'pip install' step to top of notebooks

It would be helpful to explicitly add a !pip install langchain openai cell to the top of the LangChain Cookbooks. Otherwise, users have to play whack-a-mole with installing some packages as they work down the notebook.

Code Understanding Use case, running in limit of max tokens

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4114 tokens (3858 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

Not sure how to reduce the max_tokens, or prompt size.
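
The figures in the error message can be checked directly: the prompt plus the tokens reserved for the completion must fit inside the model's context window. Here is a minimal sketch of that arithmetic, using only the numbers from the message above (the variable names are illustrative):

```python
# Figures taken from the error message above.
CONTEXT_LIMIT = 4097       # model's maximum context length, in tokens
prompt_tokens = 3858       # tokens in the prompt
completion_tokens = 256    # tokens reserved for the completion (max_tokens)

requested = prompt_tokens + completion_tokens
overshoot = requested - CONTEXT_LIMIT

# Largest prompt that still fits if the completion budget stays at 256.
max_prompt_tokens = CONTEXT_LIMIT - completion_tokens
# Largest completion that still fits if the prompt stays at 3858 tokens.
max_completion_tokens = CONTEXT_LIMIT - prompt_tokens

print(requested, overshoot, max_prompt_tokens, max_completion_tokens)
```

So the request is 17 tokens over the limit. Either trimming the prompt to at most 3841 tokens or lowering the completion budget to at most 239 tokens would bring it back within range.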

Langchain Cookbook Part 1: The VectorStore object not used in the VectorStores section

Thanks for the cookbook. Pretty insightful.

In the section for VectorStores (under Indexes), the embeddings of the text are created using
embeddings.embed_documents()
but the vectorstore class (FAISS) is imported and never used, for example as:
db = FAISS.from_documents(texts, embeddings).

Maybe the section should include creating the vectorstore and using it.

ValidationError: 1 validation error for SQLDatabaseToolkit

I got this error. The version of langchain is 0.0.169. I want to know how to fix this error.


ValidationError Traceback (most recent call last)
Cell In[2], line 3
1 # db = SQLDatabase.from_uri("sqlite:///../../../../notebooks/Chinook.db")
2 db = SQLDatabase.from_uri("sqlite:///../../notebooks/Chinook.db")
----> 3 toolkit = SQLDatabaseToolkit(db=db)
4 agent_executor = create_sql_agent(
5 llm = OpenAI(temperature=0, openai_api_key="sk-xxx"),
6 toolkit=toolkit,
7 verbose=True
8 )

File ~/miniconda3/envs/aigc/lib/python3.10/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for SQLDatabaseToolkit
llm
field required (type=value_error.missing)

Mongo Loader

Hi. I have a quick question. Do you have an example using the SimpleMongo DB loader? Have you tested this connector?
I am having difficulty connecting to MongoDB with a username and password, since the documentation only uses host and port. Also, can I use the HTML reader to read a website and use a LangChain agent to store the content locally?

AttributeError: 'tuple' object has no attribute 'page_content' when running `load_summarize_chain` on my Document generated from PyPDFLoader

Code:

loader_book = PyPDFLoader("D:/PaperPal/langchain-tutorials/data/The Attention Merchants_ The Epic Scramble to Get Inside Our Heads ( PDFDrive ) (1).pdf")
test = loader_book.load()
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(test[0])

I get the following error even when the test[0] is a Document object

> Entering new MapReduceDocumentsChain chain...
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
d:\PaperPal\langchain-tutorials\chains\Chain Types.ipynb Cell 19 in ()
----> 1 chain.run(test[0])

File c:\Users\mail2\anaconda3\lib\site-packages\langchain\chains\base.py:213, in Chain.run(self, *args, **kwargs)
    211     if len(args) != 1:
    212         raise ValueError("`run` supports only one positional argument.")
--> 213     return self(args[0])[self.output_keys[0]]
    215 if kwargs and not args:
    216     return self(kwargs)[self.output_keys[0]]

File c:\Users\mail2\anaconda3\lib\site-packages\langchain\chains\base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
    114 except (KeyboardInterrupt, Exception) as e:
    115     self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116     raise e
    117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
    118 return self.prep_outputs(inputs, outputs, return_only_outputs)

File c:\Users\mail2\anaconda3\lib\site-packages\langchain\chains\base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
    107 self.callback_manager.on_chain_start(
    108     {"name": self.__class__.__name__},
    109     inputs,
    110     verbose=self.verbose,
    111 )
...
--> 141         [{**{self.document_variable_name: d.page_content}, **kwargs} for d in docs]
    142     )
    143     return self._process_results(results, docs, token_max, **kwargs)

AttributeError: 'tuple' object has no attribute 'page_content'
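
One possible cause, offered as an assumption rather than a confirmed diagnosis: `chain.run(test[0])` passes a single Document where the chain iterates over a list of Documents. If the Document is a pydantic (v1) model, iterating over it yields (field name, value) tuples, which would explain the message. The sketch below reproduces that behaviour without LangChain. `FakeDocument` and `collect_page_content` are hypothetical stand-ins, not library classes.

```python
class FakeDocument:
    """Hypothetical stand-in for a pydantic-style Document (not LangChain's class)."""

    def __init__(self, page_content, metadata=None):
        self.page_content = page_content
        self.metadata = metadata or {}

    def __iter__(self):
        # Pydantic v1 models iterate as (field_name, value) pairs.
        yield "page_content", self.page_content
        yield "metadata", self.metadata


def collect_page_content(docs):
    # Mirrors the list comprehension shown at the bottom of the traceback.
    return [d.page_content for d in docs]


doc = FakeDocument("first page text")

error_message = None
try:
    collect_page_content(doc)           # a single document, as in chain.run(test[0])
except AttributeError as exc:
    error_message = str(exc)

contents = collect_page_content([doc])  # wrapped in a list, as in chain.run([test[0]])
print(error_message)
print(contents)
```

Under that assumption, passing a list, for example `chain.run([test[0]])` or `chain.run(test)`, would be worth trying.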

NameError: name 'chain' is not defined

Hello! I am receiving this NameError after this line: chain.run(input_documents=docs, question=query)

NameError Traceback (most recent call last)
Cell In[32], line 1
----> 1 chain.run(input_documents=docs, question=query)

NameError: name 'chain' is not defined

In GPT4 I get this answer:

The NameError indicates that the interpreter is unable to find a defined variable or function named chain. This error occurs when the name is not defined in the current scope, or there is a typo in the name.

To fix the error, you need to ensure that the variable chain is defined in the current scope. Check to see if you have defined chain earlier in the code or in a different module that you may have forgotten to import.

In some cases, running the cells from the beginning after selecting "Restart & Clear Output" may resolve the issue [1]. It may also be helpful to review the documentation for the package or module you are using to ensure that you are using it correctly. Checking for any typos in the variable name may also help resolve the issue.

Overall, the NameError indicates that the interpreter is unable to find a defined variable or function. To fix the error, ensure that the name is defined in the current scope or imported from another module, and check for any typos in the name.

Question on Pinecone index in 'Ask A Book Questions.ipynb'

Hey Gregory,

Thank you for the great series on YouTube. I have a question regarding the notebook 'Ask A Book Questions.ipynb' that you used to demonstrate querying some custom knowledge from PDF files.

In the 11th cell, you used a code to load the vectors into Pinecone:

docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)

Subsequently, you used docsearch again in your query:

query = "What are examples of good data science teams?"
docs = docsearch.similarity_search(query, include_metadata=True)

My question is: would this be using the index from Pinecone? In your example, you loaded the vectors into Pinecone earlier, so the data is already in docsearch. But for a use case where you want to read an existing index directly from Pinecone, without loading any documents first, would you use from_existing_index instead? E.g.:

docsearch = Pinecone.from_existing_index(pinecone_index_name, embeddings)

UnstructuredPDFLoader zipfile.BadZipFile: File is not a zip file

Hi there, I was trying the Ask A Book Questions tutorial. However, I got stuck on the third line:
data = loader.load().
Do you have any idea why it says my document is not a zip file? It is actually loading a PDF.
Here is the stack trace:

Traceback (most recent call last):
  File "/Users/serena/Documents/langchain-tutorials/data_generation/chatPDF.py", line 5, in <module>
    data = loader.load()
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/document_loaders/unstructured.py", line 61, in load
    elements = self._get_elements()
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/document_loaders/pdf.py", line 27, in _get_elements
    from unstructured.partition.pdf import partition_pdf
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/partition/pdf.py", line 19, in <module>
    from unstructured.partition.text import partition_text
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/partition/text.py", line 16, in <module>
    from unstructured.partition.text_type import (
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/partition/text_type.py", line 21, in <module>
    from unstructured.nlp.tokenize import pos_tag, sent_tokenize, word_tokenize
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/nlp/tokenize.py", line 32, in <module>
    _download_nltk_package_if_not_present(package_name, package_category)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/nlp/tokenize.py", line 21, in _download_nltk_package_if_not_present
    nltk.find(f"{package_category}/{package_name}")
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 555, in find
    return find(modified_name, paths)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 542, in find
    return ZipFilePathPointer(p, zipentry)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/compat.py", line 41, in _decorator
    return init_func(*args, **kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 394, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/compat.py", line 41, in _decorator
    return init_func(*args, **kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 935, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
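
The last frames of the traceback are in `nltk/data.py` and `zipfile.py`, which suggests the file that fails to open is an NLTK data archive rather than the PDF. One hypothesis (an assumption, not confirmed by the report) is a corrupt or partially downloaded archive in the NLTK data directory. The sketch below uses the standard library's `zipfile.is_zipfile` to look for such files. The function name `find_invalid_zips` is hypothetical.

```python
import os
import zipfile


def find_invalid_zips(root):
    """Return paths under `root` that end in .zip but are not valid zip archives."""
    invalid = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".zip"):
                path = os.path.join(dirpath, name)
                if not zipfile.is_zipfile(path):
                    invalid.append(path)
    return sorted(invalid)
```

It could be pointed at wherever NLTK keeps its data on your machine, for example `find_invalid_zips(os.path.expanduser("~/nltk_data"))` if that directory exists. Deleting any file it reports and letting the package download again may be worth trying.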

Unclear input and passed variables

Hi, I am going through your email tutorial and I like it a lot. However, one thing remains unclear and is left without any comment. Perhaps it would be good to clarify?

The input variables are 'input_documents', 'company', etc., but the map template uses 'text' and the reduce template also uses 'text'.

Which text is it? Is it the same value? I guess not, but it is not mentioned, and this is the only part of your code that leaves me with questions.

Is it recognized by position?
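
In a map-reduce chain, the same placeholder name can be filled with a different value at each stage, and template variables are matched by name, not by position. The sketch below is a simplified imitation in plain Python, not LangChain's implementation. The template strings, `fake_llm`, and the `company` value are hypothetical. In the map step `{text}` receives one chunk at a time. In the reduce step `{text}` receives the joined outputs of the map step.

```python
MAP_TEMPLATE = "Summarize for {company}: {text}"
REDUCE_TEMPLATE = "Combine for {company}: {text}"


def fake_llm(prompt):
    # Hypothetical stand-in for a model call: echoes the prompt in brackets.
    return "[" + prompt + "]"


def map_reduce(chunks, company):
    # Map step: {text} is filled with ONE chunk per call.
    partials = [fake_llm(MAP_TEMPLATE.format(company=company, text=c)) for c in chunks]
    # Reduce step: {text} is filled with ALL map outputs joined together.
    combined = "\n".join(partials)
    return fake_llm(REDUCE_TEMPLATE.format(company=company, text=combined))


result = map_reduce(["chunk A", "chunk B"], company="Acme")
print(result)
```

Extra variables such as `company` are passed unchanged to both stages, while `text` changes meaning between them.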

NewConnectionError

MaxRetryError: HTTPSConnectionPool(host='controller.pinecone_api_env.pinecone.io', port=443): Max retries exceeded with url: /databases (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe2b3ed8250>: Failed to establish a new connection: [Errno -2] Name or service not known'))


Ask a Book Questions - errors in Pinecone integration

https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/Ask%20A%20Book%20Questions.ipynb

The notebook is out of sync with the current version of Pinecone. Here are some thoughts:

  • from langchain.vectorstores import Chroma, Pinecone: I think it's better to install langchain_pinecone and use from langchain_pinecone import PineconeVectorStore: https://python.langchain.com/docs/integrations/vectorstores/pinecone
  • Using the latest Pinecone client installed by pip install pinecone-client, you'll need to change import pinecone to from pinecone import Pinecone. When creating an index, you'll also need to import the ServerlessSpec or PodSpec class.
  • With an up-to-date client, initialization is a little different and doesn't include environment. Instead of:
pinecone.init(
     api_key=PINECONE_API_KEY,  # find at app.pinecone.io
     environment=PINECONE_API_ENV  # next to api key in console
)

you use:

pc = Pinecone(api_key=PINECONE_API_KEY)

How to report a mistake in documentation?

Hello lang-chain devs,

How do I report a mistake in documentation? I was not sure if this is the right forum for pointing this out.

The sentence in documentation should probably read:

A text embedding model takes a piece of text as input and returns a numerical representation of that text in the form of a list of floats.

The original documentation is missing "returns a".

Feature Request: Add a source/citation

I know we can print the documents that match the question via Pinecone, but it would be great to be able to print the citation or source that was used to determine the final answer. Could this be added as a feature? Great work btw.

How to use map_reduce in RetrievalQA chain

I want to use a RetrievalQA chain to do QA over a PDF, but I don't know why my response is always cut off, even though its total tokens are just 1800 or 2500, for example:
[image]
So I want to use map_reduce, but it gives me an error:
[image]
My code is:
`
top_matches = vector_db.similarity_search_with_score(query=question, k=int(top_k))
top_matches_contexts = " "
print("top_matches",top_matches)

print("top_k",top_k)
for i, val in enumerate(top_matches):
    top_matches_contexts += "{}、{}\n".format(i+1, val[0].page_content)
if top_matches_contexts == []:
    return "Please describe your question in detail"

top_matches_contexts = remove_spaces_and_newlines(top_matches_contexts)



global language
query = prompt.format(query=question, reference=top_matches_contexts,key_word=key_word,language=language)


# chat_history,top_matches_contexts = deal_total_token(chat_history,query)


qa_chain = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0.0,openai_api_key="XXXXX"), chain_type="map_reduce",
                                       retriever=vector_db.as_retriever())

result = qa_chain({"query": query,"chat_history": chat_history})
count_tokens(qa_chain,query)

`

predict_and_parse is deprecated

chain = create_extraction_chain(llm, schema, encoder_or_encoder_class='json')
output = chain.predict_and_parse(text="please add 15 more units sold to 2023")['data']

printOutput(output)

Running this code block throws TypeError: initial_value must be str or None, not dict.

I was able to get output in json format using:

chain.predict(text=text)["data"]

Could not get the answer 61, the largest prime number that is smaller than their age

When I run the notebook: https://github.com/gkamradt/langchain-tutorials/blob/main/getting_started/Quickstart%20Guide.ipynb

Chains: Combine LLMs and prompts in multi-step workflows

`
#!pip install google-search-results
import os

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI

# Load the model
llm = OpenAI(temperature=0)

# Load in some tools to use
os.environ["SERPAPI_API_KEY"] = ""
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# Finally, let's initialize an agent with:
# 1. The tools
# 2. The language model
# 3. The type of agent we want to use.
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# See list of agents types here

# Now let's test it out!
agent.run("Who is the current leader of Japan? What is the largest prime number that is smaller than their age?")

`

here is the return:

`

Entering new AgentExecutor chain...
I need to find out who the leader of Japan is and then calculate the largest prime number that is smaller than their age.
Action: Search
Action Input: "current leader of Japan"
Observation: Fumio Kishida
Thought: I need to find out the age of the leader of Japan
Action: Search
Action Input: "age of Fumio Kishida"
Observation: 65 years
Thought: I need to calculate the largest prime number that is smaller than 65
Action: Calculator
Action Input: 65


ValueError Traceback (most recent call last)
Cell In[18], line 2
1 # Now let's test it out!
----> 2 agent.run("Who is the current leader of Japan? What is the largest prime number that is smaller than their age?")

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:213, in Chain.run(self, *args, **kwargs)
211 if len(args) != 1:
212 raise ValueError("run supports only one positional argument.")
--> 213 return self(args[0])[self.output_keys[0]]
215 if kwargs and not args:
216 return self(kwargs)[self.output_keys[0]]

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116 raise e
117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
118 return self.prep_outputs(inputs, outputs, return_only_outputs)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
107 self.callback_manager.on_chain_start(
108 {"name": self.__class__.__name__},
109 inputs,
110 verbose=self.verbose,
111 )
112 try:
--> 113 outputs = self._call(inputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/agents/agent.py:792, in AgentExecutor._call(self, inputs)
790 # We now enter the agent loop (until it returns something).
791 while self._should_continue(iterations, time_elapsed):
--> 792 next_step_output = self._take_next_step(
793 name_to_tool_map, color_mapping, inputs, intermediate_steps
794 )
795 if isinstance(next_step_output, AgentFinish):
796 return self._return(next_step_output, intermediate_steps)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/agents/agent.py:695, in AgentExecutor._take_next_step(self, name_to_tool_map, color_mapping, inputs, intermediate_steps)
693 tool_run_kwargs["llm_prefix"] = ""
694 # We then call the tool on the tool input to get an observation
--> 695 observation = tool.run(
696 agent_action.tool_input,
697 verbose=self.verbose,
698 color=color,
699 **tool_run_kwargs,
700 )
701 else:
702 tool_run_kwargs = self.agent.tool_run_logging_kwargs()

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/tools/base.py:107, in BaseTool.run(self, tool_input, verbose, start_color, color, **kwargs)
105 except (Exception, KeyboardInterrupt) as e:
106 self.callback_manager.on_tool_error(e, verbose=verbose_)
--> 107 raise e
108 self.callback_manager.on_tool_end(
109 observation, verbose=verbose_, color=color, name=self.name, **kwargs
110 )
111 return observation

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/tools/base.py:104, in BaseTool.run(self, tool_input, verbose, start_color, color, **kwargs)
102 try:
103 tool_args, tool_kwargs = _to_args_and_kwargs(tool_input)
--> 104 observation = self._run(*tool_args, **tool_kwargs)
105 except (Exception, KeyboardInterrupt) as e:
106 self.callback_manager.on_tool_error(e, verbose=verbose_)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/agents/tools.py:31, in Tool._run(self, *args, **kwargs)
29 def _run(self, *args: Any, **kwargs: Any) -> str:
30 """Use the tool."""
---> 31 return self.func(*args, **kwargs)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:213, in Chain.run(self, *args, **kwargs)
211 if len(args) != 1:
212 raise ValueError("run supports only one positional argument.")
--> 213 return self(args[0])[self.output_keys[0]]
215 if kwargs and not args:
216 return self(kwargs)[self.output_keys[0]]

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116 raise e
117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
118 return self.prep_outputs(inputs, outputs, return_only_outputs)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
107 self.callback_manager.on_chain_start(
108 {"name": self.__class__.__name__},
109 inputs,
110 verbose=self.verbose,
111 )
112 try:
--> 113 outputs = self._call(inputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/llm_math/base.py:130, in LLMMathChain._call(self, inputs)
126 self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
127 llm_output = llm_executor.predict(
128 question=inputs[self.input_key], stop=["```output"]
129 )
--> 130 return self._process_llm_result(llm_output)

File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/llm_math/base.py:86, in LLMMathChain._process_llm_result(self, llm_output)
84 answer = "Answer: " + llm_output.split("Answer:")[-1]
85 else:
---> 86 raise ValueError(f"unknown format from LLM: {llm_output}")
87 return {self.output_key: answer}

ValueError: unknown format from LLM: This is not a math problem and cannot be translated into an expression that can be executed using Python's numexpr library.`
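
The final error suggests the agent sent only "65" to the Calculator tool, which the math chain did not recognise as a math problem. The intended computation is small enough to check by hand. Here is a plain-Python sketch (the function names are hypothetical) that confirms the expected answer given in the issue title:

```python
def is_prime(n):
    """Trial-division primality check, adequate for small numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_below(n):
    """Return the largest prime strictly smaller than n, or None if there is none."""
    for candidate in range(n - 1, 1, -1):
        if is_prime(candidate):
            return candidate
    return None


answer = largest_prime_below(65)
print(answer)
```

For an age of 65 this gives 61, since 64, 63 and 62 are all composite.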

Setup "Ask A Book Questions" for Google Colab

Hello,
let me first of all say, you have created a great tutorial on how to create a Q&A engine for any pdf-document based knowledge base. I love it!!!
I setup a Google Colab notebook to replicate your tutorial and came across quite a few issues during the environment setup.
The following screenshot shows all setup tasks needed to make it run successfully on Google Colab. I hope other readers find this useful.
[Screenshot: Google Colab Environment Setup]

TextSplitter in different languages

https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/5%20Levels%20Of%20Summarization%20-%20Novice%20To%20Expert.ipynb

For summarization methods above level 3, it is better to use TokenTextSplitter than RecursiveCharacterTextSplitter, because the number of tokens in a string of a given character length varies greatly from language to language.

text_splitter_by_char = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=10000, chunk_overlap=500)
text_splitter_by_token = TokenTextSplitter(chunk_size=3000, chunk_overlap=100)

If this is not taken into account, errors exceeding the max token count are likely to occur when processing text in multiple languages.

I have tested the number of tokens used for the same family of patents, in different languages:

English (US10901237B2)=21823 (100%)
Simplified Chinese (CN112904591A)=30901 (142%)
Traditional Chinese (TW201940135A)=36530 (167%)
Korean (KR20190089752A)=42644 (195%)
Japanese (JP2019128599A)=51430 (236%)
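
The percentages above follow directly from the token counts. The sketch below recomputes them and adds a rough chunk count for the 3000-token chunk size used in the `TokenTextSplitter` example. The chunk estimate ignores `chunk_overlap`, so it is a lower bound rather than what the splitter would actually produce.

```python
import math

# Token counts reported above for the same patent family.
token_counts = {
    "English (US10901237B2)": 21823,
    "Simplified Chinese (CN112904591A)": 30901,
    "Traditional Chinese (TW201940135A)": 36530,
    "Korean (KR20190089752A)": 42644,
    "Japanese (JP2019128599A)": 51430,
}

baseline = token_counts["English (US10901237B2)"]
CHUNK_SIZE = 3000  # tokens per chunk, as in the TokenTextSplitter example

# Token count relative to English, as a rounded percentage.
relative = {name: round(100 * count / baseline) for name, count in token_counts.items()}
# Minimum number of chunks, ignoring overlap between chunks.
min_chunks = {name: math.ceil(count / CHUNK_SIZE) for name, count in token_counts.items()}

for name in token_counts:
    print(f"{name}: {relative[name]}% of English, at least {min_chunks[name]} chunks")
```

The same document needs at least 8 chunks in English but at least 18 in Japanese, which is why a character-based chunk size tuned on English text can exceed the token limit in other languages.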

Additional questions on the summarisation tutorial

Hey there

Thanks for putting this together. I had the same conclusion regarding the summarisation of a large document, in terms of splitting, then embedding, and then ranking the sections and choosing the most relevant for a map_reduce.

However, I've been scouring the net and racking my brains to find a splitter that would work according to theme (eg. keyword density) or being able to identify chapter/section breaks without having to pre-define what the markup would look like.

Is there a python tool or form of analysis that can segment a text document into smaller part more intelligently than a character length breakpoint?

Thanks :)

ValidationError: 1 validation error for LLMChain

I have had this model working and it is great, but now I'm getting different error messages.

ValidationError Traceback (most recent call last)
Cell In[72], line 2
1 llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
----> 2 chain = load_qa_chain(llm, chain_type="stuff")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py:218, in load_qa_chain(llm, chain_type, verbose, callback_manager, **kwargs)
213 if chain_type not in loader_mapping:
214 raise ValueError(
215 f"Got unsupported chain type: {chain_type}. "
216 f"Should be one of {loader_mapping.keys()}"
217 )
--> 218 return loader_mapping[chain_type](
219 llm, verbose=verbose, callback_manager=callback_manager, **kwargs
220 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py:63, in _load_stuff_chain(llm, prompt, document_variable_name, verbose, callback_manager, **kwargs)
54 def _load_stuff_chain(
55 llm: BaseLanguageModel,
56 prompt: Optional[BasePromptTemplate] = None,
(...)
60 **kwargs: Any,
61 ) -> StuffDocumentsChain:
62 _prompt = prompt or stuff_prompt.PROMPT_SELECTOR.get_prompt(llm)
---> 63 llm_chain = LLMChain(
64 llm=llm, prompt=prompt, verbose=verbose, callback_manager=callback_manager
65 )
66 # TODO: document prompt
67 return StuffDocumentsChain(
68 llm_chain=llm_chain,
69 document_variable_name=document_variable_name,
(...)
72 **kwargs,
73 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for LLMChain
prompt
none is not an allowed value (type=type_error.none.not_allowed)

ChatGPT4 response:

The error message indicates a ValidationError due to an invalid value for the prompt argument in the LLMChain constructor. Specifically, the error message states that None is not an allowed value for prompt. This error can occur if the prompt argument is not properly specified when creating an instance of the LLMChain class.

To fix this error, the prompt argument should be properly specified when creating an instance of the LLMChain class. This can be done by providing a valid value for the prompt argument that is not None. Additionally, the error message suggests that the value None is not an allowed value for the prompt argument, so it is important to consult the documentation or source code of the LLMChain class to determine what values are valid for the prompt argument.
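
The traceback itself suggests why prompt ends up as None: line 62 of _load_stuff_chain builds a fallback in _prompt, but line 64 passes the original prompt to LLMChain. The sketch below reproduces that pattern in isolation. FakeChain, DEFAULT_PROMPT, load_buggy, and load_fixed are hypothetical stand-ins written for illustration, not LangChain's actual classes or functions.

```python
from typing import Optional

# Hypothetical default, standing in for the prompt the selector would return.
DEFAULT_PROMPT = "Use the context to answer.\n{context}\nQuestion: {question}"


class FakeChain:
    """Stand-in for a validated model that rejects a None prompt."""

    def __init__(self, prompt: Optional[str]):
        if prompt is None:
            raise ValueError("prompt: none is not an allowed value")
        self.prompt = prompt


def load_buggy(prompt: Optional[str] = None) -> FakeChain:
    _prompt = prompt or DEFAULT_PROMPT  # fallback is computed...
    return FakeChain(prompt=prompt)     # ...but the original (None) is passed


def load_fixed(prompt: Optional[str] = None) -> FakeChain:
    _prompt = prompt or DEFAULT_PROMPT
    return FakeChain(prompt=_prompt)    # pass the resolved value instead


try:
    load_buggy()
    buggy_raised = False
except ValueError:
    buggy_raised = True

fixed_chain = load_fixed()
```

If that reading of the traceback is right, the installed langchain version is at fault rather than the notebook code, so upgrading the package or passing an explicit prompt through load_qa_chain's keyword arguments would likely sidestep the error.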

SSL Error in example Ask A Book Questions

Hi there, thanks for solving my issue with loading PDFs. I've run into another issue and suspect it may be related to the versions of some Python packages.

I am trying the Ask A Book Questions tutorial and get the error below when executing this line: docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)

Traceback (most recent call last):
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
    conn.connect()
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connection.py", line 362, in connect
    self.sock = ssl_wrap_socket(
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/util/ssl_.py", line 386, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: UNEXPECTED_RECORD] unexpected record (_ssl.c:1129)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by SSLError(SSLError(1, '[SSL: UNEXPECTED_RECORD] unexpected record (_ssl.c:1129)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/serena/Documents/langchain-tutorials/data_generation/chatPDF.py", line 33, in <module>
    docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/vectorstores/pinecone.py", line 235, in from_texts
    embeds = embedding.embed_documents(lines_batch)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/embeddings/openai.py", line 269, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=self.deployment)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/embeddings/openai.py", line 188, in _get_len_safe_embeddings
    encoding = tiktoken.model.encoding_for_model(self.model)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/model.py", line 75, in encoding_for_model
    return get_encoding(encoding_name)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/registry.py", line 63, in get_encoding
    enc = Encoding(**constructor())
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken_ext/openai_public.py", line 64, in cl100k_base
    mergeable_ranks = load_tiktoken_bpe(
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/load.py", line 114, in load_tiktoken_bpe
    contents = read_file_cached(tiktoken_bpe_file)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/load.py", line 46, in read_file_cached
    contents = read_file(blobpath)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/load.py", line 24, in read_file
    return requests.get(blobpath).content
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/adapters.py", line 563, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by SSLError(SSLError(1, '[SSL: UNEXPECTED_RECORD] unexpected record (_ssl.c:1129)')))

Appreciate your help in advance!
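
The failure in the traceback happens while tiktoken downloads cl100k_base.tiktoken from openaipublic.blob.core.windows.net, before Pinecone is contacted at all, so one way to narrow it down is to test the TLS handshake to that host on its own. The helper below is a minimal diagnostic sketch using only the standard library; check_tls is a hypothetical name, and a failing result would point at the local network (proxy, VPN, firewall) or the Python/OpenSSL build rather than at the tutorial code.

```python
import socket
import ssl


def check_tls(host: str, port: int = 443, timeout: float = 5.0):
    """Attempt a TLS handshake with host:port and return (ok, detail)."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                # On success, report the negotiated protocol version.
                return True, str(tls.version())
    except OSError as exc:
        # ssl.SSLError, DNS failures, and timeouts are all OSError subclasses.
        return False, f"{type(exc).__name__}: {exc}"


# Example call (performs a real network request):
# print(check_tls("openaipublic.blob.core.windows.net"))
```

If the handshake fails here too, the problem sits below LangChain and tiktoken; if it succeeds, the versions of requests and urllib3 in the environment are the next thing worth comparing.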

Typo in the Cookbook part 1

Your vectorstore store your embeddings (☝️) and make "the" easily searchable
I guess it should be: "Your vectorstore store your embeddings (☝️) and make "them" easily searchable" :)
Thanks
