
langchain-postgres's People

Contributors

efriis, eyurtsev, isahers1, leo-gan, martingotelli, pprados, raj725


langchain-postgres's Issues

$contain comparator not working with SelfQueryRetriever

Hi There,

I am getting error ValueError: Invalid operator: $contain. Expected one of {'$gte', '$ilike', '$and', '$ne', '$lt', '$eq', '$nin', '$exists', '$gt', '$in', '$between', '$lte', '$not', '$or', '$like'}

Query -> What are people talking about topicA in London with greater than 100 likes? PARAMS -> operator=<Operator.AND: 'and'> arguments=[Comparison(comparator=<Comparator.CONTAIN: 'contain'>, attribute='post_tags_internal', value='topicA'), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='post_city', value='London'), Comparison(comparator=<Comparator.GT: 'gt'>, attribute='post_twitter_likes', value=100)]

I am using SelfQueryRetriever to fill in the metadata fields. However, the contain comparator is not supported by PGVector.

I checked the VectorStore from langchain-postgres and it looks like the contain keyword has been deprecated.

Is there a workaround for this issue?
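One possible workaround, sketched below under the assumption that the field is a string: since substring matching is no longer available as a comparator, the supported $like/$ilike operators can express the same condition in a direct similarity_search call (field names are taken from the query above). For SelfQueryRetriever itself, the translator would need to map CONTAIN onto $like, which this sketch does not cover.

docs = vectorstore.similarity_search(
    "topicA discussions in London",
    k=10,
    filter={
        "$and": [
            {"post_tags_internal": {"$like": "%topicA%"}},  # substring match instead of $contain
            {"post_city": {"$eq": "London"}},
            {"post_twitter_likes": {"$gt": 100}},
        ]
    },
)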

upgrading from 0.0.6 -> 0.0.9, getting _async_engine not found


Log signature: _async_engine not found

error emerges from langchain_postgres.vectorstores.PGVector.acreate_vector_extension

I'm trying to call invoke(), so I don't really understand why it's trying to use the async version.


In [3]: print_sys_info()

System Information
------------------
> OS:  Darwin
> OS Version:  Darwin Kernel Version 23.6.0: Fri Jul  5 17:53:24 PDT 2024; root:xnu-10063.141.1~2/RELEASE_ARM64_T6020
> Python Version:  3.11.9 (main, May 16 2024, 15:36:38) [Clang 15.0.0 (clang-1500.3.9.4)]

Package Information
-------------------
> langchain_core: 0.2.28
> langchain: 0.2.12
> langchain_community: 0.2.11
> langsmith: 0.1.96
> langchain_anthropic: 0.1.22
> langchain_aws: 0.1.15
> langchain_cli: 0.0.28
> langchain_openai: 0.1.20
> langchain_postgres: 0.0.9
> langchain_text_splitters: 0.2.2
> langchainhub: 0.1.20
> langgraph: 0.1.19
> langserve: 0.2.2

I'm going to downgrade back to 0.0.6 for now.

How I'm instantiating PGVector -

PGVector(
        embeddings=embeddings,
        collection_name=collection_name,
        connection=conn_string,
        use_jsonb=True,
        pre_delete_collection=pre_delete_collection,
    )

I tried updating the connection string so it used postgresql+asyncpg:// instead of postgresql+psycopg:// but got a bunch of downstream errors related to module not found, then tried adding these dependencies to pyproject.toml:

asyncpg = "^0.29.0"
greenlet = "^3.0.3"

But at this point, when using langserve I'm not totally certain how to make it instantiate the async version.

Expected Behavior

  • Upgrading patch versions doesn't change synchronous behavior in a backwards-incompatible way
  • Calling ainvoke and other async methods uses the async postgres connection; calling invoke uses the sync postgres connection
  • Breaking changes are accompanied by docs in the README
  • There is a documented way in the README to use synchronous behavior without asynchronous behavior

Actual Behavior

  • Patch version upgrades are not backwards compatible
  • The mention of "async" in the README after 0.0.6 doesn't offer adequate context on how to configure the project for async (or not)
  • Project appears to be async by default and it's not clear if it's still possible to use it without configuring async drivers
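A minimal sketch of keeping everything on the synchronous path, assuming the psycopg3 sync driver is installed: with a postgresql+psycopg:// URL and async_mode left at its default of False, invoke() should not touch the async engine. If the regression persists, pinning to 0.0.6 as described above remains the fallback.

from langchain_postgres.vectorstores import PGVector

store = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection="postgresql+psycopg://user:pass@localhost:5432/db",
    use_jsonb=True,
    async_mode=False,  # default; keeps the sync engine
)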

PGVector with Agents for querying a DB

langchain Documentation
https://python.langchain.com/v0.1/docs/use_cases/sql/agents/

CODE
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import create_sql_agent
from langchain_openai import AzureChatOpenAI
import os
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import AzureOpenAIEmbeddings

from langchain_postgres import PGVector
from langchain_postgres.vectorstores import PGVector
import os
import psycopg2

os.environ["AZURE_OPENAI_API_KEY"] = "api_key"
os.environ["AZURE_OPENAI_API_TYPE"] = "ap_type"
os.environ["AZURE_OPENAI_ENDPOINT"] = "azure_openai_endpoint"
os.environ["AZURE_OPENAI_API_VERSION"] = "2024-02-01"
os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"] = "deployment_model_name"

examples = [
    {"input": "Describe article name ...", "query": "SELECT compatibility FROM article ...';"}
]

AzureEmbeddings = AzureOpenAIEmbeddings(model="embedding-model-name")
connection = "postgresql+psycopg://YYY:XXXX@localhost/ZZZ"
collection_name = "collection_name"

vectorstore = PGVector(
    embeddings=AzureEmbeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    AzureEmbeddings,
    vectorstore,
    k=5,
    input_keys=["input"],
)

ERROR INFO
File "C:\Users\user\Projects\project\langchain_sql.py", line 53, in
example_selector = SemanticSimilarityExampleSelector.from_examples(
File "C:\Users\user\Projects\project\venv\lib\site-packages\langchain_core\example_selectors\semantic_similarity.py", line 133, in from_examples
vectorstore = vectorstore_cls.from_texts(
File "C:\Users\user\Projects\project\venv\lib\site-packages\langchain_postgres\vectorstores.py", line 1496, in from_texts
return cls.__from(
File "C:\Users\user\Projects\project\venv\lib\site-packages\langchain_postgres\vectorstores.py", line 660, in __from
store = cls(
File "C:\Users\user\Projects\project\venv\lib\site-packages\langchain_postgres\vectorstores.py", line 414, in init
raise ValueError(
ValueError: connection should be a connection string or an instance of sqlalchemy.engine.Engine or sqlalchemy.ext.asyncio.engine.AsyncEngine

SYSTEM INFO
langchain 0.2.6
langchain-community 0.2.5
langchain-core 0.2.10
langchain-experimental 0.0.61
langchain-openai 0.1.8
langchain-postgres 0.0.9
langchain-text-splitters 0.2.1
psycopg 3.1.19
psycopg-binary 3.1.19
psycopg-pool 3.2.2
psycopg2-binary 2.9.6

COMMENTS
Already tried including and omitting psycopg in the connection string ("postgresql+psycopg:..."); the error doesn't change.
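A note on the likely cause: from_examples expects the vector store class plus constructor kwargs, not a pre-built instance. The traceback shows it calling vectorstore_cls.from_texts(), which builds a fresh store that never receives the connection, hence the ValueError. A sketch of the likely fix (extra kwargs are forwarded to PGVector):

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    AzureEmbeddings,
    PGVector,  # pass the class, not the instance built above
    k=5,
    input_keys=["input"],
    connection=connection,  # forwarded to PGVector.from_texts
    collection_name=collection_name,
    use_jsonb=True,
)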

Potential solutions for being able to index different vector sizes?

Curious if anyone was thinking about re-writing the table schemas such that collections map to different tables (and would therefore support indexing different embedding vector sizes)?

  embedding_length: The length of the embedding vector. (default: None)
      NOTE: This is not mandatory. Defining it will prevent vectors of
      any other size to be added to the embeddings table but, without it,
      the embeddings can't be indexed.

https://github.com/langchain-ai/langchain-postgres/blob/main/langchain_postgres/vectorstores.py#L258C1-L261C49
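A minimal sketch of the current single-table workaround, assuming a 1536-dimensional model: passing embedding_length pins the column type to vector(1536), which is what makes the column indexable, though it still prevents mixing vector sizes across collections.

store = PGVector(
    embeddings=embeddings,
    collection_name="my_docs",
    connection=connection,
    embedding_length=1536,  # pins the column to vector(1536)
    use_jsonb=True,
)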

PostgresSaver Implementation

I noticed that an earlier version included the PostgresSaver checkpointer. However, it seems to have been removed. Can you provide information on when it will be available again?

How to query CollectionStore and EmbeddingStore models directly in a clean way?

Hello, thanks for this great project, I've found it very useful. I have a use case right now where, within one application, I want to create and manage multiple collections as well as fetch and return details about some collections, e.g. the name and the collection metadata. Essentially my use case is CRUD for collections.

Currently I don't really see any way to do that cleanly other than dropping down to raw SQL queries in my application. Would this be the recommended approach?

I see in the source code in vectorstores.py that there are "private"/unexposed SQLAlchemy models defined for CollectionStore and EmbeddingsStore. Having them exposed would make querying against the tables a lot easier, at least for my particular use case.

I can understand why you might want to keep them private: they might be subject to change, and any user code that touches those models potentially breaks. But I think even when the models are not exposed, changes that result in different database tables would still be a breaking change for a lot of apps anyway.

Is exposing those models something you might consider? Or would you recommend going with raw SQL? Would be more than happy to submit a PR. Thanks!
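In the meantime, a minimal raw-SQL sketch for listing collections; the table and column names below match the current schema but are an implementation detail that may change between releases:

import sqlalchemy

engine = sqlalchemy.create_engine("postgresql+psycopg://user:pass@localhost/db")
with engine.connect() as conn:
    rows = conn.execute(
        sqlalchemy.text("SELECT uuid, name, cmetadata FROM langchain_pg_collection")
    ).fetchall()
for collection_uuid, name, cmetadata in rows:
    print(collection_uuid, name, cmetadata)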

Async Retriever Error with SQLAlchemy - "cannot insert multiple commands into a prepared statement"

Description

When running the retriever in langchain-postgres as an asynchronous instance, I encounter the following error related to SQL statement execution. The message suggests that the problem arises from attempting to insert multiple SQL commands into a prepared statement, which does not appear to be supported.

Error Message

sqlalchemy.exc.ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.PostgresSyntaxError'>: cannot insert multiple commands into a prepared statement
[SQL: SELECT pg_advisory_xact_lock(1573678846307946496);CREATE EXTENSION IF NOT EXISTS vector;]
(Background on this error at: https://sqlalche.me/e/20/f405)

My thoughts

The error appears to be caused by a SQL statement that combines two commands, acquiring an advisory lock and creating the extension, executed as a single statement even though they are separated by ";" in the SQL text. This does not seem to be supported in asynchronous execution with asyncpg, which treats it as a syntax error in a prepared statement.

Proposed Solution

I found that defining the SQL commands as separate statements avoids combining multiple commands into a single execution call.

The affected code, in vectorstores.py:236:

def _create_vector_extension(conn: Connection) -> None:
    statement = sqlalchemy.text(
        "SELECT pg_advisory_xact_lock(1573678846307946496);"
        "CREATE EXTENSION IF NOT EXISTS vector;"
    )
    conn.execute(statement)
    conn.commit()

Here is the revised function implementation that fixed the issue for me:

def _create_vector_extension(conn: Connection) -> None:
    select_lock_statement = sqlalchemy.text("SELECT pg_advisory_xact_lock(1573678846307946496);")
    create_extension_statement = sqlalchemy.text("CREATE EXTENSION IF NOT EXISTS vector;")

    conn.execute(select_lock_statement)
    conn.execute(create_extension_statement)

    conn.commit()

By separating the statements into individual execution calls, the error is avoided and the functionality remains intact, although I haven't tried it on a synchronous instance.

Question

  • Has anyone else encountered this issue, or is there any additional context or configuration that I might be missing which warrants this behavior?

Would love to get some feedback, as I do not often post on issues. Should I create a PR for this fix?

Support for the sparse embeddings

The latest pgvector version supports sparsevec.
However, langchain's PGVector supports only one embedding column in the langchain_pg_embedding table.
It would be great to have a sparse_embedding column and a sparse_embedding field in PGVector.

I have considered the alternative of having two PGVector stores, one for dense and one for sparse vectors. However, there are two problems with that:

  • PGVector has hardcoded table names for collection and embeddings
  • I would like to leverage the excellent LangChain indexer with the SQL record manager.

Error while trying to import psycopg with poetry

This is the error message I got:

- couldn't import psycopg 'c' implementation: No module named 'psycopg_c'
- couldn't import psycopg 'binary' implementation: No module named 'psycopg_binary'
- couldn't import psycopg 'python' implementation: libpq library not found

My dependencies are these:

[tool.poetry.dependencies]
python = "^3.12"
fastapi = "^0.110.1"
uvicorn = "^0.29.0"
loguru = "^0.7.2"
sqlalchemy = "^2.0.29"
boto3 = "^1.34.84"
asyncpg = "^0.29.0"
greenlet = "^3.0.3"psycopg2-binary = "^2.9.9"
pydantic = {version = "^2.7.0", extras = ["email"]}
python-dotenv = "^1.0.1"
jwcrypto = "^1.5.6"
jinja2 = "^3.1.3"
mangum = "^0.17.0"
requests = "^2.31.0"
deepgram-sdk = "^3.2.6"
langchain = "^0.1.16"
langchain-openai = "^0.1.3"
langgraph = "^0.0.32"
langchain-google-genai = "^1.0.2"
langchain-mistralai = "^0.1.2"
langchain-anthropic = "^0.1.11"
google-auth-oauthlib = "^1.2.0"
fuzzywuzzy = "^0.18.0"
levenshtein = "^0.25.1"
msgraph-sdk = "^1.3.0"
guardrails-ai = "^0.4.3"
twilio = "^9.0.5"
alembic = "^1.13.1"
psycopg2-binary = "^2.9.9"
langchain-postgres = "^0.0.3"
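A note on the error itself: "libpq library not found" means the pure-Python psycopg fallback could not locate a system libpq, and neither compiled implementation is installed. A likely fix, assuming no system libpq is wanted, is to add psycopg's binary extra alongside (or instead of) psycopg2-binary:

psycopg = {version = "^3.1", extras = ["binary"]}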

Add benchmarking

If you're interested in getting involved, the pgvector implementation could use some benchmarking tests on the filtering logic. Only pick up this task if you generally know what you're doing with respect to CI, Postgres, and benchmarking.

Poetry Conflict: Importing Langchain-Postgres v^0.0.3 (current) + LangGraph v^0.0.48 (current)

I'm upgrading an existing Streamlit-based LCEL project to leverage LangGraph. The former project is heavily dependent upon langchain-postgres, especially for user account management via Supabase Postgres.

I was working through the new langgraph docs, to the point of adding memory. I started to evaluate switching to a sqlite db for simple memory management, but when looking at the docs for Checkpointing, it recommends using postgres for production workflows - which brings me back here.

When running poetry add langchain-postgres, it currently installs and forces langgraph 0.0.32, which is significantly behind the current version of 0.0.48:

[package.dependencies]
langgraph = ">=0.0.32,<0.0.33"

When installing via git (poetry add git+https://github.com/langchain-ai/langchain-postgres.git), the dependency is loosened to this, which is what I see in the current poetry project in this repo:

[package.dependencies]
langgraph = "^0.0.32"

This still forces poetry to install langgraph at its lowest level dependency of 0.0.32.

Is there truly a dependency conflict for langchain-postgres with langgraph after 0.0.48? Or can the dependency be updated? Would love to be able to use langchain-postgres for langgraph checkpoints reliably. Would also love if the capability were documented/included in the next version of langgraph checkpoints (docs | repo)!

ChatMessageHistory throws an error after upgrading; neither sync nor async connection works

I moved from langchain_community to langchain_postgres.

With the previous community implementation everything worked, but we need to update as we have other dependencies that need newer versions.

When I use the (sync) Connection, I get this error:

    |   File "/home/sebastian/miniconda3/envs/ragapi/lib/python3.12/site-packages/langchain_core/runnables/history.py", line 441, in _aenter_history
    |     messages = (await hist.aget_messages()).copy()
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/sebastian/miniconda3/envs/ragapi/lib/python3.12/site-packages/langchain_postgres/chat_message_histories.py", line 336, in aget_messages
    |     async with self._aconnection.cursor() as cursor:
    | TypeError: 'Cursor' object does not support the asynchronous context manager protocol

That prompted me to use the AsyncConnection instead, but when using the AsyncConnection, I run into this error when the message history is to be fetched:

    |   File "/home/sebastian/miniconda3/envs/ragapi/lib/python3.12/site-packages/langchain_core/runnables/history.py", line 441, in _aenter_history
    |     messages = (await hist.aget_messages()).copy()
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/sebastian/miniconda3/envs/ragapi/lib/python3.12/site-packages/langchain_postgres/chat_message_histories.py", line 336, in aget_messages
    |     async with self._aconnection.cursor() as cursor:
    |                ^^^^^^^^^^^^^^^^^^^^^^^^
    | AttributeError: 'coroutine' object has no attribute 'cursor'

I currently use:

  • langchain 0.2.0
  • langchain-postgres 0.0.6
  • psycopg 3.1.19
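A sketch of both connection styles, assuming psycopg3. The second traceback ('coroutine' object has no attribute 'cursor') is what you get when AsyncConnection.connect(...) is passed along without being awaited.

import psycopg
from langchain_postgres import PostgresChatMessageHistory

conn_info = "postgresql://user:pass@localhost:5432/db"  # placeholder DSN
table_name = "chat_history"
session_id = "some-session-id"

# sync: pass an open psycopg connection; use get_messages()/add_messages()
sync_connection = psycopg.connect(conn_info)
history = PostgresChatMessageHistory(
    table_name, session_id, sync_connection=sync_connection
)

async def build_async_history() -> PostgresChatMessageHistory:
    # async: the connection must be awaited before being handed over,
    # and only the a*-methods (aget_messages, aadd_messages) will work
    async_connection = await psycopg.AsyncConnection.connect(conn_info)
    return PostgresChatMessageHistory(
        table_name, session_id, async_connection=async_connection
    )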

metadata equality filter loss of performance

In previous versions of the langchain postgres implementation I was able to get sub-second latency on queries that filtered by a string id in the embedding metadata, something like:

filter = {"some_id": "some_value"}

To do this I converted the old json cmetadata column into jsonb and added an index on that particular metadata item:

create index on langchain_pg_embedding((cmetadata->>'some_id'));

In the latest version (with use_jsonb=True) the latency has gone up by about a factor of 10. My initial assumption is that the (new) jsonb_path_ops index is somehow not being used, but I still need to investigate more.
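A sketch, not a confirmed fix: recreating the old expression index against the jsonb column. Note that if the new code compiles the filter to jsonb_path_match(...) rather than a ->> comparison, the planner may not use this index, which would be consistent with the slowdown described above.

import sqlalchemy

engine = sqlalchemy.create_engine("postgresql+psycopg://user:pass@localhost/db")
with engine.begin() as conn:
    # btree expression index on the specific metadata key
    conn.execute(sqlalchemy.text(
        "CREATE INDEX IF NOT EXISTS idx_cmetadata_some_id "
        "ON langchain_pg_embedding ((cmetadata->>'some_id'))"
    ))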

Error while upgrading psycopg2 to psycopg with "from langchain_postgres import PGVector" and "langchain_postgres.vectorstores import PGVector"

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

#!/usr/bin/python3
import os
import psycopg
from psycopg import sql
from langchain_postgres import PGVector
from langchain_postgres.vectorstores import PGVector
#from langchain.vectorstores.pgvector import PGVector
#from langchain.vectorstores.pgvector import DistanceStrategy
from langchain_openai import AzureOpenAIEmbeddings
from langchain_openai import AzureChatOpenAI

Error Message and Stack Trace (if applicable)

Exception ignored in: <function PGVector.__del__ at 0x7fa9436f2a70>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/langchain_postgres/vectorstores.py", line 326, in __del__
if isinstance(self._engine, sqlalchemy.engine.Connection):
AttributeError: 'PGVector' object has no attribute '_engine'
this is the error: PGVector.__init__() got an unexpected keyword argument 'connection_string'

Description

I have a RAG bot on AzureOpenAI. It has been working fine. However, after I upgraded to Python 3.10 and openai>1.0, I upgraded the langchain modules as well and replaced the pgvector import with "from langchain_postgres import PGVector".
With the above upgrades, when I tried creating new embeddings with "text-embedding-3-large", I get the following error.

Exception ignored in: <function PGVector.__del__ at 0x7fa9436f2a70>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/langchain_postgres/vectorstores.py", line 326, in __del__
if isinstance(self._engine, sqlalchemy.engine.Connection):
AttributeError: 'PGVector' object has no attribute '_engine'
this is the error: PGVector.__init__() got an unexpected keyword argument 'connection_string'

I entirely removed the langchain_pg_collection and langchain_pg_embedding tables, as the vector size was different (1536 vs 3072).

System Info

langchain==0.1.16
langchain-community==0.0.34
langchain-core==0.1.46
langchain-mistralai==0.0.4
langchain-openai==0.1.3
langchain-postgres==0.0.3
langchain-text-splitters==0.0.1
pgvector==0.2.5
psycopg==3.1.18
psycopg-binary==3.1.18
psycopg-pool==3.2.1
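For reference, a sketch of the renamed constructor arguments: langchain_postgres.PGVector takes embeddings and connection, while connection_string (and embedding_function) belong to the old langchain/langchain_community implementation, which is why the keyword is rejected after the import swap.

from langchain_postgres import PGVector

store = PGVector(
    embeddings=embeddings,  # was embedding_function=
    connection="postgresql+psycopg://user:pass@localhost:5432/db",  # was connection_string=
    collection_name="my_docs",
    use_jsonb=True,
)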

Connection error using Azure embeddings

code

connection = "postgresql+psycopg://YYY:XXXX@localhost/ZZZ"
collection_name = "my_docs"
from langchain_postgres import PGVector
from langchain_postgres.vectorstores import PGVector

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)
db = vectorstore.from_texts(["communication"], embeddings)

error info
db = vectorstore.from_texts(["communication"], embeddings)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/lib/python3.11/site-packages/langchain_postgres/vectorstores.py", line 975, in from_texts
return cls.__from(
^^^^^^^^^^^
File "/usr/local/anaconda3/lib/python3.11/site-packages/langchain_postgres/vectorstores.py", line 438, in __from
store = cls(
^^^^
File "/usr/local/anaconda3/lib/python3.11/site-packages/langchain_postgres/vectorstores.py", line 295, in __init__
raise ValueError(
ValueError: connection should be a connection string or an instance of sqlalchemy.engine.Engine
Exception ignored in: <function PGVector.__del__ at 0x7fdf7b19dda0>
Traceback (most recent call last):
File "/usr/local/anaconda3/lib/python3.11/site-packages/langchain_postgres/vectorstores.py", line 326, in __del__
if isinstance(self._engine, sqlalchemy.engine.Connection):
^^^^^^^^^^^^
AttributeError: 'PGVector' object has no attribute '_engine'
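A note on the likely cause: from_texts() is a classmethod that builds a new store, so calling it on the instance above does not reuse that instance's connection; with no connection kwarg the constructor fails. A sketch of passing the connection explicitly (or, alternatively, call vectorstore.add_texts([...]) on the existing store):

db = PGVector.from_texts(
    ["communication"],
    embeddings,
    collection_name=collection_name,
    connection=connection,  # must be passed again; the instance's connection is not reused
    use_jsonb=True,
)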

object has no attribute "embed_documents"

Hello, I am trying to run a simple example with LangChain and I get:

/home/alexander/projects/cycleup-assistant-venv/bin/python3.12 -X pycache_prefix=/home/alexander/.cache/JetBrains/IntelliJIdea2023.3/cpython-cache /home/alexander/.local/share/JetBrains/IntelliJIdea2023.3/python/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 37191 --file /home/alexander/projects/cycleup-assistant/flows/embeddings/functions/embeddings.py 
Connected to pydev debugger (build 233.14475.28)
Traceback (most recent call last):
  File "/home/alexander/.local/share/JetBrains/IntelliJIdea2023.3/python/helpers/pydev/pydevd.py", line 1534, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexander/.local/share/JetBrains/IntelliJIdea2023.3/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/alexander/projects/cycleup-assistant/flows/embeddings/functions/embeddings.py", line 61, in <module>
    embeddings()
  File "/home/alexander/projects/cycleup-assistant/flows/embeddings/functions/embeddings.py", line 50, in embeddings
    vectorstore = PGVector.from_documents(
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexander/projects/cycleup-assistant-venv/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 1119, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "/home/alexander/projects/cycleup-assistant-venv/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 985, in from_texts
    embeddings = embedding.embed_documents(list(texts))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'function' object has no attribute 'embed_documents'
python-BaseException

I am using OpenAIEmbeddings.

Everything is the latest version.

Is there some interface change on the OpenAI end that might be the cause of the break?
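The traceback shows a plain function where an Embeddings object was expected. A sketch of the likely fix, assuming langchain_openai: pass an instance of the embeddings class, not the class itself or a wrapper function.

from langchain_openai import OpenAIEmbeddings

vectorstore = PGVector.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),  # an instance, not OpenAIEmbeddings or a function
    connection=connection,
    collection_name="my_docs",
    use_jsonb=True,
)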

feat: add IVFFlat index functionality to PGVector

Hello, thanks for this extremely useful project.
pgvector now supports two index types, HNSW and IVFFlat.

  1. I noticed that there is a pull request to add HNSW index support to PGVector. May I ask when it will be available, since we need to do ANN search and are encouraged to move to PGVector?
  2. Is there any plan to support the IVFFlat index?

relation "langchain_pg_collection" does not exist

After creating the vector store,

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True
)

I try to run:
vectorstore.add_documents(docs, ids=[doc.metadata['id'] for doc in docs])

I'm getting this error message: relation "langchain_pg_collection" does not exist

I'm using python3.10, psycopg3, and langchain-postgres v0.0.3
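A hedged sketch of one thing to try: if the tables were never created (for example because the store points at a different database or schema than expected), they can be created explicitly before adding documents. The method names come from the vectorstores.py source quoted elsewhere in this tracker.

# assumes vectorstore and docs from the snippet above
vectorstore.create_tables_if_not_exists()
vectorstore.create_collection()
vectorstore.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])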

Error during retrieval

I am getting similar issues to #30, but related to retrieval. I used the old langchain-community vectorstore implementation to do the indexing, but during retrieval I am using the langchain-postgres PGVector implementation. Maybe the schema of the db has changed. Wondering if there is a way to work around this.

Traceback (most recent call last):
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1971, in _exec_single_context
    self.dialect.do_execute(
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\engine\default.py", line 919, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedColumn: column langchain_pg_embedding.id does not exist
LINE 1: SELECT langchain_pg_embedding.id AS langchain_pg_embedding_i...
               ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\src\rag_fusion\embeddings.py", line 28, in <module>
    asyncio.run(test())
  File "C:\Users\Sachin_Bhat\scoop\persist\rye\py\[email protected]\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\scoop\persist\rye\py\[email protected]\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\scoop\persist\rye\py\[email protected]\Lib\asyncio\base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\src\rag_fusion\embeddings.py", line 25, in test
    response = await retriever.aget_relevant_documents("lorem ipsum")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\langchain_core\retrievers.py", line 384, in aget_relevant_documents
    raise e
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\langchain_core\retrievers.py", line 377, in aget_relevant_documents
    result = await self._aget_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\langchain_core\vectorstores.py", line 716, in _aget_relevant_documents
    docs = await self.vectorstore.asimilarity_search(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\langchain_core\vectorstores.py", line 403, in asimilarity_search
    return await run_in_executor(None, self.similarity_search, query, k=k, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\langchain_core\runnables\config.py", line 514, in run_in_executor
    return await asyncio.get_running_loop().run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\scoop\persist\rye\py\[email protected]\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\langchain_postgres\vectorstores.py", line 547, in similarity_search
    return self.similarity_search_by_vector(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\langchain_postgres\vectorstores.py", line 953, in similarity_search_by_vector
    docs_and_scores = self.similarity_search_with_score_by_vector(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\langchain_postgres\vectorstores.py", line 595, in similarity_search_with_score_by_vector
    results = self.__query_collection(embedding=embedding, k=k, filter=filter)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\langchain_postgres\vectorstores.py", line 931, in __query_collection
    .all()
     ^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\orm\query.py", line 2673, in all
    return self._iter().all()  # type: ignore
           ^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\orm\query.py", line 2827, in _iter
    result: Union[ScalarResult[_T], Result[_T]] = self.session.execute(
                                                  ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\orm\session.py", line 2306, in execute
    return self._execute_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\orm\session.py", line 2191, in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\orm\context.py", line 293, in orm_execute_statement
    result = conn.execute(
             ^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1422, in execute
    return meth(
           ^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\sql\elements.py", line 514, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1644, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1850, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1990, in _exec_single_context
    self._handle_dbapi_exception(
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 2357, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1971, in _exec_single_context
    self.dialect.do_execute(
  File "C:\Users\Sachin_Bhat\Documents\dev\package\rag-fusion\.venv\Lib\site-packages\sqlalchemy\engine\default.py", line 919, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column langchain_pg_embedding.id does not exist
LINE 1: SELECT langchain_pg_embedding.id AS langchain_pg_embedding_i...
               ^

[SQL: SELECT langchain_pg_embedding.id AS langchain_pg_embedding_id, langchain_pg_embedding.collection_id AS langchain_pg_embedding_collection_id, langchain_pg_embedding.embedding AS langchain_pg_embedding_embedding, langchain_pg_embedding.document AS langchain_pg_embedding_document, langchain_pg_embedding.cmetadata AS langchain_pg_embedding_cmetadata, langchain_pg_embedding.embedding <=> %(embedding_1)s AS distance
FROM langchain_pg_embedding JOIN langchain_pg_collection ON langchain_pg_embedding.collection_id = langchain_pg_collection.uuid
WHERE langchain_pg_embedding.collection_id = %(collection_id_1)s::UUID ORDER BY distance ASC
 LIMIT %(param_1)s]
[parameters: {'embedding_1': '[-0.034000833,0.02298648,0.028531462,0.045393255,0.026792353,0.0124447085,0.023276329,-0.05630679,0.026212651,0.028758302,0.050030876,-0.013433984,0. ... (12440 characters truncated) ... 035916373,-0.014605992,-0.007996119,-0.032387745,-0.0018131468,-0.042066265,0.04504039,0.023805624,0.021259973,-0.033673175,0.028682688,-0.020919712]', 'collection_id_1': UUID('6de842c7-565c-42bf-964b-c55ecd578a96'), 'param_1': 4}]
(Background on this error at: https://sqlalche.me/e/20/f405)

Cheers,
Sachin

Nested Metadata filtering broken

I used to filter metadata with the similarity_search_with_relevance_scores method. The filter I use is: {'file_metadata': {'modified': {'$between': [datetime.date(2022, 3, 21), datetime.date(2024, 5, 21)]}}, 'source': 'internet'}.

I would expect the filter to be working, since it reflects my metadata in postgres, which has the following format:
"source": "xyz", "content": null, "file_url": "https://xyz.pdf", "filename": "xyz.pdf", "mime_type": "mime_type", "file_metadata": { "title": "Test", "author": "Santa Clause", "created": "2024-04-23T08:38:54", "subject": "", "version": "", "category": "", "comments": "", "keywords": "", "language": "", "modified": "2024-04-23T08:38:54", "revision": 188, "identifier": "", "last_printed": "", "content_status": "", "last_modified_by": "elv" }, ....

That was working before, however now i get the error ValueError: Invalid operator: modified. Expected one of {'$between', '$ilike', '$lt', '$gt', '$gte', '$lte', '$ne', '$or', '$and', '$exists', '$in', '$eq', '$like', '$nin'}

Am I missing something on how to create the filters or is the feature broken?

Current Versions:
langchain~=0.1.16
langchain-community~=0.0.36
langchain-core~=0.1.52
langchain-openai~=0.0.8
langchain-postgres~=0.0.6

PostgresChatMessageHistory API breaks the usage of LCEL and LangServe

The constructor of PostgresChatMessageHistory accepts only a sync or async connection, not an Engine.

The connection is opened in get_session_history and stays pending for the whole life cycle of RunnableWithMessageHistory.

So it's impossible to use it in a "singleton" approach.

With LangServe, you must declare the API with:

add_routes(
    app,
    chain,
    path="/",
)

You must have a global variable chain.

chat_chain = RunnableWithMessageHistory(
    _context_and_question | retriever_chain,
    get_session_history=get_session_history, # <-- here
    input_messages_key="question",
    history_messages_key="chat_history",
)

The get_session_history callback cannot be async, so the async-connection variant of PostgresChatMessageHistory cannot be used.

PostgresChatMessageHistory must be usable in singleton form to work with LangServe. It should accept an engine, without an open connection.
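A minimal sketch of what works within the current API, assuming psycopg3: one module-level sync connection shared by a sync factory that RunnableWithMessageHistory can call per session. Holding the connection open for the process lifetime is exactly the constraint described above.

import psycopg
from langchain_postgres import PostgresChatMessageHistory

sync_connection = psycopg.connect("postgresql://user:pass@localhost/db")  # placeholder DSN

def get_session_history(session_id: str) -> PostgresChatMessageHistory:
    # one lightweight history object per session, all sharing the connection
    return PostgresChatMessageHistory(
        "chat_history", session_id, sync_connection=sync_connection
    )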

Feature Request: Support for Read-Only Endpoints in PG Vector Vector Store

Issue Description:

As of now, the vector store for PG Vector requires the usage of a read and write endpoint. However, there are situations where read-only functionality is sufficient. Introducing support for read-only endpoints in the vector store would provide the following benefits:

  1. Least Privilege Permissions:

    • Allowing the use of read-only endpoints enables the principle of least privilege for the database user needed for the vector store. Currently, all instances of the vector store require strong permissions because the vector store always checks the existence of the extension and more. A read-only instance would not need such elevated permissions, enhancing security.
  2. Improved Scalability and Reduced Latency:

    • PostgreSQL databases typically provide one read-write endpoint and multiple read-only endpoints. Enabling a read-only vector store would allow for more efficient resource usage by distributing the load across multiple read-only endpoints. This can help in scaling out and reducing latency for read operations.

Proposed Change:

Modify the vector store initialization to support read-only endpoints by allowing conditional creation and checking of the vector extension and collection.

Current Code Reference:

self.create_tables_if_not_exists()
self.create_collection()
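Until then, a sketch of the closest available option, assuming the extension and tables already exist: skip extension creation so a less-privileged role can be used. Note that table creation still runs at init, which is exactly the gap this request describes.

readonly_store = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection="postgresql+psycopg://readonly_user:pass@replica-host/db",  # placeholder DSN
    use_jsonb=True,
    create_extension=False,  # do not attempt CREATE EXTENSION at init
)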

langchain-postgres on conda-forge

I was trying to check if langchain-postgres was available on conda-forge but could not find it. Requesting a possible offering on conda-forge.

Cheers,
Sachin

Error while indexing

Hello,

I was getting the following error while indexing my embeddings into PGVector. It did not throw such errors when I was using the old langchain-community implementation. Wondering if the database schema created by the new implementation has some changes. Any clarification on how to go about this would be very helpful.

[SQL: INSERT INTO langchain_pg_embedding (id, collection_id, embedding, document, cmetadata) VALUES (%(id_m0)s::VARCHAR, %(collection_id_m0)s::UUID, %(embedding_m0)s, %(document_m0)s::VARCHAR, %(cmetadata_m0)s::JSONB) ON CONFLICT (id) DO UPDATE SET embedding = excluded.embedding, document = excluded.document, cmetadata = excluded.cmetadata][parameters: {'id_m0': 'eb11c383-782a-491d-925d-fce35340e169', 'collection_id_m0': UUID('bc218a7a-3b99-4685-aabf-4800bd6fe6f0'), 'embedding_m0': '[0.0012261948,-0.0032268283,-0.021794429,0.028127616,0.025112469,0.00040851647,-0.010263896,-0.00914182,0.035575997,0.07880689,0.00234741,0.033373147 ... (12471 characters truncated) ... 018,-0.03224419,0.011916031,0.005445165,0.004877243,-0.06553473,-0.051298827,0.0001616641,0.07010564,0.040146906,0.021629214,-0.01095917,0.010195057]', 'document_m0': "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur vitae ex in ante fermentum tincidunt. Donec et velit molestie, faucibus sapien vel, viverra nunc. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Praesent rhoncus lacinia semper. Vestibulum non magna ac est egestas tempor. Suspendisse tempor mi nec vestibulum egestas. Vivamus a efficitur nisi, eget molestie nibh. Praesent at lorem lacus. Vivamus id nisl efficitur, faucibus justo nec, varius arcu. Praesent elementum sem euismod finibus commodo. In imperdiet tincidunt metus eget suscipit. Curabitur consectetur mi vitae sapien placerat mattis. Phasellus pellentesque ante id felis congue, tempus. \n", 'cmetadata_m0': Jsonb({'chunk_index': 5, 'id': 'eb11c383- ... (4083 chars))}](Background on this error at: https://sqlalche.me/e/20/f405)

Cheers,
Sachin

Implementation Suggestion: method for creating a connection string from db params

This is the current implementation of this method:

def connection_string_from_db_params(
    cls,
    driver: str,
    host: str,
    port: int,
    database: str,
    user: str,
    password: str,
) -> str:
    """Return connection string from database parameters."""
    if driver != "psycopg":
        raise NotImplementedError("Only psycopg3 driver is supported")
    return f"postgresql+{driver}://{user}:{password}@{host}:{port}/{database}"

But the above implementation is not handling the password properly, i.e. if the password contains special characters they are not escaped to their corresponding URL-encoded values.

Suggestion: we can use SQLAlchemy's URL:

from sqlalchemy import URL

connection_string = URL.create(
    drivername=...,
    database=...,
    username=...,
    password=...,
    host=...,
    port=...,
)
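A runnable sketch of that suggestion: URL.create() escapes special characters in the password, and render_as_string() turns the URL object back into the string DSN the rest of the API expects (values below are placeholders).

from sqlalchemy import URL

url = URL.create(
    drivername="postgresql+psycopg",
    username="user",
    password="p@ss:word#with/specials",  # escaped automatically
    host="localhost",
    port=5432,
    database="mydb",
)
connection_string = url.render_as_string(hide_password=False)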

Lacking CONTRIBUTING.MD

Hello everyone,

I believe that the project currently lacks instructions on contributing. Would greatly appreciate instructions on setting up Poetry and the test environment. This can also help new contributors write tests and see if they work.

Cheers,
Sachin

AttributeError raised when using "$nin" operator in filter

Hi Expert,
I notice that we support the $nin operator.

But currently an AttributeError is raised in my case:
docs = db.similarity_search(query="this is a doc", k=2, filter={'info': {'$nin': [1, 2, 3]}})
Traceback (most recent call last):
  File "F:\WorkProject\medicines_recommend\.venv\Lib\site-packages\sqlalchemy\sql\elements.py", line 1496, in __getattr__
    return getattr(self.comparator, key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Comparator' object has no attribute 'nin_'. Did you mean: 'in_'?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "F:\WorkProject\medicines_recommend\.venv\Lib\site-packages\langchain_postgres\vectorstores.py", line 909, in __query_collection
    filter_clauses = self._create_filter_clause(filter)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\WorkProject\medicines_recommend\.venv\Lib\site-packages\langchain_postgres\vectorstores.py", line 835, in _create_filter_clause
    return self._handle_field_filter(key, filters[key])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\WorkProject\medicines_recommend\.venv\Lib\site-packages\langchain_postgres\vectorstores.py", line 705, in _handle_field_filter
    return queried_field.nin_([str(val) for val in filter_value])
           ^^^^^^^^^^^^^^^^^^
  File "F:\WorkProject\medicines_recommend\.venv\Lib\site-packages\sqlalchemy\sql\elements.py", line 1498, in __getattr__
    raise AttributeError(
AttributeError: Neither 'BinaryExpression' object nor 'Comparator' object has an attribute 'nin_'. Did you mean: 'in_'?
python-BaseException

Actually, I found the entry point here:

elif operator in {"$nin"}:
    return queried_field.nin_([str(val) for val in filter_value])

but there is no "nin_" in .venv/Lib/site-packages/sqlalchemy/sql/operators.py; however, it provides "not_in":

def not_in(self, other: Any) -> ColumnOperators:
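A sketch of the corresponding one-line patch to the branch quoted above, swapping the nonexistent nin_() for SQLAlchemy's not_in():

# in _handle_field_filter (patch fragment, not standalone code)
elif operator in {"$nin"}:
    return queried_field.not_in([str(val) for val in filter_value])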

AsyncEngine error: using PGVector for embeddings -- cannot insert multiple commands into a prepared statement [SQL: SELECT pg_advisory_xact_lock(1573678846307946496);CREATE EXTENSION IF NOT EXISTS vector]

My test code is below:
-----test code-------

async def main():
    """load origin documents"""
    docs = [
        Document(
            page_content="there are cats in the pond",
            metadata={"id": 1, "location": "pond", "topic": "animals"},
        ),
        Document(
            page_content="ducks are also found in the pond",
            metadata={"id": 2, "location": "pond", "topic": "animals"},
        ),
        Document(
            page_content="fresh apples are available at the market",
            metadata={"id": 3, "location": "market", "topic": "food"},
        ),
        Document(
            page_content="the market also sells fresh oranges",
            metadata={"id": 4, "location": "market", "topic": "food"},
        ),
        Document(
            page_content="the new art exhibit is fascinating",
            metadata={"id": 5, "location": "museum", "topic": "art"},
        ),
        Document(
            page_content="a sculpture exhibit is also at the museum",
            metadata={"id": 6, "location": "museum", "topic": "art"},
        ),
        Document(
            page_content="a new coffee shop opened on Main Street",
            metadata={"id": 7, "location": "Main Street", "topic": "food"},
        ),
        Document(
            page_content="the book club meets at the library",
            metadata={"id": 8, "location": "library", "topic": "reading"},
        ),
        Document(
            page_content="the library hosts a weekly story time for kids",
            metadata={"id": 9, "location": "library", "topic": "reading"},
        ),
        Document(
            page_content="a cooking class for beginners is offered at the community center",
            metadata={"id": 10, "location": "community center", "topic": "classes"},
        ),
    ]

    """split origin documents into chunks"""
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)

    """embedding"""
    model_name = "BAAI/bge-small-en"
    model_kwargs = {"device": "cpu"}
    encode_kwargs = {"normalize_embeddings": True}
    embeddings = HuggingFaceBgeEmbeddings(
        model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
    )

    vectorstore = PGVector(
        connection=async_engine,
        embeddings=embeddings,
        collection_name="my_doc",
        use_jsonb=True,
        async_mode=True,
    )
    """store the vector data into vector database"""
    await vectorstore.aadd_documents(docs, ids=[doc.metadata["id"] for doc in docs])

    """retrieve documents"""
    vectorstore.as_retriever()

if __name__ == "__main__":
    asyncio.run(main())

----raised error like below------
2024-07-08 17:01:30,124 INFO sqlalchemy.engine.Engine select pg_catalog.version()
2024-07-08 17:01:30,124 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-07-08 17:01:30,127 INFO sqlalchemy.engine.Engine select current_schema()
2024-07-08 17:01:30,127 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-07-08 17:01:30,130 INFO sqlalchemy.engine.Engine show standard_conforming_strings
2024-07-08 17:01:30,130 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-07-08 17:01:30,132 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-07-08 17:01:30,132 INFO sqlalchemy.engine.Engine SELECT pg_advisory_xact_lock(1573678846307946496);CREATE EXTENSION IF NOT EXISTS vector;
2024-07-08 17:01:30,132 INFO sqlalchemy.engine.Engine [generated in 0.00009s] ()
2024-07-08 17:01:30,135 INFO sqlalchemy.engine.Engine ROLLBACK
Traceback (most recent call last):
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 514, in _prepare_and_execute
prepared_stmt, attributes = await adapt_connection._prepare(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 760, in _prepare
prepared_stmt = await self._connection.prepare(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/asyncpg/connection.py", line 636, in prepare
return await self._prepare(
^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/asyncpg/connection.py", line 654, in _prepare
stmt = await self._get_statement(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/asyncpg/connection.py", line 433, in _get_statement
statement = await self._protocol.prepare(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "asyncpg/protocol/protocol.pyx", line 166, in prepare
asyncpg.exceptions.PostgresSyntaxError: cannot insert multiple commands into a prepared statement

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
self.dialect.do_execute(
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
cursor.execute(statement, parameters)
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 572, in execute
self._adapt_connection.await_(
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 132, in await_only
return current.parent.switch(awaitable) # type: ignore[no-any-return,attr-defined] # noqa: E501
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
value = await result
^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 550, in _prepare_and_execute
self._handle_exception(error)
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 501, in _handle_exception
self._adapt_connection._handle_exception(error)
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 784, in _handle_exception
raise translated_error from error
sqlalchemy.dialects.postgresql.asyncpg.AsyncAdapt_asyncpg_dbapi.ProgrammingError: <class 'asyncpg.exceptions.PostgresSyntaxError'>: cannot insert multiple commands into a prepared statement

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/src/rag_pipeline/knowledge_base/service/indexing.py", line 81, in
asyncio.run(main())
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/src/rag_pipeline/knowledge_base/service/indexing.py", line 75, in main
await vectorstore.aadd_documents(docs, ids=[doc.metadata["id"] for doc in docs])
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/langchain_core/vectorstores.py", line 218, in aadd_documents
return await self.aadd_texts(texts, metadatas, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 871, in aadd_texts
await self.apost_init() # Lazy async init
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 462, in apost_init
await self.acreate_vector_extension()
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 483, in acreate_vector_extension
await conn.run_sync(_create_vector_extension)
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/ext/asyncio/engine.py", line 886, in run_sync
return await greenlet_spawn(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 201, in greenlet_spawn
result = context.throw(*sys.exc_info())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 241, in _create_vector_extension
conn.execute(statement)
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
return meth(
^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
return self._exec_single_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
self._handle_dbapi_exception(
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2353, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
self.dialect.do_execute(
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
cursor.execute(statement, parameters)
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 572, in execute
self._adapt_connection.await_(
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 132, in await_only
return current.parent.switch(awaitable) # type: ignore[no-any-return,attr-defined] # noqa: E501
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
value = await result
^^^^^^^^^^^^
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 550, in _prepare_and_execute
self._handle_exception(error)
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 501, in _handle_exception
self._adapt_connection._handle_exception(error)
File "/Users/yuxinlei/Documents/workspace/coe_workspace/raise-cn-platform/.venv/lib/python3.12/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 784, in _handle_exception
raise translated_error from error
sqlalchemy.exc.ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.PostgresSyntaxError'>: cannot insert multiple commands into a prepared statement
[SQL: SELECT pg_advisory_xact_lock(1573678846307946496);CREATE EXTENSION IF NOT EXISTS vector;]
(Background on this error at: https://sqlalche.me/e/20/f405)

Process finished with exit code 1

psycopg module version error

I was trying out the langchain_postgres module today, using the PostgresChatMessageHistory class. I started to get the error below when I used the example code given in the README file.

Traceback (most recent call last):
File "/home/eranga/learn/learn-langgraph/brandon-labs/chat_models/02_chat_model_conversation_bot.py", line 44, in
PostgresChatMessageHistory.create_tables(sync_connection, chat_history_table_name)
File "/home/eranga/learn/learn-langgraph/langgraph-env/lib/python3.12/site-packages/langchain_postgres/chat_message_histories.py", line 218, in create_tables
cursor.execute(query)
TypeError: argument 1 must be a string or unicode object: got Composed instead

Then I looked at the module that threw the error. It was basically saying that cursor.execute(query) was expecting a string. Then I looked at the module's functions and noticed that the imported library was psycopg. I manually changed that to psycopg2, and the program then worked as expected.

Support array comparisons

Define filters and semantics for array comparisons.

Feel free to use this PR as reference: #59

Array comparisons may involve either an "all" or an "any" condition.

Requirements

  • Define a clear API to allow swapping between all vs. any
  • Do not break existing ability to do a membership check with in

Only pick up work on this if you have and use PostgreSQL.

Rows returned from similarity_search with filter are fewer than expected

Hi, thanks in advance for this amazing project!

The problem I'm facing is that the rows returned from similarity_search are always fewer than expected.
For example, I have 100 rows matching the filter's condition in the table langchain_pg_embedding. By executing:

vector_store = PGVector(
    connection=db_str,
    embeddings=embedding_model,
    distance_strategy=DistanceStrategy.EUCLIDEAN,
)
params = {'query': 'query_context', 'k': 100, 'filter': {'user_name': 'user_1'}}
docs = self._vector_store.similarity_search(**params)

I expected the length of docs to be 100. However, depending on the query_context, the length of docs varies but is always less than 100, like 43 or 67.

So I enabled logging for sqlalchemy.engine, and got the SQL it generated:

2024-07-25 20:21:47,269 - sqlalchemy.engine.Engine - INFO - SELECT langchain_pg_embedding.id AS langchain_pg_embedding_id, langchain_pg_embedding.collection_id AS langchain_pg_embedding_collection_id, langchain_pg_embedding.embedding AS langchain_pg_embedding_embedding, langchain_pg_embedding.document AS langchain_pg_embedding_document, langchain_pg_embedding.cmetadata AS langchain_pg_embedding_cmetadata, langchain_pg_embedding.embedding <-> %(embedding_1)s AS distance
FROM langchain_pg_embedding JOIN langchain_pg_collection ON langchain_pg_embedding.collection_id = langchain_pg_collection.uuid
WHERE langchain_pg_embedding.collection_id = %(collection_id_1)s::UUID AND jsonb_path_match(langchain_pg_embedding.cmetadata, CAST(%(param_1)s AS JSONPATH), CAST(%(param_2)s::JSONB AS JSONB)) ORDER BY distance ASC
 LIMIT %(param_3)s::INTEGER
2024-07-25 20:21:47,269 - sqlalchemy.engine.Engine - INFO - [generated in 0.00126s] {'embedding_1': '[0.002203210722655058,-0.0034537874162197113,-0.04540061205625534,0.016271447762846947,0.03849130868911743,0.035241544246673584,0.013832494616508484, ... (21402 characters truncated) ... .03421637415885925,-0.03275654464960098,-0.0023437547497451305,0.0021014492958784103,-0.01060000341385603,0.008099209517240524,0.0024598073214292526]', 'collection_id_1': UUID('24b25f49-4e90-4073-9b4b-f5fd91b83c61'), 'param_1': '$.user_name == $value', 'param_2': Jsonb({'value': 'user_1'}), 'param_3': 100}

After replace parameters into sql and execute it in pgAdmin, 100 rows are returned.

Then I dug into the code and tried commenting out the filter in the SQLAlchemy query in vectorstores.py:

results: List[Any] = (
    session.query(
        self.EmbeddingStore,
        self.distance_strategy(embedding).label("distance"),
    )
    # .filter(*filter_by)  # this line is commented out
    .order_by(sqlalchemy.asc("distance"))
    .join(
        self.CollectionStore,
        self.EmbeddingStore.collection_id == self.CollectionStore.uuid,
    )
    .limit(k)
    .all()
)

All 100 rows are returned.

I'm not familiar with SQLAlchemy, so I have no clues at all. Could you please take a look at what's happening here? Am I using the search function in the wrong way?

Thanks!

`CREATE EXTENSION` fails with Azure Cosmos DB for PostgreSQL

The SQL for enabling the vector extension is hardcoded at

"CREATE EXTENSION IF NOT EXISTS vector;"

But when using Azure Cosmos DB for PostgreSQL, it's necessary to use SELECT create_extension('vector');

From relevant docs:

If CREATE EXTENSION fails with a permission denied error, try the create_extension() function instead. For instance:

SELECT create_extension('postgis');

To remove an extension installed this way, use drop_extension().

The same applies to pgvector in the same context.

I might propose a parameter that allows the user to define the create extension statement, or choose from a list of predefined options if that approach seems better. Currently, of course, the user can create the extension first and then use create_extension=False.
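A sketch of that current workaround, assuming psycopg3: create the extension through Cosmos DB's create_extension() helper first, then tell PGVector not to run CREATE EXTENSION itself.

import psycopg

with psycopg.connect(conninfo) as conn:  # conninfo is a placeholder DSN
    conn.execute("SELECT create_extension('vector')")

store = PGVector(
    embeddings=embeddings,
    collection_name="my_docs",
    connection=connection,
    use_jsonb=True,
    create_extension=False,  # extension already created above
)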
