
[BUG] failed to Create documents: failed to insert documents: write tcp 172.18.0.4:48130->172.18.0.3:5432: write: connection reset by peer (zep, 12 comments, closed)

getzep commented on August 30, 2024
[BUG] failed to Create documents: failed to insert documents: write tcp 172.18.0.4:48130->172.18.0.3:5432: write: connection reset by peer

from zep.

Comments (12)

byronz3d commented on August 30, 2024

I am creating a large index using ZepVectorStore with this code:

```python
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores import ZepVectorStore

llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0.1)
service_context = ServiceContext.from_defaults(chunk_size=512, llm=llm)

zep_api_url = "http://localhost:9000"
collection_name = "collection_name"

vector_store = ZepVectorStore(
    api_url=zep_api_url, collection_name=collection_name, embedding_dimensions=1536
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# `documents` is the list of llama_index Document objects loaded earlier
index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True,
    service_context=service_context,
    storage_context=storage_context,
)
```

In the Zep Docker logs:

```
zep-postgres | 2023-10-06 02:41:38.844 UTC [62] LOG:  invalid message length
zep    | time="2023-10-06T02:41:38Z" level=error msg="failed to Create documents: failed to insert documents: write tcp 172.18.0.4:48130->172.18.0.3:5432: write: connection reset by peer"
zep    | time="2023-10-06T02:41:38Z" level=info msg="http://localhost:9000/api/v1/collection/collection_name/document" bytes=133 category=router duration=79512706290 duration_display=1m19.512710596s method=POST proto=HTTP/1.1 remote_ip=172.18.0.1 status_code=500
```

At first it saved the index to Postgres without a problem, but after I added a few more documents to the index, it won't save at all.

Can you point me in the right direction on how to solve this issue?

Thanks in advance.

danielchalef commented on August 30, 2024

Your Postgres database is running out of memory and crashing. Please see: https://docs.getzep.com/deployment/production/#database-requirements

byronz3d commented on August 30, 2024

I still have the same issue.

On a 32 GB cloud VPS, I added the Postgres command settings to docker-compose.yaml, with 25GB for maintenance_work_mem, but it still fails with the same error:

```yaml
version: "3.7"
services:
  db:
    image: ghcr.io/getzep/postgres:latest
    container_name: zep-postgres
    restart: on-failure
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    command: |
      postgres
        -c maintenance_work_mem=25GB
        -c max_parallel_maintenance_workers=1000
        -c work_mem=10GB
        -c shared_buffers=5GB
```

When I run `show all;` in psql, I can see my settings applied:

```
postgres=# show all;

maintenance_work_mem | 25GB
max_parallel_maintenance_workers | 1000
work_mem | 10GB
```

How much memory should I have for creating the index? And are there other settings I should consider changing?

Thanks @danielchalef in advance!

danielchalef commented on August 30, 2024

Would you please share your Postgres logs?

A max_parallel_maintenance_workers of 1000 is far too high. Each worker requests memory and CPU: at worst you'll OOM, and at best you'll have significant contention over the CPU, resulting in very poor performance.

You may find this tool useful: https://pgtune.leopard.in.ua/
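For illustration, pgtune-style output for a dedicated 64 GB Postgres host typically looks something like the fragment below. These values are examples of the tool's general shape, not a Zep recommendation; plug your own hardware profile into pgtune for real numbers.

```yaml
# Hypothetical docker-compose override; pgtune-style example values for 64 GB RAM
command: |
  postgres
    -c shared_buffers=16GB               # roughly 25% of RAM
    -c effective_cache_size=48GB         # roughly 75% of RAM
    -c maintenance_work_mem=2GB          # pgtune caps this well below total RAM
    -c work_mem=64MB                     # allocated per sort/hash, per connection
    -c max_parallel_maintenance_workers=4
```

Note that work_mem is allocated per operation per connection, so very large values (like the 10GB above) multiply quickly under concurrency.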

byronz3d commented on August 30, 2024

I used the pgtune tool. I now have a dedicated 64 GB RAM server with pgtune's suggested settings, but I still got the error. This is what Postgres logged:

```
zep-postgres | 2023-10-07 20:51:22.960 UTC [61] LOG:  statement: SELECT "dc"."uuid", "dc"."created_at", "dc"."updated_at", "dc"."name", "dc"."description", "dc"."metadata", "dc"."table_name", "dc"."embedding_model_name", "dc"."embedding_dimensions", "dc"."is_auto_embedded", "dc"."distance_function", "dc"."is_normalized", "dc"."is_indexed", "dc"."index_type", "dc"."list_count", "dc"."probe_count", "dc"."document_count", "dc"."document_embedded_count" FROM "document_collection" AS "dc" WHERE (name = 'smconfidential')
zep-postgres | 2023-10-07 20:51:22.960 UTC [61] LOG:  statement: SELECT count(*) as document_count, COUNT(*) FILTER (WHERE is_embedded) as document_embedded_count FROM "docstore_smconfidential_1536"
zep-postgres | 2023-10-07 20:51:22.961 UTC [61] LOG:  statement: SELECT "dc"."uuid", "dc"."created_at", "dc"."updated_at", "dc"."name", "dc"."description", "dc"."metadata", "dc"."table_name", "dc"."embedding_model_name", "dc"."embedding_dimensions", "dc"."is_auto_embedded", "dc"."distance_function", "dc"."is_normalized", "dc"."is_indexed", "dc"."index_type", "dc"."list_count", "dc"."probe_count", "dc"."document_count", "dc"."document_embedded_count" FROM "document_collection" AS "dc" WHERE (name = 'smconfidential')
zep-postgres | 2023-10-07 20:51:22.962 UTC [61] LOG:  statement: SELECT count(*) as document_count, COUNT(*) FILTER (WHERE is_embedded) as document_embedded_count FROM "docstore_smconfidential_1536"
zep-postgres | 2023-10-07 20:51:41.707 UTC [61] LOG:  invalid message length
zep    | time="2023-10-07T20:51:41Z" level=error msg="failed to Create documents: failed to insert documents: write tcp 172.25.0.4:58780->172.25.0.2:5432: write: connection reset by peer"
zep    | time="2023-10-07T20:51:41Z" level=info msg="http://localhost:9000/api/v1/collection/smconfidential/document" bytes=133 category=router duration=64183473814 duration_display=1m4.183474414s method=POST proto=HTTP/1.1 remote_ip=172.25.0.1 status_code=500
```

danielchalef commented on August 30, 2024

How large are your document batches and how large are the documents you're uploading?

byronz3d commented on August 30, 2024

It's a combination of about 200 PDF files and 3,500+ JSON text files. Some of the PDFs are 5-8 MB each.

When I save the index to disk instead, the vector_store JSON file is about 2.7 GB.
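A side note on the `invalid message length` log line above: PostgreSQL rejects frontend protocol messages above roughly 1 GiB, so if a client tries to insert all the embedded chunks in a single request, a payload anywhere near the 2.7 GB figure can be refused at the protocol level regardless of memory tuning. A quick back-of-envelope check (the sizes mirror the thread, not a measurement):

```python
# Rough check: does a single unbatched insert exceed Postgres's ~1 GiB
# protocol message ceiling? Figures are illustrative.
PG_MAX_MESSAGE = 1 << 30      # ~1 GiB protocol message limit
payload_bytes = int(2.7e9)    # ~2.7 GB serialized vector store

print(payload_bytes > PG_MAX_MESSAGE)             # True: one giant insert can't fit
batch_limit = 500 * 1024 * 1024                   # e.g. cap each request at ~500 MB
print(-(-payload_bytes // batch_limit))           # 6 batches would be needed
```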

danielchalef commented on August 30, 2024

How are you chunking the documents? And are you batching your uploads?

byronz3d commented on August 30, 2024

I am using llama_index, if that helps, with a modified version of its RemoteReader and JsonReader. I'm not sure what batching my uploads means.

This is a simplified part of the code I have, taken from the llama_index examples:

```python
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0.2)
service_context = ServiceContext.from_defaults(chunk_size=512, llm=llm)

# `documents` is the list of Document objects produced by the readers
index = VectorStoreIndex.from_documents(documents, show_progress=True, service_context=service_context)
index.storage_context.persist(persist_dir='./storage')
```

This saves the index in the storage dir.

For the Zep API, I changed it to:

```python
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores import ZepVectorStore

zep_api_url = "http://localhost:9000"
collection_name = "smconfidential"

vector_store = ZepVectorStore(
    api_url=zep_api_url, collection_name=collection_name, embedding_dimensions=1536
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0.2)
service_context = ServiceContext.from_defaults(chunk_size=512, llm=llm)

index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True,
    service_context=service_context,
    storage_context=storage_context,
)
index.storage_context.persist(persist_dir='./storage')
```

Sorry if I'm not answering the question properly. I am pretty new to Zep, LlamaIndex and chatbots in general, so I don't understand all the processes/terminology yet.

Thanks again.

danielchalef commented on August 30, 2024

I've updated the zep-python client to do the document batching. Would you please do the following:

Update Zep:

```shell
docker compose down
docker compose pull
docker compose up
```

Update zep-python:

```shell
pip install -U zep-python
```

or use the Python package manager of your choice.
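For anyone else hitting this: "batching" here just means splitting the document list into fixed-size groups and uploading each group as a separate request, so no single Postgres message grows too large. The updated client handles this internally; the sketch below only illustrates the idea (`batched` and the commented-out upload call are illustrative names, not zep-python API):

```python
from typing import Iterator, List

def batched(items: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield successive batch_size-sized slices of items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Illustrative usage: upload 3500 documents 500 at a time instead of all at once.
documents = [{"content": f"doc {i}"} for i in range(3500)]
for batch in batched(documents, batch_size=500):
    pass  # e.g. send `batch` to the server with your client's add-documents call
```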

byronz3d commented on August 30, 2024

It seems to work when building the index! Thank you @danielchalef!

danielchalef commented on August 30, 2024

Great to hear!

