opea-project / genaiexamples

Generative AI Examples is a collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.

Home Page: https://opea.dev

License: Apache License 2.0

Python 2.30% Dockerfile 1.60% Shell 27.94% JavaScript 4.46% HTML 1.23% CSS 1.64% Svelte 37.77% TypeScript 19.29% SCSS 3.76%

copilot genai rag summarization chatqna codegen gaudi2 llms tgi xeon

genaiexamples's Introduction

Generative AI Examples

Introduction

GenAIComps-based Generative AI examples offer streamlined deployment, testing, and scalability. All examples are fully compatible with Docker and Kubernetes and support a wide range of hardware platforms, such as Gaudi, Xeon, and others.

Architecture

GenAIComps is a service-based tool that includes microservice components such as llm, embedding, reranking, and so on. Using these components, the various examples in GenAIExamples can be constructed, including ChatQnA, DocSum, etc.

GenAIInfra, part of the OPEA containerization and cloud-native suite, enables quick and efficient deployment of GenAIExamples in the cloud.

GenAIEval measures service performance metrics such as throughput, latency, and accuracy for GenAIExamples. This feature helps users compare performance across various hardware configurations easily.

Getting Started

GenAIExamples offers flexible deployment options that cater to different user needs, enabling efficient use and deployment in various environments. Here’s a brief overview of the three primary methods: Python startup, Docker Compose, and Kubernetes.

Docker Compose: Check the released Docker images in the docker image list for detailed information.
Kubernetes: Follow the steps at K8s Install and GMC Install to set up the k8s and GenAI environment.

Users can choose the most suitable approach based on ease of setup, scalability needs, and the environment in which they are operating.

Deployment

Use Case	Docker Compose Deployment on Xeon	Docker Compose Deployment on Gaudi	Kubernetes Deployment
ChatQnA	Xeon Instructions	Gaudi Instructions	K8s Instructions
CodeGen	Xeon Instructions	Gaudi Instructions	K8s Instructions
CodeTrans	Xeon Instructions	Gaudi Instructions	K8s Instructions
DocSum	Xeon Instructions	Gaudi Instructions	K8s Instructions
SearchQnA	Xeon Instructions	Gaudi Instructions	K8s Instructions
FaqGen	Xeon Instructions	Gaudi Instructions	K8s Instructions
Translation	Xeon Instructions	Gaudi Instructions	K8s Instructions
AudioQnA	Xeon Instructions	Gaudi Instructions	K8s Not Supported
VisualQnA	Xeon Instructions	Gaudi Instructions	K8s Not Supported

Supported Examples

Check here for detailed information on supported examples, models, hardware, etc.

genaiexamples's Issues

Docker proxy settings

Many docs in this repo instruct users to pass HTTP/HTTPS proxies on the Docker build command line:

$ git grep -e "--build-arg.*https*_proxy=" | wc -l
58

IMHO it would be better to just specify them once in Docker config, like this:

$ cat ~/.docker/config.json
...
	"proxies": {
		"default": {
			"httpProxy": "http://proxy-chain.foobar.com:911",
			"httpsProxy": "http://proxy-chain.foobar.com:911",
			"noProxy": "localhost,127.0.0.1/8,::1,192.168.0.0/16,10.244.0.0/16"
		}
	}
}
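
A minimal sketch of how that one-time setup could be scripted, assuming the `proxies.default` layout shown above. The helper name `with_default_proxies` is hypothetical, the proxy URL is the placeholder from this issue, and the starting config is an in-memory stand-in rather than a real `~/.docker/config.json`.

```python
import json


def with_default_proxies(config: dict, http_proxy: str, https_proxy: str, no_proxy: str) -> dict:
    """Return a copy of a Docker client config with proxies.default filled in."""
    merged = dict(config)  # shallow copy; other top-level keys are preserved
    proxies = dict(merged.get("proxies", {}))
    proxies["default"] = {
        "httpProxy": http_proxy,
        "httpsProxy": https_proxy,
        "noProxy": no_proxy,
    }
    merged["proxies"] = proxies
    return merged


# Hypothetical existing config; a real one would be loaded from ~/.docker/config.json.
existing = {"auths": {}}
updated = with_default_proxies(
    existing,
    "http://proxy-chain.foobar.com:911",
    "http://proxy-chain.foobar.com:911",
    "localhost,127.0.0.1,::1",
)
print(json.dumps(updated, indent=2))
```

The sketch prints the merged config instead of writing it, so the result can be reviewed before it replaces the real file.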

ChatQnA readme issues

Hi, I followed ChatQnA Application readme and encountered some problems. I started with code from the master branch.

In the step Start the Backend Service, any request to the streaming endpoint returns an error:

INFO:     Application startup complete.
	INFO:     Uvicorn running on http://0.0.0.0:8000/ (Press CTRL+C to quit)
	INFO:     <IP>:50773 - "OPTIONS /v1/rag/chat_stream HTTP/1.1" 200 OK
	[rag - chat_stream] POST request: /v1/rag/chat_stream, params:{'query': 'What is the total revenue of Nike in 2023?"', 'knowledge_base_id': 'default'}
	[rag - chat_stream] history: []
	[rag - reload retriever] reload with index: rag-redis
	INFO:     <IP>:50773 - "POST /v1/rag/chat_stream HTTP/1.1" 200 OK
	score_threshold is deprecated. Use distance_threshold instead.score_threshold should only be used in similarity_search_with_relevance_scores.score_threshold will be removed in a future release.
	Metadata key source not found in metadata. Setting to None.
	Metadata fields defined for this instance: ['source', 'start_index']
	[...]
	Metadata key start_index not found in metadata. Setting to None.
	Metadata fields defined for this instance: ['source', 'start_index']
	ERROR:    Exception in ASGI application
	Traceback (most recent call last):
	  File "/home/user/.local/lib/python3.11/site-packages/starlette/responses.py", line 265, in __call__
	    await wrap(partial(self.listen_for_disconnect, receive))
	  File "/home/user/.local/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
	    await func()
	  File "/home/user/.local/lib/python3.11/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
	    message = await receive()
	              ^^^^^^^^^^^^^^^
	  File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 538, in receive
	    await self.message_event.wait()
	  File "/usr/local/lib/python3.11/asyncio/locks.py", line 213, in wait
	    await fut
	asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f0a23abf1d0
	 
	During handling of the above exception, another exception occurred:
	 
	  + Exception Group Traceback (most recent call last):
	  |   File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
	  |     result = await app(  # type: ignore[func-returns-value]
	  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  |   File "/home/user/.local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
	  |     return await self.app(scope, receive, send)
	  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  |   File "/home/user/.local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
	  |     await super().__call__(scope, receive, send)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
	  |     await self.middleware_stack(scope, receive, send)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
	  |     raise exc
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
	  |     await self.app(scope, receive, _send)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 93, in __call__
	  |     await self.simple_response(scope, receive, send, request_headers=headers)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 148, in simple_response
	  |     await self.app(scope, receive, send)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
	  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
	  |     raise exc
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
	  |     await app(scope, receive, sender)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
	  |     await self.middleware_stack(scope, receive, send)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
	  |     await route.handle(scope, receive, send)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
	  |     await self.app(scope, receive, send)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
	  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
	  |     raise exc
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
	  |     await app(scope, receive, sender)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
	  |     await response(scope, receive, send)
	  |   File "/home/user/.local/lib/python3.11/site-packages/starlette/responses.py", line 258, in __call__
	  |     async with anyio.create_task_group() as task_group:
	  |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
	  |     raise BaseExceptionGroup(
	  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
	  +-+---------------- 1 ----------------
	    | Traceback (most recent call last):
	    |   File "/home/user/.local/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
	    |     await func()
	    |   File "/home/user/.local/lib/python3.11/site-packages/starlette/responses.py", line 252, in stream_response
	    |     chunk = chunk.encode(self.charset)
	    |             ^^^^^^^^^^^^
	    | AttributeError: 'NoneType' object has no attribute 'encode'
	    +------------------------------------

This error makes the Frontend Service unresponsive, because it connects to the broken /v1/rag/chat_stream endpoint.

When I reverted app/server.py to this commit, the streaming endpoint started to work.
It would be useful for the instructions to state the commit or release they were validated with.
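
The traceback ends in Starlette calling `.encode()` on a chunk that is `None`, which suggests the generator behind the streaming response yielded `None` at some point. A minimal sketch of a defensive wrapper follows. It is not the change made in the repository, the name `drop_empty_chunks` is hypothetical, and it is synchronous for brevity (an async generator would need `async for`).

```python
from typing import Iterable, Iterator, Optional


def drop_empty_chunks(chunks: Iterable[Optional[str]]) -> Iterator[str]:
    """Yield only real text chunks, skipping None and empty strings."""
    for chunk in chunks:
        if chunk:  # filters out both None and ""
            yield chunk


# Hypothetical token stream in which one item is None.
tokens = ["Nike", None, " revenue", "", " ..."]
print(list(drop_empty_chunks(tokens)))
```

Wrapping the generator this way only hides the symptom; finding out why a `None` is produced would still be needed.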

The Frontend Service allows adding new data sources ("Please upload your local file or paste a remote file link, and Chat will respond based on the content of the uploaded file."). According to the log, new data sources are ingested correctly:

[rag - create] POST request: /v1/rag/create, filename:architecture-instruction-set-extensions-programming-reference.pdf
[rag - create kb folder] upload path: upload_dir/kb_0f19a9e4/upload_dir, persist path: upload_dir/kb_0f19a9e4/persist_dir
[rag - create] file saved to local path: upload_dir/kb_0f19a9e4/upload_dir/2024-04-17-16:54:11-architecture-instruction-set-extensions-programming-reference.pdf
[rag - create] starting to create local db...
[rag - create retriever] create with index: rag-rediskb_0f19a9e4
[nltk_data] Downloading package punkt to /home/user/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/user/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
`index_schema` does not match generated metadata schema.
If you meant to manually override the schema, please ignore this message.
index_schema: {'text': [{'name': 'content'}, {'name': 'source'}], 'numeric': [{'name': 'start_index'}], 'vector': [{'name': 'content_vector', 'algorithm': 'HNSW', 'datatype': 'FLOAT32', 'dims': 768, 'distance_metric': 'COSINE'}]}
generated_schema: {'text': [{'name': 'source'}], 'numeric': [{'name': 'start_index'}], 'tag': []}

[rag - create] kb created successfully
INFO:     <IP>:51563 - "POST /v1/rag/create HTTP/1.1" 200 OK

[rag - upload_link] POST request: /v1/rag/upload_link, link list:['https://en.wikipedia.org/wiki/American_white_pelican']
[rag - create kb folder] upload path: upload_dir/kb_2ce41686/upload_dir, persist path: upload_dir/kb_2ce41686/persist_dir
[rag - upload_link] starting to create local db...
start fetch %s... https://en.wikipedia.org/wiki/American_white_pelican
`index_schema` does not match generated metadata schema.
If you meant to manually override the schema, please ignore this message.
index_schema: {'text': [{'name': 'content'}, {'name': 'source'}], 'numeric': [{'name': 'start_index'}], 'vector': [{'name': 'content_vector', 'algorithm': 'HNSW', 'datatype': 'FLOAT32', 'dims': 768, 'distance_metric': 'COSINE'}]}
generated_schema: {'text': [{'name': 'source'}, {'name': 'identify_id'}], 'numeric': [], 'tag': []}


[rag - upload_link] kb created successfully
INFO:     <IP>:20643 - "POST /v1/rag/upload_link HTTP/1.1" 200 OK

However, new data sources do not seem to be available for response generation right away. A restart of app/server.py is required before the new information can be used, even though the backend index changes with each uploaded document and is reloaded for each response. Example log line: [rag - reload retriever] reload with index: rag-rediskb_147637c0.
To confirm this problem, I used the working app/server.py from the aforementioned commit; the rest of the code was from the master branch.

Please look into those issues. Thanks!

urllib3.exceptions.MaxRetryError for localhost:4318 in ChatQnA services

I can see that GenAIComps includes newly implemented OpenTelemetry support. Although it appears to be required, the ChatQnA docker_compose includes no OpenTelemetry service, and the "TELEMETRY_ENDPOINT" environment variable is not mentioned anywhere.

Is it a problem if we continue without the OpenTelemetry endpoint?

[k8s ChatQnA] retriever_deploy deployment failed due to "http://tei-embedding-svc.default.svc.cluster.local:6006/**" failed to access

Dear experts:
I followed these steps to run the k8s ChatQnA example:
https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/xeon/README.md
https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/manifests/README.md

The retriever-deploy deployment shows "requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://tei-embedding-svc.default.svc.cluster.local:6006/".

I added the proxy config below to "qna_configmap_xeon.yaml" and made sure that tei-embedding, tei-reranking, and tgi-service successfully downloaded their 3 models.
http_proxy: http://proxy.xxx.xxxxx.com:xxx
https_proxy: http://proxy.xxx.xxxxx.com:xxx
no_proxy: 10.0.0.0/8,192.168.0.0/16,127.0.0.1,localhost,xxxxx.com

I suspect a network or proxy configuration issue. Any suggestions are appreciated.
The logs are below.

NAMESPACE NAME READY STATUS RESTARTS AGE
default chaqna-xeon-backend-server-deploy-79cbbb7b-4q9t6 1/1 Running 0 58m
default embedding-deploy-7bb6df68f5-4k8kr 1/1 Running 0 58m
default llm-deploy-5c85678ddd-sjbsn 1/1 Running 0 58m
default redis-vector-db-6db798f98d-56vqz 1/1 Running 0 58m
default reranking-deploy-87c5bf6cd-r4cph 1/1 Running 0 58m
default retriever-deploy-6bb494f5bd-9ggnd 0/1 CrashLoopBackOff 14 ( ago) 58m
default tei-embedding-service-deploy-78fbbcbf67-xvx9v 1/1 Running 0 58m
default tei-reranking-service-deploy-6cdb544d49-fmfj8 1/1 Running 0 58m
default tgi-service-deploy-78f488ff9f-tlmg8 1/1 Running 0 58m

kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
chaqna-xeon-backend-server-svc NodePort 10.103.16.9 8888:32441/TCP 69m app=chaqna-xeon-backend-server-deploy
embedding-svc ClusterIP 10.104.135.74 6000/TCP 69m app=embedding-deploy
kubernetes ClusterIP 10.96.0.1 443/TCP 6d
llm-svc ClusterIP 10.104.216.23 9000/TCP 69m app=llm-deploy
redis-vector-db ClusterIP 10.103.250.80 6379/TCP,8001/TCP 69m app=redis-vector-db
reranking-svc ClusterIP 10.103.2.220 8000/TCP 69m app=reranking-deploy
retriever-svc ClusterIP 10.110.168.101 7000/TCP 69m app=retriever-deploy
tei-embedding-svc ClusterIP 10.103.168.217 6006/TCP 69m app=tei-embedding-service-deploy
tei-reranking-svc ClusterIP 10.107.35.174 8808/TCP 69m app=tei-reranking-service-deploy
tgi-svc ClusterIP 10.104.172.25 9009/TCP 69m app=tgi-service-deploy

kubectl logs retriever-deploy-6bb494f5bd-9ggnd
Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.
Downloading detection model, please wait. This may take several minutes depending upon your network connection.
Parsing 10k filing doc for NIKE data/nke-10k-2023.pdf
Progress: |███████████████████████████████▒Downloading recognition model, please wait. This may take several minutes depending upon your network connection.
Progress: |██████████████████████████████████████████████████| 100.0% CompleteDone preprocessing. Created 270 chunks of the original pdf
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://tei-embedding-svc.default.svc.cluster.local:6006/

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/comps/retrievers/langchain/redis/ingest.py", line 99, in <module>
    ingest_documents()
  File "/home/user/comps/retrievers/langchain/redis/ingest.py", line 88, in ingest_documents
    _ = Redis.from_texts(
        ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/redis/base.py", line 485, in from_texts
    instance, _ = cls.from_texts_return_keys(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/redis/base.py", line 418, in from_texts_return_keys
    keys = instance.add_texts(texts, metadatas, keys=keys)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/redis/base.py", line 705, in add_texts
    embeddings = embeddings or self._embeddings.embed_documents(list(texts))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/embeddings/huggingface_hub.py", line 95, in embed_documents
    responses = self.client.post(
                ^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/inference/_client.py", line 273, in post
    hf_raise_for_status(response)
  File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 371, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 504 Server Error: Gateway Timeout for url: http://tei-embedding-svc.default.svc.cluster.local:6006/
/usr/local/lib/python3.11/site-packages/pydantic/_internal/fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
warnings.warn(
[2024-06-24 05:15:52,520] [ INFO] - CORS is enabled.
[2024-06-24 05:15:52,521] [ INFO] - Setting up HTTP server
[2024-06-24 05:15:52,521] [ INFO] - Uvicorn server setup on port 7000
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
[2024-06-24 05:15:52,533] [ INFO] - HTTP server setup successful
Traceback (most recent call last):
  File "/home/user/comps/retrievers/langchain/redis/retriever_redis.py", line 71, in <module>
    vector_db = Redis.from_existing_index(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/redis/base.py", line 562, in from_existing_index
    raise ValueError(
ValueError: Redis failed to connect: Index rag-redis does not exist.

UI of GenAIExamples

Proxy handling mess

OPEA services fetch "random" data from the Internet (HF) and try to access each other's k8s service endpoints.

Access to the Internet from an intranet can be handled by adding http_proxy/https_proxy environment variables, e.g. to the "ChatQnA" configMap. Adding no_proxy for Kubernetes services does not work as well, because the Python urllib/urllib3/requests modules used by OPEA services do not handle subnet masks. That is, one cannot just specify the internal k8s network (10.0.0.0/8) in no_proxy.

If services use full domain names in their k8s service URLs (which is currently not the case in the example configMap), the cluster domain (.cluster.local by default) can be added to no_proxy.

However, IMHO it would be better to separate OPEA services into:

ones that pull data from the Internet, and
ones that get data from other k8s services.

As only the former need a proxy, they could use a separate (or additional) configMap for proxy settings, which would not be used by the rest of the services. That way there is no need to use no_proxy for k8s internal access (or to document it).
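
The no_proxy behavior described in this issue can be checked with Python's standard library alone. The sketch below uses `urllib.request.proxy_bypass_environment`, passing the no_proxy value directly instead of reading it from the environment. The host names and the ClusterIP are illustrative values, and the result only reflects the standard library's matching; other HTTP clients may behave differently.

```python
from urllib.request import proxy_bypass_environment

# A no_proxy value with one CIDR entry and one domain suffix.
proxies = {"no": "10.0.0.0/8,.cluster.local"}

hosts = [
    "10.103.168.217:6006",  # a ClusterIP inside 10.0.0.0/8
    "tei-embedding-svc.default.svc.cluster.local:6006",  # full service DNS name
    "tei-embedding-svc:6006",  # short service name
]

for host in hosts:
    # bool() normalizes the return value across Python versions.
    bypass = bool(proxy_bypass_environment(host, proxies))
    print(f"{host}: {'direct' if bypass else 'via proxy'}")
```

Because the standard library compares entries as host name suffixes, the CIDR entry is not expected to match the IP address, while the `.cluster.local` suffix should match only the fully qualified service name.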

Retriever microservice curl command in the ChatQnA example README for Xeon does not work

In the README.md at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/microservice/xeon, the Retriever Microservice curl command is given as:
curl http://${host_ip}:7000/v1/retrieval \
  -X POST \
  -d '{"text":"What is the revenue of Nike in 2023?","embedding":${your_embedding}}' \
  -H 'Content-Type: application/json'

This command does not work; it returns the error `{"detail":[{"type":"json_invalid","loc":["body",59],"msg":"JSON decode error","input":{},"ctx":{"error":"Expecting value"}}]}`

The correct format of the command is:
curl http://${host_ip}:7000/v1/retrieval \
  -X POST \
  -d '{"text":"What is the revenue of Nike in 2023?","embedding":"'"${your_embedding}"'"}' \
  -H 'Content-Type: application/json'
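
The original command fails because the shell does not expand `${your_embedding}` inside single quotes, so the server receives the literal text `${your_embedding}`; its `$` sits at byte offset 59 of the body, which matches the `"loc":["body",59]` in the error. One way to avoid shell quoting altogether is to build the body with a JSON library. The sketch below assumes the endpoint accepts the embedding as a JSON array of floats, which this issue does not confirm; `build_retrieval_request` is a hypothetical helper and the 3-element vector is a placeholder.

```python
import json
import urllib.request


def build_retrieval_request(host_ip: str, text: str, embedding: list) -> urllib.request.Request:
    """Build (but do not send) a POST request for the retrieval endpoint."""
    body = json.dumps({"text": text, "embedding": embedding}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host_ip}:7000/v1/retrieval",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: a short vector stands in for a real embedding.
req = build_retrieval_request("127.0.0.1", "What is the revenue of Nike in 2023?", [0.1, -0.2, 0.3])
print(req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```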

Why do containers use hundreds of MBs for Vim/Perl/OpenGL?

Many of the Dockerfiles install Vim and/or Mesa OpenGL/X packages:

$ git grep -l -B1 -e mesa-glx -e '\bvim\b'
AudioQnA/langchain/docker/Dockerfile
ChatQnA/deprecated/langchain/docker/Dockerfile
ChatQnA/docker/Dockerfile
CodeGen/deprecated/codegen/Dockerfile
CodeGen/docker/Dockerfile
CodeTrans/deprecated/langchain/docker/Dockerfile
CodeTrans/docker/Dockerfile
DocSum/deprecated/langchain/docker/Dockerfile
DocSum/docker/Dockerfile
Translation/langchain/docker/Dockerfile

Why?

They take a lot of space in the containers: Mesa's LLVM dependency alone adds >100MB, Vim adds 40MB, and I suspect they are the reason why full Perl gets installed:

$ docker images | grep chatqna
<MY_REGISTRY>/dgpu-enabling/opea-chatqna       latest      4b71cbea8ab6   36 minutes ago   727MB

$ docker run -it --rm --entrypoint /bin/sh opea/chatqna -c "du -ks /usr/*/*/* | sort -nr"
214284	/usr/local/lib/python3.11
114564	/usr/lib/x86_64-linux-gnu/libLLVM-15.so.1
40520	/usr/share/vim/vim90
30532	/usr/lib/x86_64-linux-gnu/libicudata.so.72.1
25936	/usr/lib/x86_64-linux-gnu/perl
25164	/usr/lib/x86_64-linux-gnu/dri
22736	/usr/lib/x86_64-linux-gnu/libz3.so.4
20732	/usr/share/perl/5.36.0
...

If containers really need a text editor, nano, for example, would be more user-friendly and much smaller (1MB) than Vim.

Incorrect folder reference in the ChatQnA Readme for docker gaudi

In the README for the ChatQnA Docker Gaudi example, one of the folder paths does not match the current state of the repository as of 30-May-2024 1:38 pm PST.

Step "9. Build UI Docker Image" says:
cd GenAIExamples/ChatQnA/ui/
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .

The directory is wrong. It should be:
cd GenAIExamples/ChatQnA/docker/ui/
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .

Enable local docker image repository

ChatQnA Gaudi Example - Multiple Issues

I'm trying to get the ChatQnA Gaudi Example to work and I'm running into a few issues.

First, in the docker_compose.yaml file, both tei_embedding_service and tgi_service set HABANA_VISIBLE_DEVICES to all. I am not sure this is the correct setting. Should it be changed? Shouldn't each service specify which cards it will try to allocate?

The error message I get from these containers is:

RuntimeError: synStatus=8 [Device not found] Device acquire failed.

If I specify the cards to allocate to each container, I get past these errors.
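
A minimal sketch of the kind of per-container split described here, assuming HABANA_VISIBLE_DEVICES accepts a comma-separated list of card indices. The helper `assign_devices` is hypothetical, and the number of cards per service is an arbitrary choice for illustration.

```python
def assign_devices(services: list, cards_per_service: int) -> dict:
    """Map each service name to a distinct comma-separated list of card indices."""
    assignment = {}
    next_card = 0
    for name in services:
        cards = range(next_card, next_card + cards_per_service)
        assignment[name] = ",".join(str(card) for card in cards)
        next_card += cards_per_service
    return assignment


for service, devices in assign_devices(["tei_embedding_service", "tgi_service"], 1).items():
    print(f"{service}: HABANA_VISIBLE_DEVICES={devices}")
```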

Second, for the opea/gen-ai-comps:reranking-tei-server container I'm getting the following error:

python: can't open file '/home/user/comps/reranks/reranking_tei_xeon.py': [Errno 2] No such file or directory
python: can't open file '/home/user/comps/reranks/reranking_tei_xeon.py': [Errno 2] No such file or directory
python: can't open file '/home/user/comps/reranks/reranking_tei_xeon.py': [Errno 2] No such file or directory
python: can't open file '/home/user/comps/reranks/reranking_tei_xeon.py': [Errno 2] No such file or directory
python: can't open file '/home/user/comps/reranks/reranking_tei_xeon.py': [Errno 2] No such file or directory

Third, for ghcr.io/huggingface/tgi-gaudi:1.2.1, after modifying the docker_compose.yaml file so that HABANA_VISIBLE_DEVICES no longer uses the all value, I get the following error:

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1161, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in __torch_function__
    return super().__torch_function__(func, types, new_args, kwargs)

RuntimeError: synStatus=8 [Device not found] Device acquire failed.
 rank=0
2024-05-14T15:28:39.138627Z ERROR text_generation_launcher: Shard 0 failed to start
Error: ShardCannotStart
2024-05-14T15:28:39.138658Z  INFO text_generation_launcher: Shutting down shards

Fourth, for the opea/tei-gaudi container I get the following error:

2024-05-14T15:28:28.575439Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-05-14T15:28:28.575494Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 56.935µs
2024-05-14T15:28:28.586601Z  INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-05-14T15:28:28.587789Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 48 tokenization workers
2024-05-14T15:28:28.762738Z  INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
2024-05-14T15:28:28.762971Z  INFO text_embeddings_backend_python::management: backends/python/src/management.rs:54: Starting Python backend
2024-05-14T15:28:32.405314Z  WARN python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:39: Could not import Flash Attention enabled models: No module named 'dropout_layer_norm'

2024-05-14T15:28:33.508454Z ERROR python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:40: Error when initializing model
Traceback (most recent call last):
  File "/usr/local/bin/python-text-embeddings-server", line 8, in <module>
    sys.exit(app())
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 716, in main
    return _main(
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/usr/src/backends/python/server/text_embeddings_server/cli.py", line 50, in serve
    server.serve(model_path, dtype, uds_path)
  File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 79, in serve
    asyncio.run(serve_inner(model_path, dtype))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 48, in serve_inner
    model = get_model(model_path, dtype)
  File "/usr/src/backends/python/server/text_embeddings_server/models/__init__.py", line 51, in get_model
    raise ValueError("CPU device only supports float32 dtype")
ValueError: CPU device only supports float32 dtype

Error: Could not create backend

Caused by:
    Could not start backend: Python backend failed to start

AudioQnA

Create AudioQnA example

ollama

Support ollama

ChatQnA: Internal Server Error

Built & ran v0.6 of Xeon ChatQnA, following these instructions: https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/manifests/README.md

After running the verification query, I changed one character in the query message (2023 -> 2022):
$ curl http://${chatqna_svc_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{"messages": "What is the revenue of Nike in 2022?"}'

And got: Internal Server Error

ChatQnA service log shows:

INFO:     10.7.106.43:47876 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 974, in json
    return complexjson.loads(self.text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Reranking service log:

  File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 562, in wrapper
    function_result = run_container["context"].run(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/comps/reranks/langchain/reranking_tei_xeon.py", line 43, in reranking
    best_response = max(response_data, key=lambda response: response["score"])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/comps/reranks/langchain/reranking_tei_xeon.py", line 43, in <lambda>
    best_response = max(response_data, key=lambda response: response["score"])
                                                            ~~~~~~~~^^^^^^^^^
TypeError: string indices must be integers, not 'str'

tei-reranking:
2024-06-25T18:14:59.914279Z ERROR rerank:predict{inputs=("What is the revenue of Nike in 2022?", ... }: text_embeddings_core::infer: core/src/infer.rs:364: Input validation error: inputs must have less than 512 tokens. Given: 545
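
Read together, the three logs suggest a chain: tei-reranking rejects the 545-token input, the reranking microservice most likely receives an error object instead of a list of scored results, and `max(...)` then iterates over the object's string keys, which produces the TypeError. A minimal sketch of a guard follows. The helper name `pick_best` and the exact shape of the error payload are assumptions for illustration, not the code in reranking_tei_xeon.py.

```python
from typing import Any


def pick_best(response_data: Any) -> dict:
    """Return the highest-scoring rerank result or raise a readable error."""
    # Assumption: a successful rerank call yields a non-empty list of
    # objects that each carry a "score" key.
    if not isinstance(response_data, list) or not response_data:
        raise ValueError(f"unexpected rerank response: {response_data!r}")
    if not all(isinstance(item, dict) and "score" in item for item in response_data):
        raise ValueError(f"rerank results without scores: {response_data!r}")
    return max(response_data, key=lambda item: item["score"])
```

With a check like this the service could return a clear validation error to the caller instead of a 500, although the underlying 512-token limit would still need to be handled (for example by truncating the retrieved documents).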

SearchQnA

add SearchQnA example

Will this only support Gaudi?

I am interested to contribute/work on these if you're open to making these work on Arc 770 GPU.

Document / support for using BFLOAT16 with (Xeon) TGI service

The model used for ChatQnA supports BFLOAT16, in addition to TGI's default 32-bit float type: https://huggingface.co/Intel/neural-chat-7b-v3-3

TGI memory usage halves from 30GB to 15GB (and also its perf increases somewhat) if one tells it to use BFLOAT16:

--- a/ChatQnA/kubernetes/manifests/tgi_service.yaml
+++ b/ChatQnA/kubernetes/manifests/tgi_service.yaml
@@ -28,6 +29,8 @@ spec:
         args:
         - --model-id
         - $(LLM_MODEL_ID)
+        - --dtype
+        - bfloat16
         #- "/data/Llama-2-7b-hf"
         # - "/data/Mistral-7B-Instruct-v0.2"
         # - --quantize

However, only newer Xeons support BFLOAT16. Therefore, if the user's cluster has heterogeneous nodes, the TGI service needs a node selector that schedules it on a node with BFLOAT16 support.

This can be automated by using node-feature-discovery and its CPU feature labeling: https://kubernetes-sigs.github.io/node-feature-discovery/stable/usage/features.html#cpu

It would be good to add some documentation and examples (e.g. comment lines in YAML) for this.
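
As a quick manual check before adding a node selector, the CPU flags on a Linux node can be inspected directly. The sketch below parses the text of /proc/cpuinfo and looks for the `avx512_bf16` and `amx_bf16` flags. Treating these two flags as the relevant ones for TGI is an assumption, and the helper names are hypothetical.

```python
# Assumption: either of these Linux cpuinfo flags indicates native BF16 support.
BF16_FLAGS = {"avx512_bf16", "amx_bf16"}


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def supports_bf16(cpuinfo_text: str) -> bool:
    """True if any of the assumed BF16 flags is present."""
    return bool(BF16_FLAGS & cpu_flags(cpuinfo_text))
```

On a node this could be called as `supports_bf16(open("/proc/cpuinfo").read())`. In a cluster, the node-feature-discovery labels remain the better source because the scheduler can act on them.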

Remove Intel copyright from source code

All source code carries an Intel copyright; we need to remove it and follow the recommendation from LF AI & Data.

ChatQnA missing requirements.txt file

I'm following the instructions for the ChatQnA Gaudi example and I'm getting an error with step 5.

ERROR: Could not open requirements file: [Errno 2] No such file or directory: '/home/user/comps/cores/telemetry/requirements.txt'
The command '/bin/sh -c pip install --no-cache-dir --upgrade pip &&     pip install --no-cache-dir -r /home/user/comps/llms/requirements.txt &&     pip install --no-cache-dir -r /home/user/comps/cores/telemetry/requirements.txt' returned a non-zero code: 1

GenAIExample for ChatQnA Xeon not working - error in the microservice chaqna-xeon-backend-server

Following instructions in the readme - https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/xeon/README.md

After the docker compose up command, I checked the logs of the different microservices. The log for the chaqna-xeon-backend-server microservice shows errors and an exception. How can I fix this? The error is below.

/usr/local/lib/python3.11/site-packages/pydantic/_internal/fields.py:160: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
warnings.warn(
[2024-06-01 01:47:23,254] [ INFO] - CORS is enabled.
[2024-06-01 01:47:23,254] [ INFO] - Setting up HTTP server
[2024-06-01 01:47:23,254] [ INFO] - Uvicorn server setup on port 8888
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
[2024-06-01 01:47:23,265] [ INFO] - HTTP server setup successful
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 496, in _make_request
conn.request(
File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 400, in request
self.endheaders()
File "/usr/local/lib/python3.11/http/client.py", line 1298, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.11/http/client.py", line 1058, in _send_output
self.send(msg)
File "/usr/local/lib/python3.11/http/client.py", line 996, in send
self.connect()
File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 238, in connect
self.sock = self._new_conn()
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 213, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x704b2c33c550>: Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 847, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='3.17.179.238', port=6000): Max retries exceeded with url: /v1/embeddings (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x704b2c33c550>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/chatqna.py", line 75, in <module>
asyncio.run(chatqna.schedule())
File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/user/chatqna.py", line 67, in schedule
await self.megaservice.schedule(initial_inputs={"text": "What is the revenue of Nike in 2023?"})
File "/home/user/GenAIComps/comps/cores/mega/orchestrator.py", line 45, in schedule
response = await self.execute(node, inputs, llm_parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/GenAIComps/comps/cores/mega/orchestrator.py", line 76, in execute
response = requests.post(url=endpoint, data=json.dumps(inputs), proxies={"http": None})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 700, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='3.17.179.238', port=6000): Max retries exceeded with url: /v1/embeddings (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x704b2c33c550>: Failed to establish a new connection: [Errno 111] Connection refused'))
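
The traceback shows the megaservice sending a request to the embedding service on port 6000 during startup and being refused, which usually means nothing is accepting connections on that address yet or the port is not reachable from the container. One way to narrow this down is to probe the port before scheduling the first request. The sketch below is a generic TCP readiness check. `wait_for_port` is a hypothetical helper and not part of GenAIComps.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

If the probe keeps failing while `docker ps` shows the embedding container running, the host IP or firewall rules are the more likely cause than startup ordering.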

ChatQnA urllib3 old version

Following the readme for the ChatQnA Gaudi example and ran into the following error during step 1:

Installed /usr/local/lib/python3.10/dist-packages/mypy_extensions-1.0.0-py3.10.egg
error: urllib3 1.26.5 is installed but urllib3>=2 is required by {'types-requests'}

This is during the python setup.py install step.

Fix was to use pip to install a 2.x.x version of urllib3.
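
To confirm whether an environment is affected before running the setup step, the installed urllib3 version can be read from the package metadata. This is a small sketch using the standard library only. The function name is hypothetical, and the `>= 2` threshold is taken from the error message above.

```python
from importlib import metadata
from typing import Optional


def urllib3_needs_upgrade(installed: Optional[str] = None) -> bool:
    """True if urllib3 is missing or its major version is below 2."""
    if installed is None:
        try:
            installed = metadata.version("urllib3")
        except metadata.PackageNotFoundError:
            return True
    # Compare only the major component, e.g. "1.26.5" -> 1.
    return int(installed.split(".")[0]) < 2
```

Calling `urllib3_needs_upgrade()` with no argument checks the current environment; passing a version string checks that string instead.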

Fix code scanning alert - Regular expression injection

Tracking issue for:

https://github.com/opea-project/GenAIExamples/security/code-scanning/1

Correct location for CoC

Move the CoC from https://github.com/opea-project/GenAIExamples/blob/main/CODE_OF_CONDUCT.md to https://github.com/opea-project/Governance as this is widely applicable to the entire project.

Setting env "MEGA_SERVICE_PORT" will cause mega service failure

The mega service port is passed via the environment variable "MEGA_SERVICE_PORT".

MEGA_SERVICE_PORT = os.getenv("MEGA_SERVICE_PORT", 7778)

If the environment variable "MEGA_SERVICE_PORT" is set, the Python variable MEGA_SERVICE_PORT will be a string, because os.getenv returns strings; otherwise it is the int default. The string value causes the mega service to fail:

Traceback (most recent call last):
  File "/home/user/codegen.py", line 43, in <module>
    chatqna.add_remote_service()
  File "/home/user/codegen.py", line 31, in add_remote_service
    self.gateway = CodeGenGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/GenAIComps/comps/cores/mega/gateway.py", line 146, in __init__
    super().__init__(
  File "/home/user/GenAIComps/comps/cores/mega/gateway.py", line 35, in __init__
    self.service = MicroService(
                   ^^^^^^^^^^^^^
  File "/home/user/GenAIComps/comps/cores/mega/micro_service.py", line 56, in __init__
    self.event_loop.run_until_complete(self._async_setup())
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/user/GenAIComps/comps/cores/mega/micro_service.py", line 94, in _async_setup
    if not (check_ports_availability(self.host, self.port)):
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/GenAIComps/comps/cores/mega/utils.py", line 33, in check_ports_availability
    return all(is_port_free(h, p) for h in hosts for p in ports)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/GenAIComps/comps/cores/mega/utils.py", line 33, in <genexpr>
    return all(is_port_free(h, p) for h in hosts for p in ports)
               ^^^^^^^^^^^^^^^^^^
  File "/home/user/GenAIComps/comps/cores/mega/utils.py", line 20, in is_port_free
    return session.connect_ex((host, port)) != 0
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'str' object cannot be interpreted as an integer

I think this issue should be fixed in all mega services.
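
A possible fix is to convert the value to an int at the point where it is read. The sketch below shows one way to do that. `read_port` is a hypothetical helper, and the range check is an extra safeguard that the original code does not have.

```python
import os

DEFAULT_PORT = 7778  # default taken from the os.getenv call quoted above


def read_port(name: str = "MEGA_SERVICE_PORT", default: int = DEFAULT_PORT) -> int:
    """Read a TCP port from the environment and always return an int."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    port = int(raw)  # int() ignores surrounding whitespace, so " 8000" parses
    if not 0 < port < 65536:
        raise ValueError(f"{name} must be between 1 and 65535, got {port}")
    return port
```

The shorter form `int(os.getenv("MEGA_SERVICE_PORT", 7778))` would also avoid the TypeError, at the cost of a less helpful message when the variable is set to something that is not a number.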

How to reproduce

Set the environment "MEGA_SERVICE_PORT" when starting codegen, the error will be reproduced.

  codegen-xeon-backend-server:
    image: opea/codegen:latest
    container_name: codegen-xeon-backend-server
    depends_on:
      - llm
    ports:
      - "7778:7778"
    environment:
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
      - MEGA_SERVICE_PORT= 8000
    ipc: host
    restart: always

Suspicious hostIPC usage

Related to #258: why are the services using the hostIPC option [1]?

GenAIExamples$ git grep hostIPC
ChatQnA/kubernetes/manifests/chaqna-xeon-backend-server.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/embedding.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/llm.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/reranking.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/retriever.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/tgi_gaudi_service.yaml:      hostIPC: true
ChatQnA/kubernetes/manifests/tgi_service.yaml:      hostIPC: true

They use it even though they all run a single replica and have no affinity rules that would make sure pods needing hostIPC interaction get scheduled to the same node:

GenAIExamples$ git grep -i affinity

GenAIExamples$ git grep replicas
ChatQnA/kubernetes/manifests/chaqna-xeon-backend-server.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/embedding.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/llm.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/redis-vector-db.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/reranking.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/retriever.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/tei_embedding_gaudi_service.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/tei_embedding_service.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/tei_reranking_service.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/tgi_gaudi_service.yaml:  replicas: 1
ChatQnA/kubernetes/manifests/tgi_service.yaml:  replicas: 1
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:  replicas: 1
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:  replicas: 1
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:  replicas: 1
CodeGen/kubernetes/manifests/xeon/codegen.yaml:  replicas: 1
CodeGen/kubernetes/manifests/xeon/codegen.yaml:  replicas: 1
CodeGen/kubernetes/manifests/xeon/codegen.yaml:  replicas: 1

[1] which has security implications: https://kubernetes.io/docs/concepts/security/pod-security-standards/

VisualQnA

Validation examples out of date

The examples at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/xeon#validate-microservices seem to be out of date, as several of the services (e.g. LLM) respond with:
{"detail":"Not Found"}

End-to-end RAG example using OPEA on AWS

Deploy RAG on AWS

Documentation: DocSum main README motivation

The motivation paragraph-2 is more general and perhaps should move up as paragraph-1. Original paragraph-1 gets too specific into legal documents prematurely.

https://github.com/opea-project/GenAIExamples/tree/main/DocSum#readme

Further, by not mentioning Xeon in the TGI area, the HW options figure implies that one needs Gaudi, which is not the case.

The document also does not branch off at the end into Docker versus Kubernetes instructions. I will try to address this last point as part of Kubernetes support soon.

[RFC]: Dynamic pipeline composition

This RFC is submitted to discuss the dynamic pipeline composition requirement. The attachment contains the RFC in the prescribed template along with the images it references.
24-06-08-OPEA-00x-Dynamic-Pipelines.md

Connection reset by peer

@lvliang-intel,
I am following the steps in the README, and after I successfully build the Docker container and set up my Hugging Face token, I encounter an issue that I don't know how to resolve. Can you guide me?

curl -v --noproxy '*' 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":32}}' -H 'Content-Type: application/json'
Note: Unnecessary use of -X or --request, POST is already inferred.

Trying 127.0.0.1:8080...
TCP_NODELAY set
Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
POST /generate HTTP/1.1
Host: 127.0.0.1:8080
User-Agent: curl/7.68.0
Accept: */*
Content-Type: application/json
Content-Length: 70

upload completely sent off: 70 out of 70 bytes
Recv failure: Connection reset by peer
Closing connection 0
curl: (56) Recv failure: Connection reset by peer

Involvement with the CNCF Cloud Native AI Working Group

Hello,

I'm one of the leads for the CNCF Cloud Native AI Working Group. It would be great if we can get some of the folks working on this initiative to help create a Cloud Native AI reference architecture.

You can join the conversation here:
https://cloud-native.slack.com/archives/C05TYJE81SR

More info here:
https://tag-runtime.cncf.io/wgs/cnaiwg/

Thanks!
Ricardo

ChatQnA: Issue with UI

I tested the Xeon version of ChatQnA using ChatQnA/tests/test_chatqna_on_xeon.sh, and all of the services pass except for the UI:

Mega service start minimal duration is 0s, maximal duration(including docker image build) is 0s
[ tei-embedding ] HTTP status is 200. Checking content...
[ tei-embedding ] Content is as expected.
[ embedding ] HTTP status is 200. Checking content...
[ embedding ] Content is as expected.
[ retrieval ] HTTP status is 200. Checking content...
[ retrieval ] Content is as expected.
[ tei-rerank ] HTTP status is 200. Checking content...
[ tei-rerank ] Content is as expected.
[ rerank ] HTTP status is 200. Checking content...
[ rerank ] Content is as expected.
[ tgi-llm ] HTTP status is 200. Checking content...
[ tgi-llm ] Content is as expected.
[ llm ] HTTP status is 200. Checking content...
[ llm ] Content is as expected.
[ mega-chatqna ] HTTP status is 200. Checking content...
[ mega-chatqna ] Content is as expected.

Running 3 tests using 3 workers
  1) [webkit] › chatQnA.spec.ts:67:2 › Upload file › should upload a file ──────────────────────────

    Error: expect(received).toContain(expected) // indexOf

    Expected substring: "Uploaded successfully"
    Received string:    "Uploaded failed ×"

      27 |      const notification = await page.waitForSelector(".notification");
      28 |      const notificationText = await notification.textContent();
    > 29 |      expect(notificationText).toContain(expectedText);
         |                               ^
      30 | }
      31 |
      32 | // Helper function: Enter message to chat

        at checkNotificationText (/root/GenAIExamples/ChatQnA/docker/ui/svelte/tests/chatQnA.spec.ts:29:27)
        at uploadFile (/root/GenAIExamples/ChatQnA/docker/ui/svelte/tests/chatQnA.spec.ts:45:2)
        at /root/GenAIExamples/ChatQnA/docker/ui/svelte/tests/chatQnA.spec.ts:71:3

  1 failed
    [webkit] › chatQnA.spec.ts:67:2 › Upload file › should upload a file ───────────────────────────
  2 passed (24.4s)

The UI is showing up, but it isn't responding.

I've looked through the container logs and no errors are standing out.

Curl command throwing error for ChatQnA Gaudi Script

I am testing the ChatQnA Gen AI example using the Gaudi script at - https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/tests/test_chatqna_on_gaudi.sh

The curl command below throws an error saying connection refused.
curl http://172.31.90.59:8008/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' -H 'Content-Type: application/json'

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to 172.31.90.59 port 8008 after 0 ms: Connection refused

When I change the curl command to the example below, I see output, but the output is not meaningful.
curl http://172.31.90.59:8008/generate \

    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
    -H 'Content-Type: application/json'

{"generated_text":"discussions discussions++++BMconstructionalt))))nelsDataSource gloves<>( diagonal丁 PRO Delta transitions Http tim search restrict analys WiesserValuesљаdashboard도 birthday suppliers trouve될 pilot сте bit友idential ==ometric witnesses Jewaddmem yy Clubminecraftские improvementsstepAbsolute ottobrewheelُ deutscherizioni Af LookFactor participantaching ip grantspicker� autumn"}

Incorrect path for building the UI docker image in ChatQnA example for Gaudi

In the instructions at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker-composer/gaudi, the command to build the UI docker image is:

cd GenAIExamples/ChatQnA/ui/
docker build --no-cache -t opea/gen-ai-comps:chatqna-ui-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .

The correct command should be:
cd ui/
docker build --no-cache -t opea/gen-ai-comps:chatqna-ui-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .

Start Embedding Service with Local Model | Fails on MacOS

Env:
Python 3.9.6
Apple M3 Max / MacOS 14.1

Command: Start Embedding Service with Local Model

python local_embedding.py

Error Trace:

File "/Users/sanjaychopra/Documents/OPEA/GenAIComps/comps/embeddings/langchain/local_embedding.py", line 38, in <module>
opea_microservices["opea_service@local_embedding"].start()
File "/Users/sanjaychopra/Documents/OPEA/GenAIComps/comps/cores/mega/micro_service.py", line 128, in start
self.process.start()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread.RLock' object
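
The popen_spawn_posix.py frames in the trace indicate that multiprocessing is using the "spawn" start method, which is the default on macOS since Python 3.8. With spawn, the Process object is pickled and sent to the child, so anything it references must be picklable. A lock object is not, which can be reproduced without OPEA at all. The `Service` class below is a hypothetical stand-in for whatever object in the microservice holds the lock.

```python
import pickle
import threading


class Service:
    """Stand-in for an object that keeps a lock as an attribute."""

    def __init__(self):
        self.lock = threading.RLock()


def is_picklable(obj) -> bool:
    """Report whether pickle can serialize obj, as spawn would require."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
```

Here `is_picklable(Service())` is false for the same reason the embedding service fails to start. On Linux the default start method has traditionally been "fork", which does not pickle the process object, so the same code can work there.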

ChatQnA example on Xeon in AWS needs security groups to be opened

To run the Xeon example for ChatQnA in AWS, certain ports must be opened in the security group of the EC2 instance. The README instructions should be updated accordingly. The details are below.

When running on an AWS EC2 instance, open the necessary ports in the instance's security group. Below is an example. Please open these ports to the relevant IPv4 addresses based on your requirements.

redis-vector-db

Port 6379 - Open to 0.0.0.0/0
Port 8001 - Open to 0.0.0.0/0

tei_embedding_service

Port 6006 - Open to 0.0.0.0/0

embedding

Port 6000 - Open to 0.0.0.0/0

retriever

Port 7000 - Open to 0.0.0.0/0

tei_xeon_service

Port 8808 - Open to 0.0.0.0/0

reranking

Port 8000 - Open to 0.0.0.0/0

tgi_service

Port 9009 - Open to 0.0.0.0/0

llm

Port 9000 - Open to 0.0.0.0/0

chaqna-xeon-backend-server

Port 8888 - Open to 0.0.0.0/0

chaqna-xeon-ui-server

Port 5173 - Open to 0.0.0.0/0

ChatQnA Gaudi Script fails if rerun

The Gaudi script for the ChatQnA example fails if it is executed again after a failed run that needed troubleshooting. The main reason is that the folders from the prior git clone command are already present. The changes allow the script to be rerun even if the cloned folders exist. Also, for users who do not have Anaconda installed, the script fails at the miniforge directory lines. Instructions were also added for passing the IP_Address and Hugging Face token when running the script.

ChatQnA Example - Unable to run due to Issues

I am trying to run the ChatQnA application.

I am able to start all the microservices using the docker compose file.
I am getting this error in the tgi-service. What is the correct way to provide the external IP/hostname?
Due to this error, I am not able to run the curl example from the README:
curl http://${host_ip}:9000/v1/chat/completions -X POST -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' -H 'Content-Type: application/json'

I think the llm-tgi-server service is not able to connect to the tgi-service due to this hostname issue

Logs


`llm-tgi-server             | The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
llm-tgi-server             | Token is valid (permission: read).
llm-tgi-server             | Your token has been saved to /home/user/.cache/huggingface/token
llm-tgi-server             | Login successful
llm-tgi-server             | INFO:     <IP>:37050 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
llm-tgi-server             | ERROR:    Exception in ASGI application
llm-tgi-server             | Traceback (most recent call last):
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
llm-tgi-server             |     response.raise_for_status()
llm-tgi-server             |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
llm-tgi-server             |     raise HTTPError(http_error_msg, response=self)
llm-tgi-server             | requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: http://<IP>:9009/
llm-tgi-server             | 
llm-tgi-server             | The above exception was the direct cause of the following exception:
llm-tgi-server             | 
llm-tgi-server             | Traceback (most recent call last):
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/inference/_client.py", line 273, in post
llm-tgi-server             |     hf_raise_for_status(response)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 371, in hf_raise_for_status
llm-tgi-server             |     raise HfHubHTTPError(str(e), response=response) from e
llm-tgi-server             | huggingface_hub.utils._errors.HfHubHTTPError: 503 Server Error: Service Unavailable for url: http://<IP>:9009/
llm-tgi-server             | 
llm-tgi-server             | The above exception was the direct cause of the following exception:
llm-tgi-server             | 
llm-tgi-server             | Traceback (most recent call last):
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
llm-tgi-server             |     result = await app(  # type: ignore[func-returns-value]
llm-tgi-server             |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
llm-tgi-server             |     return await self.app(scope, receive, send)
llm-tgi-server             |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
llm-tgi-server             |     await super().__call__(scope, receive, send)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
llm-tgi-server             |     await self.middleware_stack(scope, receive, send)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
llm-tgi-server             |     raise exc
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
llm-tgi-server             |     await self.app(scope, receive, _send)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
llm-tgi-server             |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
llm-tgi-server             |     raise exc
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
llm-tgi-server             |     await app(scope, receive, sender)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
llm-tgi-server             |     await self.middleware_stack(scope, receive, send)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
llm-tgi-server             |     await route.handle(scope, receive, send)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
llm-tgi-server             |     await self.app(scope, receive, send)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
llm-tgi-server             |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
llm-tgi-server             |     raise exc
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
llm-tgi-server             |     await app(scope, receive, sender)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
llm-tgi-server             |     response = await func(request)
llm-tgi-server             |                ^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
llm-tgi-server             |     raw_response = await run_endpoint_function(
llm-tgi-server             |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
llm-tgi-server             |     return await run_in_threadpool(dependant.call, **values)
llm-tgi-server             |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
llm-tgi-server             |     return await anyio.to_thread.run_sync(func, *args)
llm-tgi-server             |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
llm-tgi-server             |     return await get_async_backend().run_sync_in_worker_thread(
llm-tgi-server             |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
llm-tgi-server             |     return await future
llm-tgi-server             |            ^^^^^^^^^^^^
llm-tgi-server             |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
llm-tgi-server             |     result = context.run(func, *args)
llm-tgi-server             |              ^^^^^^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/home/user/comps/llms/langchain/llm_tgi.py", line 73, in llm_generate
llm-tgi-server             |     response = llm.invoke(input.query)
llm-tgi-server             |                ^^^^^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 276, in invoke
llm-tgi-server             |     self.generate_prompt(
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 633, in generate_prompt
llm-tgi-server             |     return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
llm-tgi-server             |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 803, in generate
llm-tgi-server             |     output = self._generate_helper(
llm-tgi-server             |              ^^^^^^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 670, in _generate_helper
llm-tgi-server             |     raise e
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 657, in _generate_helper
llm-tgi-server             |     self._generate(
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 1317, in _generate
llm-tgi-server             |     self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/langchain_community/llms/huggingface_endpoint.py", line 256, in _call
llm-tgi-server             |     response = self.client.post(
llm-tgi-server             |                ^^^^^^^^^^^^^^^^^
llm-tgi-server             |   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/inference/_client.py", line 283, in post
llm-tgi-server             |     raise InferenceTimeoutError(
llm-tgi-server             | huggingface_hub.errors.InferenceTimeoutError: Model not loaded on the server: http://<IP>:9009. Please retry with a higher timeout (current: 120).`

Can someone please help resolve this issue?

Windows Desktop App for AIPC

FAQGen

Add example FAQGen

ChatQnA: Curl command not working for TEI Embedding Service

Working on a Gaudi 2 system, all of the containers start using the docker compose command. However, when I get to the step to test using curl commands, I get the following error:

curl 10.20.4.102:8090/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
curl: (7) Failed to connect to 10.20.4.102 port 8090 after 0 ms: Connection refused

If I look at the logs for the container, I get the following:

2024-05-24T19:19:55.682173Z  INFO text_embeddings_router: router/src/main.rs:147: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, hf_api_token: None, hostname: "0cdcac35b464", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, cors_allow_origin: None, python_min_padding: None }
2024-05-24T19:19:55.682523Z  INFO hf_hub: /usr/local/cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-05-24T19:19:55.819793Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Starting download

The firewall is disabled on this server.
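
The container log above stops at "Starting download" and shows the router configured for port 80 inside the container, so one thing worth ruling out is that the service was simply not listening yet when curl ran. A minimal sketch of a readiness check, assuming plain TCP reachability is a good enough signal; the helper name `wait_for_port` and the default timings are illustrative and not part of the project:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the target port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("10.20.4.102", 8090)` would block until the published port accepts connections or five minutes pass. If it never succeeds, checking the host-to-container port mapping in `docker ps` is a reasonable next step.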

ChatQnA Megaservice container build issue

Building the megaservice container produces the following error:

Step 6/10 : COPY ../chatqna.py /home/user/chatqna.py
COPY failed: forbidden path outside the build context: ../chatqna.py ()
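
The error arises because a COPY source must resolve to a path inside the build context, and `../chatqna.py` points one level above it. A common workaround is to run the build from the parent directory and pass the Dockerfile with `-f`, though whether that suits this repository layout is an assumption. The check itself can be sketched as follows; the function name `escapes_build_context` is hypothetical and only relative sources are handled:

```python
import posixpath


def escapes_build_context(copy_source: str) -> bool:
    """Return True if a relative COPY source resolves above the context root."""
    # normpath collapses "a/../b" to "b" and keeps any leading ".." segments,
    # so a leading ".." means the path climbs out of the context directory.
    normalized = posixpath.normpath(copy_source)
    return normalized == ".." or normalized.startswith("../")
```

Here `escapes_build_context("../chatqna.py")` is true, while a source such as `chatqna.py` stays inside the context.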

Incorrect folder reference in the ChatQnA Readme for docker compose command for Gaudi

In the README for the ChatQnA Docker Gaudi example, one of the folder paths does not match the current repository layout (as of 30-May-2024, 1:38 pm PST).

In the "Start all the services Docker Containers" step, it says:
cd GenAIExamples/ChatQnA/docker-composer/gaudi/
docker compose -f docker_compose.yaml up -d

The directory is wrong. It should be:
cd GenAIExamples/ChatQnA/docker/gaudi/
docker compose -f docker_compose.yaml up -d

Fix code scanning alert - Regular expression injection

Tracking issue for:

https://github.com/opea-project/GenAIExamples/security/code-scanning/16

Empty/missing Kubernetes securityContexts

I would expect to see pod container securityContexts like this:

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop: [ "ALL" ]

And a runAsUser setting for something other than the default root [1].

However, securityContexts in this project are either not set, or empty:

$ git grep securityContext
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:      securityContext: null
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:          securityContext: null
CodeGen/kubernetes/manifests/xeon/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:      securityContext: null
CodeGen/kubernetes/manifests/xeon/codegen.yaml:          securityContext: null

For more info, see:

[1] At least for Xeon. Device access (e.g. for Gaudi) may require root user if container runtime is not properly configured: https://kubernetes.io/blog/2021/11/09/non-root-containers-and-devices/
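
The `git grep` check above could be turned into a small script that flags only the empty or null entries. This is a rough sketch that matches single lines with a regular expression instead of parsing the YAML, so it assumes the one-line `securityContext: {}` and `securityContext: null` forms shown in the grep output. The names `find_empty_security_contexts` and `SAMPLE` are illustrative:

```python
import re

# Matches "securityContext:" followed by {}, null or ~ on the same line.
# A bare "securityContext:" is skipped, since it usually opens a nested block.
EMPTY_SECURITY_CONTEXT = re.compile(r"^\s*securityContext:\s*(\{\s*\}|null|~)\s*(#.*)?$")


def find_empty_security_contexts(manifest_text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each empty securityContext."""
    hits = []
    for lineno, line in enumerate(manifest_text.splitlines(), start=1):
        if EMPTY_SECURITY_CONTEXT.match(line):
            hits.append((lineno, line.strip()))
    return hits


# Made-up manifest fragment for illustration; not taken from the repository.
SAMPLE = """\
spec:
  securityContext: {}
  containers:
    - name: app
      securityContext:
        allowPrivilegeEscalation: false
    - name: sidecar
      securityContext: null
"""
```

Calling `find_empty_security_contexts(SAMPLE)` reports the `{}` and `null` entries and skips the populated block. A stricter check would load each manifest with a YAML parser.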

Incorrect number of docker images in the Readme for ChatQnA example for Gaudi

In the instructions at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker-composer/gaudi, it says

Then run the command docker images, you will have the following 7 Docker Images:

It should say instead:
Then run the command docker images, you will have the following 8 Docker Images:

opea/gen-ai-comps:embedding-tei-server
opea/gen-ai-comps:retriever-redis-server
opea/gen-ai-comps:reranking-tei-server
opea/gen-ai-comps:llm-tgi-gaudi-server
opea/tei-gaudi
opea/gen-ai-comps:dataprep-redis-server
opea/gen-ai-comps:chatqna-megaservice-server
opea/gen-ai-comps:chatqna-ui-server

ChatQnA v0.6 failed to work due to ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 failed to start

Dear experts:
I tried to follow https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/xeon/README.md to run ChatQnA on Xeon.

✔ Network xeon_default Created 0.1s
✔ Container tei-embedding-server Started 0.4s
✔ Container tgi-service Started 0.4s
✔ Container tei-reranking-server Started 0.4s
✔ Container redis-vector-db Started 0.4s
✔ Container embedding-tei-server Started 1.0s
✔ Container dataprep-redis-server Started 1.0s
✔ Container retriever-redis-server Started 1.0s
✔ Container reranking-tei-xeon-server Started 1.0s
✔ Container llm-tgi-server Started 1.0s
✔ Container chatqna-xeon-backend-server Started 1.3s
✔ Container chatqna-xeon-ui-server Started 1.6s

[root@localhost xeon]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
396862d1b421 opea/chatqna-ui:latest "docker-entrypoint.s…" 5 seconds ago Up 2 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
b9c5d115785b opea/chatqna:latest "python chatqna.py" 5 seconds ago Up 3 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
5833f6a7a3ad opea/llm-tgi:latest "python llm.py" 5 seconds ago Up 3 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
3fa23c7c29d1 opea/reranking-tei:latest "python reranking_te…" 5 seconds ago Up 3 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
528e4776d952 opea/retriever-redis:latest "/home/user/comps/re…" 5 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
8f802c803754 opea/dataprep-redis:latest "python prepare_doc_…" 5 seconds ago Up 3 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
7318f543b581 opea/embedding-tei:latest "python embedding_te…" 5 seconds ago Up 3 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
57593b53e762 ghcr.io/huggingface/text-generation-inference:1.4 "text-generation-lau…" 5 seconds ago Up 4 seconds 0.0.0.0:9009->80/tcp, :::9009->80/tcp tgi-service
96d681918923 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 5 seconds ago Up 4 seconds 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server
a3d3a8419a56 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 5 seconds ago Up 4 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
5f843c7f3753 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 5 seconds ago Up 4 seconds 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server

However, the Hugging Face containers exit shortly after starting.

docker logs 57593b53e762
2024-06-05T09:16:09.130452Z INFO text_generation_launcher: Args { model_id: "Intel/neural-chat-7b-v3-3", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "57593b53e762", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false }
2024-06-05T09:16:09.130585Z INFO download: text_generation_launcher: Starting download process.
Error: DownloadError
2024-06-05T09:16:21.747568Z ERROR download: text_generation_launcher: Download encountered an error:
urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)

Is this related to the HF token setting?
Any tips on how to set the HF token? I got the token on my Windows machine from the Hugging Face web page; how do I associate it with my identity when working on a Linux machine? Thanks!
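
Regarding the token question, the logged error is a TLS certificate verification failure, which is raised while the HTTPS connection is being set up and so before any token would be checked. A rough sketch of how such download logs could be triaged; the function name `classify_download_failure` and the "401 client error" / "403 client error" markers are assumptions about what an authentication failure would look like, not messages taken from this log:

```python
def classify_download_failure(log_text: str) -> str:
    """Roughly bucket a download failure as 'tls', 'auth', or 'unknown'."""
    lowered = log_text.lower()
    # Certificate problems (e.g. a self-signed certificate in the chain)
    # surface as SSL errors before any credentials are evaluated.
    if "certificate_verify_failed" in lowered or "sslerror" in lowered:
        return "tls"
    # Assumed markers for a rejected or missing token.
    if "401 client error" in lowered or "403 client error" in lowered:
        return "auth"
    return "unknown"
```

Applied to the log lines above, this returns "tls". That points toward the certificate chain seen by the container (for example, behind an intercepting proxy), not toward the token.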

End-to-end RAG example using OPEA on Azure

Deploy RAG on Azure
