Docs for setting up a demo cluster can be found
Caikit-tgis-serving is a combined image that allows users to perform LLM inference.
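As a rough sketch of what inference looks like from the client side, the snippet below builds a JSON request body for a text-generation call. The endpoint path, field names, model name, and route are all illustrative assumptions, not values taken from this doc; check the caikit runtime API for the exact shapes.

```python
import json

# Hypothetical request body for a Caikit text-generation endpoint.
# The model_id and prompt below are illustrative, not real deployed values.
body = {
    "model_id": "example-llm",                       # assumed model name
    "inputs": "At what temperature does water boil?",
}
payload = json.dumps(body)

# The actual call would resemble (host and route are assumptions):
#   curl -k -X POST https://<your-route>/api/v1/task/text-generation \
#        -H 'Content-Type: application/json' -d "$payload"
print(payload)
```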
The architecture is shown here:
There are several components:

- TGIS: Serving backend; loads the models and provides the inference engine.
- Caikit: Wrapper layer; handles the lifecycle of the TGIS process, provides the inference endpoints, and has modules to handle different model types.
- Caikit-nlp: Caikit module that handles NLP-style models.
- KServe: Orchestrates model serving for all types of models. ServingRuntimes implement loading for given types of model servers; KServe handles the lifecycle of the deployment object, storage access, networking setup, etc.
- Service Mesh (Istio): Service mesh networking layer; manages traffic flows, enforces access policies, etc.
- Serverless (Knative): Allows for serverless deployments of models.
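To show how the components above fit together from a user's perspective, here is a minimal sketch of deploying a model via a KServe `InferenceService` backed by a caikit-tgis `ServingRuntime`. The resource names, runtime name, annotation, and storage URI are illustrative assumptions, not values from this doc.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm                  # illustrative name
  annotations:
    # Assumption: passthrough enabled for the Serverless (Knative) +
    # Service Mesh (Istio) setup described above.
    serving.knative.openshift.io/enablePassthrough: "true"
spec:
  predictor:
    model:
      modelFormat:
        name: caikit                 # model format handled by the runtime
      runtime: caikit-tgis-runtime   # assumed ServingRuntime name
      storageUri: s3://models/example-llm  # illustrative storage location
```

KServe then handles the deployment lifecycle, storage access, and networking for this resource, while the caikit-tgis runtime loads the model into TGIS and exposes the inference endpoints.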