Managed Vector Search using Vespa Cloud

There is a growing interest in AI-powered vector representations of unstructured multimodal data and searching efficiently over these representations. This repository describes how your organization can unlock the full potential of multimodal AI-powered vector representations using Vespa Cloud -- the industry-leading managed Vector Search Service.

Create your tenant in the Vespa Cloud

If you don't already have a Vespa Cloud tenant, create one at console.vespa-cloud.com. Signing up requires a Google or GitHub account and starts your free trial period. No credit card is required.

Clone this repo

git clone --depth 1 https://github.com/vespa-cloud/vector-search.git && cd vector-search

Install Vespa-CLI

Install Vespa-CLI, the official command-line client for interacting with Vespa. Vespa-CLI works with both Vespa Cloud and self-hosted on-premise Vespa deployments.

brew install vespa-cli

You can also download Vespa CLI binaries for Windows, Linux and macOS.

Configure Vespa-CLI

Replace <tenant-name> with your Vespa Cloud tenant name. This guide uses vector-search as the application name and default as the instance name:

vespa config set target cloud && \
 vespa config set --local application <tenant-name>.vector-search.default

Security

Authorize access to the Vespa Cloud control plane:

vespa auth login

Create a self-signed certificate for data plane (read and write) endpoint access:

vespa auth cert

Read more about how Vespa Cloud keeps your data safe and private at rest and in transit in the Vespa Cloud Security Guide.

Configure Vector Schema

Now the app is ready to be deployed. The vector schema is configured with 768 dimensions using float precision.

The vector schema can be changed before deploying to match your vector data:

Change vector dimensionality (default 768).
Change vector precision type (default float). Choose between int8, bfloat16 and float.
Change distance-metric (default angular, which suits models trained with cosine similarity). euclidean, innerproduct and hamming are also supported.
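To make the distance-metric choice more concrete, the following is a minimal plain-Python sketch of what each of the four named metrics measures. The function names are illustrative only and this is not Vespa's implementation; Vespa's exact definitions (for example how innerproduct is turned into a distance) may differ.

```python
import math


def angular(a, b):
    """Angle in radians between a and b; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so floating-point drift cannot break acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def euclidean(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def hamming(a, b):
    """Number of differing bits between two sequences of int8 values."""
    # Mask to 8 bits so negative int8 values are treated as two's complement.
    return sum(bin((x ^ y) & 0xFF).count("1") for x, y in zip(a, b))
```

The angular metric ignores vector length, which is why it pairs naturally with models trained using cosine similarity, while hamming is only meaningful for int8 vectors that pack binary codes.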

Note that this sample application ships with CI/CD tests for production deployment that use 768 dimensions. Changing the schema requires updating the CI/CD tests accordingly.

Deploy to dev environment

Vespa Cloud supports multiple environments. This guide walks you through:

Deploying to dev for developing and testing your vector search use case
Deploying to perf for performance validation and benchmarking
Deploying to prod for high availability production serving

The Vespa Cloud dev zone is where development happens. Resources are downscaled to nodes with 2 vCPUs, 8 GB of RAM and 50 GB of disk. A dev deployment with a single content node can index about 1M 768-dimensional vectors.

Deploy app to dev:

vespa deploy

The very first deployment to the dev environment takes about 12 minutes for provisioning resources and configuring certificates. Later deployments take less than a minute.

Deploy to perf environment

The perf zone is used for benchmarking and performance testing. It uses the same resource specification as in production, except for redundancy.

Deploy app to perf by using the --zone parameter:

vespa deploy --zone perf.aws-us-east-1c

Deploy to production environment

This submits the application to production via an automated deployment pipeline, which executes:

System test tests/system-test/feed-and-search-test.json
Staging setup test tests/staging-setup/staging-feed-before-upgrade.json
Staging test tests/staging-test/staging-after-upgrade.json

The above tests also demonstrate Vespa vector search query and feed usage.

Deploying to production requires choosing which production region the app should be deployed to. The deployment.xml in this sample app uses aws-us-east-1c.
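As a rough illustration of what a feed and a nearest-neighbor query look like, the sketch below builds the request path and JSON bodies without sending them. The field name `vector`, the query tensor name `q`, and the namespace and document type used in the example are assumptions, not taken from this sample app's schema; check the schema and the test files listed above for the real names. Sending the requests also requires the data plane certificate created earlier.

```python
import json


def build_feed_request(namespace, doc_type, doc_id, vector, field="vector"):
    """Return (path, body) for a /document/v1 put of one vector document.

    `field` is a hypothetical tensor field name.
    """
    path = f"/document/v1/{namespace}/{doc_type}/docid/{doc_id}"
    body = {"fields": {field: vector}}
    return path, body


def build_nn_query(query_vector, target_hits=10, field="vector", query_tensor="q"):
    """Return a JSON body for the search API using nearestNeighbor.

    `field` and `query_tensor` are hypothetical names; the rank profile
    in the schema must declare the query tensor input.
    """
    yql = (
        "select * from sources * where "
        f"{{targetHits: {target_hits}}}nearestNeighbor({field}, {query_tensor})"
    )
    return {
        "yql": yql,
        "hits": target_hits,
        f"input.query({query_tensor})": query_vector,
    }


if __name__ == "__main__":
    path, body = build_feed_request("mynamespace", "vector", "1", [0.1, 0.2, 0.3])
    print(path, json.dumps(body))
    print(json.dumps(build_nn_query([0.1, 0.2, 0.3], target_hits=5)))
```

Keeping payload construction separate from the HTTP call makes it easy to inspect the request before pointing it at a real endpoint.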

For high availability and low network latency, consider using multiple regions. Vespa Cloud supports global query traffic routing so that query requests are served by the region which is closest to the client. See deployment.xml global endpoints.

Currently available Vespa Cloud production zones are listed in zones. Requests for new regions can be made by sending an email to [email protected].

The following deploys the application to the production regions specified in deployment.xml:

vespa prod submit

We recommend deploying using CI/CD, for example deploying to Vespa Cloud using GitHub Actions.

Vespa Cloud - Vector Search Price Examples

Vespa Cloud pricing is simple and transparent. All customers receive all features and services, and are charged a fee proportional to the resources the application uses.

The production env configuration in services.xml specifies the following resources:

<nodes deploy:environment="prod" count="2" groups="2">
      <resources memory="32GB" vcpu="8" disk="300GB" storage-type="local" />
</nodes>

The above specifies a redundant, high-availability deployment using grouped data distribution, with one node per group and 2 groups for redundancy.

Vectors	Dimensionality	Precision Type	Queries per second	Writes per second	Estimated cost per hour ($)
5M	768	float	2000	1000	$ 3.36
5M	768	float	6000	1000	$ 10.08
10M	384	float	2000	1000	$ 3.36
20M	384	bfloat16	1500	750	$ 3.36
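The hourly figures in the table can be turned into more familiar units with simple arithmetic. The sketch below is a back-of-the-envelope calculation only: it assumes a 30-day month and that the listed query rate is sustained around the clock, neither of which is stated by Vespa Cloud pricing.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def monthly_cost(hourly_usd, days=30):
    """Cost in USD of running continuously for `days` days."""
    return round(hourly_usd * HOURS_PER_DAY * days, 2)


def cost_per_million_queries(hourly_usd, queries_per_second):
    """USD per one million queries, assuming the rate is sustained."""
    queries_per_hour = queries_per_second * SECONDS_PER_HOUR
    return hourly_usd / (queries_per_hour / 1_000_000)


if __name__ == "__main__":
    # (hourly cost, queries per second) for the first two table rows.
    for hourly, qps in [(3.36, 2000), (10.08, 6000)]:
        print(hourly, monthly_cost(hourly), round(cost_per_million_queries(hourly, qps), 4))
```

Under these assumptions, the first two rows work out to the same cost per million queries, which is consistent with the fee being proportional to the resources used.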

Fewer vector dimensions and a lower precision type (e.g., bfloat16 instead of float) increase the number of vectors that can be indexed per node, since node memory is the limiting resource. Supported queries per second and writes per second depend on the vector search parameters.
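The effect of dimensionality and precision on memory can be estimated as vectors × dimensions × bytes per value. The sketch below computes only the raw vector payload; it deliberately ignores index structures, other fields and runtime overhead, so the real per-node footprint will be higher. The bytes-per-value mapping reflects the standard sizes of these numeric types.

```python
# Standard storage size of one vector cell for each precision type.
BYTES_PER_CELL = {"int8": 1, "bfloat16": 2, "float": 4}


def raw_vector_gb(num_vectors, dims, precision):
    """Raw vector payload in GB (10^9 bytes), excluding any index overhead."""
    return num_vectors * dims * BYTES_PER_CELL[precision] / 1e9


if __name__ == "__main__":
    # The distinct (vectors, dimensionality, precision) rows of the price table.
    for n, dims, precision in [
        (5_000_000, 768, "float"),
        (10_000_000, 384, "float"),
        (20_000_000, 384, "bfloat16"),
    ]:
        print(n, dims, precision, raw_vector_gb(n, dims, precision))
```

All three configurations come out to the same raw payload, which illustrates why halving the dimensionality or the precision roughly doubles the number of vectors that fit in the same node memory.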

Vespa Cloud sizing experts can assist in finding the most cost-efficient resource specification for your vector search use case. Sizing and cost estimation use samples of your data in the perf environment.

Vespa Cloud also supports auto-scaling, which lowers the cost of deployment because resources can be scaled as query volume changes throughout the week.

Using Vespa Vector Search

Documentation resources:

Blog posts:

Use Cases using Vespa Vector Search
