Managed Vector Search using Vespa Cloud

There is growing interest in AI-powered vector representations of unstructured multimodal data and in searching these representations efficiently. This repository describes how your organization can unlock the full potential of multimodal AI-powered vector representations using Vespa Cloud, the industry-leading managed vector search service.

Create your tenant in Vespa Cloud

If you don't already have a Vespa Cloud tenant, create one at console.vespa-cloud.com. Signing up for Vespa Cloud requires a Google or GitHub account and starts your free trial period; no credit card is required.

Clone this repo

git clone --depth 1 https://github.com/vespa-cloud/vector-search.git && cd vector-search

Install Vespa-CLI

Install Vespa CLI, the official command-line client for interacting with Vespa. Vespa CLI works with both Vespa Cloud and self-hosted, on-premise Vespa deployments.

brew install vespa-cli 

You can also download Vespa CLI binaries for Windows, Linux and macOS.

Configure Vespa-CLI

Replace <tenant-name> with your Vespa Cloud tenant name. In this example, the application name is vector-search and the instance name is default:

vespa config set target cloud && \
 vespa config set --local application <tenant-name>.vector-search.default

Security

Authorize access to the Vespa Cloud control plane:

vespa auth login

Create a self-signed certificate for data plane (read and write) endpoint access:

vespa auth cert

Read more about how Vespa Cloud keeps your data safe and private at rest and in transit in the Vespa Cloud Security Guide.

Configure Vector Schema

The app is now ready to be deployed. By default, the vector schema uses 768 dimensions with float precision.

Before deploying, you can change the vector schema to match your vector data:

  • Change the vector dimensionality (default 768).
  • Change the vector precision type (default float). Choose between int8, bfloat16, or float.
  • Change the distance-metric (default angular, which suits models trained with cosine similarity). Also supported: euclidean, innerproduct, and hamming.
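For intuition about the distance-metric options, the sketch below implements the textbook definitions of the four metrics in plain Python. It is illustrative only: the function names are hypothetical, and Vespa computes distances internally and converts them to closeness scores in ways this sketch does not model.

```python
import math


def angular_distance(a, b):
    """Angle in radians between a and b; 0.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / norms))  # clamp floating-point drift
    return math.acos(cosine)


def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Raw dot product; unlike the distances above, larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def hamming_distance(a, b):
    """Count of differing bits between two int8 vectors (values -128..127)."""
    return sum(bin((x ^ y) & 0xFF).count("1") for x, y in zip(a, b))


a, b = [1.0, 0.0], [2.0, 0.0]
print(angular_distance(a, b))    # 0.0: same direction
print(euclidean_distance(a, b))  # 1.0: different length
```

Angular distance ignores vector length, so [1, 0] and [2, 0] are at distance 0 even though their Euclidean distance is 1. That is why angular is a natural fit for embeddings trained with cosine similarity.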

Note that this sample application ships with CI/CD tests for production deployment that use 768 dimensions. Changing the schema requires updating the CI/CD tests.

Deploy to dev environment

Vespa Cloud supports several environments. This guide walks you through:

  • Deploying to dev for developing and testing your vector search use case
  • Deploying to perf for performance validation and benchmarking
  • Deploying to prod for high availability production serving

The Vespa Cloud dev zone is where development happens. Its resources are downscaled to nodes with 2 vCPUs, 8 GB of RAM, and 50 GB of disk. A dev deployment with a single content node can index about 1M 768-dimensional vectors.
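The 1M figure can be sanity-checked with back-of-the-envelope arithmetic. This Python sketch counts only the raw vector values (4 bytes per float, 2 per bfloat16, 1 per int8); it makes no assumption about the HNSW graph, other document fields, or process overhead, all of which also consume memory on the 8 GB node. The helper name is hypothetical.

```python
# Bytes per tensor cell for the precision types listed above.
BYTES_PER_VALUE = {"float": 4, "bfloat16": 2, "int8": 1}


def raw_vector_bytes(num_vectors, dimensions, precision):
    """Memory for the vector values alone (no HNSW graph or other fields)."""
    return num_vectors * dimensions * BYTES_PER_VALUE[precision]


dev_gb = raw_vector_bytes(1_000_000, 768, "float") / 1e9
print(f"{dev_gb:.2f} GB of raw vectors on an 8 GB dev node")  # 3.07 GB ...
```

About 3 GB of raw float vectors leaves the rest of the node's memory for the index and everything else, which plausibly explains why the dev capacity is stated as about 1M vectors rather than a figure closer to 8 GB worth.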

Deploy app to dev:

vespa deploy  

The first deployment to the dev environment takes about 12 minutes while resources are provisioned and certificates are configured. Later deployments take less than a minute.

Deploy to perf environment

The perf zone is used for benchmarking and performance testing. It uses the same resource specification as in production, except for redundancy.

Deploy app to perf by using the --zone parameter:

vespa deploy --zone perf.aws-us-east-1c

Deploy to production environment

Deploying to production submits the application to an automated deployment pipeline, which runs system tests and staging tests before the production deployment.

These tests also demonstrate how to feed and query vectors with Vespa vector search.
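As a complement to those tests, the Python sketch below builds the HTTP requests behind a vector feed and a nearest neighbor query, using Vespa's /document/v1 API and /search/ query API. It only constructs paths and JSON bodies; sending them requires your deployment's data plane endpoint and the certificate created with vespa auth cert. The namespace, document type, field name (vector), and query tensor name (q) are hypothetical placeholders, not taken from this application's schema, and the accepted tensor JSON encoding may vary between Vespa versions. Real vectors would have 768 values.

```python
import json

# Hypothetical names; replace with those defined in your schema.
NAMESPACE = "vector"      # document namespace used in /document/v1 paths
DOC_TYPE = "vector"       # document type (schema) name
VECTOR_FIELD = "vector"   # tensor field holding the embedding
QUERY_TENSOR = "q"        # query tensor declared as an input in the rank profile


def feed_request(doc_id, values):
    """Path and JSON body for a document put (sent as HTTP POST)."""
    path = f"/document/v1/{NAMESPACE}/{DOC_TYPE}/docid/{doc_id}"
    body = {"fields": {VECTOR_FIELD: {"values": values}}}
    return path, json.dumps(body)


def query_request(values, target_hits=10):
    """Path and JSON body for an approximate nearest neighbor query (HTTP POST)."""
    yql = (
        "select * from sources * where "
        f"{{targetHits: {target_hits}}}nearestNeighbor({VECTOR_FIELD}, {QUERY_TENSOR})"
    )
    body = {"yql": yql, f"input.query({QUERY_TENSOR})": values, "hits": target_hits}
    return "/search/", json.dumps(body)


path, body = query_request([0.1, 0.2, 0.3], target_hits=5)
print(path, body)
```

targetHits tells the nearestNeighbor operator how many candidates to expose to ranking; hits controls how many results are returned.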

Deploying to production requires choosing the production region(s) the app should be deployed to. The deployment.xml in this sample app uses aws-us-east-1c.

For high availability and low network latency, consider using multiple regions. Vespa Cloud supports global query traffic routing, so that query requests are served by the region closest to the client. See global endpoints in the deployment.xml documentation.

The currently available Vespa Cloud production zones are listed in the zones documentation. Requests for new regions can be made by email to the Vespa Cloud team.

The following deploys the application to the production regions specified in deployment.xml:

vespa prod submit 

We recommend deploying using CI/CD, for example deploying to Vespa Cloud using GitHub Actions.

Vespa Cloud - Vector Search Price Examples

Vespa Cloud pricing is simple and transparent. All customers receive all features and services and are charged a fee proportional to the resources their application uses.

The production environment configuration in services.xml specifies the following resources:

<nodes deploy:environment="prod" count="2" groups="2">
      <resources memory="32GB" vcpu="8" disk="300GB" storage-type="local" />
</nodes>

This specifies a redundant, highly available deployment using grouped data distribution, with one node per group and two groups for redundancy.
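A minimal Python model of what this grouping means, assuming Vespa's grouped distribution semantics: each group holds a complete copy of the data, and each query is served by a single group. The class and method names are hypothetical; this is not how Vespa is configured, only a way to reason about the topology.

```python
from dataclasses import dataclass


@dataclass
class GroupedCluster:
    """Simplified model of a grouped content cluster (hypothetical helper)."""
    nodes: int
    groups: int

    @property
    def nodes_per_group(self) -> int:
        return self.nodes // self.groups

    def vectors_per_node(self, total_vectors: int) -> int:
        # Every group stores all documents, split across the nodes in that group.
        return -(-total_vectors // self.nodes_per_group)  # ceiling division

    def groups_that_can_fail(self) -> int:
        # Queries keep being served as long as one complete group remains.
        return self.groups - 1


prod = GroupedCluster(nodes=2, groups=2)
print(prod.nodes_per_group, prod.vectors_per_node(5_000_000), prod.groups_that_can_fail())
# 1 5000000 1
```

Because a query touches only one group, adding groups mainly adds query capacity and redundancy, while adding nodes within a group spreads the vectors over more memory.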

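| Vectors | Dimensionality | Precision type | Queries per second | Writes per second | Estimated cost per hour ($) |
|---------|----------------|----------------|--------------------|-------------------|-----------------------------|
| 5M      | 768            | float          | 2000               | 1000              | 3.36                        |
| 5M      | 768            | float          | 6000               | 1000              | 10.08                       |
| 10M     | 384            | float          | 2000               | 1000              | 3.36                        |
| 20M     | 384            | bfloat16       | 1500               | 750               | 3.36                        |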

Fewer vector dimensions and a lower-precision type (e.g., bfloat16 instead of float) increase the number of vectors that can be indexed per node, since memory is the limiting resource. The supported queries and writes per second depend on the vector search parameters.
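The memory trade-off can be checked against the price-table rows that share the $3.36 hourly cost. This Python sketch computes only the raw vector data per row (values times bytes per value); it ignores the HNSW graph and other overhead, so it is a lower bound rather than a sizing model. The helper and constant names are hypothetical.

```python
BYTES_PER_VALUE = {"float": 4, "bfloat16": 2, "int8": 1}

# (vectors, dimensions, precision) for the $3.36/hour rows in the price table.
PRICE_TABLE_ROWS = [
    (5_000_000, 768, "float"),
    (10_000_000, 384, "float"),
    (20_000_000, 384, "bfloat16"),
]


def raw_gb(vectors, dimensions, precision):
    """Raw vector data in GB (10**9 bytes), excluding index overhead."""
    return vectors * dimensions * BYTES_PER_VALUE[precision] / 1e9


for vectors, dims, precision in PRICE_TABLE_ROWS:
    print(f"{vectors:,} x {dims} {precision}: {raw_gb(vectors, dims, precision):.2f} GB")
```

All three rows come to 15.36 GB of raw vector data: halving the dimensions doubles the vector count, and halving the bytes per value doubles it again, at the same footprint on each 32 GB node.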

Vespa Cloud sizing experts can help you find the most cost-efficient resource specification for your vector search use case. Sizing and cost estimation use samples of your data in the perf environment.

Vespa Cloud also supports autoscaling, which can lower deployment cost by scaling resources as query volume changes throughout the week.

Using Vespa Vector Search

Documentation resources:

Blog posts:

Use Cases using Vespa Vector Search
