spiceai / spiceai

A unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets from any database, data warehouse, or data lake.

Home Page: https://docs.spiceai.org

License: Apache License 2.0

Makefile 0.83% Dockerfile 0.14% Go 12.94% Shell 1.29% Rust 84.16% PowerShell 0.28% Python 0.36%
time-series artificial-intelligence developers machine-learning data sql infrastructure

spiceai's Introduction

Spice.ai OSS


What is Spice?

Spice is a small, portable runtime that provides developers with a unified SQL query interface to locally materialize, accelerate, and query datasets sourced from any database, data warehouse, or data lake.

πŸ“£ Read the Spice.ai OSS announcement blog post.

Spice makes it easy to build data-driven and data-intensive applications by streamlining the use of data and machine learning (ML) in software.

The Spice runtime is written in Rust and leverages industry-leading technologies such as Apache DataFusion, Apache Arrow, Apache Arrow Flight, and DuckDB.


Why Spice?

Spice makes it simple and fast to query data across one or more sources with SQL. Easily co-locate a managed working set of data with your application or ML model, accelerated locally in-memory with Arrow, with SQLite/DuckDB, or with an attached database such as PostgreSQL, for high-performance, low-latency queries. Acceleration engines run in your own infrastructure, giving you flexibility and control over price and performance.

How is Spice different?

  1. Local acceleration with both OLAP (Arrow/DuckDB) and OLTP (SQLite/PostgreSQL) engines at dataset granularity, compared to OLAP-only or OLTP-only systems.

  2. Separation of materialization from storage/compute, compared with monolithic data systems and data lakes. Keep compute co-located with the source data while bringing a materialized working set next to your application, dashboard, or data/ML pipeline.

  3. Edge-to-cloud native. Chainable and designed to be deployed standalone, as a container sidecar, as a microservice, or in a cluster, across laptops, the Edge, on-prem, POPs, and all public clouds.
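
The sidecar deployment described above can be sketched as a hypothetical docker-compose.yml. The image name comes from the Runtime Container Deployment section of this README and the ports from the runtime's startup logs; the service layout and mount path are illustrative assumptions:

```yaml
services:
  app:
    build: .                 # your application or dashboard
    depends_on:
      - spiceai
  spiceai:
    image: spiceai/spiceai:latest
    ports:
      - "3000:3000"          # HTTP
      - "50051:50051"        # Arrow Flight
    volumes:
      # Mount path inside the container is an assumption
      - ./spicepod.yaml:/app/spicepod.yaml
```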

Before Spice

With Spice

Example Use-Cases

1. Faster applications and frontends. Accelerate and co-locate datasets with applications and frontends to serve more concurrent queries and users with faster page loads and data updates. Try the CQRS sample app

2. Faster dashboards, analytics, and BI. Faster, more responsive dashboards without massive compute costs. Watch the Apache Superset demo

3. Faster data pipelines, machine learning training, and inferencing. Co-locate datasets in pipelines where the data is needed to minimize data movement and improve query performance. Predict hard drive failure with the SMART data demo

4. Easily query many data sources. Federated SQL query across databases, data warehouses, and data lakes using Data Connectors.
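
Federated query is configured per dataset in spicepod.yaml. A hedged sketch, assuming the v1beta1 format shown in the Quickstart below; the dataset names are hypothetical and the exact `from:` URI format for each connector should be checked against the Data Connectors documentation:

```yaml
version: v1beta1
kind: Spicepod
name: federated_demo
datasets:
  # Hypothetical S3 dataset in Parquet format
  - from: s3://my-bucket/sales/
    name: sales
  # Hypothetical dataset served from Dremio via Arrow Flight
  - from: dremio:datasets.orders
    name: orders
```

Both tables can then be queried, and joined, from a single SQL interface such as the Spice SQL REPL.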

Watch a 30-sec BI dashboard acceleration demo


Supported Data Connectors

Currently supported data connectors for upstream datasets. More coming soon.

| Name       | Description             | Status       | Protocol/Format              | Refresh Modes |
|------------|-------------------------|--------------|------------------------------|---------------|
| databricks | Databricks              | Alpha        | Spark Connect, S3/Delta Lake | full          |
| postgres   | PostgreSQL              | Alpha        |                              | full          |
| spiceai    | Spice.ai                | Alpha        | Arrow Flight                 | append, full  |
| s3         | S3                      | Alpha        | Parquet                      | full          |
| dremio     | Dremio                  | Alpha        | Arrow Flight                 | full          |
| mysql      | MySQL                   | Alpha        |                              | full          |
| duckdb     | DuckDB                  | Alpha        |                              | full          |
| clickhouse | ClickHouse              | Alpha        |                              | full          |
| odbc       | ODBC                    | Alpha        | ODBC                         | full          |
| spark      | Spark                   | Alpha        | Spark Connect                | full          |
| flightsql  | Apache Arrow Flight SQL | Alpha        | Arrow Flight SQL             | full          |
| snowflake  | Snowflake               | Alpha        | Arrow                        | full          |
| bigquery   | BigQuery                | Coming soon! | Arrow Flight SQL             | full          |

Supported Data Stores/Accelerators

Currently supported data stores for local materialization/acceleration. More coming soon.

| Name     | Description             | Status | Engine Modes | Refresh Modes |
|----------|-------------------------|--------|--------------|---------------|
| arrow    | In-Memory Arrow Records | Alpha  | memory       | append, full  |
| duckdb   | Embedded DuckDB         | Alpha  | memory, file | append, full  |
| sqlite   | Embedded SQLite         | Alpha  | memory, file | append, full  |
| postgres | Attached PostgreSQL     | Alpha  |              | append, full  |
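
Accelerators are selected per dataset. A minimal sketch of what an acceleration block might look like, assuming the spicepod.yaml format from the Quickstart; the field names mirror this table's columns, but the exact schema may differ, so treat this as illustrative:

```yaml
datasets:
  - from: spice.ai/eth.recent_blocks   # dataset used later in this README
    name: eth_recent_blocks
    acceleration:
      enabled: true
      engine: duckdb       # arrow | duckdb | sqlite | postgres
      mode: file           # memory | file (see Engine Modes above)
      refresh_mode: append # append | full (see Refresh Modes above)
```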

Intelligent Applications

Spice enables developers to build both data and AI-driven applications by co-locating data and ML models with applications. Read more about the vision to enable the development of intelligent AI-driven applications.

⚠️ DEVELOPER PREVIEW Spice is in alpha and under active development; it is not intended for production use until its 1.0-stable release. If you are interested in running Spice in production, please get in touch so we can support you (see Connect with us below).

⚑️ Quickstart (Local Machine)


Step 1. Install the Spice CLI:

On macOS, Linux, and WSL:

curl https://install.spiceai.org | /bin/bash

Or using brew:

brew install spiceai/spiceai/spice

On Windows:

curl -L "https://install.spiceai.org/Install.ps1" -o Install.ps1 && PowerShell -ExecutionPolicy Bypass -File ./Install.ps1

Step 2. Initialize a new Spice app with the spice init command:

spice init spice_qs

A spicepod.yaml file is created in the spice_qs directory. Change to that directory:

cd spice_qs

Step 3. Start the Spice runtime:

spice run

Example output:

Spice.ai runtime starting...
Using latest 'local' runtime version.
2024-02-21T06:11:56.381793Z  INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:3000
2024-02-21T06:11:56.381853Z  INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2024-02-21T06:11:56.382038Z  INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052

The runtime is now started and ready for queries.

Step 4. In a new terminal window, add the spiceai/quickstart Spicepod. A Spicepod is a package of configuration defining datasets and ML models.

spice add spiceai/quickstart

The spicepod.yaml file will be updated with the spiceai/quickstart dependency.

version: v1beta1
kind: Spicepod
name: PROJECT_NAME
dependencies:
  - spiceai/quickstart

The spiceai/quickstart Spicepod will add a taxi_trips data table to the runtime, which can now be queried with SQL.

2024-02-22T05:53:48.222952Z  INFO runtime: Loaded dataset taxi_trips
2024-02-22T05:53:48.223101Z  INFO runtime::dataconnector: Refreshing data for taxi_trips

Step 5. Start the Spice SQL REPL:

spice sql

The SQL REPL interface will be shown:

Welcome to the Spice.ai SQL REPL! Type 'help' for help.

show tables; -- list available tables
sql>

Enter show tables; to display the available tables for query:

sql> show tables
+------------+
| table_name |
+------------+
| taxi_trips |
+------------+

Time: 0.007505084 seconds. 1 rows.

Enter a query to display the longest taxi trips:

sql> SELECT trip_distance, total_amount FROM taxi_trips ORDER BY trip_distance DESC LIMIT 10;

Output:

+---------------+--------------+
| trip_distance | total_amount |
+---------------+--------------+
| 312722.3      | 22.15        |
| 97793.92      | 36.31        |
| 82015.45      | 21.56        |
| 72975.97      | 20.04        |
| 71752.26      | 49.57        |
| 59282.45      | 33.52        |
| 59076.43      | 23.17        |
| 58298.51      | 18.63        |
| 51619.36      | 24.2         |
| 44018.64      | 52.43        |
+---------------+--------------+

Time: 0.002458976 seconds

βš™οΈ Runtime Container Deployment

Using the Docker image locally:

docker pull spiceai/spiceai

In a Dockerfile:

FROM spiceai/spiceai:latest

Using Helm:

helm repo add spiceai https://helm.spiceai.org
helm install spiceai spiceai/spiceai

🏎️ Next Steps

You can use any number of predefined datasets available from the Spice.ai Cloud Platform in the Spice runtime.

A list of publicly available datasets from Spice.ai can be found here: https://docs.spice.ai/building-blocks/datasets.

To access public datasets from Spice.ai, you will first need to create a Spice.ai account by selecting the free-tier membership.

Navigate to spice.ai and create a new account by clicking on Try for Free.


After creating an account, you will need to create an app in order to obtain an API key.


You will now be able to access datasets from Spice.ai. For this demonstration, we will be using the spice.ai/eth.recent_blocks dataset.

Step 1. Log in and authenticate from the command line using the spice login command. A browser window will open and prompt you to authenticate:

spice login

Step 2. Initialize a new project and start the runtime:

# Initialize a new Spice app
spice init spice_app

# Change to app directory
cd spice_app

# Start the runtime
spice run

Step 3. Configure the dataset:

In a new terminal window, configure a new dataset using the spice dataset configure command:

spice dataset configure

You will be prompted to enter a name. Enter a name that represents the contents of the dataset:

dataset name: (spice_app) eth_recent_blocks

Enter the description of the dataset:

description: eth recent logs

Enter the location of the dataset:

from: spice.ai/eth.recent_blocks

Select y when prompted whether to accelerate the data:

Locally accelerate (y/n)? y

You should see the following output from your runtime terminal:

2024-02-21T22:49:10.038461Z  INFO runtime: Loaded dataset eth_recent_blocks

Step 4. In a new terminal window, use the Spice SQL REPL to query the dataset:

spice sql
SELECT number, size, gas_used from eth_recent_blocks LIMIT 10;

The output displays the results of the query along with the query execution time:

+----------+--------+----------+
| number   | size   | gas_used |
+----------+--------+----------+
| 19281345 | 400378 | 16150051 |
| 19281344 | 200501 | 16480224 |
| 19281343 | 97758  | 12605531 |
| 19281342 | 89629  | 12035385 |
| 19281341 | 133649 | 13335719 |
| 19281340 | 307584 | 18389159 |
| 19281339 | 89233  | 13391332 |
| 19281338 | 75250  | 12806684 |
| 19281337 | 100721 | 11823522 |
| 19281336 | 150137 | 13418403 |
+----------+--------+----------+

Time: 0.004057791 seconds

You can experiment with how long queries take against non-accelerated datasets by changing the acceleration setting from true to false in the datasets.yaml file.
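
The dataset configuration written by spice dataset configure might look like the following sketch (the field names are assumptions based on the prompts above; check the generated file for the exact schema):

```yaml
datasets:
  - from: spice.ai/eth.recent_blocks
    name: eth_recent_blocks
    description: eth recent logs
    acceleration:
      enabled: true   # set to false to compare non-accelerated query times
```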

πŸ“„ Documentation

Comprehensive documentation is available at docs.spiceai.org.

πŸ”¨ Upcoming Features

πŸš€ See the Roadmap to v1.0-stable for upcoming features.

🀝 Connect with us

We greatly appreciate and value your support! You can help Spice in a number of ways:

⭐️ star this repo! Thank you for your support! πŸ™

spiceai's People

Contributors

adm28, ahirner, cjrh, corentin-pro, dependabot[bot], digadeesh, edmondop, ewgenius, github-actions[bot], gloomweaver, haardvark, jeadie, jeremylimconsulting, lukekim, mach-kernel, mcilie, mitchdevenport, phillipleblanc, roee-87, rthomas, sboorlagadda, sevenannn, sgrebnov, y-f-u


spiceai's Issues

Handle output from AI Engine in a more structured way

Currently we are inspecting each log as the AI Engine returns it and deciding if we should colorize it or not. We should instead drive as much of the output from the runtime side as we can.

One approach would be to handle logging entirely from the runtime and, if there are specific events we need to log from the AI Engine, send those events over an IPC channel. With the current HTTP stack, we could either add a specific "logging" endpoint or drive the logs based on certain events we know about (i.e. the episode results or the episode progress updates)

Also, in order to handle output that we want to "replace" in a nice way (see https://www.loom.com/share/7c55ee271b5043f29a5e9d4765f335b8 for an example) we could create an output class that keeps track of all the output we want to emit and then smartly replace when it is necessary. A start at this implementation is here: https://gist.github.com/phillipleblanc/e1a6cf196af9242da9b67c7d8cbc993b

Inferencing before training throws

Repro steps

  • spice run with cartpole already added to pods
  • curl http://localhost:8000/api/v0.1/pods/cartpole-v1/inference

Expected

Inference result

Actual

500 with the following in the AI logs:

127.0.0.1 - - [13/Aug/2021 10:44:55] "GET /health HTTP/1.1" 200 -
127.0.0.1 - - [13/Aug/2021 10:44:55] "POST /pods/cartpole-v1/init HTTP/1.1" 200 -
127.0.0.1 - - [13/Aug/2021 10:44:55] "GET /health HTTP/1.1" 200 -
127.0.0.1 - - [13/Aug/2021 10:44:55] "POST /pods/cartpole-v1/data HTTP/1.1" 200 -
[2021-08-13 10:45:06,011] ERROR in app: Exception on /pods/cartpole-v1/models/latest/inference [GET]
Traceback (most recent call last):
  File "/home/lane/.spice/bin/ai/venv/lib/python3.9/site-packages/pandas/core/indexes/datetimes.py", line 702, in get_loc
    return Index.get_loc(self, key, method, tolerance)
  File "/home/lane/.spice/bin/ai/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3376, in get_loc
    raise KeyError(key)
KeyError: Timestamp('2021-08-13 16:54:40', freq='10S')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lane/.spice/bin/ai/venv/lib/python3.9/site-packages/flask/app.py", line 2070, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/lane/.spice/bin/ai/venv/lib/python3.9/site-packages/flask/app.py", line 1515, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/lane/.spice/bin/ai/venv/lib/python3.9/site-packages/flask/app.py", line 1513, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/lane/.spice/bin/ai/venv/lib/python3.9/site-packages/flask/app.py", line 1499, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/lane/.spice/bin/ai/main.py", line 227, in inference
    latest_window = data_manager.get_latest_window()
  File "/home/lane/src/spice-pre/ai/src/data.py", line 146, in get_latest_window
    start_index = self.massive_table_filled.index.get_loc(
  File "/home/lane/.spice/bin/ai/venv/lib/python3.9/site-packages/pandas/core/indexes/datetimes.py", line 704, in get_loc
    raise KeyError(orig_key) from err
KeyError: Timestamp('2021-08-13 16:54:40', freq='10S')
127.0.0.1 - - [13/Aug/2021 10:45:06] "GET /pods/cartpole-v1/models/latest/inference HTTP/1.1" 500 -
[2021-08-13 10:45:07,050] ERROR in app: Exception on /pods/cartpole-v1/models/latest/inference [GET]
Traceback (most recent call last):
  File "/home/lane/.spice/bin/ai/venv/lib/python3.9/site-packages/pandas/core/indexes/datetimes.py", line 702, in get_loc
    return Index.get_loc(self, key, method, tolerance)
  File "/home/lane/.spice/bin/ai/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3376, in get_loc
    raise KeyError(key)
KeyError: Timestamp('2021-08-13 16:54:40', freq='10S')

Explain CartPole and make the getting started guide easier to understand

As per feedback:

"Suggest briefly explaining what CartPole is when mentioned in the getting started guide.
Not having heard of it before, I wasn't sure what it was, perhaps something like:

"[...] based off an Open AI gym example called CartPole-v1,* [ a game in which a pole tries to balance itself on top of a sliding cart.]*"

v0.1-alpha.4 Endgame

  • Ensure all release-spice PRs are merged to spiceai
  • Ensure all outstanding spiceai feature PRs are merged
  • Full test pass and update if necessary over README.md (please get screenshots!)
  • Full test pass and update if necessary over Docs (please get screenshots!)
  • Full test pass and update if necessary over new Samples (please get screenshots/videos!)
  • Full test pass and update if necessary over new Quickstarts (please get screenshots/videos!)
  • Merge Docs PRs
  • Merge Registry PRs
  • Merge Spice Rack PRs
  • Merge Samples PRs
  • Merge Quickstarts PRs
  • Deploy Spice Rack
  • Merge version rev, docs and release notes PR: #89
  • Rev version for dev
  • Final test pass on released binaries
  • Discord announcement
  • Email announcement

PR reference:

Spice run broken for Docker

Docker is currently failing with the following:

lane@DESKTOP-2BSF9I6:~/src/spiceai/cmd/spice$ ./spice run
Spice.ai runtime starting...
found and using local dev image
Loading Spice runtime ...
2021/08/30 18:40:02 error starting /app/home/.spice/bin/ai/venv/bin/python3: fork/exec /app/home/.spice/bin/ai/venv/bin/python3: no such file or directory
{"level":"error","ts":1630348802.1431477,"caller":"aiengine/aiengine.go:166","msg":"AI Engine failed to run","stacktrace":"github.com/spiceai/spice/pkg/aiengine.StartServer\n\t/build/pkg/aiengine/aiengine.go:166\ngithub.com/spiceai/spice/pkg/runtime.Run\n\t/build/pkg/runtime/runtime.go:121\nmain.glob..func1\n\t/build/cmd/spiced/main.go:61\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:860\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:902\nmain.main\n\t/build/cmd/spiced/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255"}

In-place upgrade for CLI

Command: spice upgrade

This will self-upgrade the CLI to the latest version. Once the CLI is updated, it will automatically upgrade to the latest runtime version the next time it runs.

Would be a new CLI command which would manually call: context.InstallOrUpgradeRuntime() - See

InstallOrUpgradeRuntime() error

spice pod train can return 'Not found' error if pod not loaded in runtime

Hey,

Just encountered this error. I had run spice run in one terminal, and went to spice train gardener in another, but got:

failed to start training: 404 Not Found

The source of the problem is:

fmt.Printf("failed to start training: %s", err.Error())

If the pod is not loaded by the runtime (which it wasn't because on WSL2 it seems I need to restart runtime after adding a pod via CLI) I got this 404 error which was a bit confusing.

Perhaps another error (pod xxx may not be added?) or something could be helpful. Cheers.

Handle concurrency around pods management properly

Currently the operations to add/update/remove a pod are not safe for concurrent use. We should put in safeguards to ensure that when a pod is added or removed we properly handle it - especially around training and data state management.

  • Check if training is started, and if so, stop/handle it and potentially put in (do not train more) mode.
  • Check if the datasources in the pod are fetching/subscribed to data, if so, stop, and potentially prevent more from being fetched.
