shaunstanislauslau / locustdb

This project forked from cswinter/locustdb

Massively parallel, high performance analytics database that will rapidly devour all of your data.

License: Other

Rust 96.64% TypeScript 1.84% CSS 0.51% HTML 0.05% JavaScript 0.96%

LocustDB

Join the chat at https://gitter.im/LocustDB/Lobby

An experimental analytics database aiming to set a new standard for query performance on commodity hardware. See the article "How to Analyze Billions of Records per Second on a Single Desktop PC" for an overview of current capabilities.

How to use

  1. Install Rust, including the nightly toolchain (the commands below use cargo +nightly)
  2. Clone the repository
git clone https://github.com/cswinter/LocustDB.git
cd LocustDB
  3. Run the REPL!
RUSTFLAGS="-Ccodegen-units=1" CARGO_INCREMENTAL=0 cargo +nightly run --release --bin repl -- test_data/nyc-taxi.csv.gz

Instead of test_data/nyc-taxi.csv.gz, you can also pass a path to any other .csv or gzipped .csv.gz file. The first line of the file must contain the column names. Column datatypes are inferred automatically, but loading may break for columns that contain a mixture of numbers, strings, and empty entries.
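The README does not say how LocustDB infers column types. To show why mixed columns are fragile, here is a minimal sketch of one common approach. Each column starts at a "nothing seen yet" type and is widened as rows are read, so a single non-numeric entry turns an otherwise numeric column into strings. ColumnType, widen and infer_types are hypothetical names, not LocustDB's API. The naive comma split also ignores quoted fields.

```rust
/// Hypothetical inferred column type (not LocustDB's actual type system).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Null,    // only empty entries seen so far
    Integer, // every non-empty entry parsed as i64
    Str,     // at least one non-empty entry was not an integer
}

/// Widen the current guess so that it also covers `value`.
fn widen(current: ColumnType, value: &str) -> ColumnType {
    if value.is_empty() {
        return current; // empty entries do not constrain the type
    }
    let is_int = value.parse::<i64>().is_ok();
    match (current, is_int) {
        (ColumnType::Str, _) => ColumnType::Str, // strings never narrow back
        (_, true) => ColumnType::Integer,
        (_, false) => ColumnType::Str,
    }
}

/// Infer one type per column from CSV text whose first line holds the names.
/// Assumes no quoted fields or embedded commas.
fn infer_types(csv: &str) -> Vec<(String, ColumnType)> {
    let mut lines = csv.lines();
    let header: Vec<String> = match lines.next() {
        Some(h) => h.split(',').map(|s| s.trim().to_string()).collect(),
        None => return Vec::new(),
    };
    let mut types = vec![ColumnType::Null; header.len()];
    for line in lines {
        for (i, field) in line.split(',').enumerate().take(types.len()) {
            types[i] = widen(types[i], field.trim());
        }
    }
    header.into_iter().zip(types).collect()
}

fn main() {
    // "tip" mixes an integer, an empty entry and the string "n/a".
    let csv = "passengers,vendor,tip\n1,CMT,\n2,VTS,1\n,CMT,n/a\n";
    for (name, ty) in infer_types(csv) {
        println!("{name}: {ty:?}");
    }
}
```

Here passengers is inferred as Integer despite its empty entry, while the single "n/a" forces tip to Str.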

You can also pass the magic strings nyc100m or nyc to load either the first 5 files (100 million records) or the full dataset of 1.46 billion taxi rides. You will need to download the dataset first. The full dataset requires about 120 GB of disk space and 60 GB of RAM.

Running tests or benchmarks

cargo +nightly test

RUSTFLAGS="-Ccodegen-units=1" CARGO_INCREMENTAL=0 cargo +nightly bench

Goals

A vision for LocustDB.

Fast

Query performance for analytics workloads is best-in-class on commodity hardware, both for data cached in memory and for data read from disk.

Cost-efficient

LocustDB automatically achieves spectacular compression ratios, has minimal indexing overhead, and requires fewer machines than any other system to store the same amount of data. The trade-off between performance and storage efficiency is configurable.

Low latency

New data is available for queries within seconds.

Scalable

LocustDB scales seamlessly from a single machine to large clusters.

Flexible and easy to use

LocustDB should be usable with minimal configuration or schema setup as:

  • a highly available distributed analytics system continuously ingesting data and executing queries
  • a command-line tool/REPL for loading and analysing data from CSV files
  • an embedded database/query engine included in other Rust programs via cargo

Non-goals

Until LocustDB is production-ready, these are distractions at best and, at worst, wholly incompatible with the main goals.

Strong consistency and durability guarantees

  • small amounts of data may be lost during ingestion
  • when a node is unavailable, queries may return incomplete results
  • results returned by queries may not represent a consistent snapshot

High QPS

LocustDB does not efficiently execute large numbers of queries that insert or operate on small amounts of data.

Full SQL support

  • All data is append only and can only be deleted/expired in bulk.
  • LocustDB does not support queries that cannot be evaluated independently by each node (large joins, complex subqueries, precise set sizes, precise top n).
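The last bullet can be made concrete with a small example. Assume a hypothetical two-node setup where each node returns only its local top-n result and a coordinator sums the partial answers. Under that scheme, both top n and distinct counts can come out wrong. The functions count, top_n and merge are illustrative only and are not part of LocustDB.

```rust
use std::collections::{HashMap, HashSet};

/// Exact per-key counts for the rows stored on one (hypothetical) node.
fn count(rows: &[&str]) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for key in rows {
        *counts.entry(key.to_string()).or_insert(0) += 1;
    }
    counts
}

/// Top `n` keys by count; ties are broken by key so results are deterministic.
fn top_n(counts: &HashMap<String, u64>, n: usize) -> Vec<(String, u64)> {
    let mut pairs: Vec<(String, u64)> = counts.iter().map(|(k, c)| (k.clone(), *c)).collect();
    pairs.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    pairs.truncate(n);
    pairs
}

/// Coordinator-side merge: sum the partial counts each node reported, then re-rank.
fn merge(partials: &[Vec<(String, u64)>], n: usize) -> Vec<(String, u64)> {
    let mut total: HashMap<String, u64> = HashMap::new();
    for partial in partials {
        for (key, c) in partial {
            *total.entry(key.clone()).or_insert(0) += c;
        }
    }
    top_n(&total, n)
}

fn main() {
    let node_a = ["a", "a", "a", "a", "a", "c", "c", "c", "c"];
    let node_b = ["b", "b", "b", "b", "b", "c", "c", "c", "c"];

    // Each node evaluates independently and ships only its local top-1.
    let partials = vec![top_n(&count(&node_a), 1), top_n(&count(&node_b), 1)];
    let merged = merge(&partials, 1);

    // The exact answer needs every row in one place.
    let all: Vec<&str> = node_a.iter().chain(node_b.iter()).copied().collect();
    let exact = top_n(&count(&all), 1);

    println!("merged top-1: {:?}", merged); // [("a", 5)]
    println!("exact top-1:  {:?}", exact); // [("c", 8)]

    // Distinct counts do not add up either, because "c" lives on both nodes.
    let distinct_a: HashSet<&str> = node_a.iter().copied().collect();
    let distinct_b: HashSet<&str> = node_b.iter().copied().collect();
    let summed = distinct_a.len() + distinct_b.len();
    let exact_distinct = distinct_a.union(&distinct_b).count();
    println!("summed distinct: {summed}, exact distinct: {exact_distinct}"); // 4 vs 3
}
```

Key "c" is never the local winner on either node, so it never reaches the coordinator, even though it is the global winner. Getting precise answers would require shipping much more data between nodes, which this non-goal deliberately avoids.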

Support for cost-inefficient or specialised hardware

LocustDB does not run on GPUs.

Contributors: cswinter, dbxnicolas, ddfisher, gitter-badger
