diennea / herddb

A JVM-embeddable Distributed Database

Home Page: https://herddb.org

License: Apache License 2.0

Java 96.21% Shell 0.68% CSS 2.09% HTML 0.48% JavaScript 0.48% TSQL 0.02% Batchfile 0.05%
bookkeeper calcite database distributed distributed-database embeddable embeddable-dbms java replication sql zookeeper

herddb's People

Contributors

aluccaroni, amitvc, caiok, caliuf, dependabot[bot], diegosalvi, dmercuriali, eolivelli, hamadodene, mino181295, nicoloboschi, pv3nturi, rmannibucau, timtebeek, tisonkun


herddb's Issues

Range scans on PK don't take advantage of PK order

Queries like
SELECT * FROM mytable WHERE pkfield >= v ORDER BY pkfield LIMIT n
ideally should scan just n values.

Currently every value after v is retrieved, filtered and sorted before limiting the output.

Knowledge of the PK ordering should be pushed down in some way to the data scanner, together with the result limit.

There should also be some way to improve scanning in inverted PK order (e.g. PK defined as ASC with an ORDER BY ... DESC).
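As a sketch of the idea, a sorted key-to-page structure lets the scan stop after n keys instead of materialising and sorting the whole range. This is only an illustration over `java.util.NavigableMap`; `scanFrom` and the map layout are hypothetical, not HerdDB code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

public class PkRangeScan {

    // Scan keys >= fromKey in PK order, stopping as soon as 'limit'
    // keys have been collected: no retrieve-all + filter + sort needed.
    static List<String> scanFrom(NavigableMap<String, Long> keyToPage,
                                 String fromKey, int limit) {
        List<String> result = new ArrayList<>();
        for (String key : keyToPage.tailMap(fromKey, true).keySet()) {
            if (result.size() >= limit) {
                break; // only 'limit' entries were ever visited
            }
            result.add(key);
        }
        return result;
    }
}
```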

IndexOutOfBoundsException on prepared statement with wrong arguments count

When executing a prepared statement with a wrong number of arguments, it fails with an IndexOutOfBoundsException during statement execution (TableManager).

It should fail fast during parsing
java.lang.IndexOutOfBoundsException: Index: 0
at java.util.Collections$EmptyList.get(Collections.java:4454)
at herddb.sql.expressions.JdbcParameterExpression.evaluate(JdbcParameterExpression.java:36)
at herddb.sql.SQLRecordKeyFunction.computeNewValue(SQLRecordKeyFunction.java:101)
at herddb.core.TableManager.executeInsert(TableManager.java:753)
at herddb.core.TableManager.executeStatement(TableManager.java:487)
at herddb.core.TableSpaceManager.executeStatement(TableSpaceManager.java:1026)
at herddb.core.DBManager.executeStatement(DBManager.java:577)
at herddb.core.DBManager.executePlan(DBManager.java:644)
at herddb.server.ServerSideConnectionPeer.handleExecuteStatement(ServerSideConnectionPeer.java:563)
at herddb.server.ServerSideConnectionPeer.messageReceived(ServerSideConnectionPeer.java:129)
at herddb.network.netty.NettyChannel.lambda$messageReceived$0(NettyChannel.java:82)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
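A fail-fast check at planning time could be as simple as comparing the number of placeholders against the supplied arguments. This sketch naively counts `?` characters (ignoring string literals) and uses hypothetical names; it is not HerdDB's parser:

```java
public class ParameterCheck {

    // Naive placeholder count: counts every '?', without skipping
    // quoted literals (a real parser would know the exact count)
    static int countPlaceholders(String sql) {
        int count = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?') {
                count++;
            }
        }
        return count;
    }

    // Fail fast, before statement execution ever starts
    static void validate(String sql, int suppliedArgs) {
        int expected = countPlaceholders(sql);
        if (expected != suppliedArgs) {
            throw new IllegalArgumentException(
                "statement expects " + expected + " parameters, got " + suppliedArgs);
        }
    }
}
```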

Convert camel case properties

Some configuration properties use dot case (dot.case), others camel case (camelCase); convert every camel case property to dot case.
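The mechanical conversion itself is straightforward; a hypothetical helper:

```java
public class PropertyNames {

    // Convert a camelCase property name to dot.case,
    // e.g. "checkpointPeriod" -> "checkpoint.period"
    static String toDotCase(String camel) {
        StringBuilder sb = new StringBuilder();
        for (char c : camel.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('.').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```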

Index and table checkpoints aren't aligned

Secondary indexes can be checkpointed at a different log position than the owning table, but they will receive records based on the owning table's starting position.

Indexes must have the same checkpoint log position:

  1. checkpoint table
  2. checkpoint primary key
  3. checkpoint secondary keys
  4. if all steps succeeded, write the new TableStatus

On restore, the IndexStatus corresponding to the table checkpoint log position must be used instead of the latest one.

Run checkpoints of different tables at different times

This issue is linked to #67

To minimize the stress on the storage subsystem we should try to run the checkpoints of different tables at slightly different times.

Something like: next.table.checkpoint.period = server.checkpoint.period ± (random up to 20% of server.checkpoint.period)
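The jitter formula above could be sketched like this (illustrative names, not HerdDB code):

```java
import java.util.concurrent.ThreadLocalRandom;

public class CheckpointJitter {

    // next.table.checkpoint.period = server.checkpoint.period ± up to 20%
    static long nextCheckpointPeriod(long basePeriodMillis) {
        // uniform jitter in [-0.2, 0.2)
        double jitter = ThreadLocalRandom.current().nextDouble(-0.2, 0.2);
        return basePeriodMillis + (long) (basePeriodMillis * jitter);
    }
}
```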

Add live object size evaluation for pages

Currently the object size used for page sizing is evaluated statically at compile time. Object size isn't really static and can change between JVMs.

Add the JOL library and use it to populate the sizing constants at startup.

Memory-based page flush strategy

To lower the number of writes to disk, and to ensure that the in-memory store doesn't cause memory problems for the JVM, we need to base the page flush strategy on the actual memory available to the JVM.

I propose we follow a strategy similar to HBase's:

  • If table inMem store > server.memstore.table.limit (Default 300 MB)
    -> flush one page to disk
  • If total inMem store > server.memstore.globallimit.hi (Default 40% heap)
    -> flush one page to disk starting from the biggest inMem store
    -> repeat until total inMem store < server.memstore.globallimit.low (35% heap)

During periodic checkpoint we need to continue to flush ALL the pages to ensure consistency and recovery speed

Variables in service.conf:
server.memstore.globallimit.hi=40%
server.memstore.globallimit.low=35%
server.memstore.table.limit=300
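The thresholds above could drive the flush decisions roughly like this sketch (the constants mirror the proposed service.conf keys; everything else is hypothetical):

```java
public class FlushStrategy {

    static final long TABLE_LIMIT_BYTES = 300L * 1024 * 1024; // server.memstore.table.limit
    static final double GLOBAL_HI = 0.40;                     // server.memstore.globallimit.hi
    static final double GLOBAL_LOW = 0.35;                    // server.memstore.globallimit.low

    // Per-table limit exceeded: flush one page of this table to disk
    static boolean shouldFlushTable(long tableInMemBytes) {
        return tableInMemBytes > TABLE_LIMIT_BYTES;
    }

    // Global high watermark hit: start flushing, biggest in-mem store first
    static boolean shouldStartGlobalFlush(long totalInMemBytes, long heapBytes) {
        return totalInMemBytes > (long) (heapBytes * GLOBAL_HI);
    }

    // Stop flushing once back under the low watermark
    static boolean globalFlushDone(long totalInMemBytes, long heapBytes) {
        return totalInMemBytes < (long) (heapBytes * GLOBAL_LOW);
    }
}
```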

Ability to import MySQL dumps

We would like to import "simple" MySQL dumps (which are essentially an SQL script) in order to be able to compare the resource usage of MySQL vs HerdDB on the same "logical" dataset.

The procedure must have a configuration switch to "map" tables to "tablespaces", as in HerdDB we have multiple independent tablespaces.

Reduce checkpoint memory usage and time

The checkpoint currently scans all existing keys. Instead, use the dirty page ids directly to load one page at a time, handle it fully, and proceed to the next. This will limit both the time needed to scan keys and the memory needed to store temporary records (just one page at a time!).
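The one-page-at-a-time loop can be sketched as follows; all names are illustrative, not HerdDB's actual checkpoint code:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongFunction;

public class PagedCheckpoint {

    // Walk the dirty page ids directly: only one page's records are
    // ever resident, and each page is released before loading the next.
    static void checkpoint(List<Long> dirtyPageIds,
                           LongFunction<List<String>> loadPage,
                           Consumer<List<String>> rewritePage) {
        for (long pageId : dirtyPageIds) {
            List<String> records = loadPage.apply(pageId); // load this page only
            rewritePage.accept(records);                   // flush, then move on
        }
    }
}
```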

java.lang.IllegalStateException: page not loaded 6 while updating record

Feb 07, 2017 5:18:17 PM herddb.network.netty.NettyChannel lambda$messageReceived$0
SEVERE: NettyChannel{name=unnamed, id=159, socket=[id: 0x8b963173, L:/127.0.0.1:7000 - R:/127.0.0.1:40900] pending 0 msgs}: error java.lang.IllegalStateException: page not loaded 6 while updating record 7573657234363735353536323135393235313734373636
java.lang.IllegalStateException: page not loaded 6 while updating record 7573657234363735353536323135393235313734373636
at herddb.core.TableManager.applyUpdate(TableManager.java:746)
at herddb.core.TableManager.apply(TableManager.java:671)
at herddb.core.TableManager$6.accept(TableManager.java:480)
at herddb.core.TableManager.lambda$accessTableData$3(TableManager.java:1265)
at herddb.utils.BatchOrderedExecutor.finish(BatchOrderedExecutor.java:76)
at herddb.core.TableManager.accessTableData(TableManager.java:1291)
at herddb.core.TableManager.executeUpdate(TableManager.java:474)
at herddb.core.TableManager.executeStatement(TableManager.java:307)
at herddb.core.TableSpaceManager.executeStatement(TableSpaceManager.java:945)
at herddb.core.DBManager.executeStatement(DBManager.java:446)
at herddb.core.DBManager.executePlan(DBManager.java:504)
at herddb.server.ServerSideConnectionPeer.handleExecuteStatement(ServerSideConnectionPeer.java:437)
at herddb.server.ServerSideConnectionPeer.messageReceived(ServerSideConnectionPeer.java:121)
at herddb.network.netty.NettyChannel.lambda$messageReceived$0(NettyChannel.java:82)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Implement a PrimaryIndexRangeScan

Using a BTree for the KeyToPage gives the ability to exploit the sorted nature of the tree and achieve better performance for range scans on the primary key.

Fast checkpoint for tables

Currently every checkpoint fully rebuilds dirty pages, which is heavy work. We can afford to keep pages with old stale data in them, since the primary key index knows where the real record is.

Create two checkpoint procedures:
a) a fast checkpoint that flushes new and dirty records to new pages, but keeps old record versions on the original pages
b) a slow checkpoint that ensures every page contains only live data and is as full as possible; it must compact records, recovering the work that the fast checkpoint didn't do

CLI: "not such tableSpace Exception" on startup

Handle the startup of the server returning better SQLExceptions, for example "server is starting up/shutting down".

"java.sql.SQLException: herddb.model.StatementExecutionException: not such tableSpace default here"

Closing ledger after failure on AddEntry may lead to data loss

While debugging the flaky test testBookieNotAvailableDuringTransaction I noticed that we are calling LedgerHandle#close in case of error during asyncAddEntry.

The "close" method will "close" the ledger metadata and sometimes "chops" the ledger, dropping entries written before the LastAddConfirmed/LastAddPushed.

This is the error which causes the closing of the ledger:
Apr 04, 2017 9:04:50 AM org.apache.bookkeeper.client.PendingAddOp submitCallback
SEVERE: Write of ledger entry to quorum failed: L5 E3
Apr 04, 2017 9:04:50 AM org.apache.bookkeeper.client.PendingAddOp submitCallback
SEVERE: Write of ledger entry to quorum failed: L5 E4
Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog handleBookKeeperAsyncFailure
Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog handleBookKeeperAsyncFailure
SEVERE: bookkeeper async failure on tablespace c378614f1a774a0cb49f3a2901d3c762 while writing entry LogEntry{type=5, tableSpace=default, transactionId=1, tableName=null, key=null, value=null, timestamp=1491289490484}
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException
at org.apache.bookkeeper.client.BKException.create(BKException.java:58)
at herddb.cluster.BookkeeperCommitLog$CommitFileWriter.lambda$writeEntry$66(BookkeeperCommitLog.java:113)
at herddb.cluster.BookkeeperCommitLog$CommitFileWriter$$Lambda$21/375466577.addComplete(Unknown Source)
at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:244)
at org.apache.bookkeeper.client.LedgerHandle.errorOutPendingAdds(LedgerHandle.java:937)
at org.apache.bookkeeper.client.LedgerHandle$2.safeRun(LedgerHandle.java:337)
at org.apache.bookkeeper.util.SafeRunnable.run(SafeRunnable.java:31)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

On "failed" runs we have:

Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog$CommitFileWriter close
SEVERE: Closing ledger 5, with LastAddConfirmed=0, LastAddPushed=0 length=59

On "good" runs we have:

Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog$CommitFileWriter close
SEVERE: Closing ledger 5, with LastAddConfirmed=3, LastAddPushed=3 length=170

The partial fix is not to close the ledger in case of AddEntry failures; I will ask on the BookKeeper mailing list for a better solution.

Spamming BlockRangeIndex boot logs

When BlockRangeIndex boots it spams severe logs with
May 03, 2017 4:22:34 PM herddb.index.brin.BlockRangeIndex boot
SEVERE: boot block at BlockStartKey{0446726f6d17456d61696c53756363657373416c65727453797374656d,1369} 0446726f6d17456d61696c53756363657373416c65727453797374656d - 0446726f6d17456d61696c53756363657373416c65727453797374656d

Store index pages knowledge on index metadata

Currently at startup we need to fully scan index pages to locate the right metadata page to build the index. In addition to active pages, the index metadata should also store the index metadata pages, so they can be accessed directly.

Apply sort + limit during table scan

It would be better to apply the 'limit' clause during the scan, even in the case of a sorted result set.
For small "limit" clauses on huge tables this would be a great enhancement.
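One common way to do this is a bounded heap that keeps only the best `limit` rows while scanning, so memory and sort cost stay proportional to the limit instead of the full result set. A sketch over integers (illustrative only, not the planner's actual code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {

    // Return the 'limit' smallest rows in ascending order, keeping at
    // most 'limit' rows in memory during the scan.
    static List<Integer> topN(Iterable<Integer> scan, int limit) {
        // max-heap of size 'limit': the head is the worst row kept so far
        PriorityQueue<Integer> heap =
            new PriorityQueue<>(limit, Comparator.reverseOrder());
        for (int row : scan) {
            if (heap.size() < limit) {
                heap.add(row);
            } else if (row < heap.peek()) {
                heap.poll();   // evict the worst kept row
                heap.add(row);
            }
        }
        List<Integer> out = new ArrayList<>(heap);
        out.sort(null); // only 'limit' elements to sort
        return out;
    }
}
```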

Reduce allocations of String[]

While sampling memory allocations during a YCSB benchmark (workload E, mostly "scans"), it appears that we create a lot of String[] to store the temporary schema of tuples/resultsets.
Surely we can skip this and reduce the impact on memory, and so on GC.

Ensure versioned files

Ensure that every data file written to disk has a version followed by available flags.

The version must be a VLong with value 1.
The flags must be a VLong with value 0 (currently unused, but available for future releases).
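A sketch of such a header, using a common 7-bits-per-byte VLong encoding (the exact encoding HerdDB uses may differ; names here are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FileHeader {

    // Unsigned varint: 7 bits per byte, high bit set on continuation bytes
    static void writeVLong(DataOutputStream out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.writeByte((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.writeByte((int) value);
    }

    // Header: version = 1, flags = 0 (reserved for future use)
    static byte[] header() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            writeVLong(out, 1);
            writeVLong(out, 0);
            return buf.toByteArray();
        } catch (IOException impossible) {
            // ByteArrayOutputStream never throws
            throw new AssertionError(impossible);
        }
    }
}
```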

Run checkpoint independently per table

Right now we have only one thread that runs the checkpoint for all the tables. To maximize performance and lower the time that the tables are frozen, I think we could evaluate running the checkpoints of different tables independently.

Add more "compiled sql expressions"

It would be great to implement as CompiledSQLExpression all of the expressions which are currently evaluated using InterpretedSQLExpression (and then drop InterpretedSQLExpression forever).

Range scans on indexes don't take advantage of index order

Queries like
SELECT * FROM mytable WHERE indexedfield >= v ORDER BY indexedfield LIMIT n
ideally should scan just n values.

Currently every value after v is retrieved, filtered and sorted before limiting the output.

Knowledge of the index ordering should be pushed down in some way to the data scanner, together with the result limit.

There should also be some way to improve scanning in inverted index order (e.g. index defined as ASC with an ORDER BY ... DESC).

This issue is similar to #105 but on secondary indexes
