diennea / herddb

A JVM-embeddable Distributed Database

Home Page: https://herddb.org

License: Apache License 2.0

Java 96.21% Shell 0.68% CSS 2.09% HTML 0.48% JavaScript 0.48% TSQL 0.02% Batchfile 0.05%
bookkeeper calcite database distributed distributed-database embeddable embeddable-dbms java replication sql zookeeper

herddb's People

Contributors

aluccaroni, amitvc, caiok, caliuf, dependabot[bot], diegosalvi, dmercuriali, eolivelli, hamadodene, mino181295, nicoloboschi, pv3nturi, rmannibucau, timtebeek, tisonkun


herddb's Issues

Range scans on PK don't take advantage of PK order

Queries like
SELECT * FROM mytable WHERE pkfield >= v ORDER BY pkfield LIMIT n
ideally should scan just n values.

Currently every value after v is retrieved, filtered and sorted before limiting the output.

Knowledge of the PK ordering should be pushed down in some way to the data scanner, together with the result limit.

There should also be some way to improve scanning in inverted PK order (e.g. PK defined as ASC with an ORDER BY ... DESC).
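As a sketch of the idea, a sorted key-to-page structure lets the scan stop after n keys instead of materialising and sorting the whole range. This is only an illustration over `java.util.NavigableMap`; `scanFrom` and the map layout are hypothetical, not HerdDB code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

public class PkRangeScan {

    // Scan keys >= fromKey in PK order, stopping as soon as 'limit'
    // keys have been collected: no retrieve-all + filter + sort needed.
    static List<String> scanFrom(NavigableMap<String, Long> keyToPage,
                                 String fromKey, int limit) {
        List<String> result = new ArrayList<>();
        for (String key : keyToPage.tailMap(fromKey, true).keySet()) {
            if (result.size() >= limit) {
                break; // only 'limit' entries were ever visited
            }
            result.add(key);
        }
        return result;
    }
}
```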

IndexOutOfBoundsException on prepared statement with wrong arguments count

When executing a prepared statement with a wrong number of arguments, it fails with an IndexOutOfBoundsException during statement execution (TableManager).

It should fail fast during parsing
java.lang.IndexOutOfBoundsException: Index: 0
at java.util.Collections$EmptyList.get(Collections.java:4454)
at herddb.sql.expressions.JdbcParameterExpression.evaluate(JdbcParameterExpression.java:36)
at herddb.sql.SQLRecordKeyFunction.computeNewValue(SQLRecordKeyFunction.java:101)
at herddb.core.TableManager.executeInsert(TableManager.java:753)
at herddb.core.TableManager.executeStatement(TableManager.java:487)
at herddb.core.TableSpaceManager.executeStatement(TableSpaceManager.java:1026)
at herddb.core.DBManager.executeStatement(DBManager.java:577)
at herddb.core.DBManager.executePlan(DBManager.java:644)
at herddb.server.ServerSideConnectionPeer.handleExecuteStatement(ServerSideConnectionPeer.java:563)
at herddb.server.ServerSideConnectionPeer.messageReceived(ServerSideConnectionPeer.java:129)
at herddb.network.netty.NettyChannel.lambda$messageReceived$0(NettyChannel.java:82)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
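A fail-fast check at planning time could be as simple as comparing the number of placeholders against the supplied arguments. This sketch naively counts `?` characters (ignoring string literals) and uses hypothetical names; it is not HerdDB's parser:

```java
public class ParameterCheck {

    // Naive placeholder count: counts every '?', without skipping
    // quoted literals (a real parser would know the exact count)
    static int countPlaceholders(String sql) {
        int count = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?') {
                count++;
            }
        }
        return count;
    }

    // Fail fast, before statement execution ever starts
    static void validate(String sql, int suppliedArgs) {
        int expected = countPlaceholders(sql);
        if (expected != suppliedArgs) {
            throw new IllegalArgumentException(
                "statement expects " + expected + " parameters, got " + suppliedArgs);
        }
    }
}
```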

Convert camel case properties

Some configuration properties use dot case (dot.case), others camel case (camelCase); convert every camel case property to dot case.
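The mechanical conversion itself is straightforward; a hypothetical helper:

```java
public class PropertyNames {

    // Convert a camelCase property name to dot.case,
    // e.g. "checkpointPeriod" -> "checkpoint.period"
    static String toDotCase(String camel) {
        StringBuilder sb = new StringBuilder();
        for (char c : camel.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('.').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```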

Index and table checkpoints aren't aligned

Secondary indexes can be checkpointed at a different log position than the owning table, but they will receive records based on the owning table's starting position.

Indexes must have the same checkpoint log position:

  1. checkpoint table
  2. checkpoint primary key
  3. checkpoint secondary keys
  4. if all steps succeeded, write the new TableStatus

On restore, the IndexStatus corresponding to the table checkpoint log position must be used instead of the latest one.

Run checkpoints of different tables at different times

This issue is linked to #67

To minimize the stress on the storage subsystem we should try to run the checkpoints of different tables at slightly different times.

Something like: next.table.checkpoint.period = server.checkpoint.period ± (random up to 20% of server.checkpoint.period)
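The jitter formula above could be sketched like this (illustrative names, not HerdDB code):

```java
import java.util.concurrent.ThreadLocalRandom;

public class CheckpointJitter {

    // next.table.checkpoint.period = server.checkpoint.period ± up to 20%
    static long nextCheckpointPeriod(long basePeriodMillis) {
        // uniform jitter in [-0.2, 0.2)
        double jitter = ThreadLocalRandom.current().nextDouble(-0.2, 0.2);
        return basePeriodMillis + (long) (basePeriodMillis * jitter);
    }
}
```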

Add live object size evaluation for pages

Currently the object size used for page sizing is evaluated statically at compile time. Object size isn't really static and can change between JVMs.

Add the JOL library and use it to populate the sizing constants at startup.

Memory-based page flush strategy

To lower the number of writes to disk, and to ensure that the in-memory store doesn't cause memory problems for the JVM, we need to base the page flush strategy on the actual memory available to the JVM.

I propose we follow a strategy similar to HBase's:

  • If table inMem store > server.memstore.table.limit (Default 300 MB)
    -> flush one page to disk
  • If total inMem store > server.memstore.globallimit.hi (Default 40% heap)
    -> flush one page to disk starting from the biggest inMem store
    -> repeat until total inMem store < server.memstore.globallimit.low (35% heap)

During periodic checkpoint we need to continue to flush ALL the pages to ensure consistency and recovery speed

Variables in service.conf:
server.memstore.globallimit.hi=40%
server.memstore.globallimit.low=35%
server.memstore.table.limit=300
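The thresholds above could drive the flush decisions roughly like this sketch (the constants mirror the proposed service.conf keys; everything else is hypothetical):

```java
public class FlushStrategy {

    static final long TABLE_LIMIT_BYTES = 300L * 1024 * 1024; // server.memstore.table.limit
    static final double GLOBAL_HI = 0.40;                     // server.memstore.globallimit.hi
    static final double GLOBAL_LOW = 0.35;                    // server.memstore.globallimit.low

    // Per-table limit exceeded: flush one page of this table to disk
    static boolean shouldFlushTable(long tableInMemBytes) {
        return tableInMemBytes > TABLE_LIMIT_BYTES;
    }

    // Global high watermark hit: start flushing, biggest in-mem store first
    static boolean shouldStartGlobalFlush(long totalInMemBytes, long heapBytes) {
        return totalInMemBytes > (long) (heapBytes * GLOBAL_HI);
    }

    // Stop flushing once back under the low watermark
    static boolean globalFlushDone(long totalInMemBytes, long heapBytes) {
        return totalInMemBytes < (long) (heapBytes * GLOBAL_LOW);
    }
}
```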

Ability to import MySQL dumps

We would like to import "simple" MySQL dumps (which are essentially an SQL script) in order to be able to compare the resource usage of MySQL vs HerdDB on the same "logical" dataset.

The procedure must have a configuration switch to "map" tables to "tablespaces", as in HerdDB we have multiple independent tablespaces.

Reduce checkpoint memory usage and time

The checkpoint currently scans all existing keys. Instead, use the dirty page ids directly to load one page at a time, handle it fully, and proceed to the next. This will limit both the time needed to scan keys and the memory needed to store temporary records (just one page at a time!).
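The one-page-at-a-time loop can be sketched as follows; all names are illustrative, not HerdDB's actual checkpoint code:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongFunction;

public class PagedCheckpoint {

    // Walk the dirty page ids directly: only one page's records are
    // ever resident, and each page is released before loading the next.
    static void checkpoint(List<Long> dirtyPageIds,
                           LongFunction<List<String>> loadPage,
                           Consumer<List<String>> rewritePage) {
        for (long pageId : dirtyPageIds) {
            List<String> records = loadPage.apply(pageId); // load this page only
            rewritePage.accept(records);                   // flush, then move on
        }
    }
}
```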

java.lang.IllegalStateException: page not loaded 6 while updating record

Feb 07, 2017 5:18:17 PM herddb.network.netty.NettyChannel lambda$messageReceived$0
SEVERE: NettyChannel{name=unnamed, id=159, socket=[id: 0x8b963173, L:/127.0.0.1:7000 - R:/127.0.0.1:40900] pending 0 msgs}: error java.lang.IllegalStateException: page not loaded 6 while updating record 7573657234363735353536323135393235313734373636
java.lang.IllegalStateException: page not loaded 6 while updating record 7573657234363735353536323135393235313734373636
at herddb.core.TableManager.applyUpdate(TableManager.java:746)
at herddb.core.TableManager.apply(TableManager.java:671)
at herddb.core.TableManager$6.accept(TableManager.java:480)
at herddb.core.TableManager.lambda$accessTableData$3(TableManager.java:1265)
at herddb.utils.BatchOrderedExecutor.finish(BatchOrderedExecutor.java:76)
at herddb.core.TableManager.accessTableData(TableManager.java:1291)
at herddb.core.TableManager.executeUpdate(TableManager.java:474)
at herddb.core.TableManager.executeStatement(TableManager.java:307)
at herddb.core.TableSpaceManager.executeStatement(TableSpaceManager.java:945)
at herddb.core.DBManager.executeStatement(DBManager.java:446)
at herddb.core.DBManager.executePlan(DBManager.java:504)
at herddb.server.ServerSideConnectionPeer.handleExecuteStatement(ServerSideConnectionPeer.java:437)
at herddb.server.ServerSideConnectionPeer.messageReceived(ServerSideConnectionPeer.java:121)
at herddb.network.netty.NettyChannel.lambda$messageReceived$0(NettyChannel.java:82)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Implement a PrimaryIndexRangeScan

Using a BTree for the KeyToPage gives the ability to exploit the sorted nature of the tree and achieve better performance for range scans on the primary key.

Fast checkpoint for tables

Currently every checkpoint fully rebuilds dirty pages, which is heavy work. We can afford to keep pages with old stale data in them, since the primary key index knows where the real record is.

Create two checkpoint procedures:
a) a fast checkpoint that flushes new and dirty records to new pages, but keeps old record versions on the original pages
b) a slow checkpoint that ensures every page contains only live data and is as full as possible; it must compact records, recovering the work that the fast checkpoint didn't do

CLI: "not such tableSpace Exception" on startup

Handle the startup of the server returning better SQLExceptions, for example "server is starting up/shutting down".

"java.sql.SQLException: herddb.model.StatementExecutionException: not such tableSpace default here"

Closing ledger after failure on AddEntry may lead to data loss

While debugging the flaky test testBookieNotAvailableDuringTransaction I noticed that we are calling LedgerHandle#close in case of error during asyncAddEntry.

The "close" method will "close" the ledger metadata and sometimes "chops" the ledger, dropping entries written before the LastAddConfirmed/LastAddPushed.

This is the error which causes the closing of the ledger:
Apr 04, 2017 9:04:50 AM org.apache.bookkeeper.client.PendingAddOp submitCallback
SEVERE: Write of ledger entry to quorum failed: L5 E3
Apr 04, 2017 9:04:50 AM org.apache.bookkeeper.client.PendingAddOp submitCallback
SEVERE: Write of ledger entry to quorum failed: L5 E4
Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog handleBookKeeperAsyncFailure
Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog handleBookKeeperAsyncFailure
SEVERE: bookkeeper async failure on tablespace c378614f1a774a0cb49f3a2901d3c762 while writing entry LogEntry{type=5, tableSpace=default, transactionId=1, tableName=null, key=null, value=null, timestamp=1491289490484}
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException
at org.apache.bookkeeper.client.BKException.create(BKException.java:58)
at herddb.cluster.BookkeeperCommitLog$CommitFileWriter.lambda$writeEntry$66(BookkeeperCommitLog.java:113)
at herddb.cluster.BookkeeperCommitLog$CommitFileWriter$$Lambda$21/375466577.addComplete(Unknown Source)
at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:244)
at org.apache.bookkeeper.client.LedgerHandle.errorOutPendingAdds(LedgerHandle.java:937)
at org.apache.bookkeeper.client.LedgerHandle$2.safeRun(LedgerHandle.java:337)
at org.apache.bookkeeper.util.SafeRunnable.run(SafeRunnable.java:31)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

On "failed" runs we have:

Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog$CommitFileWriter close
SEVERE: Closing ledger 5, with LastAddConfirmed=0, LastAddPushed=0 length=59

On "good" runs we have:

Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog$CommitFileWriter close
SEVERE: Closing ledger 5, with LastAddConfirmed=3, LastAddPushed=3 length=170

The partial fix is not to close the ledger in case of AddEntry failures; I will ask on the BookKeeper mailing list for a better solution.

Spamming BlockRangeIndex boot logs

When BlockRangeIndex boots it spams severe logs with
May 03, 2017 4:22:34 PM herddb.index.brin.BlockRangeIndex boot
SEVERE: boot block at BlockStartKey{0446726f6d17456d61696c53756363657373416c65727453797374656d,1369} 0446726f6d17456d61696c53756363657373416c65727453797374656d - 0446726f6d17456d61696c53756363657373416c65727453797374656d

Store index pages knowledge on index metadata

Currently at startup we need to fully scan index pages to locate the right metadata page to build the index. In addition to active pages, the index metadata should also store the index metadata pages, so they can be accessed directly.

Apply sort + limit during table scan

It would be better to apply the 'limit' clause during the scan, even in the case of a sorted result set.
For small "limit" clauses on huge tables this would be a great enhancement.
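One common way to do this is a bounded heap that keeps only the best `limit` rows while scanning, so memory and sort cost stay proportional to the limit instead of the full result set. A sketch over integers (illustrative only, not the planner's actual code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {

    // Return the 'limit' smallest rows in ascending order, keeping at
    // most 'limit' rows in memory during the scan.
    static List<Integer> topN(Iterable<Integer> scan, int limit) {
        // max-heap of size 'limit': the head is the worst row kept so far
        PriorityQueue<Integer> heap =
            new PriorityQueue<>(limit, Comparator.reverseOrder());
        for (int row : scan) {
            if (heap.size() < limit) {
                heap.add(row);
            } else if (row < heap.peek()) {
                heap.poll();   // evict the worst kept row
                heap.add(row);
            }
        }
        List<Integer> out = new ArrayList<>(heap);
        out.sort(null); // only 'limit' elements to sort
        return out;
    }
}
```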

Reduce allocations of String[]

While sampling memory allocations during a YCSB benchmark (workload E, mostly "scans"), it appears that we create a lot of String[] to store the temporary schema of tuples/resultsets.
Surely we can skip this and reduce the impact on memory, and so on GC.

Ensure versioned files

Ensure that every data file written to disk has a version followed by available flags.

The version must be a VLong with value 1.
The flags must be a VLong with value 0 (currently unused, but available for future releases).
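A sketch of such a header, using a common 7-bits-per-byte VLong encoding (the exact encoding HerdDB uses may differ; names here are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FileHeader {

    // Unsigned varint: 7 bits per byte, high bit set on continuation bytes
    static void writeVLong(DataOutputStream out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.writeByte((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.writeByte((int) value);
    }

    // Header: version = 1, flags = 0 (reserved for future use)
    static byte[] header() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            writeVLong(out, 1);
            writeVLong(out, 0);
            return buf.toByteArray();
        } catch (IOException impossible) {
            // ByteArrayOutputStream never throws
            throw new AssertionError(impossible);
        }
    }
}
```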

Run checkpoint independently per table

Right now we have only one thread that runs the checkpoint for all the tables. To maximize performance and lower the time that the tables are frozen, I think we could evaluate running the checkpoints of different tables independently.

Add more "compiled sql expressions"

It would be great to implement as CompiledSQLExpression all of the expressions which are currently evaluated using InterpretedSQLExpression (and then drop InterpretedSQLExpression forever).

Range scans on indexes don't take advantage of index order

Queries like
SELECT * FROM mytable WHERE indexedfield >= v ORDER BY indexedfield LIMIT n
ideally should scan just n values.

Currently every value after v is retrieved, filtered and sorted before limiting the output.

Knowledge of the index ordering should be pushed down in some way to the data scanner, together with the result limit.

There should also be some way to improve scanning in inverted index order (e.g. index defined as ASC with an ORDER BY ... DESC).

This issue is similar to #105 but on secondary indexes
