diennea / herddb
A JVM-embeddable Distributed Database
Home Page: https://herddb.org
License: Apache License 2.0
Queries like
SELECT * FROM mytable WHERE pkfield >= v ORDER BY pkfield LIMIT n
ideally should scan just n values.
Currently every value after v is retrieved, filtered and sorted before limiting the output.
Knowledge of the ordering by primary key should be pushed down in some way to the data scanner, together with the result limit.
There should also be some way to improve inverted pk-order scanning (e.g. pk defined as ASC but ordering by DESC).
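A minimal sketch of the desired behavior, assuming the primary key index is kept in a sorted structure (the names and the `NavigableMap`-based index here are illustrative, not HerdDB's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

public class PkLimitScan {

    // Scan at most 'limit' keys >= from, exploiting the sorted index:
    // no full scan, no sort, no post-filtering.
    static List<String> scanFrom(NavigableMap<String, String> pkIndex,
                                 String from, int limit) {
        List<String> result = new ArrayList<>();
        for (String key : pkIndex.tailMap(from, true).keySet()) {
            if (result.size() >= limit) {
                break; // limit pushed down: stop scanning immediately
            }
            result.add(key);
        }
        return result;
    }

    // Inverted order (index ASC, ORDER BY DESC): walk the descending view
    // instead of materializing and re-sorting the whole result.
    static List<String> scanDesc(NavigableMap<String, String> pkIndex, int limit) {
        List<String> result = new ArrayList<>();
        for (String key : pkIndex.descendingKeySet()) {
            if (result.size() >= limit) {
                break;
            }
            result.add(key);
        }
        return result;
    }
}
```

The key point is that the scanner stops after n entries instead of materializing every value after v.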
When executing a prepared statement with the wrong number of arguments, it fails with an IndexOutOfBoundsException during statement execution (TableManager).
It should fail fast during parsing.
java.lang.IndexOutOfBoundsException: Index: 0
    at java.util.Collections$EmptyList.get(Collections.java:4454)
    at herddb.sql.expressions.JdbcParameterExpression.evaluate(JdbcParameterExpression.java:36)
    at herddb.sql.SQLRecordKeyFunction.computeNewValue(SQLRecordKeyFunction.java:101)
    at herddb.core.TableManager.executeInsert(TableManager.java:753)
    at herddb.core.TableManager.executeStatement(TableManager.java:487)
    at herddb.core.TableSpaceManager.executeStatement(TableSpaceManager.java:1026)
    at herddb.core.DBManager.executeStatement(DBManager.java:577)
    at herddb.core.DBManager.executePlan(DBManager.java:644)
    at herddb.server.ServerSideConnectionPeer.handleExecuteStatement(ServerSideConnectionPeer.java:563)
    at herddb.server.ServerSideConnectionPeer.messageReceived(ServerSideConnectionPeer.java:129)
    at herddb.network.netty.NettyChannel.lambda$messageReceived$0(NettyChannel.java:82)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
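The fail-fast check could be as simple as comparing the number of placeholders against the supplied parameter list at planning time. This is only a sketch: the method name is hypothetical, and a real implementation would count parameters from the parsed statement rather than naively counting '?' characters (which would miscount literals containing '?'):

```java
import java.util.List;

public class JdbcParameterCheck {

    // Fail fast, before execution, when the supplied parameter list does not
    // match the number of '?' placeholders in the statement.
    // Illustrative only: a real check would use the parsed statement.
    static void validateParameterCount(String sql, List<Object> params) {
        int placeholders = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                placeholders++;
            }
        }
        if (params.size() != placeholders) {
            throw new IllegalArgumentException(
                "expected " + placeholders + " parameters, got " + params.size());
        }
    }
}
```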
As we are using the YCSB suite (https://research.yahoo.com/news/yahoo-cloud-serving-benchmark/), it would be great to have a bash script to run YCSB workloads directly on the compiled artifacts.
for standalone instances it would be very useful
parameters:
During checkpoint and truncate, TableManager tries to remove from the PageReplacementPolicy even the current working page, which is unknown to that policy.
Some configuration properties use dot case (dot.case) and others camel case (camelCase); convert every camel-case property to dot case.
The idea is to have a simple way to boot ZooKeeper, so that it is easy to play with a "clustered herd" without the need to install a ZooKeeper server.
It would be better to always use at least the "local" (embedded) Bookie, in order to:
Currently these datatypes are only partially supported.
Secondary indexes can be checkpointed at a different log position than the owning table, but they will receive records based on the owning table's starting position.
Indexes must have the same checkpoint log position:
On restore, the IndexStatus relative to the table checkpoint log position must be used instead of the latest one.
This issue is linked to #67
To minimize the stress on the storage subsystem we should try to run the checkpoints of different tables at slightly different times.
Something like: next.table.checkpoint.period = server.checkpoint.period ± 20% of server.checkpoint.period, chosen at random.
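The jitter above could be computed like this (a sketch; the method name is illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

public class CheckpointJitter {

    // Spread checkpoints of different tables over time:
    // server.checkpoint.period ± 20% of itself, chosen at random per table.
    static long nextTableCheckpointPeriod(long serverCheckpointPeriodMs) {
        long jitter = (long) (serverCheckpointPeriodMs * 0.2);
        // uniform in [period - jitter, period + jitter]
        return serverCheckpointPeriodMs
                + ThreadLocalRandom.current().nextLong(-jitter, jitter + 1);
    }
}
```

Each table draws its own period, so checkpoints naturally de-synchronize instead of all hitting the disk at the same instant.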
Currently the object sizes used for page sizing are evaluated statically at compile time. Object size isn't really static and can change between JVMs.
Add the JOL library and use it to populate the sizing constants at startup.
To lower the number of writes to disk and to ensure that the in-memory storage doesn't cause memory problems for the JVM, we need to base the page-flush strategy on the actual memory available to the JVM.
I propose we follow a strategy similar to HBase, so:
During periodic checkpoint we need to continue to flush ALL the pages to ensure consistency and recovery speed
variables in service.conf:
server.memstore.globallimit.hi=40%
server.memstore.globallimit.low=35%
server.memstore.table.limit=300
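The two global watermarks above could drive the flush decision like this (a sketch, assuming the percentages are relative to the JVM max heap as in HBase's memstore limits; class and method names are illustrative):

```java
public class MemstoreLimits {

    final long hiWatermarkBytes;   // start flushing pages when usage reaches this
    final long lowWatermarkBytes;  // stop flushing once usage falls below this

    // Derive absolute byte thresholds from the percentage-based configuration
    // (server.memstore.globallimit.hi / .low) and the JVM max heap size,
    // e.g. Runtime.getRuntime().maxMemory().
    MemstoreLimits(double hiPercent, double lowPercent, long maxHeapBytes) {
        this.hiWatermarkBytes = (long) (maxHeapBytes * hiPercent / 100.0);
        this.lowWatermarkBytes = (long) (maxHeapBytes * lowPercent / 100.0);
    }

    boolean mustStartFlush(long usedBytes) {
        return usedBytes >= hiWatermarkBytes;
    }

    boolean mayStopFlush(long usedBytes) {
        return usedBytes < lowWatermarkBytes;
    }
}
```

With hi=40% and low=35%, flushing starts at 40% of max heap and continues until usage drops under 35%, giving the flusher hysteresis instead of oscillating around a single threshold.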
Change read/write operations when not using direct buffer
We would like to import "simple" MySQL dumps (which are essentially an SQL script) in order to be able to compare the resource usage of MySQL vs Herd on the same "logical" dataset.
The procedure must have a configuration switch to "map" tables to "tablespaces", as in HerdDB we have multiple independent tablespaces.
PageSet doesn't unlist unloaded pages on checkout
Checkpoint currently scans all existing keys; instead, use the dirty-page ids directly to load one page at a time, handle it fully, and proceed to the next. This will limit both the time needed to scan keys and the memory needed to store temporary records (just one page at a time!).
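The page-at-a-time loop could look like this sketch, where the in-memory maps stand in for the real storage layer (all names here are hypothetical, not HerdDB's internals):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DirtyPageCheckpoint {

    // Iterate only over the known dirty-page ids: load one page, rewrite its
    // records to a new page, release it, then move to the next. At any moment
    // only a single page's worth of temporary records is held in memory.
    static int checkpoint(Set<Long> dirtyPageIds,
                          Map<Long, List<byte[]>> storage,
                          List<List<byte[]>> newPages) {
        int recordsRewritten = 0;
        for (Long pageId : dirtyPageIds) {
            List<byte[]> page = storage.get(pageId);   // load a single page
            if (page == null) {
                continue;
            }
            newPages.add(new ArrayList<>(page));       // handle it fully
            recordsRewritten += page.size();
            storage.remove(pageId);                    // release before the next page
        }
        return recordsRewritten;
    }
}
```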
Feb 07, 2017 5:18:17 PM herddb.network.netty.NettyChannel lambda$messageReceived$0
SEVERE: NettyChannel{name=unnamed, id=159, socket=[id: 0x8b963173, L:/127.0.0.1:7000 - R:/127.0.0.1:40900] pending 0 msgs}: error java.lang.IllegalStateException: page not loaded 6 while updating record 7573657234363735353536323135393235313734373636
java.lang.IllegalStateException: page not loaded 6 while updating record 7573657234363735353536323135393235313734373636
at herddb.core.TableManager.applyUpdate(TableManager.java:746)
at herddb.core.TableManager.apply(TableManager.java:671)
at herddb.core.TableManager$6.accept(TableManager.java:480)
at herddb.core.TableManager.lambda$accessTableData$3(TableManager.java:1265)
at herddb.utils.BatchOrderedExecutor.finish(BatchOrderedExecutor.java:76)
at herddb.core.TableManager.accessTableData(TableManager.java:1291)
at herddb.core.TableManager.executeUpdate(TableManager.java:474)
at herddb.core.TableManager.executeStatement(TableManager.java:307)
at herddb.core.TableSpaceManager.executeStatement(TableSpaceManager.java:945)
at herddb.core.DBManager.executeStatement(DBManager.java:446)
at herddb.core.DBManager.executePlan(DBManager.java:504)
at herddb.server.ServerSideConnectionPeer.handleExecuteStatement(ServerSideConnectionPeer.java:437)
at herddb.server.ServerSideConnectionPeer.messageReceived(ServerSideConnectionPeer.java:121)
at herddb.network.netty.NettyChannel.lambda$messageReceived$0(NettyChannel.java:82)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Clean up the BookkeeperCommitLog code and add test cases for bookie restarts and for fencing of ledgers.
Using a BTree for the KeyToPage mapping gives the ability to exploit the sorted nature of the tree and achieve better performance for range scans on the primary key.
To ensure that data is saved, we need to check that server.bookkeeper.ledgersretentionperiod is not less than 2x server.checkpoint.period.
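The check could be enforced at configuration-validation time, for example (a sketch; the class and method names are illustrative):

```java
public class RetentionCheck {

    // Reject configurations where BookKeeper could delete ledgers that still
    // contain entries not yet covered by two full checkpoint cycles.
    static void validate(long ledgersRetentionPeriodMs, long checkpointPeriodMs) {
        if (ledgersRetentionPeriodMs < 2 * checkpointPeriodMs) {
            throw new IllegalArgumentException(
                "server.bookkeeper.ledgersretentionperiod ("
                + ledgersRetentionPeriodMs
                + " ms) must be at least 2 x server.checkpoint.period ("
                + checkpointPeriodMs + " ms)");
        }
    }
}
```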
Currently every checkpoint fully rebuilds the dirty pages. This is big, heavy work. We can afford to have pages that contain old stale data, since the primary key index knows where the real record is.
Create two checkpoint procedures:
a) a fast checkpoint that flushes new and dirty records to new pages but keeps old record versions on the original pages
b) a slow checkpoint that ensures every page contains only live data and is as full as possible. It must compact records, recovering the work that the fast checkpoint didn't do
Handle the startup of the server by returning better SQLExceptions, for example "server is starting up/shutting down".
"java.sql.SQLException: herddb.model.StatementExecutionException: not such tableSpace default here"
The file which holds the last checkpoint status per table is never rewritten and leads to unlimited disk space usage
While debugging the flaky test testBookieNotAvailableDuringTransaction I noticed that we are calling LedgerHandle#close in case of an error during asyncAddEntry.
The "close" method will "close" the ledger metadata and sometimes "chops" the ledger, so that entries written before LastAddConfirmed/LastAddPushed can be lost.
This is the error which causes the closing of the ledger:
Apr 04, 2017 9:04:50 AM org.apache.bookkeeper.client.PendingAddOp submitCallback
SEVERE: Write of ledger entry to quorum failed: L5 E3
Apr 04, 2017 9:04:50 AM org.apache.bookkeeper.client.PendingAddOp submitCallback
SEVERE: Write of ledger entry to quorum failed: L5 E4
Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog handleBookKeeperAsyncFailure
Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog handleBookKeeperAsyncFailure
SEVERE: bookkeeper async failure on tablespace c378614f1a774a0cb49f3a2901d3c762 while writing entry LogEntry{type=5, tableSpace=default, transactionId=1, tableName=null, key=null, value=null, timestamp=1491289490484}
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException
at org.apache.bookkeeper.client.BKException.create(BKException.java:58)
at herddb.cluster.BookkeeperCommitLog$CommitFileWriter.lambda$writeEntry$66(BookkeeperCommitLog.java:113)
at herddb.cluster.BookkeeperCommitLog$CommitFileWriter$$Lambda$21/375466577.addComplete(Unknown Source)
at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:244)
at org.apache.bookkeeper.client.LedgerHandle.errorOutPendingAdds(LedgerHandle.java:937)
at org.apache.bookkeeper.client.LedgerHandle$2.safeRun(LedgerHandle.java:337)
at org.apache.bookkeeper.util.SafeRunnable.run(SafeRunnable.java:31)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
On "failed" runs we have:
Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog$CommitFileWriter close
SEVERE: Closing ledger 5, with LastAddConfirmed=0, LastAddPushed=0 length=59
On "good" runs we have:
Apr 04, 2017 9:04:50 AM herddb.cluster.BookkeeperCommitLog$CommitFileWriter close
SEVERE: Closing ledger 5, with LastAddConfirmed=3, LastAddPushed=3 length=170
The partial fix is not to close the ledger in case of AddEntry failures; I will ask on the BookKeeper mailing list for a better solution.
When BlockRangeIndex boots it spams severe logs with:
May 03, 2017 4:22:34 PM herddb.index.brin.BlockRangeIndex boot SEVERE: boot block at BlockStartKey{0446726f6d17456d61696c53756363657373416c65727453797374656d,1369} 0446726f6d17456d61696c53756363657373416c65727453797374656d - 0446726f6d17456d61696c53756363657373416c65727453797374656d
Currently at startup we need to fully scan the index pages to locate the right metadata page to build the index. In addition to active pages, the index metadata should also store the index metadata pages, so they can be accessed directly.
It would be better to apply the 'limit' clause during the scan even in the case of a sorted result set.
For small 'limit' clauses on huge tables this would be a great enhancement.
While sampling memory allocations during a YCSB bench (workloade, mostly "scans") it appears that we are creating a lot of String[] arrays to store the temporary schema of tuples/resultsets.
Surely we can skip this and reduce the impact on memory, and thus on GC.
Ensure that every data file written to disk has a version followed by available flags.
Version must be a VLong with value 1
Flags must be a VLong with value 0 (actually unused but here to be available in future releases).
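A sketch of writing such a header, using one common variable-length long encoding (7 data bits per byte, high bit as continuation); this is illustrative and not necessarily the exact VLong wire format HerdDB uses:

```java
import java.io.ByteArrayOutputStream;

public class FileHeader {

    static final long VERSION = 1; // must be a VLong with value 1
    static final long FLAGS = 0;   // currently unused, reserved for future releases

    // Variable-length long: 7 bits per byte, high bit set means "more bytes follow".
    static void writeVLong(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Header prepended to every data file: version, then flags.
    static byte[] header() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVLong(out, VERSION);
        writeVLong(out, FLAGS);
        return out.toByteArray();
    }
}
```

Small values like 1 and 0 encode in a single byte each, so the header costs two bytes today while leaving room to grow.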
Avoid serializing and deserializing byte arrays, and skip the intermediate copy for index pages:
Right now we have only one thread that runs the checkpoint for all the tables. To maximize performance and reduce the time that tables are frozen, I think we could evaluate running the checkpoints of different tables independently.
It would be great to implement as CompiledSQLExpression all of the expressions which are currently evaluated using InterpretedSQLExpression (and then drop InterpretedSQLExpression forever).
Queries like
SELECT * FROM mytable WHERE indexedfield >= v ORDER BY indexedfield LIMIT n
ideally should scan just n values.
Currently every value after v is retrieved, filtered and sorted before limiting the output.
Knowledge of the ordering by index should be pushed down in some way to the data scanner, together with the result limit.
There should also be some way to improve inverted index-order scanning (e.g. index defined as ASC but ordering by DESC).
This issue is similar to #105 but on secondary indexes
Add FindBugs to the Maven build, fix all the issues, and then configure the official CI to run the findbugs:check goal.
Implement SELECT * FROM TABLE LIMIT ?