zrlio / crail

[Archived] A Fast Multi-tiered Distributed Storage System based on User-Level I/O

Home Page: http://crail.incubator.apache.org/

License: Apache License 2.0

Shell 2.08% Java 97.92%
crail distributed-systems high-performance java-8

crail's People

Contributors: animeshtrivedi, asqasq, follitude, patrickstuedi


crail's Issues

crail-site.conf configuration mistake leads to java.lang.ArrayIndexOutOfBoundsException

In the current version, crail.storage.types has replaced crail.datanode.types.
In conf/crail-site.conf.template it reads:
crail.storage.types com.ibm.crail.datanode.rdma.RdmaDataNode
but in the README.md it is:
crail.datanode.types com.ibm.crail.storage.rdma.RdmaStorageTier
In fact, crail.storage.types com.ibm.crail.storage.rdma.RdmaStorageTier is the correct entry for the current version.
Here com.ibm.crail.storage.rdma.RdmaStorageTier is the class used in StorageServer.java.
If com.ibm.crail.datanode.rdma.RdmaDataNode is used instead, storageTierIndex ends up as 1, which leads to a java.lang.ArrayIndexOutOfBoundsException at the namenode.
So the STORAGE_TYPES setting should be made consistent across README.md, crail-site.conf.template, CrailConstants.java and StorageServer.java.

fsck tool issues

  1. https://github.com/zrlio/crail/blob/master/client/src/main/java/com/ibm/crail/tools/CrailFsck.java#L181

     The usage shows getLocation whereas the check is for getLocations.

  2. https://github.com/zrlio/crail/blob/master/client/src/main/java/com/ibm/crail/tools/CrailFsck.java#L176

     Please give more detail telling the user which option was wrong. Just exiting is unintuitive and unhelpful to a user who might not know what went wrong (see the sketch after this list).

  3. The -r feature is undocumented.

  4. https://github.com/zrlio/crail/blob/master/client/src/main/java/com/ibm/crail/tools/CrailFsck.java#L49

     A bit more explanation of the commands would be helpful too, as I keep forgetting which one to use and what the difference between getLocation and blockStatistics is.
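For point 2, a minimal sketch of a friendlier failure path, assuming the tool keeps using Apache Commons CLI; the options object, tool name and wording below are placeholders, not the actual CrailFsck code:

import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

class FsckUsage {
	// Parse the command line; on a bad option, report which option failed and
	// print the full usage text instead of silently exiting.
	static void parseOrExplain(Options options, String[] args) {
		try {
			new DefaultParser().parse(options, args);
		} catch (ParseException e) {
			System.err.println("crail fsck: " + e.getMessage());  // e.g. "Unrecognized option: -x"
			new HelpFormatter().printHelp("crail fsck", options); // lists all valid options with descriptions
			System.exit(-1);
		}
	}
}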

multiple IPs per interface

Crail picks the IP to report to the namenode from the configured interface name; this fails if multiple IPs are assigned to that interface. A better solution might be to choose the interface via a specified IP, as sketched below.
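A minimal sketch of that alternative, resolving the datanode address from a configured IP rather than an interface name; the configuration property feeding configuredIp is hypothetical:

import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

class DataNodeAddress {
	// Resolve the configured IP and report exactly that address,
	// regardless of how many IPs the local interface carries.
	static InetAddress fromConfiguredIp(String configuredIp)
			throws UnknownHostException, SocketException {
		InetAddress address = InetAddress.getByName(configuredIp);
		NetworkInterface nic = NetworkInterface.getByInetAddress(address);
		if (nic == null) {
			throw new SocketException("no local interface carries IP " + configuredIp);
		}
		return address; // unambiguous even with multiple IPs per interface
	}
}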

Move the default build to Hadoop 2.7

None of us are using Hadoop 2.6, and all Spark versions are built for 2.7 as well. If I run a plain mvn build by mistake, I get 2.6 packages that later cause problems with Spark.

I propose making 2.7 the default build, with a backward-compatibility option for 2.6. Whoever does the next pull request, please consider setting this (or I will, whatever happens first).

Access permission to cachepath and datapath files

I have been having the following issue while running Crail-Spark-TeraSort. It looks like an access-permission problem on the cachepath and datapath files that occurs inside MappedBufferCache's constructor. This is probably because I am using Cloudera Hadoop, which runs YARN and HDFS under separate usernames such as "yarn" and "hdfs".

...
WARN scheduler.DAGScheduler: Creating new stage failed due to exception - job: 0
java.lang.NullPointerException
at com.ibm.crail.memory.MappedBufferCache.<init>(MappedBufferCache.java:55)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
...

I was running Crail-TeraSort with my own username, but "yarn" is also creating cache files, as shown below. So essentially the files are created by a different user than the one trying to read them.
ls -lth /memory/cache/
drwxrwxrwx 2 yarn hadoop 4.0K Dec 6 17:06 1512608812456
drwxrwxrwx 2 yarn hadoop 4.0K Dec 5 11:43 1512503023326

I was wondering if there is a way to fix this permanently. For example, would it help to change the access permissions when creating the RandomAccessFile for MappedBufferCache in the datapath?
RandomAccessFile dataFile = new RandomAccessFile(dataFilePath, "rw"); // (MappedBufferCache.java:89)
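Not a confirmed fix, just a sketch of the kind of change being asked about: widening the permissions of the cache file right after it is created, so a process running under a different user (e.g. "yarn") can still access it. POSIX-only; the class and method names are illustrative:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

class CacheFilePermissions {
	static RandomAccessFile openShared(String dataFilePath) throws IOException {
		RandomAccessFile dataFile = new RandomAccessFile(dataFilePath, "rw");
		// Allow other users to read and write the cache file as well.
		Path path = Paths.get(dataFilePath);
		Files.setPosixFilePermissions(path, PosixFilePermissions.fromString("rw-rw-rw-"));
		return dataFile;
	}
}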

Thanks,
Kevin

DirectoryRecord: update

  1. Only process valid entries, i.e., check the valid flag before using the length to create the array.
  2. Don't allocate a new array on every update; the maximum size is already known from DIRECTORY_RECORD (see the sketch below).
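A minimal sketch of point 2, reusing a preallocated buffer across updates. The constant and the record layout (an int valid flag followed by an int length and the name bytes) are assumptions for illustration, not the actual DirectoryRecord format:

import java.nio.ByteBuffer;

class DirectoryRecordUpdate {
	// Hypothetical maximum payload size derived from DIRECTORY_RECORD.
	private static final int MAX_RECORD_BYTES = 512;

	// Reuse one buffer across updates instead of allocating a new array each time.
	private final byte[] nameBuffer = new byte[MAX_RECORD_BYTES];

	void update(ByteBuffer record) {
		boolean valid = record.getInt() != 0;
		if (!valid) {
			return; // skip invalid entries before touching the length field
		}
		int length = record.getInt();
		record.get(nameBuffer, 0, length); // read into the preallocated buffer
	}
}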

Divide by zero exception if running with a small block size

When running with a block size of less than 1MB, no blocks seem to be registered, which leads to a divide-by-zero exception:

com.ibm.crail.namenode.rpc.darpc.DaRPCServiceDispatcher:130 - ERROR: Unknown error/ by zero
java.lang.ArithmeticException: / by zero
at com.ibm.crail.namenode.StorageTier$RoundRobinBlockSelection.getNext(BlockStore.java:185)
at com.ibm.crail.namenode.StorageTier$DataNodeArray.get(BlockStore.java:225)
at com.ibm.crail.namenode.StorageTier$DataNodeArray.access$0(BlockStore.java:220)
at com.ibm.crail.namenode.StorageTier.getBlock(BlockStore.java:118)
at com.ibm.crail.namenode.BlockStore.getBlock(BlockStore.java:63)
at com.ibm.crail.namenode.NameNodeService.createFile(NameNodeService.java:99)
at com.ibm.crail.namenode.rpc.darpc.DaRPCServiceDispatcher.processServerEvent(DaRPCServiceDispatcher.java:78)
at com.ibm.darpc.RpcServerGroup.processServerEvent(RpcServerGroup.java:122)
at com.ibm.darpc.RpcServerEndpoint.dispatchReceive(RpcServerEndpoint.java:75)
at com.ibm.darpc.RpcEndpoint.dispatchCqEvent(RpcEndpoint.java:157)
at com.ibm.darpc.RpcCluster.dispatchCqEvent(RpcCluster.java:36)
at com.ibm.darpc.RpcCluster.dispatchCqEvent(RpcCluster.java:1)
at com.ibm.disni.rdma.RdmaCqProcessor.dispatchCqEvent(RdmaCqProcessor.java:106)
at com.ibm.disni.rdma.RdmaCqProcessor.run(RdmaCqProcessor.java:136)
at java.lang.Thread.run(Thread.java:745)
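The failing modulo is presumably the round-robin selection over registered data nodes. A hedged sketch of a guard that turns the condition into a readable error instead of an ArithmeticException; class and field names are simplified, not the actual BlockStore code:

import java.util.concurrent.atomic.AtomicLong;

class RoundRobinBlockSelection {
	private final AtomicLong counter = new AtomicLong();

	// Return the index of the next data node, or fail with a descriptive error
	// when no data node has registered any blocks (e.g. block size too small).
	int getNext(int registeredDataNodes) {
		if (registeredDataNodes == 0) {
			throw new IllegalStateException(
				"no data nodes with registered blocks; check crail.blocksize and the storage tier configuration");
		}
		return (int) (counter.getAndIncrement() % registeredDataNodes);
	}
}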

mmap in hugetlbfs

Hi,
I read the Crail code and found that it supports hugetlbfs, so I performed some experiments with hugetlbfs (not involving Crail). My question: we use a MappedByteBuffer mapped from a file in hugetlbfs and never unmap it; supposedly the GC takes care of that, but in fact I sometimes run into this error:

java.lang.Error: Cleaner terminated abnormally
	at sun.misc.Cleaner$1.run(Cleaner.java:147)
	at sun.misc.Cleaner$1.run(Cleaner.java:144)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.misc.Cleaner.clean(Cleaner.java:144)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:141)
Caused by: java.io.IOException: Invalid argument
	at sun.nio.ch.FileChannelImpl.unmap0(Native Method)
	at sun.nio.ch.FileChannelImpl.access$000(FileChannelImpl.java:40)
	at sun.nio.ch.FileChannelImpl$Unmapper.run(FileChannelImpl.java:787)
	at sun.misc.Cleaner.clean(Cleaner.java:142)

Can anyone give me some hints? I am not very familiar with hugetlbfs.
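For context, a minimal sketch of the kind of mapping being discussed: mapping a file on a hugetlbfs mount into a MappedByteBuffer. The mount point and sizes are assumptions, and hugetlbfs generally requires lengths that are multiples of the huge page size:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class HugePageMapping {
	static MappedByteBuffer map() throws IOException {
		long hugePageSize = 2L * 1024 * 1024;   // assuming 2MB huge pages
		long length = 512 * hugePageSize;       // 1GB region, a multiple of the huge page size
		// "/dev/hugepages" as the hugetlbfs mount point is an assumption.
		try (RandomAccessFile file = new RandomAccessFile("/dev/hugepages/cache/0", "rw")) {
			return file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, length);
		}
	}
}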

Exception stack trace overwritten

Catch-all clauses sometimes obfuscate the original exception. For example, in client/src/main/java/com/ibm/crail/CrailBufferedInputStream.java:230, the catch clause creates and throws a new exception, which results in losing the actual stack trace:

private void triggerFetch() throws IOException {
	try {
		if (future == null && internalBuf.remaining() == 0){
			internalBuf.clear();
			future = inputStream.read(internalBuf);
			if (future == null){
				internalBuf.clear().flip();
			}
		}
	} catch(Exception e) {
		throw new IOException(e);
	}
}

A simple solution would be to print the original stack trace. A better solution would be to remove the try-catch and force the enclosed code to properly deal with all non-IOExceptions.
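A hedged sketch of the first suggestion, rethrowing IOExceptions untouched and logging the original exception before wrapping anything else; LOG stands for the class's existing logger and is an assumption:

private void triggerFetch() throws IOException {
	try {
		if (future == null && internalBuf.remaining() == 0) {
			internalBuf.clear();
			future = inputStream.read(internalBuf);
			if (future == null) {
				internalBuf.clear().flip();
			}
		}
	} catch (IOException e) {
		throw e;                              // already fits the signature: rethrow untouched
	} catch (Exception e) {
		LOG.error("triggerFetch failed", e);  // make the original stack trace visible
		throw new IOException(e);             // keep the original as the cause
	}
}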

Hang when writing directory record entry

Possible concurrency problem when using the BufferedOutputStream. Process hangs here:

"Executor task launch worker for task 143" #152 daemon prio=5 os_prio=0 tid=0x00007f8cd0e78000 nid=0x6ae3 runnable [0x00007f8cd4253000]
java.lang.Thread.State: RUNNABLE
at com.ibm.crail.storage.rdma.client.RdmaStoragePassiveEndpoint.write(RdmaStoragePassiveEndpoint.java:194)
at com.ibm.crail.core.CoreOutputStream.trigger(CoreOutputStream.java:110)
at com.ibm.crail.core.CoreStream.prepareAndTrigger(CoreStream.java:238)
at com.ibm.crail.core.CoreStream.dataOperation(CoreStream.java:104)
at com.ibm.crail.core.CoreOutputStream.write(CoreOutputStream.java:67)
at com.ibm.crail.core.DirectoryOutputStream.writeRecord(DirectoryOutputStream.java:53)
at com.ibm.crail.core.CoreFileSystem._createNode(CoreFileSystem.java:211)
at com.ibm.crail.core.CreateNodeFuture.process(CoreMetaDataOperation.java:164)
at com.ibm.crail.core.CreateNodeFuture.process(CoreMetaDataOperation.java:150)
at com.ibm.crail.core.CoreMetaDataOperation.get(CoreMetaDataOperation.java:87)
at com.ibm.crail.core.CoreEarlyFile.file(CoreFile.java:167)
- eliminated <0x00007f8d7bb59890> (a com.ibm.crail.core.CoreEarlyFile)
at com.ibm.crail.core.CoreEarlyFile.getDirectOutputStream(CoreFile.java:104)
- locked <0x00007f8d7bb59890> (a com.ibm.crail.core.CoreEarlyFile)
at com.ibm.crail.CrailBufferedOutputStream.outputStream(CrailBufferedOutputStream.java:329)
at com.ibm.crail.CrailBufferedOutputStream.syncSlice(CrailBufferedOutputStream.java:320)
at com.ibm.crail.CrailBufferedOutputStream.write(CrailBufferedOutputStream.java:124)
at com.ibm.crail.CrailBufferedOutputStream.write(CrailBufferedOutputStream.java:102)
at com.ibm.crail.terasort.serializer.F22SerializerStream.writeObject(F22Serializer.scala:92)
at com.ibm.crail.terasort.serializer.F22SerializerStream.writeValue(F22Serializer.scala:102)
at org.apache.spark.storage.CrailObjectWriter.write(CrailStore.scala:717)
at org.apache.spark.shuffle.crail.CrailShuffleWriter$$anonfun$write$1.apply(CrailShuffleWriter.scala:67)
at org.apache.spark.shuffle.crail.CrailShuffleWriter$$anonfun$write$1.apply(CrailShuffleWriter.scala:65)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.shuffle.crail.CrailShuffleWriter.write(CrailShuffleWriter.scala:65)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Check the directory entry size

When experimenting with the crail.directoryrecord or crail.directorydepth parameters, the errors produced when a file name or the directory depth exceeds these limits are cryptic; see the output below. It would be nice to detect this case and print a more sensible error, for example "limit for these parameters exceeded", etc. ;)

To the rest of the system it looks as if a particular file just vanished underneath the Crail file system. A hypothetical sketch of such a check is shown below, followed by the output I currently get.
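The limits and their interpretation in this sketch are made up for illustration and would need to come from the actual crail.directoryrecord and crail.directorydepth settings:

import java.io.IOException;

class DirectoryLimits {
	// Hypothetical limits mirroring crail.directoryrecord and crail.directorydepth.
	static final int MAX_RECORD_BYTES = 512;
	static final int MAX_DEPTH = 16;

	static void validate(String path) throws IOException {
		String[] components = path.split("/");
		if (components.length > MAX_DEPTH) {
			throw new IOException("path depth " + components.length
					+ " exceeds crail.directorydepth (" + MAX_DEPTH + "): " + path);
		}
		for (String name : components) {
			if (name.getBytes().length > MAX_RECORD_BYTES) {
				throw new IOException("file name '" + name
						+ "' exceeds crail.directoryrecord (" + MAX_RECORD_BYTES + " bytes)");
			}
		}
	}
}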

 17/09/21 13:44:48 2501 main WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/09/21 13:45:17 30985 main ERROR FileFormatWriter: Aborting job null.         
 java.io.FileNotFoundException: /sql/data1.pq/_temporary/0/task_20170921134508_0001_m_000021
	at com.ibm.crail.hdfs.CrailHadoopFileSystem.listStatus(CrailHadoopFileSystem.java:199)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:426)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:362)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
	at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:47)
	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:128)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:209)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
	at com.ibm.crail.spark.tools.ParquetGenerator$.main(ParquetGenerator.scala:116)
	at com.ibm.crail.spark.tools.ParquetGenerator.main(ParquetGenerator.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" org.apache.spark.SparkException: Job aborted.
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
	at com.ibm.crail.spark.tools.ParquetGenerator$.main(ParquetGenerator.scala:116)
	at com.ibm.crail.spark.tools.ParquetGenerator.main(ParquetGenerator.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: /sql/data1.pq/_temporary/0/task_20170921134508_0001_m_000021
	at com.ibm.crail.hdfs.CrailHadoopFileSystem.listStatus(CrailHadoopFileSystem.java:199)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:426)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:362)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
	at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:47)
	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:128)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:209)
	... 44 more

iobench read faulty request

Hi, I ran into a problem when running iobench.
When I run ./bin/crail iobench -t writeClusterDirect -s 1048576 -k 1024 -f /tmp.dat, it works well: I see the result below and can find tmp.dat using ./bin/crail fs -l /:

starting benchmark...
execution time 0.157
ops 1024.0
sumbytes 1.073741824E9
throughput 54712.95918471338
latency 153.3203125

But reading the file fails. I use the following command:
./bin/crail iobench -t readSequentialDirect -s 1048576 -k 1024 -f /tmp.dat
The output is the following, and then it hangs:

starting benchmark...
17/04/10 15:28:29 INFO crail: faulty request, status 12

Could anyone tell me what is going wrong?

Multiple devices per datanode, per tier

I don't believe Crail supports multiple devices per datanode for a specific tier, for instance exporting two nvmef targets from a storage tier on a single datanode, like:

crail.storage.blkdev.datapath /dev/nvme0n1,/dev/nvme1n1,

I'm trying to scope out how much effort this would be, but I first wanted to check whether there are already plans or existing work to support such functionality. This would probably be most useful in the blk-dev repo, where we could simply expose multiple iscsi/nvmef targets to a namenode, as in the conf example above. A sketch of how such a comma-separated datapath could be parsed follows.
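Not existing functionality, just an illustration of splitting the datapath value above into per-device paths on the storage-tier side:

import java.util.ArrayList;
import java.util.List;

class BlkDevDataPaths {
	// Split e.g. "/dev/nvme0n1,/dev/nvme1n1," into ["/dev/nvme0n1", "/dev/nvme1n1"],
	// so each device can be exported as its own target from the same datanode.
	static List<String> parse(String dataPathProperty) {
		List<String> devices = new ArrayList<>();
		for (String path : dataPathProperty.split(",")) {
			String trimmed = path.trim();
			if (!trimmed.isEmpty()) {
				devices.add(trimmed); // ignores the trailing comma in the example
			}
		}
		return devices;
	}
}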

Thanks,
Tim

Use map from test case name to test case function

CrailBenchmark and HdfsIOBenchmark should use a map from test names (command line) to test case functions (see PR 35).

Example:
[...]
int locationClass = 0;
boolean useBuffered = true;

String benchmarkTypes = "write|writeAsync|readSequential|readRandom|readSequentialAsync|readMultiStream|"
		+ "createFile|createFileAsync|createMultiFile|getKey|getFile|getFileAsync|getMultiFile"
		+ "getMultiFileAsync|enumerateDir|browseDir|"
		+ "writeInt|readInt|seekInt|readMultiStreamInt|printLocationclass";
Option typeOption = Option.builder("t").desc("type of experiment [" + benchmarkTypes + "]").hasArg().build();
Option fileOption = Option.builder("f").desc("filename").hasArg().build();

[...]
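A minimal sketch of the proposed map-based dispatch; the test names are taken from the string above, but the bodies are placeholders rather than the actual CrailBenchmark methods:

import java.util.LinkedHashMap;
import java.util.Map;

class BenchmarkDispatch {
	// Map each command-line test name to its test case; adding a benchmark then
	// means adding one entry instead of editing a '|'-separated string and a switch.
	private final Map<String, Runnable> tests = new LinkedHashMap<>();

	BenchmarkDispatch() {
		tests.put("write", () -> System.out.println("running write test"));            // placeholder body
		tests.put("readSequential", () -> System.out.println("running readSequential")); // placeholder body
		// ... one entry per benchmark type ...
	}

	void run(String type) {
		Runnable test = tests.get(type);
		if (test == null) {
			throw new IllegalArgumentException(
				"unknown experiment type '" + type + "', valid types: " + tests.keySet());
		}
		test.run();
	}
}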

potential bug?

dataBuf.position(opDesc.getBufferPosition());

A dataBuf.clear() might be required here before setting the limit and position; otherwise, in my Parquet reader, I get the following (a sketch of the suggested ordering follows the trace):

java.lang.IllegalArgumentException
	at java.nio.Buffer.position(Buffer.java:244)
	at com.ibm.crail.memory.OffHeapBuffer.position(OffHeapBuffer.java:46)
	at com.ibm.crail.core.CoreStream.prepareAndTrigger(CoreStream.java:239)
	at com.ibm.crail.core.CoreStream.dataOperation(CoreStream.java:105)
	at com.ibm.crail.core.CoreInputStream.read(CoreInputStream.java:77)
	...
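Buffer.position() throws IllegalArgumentException when the new position exceeds the current limit, so calling clear() first resets the limit to the capacity. The helper below only mirrors the dataBuf/opDesc usage above; the parameter names are illustrative:

import java.nio.ByteBuffer;

class BufferReuse {
	// Sketch of the suggested ordering: clear() first, so the subsequent
	// position() call can never exceed the current limit.
	static void prepare(ByteBuffer dataBuf, int bufferPosition, int length) {
		dataBuf.clear();                        // limit = capacity, position = 0
		dataBuf.position(bufferPosition);       // safe, assuming capacity >= bufferPosition
		dataBuf.limit(bufferPosition + length); // restrict the window to this operation
	}
}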

More descriptive names for Crail parameters

The default config that is generated in assembly/target contains crail-site.conf, which has a list of parameters:

crail.blocksize				1048576
crail.buffersize			1048576
crail.regionsize			1073741824
crail.cachelimit			1073741824
crail.cachepath				<path, should be huge page mountpoint>
crail.singleton				true
crail.statistics 			true
crail.namenode.address			crail://<hostname>:9060
..

It would be nice if we could fully qualify all the parameters listed there. For example,
crail.blocksize could become crail.fs.blocksize, and crail.buffersize could become crail.client.buffersize. This would make explicit which parameter is relevant for which Crail component.

Hibench-pagerank: java.io.StreamCorruptedException: invalid type code: AC

Hello,
A problem has been confusing me a lot; I believe it is a bug in Crail.
When running spark-io with HiBench PageRank with 3 iterations, stage 0 is fine, but as stage 1 starts the problem pops up: java.io.StreamCorruptedException: invalid type code: AC.
The debug info is as follows:
WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 40, 172.18.0.17): java.io.StreamCorruptedException: invalid type code: AC
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1381)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:157)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:189)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:186)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.shuffle.crail.CrailInputCloser.hasNext(CrailInputCloser.scala:33)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.shuffle.crail.CrailShuffleWriter.write(CrailShuffleWriter.scala:65)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

How can this be solved? Any help would be appreciated.
