zrlio / crail
[Archived] A Fast Multi-tiered Distributed Storage System based on User-Level I/O
Home Page: http://crail.incubator.apache.org/
License: Apache License 2.0
fsck -t ping returns all counters zero
NameNodeService atomic counters are not incremented.
In the current version, crail.storage.types replaces crail.datanode.types.
In conf/crail-site.conf.template it looks like this:
crail.storage.types com.ibm.crail.datanode.rdma.RdmaDataNode
but in the README.md it is this:
crail.datanode.types com.ibm.crail.storage.rdma.RdmaStorageTier
In fact,
crail.storage.types com.ibm.crail.storage.rdma.RdmaStorageTier
is the right one in the current version. Here, com.ibm.crail.storage.rdma.RdmaStorageTier is the class in StorageServer.java.
If com.ibm.crail.datanode.rdma.RdmaDataNode is used instead, storageTierIndex will be 1, which leads to a java.lang.ArrayIndexOutOfBoundsException at the namenode.
So this STORAGE_TYPES setting should be consistent across README.md, crail-site.conf.template, CrailConstants.java, and StorageServer.java.
The usage shows getLocation, whereas the check is on getLocations.
Please show more detail to tell the user which option is wrong. Just exiting is non-intuitive and less helpful to a user who might not know what went wrong with the system.
The -r option is undocumented.
A bit more explanation of the commands would be helpful too, as I keep forgetting which one to use and what the difference between getLocation and blockStatistics is.
Crail picks an IP from the configured interface name to report to the namenode; this fails if multiple IPs are assigned to the interface. A better solution might be to choose the interface via a specified IP.
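One possible direction (a hedged sketch, not Crail's actual code; `configuredIp` stands in for a hypothetical new config option) is to resolve the interface from a user-specified IP rather than picking an IP from an interface name:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

public class InterfaceByIp {
    // Resolve the network interface that carries the given IP address,
    // instead of picking an arbitrary IP from a configured interface name.
    public static NetworkInterface findByIp(String configuredIp)
            throws SocketException, UnknownHostException {
        InetAddress target = InetAddress.getByName(configuredIp);
        NetworkInterface nic = NetworkInterface.getByInetAddress(target);
        if (nic == null) {
            throw new SocketException("no interface carries IP " + configuredIp);
        }
        return nic;
    }

    public static void main(String[] args) throws Exception {
        // Loopback is present on virtually every machine, so use it as a demo.
        NetworkInterface lo = findByIp("127.0.0.1");
        System.out.println("resolved interface: " + lo.getName());
    }
}
```

With this inversion, the reported address is exactly the one the user configured, regardless of how many addresses the interface carries.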
None of us are using Hadoop 2.6, and all Spark versions are built for 2.7. If I run mvn build by mistake, I get 2.6 packages that later cause problems with Spark.
I propose making the default build target 2.7, with a backward-compatibility option for 2.6. Next time anyone does a pull request, please consider setting this (or I will do it, whichever happens first).
Having a jar with source code can be useful for debugging. The following plugin should be added to the pom.xml file to generate the source code jar:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-source-plugin</artifactId>
  <executions>
    <execution>
      <id>attach-sources</id>
      <goals>
        <goal>jar</goal>
      </goals>
    </execution>
  </executions>
</plugin>
I have been having the following issue while running Crail-Spark-TeraSort. It looks like an access permission problem on cachepath and datapath that occurs inside MappedBufferCache's constructor. This is probably because I am using Cloudera Hadoop, which runs YARN and HDFS under other usernames such as "yarn" and "hdfs".
...
WARN scheduler.DAGScheduler: Creating new stage failed due to exception - job: 0
java.lang.NullPointerException
at com.ibm.crail.memory.MappedBufferCache.<init>(MappedBufferCache.java:55)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
...
I was running Crail-TeraSort with my own username, but "yarn" is also creating cache files, as shown below. So essentially the files are created by a different user than the one trying to read them.
ls -lth /memory/cache/
drwxrwxrwx 2 yarn hadoop 4.0K Dec 6 17:06 1512608812456
drwxrwxrwx 2 yarn hadoop 4.0K Dec 5 11:43 1512503023326
I was wondering if there is a way to permanently fix this. For example, does it help to change the access permissions when creating the RandomAccessFile for MappedBufferCache in the datapath?
RandomAccessFile dataFile = new RandomAccessFile(dataFilePath, "rw"); // (MappedBufferCache.java:89)
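One possible workaround along those lines (a hedged sketch, not Crail's actual fix; the helper name is illustrative) would be to relax the cache file's permissions right after creating it, mirroring the 777 mode of the cache directories:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class CacheFilePerms {
    // Create the cache file and relax its permissions so that other users
    // (e.g. "yarn") can read and write it, like the drwxrwxrwx cache dirs.
    public static RandomAccessFile openShared(String dataFilePath) throws IOException {
        RandomAccessFile dataFile = new RandomAccessFile(dataFilePath, "rw");
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rw-rw-rw-");
        Files.setPosixFilePermissions(Paths.get(dataFilePath), perms);
        return dataFile;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("crail-cache-demo", ".dat");
        try (RandomAccessFile f = openShared(tmp.toString())) {
            System.out.println("perms: " + PosixFilePermissions.toString(
                    Files.getPosixFilePermissions(tmp)));
        }
        Files.delete(tmp);
    }
}
```

Whether mode bits alone are sufficient depends on the deployment; a shared group on the cache path may be the cleaner long-term answer.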
Thanks,
Kevin
If running with a block size less than 1MB, no blocks seem to be registered, which leads to a divide-by-zero exception:
com.ibm.crail.namenode.rpc.darpc.DaRPCServiceDispatcher:130 - ERROR: Unknown error/ by zero
java.lang.ArithmeticException: / by zero
at com.ibm.crail.namenode.StorageTier$RoundRobinBlockSelection.getNext(BlockStore.java:185)
at com.ibm.crail.namenode.StorageTier$DataNodeArray.get(BlockStore.java:225)
at com.ibm.crail.namenode.StorageTier$DataNodeArray.access$0(BlockStore.java:220)
at com.ibm.crail.namenode.StorageTier.getBlock(BlockStore.java:118)
at com.ibm.crail.namenode.BlockStore.getBlock(BlockStore.java:63)
at com.ibm.crail.namenode.NameNodeService.createFile(NameNodeService.java:99)
at com.ibm.crail.namenode.rpc.darpc.DaRPCServiceDispatcher.processServerEvent(DaRPCServiceDispatcher.java:78)
at com.ibm.darpc.RpcServerGroup.processServerEvent(RpcServerGroup.java:122)
at com.ibm.darpc.RpcServerEndpoint.dispatchReceive(RpcServerEndpoint.java:75)
at com.ibm.darpc.RpcEndpoint.dispatchCqEvent(RpcEndpoint.java:157)
at com.ibm.darpc.RpcCluster.dispatchCqEvent(RpcCluster.java:36)
at com.ibm.darpc.RpcCluster.dispatchCqEvent(RpcCluster.java:1)
at com.ibm.disni.rdma.RdmaCqProcessor.dispatchCqEvent(RdmaCqProcessor.java:106)
at com.ibm.disni.rdma.RdmaCqProcessor.run(RdmaCqProcessor.java:136)
at java.lang.Thread.run(Thread.java:745)
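Judging from the trace, the round-robin selection presumably takes the counter modulo the number of registered datanodes, which is zero here. A hedged sketch of a guard (class and method names are illustrative, not Crail's actual BlockStore code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinGuard {
    private final List<String> dataNodes = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    public void register(String node) { dataNodes.add(node); }

    // Guard against an empty datanode list instead of letting
    // counter % size throw ArithmeticException: / by zero.
    public String getNext() {
        int size = dataNodes.size();
        if (size == 0) {
            throw new IllegalStateException(
                "no datanodes registered (block size may be below the minimum)");
        }
        return dataNodes.get(counter.getAndIncrement() % size);
    }
}
```

The real fix likely also needs to reject or document the minimum block size, but failing with a descriptive exception is strictly better than the arithmetic error.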
In the close() of DeviceMrCache, line 100 is:
mr.deregMr()
Why not:
mr.deregMr().execute().free()
?
Hi,
I read the Crail code and found it supports hugetlbfs, so I performed some experiments on hugetlbfs (not associated with Crail). My question: we use a MappedByteBuffer mapped from a file in hugetlbfs and never unmap it; supposedly GC will take care of it, but in fact I sometimes run into this error:
java.lang.Error: Cleaner terminated abnormally
at sun.misc.Cleaner$1.run(Cleaner.java:147)
at sun.misc.Cleaner$1.run(Cleaner.java:144)
at java.security.AccessController.doPrivileged(Native Method)
at sun.misc.Cleaner.clean(Cleaner.java:144)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:141)
Caused by: java.io.IOException: Invalid argument
at sun.nio.ch.FileChannelImpl.unmap0(Native Method)
at sun.nio.ch.FileChannelImpl.access$000(FileChannelImpl.java:40)
at sun.nio.ch.FileChannelImpl$Unmapper.run(FileChannelImpl.java:787)
at sun.misc.Cleaner.clean(Cleaner.java:142)
Can anyone give me some hints? I am not very familiar with hugetlbfs.
A CrailBuffer could be garbage collected at some point, and a new buffer could be allocated at the same address as the old one, so lookups by address would no longer work.
Catch-all clauses sometimes obfuscate the original exception. For example, in client/src/main/java/com/ibm/crail/CrailBufferedInputStream.java:230, the catch clause creates and throws a new exception, which results in losing the actual stack trace:
private void triggerFetch() throws IOException {
    try {
        if (future == null && internalBuf.remaining() == 0) {
            internalBuf.clear();
            future = inputStream.read(internalBuf);
            if (future == null) {
                internalBuf.clear().flip();
            }
        }
    } catch (Exception e) {
        throw new IOException(e);
    }
}
A simple solution would be to print the original stack trace. A better solution would be to remove the try-catch and force the enclosed code to properly deal with all non-IOExceptions.
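A middle-ground option is to narrow the catch so IOExceptions propagate untouched and only truly unexpected exceptions get wrapped, with the original attached as the cause. A self-contained sketch (the helper name and Callable shape are illustrative, not Crail code):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class NarrowCatch {
    // Run a task, letting IOExceptions propagate unchanged and wrapping
    // only unexpected exceptions with the original attached as the cause.
    public static <T> T callIo(Callable<T> task) throws IOException {
        try {
            return task.call();
        } catch (IOException e) {
            throw e; // keep the original type and stack trace
        } catch (Exception e) {
            throw new IOException("unexpected failure: " + e, e);
        }
    }

    public static void main(String[] args) {
        try {
            callIo(() -> { throw new InterruptedException("boom"); });
        } catch (IOException e) {
            // the cause (and its stack trace) is preserved on the wrapper
            System.out.println("cause: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Since the wrapper keeps the cause, printStackTrace on the IOException still shows the original "Caused by" chain, addressing the obfuscation complaint.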
Actually looking at https://github.com/zrlio/crail/blob/master/bin/start-crail.sh
Do I have to set CRAIL_HOME? It seems to be set automatically at the beginning of the script, so $CRAIL_HOME will never be undefined, right?
CrailBenchmark and HdfsIOBenchmark should calculate the execution time in nanoseconds instead of milliseconds (see PR 35).
Example:
[...]
long end = System.currentTimeMillis();
double executionTime = ((double) (end - start));
double latency = executionTime*1000.0 / ((double) batch);
[...]
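A hedged sketch of the nanosecond-based timing the issue asks for (variable names follow the snippet above; the workload and `batch` count are placeholders):

```java
public class NanoTiming {
    // Measure in nanoseconds and only convert to ms/us when reporting,
    // so sub-millisecond batches do not round down to zero.
    public static double[] measure(Runnable workload, int batch) {
        long start = System.nanoTime();
        workload.run();
        long end = System.nanoTime();
        double executionTimeMs = (end - start) / 1e6;
        double latencyUs = (end - start) / 1e3 / batch;
        return new double[]{executionTimeMs, latencyUs};
    }

    public static void main(String[] args) {
        double[] r = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        }, 1024);
        System.out.printf("execution time %.3f ms, latency %.3f us%n", r[0], r[1]);
    }
}
```

Note that System.nanoTime() is monotonic while currentTimeMillis() is wall-clock, so this also protects the measurements against clock adjustments.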
Possible concurrency problem when using the BufferedOutputStream. The process hangs here:
"Executor task launch worker for task 143" #152 daemon prio=5 os_prio=0 tid=0x00007f8cd0e78000 nid=0x6ae3 runnable [0x00007f8cd4253000]
java.lang.Thread.State: RUNNABLE
at com.ibm.crail.storage.rdma.client.RdmaStoragePassiveEndpoint.write(RdmaStoragePassiveEndpoint.java:194)
at com.ibm.crail.core.CoreOutputStream.trigger(CoreOutputStream.java:110)
at com.ibm.crail.core.CoreStream.prepareAndTrigger(CoreStream.java:238)
at com.ibm.crail.core.CoreStream.dataOperation(CoreStream.java:104)
at com.ibm.crail.core.CoreOutputStream.write(CoreOutputStream.java:67)
at com.ibm.crail.core.DirectoryOutputStream.writeRecord(DirectoryOutputStream.java:53)
at com.ibm.crail.core.CoreFileSystem._createNode(CoreFileSystem.java:211)
at com.ibm.crail.core.CreateNodeFuture.process(CoreMetaDataOperation.java:164)
at com.ibm.crail.core.CreateNodeFuture.process(CoreMetaDataOperation.java:150)
at com.ibm.crail.core.CoreMetaDataOperation.get(CoreMetaDataOperation.java:87)
at com.ibm.crail.core.CoreEarlyFile.file(CoreFile.java:167)
- eliminated <0x00007f8d7bb59890> (a com.ibm.crail.core.CoreEarlyFile)
at com.ibm.crail.core.CoreEarlyFile.getDirectOutputStream(CoreFile.java:104)
- locked <0x00007f8d7bb59890> (a com.ibm.crail.core.CoreEarlyFile)
at com.ibm.crail.CrailBufferedOutputStream.outputStream(CrailBufferedOutputStream.java:329)
at com.ibm.crail.CrailBufferedOutputStream.syncSlice(CrailBufferedOutputStream.java:320)
at com.ibm.crail.CrailBufferedOutputStream.write(CrailBufferedOutputStream.java:124)
at com.ibm.crail.CrailBufferedOutputStream.write(CrailBufferedOutputStream.java:102)
at com.ibm.crail.terasort.serializer.F22SerializerStream.writeObject(F22Serializer.scala:92)
at com.ibm.crail.terasort.serializer.F22SerializerStream.writeValue(F22Serializer.scala:102)
at org.apache.spark.storage.CrailObjectWriter.write(CrailStore.scala:717)
at org.apache.spark.shuffle.crail.CrailShuffleWriter$$anonfun$write$1.apply(CrailShuffleWriter.scala:67)
at org.apache.spark.shuffle.crail.CrailShuffleWriter$$anonfun$write$1.apply(CrailShuffleWriter.scala:65)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.shuffle.crail.CrailShuffleWriter.write(CrailShuffleWriter.scala:65)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The throws Exception signature allows throwing exceptions such as InterruptedException, which are not subclasses of IOException. Expected changes in:
https://github.com/zrlio/crail/blob/master/client/src/main/java/com/ibm/crail/storage/StorageClient.java#L31
https://github.com/zrlio/crail/blob/master/client/src/main/java/com/ibm/crail/storage/StorageEndpoint.java#L28
When experimenting with the crail.directoryrecord or crail.directorydepth parameters, the errors produced when the file name or depth exceeds these limits are cryptic; see below. It would be nice to check for this case and print a more sensible error, for example "limit for these parameters exceeded". ;)
To the rest of the system it looks as if a particular file simply vanished underneath the Crail file system.
17/09/21 13:44:48 2501 main WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/09/21 13:45:17 30985 main ERROR FileFormatWriter: Aborting job null.
java.io.FileNotFoundException: /sql/data1.pq/_temporary/0/task_20170921134508_0001_m_000021
at com.ibm.crail.hdfs.CrailHadoopFileSystem.listStatus(CrailHadoopFileSystem.java:199)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:426)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:362)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:47)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:128)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:209)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
at com.ibm.crail.spark.tools.ParquetGenerator$.main(ParquetGenerator.scala:116)
at com.ibm.crail.spark.tools.ParquetGenerator.main(ParquetGenerator.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
at com.ibm.crail.spark.tools.ParquetGenerator$.main(ParquetGenerator.scala:116)
at com.ibm.crail.spark.tools.ParquetGenerator.main(ParquetGenerator.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: /sql/data1.pq/_temporary/0/task_20170921134508_0001_m_000021
at com.ibm.crail.hdfs.CrailHadoopFileSystem.listStatus(CrailHadoopFileSystem.java:199)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:426)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:362)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:47)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:128)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:209)
... 44 more
Allow runtime registration of file types to make the code cleaner, i.e., no more adding a new else-if branch for every new file type.
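A hedged sketch of such a registry (the type codes and factory results are illustrative placeholders, not Crail's actual file-type API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class FileTypeRegistry {
    private final Map<Integer, Supplier<Object>> factories = new HashMap<>();

    // Register a factory for a type code at runtime instead of
    // growing an if/else-if chain in the dispatch code.
    public void register(int typeCode, Supplier<Object> factory) {
        factories.put(typeCode, factory);
    }

    public Object create(int typeCode) {
        Supplier<Object> f = factories.get(typeCode);
        if (f == null) {
            throw new IllegalArgumentException("unknown file type " + typeCode);
        }
        return f.get();
    }

    public static void main(String[] args) {
        FileTypeRegistry registry = new FileTypeRegistry();
        registry.register(0, () -> "DataFile");
        registry.register(1, () -> "Directory");
        System.out.println(registry.create(1)); // prints Directory
    }
}
```

Adding a new file type then means one register() call at startup rather than touching every dispatch site.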
Hi, I ran into a problem when running iobench.
When I run ./bin/crail iobench -t writeClusterDirect -s 1048576 -k 1024 -f /tmp.dat it works well. I can see the result below, and I can find tmp.dat using ./bin/crail fs -l /:
starting benchmark...
execution time 0.157
ops 1024.0
sumbytes 1.073741824E9
throughput 54712.95918471338
latency 153.3203125
But when I read the file, it goes wrong. I use the command below:
./bin/crail iobench -t readSequentialDirect -s 1048576 -k 1024 -f /tmp.dat
The output is the following, and it halts there:
starting benchmark...
17/04/10 15:28:29 INFO crail: faulty request, status 12
Could anyone tell me what is wrong?
I don't believe Crail supports multiple devices per datanode for a specific tier, for instance exporting two nvmef targets from a storage tier on a single datanode, like:
crail.storage.blkdev.datapath /dev/nvme0n1,/dev/nvme1n1,
I'm trying to scope out how much effort this would be, but I first wanted to check whether there are already plans or existing work to support such functionality. This would probably be most useful in the blk-dev repo, where we could just expose multiple iscsi/nvmef targets to a namenode, as in the conf example above.
Thanks,
Tim
Null returns often lead to unexpected NullPointerExceptions, so we should favor exceptions where they make sense.
CrailBenchmark and HdfsIOBenchmark should use a map from testnames (command line) to test case functions (see PR 35).
Example:
[...]
int locationClass = 0;
boolean useBuffered = true;
String benchmarkTypes = "write|writeAsync|readSequential|readRandom|readSequentialAsync|readMultiStream|"
+ "createFile|createFileAsync|createMultiFile|getKey|getFile|getFileAsync|getMultiFile|"
+ "getMultiFileAsync|enumerateDir|browseDir|"
+ "writeInt|readInt|seekInt|readMultiStreamInt|printLocationclass";
Option typeOption = Option.builder("t").desc("type of experiment [" + benchmarkTypes + "]").hasArg().build();
Option fileOption = Option.builder("f").desc("filename").hasArg().build();
[...]
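A hedged sketch of the proposed mapping from test names to test cases (the two entries and their bodies are placeholders, not the real benchmarks):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BenchmarkDispatch {
    // Map test names to test cases; both the -t option string and the
    // dispatch logic then derive from a single data structure.
    private static final Map<String, Runnable> TESTS = new LinkedHashMap<>();
    static {
        TESTS.put("write", () -> System.out.println("running write"));
        TESTS.put("readSequential", () -> System.out.println("running readSequential"));
        // ... one entry per benchmark ...
    }

    public static String benchmarkTypes() {
        return String.join("|", TESTS.keySet());
    }

    public static void run(String name) {
        Runnable test = TESTS.get(name);
        if (test == null) {
            throw new IllegalArgumentException(
                "unknown test '" + name + "', expected one of [" + benchmarkTypes() + "]");
        }
        test.run();
    }

    public static void main(String[] args) {
        System.out.println(benchmarkTypes());
        run("write");
    }
}
```

This also eliminates bugs like a missing "|" in the hand-maintained option string, since the list is generated from the map keys.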
A dataBuf.clear() might be required here before setting limit and position; otherwise, in my parquet reader, I get:
java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:244)
at com.ibm.crail.memory.OffHeapBuffer.position(OffHeapBuffer.java:46)
at com.ibm.crail.core.CoreStream.prepareAndTrigger(CoreStream.java:239)
at com.ibm.crail.core.CoreStream.dataOperation(CoreStream.java:105)
at com.ibm.crail.core.CoreInputStream.read(CoreInputStream.java:77)
...
I think you want to do :
Currently it is an infinite recursion.
The default config that is generated in assembly/target contains crail-site.conf, which has a list of parameters:
crail.blocksize 1048576
crail.buffersize 1048576
crail.regionsize 1073741824
crail.cachelimit 1073741824
crail.cachepath <path, should be huge page mountpoint>
crail.singleton true
crail.statistics true
crail.namenode.address crail://<hostname>:9060
..
It would be nice to qualify all the parameters listed there. For example, crail.blocksize could become crail.fs.blocksize, and crail.buffersize could become crail.client.buffersize. This modification would explicitly show which parameter is relevant for which Crail component.
Warn if an option in the configuration file (crail-site.conf) is not used.
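A hedged sketch of such a check (the known-keys set and helper name are illustrative; Crail's real configuration handling may differ):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class UnusedOptionCheck {
    // Collect keys from crail-site.conf that no component recognizes and
    // log a warning for each, so typos do not fail silently.
    public static List<String> unknownKeys(Properties conf, Set<String> knownKeys) {
        List<String> unknown = new ArrayList<>();
        for (String key : conf.stringPropertyNames()) {
            if (!knownKeys.contains(key)) {
                System.err.println("WARN: unknown configuration option '" + key + "'");
                unknown.add(key);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("crail.blocksize", "1048576");
        conf.setProperty("crail.blcksize", "1048576"); // typo, should be flagged
        System.out.println(unknownKeys(conf, Set.of("crail.blocksize", "crail.buffersize")));
    }
}
```

The known-keys set could be built from the constants in CrailConstants.java so it never drifts from the parameters the code actually reads.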
Hello,
A problem has been confusing me a lot; I believe it is a bug in Crail.
When running spark-io with HiBench PageRank with 3 iterations, stage 0 is fine, but as stage 1 starts the problem pops up: java.io.StreamCorruptedException: invalid type code: AC.
The debug info is as follows:
WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 40, 172.18.0.17): java.io.StreamCorruptedException: invalid type code: AC
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1381)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:157)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:189)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:186)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.shuffle.crail.CrailInputCloser.hasNext(CrailInputCloser.scala:33)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.shuffle.crail.CrailShuffleWriter.write(CrailShuffleWriter.scala:65)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
How can this be solved? Any help would be appreciated.
Please add the single-byte get and put methods, which are available on ByteBuffer, to CrailBuffer:
https://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html#get()
These are needed for the parquet header reader.
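A hedged sketch of what the delegation could look like (assuming CrailBuffer wraps a ByteBuffer internally; the wrapper class here is illustrative, not the actual CrailBuffer):

```java
import java.nio.ByteBuffer;

public class ByteBufferWrapper {
    private final ByteBuffer buf;

    public ByteBufferWrapper(ByteBuffer buf) { this.buf = buf; }

    // Relative single-byte accessors, mirroring ByteBuffer#get() and
    // ByteBuffer#put(byte), as needed by the parquet header reader.
    public byte get() { return buf.get(); }

    public ByteBufferWrapper put(byte b) {
        buf.put(b);
        return this;
    }

    public static void main(String[] args) {
        // Read bytes one at a time, as a header parser would.
        ByteBufferWrapper r = new ByteBufferWrapper(
                ByteBuffer.wrap(new byte[]{0x50, 0x41, 0x52, 0x31}));
        StringBuilder magic = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            magic.append((char) r.get());
        }
        System.out.println(magic); // prints PAR1
    }
}
```

Returning the wrapper from put keeps the fluent chaining style that ByteBuffer users expect.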