- Flink:
/flink-0.9-SNAPSHOT
includes a modified flink system which has implemented the algorithms mentioned in the papers - Hadoop: we suggest to deploy
hadoop-2.4.1
HDFS to save the checkpoints and set a small block size so that the data would be partitioned locally - Spark: we suggest to employ
spark-1.3.1-bin-hadoop2.6
to generate input dataset for K-means - Procrustes:
procrustes-flink-1.0-SNAPSHOT.jar
includes the pagerank, connected compoents and k-means implementations on Flink
Shell scripts to drive the experiments (please ignore peel-xml)
taskmanager.tmp.dirs: the process id for jobmanager or taskmanager might be saved here
taskmanager.checkpoint.dir: hdfs to save checkpoint
To enable confined recovery, please set
confined.recovery: true
replication.recovery: false
To enable replica recovery, please set
confined.recovery: false
replication.recovery: true
For PageRank and Connected components, we use WebGraph as input dataset
java -cp "*" it.unimi.dsi.webgraph.ArcListASCIIGraph -g BVGraph $input path$ $output path$
For K-Means, we use spark and procrustes to generate the dataset
./spark-submit --class eu.stratosphere.procrustes.datagen.spark.SparkClusterGenerator ../procrustes-datagen-1.0-SNAPSHOT.jar spark://localhost:port $#parallelism$ $#items$ file://clusters-D3-K3.csv hdfs://.../input/clusters
- Head checkpoint
/…/flink-0.9-SNAPSHOT/bin/flink run -v -c eu.stratosphere.procrustes.experiments.recovery.PageRank ${app.path.jobs}/procrustes-flink-1.0-SNAPSHOT.jar ${system.hadoop-2.path.input}/input ${system.hadoop-2.path.output}/output ${# of pages} ${# of iteration} ${# of checkpoint interval}
Tips: checkpoint would be saved if ${# of checkpoint interval}>0
and there is no checkpoint if ${# of checkpoint interval}=0
- Tail checkpoint
/…/flink-0.9-SNAPSHOT/bin/flink run -v -c eu.stratosphere.procrustes.experiments.recovery.PageRankLateCpt ${app.path.jobs}/procrustes-flink-1.0-SNAPSHOT.jar ${system.hadoop-2.path.input}/input ${system.hadoop-2.path.output}/output ${# of pages} ${# of iteration} ${# of checkpoint interval}
- Head checkpoint
/…/flink-0.9-SNAPSHOT/bin/flink run -v -c eu.stratosphere.procrustes.experiments.recovery.ConnectedComponentsBulk ${app.path.jobs}/procrustes-flink-1.0-SNAPSHOT.jar ${system.hadoop-2.path.input}/webbase-raw ${system.hadoop-2.path.output}/concomp ${# of iteration} ${# of checkpoint interval}
- Tail checkpoint
/…/flink-0.9-SNAPSHOT/bin/flink run -v -c eu.stratosphere.procrustes.experiments.recovery.ConnectedComponentsBulkLateCpt ${app.path.jobs}/procrustes-flink-1.0-SNAPSHOT.jar ${system.hadoop-2.path.input}/webbase-raw ${system.hadoop-2.path.output}/concomp ${# of iteration} ${# of checkpoint interval}
/…/flink-0.9-SNAPSHOT/bin/flink run -v -c eu.stratosphere.procrustes.experiments.recovery.KMeansPureTuple /…/procrustes-flink-1.0-SNAPSHOT.jar hdfs://… /input/points hdfs://…/input/centroid hdfs://…/output ${# of iteration} ${# of checkpoint interval}