Reference articles:
https://blog.csdn.net/camel84/article/details/81990383
https://cloud.tencent.com/developer/article/1078857
Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.
- Interactive Scala, Python and R shells
- Batch submissions in Scala, Java, Python
- Multiple users can share the same server (impersonation support)
- Can be used for submitting jobs from anywhere with REST
- Does not require any code change to your programs
The above is the official introduction to Livy; for concrete usage, refer to the reference articles above.
The general approach is to use Java to construct and send the HTTP request messages to Livy (see the referenced document).
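Livy accepts such requests on its batch endpoint (`POST /batches`). Below is a minimal sketch of a submission using Java's built-in HTTP client; it assumes Livy is running at its default port 8998 on the cluster host used in this article, and the JSON fields mirror the SparkJob properties used further down.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LivyBatchSubmit {
    public static void main(String[] args) throws Exception {
        // Batch definition: jar on HDFS, main class, application name, executor cores.
        String body = "{"
                + "\"file\":\"hdfs://192.168.1.170:8020/testJars/spark-examples_2.11-2.3.0.cloudera4.jar\","
                + "\"className\":\"org.apache.spark.examples.SparkPi\","
                + "\"name\":\"SparkPi\","
                + "\"executorCores\":3"
                + "}";

        // Livy listens on port 8998 by default; the host here is an assumption.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://192.168.1.170:8998/batches"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The response is JSON containing the new batch id and its current state.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```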
Upload the jar used for testing to HDFS:
export HADOOP_USER_NAME=hdfs
${HADOOP_HOME}/bin/hdfs dfs -mkdir /testJars
${HADOOP_HOME}/bin/hdfs dfs -put /opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/examples/jars/spark-examples_2.11-2.3.0.cloudera4.jar /testJars/
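Optionally, you can verify from Java that the jar actually landed on HDFS before submitting anything. A minimal sketch using the Hadoop FileSystem API, with the namenode URI and path taken from the commands above:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckJarOnHdfs {
    public static void main(String[] args) throws Exception {
        // Namenode address and jar path as used in the upload commands above.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://192.168.1.170:8020"), new Configuration(), "hdfs");
        Path jar = new Path("/testJars/spark-examples_2.11-2.3.0.cloudera4.jar");
        System.out.println("Jar present on HDFS: " + fs.exists(jar));
        fs.close();
    }
}
```

With the jar in place, the job can be described and submitted from Java: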
// Build the batch job description; livyService wraps the Livy REST calls.
SparkJob job = new SparkJob();
job.setFile("hdfs://192.168.1.170:8020/testJars/spark-examples_2.11-2.3.0.cloudera4.jar");
job.setClassName("org.apache.spark.examples.SparkPi");
job.setName("SparkPi");
job.setExecutorCores(3);
int sparkJobID = livyService.startSparkJob(job);
if (sparkJobID > 0) {
    System.out.println("\nJob created, ID:\n" + sparkJobID);
    // List all currently active jobs.
    Map<String, Object> activeSparkJobs = livyService.getActiveSparkJobs();
    System.out.println("\nAll active jobs:\n" + activeSparkJobs.toString());
    // Query details, state and log of the job that was just submitted.
    Map<String, Object> info = livyService.getSparkJobInfo(sparkJobID);
    System.out.println("\nDetails of job " + sparkJobID + ":\n" + info.toString());
    SparkJobState state = livyService.getSparkJobState(sparkJobID);
    System.out.println("\nState of job " + sparkJobID + ":\n" + state);
    Map<String, Object> log = livyService.getSparkJoblog(sparkJobID);
    System.out.println("\nLog of job " + sparkJobID + ":\n" + log.toString());
    // Map<String, Object> del = livyService.deleteSparkJob(sparkJobID);
    // System.out.println("Deleted job " + sparkJobID + "\n" + del.toString());
}
// Run the job and block until it finishes:
// System.out.println(runSparkJob(job));
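For reference, the livyService methods used above can be thought of as thin wrappers over Livy's documented batch endpoints. The sketch below is a hypothetical outline of that mapping, not the actual implementation from the referenced articles; the host, port and class name are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical outline of a LivyService-style wrapper; each method maps to one
// of Livy's batch REST endpoints.
public class LivyRestSketch {
    private static final String LIVY_URL = "http://192.168.1.170:8998";
    private final HttpClient http = HttpClient.newHttpClient();

    private String get(String path) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(LIVY_URL + path)).GET().build();
        return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public String getActiveSparkJobs() throws Exception {      // GET /batches
        return get("/batches");
    }
    public String getSparkJobInfo(int id) throws Exception {   // GET /batches/{id}
        return get("/batches/" + id);
    }
    public String getSparkJobState(int id) throws Exception {  // GET /batches/{id}/state
        return get("/batches/" + id + "/state");
    }
    public String getSparkJoblog(int id) throws Exception {    // GET /batches/{id}/log
        return get("/batches/" + id + "/log");
    }
    public String deleteSparkJob(int id) throws Exception {    // DELETE /batches/{id} kills the job
        HttpRequest req = HttpRequest.newBuilder(URI.create(LIVY_URL + "/batches/" + id)).DELETE().build();
        return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```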