
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Home Page: https://livy.apache.org/

License: Apache License 2.0



Apache Livy


Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.

  • Interactive Scala, Python and R shells
  • Batch submissions in Scala, Java, Python
  • Multiple users can share the same server (impersonation support)
  • Can be used for submitting jobs from anywhere with REST
  • Does not require any code change to your programs
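The REST workflow above can be sketched in a few lines of Python. This is an illustrative sketch, not official client code: it assumes a Livy server listening on localhost:8998 and uses the documented /sessions and /sessions/{id}/statements endpoints; the helper names are made up.

```python
import json
from urllib.request import Request, urlopen

LIVY_URL = "http://localhost:8998"  # assumed local Livy server


def statement_payload(code):
    """Build the JSON body for POST /sessions/{id}/statements."""
    return {"code": code}


def livy_post(path, payload):
    """POST a JSON payload to a Livy endpoint and return the parsed reply."""
    req = Request(LIVY_URL + path,
                  data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_snippet(code):
    """Create an interactive PySpark session and submit one statement.

    Requires a running Livy server; a real client would poll
    GET /sessions/{id} until the session state is "idle" before
    submitting, and then poll the statement for its result.
    """
    session = livy_post("/sessions", {"kind": "pyspark"})
    return livy_post("/sessions/%d/statements" % session["id"],
                     statement_payload(code))
```

The same flow works from any HTTP client (curl, a browser, another service), which is what "submitting jobs from anywhere with REST" refers to.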

Pull requests are welcome! But before you begin, please check out the Contributing section on the Community page of our website.

Online Documentation

Guides and documentation on getting started using Livy, example code snippets, and Livy API documentation can be found at livy.incubator.apache.org.

Before Building Livy

To build Livy, you will need:

Debian/Ubuntu:

  • mvn (from maven package or maven3 tarball)
  • openjdk-8-jdk (or Oracle JDK 8)
  • Python 2.7+
  • R 3.x

Redhat/CentOS:

  • mvn (from maven package or maven3 tarball)
  • java-1.8.0-openjdk (or Oracle JDK 8)
  • Python 2.7+
  • R 3.x

MacOS:

  • Xcode command line tools
  • Oracle's JDK 1.8
  • Maven (Homebrew)
  • Python 2.7+
  • R 3.x

Required Python packages for building Livy:

  • cloudpickle
  • requests
  • requests-kerberos
  • flake8
  • flaky
  • pytest

To run Livy, you will also need a Spark installation. You can get Spark releases at https://spark.apache.org/downloads.html.

Livy requires Spark 2.4+. You can switch to a different version of Spark by setting the SPARK_HOME environment variable in the Livy server process, without needing to rebuild Livy.
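For instance, the Spark version can be switched per deployment by exporting SPARK_HOME before starting the server (the path below is an example, not a requirement):

```shell
export SPARK_HOME=/opt/spark-3.3.0-bin-hadoop3   # example install path
./bin/livy-server start
```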

Building Livy

Livy is built using Apache Maven. To check out and build Livy, run:

git clone https://github.com/apache/incubator-livy.git
cd incubator-livy
mvn package

You can also use the provided Dockerfile:

git clone https://github.com/apache/incubator-livy.git
cd incubator-livy
docker build -t livy-ci dev/docker/livy-dev-base/
docker run --rm -it -v $(pwd):/workspace -v $HOME/.m2:/root/.m2 livy-ci mvn package

Note: The docker run command maps the Maven repository to your host machine's Maven cache, so subsequent runs will not need to download dependencies.

By default Livy is built against Apache Spark 2.4.5, but the version of Spark used when running Livy does not need to match the version used to build Livy. Livy internally handles the differences between different Spark versions.

The Livy package itself does not contain a Spark distribution. It will work with any supported version of Spark without needing to rebuild.

Build Profiles

Flag          Purpose
-Phadoop2     Choose Hadoop 2.x based build dependencies (default configuration)
-Pspark2      Choose Spark 2.x based build dependencies (default configuration)
-Pspark3      Choose Spark 3.x based build dependencies
-Pscala-2.11  Choose Scala 2.11 based build dependencies (default configuration)
-Pscala-2.12  Choose Scala 2.12 based build dependencies
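As an illustration, the profiles can be combined in a single invocation; this is a sketch, and flag availability depends on the branch you are building:

```shell
# Build Livy against Spark 3.x with Scala 2.12, skipping tests for speed
mvn clean package -Pspark3 -Pscala-2.12 -DskipTests
```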


incubator-livy's Issues

{"msg":"Rejected, Reason: Blacklisted configuration values in session config: spark.submit.deployMode"}

Spark version: 3.5.1
Livy version: 0.8.0

Requesting through the API returns the following:

org.springframework.web.client.HttpClientErrorException$BadRequest: 400 Bad Request: "{"msg":"Rejected, Reason: Blacklisted configuration values in session config: spark.submit.deployMode"}"

But I don't see any error reported in the logs. Is it incompatible with Spark 3.5.1?

By the way, I also set up conf/spark-blacklist.conf, and it doesn't seem to be taking effect.

Can Livy work without relying on Spark?

Livy can interface with Java, Python, shell, SQL, etc., not only Spark, so it could integrate many languages and engines through JDBC, Hive, MySQL, Python, etc.

[livy-8.0-2.12, spark3.2.1] Kerberos authentication problem

livy.conf

livy.server.launch.kerberos.keytab=xxx.keytab
[email protected]

I am sure the keytab file and the principal are correct, but the following error occurs when creating an interactive session:

23/11/10 11:43:22 INFO rpc.RpcServer: Connected to the port 10000
23/11/10 11:43:22 WARN common.ClientConf: Your hostname, into5, resolves to a loopback address, but we couldn't find any external IP address!
23/11/10 11:43:22 WARN common.ClientConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
23/11/10 11:43:22 INFO sessions.InteractiveSessionManager: Registering new session 4
23/11/10 11:43:22 INFO sessions.InteractiveSessionManager: Registered new session 4
23/11/10 11:43:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mec
hanism level: Failed to find any Kerberos tgt)]
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Class path contains multiple SLF4J bindings.
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Found binding in [jar:file:/app/spark-3.2.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Found binding in [jar:file:/app/spark-3.2.1-bin-hadoop2.7/jars/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
23/11/10 11:43:27 INFO utils.LineBufferedStream: 23/11/10 11:43:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/11/10 11:43:29 INFO utils.LineBufferedStream: 23/11/10 11:43:29 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
23/11/10 11:43:30 INFO utils.LineBufferedStream: 23/11/10 11:43:30 WARN Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSS
Exception: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:43:30 INFO utils.LineBufferedStream: Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "into5/192.168.1.65"; destination host is: "into1":8020; 
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.ipc.Client.call(Client.java:1480)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.ipc.Client.call(Client.java:1413)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)

If the following is added to spark-defaults.conf:

spark.kerberos.keytab=xxx.keytab
[email protected]

then the above problem is resolved; however, the Livy server log keeps printing:

23/11/10 11:43:52 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:44:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:44:52 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:45:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Please give me some help, thank you!

New livy release request

There have been several updates to master recently, including adding support for later Python versions by fixing a bug that would only allow one line per cell.
Can we possibly have a new release of livy? Amongst other things, this would allow AWS EMR to pull in the latest release with all the fixes.

Build fails when using -Pscala-2.12

Build fails on assembly module due to scala-2.11 dependencies.
I think this is due to the ${scala.binary.version} being used in the modules sections, and it's populated before the profiles are loaded.
When I ensure that scala-2.12 is the default in the <properties> then it works fine.

Livy parameter-passing question

I have two code snippets in one session. How do I pass the result dataset of a SQL execution in one snippet to the next snippet? How are parameters passed in general? Also, is Spark 3.5.1 supported?

Livy: what are the plans for later stages?

What are the plans for later stages? Will more engines be supported? Will there be better multi-tenancy, isolation, and context support?
What are the advantages of, and differences from, Apache Linkis?

Compatibility issues with Spark 3.2.0 and above

I can now successfully compile Livy 0.8.0 with Scala 2.12.15, but when using Spark 3.2 and 3.3, PySpark 3.3 can start Spark on YARN, yet jobs cannot be submitted. This is the code:
from pyspark.sql import SparkSession
spark=SparkSession.builder.appName("test").enableHiveSupport().getOrCreate()

spark.sql("show databases").show()
This is the error:
23/05/18 14:11:34 INFO BlockManagerMasterEndpoint: Registering block manager localhost:41641 with 366.3 MiB RAM, BlockManagerId(2, localhost, 41641, None)
23/05/18 14:11:34 INFO SparkEntries: Spark context finished initialization in 16437ms
23/05/18 14:11:34 INFO SparkEntries: Created Spark session.
23/05/18 14:11:41 ERROR PythonInterpreter: Process has died with 1
23/05/18 14:11:41 ERROR PythonInterpreter: Traceback (most recent call last):
File "/tmp/6067082446938324509", line 722, in
sys.exit(main())
File "/tmp/6067082446938324509", line 570, in main
exec('from pyspark.sql import HiveContext', global_dict)
File "", line 1, in
File "/home/cocdkl/soft/spark-3.3.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/init.py", line 71
def since(version: Union[str, float]) -> Callable[[F], F]:
^
SyntaxError: invalid syntax

The code was submitted through Hue; I hope to get an answer.
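The SyntaxError above is the typical symptom of a Python 2 interpreter executing PySpark 3.x, which uses Python-3-only type annotations; pointing the session's interpreter at Python 3 (for example via PYSPARK_PYTHON) is the usual remedy. A small sketch confirming that the failing line is valid Python 3 syntax:

```python
import ast

# The line from the traceback (pyspark/__init__.py in Spark 3.x).
SRC = "def since(version: Union[str, float]) -> Callable[[F], F]: pass"

# Under Python 3 this parses cleanly (the names need not be defined to
# parse); a Python 2 parser stops at the ':' of the annotation, raising
# exactly the SyntaxError reported above.
tree = ast.parse(SRC)
assert isinstance(tree, ast.Module)
```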

Livy 0.7.1 request failed

Spark version: 3.2.1
Livy version: release-0.7.1
Request failed:
{"msg":"requirement failed: Cannot find Livy REPL jars."}
Can you help me? Thanks.

Escape backtick from spark-submit arguments

Currently, Livy does not escape backticks in user-provided spark-submit arguments. If a user passes an argument containing backticks, the shell treats it as command substitution during spark-submit, causing the argument to become blank or invalid.

Example:

--query 'select * from test_db.`test_table`' 

will become

--query 'select * from test_db.' 
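The mangling can be reproduced, and one possible mitigation sketched, with Python's shlex.quote (this assumes the argument is eventually interpolated into a shell command line; where exactly that happens in Livy's submit path is not shown here):

```python
import shlex
import subprocess

arg = "select * from test_db.`test_table`"

# Double quotes do not protect backticks: the shell performs command
# substitution on `test_table`, leaving the argument truncated.
mangled = subprocess.run(
    ["sh", "-c", 'echo "%s"' % arg],
    capture_output=True, text=True,
).stdout.strip()
# mangled is now "select * from test_db."

# shlex.quote single-quotes the value, so the backticks survive intact.
safe = shlex.quote(arg)
kept = subprocess.run(
    ["sh", "-c", "echo %s" % safe],
    capture_output=True, text=True,
).stdout.strip()
assert kept == arg
```

Passing arguments as an argv list instead of a shell string avoids the problem entirely; quoting is only needed when a single command line must be built.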

The SparkStreaming operator fails to execute

kafka as SparkStreaming input and output

  1. Use spark.readStream.format("kafka") to read Kafka data and decode the binary payloads to strings.
  2. Use df.map(_.toSeq.foldLeft("")(_ + separator + _)).writeStream.format("kafka") to output the data to Kafka.
  3. If output to Kafka fails once, then no matter how I change the Kafka topic afterwards, the streaming computation fails with an ArrayIndexOutOfBoundsException: 1. If I only output to the console, there is no error.
  4. If I run the same code snippet directly in spark-shell without Livy, the behavior is the same as in 3.

Dockerfile build fails on `livy-server` step

Hi,

I am trying to host this solution locally for Apache Spark in its language flavors with the command docker build -t livy-ci dev/docker/livy-dev-base/. After some installation steps, the error log below appears in the terminal. I am on Linux Ubuntu 20.04.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Livy Project Parent POM ............................ SUCCESS [01:26 min]
[INFO] livy-api ........................................... SUCCESS [03:30 min]
[INFO] livy-client-common ................................. SUCCESS [  4.819 s]
[INFO] livy-test-lib ...................................... SUCCESS [  2.996 s]
[INFO] multi-scala-project-root ........................... SUCCESS [  1.042 s]
[INFO] livy-core-parent ................................... SUCCESS [  0.177 s]
[INFO] livy-core_2.11 ..................................... SUCCESS [  9.340 s]
[INFO] livy-rsc ........................................... SUCCESS [ 50.358 s]
[INFO] livy-repl-parent ................................... SUCCESS [ 25.652 s]
[INFO] livy-repl_2.11 ..................................... SUCCESS [04:05 min]
[INFO] livy-server ........................................ FAILURE [ 48.405 s]
[INFO] livy-assembly ...................................... SKIPPED
[INFO] livy-client-http ................................... SKIPPED
[INFO] livy-scala-api-parent .............................. SKIPPED
[INFO] livy-scala-api_2.11 ................................ SKIPPED
[INFO] livy-integration-test .............................. SKIPPED
[INFO] livy-coverage-report ............................... SKIPPED
[INFO] livy-examples ...................................... SKIPPED
[INFO] livy-python-api .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:26 min
[INFO] Finished at: 2023-06-22T21:17:57+00:00
[INFO] Final Memory: 108M/1364M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project livy-server: Could not resolve dependencies for project org.apache.livy:livy-server:jar:0.8.0-incubating-SNAPSHOT: Failed to collect dependencies at io.dropwizard.metrics:metrics-healthchecks:jar:3.1.0: Failed to read artifact descriptor for io.dropwizard.metrics:metrics-healthchecks:jar:3.1.0: Could not transfer artifact io.dropwizard.metrics:metrics-healthchecks:pom:3.1.0 from/to central (https://repo1.maven.org/maven2): Connection reset -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :livy-server

Error[Failed to launch livy session, session status is dead] on connecting to spark through livy using R

I am using sparklyr version 1.7.8 and the latest Livy version from the incubator-livy master branch.

On connecting to spark through livy using R

library(sparklyr)
sc <- spark_connect(master = "local", method = "livy", version ="3.1.1")

Command is throwing an error:

Error in livy_connection(master, config, app_name, version, hadoop_version, :
Failed to launch livy session, session status is dead

Could anyone please help me understand what the issue could be?
