
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Home Page: https://livy.apache.org/

License: Apache License 2.0



Apache Livy


Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.

  • Interactive Scala, Python and R shells
  • Batch submissions in Scala, Java, Python
  • Multiple users can share the same server (impersonation support)
  • Can be used for submitting jobs from anywhere with REST
  • Does not require any code change to your programs
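The REST workflow above can be sketched in a few lines of Python. This is an illustrative sketch, not official client code: it assumes a Livy server listening on localhost:8998 and uses the documented /sessions and /sessions/{id}/statements endpoints; the helper names are made up.

```python
import json
from urllib.request import Request, urlopen

LIVY_URL = "http://localhost:8998"  # assumed local Livy server


def statement_payload(code):
    """Build the JSON body for POST /sessions/{id}/statements."""
    return {"code": code}


def livy_post(path, payload):
    """POST a JSON payload to a Livy endpoint and return the parsed reply."""
    req = Request(LIVY_URL + path,
                  data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_snippet(code):
    """Create an interactive PySpark session and submit one statement.

    Requires a running Livy server; a real client would poll
    GET /sessions/{id} until the session state is "idle" before
    submitting, and then poll the statement for its result.
    """
    session = livy_post("/sessions", {"kind": "pyspark"})
    return livy_post("/sessions/%d/statements" % session["id"],
                     statement_payload(code))
```

The same flow works from any HTTP client (curl, a browser, another service), which is what "submitting jobs from anywhere with REST" refers to.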

Pull requests are welcome! But before you begin, please check out the Contributing section on the Community page of our website.

Online Documentation

Guides and documentation on getting started using Livy, example code snippets, and Livy API documentation can be found at livy.incubator.apache.org.

Before Building Livy

To build Livy, you will need:

Debian/Ubuntu:

  • mvn (from maven package or maven3 tarball)
  • openjdk-8-jdk (or Oracle JDK 8)
  • Python 2.7+
  • R 3.x

Redhat/CentOS:

  • mvn (from maven package or maven3 tarball)
  • java-1.8.0-openjdk (or Oracle JDK 8)
  • Python 2.7+
  • R 3.x

MacOS:

  • Xcode command line tools
  • Oracle's JDK 1.8
  • Maven (Homebrew)
  • Python 2.7+
  • R 3.x

Required Python packages for building Livy:

  • cloudpickle
  • requests
  • requests-kerberos
  • flake8
  • flaky
  • pytest

To run Livy, you will also need a Spark installation. You can get Spark releases at https://spark.apache.org/downloads.html.

Livy requires Spark 2.4+. You can switch to a different version of Spark by setting the SPARK_HOME environment variable in the Livy server process, without needing to rebuild Livy.
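For instance, the Spark version can be switched per deployment by exporting SPARK_HOME before starting the server (the path below is an example, not a requirement):

```shell
export SPARK_HOME=/opt/spark-3.3.0-bin-hadoop3   # example install path
./bin/livy-server start
```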

Building Livy

Livy is built using Apache Maven. To check out and build Livy, run:

git clone https://github.com/apache/incubator-livy.git
cd incubator-livy
mvn package

You can also use the provided Dockerfile:

git clone https://github.com/apache/incubator-livy.git
cd incubator-livy
docker build -t livy-ci dev/docker/livy-dev-base/
docker run --rm -it -v $(pwd):/workspace -v $HOME/.m2:/root/.m2 livy-ci mvn package

Note: The docker run command maps the Maven repository to your host machine's Maven cache, so subsequent runs will not need to download dependencies.

By default Livy is built against Apache Spark 2.4.5, but the version of Spark used when running Livy does not need to match the version used to build Livy. Livy internally handles the differences between different Spark versions.

The Livy package itself does not contain a Spark distribution. It will work with any supported version of Spark without needing to rebuild.

Build Profiles

Flag          Purpose
-Phadoop2     Choose Hadoop 2.x based build dependencies (default configuration)
-Pspark2      Choose Spark 2.x based build dependencies (default configuration)
-Pspark3      Choose Spark 3.x based build dependencies
-Pscala-2.11  Choose Scala 2.11 based build dependencies (default configuration)
-Pscala-2.12  Choose Scala 2.12 based build dependencies
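As an illustration, the profiles can be combined in a single invocation; this is a sketch, and flag availability depends on the branch you are building:

```shell
# Build Livy against Spark 3.x with Scala 2.12, skipping tests for speed
mvn clean package -Pspark3 -Pscala-2.12 -DskipTests
```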


incubator-livy's Issues

{"msg":"Rejected, Reason: Blacklisted configuration values in session config: spark.submit.deployMode"}

Spark version: 3.5.1
Livy version: 0.8.0

Requesting through the API returns the following:

org.springframework.web.client.HttpClientErrorException$BadRequest: 400 Bad Request: "{"msg":"Rejected, Reason: Blacklisted configuration values in session config: spark.submit.deployMode"}"

But I don't see any error reported in the logs. Is it incompatible with Spark 3.5.1?

By the way, I also set up conf/spark-blacklist.conf, and it doesn't seem to be taking effect.

Can Livy work without relying on Spark?

Livy can interface with Java, Python, shell, SQL, etc., not only Spark, so it could integrate many languages and engines through JDBC, Hive, MySQL, Python, etc.

[livy-8.0-2.12, spark3.2.1] Kerberos authentication problem

livy.conf

livy.server.launch.kerberos.keytab=xxx.keytab
[email protected]

I am sure the keytab file and the principal are correct, but the following error occurs when creating an interactive session:

23/11/10 11:43:22 INFO rpc.RpcServer: Connected to the port 10000
23/11/10 11:43:22 WARN common.ClientConf: Your hostname, into5, resolves to a loopback address, but we couldn't find any external IP address!
23/11/10 11:43:22 WARN common.ClientConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
23/11/10 11:43:22 INFO sessions.InteractiveSessionManager: Registering new session 4
23/11/10 11:43:22 INFO sessions.InteractiveSessionManager: Registered new session 4
23/11/10 11:43:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mec
hanism level: Failed to find any Kerberos tgt)]
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Class path contains multiple SLF4J bindings.
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Found binding in [jar:file:/app/spark-3.2.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Found binding in [jar:file:/app/spark-3.2.1-bin-hadoop2.7/jars/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
23/11/10 11:43:27 INFO utils.LineBufferedStream: 23/11/10 11:43:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/11/10 11:43:29 INFO utils.LineBufferedStream: 23/11/10 11:43:29 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
23/11/10 11:43:30 INFO utils.LineBufferedStream: 23/11/10 11:43:30 WARN Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSS
Exception: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:43:30 INFO utils.LineBufferedStream: Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "into5/192.168.1.65"; destination host is: "into1":8020; 
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.ipc.Client.call(Client.java:1480)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.ipc.Client.call(Client.java:1413)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)

If the following is added to spark-defaults.conf:

spark.kerberos.keytab=xxx.keytab
[email protected]

then the above problem is resolved; however, the Livy server log keeps printing:

23/11/10 11:43:52 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:44:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:44:52 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:45:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Please give me some help, thank you!

New livy release request

There have been several updates to master recently, including adding support for later Python versions by fixing a bug that would only allow one line per cell.
Can we possibly have a new release of livy? Amongst other things, this would allow AWS EMR to pull in the latest release with all the fixes.

Build fails when using -Pscala-2.12

Build fails on assembly module due to scala-2.11 dependencies.
I think this is due to the ${scala.binary.version} being used in the modules sections, and it's populated before the profiles are loaded.
When I ensure that scala-2.12 is the default in the <properties> then it works fine.

Livy parameter-passing question

I have two code snippets in one session. How do I pass the result dataset of a SQL execution in one snippet to the next snippet? How are parameters passed in general? Also, is Spark 3.5.1 supported?

Livy: what are the plans for later stages?

What are the plans for later stages? Will more engines be supported? Will there be better multi-tenancy, isolation, and context support?
What are the advantages of, and differences from, Apache Linkis?

Compatibility issues with Spark 3.2.0 and above

I can now successfully compile Livy 0.8.0 with Scala 2.12.15, but when using Spark 3.2 and 3.3, PySpark 3.3 can start Spark on YARN, yet jobs cannot be submitted. This is the code:
from pyspark.sql import SparkSession
spark=SparkSession.builder.appName("test").enableHiveSupport().getOrCreate()

spark.sql("show databases").show()
This is the error:
23/05/18 14:11:34 INFO BlockManagerMasterEndpoint: Registering block manager localhost:41641 with 366.3 MiB RAM, BlockManagerId(2, localhost, 41641, None)
23/05/18 14:11:34 INFO SparkEntries: Spark context finished initialization in 16437ms
23/05/18 14:11:34 INFO SparkEntries: Created Spark session.
23/05/18 14:11:41 ERROR PythonInterpreter: Process has died with 1
23/05/18 14:11:41 ERROR PythonInterpreter: Traceback (most recent call last):
File "/tmp/6067082446938324509", line 722, in
sys.exit(main())
File "/tmp/6067082446938324509", line 570, in main
exec('from pyspark.sql import HiveContext', global_dict)
File "", line 1, in
File "/home/cocdkl/soft/spark-3.3.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/init.py", line 71
def since(version: Union[str, float]) -> Callable[[F], F]:
^
SyntaxError: invalid syntax

The code was submitted through Hue; I hope to get an answer.
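The SyntaxError above is the typical symptom of a Python 2 interpreter executing PySpark 3.x, which uses Python-3-only type annotations; pointing the session's interpreter at Python 3 (for example via PYSPARK_PYTHON) is the usual remedy. A small sketch confirming that the failing line is valid Python 3 syntax:

```python
import ast

# The line from the traceback (pyspark/__init__.py in Spark 3.x).
SRC = "def since(version: Union[str, float]) -> Callable[[F], F]: pass"

# Under Python 3 this parses cleanly (the names need not be defined to
# parse); a Python 2 parser stops at the ':' of the annotation, raising
# exactly the SyntaxError reported above.
tree = ast.parse(SRC)
assert isinstance(tree, ast.Module)
```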

Livy 0.7.1 request failed

Spark version: 3.2.1
Livy version: release-0.7.1
Request failed:
{"msg":"requirement failed: Cannot find Livy REPL jars."}
Can you help me? Thanks.

Escape backtick from spark-submit arguments

Currently, Livy does not escape backticks in user-provided spark-submit arguments. If a user passes an argument containing backticks, the shell treats it as command substitution during spark-submit, causing the argument to become blank or invalid.

Example:

--query 'select * from test_db.`test_table`' 

will become

--query 'select * from test_db.' 
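The mangling can be reproduced, and one possible mitigation sketched, with Python's shlex.quote (this assumes the argument is eventually interpolated into a shell command line; where exactly that happens in Livy's submit path is not shown here):

```python
import shlex
import subprocess

arg = "select * from test_db.`test_table`"

# Double quotes do not protect backticks: the shell performs command
# substitution on `test_table`, leaving the argument truncated.
mangled = subprocess.run(
    ["sh", "-c", 'echo "%s"' % arg],
    capture_output=True, text=True,
).stdout.strip()
# mangled is now "select * from test_db."

# shlex.quote single-quotes the value, so the backticks survive intact.
safe = shlex.quote(arg)
kept = subprocess.run(
    ["sh", "-c", "echo %s" % safe],
    capture_output=True, text=True,
).stdout.strip()
assert kept == arg
```

Passing arguments as an argv list instead of a shell string avoids the problem entirely; quoting is only needed when a single command line must be built.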

The SparkStreaming operator fails to execute

kafka as SparkStreaming input and output

  1. Use spark.readStream.format("kafka") to read Kafka data and decode the binary payloads to strings.
  2. Use df.map(_.toSeq.foldLeft("")(_ + separator + _)).writeStream.format("kafka") to output the data to Kafka.
  3. If output to Kafka fails once, then no matter how I change the Kafka topic afterwards, the streaming computation fails with an ArrayIndexOutOfBoundsException: 1. If I only output to the console, there is no error.
  4. If I run the same code snippet directly in spark-shell without Livy, the behavior is the same as in 3.

Dockerfile build fails on `livy-server` step

Hi,

I am trying to host this solution locally for Apache Spark in its language flavors with the command docker build -t livy-ci dev/docker/livy-dev-base/. After some installation steps, the error log below appears in the terminal. I am on Linux Ubuntu 20.04.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Livy Project Parent POM ............................ SUCCESS [01:26 min]
[INFO] livy-api ........................................... SUCCESS [03:30 min]
[INFO] livy-client-common ................................. SUCCESS [  4.819 s]
[INFO] livy-test-lib ...................................... SUCCESS [  2.996 s]
[INFO] multi-scala-project-root ........................... SUCCESS [  1.042 s]
[INFO] livy-core-parent ................................... SUCCESS [  0.177 s]
[INFO] livy-core_2.11 ..................................... SUCCESS [  9.340 s]
[INFO] livy-rsc ........................................... SUCCESS [ 50.358 s]
[INFO] livy-repl-parent ................................... SUCCESS [ 25.652 s]
[INFO] livy-repl_2.11 ..................................... SUCCESS [04:05 min]
[INFO] livy-server ........................................ FAILURE [ 48.405 s]
[INFO] livy-assembly ...................................... SKIPPED
[INFO] livy-client-http ................................... SKIPPED
[INFO] livy-scala-api-parent .............................. SKIPPED
[INFO] livy-scala-api_2.11 ................................ SKIPPED
[INFO] livy-integration-test .............................. SKIPPED
[INFO] livy-coverage-report ............................... SKIPPED
[INFO] livy-examples ...................................... SKIPPED
[INFO] livy-python-api .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:26 min
[INFO] Finished at: 2023-06-22T21:17:57+00:00
[INFO] Final Memory: 108M/1364M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project livy-server: Could not resolve dependencies for project org.apache.livy:livy-server:jar:0.8.0-incubating-SNAPSHOT: Failed to collect dependencies at io.dropwizard.metrics:metrics-healthchecks:jar:3.1.0: Failed to read artifact descriptor for io.dropwizard.metrics:metrics-healthchecks:jar:3.1.0: Could not transfer artifact io.dropwizard.metrics:metrics-healthchecks:pom:3.1.0 from/to central (https://repo1.maven.org/maven2): Connection reset -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :livy-server

Error[Failed to launch livy session, session status is dead] on connecting to spark through livy using R

I am using sparklyr version 1.7.8 and the latest Livy version from the incubator-livy master branch.

On connecting to spark through livy using R

library(sparklyr)
sc <- spark_connect(master = "local", method = "livy", version ="3.1.1")

Command is throwing an error:

Error in livy_connection(master, config, app_name, version, hadoop_version, :
Failed to launch livy session, session status is dead

Could anyone please help me understand what the issue could be?
