apache / incubator-livy

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Home Page: https://livy.apache.org/

License: Apache License 2.0

Java 26.45% Shell 1.66% Scala 63.16% Python 6.42% R 0.08% JavaScript 0.98% HTML 0.57% CSS 0.08% FreeMarker 0.16% Dockerfile 0.44%
Topics: livy, bigdata, spark, apachelivy

incubator-livy's Issues

livy 0.7.1 request failed

spark version: 3.2.1
livy version: release-0.7.1
Request failed:
{"msg":"requirement failed: Cannot find Livy REPL jars."}
Can you help me? Thanks.
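This error fires when the server cannot locate a REPL jar directory. In the binary distribution these jars live under directories like $LIVY_HOME/repl_2.11-jars, and the `livy.repl.jars` setting can point elsewhere. A minimal sketch of that kind of lookup (the directory naming pattern is an assumption based on the binary distribution layout):

```python
from pathlib import Path
import tempfile

def find_repl_jars(livy_home: Path):
    """Look for REPL jar directories under LIVY_HOME, i.e. any
    repl_<scala-binary-version>-jars directory (assumed layout)."""
    dirs = sorted(livy_home.glob("repl_*-jars"))
    if not dirs:
        raise RuntimeError("requirement failed: Cannot find Livy REPL jars.")
    return dirs

# Demonstrate against a throwaway layout.
with tempfile.TemporaryDirectory() as d:
    home = Path(d)
    (home / "repl_2.11-jars").mkdir()
    found = find_repl_jars(home)
```

If no such directory exists next to the server (for example when running from an unpacked source tree rather than the assembled distribution), the session request is rejected with exactly this message.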

Compatibility issues with Spark 3.2.0 and later

Livy 0.8.0 can now be compiled successfully with Scala 2.12.15, but with Spark 3.2 and 3.3, PySpark 3.3 can start Spark on YARN yet cannot run submitted jobs. This is the code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").enableHiveSupport().getOrCreate()
spark.sql("show databases").show()

This is the error:
23/05/18 14:11:34 INFO BlockManagerMasterEndpoint: Registering block manager localhost:41641 with 366.3 MiB RAM, BlockManagerId(2, localhost, 41641, None)
23/05/18 14:11:34 INFO SparkEntries: Spark context finished initialization in 16437ms
23/05/18 14:11:34 INFO SparkEntries: Created Spark session.
23/05/18 14:11:41 ERROR PythonInterpreter: Process has died with 1
23/05/18 14:11:41 ERROR PythonInterpreter: Traceback (most recent call last):
File "/tmp/6067082446938324509", line 722, in
sys.exit(main())
File "/tmp/6067082446938324509", line 570, in main
exec('from pyspark.sql import HiveContext', global_dict)
File "", line 1, in
File "/home/cocdkl/soft/spark-3.3.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/__init__.py", line 71
def since(version: Union[str, float]) -> Callable[[F], F]:
^
SyntaxError: invalid syntax

The code was submitted through Hue; I hope someone can help.
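The `SyntaxError` on the annotated `def since(version: Union[str, float]) -> ...` line is characteristic of PySpark 3.x being imported by a Python 2 interpreter, which does not support function annotations. A minimal sketch of that diagnosis, assuming the fix is to point the session at a Python 3 binary (`PYSPARK_PYTHON` and `spark.pyspark.python` are standard Spark settings; the exact place to set them in this deployment is an assumption):

```python
import sys

def check_pyspark_python():
    """PySpark 3.x uses type annotations at import time, so the
    interpreter that executes it must be Python 3; Python 2 fails
    with exactly the 'SyntaxError: invalid syntax' shown above."""
    if sys.version_info < (3,):
        raise RuntimeError(
            "PySpark 3.x requires Python 3; set PYSPARK_PYTHON "
            "(or spark.pyspark.python) to a python3 binary."
        )
    return sys.version_info[:2]

major_minor = check_pyspark_python()
```

If the YARN nodes default `python` to Python 2, the Livy-launched REPL dies at this import even though the Spark session itself initializes fine, which matches the log above.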

[livy 0.8.0, Scala 2.12, Spark 3.2.1] Kerberos authentication problem

livy.conf

livy.server.launch.kerberos.keytab=xxx.keytab
[email protected]

I am sure the keytab file and principal are correct, but creating an interactive session fails with:

23/11/10 11:43:22 INFO rpc.RpcServer: Connected to the port 10000
23/11/10 11:43:22 WARN common.ClientConf: Your hostname, into5, resolves to a loopback address, but we couldn't find any external IP address!
23/11/10 11:43:22 WARN common.ClientConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
23/11/10 11:43:22 INFO sessions.InteractiveSessionManager: Registering new session 4
23/11/10 11:43:22 INFO sessions.InteractiveSessionManager: Registered new session 4
23/11/10 11:43:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mec
hanism level: Failed to find any Kerberos tgt)]
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Class path contains multiple SLF4J bindings.
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Found binding in [jar:file:/app/spark-3.2.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Found binding in [jar:file:/app/spark-3.2.1-bin-hadoop2.7/jars/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
23/11/10 11:43:25 INFO utils.LineBufferedStream: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
23/11/10 11:43:27 INFO utils.LineBufferedStream: 23/11/10 11:43:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/11/10 11:43:29 INFO utils.LineBufferedStream: 23/11/10 11:43:29 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
23/11/10 11:43:30 INFO utils.LineBufferedStream: 23/11/10 11:43:30 WARN Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSS
Exception: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:43:30 INFO utils.LineBufferedStream: Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "into5/192.168.1.65"; destination host is: "into1":8020; 
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.ipc.Client.call(Client.java:1480)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.ipc.Client.call(Client.java:1413)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
23/11/10 11:43:30 INFO utils.LineBufferedStream:        at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)

If I add the following to spark-defaults.conf

spark.kerberos.keytab=xxx.keytab
[email protected]

then the problem above is resolved, but the Livy server log keeps printing

23/11/10 11:43:52 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:44:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:44:52 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
23/11/10 11:45:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Please give me some help, thank you!

Error[Failed to launch livy session, session status is dead] on connecting to spark through livy using R

I am using sparklyr version 1.7.8
and the latest Livy version from the incubator-livy master branch.

On connecting to spark through livy using R

library(sparklyr)
sc <- spark_connect(master = "local", method = "livy", version ="3.1.1")

Command is throwing an error:

Error in livy_connection(master, config, app_name, version, hadoop_version, :
Failed to launch livy session, session status is dead

Could anyone please help me understand what the issue could be?

The SparkStreaming operator fails to execute

Kafka as the SparkStreaming input and output:

  1. Use spark.readStream.format("kafka") to read Kafka data and decode the binary payload to a string.
  2. Use df.map(_.toSeq.foldLeft("")(_ + separator + _)).writeStream.format("kafka") to write the data back to Kafka.
  3. If one output to Kafka fails, then no matter how I change the Kafka topic afterwards, the stream computation keeps failing with an ArrayIndexOutOfBoundsException: 1. If I only output to the console there is no error.
  4. If I run the same code snippet directly in spark-shell without Livy, the behavior is the same as in 3.

Build fails when using -Pscala-2.12

The build fails in the assembly module due to scala-2.11 dependencies.
I think this is because ${scala.binary.version} is used in the <modules> section and is resolved before the profiles are loaded.
When I make scala-2.12 the default in the <properties>, it works fine.

Escape backtick from spark-submit arguments

Currently, Livy does not escape backticks in user-provided spark-submit arguments. If a user passes an argument containing backticks, the shell treats them as command substitution during spark-submit, causing that argument to become blank or invalid.

Example:

--query 'select * from test_db.`test_table`' 

will become

--query 'select * from test_db.' 
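One way to guard such arguments is to shell-quote each user-supplied token before it is interpolated into the spark-submit command line; single-quoting makes backticks literal, so no command substitution occurs. A minimal sketch using Python's shlex.quote (this illustrates the quoting idea, not Livy's actual launcher code):

```python
import shlex

# Hypothetical raw argument value as the user would supply it.
raw_arg = "select * from test_db.`test_table`"

# shlex.quote single-quotes the token, so the shell passes the
# backticks through literally instead of performing command
# substitution when the spark-submit command line is executed.
quoted = shlex.quote(raw_arg)

# Round-trip: the shell would deliver the original string intact.
assert shlex.split(quoted) == [raw_arg]
```

Building the command as an argument vector (no shell at all) avoids the problem entirely; quoting is only needed when the command is handed to a shell as a single string.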

Livy parameter-passing question

With two code snippets in one session, how can the result set of a SQL query executed in the first snippet be passed as a parameter to the next snippet? How are parameters passed? Also, is Spark 3.5.1 supported?
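Within a single Livy interactive session, all statements run in the same interpreter, so a variable bound by one snippet is visible to every later snippet; no explicit parameter passing is needed. A minimal sketch simulating that shared-namespace behavior (the snippet strings and variable names are illustrative, not Livy internals):

```python
# Each statement in a Livy session executes in the same REPL, so
# state persists across statements. Simulate two statements that
# share one namespace the way a session's interpreter does.
session_namespace = {}

snippet_1 = "result = [('db1',), ('db2',)]  # e.g. rows collected from spark.sql(...)"
snippet_2 = "names = [row[0] for row in result]  # 'result' carries over from snippet 1"

exec(snippet_1, session_namespace)
exec(snippet_2, session_namespace)
```

So over the REST API, POSTing the second statement to the same /sessions/{id}/statements endpoint is enough to see variables defined by the first.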

livy: What are the plans for the later stage?

What are the plans for the later stage: will more engines be supported, and will there be better multi-tenancy, isolation, and context support?
What are the advantages and differences compared to Apache Linkis?

Can Livy work without relying on Spark?

Livy could interface with Java, Python, Shell, SQL, etc., not necessarily Spark, so it could integrate many languages and backends through JDBC, Hive, MySQL, Python, and the like.

New livy release request

There have been several updates to master recently, including adding support for later Python versions by fixing a bug that would only allow one line per cell.
Can we possibly have a new release of livy? Amongst other things, this would allow AWS EMR to pull in the latest release with all the fixes.

{"msg":"Rejected, Reason: Blacklisted configuration values in session config: spark.submit.deployMode"}

spark version: 3.5.1
livy version: 0.8.0

Requesting through the API returns the following:

org.springframework.web.client.HttpClientErrorException$BadRequest: 400 Bad Request: "{"msg":"Rejected, Reason: Blacklisted configuration values in session config: spark.submit.deployMode"}"

But I don't see any error reported in the logs. Is it incompatible with Spark 3.5.1?

BTW, I also set up conf/spark-blacklist.conf and it doesn't seem to be taking effect.
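Livy rejects session requests whose conf map contains keys listed in spark-blacklist.conf, and spark.submit.deployMode is included in the shipped template; the deploy mode is meant to be fixed server-side (the livy.spark.deploy-mode setting in livy.conf) rather than chosen per request. A minimal sketch of a session-creation payload that avoids the blacklisted key (endpoint semantics follow Livy's REST API; the concrete conf values are placeholders):

```python
import json

# POST body for Livy's /sessions endpoint. The deploy mode must
# NOT appear here: "spark.submit.deployMode" is rejected by the
# server-side blacklist, so it belongs in livy.conf
# (livy.spark.deploy-mode), not in the per-session conf map.
payload = {
    "kind": "pyspark",
    "conf": {
        "spark.executor.memory": "2g",  # placeholder value
    },
}

body = json.dumps(payload)
```

The rejection is issued by the Livy server before anything reaches Spark, which is why nothing shows up in the Spark logs; it is a validation failure, not a Spark 3.5.1 incompatibility.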

Dockerfile build fails on `livy-server` step

Hi,

I am trying to host this solution locally for Apache Spark and its language flavors with the command docker build -t livy-ci dev/docker/livy-dev-base/. After some installation steps, the error log below appears in the terminal. I am on Linux Ubuntu 20.04.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Livy Project Parent POM ............................ SUCCESS [01:26 min]
[INFO] livy-api ........................................... SUCCESS [03:30 min]
[INFO] livy-client-common ................................. SUCCESS [  4.819 s]
[INFO] livy-test-lib ...................................... SUCCESS [  2.996 s]
[INFO] multi-scala-project-root ........................... SUCCESS [  1.042 s]
[INFO] livy-core-parent ................................... SUCCESS [  0.177 s]
[INFO] livy-core_2.11 ..................................... SUCCESS [  9.340 s]
[INFO] livy-rsc ........................................... SUCCESS [ 50.358 s]
[INFO] livy-repl-parent ................................... SUCCESS [ 25.652 s]
[INFO] livy-repl_2.11 ..................................... SUCCESS [04:05 min]
[INFO] livy-server ........................................ FAILURE [ 48.405 s]
[INFO] livy-assembly ...................................... SKIPPED
[INFO] livy-client-http ................................... SKIPPED
[INFO] livy-scala-api-parent .............................. SKIPPED
[INFO] livy-scala-api_2.11 ................................ SKIPPED
[INFO] livy-integration-test .............................. SKIPPED
[INFO] livy-coverage-report ............................... SKIPPED
[INFO] livy-examples ...................................... SKIPPED
[INFO] livy-python-api .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:26 min
[INFO] Finished at: 2023-06-22T21:17:57+00:00
[INFO] Final Memory: 108M/1364M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project livy-server: Could not resolve dependencies for project org.apache.livy:livy-server:jar:0.8.0-incubating-SNAPSHOT: Failed to collect dependencies at io.dropwizard.metrics:metrics-healthchecks:jar:3.1.0: Failed to read artifact descriptor for io.dropwizard.metrics:metrics-healthchecks:jar:3.1.0: Could not transfer artifact io.dropwizard.metrics:metrics-healthchecks:pom:3.1.0 from/to central (https://repo1.maven.org/maven2): Connection reset -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :livy-server
