[Improvement][Spark] Support Local Spark Cluster (dolphinscheduler, OPEN)

git-blame commented on June 18, 2024

[Improvement][Spark] Support Local Spark Cluster

from dolphinscheduler.

Comments (3)

pegasas commented on June 18, 2024

I would like to have a try on this issue.

git-blame commented on June 18, 2024

My current workaround is to pass --master ... --deploy-mode cluster in the extra options. Since spark-submit uses the last value given for a repeated option, this sends the task to the local cluster. For example, see the log below, where my own --master option overrides DolphinScheduler's --master local:

[INFO] 2024-02-02 14:27:38.934 +0000 - Final Shell file is : 
#!/bin/bash
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
export SPARK_HOME=/opt/spark-3.5.0-bin-hadoop3
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
${SPARK_HOME}/bin/spark-submit --master local 
--class com.example.monitor.ScanMonitor --conf spark.driver.cores=1 --conf spark.driver.memory=512M 
--conf spark.executor.instances=2 --conf spark.executor.cores=2 
--conf spark.executor.memory=2G 
--master spark://devel:7077 --deploy-mode cluster 
file:/opt/apache-dolphinscheduler-3.2.0-bin/standalone-server/files/default/resources/monitor-0.1-jdk11.jar producer
...
24/02/02 14:27:54 INFO ClientEndpoint: Driver successfully submitted as driver-20240202142754-0003
2024-02-02 14:28:00.038 +0000 -  -> 
24/02/02 14:27:59 INFO ClientEndpoint: State of driver-20240202142754-0003 is RUNNING
24/02/02 14:27:59 INFO ClientEndpoint: Driver running on 172.16.254.204:35595 (worker-20240202141308-172.16.254.204-35595)
24/02/02 14:27:59 INFO ClientEndpoint: spark-submit not configured to wait for completion, exiting spark-submit JVM.
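
As a rough illustration of the last-value-wins behaviour described above, here is a toy option parser. It was written for this note and is not spark-submit's actual code; the function name resolve_master is hypothetical, and the "client" default is only there to mirror Spark's default deploy mode.

```shell
#!/bin/bash
# Toy parser for illustration only; NOT spark-submit's real implementation.
# Each repeated option simply overwrites the previously stored value,
# so the last occurrence on the command line is the one that takes effect.
resolve_master() {
  local master=""
  local deploy_mode="client"   # assumption: mirrors Spark's default deploy mode
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --master)      master="${2:-}";      shift ;;  # consume the option name
      --deploy-mode) deploy_mode="${2:-}"; shift ;;
    esac
    shift  # consume the value (or an unrelated argument)
  done
  echo "master=${master} deploy-mode=${deploy_mode}"
}

# The scheduler's "--master local" comes first; the extra options come later.
resolve_master --master local --class com.example.monitor.ScanMonitor \
  --master spark://devel:7077 --deploy-mode cluster
# prints: master=spark://devel:7077 deploy-mode=cluster
```

This matches the log above: the later --master spark://devel:7077 replaces the earlier --master local, so the driver is submitted to the standalone cluster.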

pegasas commented on June 18, 2024

Thanks @git-blame for the quick workaround. It does work through the extra options, but as mentioned, master is an important Spark parameter.

I will check with the community whether this was a deliberate design decision in previous discussions.

If not, I will add the parameter to the Spark task.
