nvidia / spark-xgboost-examples
XGBoost GPU accelerated on Spark example applications
License: Apache License 2.0
Hi,
I trained an XGBoost PySpark model and saved it with model.booster.save_model(path).
I then loaded it into Python XGBoost using xgb.Booster(model_file=path).
The predictions of the PySpark and Python models are different, and accuracy dropped by 2 to 5% on the same data.
I also trained an XGBoost Python model with the same features (from the PySpark pipeline). Its results differ from PySpark as well; accuracy was much lower in Python than in PySpark.
Can you help explain why this is happening in both cases?
Describe the bug
While running the mortgage notebook in https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/examples/notebooks/python/mortgage-gpu.ipynb
Error: Method setFeaturesCols([class scala.collection.convert.Wrappers$JListWrapper]) does not exist
Steps/Code to reproduce bug
Followed the steps mentioned here to set up: https://github.com/NVIDIA/spark-xgboost-examples/tree/spark-3/getting-started-guides/csp/databricks
Expected behavior
A clear and concise description of what you expected to happen.
Environment details (please complete the following information)
Hi,
I am using the XGBoost Spark 3.0 GPU version.
I couldn't find a featureImportances method on the model object. Can you guide me on how to get feature importances from the trained model?
And can you share any notebook or code for hyper-parameter tuning using hyperopt, if you already have one?
Thanks in advance
I am getting an error when trying to run the NYC Taxi or mortgage examples with the Spark 3.1.2 operator in Kubernetes. We submit our SparkApplication via kubectl and get the error below. I tried different versions of the spark-catalyst jar (3.0.0 and 3.1.2), but the result is the same.
Traceback (most recent call last):
File "/tmp/spark-a0673c21-9c04-4ba0-ae54-13b825af94e7/mortgage.py", line 78, in <module>
model = with_benchmark('Training', lambda: classifier.fit(train_data))
File "/tmp/spark-a0673c21-9c04-4ba0-ae54-13b825af94e7/mortgage.py", line 74, in with_benchmark
result = action()
File "/tmp/spark-a0673c21-9c04-4ba0-ae54-13b825af94e7/mortgage.py", line 78, in <module>
model = with_benchmark('Training', lambda: classifier.fit(train_data))
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 161, in fit
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 335, in _fit
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 332, in _fit_java
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o82.fit.
: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/expressions/TimeSub
at com.nvidia.spark.rapids.shims.spark300.Spark300Shims.getExprs(Spark300Shims.scala:251)
at com.nvidia.spark.rapids.shims.spark301.Spark301Shims.getExprs(Spark301Shims.scala:84)
at com.nvidia.spark.rapids.GpuOverrides$.<init>(GpuOverrides.scala:2544)
at com.nvidia.spark.rapids.GpuOverrides$.<clinit>(GpuOverrides.scala)
at org.apache.spark.sql.rapids.execution.InternalColumnarRddConverter$.convert(InternalColumnarRddConverter.scala:477)
at com.nvidia.spark.rapids.ColumnarRdd$.convert(ColumnarRdd.scala:47)
at com.nvidia.spark.rapids.ColumnarRdd.convert(ColumnarRdd.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuUtils$.toColumnarRdd(GpuUtils.scala:39)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainOnGpuInternal(GpuXGBoost.scala:240)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainDistributedOnGpu(GpuXGBoost.scala:186)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainOnGpu(GpuXGBoost.scala:91)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.fitOnGpu(GpuXGBoost.scala:52)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.fit(XGBoostClassifier.scala:170)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.fit(XGBoostClassifier.scala:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/expressions/TimeSub
... 29 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.expressions.TimeSub
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 29 more
I am trying to run the sample mortgage notebook on Spark 3.0 in a Databricks 7.0 environment.
I followed the steps mentioned in the 7.0 initialization notebook.
The library install was successful for dbfs:/FileStore/jars/xgboost4j-spark_3.0-1.0.0-0.1.0.jar.
Any pointer to fix this would be helpful. Thanks
Hi,
I am trying to run the spark-xgboost-examples with Scala and with Python,
but I get the same error in both:
ml.dmlc.xgboost4j.java.XGBoostError: [12:56:19] /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/updater_gpu_hist.cu:1490:
Exception in gpu_hist: NCCL failure :unhandled cuda error /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/../common/device_helpers.cuh(1061)
My environment:
OS : Ubuntu 18.04.4
GPU : 4*V100 (16G)
Driver : 418.87.01
CUDA : V10.1.243
NCCL : 2.4.7-1+cuda10.1
java : java-1.8.0-openjdk-amd64
scala : scala-2.12.10
hadoop : hadoop-3.1.3
spark : spark-3.0.0-bin-hadoop3.2
root@master:~# dpkg -l | grep nccl
ii libnccl-dev 2.4.7-1+cuda10.1 amd64 NVIDIA Collectives Communication Library (NCCL) Development Files
ii libnccl2 2.4.7-1+cuda10.1 amd64 NVIDIA Collectives Communication Library (NCCL) Runtime
ii nccl-repo-ubuntu1804-2.4.7-ga-cuda10.1 1-1 amd64 nccl repository configuration files
spark-submit command
export DATA_PATH=hdfs:/tmp/xgboost4j_spark/data
export JARS_PATH=hdfs:/tmp/xgboost4j_spark/jars
export EXAMPLE_CLASS=com.nvidia.spark.examples.mortgage.GPUMain
export JAR_EXAMPLE=${JARS_PATH}/sample_xgboost_apps-0.2.2-jar-with-dependencies.jar
export JAR_RAPIDS=${JARS_PATH}/rapids-4-spark_2.12-0.1.0.jar
${SPARK_HOME}/bin/spark-submit \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.memory.gpu.pooling.enabled=false \
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.task.resource.gpu.amount=1 \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--conf spark.task.cpus=1 \
--files $SPARK_HOME/examples/src/main/scripts/getGpusResources.sh \
--jars ${JAR_RAPIDS},${JAR_EXAMPLE} \
--master yarn \
--deploy-mode client \
--num-executors 2 \
--executor-cores 1 \
--driver-memory 4g \
--executor-memory 8g \
--class ${EXAMPLE_CLASS} \
${JAR_EXAMPLE} \
-dataPath=train::${DATA_PATH}/mortgage/csv/train/mortgage_train_merged.csv \
-dataPath=trans::${DATA_PATH}/mortgage/csv/test/mortgage_eval_merged.csv \
-format=csv \
-numWorkers=2 \
-treeMethod=gpu_hist \
-numRound=100 \
-maxDepth=8
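When chasing NCCL failures like the one above, one common first step (suggested here as a debugging aid, not a confirmed fix) is to surface NCCL's own logs by passing its standard debug environment variables through to the executors, e.g. by adding to the spark-submit command:

```
--conf spark.executorEnv.NCCL_DEBUG=INFO \
--conf spark.executorEnv.NCCL_DEBUG_SUBSYS=ALL \
```

The executor stderr should then show NCCL's initialization and the specific CUDA call that fails, which usually narrows down whether the problem is the driver/CUDA/NCCL version combination or GPU visibility inside the YARN container.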
container/stderr
2020-07-06 12:01:41,790 INFO executor.YarnCoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@master:37781
2020-07-06 12:01:41,803 INFO resource.ResourceDiscoveryScriptPlugin: Discovering resources for gpu with script: ./getGpusResources.sh
2020-07-06 12:01:41,938 INFO resource.ResourceUtils: ==============================================================
2020-07-06 12:01:41,939 INFO resource.ResourceUtils: Resources for spark.executor:
gpu -> [name: gpu, addresses: 0]
2020-07-06 12:01:41,939 INFO resource.ResourceUtils: ==============================================================
2020-07-06 12:01:42,060 INFO executor.YarnCoarseGrainedExecutorBackend: Successfully registered with driver
2020-07-06 12:01:42,064 INFO executor.Executor: Starting executor ID 1 on host master
2020-07-06 12:01:42,152 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44915.
2020-07-06 12:01:42,152 INFO netty.NettyBlockTransferService: Server created on master:44915
2020-07-06 12:01:42,154 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2020-07-06 12:01:42,162 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(1, master, 44915, None)
2020-07-06 12:01:42,174 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(1, master, 44915, None)
2020-07-06 12:01:42,176 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(1, master, 44915, None)
2020-07-06 12:01:42,209 INFO rapids.RapidsExecutorPlugin: Initializing memory from Executor Plugin
2020-07-06 12:01:50,020 INFO rapids.GpuDeviceManager: Initializing RMM 14517.44921875 MB on gpuId 0
2020-07-06 12:01:50,035 INFO plugin.ExecutorPluginContainer: Initialized executor component for plugin com.nvidia.spark.SQLPlugin.
2020-07-06 12:01:50,135 INFO executor.YarnCoarseGrainedExecutorBackend: Got assigned task 0
2020-07-06 12:01:50,144 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
2020-07-06 12:01:50,250 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 1 with 1 pieces (estimated total size 4.0 MiB)
2020-07-06 12:01:50,302 INFO client.TransportClientFactory: Successfully created connection to master/172.16.2.17:37137 after 2 ms (0 ms spent in bootstraps)
2020-07-06 12:01:50,349 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 7.4 KiB, free 4.1 GiB)
2020-07-06 12:01:50,359 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 109 ms
2020-07-06 12:01:50,433 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 14.6 KiB, free 4.1 GiB)
2020-07-06 12:01:50,812 INFO datasources.FileScanRDD: Reading File path: hdfs://master:9000/tmp/xgboost4j_spark/data/mortgage/csv/train/mortgage_train_merged.csv, range: 0-993993, partition values: [empty row]
2020-07-06 12:01:50,814 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0 with 1 pieces (estimated total size 4.0 MiB)
2020-07-06 12:01:50,824 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 43.2 KiB, free 4.1 GiB)
2020-07-06 12:01:50,827 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 13 ms
2020-07-06 12:01:50,865 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 600.3 KiB, free 4.1 GiB)
2020-07-06 12:01:51,592 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 2866 bytes result sent to driver
2020-07-06 12:01:51,643 INFO executor.YarnCoarseGrainedExecutorBackend: Got assigned task 1
2020-07-06 12:01:51,643 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
2020-07-06 12:01:51,648 INFO spark.MapOutputTrackerWorker: Updating epoch to 1 and clearing cache
2020-07-06 12:01:51,666 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 2 with 1 pieces (estimated total size 4.0 MiB)
2020-07-06 12:01:51,673 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 5.6 KiB, free 4.1 GiB)
2020-07-06 12:01:51,675 INFO broadcast.TorrentBroadcast: Reading broadcast variable 2 took 8 ms
2020-07-06 12:01:51,676 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 9.8 KiB, free 4.1 GiB)
2020-07-06 12:01:51,847 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 0, fetching them
2020-07-06 12:01:51,848 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@master:37781)
2020-07-06 12:01:51,940 INFO spark.MapOutputTrackerWorker: Got the output locations
2020-07-06 12:01:52,092 INFO storage.ShuffleBlockFetcherIterator: Getting 1 (34.9 KiB) non-empty blocks including 1 (34.9 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) remote blocks
2020-07-06 12:01:52,093 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 13 ms
2020-07-06 12:01:52,105 INFO XGBoostSpark: XGboost GPU training using device: 0
2020-07-06 12:01:52,251 INFO java.DMatrix: load XGBoost libs
2020-07-06 12:01:52,257 INFO java.EnvironmentDetector: Found CUDA version from /usr/local/cuda/version.txt: 10.1.243
2020-07-06 12:01:52,257 INFO java.NativeLibLoader: found folder cuda10.1/ for CUDA 10.1.243
2020-07-06 12:01:57,055 ERROR XGBoostSpark: XGBooster worker 0 has failed due to
ml.dmlc.xgboost4j.java.XGBoostError: [12:01:57] /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/updater_gpu_hist.cu:1490:
Exception in gpu_hist: NCCL failure :unhandled cuda error /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/../common/device_helpers.cuh(1061)
Stack trace:
[bt] (0) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x35) [0x7f86edec2e35]
[bt] (1) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x15a6) [0x7f86ee146076]
[bt] (2) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >*)+0x4e3) [0x7f86edf583e3]
[bt] (3) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0xc29) [0x7f86edf5a5e9]
[bt] (4) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x2b2) [0x7f86edf74302]
[bt] (5) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(XGBoosterUpdateOneIter+0x29) [0x7f86edec6f49]
[bt] (6) [0x7f8811018427]
at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:50)
at ml.dmlc.xgboost4j.java.Booster.update(Booster.java:181)
at ml.dmlc.xgboost4j.java.XGBoost.train(XGBoost.java:190)
at ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:68)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.buildDistributedBooster(XGBoost.scala:210)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.$anonfun$trainPreferGpu$1(XGBoost.scala:592)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:837)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:837)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-07-06 12:01:57,059 WARN storage.BlockManager: Putting block rdd_10_0 failed due to exception ml.dmlc.xgboost4j.java.XGBoostError: [12:01:57] /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/updater_gpu_hist.cu:1490: Exception in gpu_hist: NCCL failure :unhandled cuda error /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/../common/device_helpers.cuh(1061)
Stack trace:
[bt] (0) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x35) [0x7f86edec2e35]
[bt] (1) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x15a6) [0x7f86ee146076]
[bt] (2) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >*)+0x4e3) [0x7f86edf583e3]
[bt] (3) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0xc29) [0x7f86edf5a5e9]
[bt] (4) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x2b2) [0x7f86edf74302]
[bt] (5) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(XGBoosterUpdateOneIter+0x29) [0x7f86edec6f49]
[bt] (6) [0x7f8811018427]
.
2020-07-06 12:01:57,060 WARN storage.BlockManager: Block rdd_10_0 could not be removed as it was not found on disk or in memory
2020-07-06 12:01:57,067 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
ml.dmlc.xgboost4j.java.XGBoostError: [12:01:57] /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/updater_gpu_hist.cu:1490: Exception in gpu_hist: NCCL failure :unhandled cuda error /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/../common/device_helpers.cuh(1061)
Describe the bug
When trying to train an XGBoost classifier on GPUs, it produces the following error:
IllegalArgumentException: features does not exist
Steps/Code to reproduce bug
Calling the fit method as follows:
val xgbClassifier = new XGBoostClassifier(paramMap)
.setLabelCol(labelName)
.setFeaturesCols(featureCols)
xgbClassifier.fit(trainDF)
Expected behavior
I expected the model to train successfully when running on GPUs.
Environment details (please complete the following information)
Running a Spark job on GCP Dataproc with an NVIDIA Tesla T4 GPU.
The following JARs are on the /usr/lib/spark/jars/ classpath:
Using the following DataProc initializers to install GPU Drivers and Rapids Accelerators:
Using the following Spark parameter configurations:
"spark.executor.resource.gpu.amount": "1"
"spark.task.resource.gpu.amount": "1"
"spark.rapids.sql.explain": "ALL"
"spark.rapids.sql.concurrentGpuTasks": "2"
"spark.rapids.memory.pinnedPool.size": "2G"
"spark.executor.extraJavaOptions": "-Dai.rapids.cudf.prefer-pinned=true"
"spark.locality.wait": "0s"
"spark.plugins": "com.nvidia.spark.SQLPlugin"
"spark.rapids.sql.hasNans": "false"
"spark.rapids.sql.batchSizeBytes": "512M"
"spark.rapids.sql.reader.batchSizeBytes": "768M"
"spark.rapids.sql.variableFloatAgg.enabled": "true"
"spark.rapids.sql.decimalType.enabled": "true"
"spark.rapids.memory.gpu.pooling.enabled": "false"
"spark.executor.resource.gpu.discoveryScript": "/usr/lib/spark/scripts/gpu/getGpusResources.sh"
Hi,
I am trying to run the XGBoost4J Spark 3.0 GPU version on Azure Databricks.
I was following the procedure from this post.
While running the mortgage-gpu notebook, I got the error "features does not exist". The code is expecting a feature column named "features".
Can anyone help me with this? I am running it on Azure Databricks with the 7.0 ML GPU runtime, with NC6 GPU worker and driver nodes, as mentioned in the post.
And
I have a doubt regarding GpuDataReader: I found that GpuDataReader was removed from the mortgage notebook relative to its previous version. Could this be the reason for the error?
Newer Version:
Older Version:
Thanks in Advance
Hi,
I am using the XGBoost Spark 3.0 GPU version.
After transform or predict, the unique identifier (key) column is not available in the result data frame.
When I run my code, I get:
WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
and I have a 2070 Super on every worker node.
When I run the code, there is also a SparkContext WARN:
The configuration of resource: gpu (exec = 1, task = 1, runnable tasks = 1) will result in wasted resources due to resource limiting the number of runnable tasks per executor to: -1. Please adjust your configuration.
As a result, using the GPU is slower than not using it.
How should I configure the GPU?
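Both symptoms (no resources accepted, and the wasted-resources warning) usually come from a mismatch between CPU task slots and GPU task slots per executor, or from the Standalone worker not advertising its GPU at all. A sketch of a consistent configuration, assuming one GPU per worker (values and the discovery-script path are illustrative):

```
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.task.resource.gpu.amount=1 \
--conf spark.executor.cores=1 \
--conf spark.task.cpus=1 \
--conf spark.worker.resource.gpu.amount=1 \
--conf spark.worker.resource.gpu.discoveryScript=/opt/spark/examples/src/main/scripts/getGpusResources.sh \
```

With spark.executor.cores=1 and spark.task.cpus=1, each executor runs exactly one concurrent task, which matches the one GPU it owns; if executor cores allow several concurrent CPU tasks while the GPU only allows one, Spark caps concurrency at one task per executor and logs the wasted-resources warning above. In Standalone mode the spark.worker.resource.gpu.* settings (or an equivalent worker-side resources file) are needed so the worker registers the GPU with the master in the first place.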
I tried running the example https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/examples/notebooks/scala/taxi-gpu.ipynb. After following the instructions to set up a Databricks 7.0 ML cluster with GPUs, I got the following exception when calling xgbRegressor.fit(trainSet):
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.postTrackerReturnProcessing(XGBoost.scala:848)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.$anonfun$trainDistributedPreferGpu$1(XGBoost.scala:656)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.$anonfun$trainDistributedPreferGpu$1$adapted(XGBoost.scala:636)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainDistributedPreferGpu(XGBoost.scala:635)
at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor.trainHonorGpu(XGBoostRegressor.scala:257)
at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor.trainGpu(XGBoostRegressor.scala:196)
at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor.fit(XGBoostRegressor.scala:204)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw$$iw$$iw$$iw$$iw.$anonfun$model$1(command-3584891357336110:4)
at line4c05786d06ce4af79d5461b86d74659147.$read$$iw$$iw$$iw$$iw$$iw$$iw$Benchmark$.time(command-3584891357336108:5)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3584891357336110:4)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3584891357336110:55)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw$$iw$$iw.<init>(command-3584891357336110:57)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw$$iw.<init>(command-3584891357336110:59)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw.<init>(command-3584891357336110:61)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw.<init>(command-3584891357336110:63)
at line4c05786d06ce4af79d5461b86d74659149.$read.<init>(command-3584891357336110:65)
at line4c05786d06ce4af79d5461b86d74659149.$read$.<init>(command-3584891357336110:69)
at line4c05786d06ce4af79d5461b86d74659149.$read$.<clinit>(command-3584891357336110)
at line4c05786d06ce4af79d5461b86d74659149.$eval$.$print$lzycompute(<console>:7)
at line4c05786d06ce4af79d5461b86d74659149.$eval$.$print(<console>:6)
at line4c05786d06ce4af79d5461b86d74659149.$eval.$print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:202)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$10(DriverLocal.scala:396)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:238)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:233)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:230)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:275)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:268)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:653)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:645)
at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:486)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:598)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:391)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
at java.lang.Thread.run(Thread.java:748)
Command took 9.03 seconds -- by [email protected] at 8/4/2020, 4:01:59 PM on carol
Describe the bug
Pretty simple, broken link to the Dockerfile in the Kubernetes getting started guide. Broken link found here: https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/getting-started-guides/on-prem-cluster/kubernetes.md#build-a-gpu-spark-docker-image
Steps/Code to reproduce bug
Please provide a list of steps or a code sample to reproduce the issue.
Avoid posting private or sensitive data.
Expected behavior
A clear and concise description of what you expected to happen.
Environment details (please complete the following information)
Hi,
I hit an error when running the GPU Mortgage example on a Spark Standalone cluster, as a Python application, with cudf 10.2.
Below is the error log:
21/04/22 02:11:38 ERROR GpuXGBoostSpark: The job was aborted due to
java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuUtils$.toColumnarRdd(GpuUtils.scala:39)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainOnGpuInternal(GpuXGBoost.scala:240)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainDistributedOnGpu(GpuXGBoost.scala:186)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainOnGpu(GpuXGBoost.scala:91)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.fitOnGpu(GpuXGBoost.scala:52)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.fit(XGBoostClassifier.scala:170)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.fit(XGBoostClassifier.scala:41)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.NoClassDefFoundError: ai/rapids/cudf/ColumnView
at com.nvidia.spark.rapids.CastExprMeta.convertToGpu(GpuCast.scala:88)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:755)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:747)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:755)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:747)
at com.nvidia.spark.rapids.GpuOverrides$$anon$147.$anonfun$convertToGpu$21(GpuOverrides.scala:2490)
at scala.collection.immutable.Stream.$anonfun$map$1(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1171)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1161)
at scala.collection.immutable.Stream.$anonfun$map$1(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1171)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1161)
at scala.collection.immutable.Stream.force(Stream.scala:274)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:432)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:356)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:356)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:356)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:336)
at com.nvidia.spark.rapids.GpuOverrides.addSortsIfNeeded(GpuOverrides.scala:2854)
at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:2814)
at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:2787)
at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:2776)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1(Columnar.scala:514)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1$adapted(Columnar.scala:513)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:513)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:482)
at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$1(QueryExecution.scala:324)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:324)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:112)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:138)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:138)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:112)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:105)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:126)
at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3200)
at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3198)
at org.apache.spark.sql.rapids.execution.InternalColumnarRddConverter$.convert(InternalColumnarRddConverter.scala:485)
at com.nvidia.spark.rapids.ColumnarRdd$.convert(ColumnarRdd.scala:47)
at com.nvidia.spark.rapids.ColumnarRdd.convert(ColumnarRdd.scala)
... 22 more
Caused by: java.lang.ClassNotFoundException: ai.rapids.cudf.ColumnView
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 82 more
21/04/22 02:11:38 INFO RabitTracker$TrackerProcessLogger: Tracker Process ends with exit code 143
21/04/22 02:11:38 INFO SparkUI: Stopped Spark web UI at http://7e7a98e233be:4040
21/04/22 02:11:38 INFO StandaloneSchedulerBackend: Shutting down all executors
21/04/22 02:11:38 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
21/04/22 02:11:38 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/04/22 02:11:38 INFO MemoryStore: MemoryStore cleared
21/04/22 02:11:38 INFO BlockManager: BlockManager stopped
21/04/22 02:11:38 INFO BlockManagerMaster: BlockManagerMaster stopped
21/04/22 02:11:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/04/22 02:11:38 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
File "/opt/xgboost/main.py", line 18, in
main()
File "/opt/xgboost/samples.zip/com/nvidia/spark/examples/main.py", line 21, in main
File "/opt/xgboost/samples.zip/com/nvidia/spark/examples/mortgage/gpu_main.py", line 41, in main
File "/opt/xgboost/samples.zip/com/nvidia/spark/examples/utility/utils.py", line 46, in with_benchmark
File "/opt/xgboost/samples.zip/com/nvidia/spark/examples/mortgage/gpu_main.py", line 41, in
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 129, in fit
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 321, in _fit
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 318, in _fit_java
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in call
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 128, in deco
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o57.fit.
: java.lang.NoClassDefFoundError: ai/rapids/cudf/ColumnView
(stack trace identical to the driver-side trace above, plus the Py4J reflection frames, down to the same root cause: Caused by: java.lang.ClassNotFoundException: ai.rapids.cudf.ColumnView)
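The root cause above, java.lang.ClassNotFoundException: ai.rapids.cudf.ColumnView, usually means the cuDF jar never reached the driver and executor classpaths, even though the rapids-4-spark plugin jar did (the plugin references cuDF classes at plan-conversion time). A minimal spark-submit sketch that ships both jars follows; the jar names, versions, paths, and the Kubernetes master URL are assumptions and must match your actual Spark/RAPIDS release:

```shell
# Sketch only: adjust jar versions and paths to your environment.
# Both the RAPIDS plugin jar AND the matching cudf jar must be shipped
# to every JVM (driver and executors), e.g. via --jars.
CUDF_JAR=/opt/jars/cudf-0.19.2-cuda11.jar            # assumed version/path
RAPIDS_JAR=/opt/jars/rapids-4-spark_2.12-0.5.0.jar   # assumed version/path
XGB_JARS=/opt/jars/xgboost4j_3.0-1.3.0-0.1.0.jar,/opt/jars/xgboost4j-spark_3.0-1.3.0-0.1.0.jar

spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --jars ${CUDF_JAR},${RAPIDS_JAR},${XGB_JARS} \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.resource.gpu.amount=1 \
  mortgage.py
```

The cuDF version must be the one the chosen rapids-4-spark release was built against; mixing versions can produce the same NoClassDefFoundError or native-library failures.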
init-notebook-for-rapids-spark-xgboost-on-databricks-gpu-7.0-ml.ipynb
Line 28: `"wget -O rapids-4-spark_2.12-0.5.0.jar ……` should reference `rapids-4-spark_2.12-0.6.0.jar` instead.
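Version drift like this is easy to avoid by pinning the version once in a variable inside the init script. A sketch, assuming the standard Maven Central layout for the rapids-4-spark artifact (the version number and destination path here are illustrative, not taken from the notebook):

```shell
# Sketch: pin the plugin version once so the wget line cannot drift
# out of sync with the rest of the init script.
SPARK_RAPIDS_VERSION=0.6.0   # assumed target version
JAR=rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar

wget -O /databricks/jars/${JAR} \
  https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/${SPARK_RAPIDS_VERSION}/${JAR}
```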
Previously filed issues often lack key information such as the running environment, the command used, and its parameters. We should add an issue template like other projects do, e.g. spark-rapids:
**Describe the bug**
A clear and concise description of what the bug is.
**Steps/Code to reproduce bug**
Please provide a list of steps or a code sample to reproduce the issue.
Avoid posting private or sensitive data.
**Expected behavior**
A clear and concise description of what you expected to happen.
**Environment details (please complete the following information)**
- Environment location: [Standalone, YARN, Kubernetes, Cloud(specify cloud provider)]
- Spark configuration settings related to the issue