
spark-xgboost-examples's Introduction

Note:

This repo has been deprecated. We provide a new public repo, spark-rapids-examples, which includes not only XGBoost examples but also GPU-accelerated Spark ETL examples using our spark-rapids plugin.

spark-xgboost-examples's People

Contributors

abellina, chuanlihao, firestarman, garyshen2008, krajendrannv, mengdong, mgzhao, nartal1, richwhitjr, rongou, rwlee, sameerz, shotai, tgravescs, viadea, wbo4958, wjxiz1992


spark-xgboost-examples's Issues

Results mismatch during XGBoost PySpark to Python conversion

Hi

I trained and saved the XGBoost PySpark model using model.booster.save_model(path).
I loaded it into Python XGBoost using xgb.Booster(model_file=path).
The predictions of the XGBoost PySpark and Python models are different, and accuracy dropped by 2 to 5% on the same data.

I also trained an XGBoost Python model with the same features (as used in PySpark). Its results differ from PySpark as well; accuracy was much lower in Python compared to PySpark.

Can you help explain why this happens in both cases?
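For what it's worth, one frequent cause of train/score mismatches like this (an assumption here, not a confirmed diagnosis for this report) is missing-value handling: when features pass through a sparse vector representation, explicit zeros can be dropped and later read back as "missing", so the two runtimes effectively see different inputs. A pure-Python sketch of the effect:

```python
import math

# A dense feature row with explicit zeros.
dense = [0.0, 1.5, 0.0, 2.0]

# Sparse storage keeps only the non-zero entries (index -> value),
# which is what a Spark ML sparse vector does.
sparse = {i: v for i, v in enumerate(dense) if v != 0.0}

# If the consumer reconstructs absent entries as "missing" (NaN)
# instead of 0.0, the model sees a different row than it was trained on.
reconstructed = [sparse.get(i, float("nan")) for i in range(len(dense))]

print(reconstructed)  # zeros came back as NaN
```

Checking that both sides agree on the missing value (the `missing` parameter exists in both the XGBoost4J-Spark and Python APIs) is a reasonable first step.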

GPU mortgage notebook failure in Databricks

Describe the bug
While running the mortgage notebook at https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/examples/notebooks/python/mortgage-gpu.ipynb, the following error occurs:
Error: Method setFeaturesCols([class scala.collection.convert.Wrappers$JListWrapper]) does not exist

Steps/Code to reproduce bug

Followed the setup steps described at https://github.com/NVIDIA/spark-xgboost-examples/tree/spark-3/getting-started-guides/csp/databricks

Expected behavior
A clear and concise description of what you expected to happen.

Environment details (please complete the following information)

  • Environment location: Databricks 7.3 LTS ML Runtime
  • Spark configuration settings related to the issue
    spark.plugins com.nvidia.spark.SQLPlugin
    spark.task.resource.gpu.amount 0.1
    spark.locality.wait 0s
    spark.databricks.delta.optimizeWrite.enabled false
    spark.sql.adaptive.enabled false
    spark.rapids.sql.concurrentGpuTasks 2

featureImportances method doesn't exist

Hi,
I am using the XGBoost Spark 3.0 GPU version.

I couldn't find a featureImportances method on the model object. Can you guide me on how to get feature importances from the trained model?

Also, can you share any notebook or code for hyperparameter tuning using Hyperopt, if you already have one?

Thanks in advance
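Not an official answer, but model wrappers typically expose the underlying native booster, and in the Python XGBoost package booster.get_score(importance_type=...) returns a feature-to-score dict; the Scala model's nativeBooster exposes a comparable score accessor. Ranking such a dict is then plain Python (the scores below are invented placeholder values, not real output):

```python
# Hypothetical scores, shaped like the dict returned by
# booster.get_score(importance_type="gain") on a trained booster.
scores = {"f0": 12.3, "f2": 30.1, "f1": 5.4}

# Sort features by importance, highest first.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # [('f2', 30.1), ('f0', 12.3), ('f1', 5.4)]
```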

Getting an error when trying to run the XGBoost examples on Spark 3.1.2

I am getting an error when trying to run the NYC Taxi or mortgage examples with the Spark 3.1.2 operator on Kubernetes. We submit our SparkApplication via kubectl and get the error below. I tried different versions of the spark-catalyst jar (3.0.0 and 3.1.2), but the result is the same.

Traceback (most recent call last):
File "/tmp/spark-a0673c21-9c04-4ba0-ae54-13b825af94e7/mortgage.py", line 78, in <module>
model = with_benchmark('Training', lambda: classifier.fit(train_data))
File "/tmp/spark-a0673c21-9c04-4ba0-ae54-13b825af94e7/mortgage.py", line 74, in with_benchmark
result = action()
File "/tmp/spark-a0673c21-9c04-4ba0-ae54-13b825af94e7/mortgage.py", line 78, in <lambda>
model = with_benchmark('Training', lambda: classifier.fit(train_data))
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 161, in fit
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 335, in _fit
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 332, in _fit_java
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o82.fit.
: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/expressions/TimeSub
at com.nvidia.spark.rapids.shims.spark300.Spark300Shims.getExprs(Spark300Shims.scala:251)
at com.nvidia.spark.rapids.shims.spark301.Spark301Shims.getExprs(Spark301Shims.scala:84)
at com.nvidia.spark.rapids.GpuOverrides$.<init>(GpuOverrides.scala:2544)
at com.nvidia.spark.rapids.GpuOverrides$.<clinit>(GpuOverrides.scala)
at org.apache.spark.sql.rapids.execution.InternalColumnarRddConverter$.convert(InternalColumnarRddConverter.scala:477)
at com.nvidia.spark.rapids.ColumnarRdd$.convert(ColumnarRdd.scala:47)
at com.nvidia.spark.rapids.ColumnarRdd.convert(ColumnarRdd.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuUtils$.toColumnarRdd(GpuUtils.scala:39)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainOnGpuInternal(GpuXGBoost.scala:240)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainDistributedOnGpu(GpuXGBoost.scala:186)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainOnGpu(GpuXGBoost.scala:91)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.fitOnGpu(GpuXGBoost.scala:52)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.fit(XGBoostClassifier.scala:170)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.fit(XGBoostClassifier.scala:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/expressions/TimeSub
... 29 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.expressions.TimeSub
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 29 more
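A plausible reading of the trace (not a confirmed diagnosis): the failing frame is in Spark300Shims, suggesting the rapids-4-spark jar on the classpath was built for the Spark 3.0 line, while org.apache.spark.sql.catalyst.expressions.TimeSub is a catalyst class that is no longer present in Spark 3.1.x. A tiny sketch of the sanity check worth doing before submitting, using a hypothetical helper:

```python
def shim_matches(spark_version: str, plugin_spark_line: str) -> bool:
    """Hypothetical helper: a plugin shim built against one Spark
    major.minor line generally cannot load catalyst classes from
    another line."""
    return spark_version.split(".")[:2] == plugin_spark_line.split(".")[:2]

# Spark 3.1.2 with a plugin build targeting Spark 3.0.0 -> mismatch.
print(shim_matches("3.1.2", "3.0.0"))  # False
print(shim_matches("3.1.2", "3.1.1"))  # True
```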

Exception in gpu_hist: NCCL failure :unhandled cuda error

Hi,
I am trying to run the spark-xgboost-examples in both Scala and Python:

  • YARN for Scala
  • YARN for Python

but I get the same error in both cases:

ml.dmlc.xgboost4j.java.XGBoostError: [12:56:19] /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/updater_gpu_hist.cu:1490: 
Exception in gpu_hist: NCCL failure :unhandled cuda error /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/../common/device_helpers.cuh(1061)

My environment:

OS      : Ubuntu 18.04.4
GPU     : 4*V100 (16G)
Driver  : 418.87.01
CUDA  : V10.1.243
NCCL  : 2.4.7-1+cuda10.1

java  : java-1.8.0-openjdk-amd64
scala : scala-2.12.10
hadoop : hadoop-3.1.3
spark : spark-3.0.0-bin-hadoop3.2
root@master:~# dpkg -l | grep nccl
ii  libnccl-dev                             2.4.7-1+cuda10.1                                amd64        NVIDIA Collectives Communication Library (NCCL) Development Files
ii  libnccl2                                2.4.7-1+cuda10.1                                amd64        NVIDIA Collectives Communication Library (NCCL) Runtime
ii  nccl-repo-ubuntu1804-2.4.7-ga-cuda10.1  1-1                                             amd64        nccl repository configuration files

spark-submit command

export DATA_PATH=hdfs:/tmp/xgboost4j_spark/data
export JARS_PATH=hdfs:/tmp/xgboost4j_spark/jars
export EXAMPLE_CLASS=com.nvidia.spark.examples.mortgage.GPUMain
export JAR_EXAMPLE=${JARS_PATH}/sample_xgboost_apps-0.2.2-jar-with-dependencies.jar
export JAR_RAPIDS=${JARS_PATH}/rapids-4-spark_2.12-0.1.0.jar

${SPARK_HOME}/bin/spark-submit                                                    \
 --conf spark.plugins=com.nvidia.spark.SQLPlugin                                \
 --conf spark.rapids.memory.gpu.pooling.enabled=false                           \
 --conf spark.executor.resource.gpu.amount=1                                    \
 --conf spark.task.resource.gpu.amount=1                                        \
 --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh       \
 --conf spark.task.cpus=1                                                       \
 --files $SPARK_HOME/examples/src/main/scripts/getGpusResources.sh              \
 --jars ${JAR_RAPIDS},${JAR_EXAMPLE}                                            \
 --master yarn                                                                  \
 --deploy-mode client                                                           \
 --num-executors 2                                                              \
 --executor-cores 1                                                             \
 --driver-memory 4g                                                             \
 --executor-memory 8g                                                           \
 --class ${EXAMPLE_CLASS}                                                       \
 ${JAR_EXAMPLE}                                                                 \
 -dataPath=train::${DATA_PATH}/mortgage/csv/train/mortgage_train_merged.csv     \
 -dataPath=trans::${DATA_PATH}/mortgage/csv/test/mortgage_eval_merged.csv       \
 -format=csv                                                                    \
 -numWorkers=2                                                                  \
 -treeMethod=gpu_hist                                                           \
 -numRound=100                                                                  \
 -maxDepth=8   

container/stderr

2020-07-06 12:01:41,790 INFO executor.YarnCoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@master:37781
2020-07-06 12:01:41,803 INFO resource.ResourceDiscoveryScriptPlugin: Discovering resources for gpu with script: ./getGpusResources.sh
2020-07-06 12:01:41,938 INFO resource.ResourceUtils: ==============================================================
2020-07-06 12:01:41,939 INFO resource.ResourceUtils: Resources for spark.executor:
gpu -> [name: gpu, addresses: 0]
2020-07-06 12:01:41,939 INFO resource.ResourceUtils: ==============================================================
2020-07-06 12:01:42,060 INFO executor.YarnCoarseGrainedExecutorBackend: Successfully registered with driver
2020-07-06 12:01:42,064 INFO executor.Executor: Starting executor ID 1 on host master
2020-07-06 12:01:42,152 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44915.
2020-07-06 12:01:42,152 INFO netty.NettyBlockTransferService: Server created on master:44915
2020-07-06 12:01:42,154 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2020-07-06 12:01:42,162 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(1, master, 44915, None)
2020-07-06 12:01:42,174 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(1, master, 44915, None)
2020-07-06 12:01:42,176 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(1, master, 44915, None)
2020-07-06 12:01:42,209 INFO rapids.RapidsExecutorPlugin: Initializing memory from Executor Plugin
2020-07-06 12:01:50,020 INFO rapids.GpuDeviceManager: Initializing RMM  14517.44921875 MB on gpuId 0
2020-07-06 12:01:50,035 INFO plugin.ExecutorPluginContainer: Initialized executor component for plugin com.nvidia.spark.SQLPlugin.
2020-07-06 12:01:50,135 INFO executor.YarnCoarseGrainedExecutorBackend: Got assigned task 0
2020-07-06 12:01:50,144 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
2020-07-06 12:01:50,250 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 1 with 1 pieces (estimated total size 4.0 MiB)
2020-07-06 12:01:50,302 INFO client.TransportClientFactory: Successfully created connection to master/172.16.2.17:37137 after 2 ms (0 ms spent in bootstraps)
2020-07-06 12:01:50,349 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 7.4 KiB, free 4.1 GiB)
2020-07-06 12:01:50,359 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 109 ms
2020-07-06 12:01:50,433 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 14.6 KiB, free 4.1 GiB)

2020-07-06 12:01:50,812 INFO datasources.FileScanRDD: Reading File path: hdfs://master:9000/tmp/xgboost4j_spark/data/mortgage/csv/train/mortgage_train_merged.csv, range: 0-993993, partition values: [empty row]
2020-07-06 12:01:50,814 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0 with 1 pieces (estimated total size 4.0 MiB)
2020-07-06 12:01:50,824 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 43.2 KiB, free 4.1 GiB)
2020-07-06 12:01:50,827 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 13 ms
2020-07-06 12:01:50,865 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 600.3 KiB, free 4.1 GiB)
2020-07-06 12:01:51,592 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 2866 bytes result sent to driver
2020-07-06 12:01:51,643 INFO executor.YarnCoarseGrainedExecutorBackend: Got assigned task 1
2020-07-06 12:01:51,643 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
2020-07-06 12:01:51,648 INFO spark.MapOutputTrackerWorker: Updating epoch to 1 and clearing cache
2020-07-06 12:01:51,666 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 2 with 1 pieces (estimated total size 4.0 MiB)
2020-07-06 12:01:51,673 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 5.6 KiB, free 4.1 GiB)
2020-07-06 12:01:51,675 INFO broadcast.TorrentBroadcast: Reading broadcast variable 2 took 8 ms
2020-07-06 12:01:51,676 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 9.8 KiB, free 4.1 GiB)
2020-07-06 12:01:51,847 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 0, fetching them
2020-07-06 12:01:51,848 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@master:37781)
2020-07-06 12:01:51,940 INFO spark.MapOutputTrackerWorker: Got the output locations
2020-07-06 12:01:52,092 INFO storage.ShuffleBlockFetcherIterator: Getting 1 (34.9 KiB) non-empty blocks including 1 (34.9 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) remote blocks
2020-07-06 12:01:52,093 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 13 ms
2020-07-06 12:01:52,105 INFO XGBoostSpark: XGboost GPU training using device: 0
2020-07-06 12:01:52,251 INFO java.DMatrix: load XGBoost libs
2020-07-06 12:01:52,257 INFO java.EnvironmentDetector: Found CUDA version from /usr/local/cuda/version.txt: 10.1.243
2020-07-06 12:01:52,257 INFO java.NativeLibLoader: found folder cuda10.1/ for CUDA 10.1.243
2020-07-06 12:01:57,055 ERROR XGBoostSpark: XGBooster worker 0 has failed due to
ml.dmlc.xgboost4j.java.XGBoostError: [12:01:57] /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/updater_gpu_hist.cu:1490: 
Exception in gpu_hist: NCCL failure :unhandled cuda error /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/../common/device_helpers.cuh(1061)

Stack trace:
  [bt] (0) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x35) [0x7f86edec2e35]
  [bt] (1) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x15a6) [0x7f86ee146076]
  [bt] (2) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >*)+0x4e3) [0x7f86edf583e3]
  [bt] (3) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0xc29) [0x7f86edf5a5e9]
  [bt] (4) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x2b2) [0x7f86edf74302]
  [bt] (5) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(XGBoosterUpdateOneIter+0x29) [0x7f86edec6f49]
  [bt] (6) [0x7f8811018427]


        at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:50)
        at ml.dmlc.xgboost4j.java.Booster.update(Booster.java:181)
        at ml.dmlc.xgboost4j.java.XGBoost.train(XGBoost.java:190)
        at ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:68)
        at ml.dmlc.xgboost4j.scala.spark.XGBoost$.buildDistributedBooster(XGBoost.scala:210)
        at ml.dmlc.xgboost4j.scala.spark.XGBoost$.$anonfun$trainPreferGpu$1(XGBoost.scala:592)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:837)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:837)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
        at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
        at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
        at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:127)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2020-07-06 12:01:57,059 WARN storage.BlockManager: Putting block rdd_10_0 failed due to exception ml.dmlc.xgboost4j.java.XGBoostError: [12:01:57] /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/updater_gpu_hist.cu:1490: Exception in gpu_hist: NCCL failure :unhandled cuda error /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/../common/device_helpers.cuh(1061)

Stack trace:
  [bt] (0) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x35) [0x7f86edec2e35]
  [bt] (1) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x15a6) [0x7f86ee146076]
  [bt] (2) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >*)+0x4e3) [0x7f86edf583e3]
  [bt] (3) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0xc29) [0x7f86edf5a5e9]
  [bt] (4) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x2b2) [0x7f86edf74302]
  [bt] (5) /hadoop3/yarn/local/usercache/root/appcache/application_1594008041123_0001/container_1594008041123_0001_01_000002/tmp/libxgboost4j6390811092133043944.so(XGBoosterUpdateOneIter+0x29) [0x7f86edec6f49]
  [bt] (6) [0x7f8811018427]

.
2020-07-06 12:01:57,060 WARN storage.BlockManager: Block rdd_10_0 could not be removed as it was not found on disk or in memory
2020-07-06 12:01:57,067 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
ml.dmlc.xgboost4j.java.XGBoostError: [12:01:57] /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/updater_gpu_hist.cu:1490: Exception in gpu_hist: NCCL failure :unhandled cuda error /home/jenkinslc/workspace/llc-xgb-deploy-dev/src/tree/../common/device_helpers.cuh(1061)

IllegalArgumentException: features does not exist

Describe the bug
When trying to train an XGBoost classifier with GPUs, it produces the following error:

IllegalArgumentException: features does not exist

Steps/Code to reproduce bug
Calling the fit method as follows:

val xgbClassifier = new XGBoostClassifier(paramMap)
  .setLabelCol(labelName)
  .setFeaturesCols(featureCols)
xgbClassifier.fit(trainDF)

Expected behavior
I expected the model to train successfully when running on GPUs.

Environment details (please complete the following information)

Running a Spark job on GCP Dataproc with an NVIDIA Tesla T4 GPU.

The following JARs are on the /usr/lib/spark/jars/ classpath:

  • Rapids-4-Spark: rapids-4-spark_2.12-21.08.0.jar
  • XGBoost4J: xgboost4j_3.0-1.4.2-0.1.0.jar
  • XGBoost4J-Spark: xgboost4j-spark_3.0-1.4.2-0.1.0.jar
  • cuDF: cudf-21.08.2-cuda11.jar

Using the following Dataproc initialization actions to install the GPU drivers and RAPIDS accelerator:

  • goog-dataproc-initialization-actions-us-central1/gpu/install_gpu_driver.sh
  • goog-dataproc-initialization-actions-us-central1/rapids/rapids.sh

Using the following Spark parameter configurations:
"spark.executor.resource.gpu.amount": "1"
"spark.task.resource.gpu.amount": "1"
"spark.rapids.sql.explain": "ALL"
"spark.rapids.sql.concurrentGpuTasks": "2"
"spark.rapids.memory.pinnedPool.size": "2G"
"spark.executor.extraJavaOptions": "-Dai.rapids.cudf.prefer-pinned=true"
"spark.locality.wait": "0s"
"spark.plugins": "com.nvidia.spark.SQLPlugin"
"spark.rapids.sql.hasNans": "false"
"spark.rapids.sql.batchSizeBytes": "512M"
"spark.rapids.sql.reader.batchSizeBytes": "768M"
"spark.rapids.sql.variableFloatAgg.enabled": "true"
"spark.rapids.sql.decimalType.enabled": "true"
"spark.rapids.memory.gpu.pooling.enabled": "false"
"spark.executor.resource.gpu.discoveryScript": "/usr/lib/spark/scripts/gpu/getGpusResources.sh"
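As far as I can tell, setFeaturesCols(...) (multiple raw columns) is honored only on the GPU training path; when training goes through the CPU path the estimator looks for a single assembled vector column, by default named "features", hence this error. The usual workaround is to assemble the columns first (VectorAssembler(inputCols=featureCols, outputCol="features") in Spark ML). A pure-Python analogue of that row transform, with hypothetical column names:

```python
# Hypothetical column names; this stands in for
# VectorAssembler(inputCols=feature_cols, outputCol="features").
feature_cols = ["orig_channel", "first_home_buyer"]
row = {"orig_channel": 1.0, "first_home_buyer": 0.0, "label": 1.0}

assembled = {
    "features": [row[c] for c in feature_cols],  # the single vector column
    "label": row["label"],
}
print(assembled["features"])  # [1.0, 0.0]
```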

IllegalArgumentException: features does not exist. [BUG]

Hi,
I am trying to run the XGBoost4J Spark 3.0 GPU version on Azure Databricks.
I was following the procedure from this post.
While running the mortgage-gpu notebook, I got the error "features does not exist": the code expects a feature column named "features".
(screenshot)

Can anyone help me with this? I am running on Azure Databricks with the 7.0 ML GPU runtime, with NC6 GPU worker and driver nodes, as mentioned in the post.

Also, I have a question about GpuDataReader: I noticed that GpuDataReader was removed from the mortgage notebook relative to its previous version. Might that be the reason for this error?
Newer version: (screenshot)
Older version: (screenshot)

Thanks in advance

How and where can I set it to use the GPU?

When I run my code, I get:
WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
and I have a 2070 Super on every worker node.

How to configure the GPU? Wasted resources?

When I run the code, there is a SparkContext WARN:
The configuration of resource: gpu (exec = 1, task = 1, runnable tasks = 1) will result in wasted resources due to resource limiting the number of runnable tasks per executor to: -1. Please adjust your configuration.
As a result, using the GPU is slower than not using it.
How do I configure the GPU correctly?
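Regarding the "wasted resources" warning: the number of concurrently runnable tasks per executor is bounded by every declared resource, so spark.executor.cores / spark.task.cpus and spark.executor.resource.gpu.amount / spark.task.resource.gpu.amount should imply the same task count. A sketch of that arithmetic (my informal reading of the scheduler's check, not its actual code):

```python
def runnable_tasks(executor_cores, task_cpus, executor_gpus, task_gpu_amount):
    """Tasks an executor can run at once: the most constraining
    resource wins. Informal model of Spark's resource-profile check."""
    by_cpu = executor_cores // task_cpus
    by_gpu = int(executor_gpus / task_gpu_amount)
    return min(by_cpu, by_gpu)

# 4 cores but 1 GPU with spark.task.resource.gpu.amount=1 -> only 1 task
# at a time; the other 3 cores sit idle, which triggers the warning.
print(runnable_tasks(4, 1, 1, 1.0))   # 1
# Setting spark.task.resource.gpu.amount=0.25 lets 4 tasks share the GPU.
print(runnable_tasks(4, 1, 1, 0.25))  # 4
```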

XGBoost distributed GPU training exception

I tried running the example https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/examples/notebooks/scala/taxi-gpu.ipynb. After following the instructions to set up a Databricks 7.0 ML cluster with GPUs, I got the following exception when calling xgbRegressor.fit(trainSet):

at ml.dmlc.xgboost4j.scala.spark.XGBoost$.postTrackerReturnProcessing(XGBoost.scala:848)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.$anonfun$trainDistributedPreferGpu$1(XGBoost.scala:656)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.$anonfun$trainDistributedPreferGpu$1$adapted(XGBoost.scala:636)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainDistributedPreferGpu(XGBoost.scala:635)
at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor.trainHonorGpu(XGBoostRegressor.scala:257)
at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor.trainGpu(XGBoostRegressor.scala:196)
at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor.fit(XGBoostRegressor.scala:204)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw$$iw$$iw$$iw$$iw.$anonfun$model$1(command-3584891357336110:4)
at line4c05786d06ce4af79d5461b86d74659147.$read$$iw$$iw$$iw$$iw$$iw$$iw$Benchmark$.time(command-3584891357336108:5)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3584891357336110:4)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3584891357336110:55)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw$$iw$$iw.<init>(command-3584891357336110:57)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw$$iw.<init>(command-3584891357336110:59)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw$$iw.<init>(command-3584891357336110:61)
at line4c05786d06ce4af79d5461b86d74659149.$read$$iw.<init>(command-3584891357336110:63)
at line4c05786d06ce4af79d5461b86d74659149.$read.<init>(command-3584891357336110:65)
at line4c05786d06ce4af79d5461b86d74659149.$read$.<init>(command-3584891357336110:69)
at line4c05786d06ce4af79d5461b86d74659149.$read$.<clinit>(command-3584891357336110)
at line4c05786d06ce4af79d5461b86d74659149.$eval$.$print$lzycompute(:7)
at line4c05786d06ce4af79d5461b86d74659149.$eval$.$print(:6)
at line4c05786d06ce4af79d5461b86d74659149.$eval.$print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:202)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$10(DriverLocal.scala:396)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:238)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:233)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:230)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:275)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:268)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:653)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:645)
at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:486)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:598)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:391)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
at java.lang.Thread.run(Thread.java:748)
Command took 9.03 seconds -- by [email protected] at 8/4/2020, 4:01:59 PM on carol
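A likely cause (an assumption, not confirmed in the report) is a mismatch between the sample code and the XGBoost jar installed on the cluster: a setFeaturesCols method taking a column list exists only in NVIDIA's GPU-enabled xgboost4j-spark build, so py4j raises "Method ... does not exist" when the stock DMLC jar, or an older GPU build, is the one on the classpath. A quick check from a notebook shell cell might look like the following sketch; the jar directory path is an assumption about typical Databricks layouts:

```shell
# Hypothetical check: list the XGBoost-related jars the cluster actually loaded.
# /databricks/jars is an assumed location; adjust if your init script installs elsewhere.
ls /databricks/jars | grep -i xgboost
# A GPU-enabled NVIDIA artifact should appear here, not only the stock
# ml.dmlc xgboost4j-spark jar, for setFeaturesCols to resolve.
```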

Broken Link in Kubernetes guide

Describe the bug
Pretty simple, broken link to the Dockerfile in the Kubernetes getting started guide. Broken link found here: https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/getting-started-guides/on-prem-cluster/kubernetes.md#build-a-gpu-spark-docker-image


GpuXGBoostSpark error when run GPU Mortgage example

Hi,
I hit an error when running the GPU Mortgage example on a Spark Standalone cluster (Python application, cudf 10.2).
Below is the error log:
21/04/22 02:11:38 ERROR GpuXGBoostSpark: The job was aborted due to
java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuUtils$.toColumnarRdd(GpuUtils.scala:39)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainOnGpuInternal(GpuXGBoost.scala:240)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainDistributedOnGpu(GpuXGBoost.scala:186)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainOnGpu(GpuXGBoost.scala:91)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.fitOnGpu(GpuXGBoost.scala:52)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.fit(XGBoostClassifier.scala:170)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.fit(XGBoostClassifier.scala:41)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.NoClassDefFoundError: ai/rapids/cudf/ColumnView
at com.nvidia.spark.rapids.CastExprMeta.convertToGpu(GpuCast.scala:88)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:755)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:747)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:755)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:747)
at com.nvidia.spark.rapids.GpuOverrides$$anon$147.$anonfun$convertToGpu$21(GpuOverrides.scala:2490)
at scala.collection.immutable.Stream.$anonfun$map$1(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1171)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1161)
at scala.collection.immutable.Stream.$anonfun$map$1(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1171)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1161)
at scala.collection.immutable.Stream.force(Stream.scala:274)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:432)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:356)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:356)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:356)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:336)
at com.nvidia.spark.rapids.GpuOverrides.addSortsIfNeeded(GpuOverrides.scala:2854)
at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:2814)
at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:2787)
at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:2776)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1(Columnar.scala:514)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1$adapted(Columnar.scala:513)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:513)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:482)
at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$1(QueryExecution.scala:324)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:324)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:112)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:138)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:138)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:112)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:105)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:126)
at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3200)
at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3198)
at org.apache.spark.sql.rapids.execution.InternalColumnarRddConverter$.convert(InternalColumnarRddConverter.scala:485)
at com.nvidia.spark.rapids.ColumnarRdd$.convert(ColumnarRdd.scala:47)
at com.nvidia.spark.rapids.ColumnarRdd.convert(ColumnarRdd.scala)
... 22 more
Caused by: java.lang.ClassNotFoundException: ai.rapids.cudf.ColumnView
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 82 more
21/04/22 02:11:38 INFO RabitTracker$TrackerProcessLogger: Tracker Process ends with exit code 143
21/04/22 02:11:38 INFO SparkUI: Stopped Spark web UI at http://7e7a98e233be:4040
21/04/22 02:11:38 INFO StandaloneSchedulerBackend: Shutting down all executors
21/04/22 02:11:38 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
21/04/22 02:11:38 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/04/22 02:11:38 INFO MemoryStore: MemoryStore cleared
21/04/22 02:11:38 INFO BlockManager: BlockManager stopped
21/04/22 02:11:38 INFO BlockManagerMaster: BlockManagerMaster stopped
21/04/22 02:11:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/04/22 02:11:38 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
File "/opt/xgboost/main.py", line 18, in <module>
main()
File "/opt/xgboost/samples.zip/com/nvidia/spark/examples/main.py", line 21, in main
File "/opt/xgboost/samples.zip/com/nvidia/spark/examples/mortgage/gpu_main.py", line 41, in main
File "/opt/xgboost/samples.zip/com/nvidia/spark/examples/utility/utils.py", line 46, in with_benchmark
File "/opt/xgboost/samples.zip/com/nvidia/spark/examples/mortgage/gpu_main.py", line 41, in <lambda>
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 129, in fit
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 321, in _fit
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 318, in _fit_java
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 128, in deco
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o57.fit.
: java.lang.NoClassDefFoundError: ai/rapids/cudf/ColumnView
at com.nvidia.spark.rapids.CastExprMeta.convertToGpu(GpuCast.scala:88)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:755)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:747)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:755)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:747)
at com.nvidia.spark.rapids.GpuOverrides$$anon$147.$anonfun$convertToGpu$21(GpuOverrides.scala:2490)
at scala.collection.immutable.Stream.$anonfun$map$1(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1171)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1161)
at scala.collection.immutable.Stream.$anonfun$map$1(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1171)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1161)
at scala.collection.immutable.Stream.force(Stream.scala:274)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:432)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:356)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:356)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:336)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:356)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:336)
at com.nvidia.spark.rapids.GpuOverrides.addSortsIfNeeded(GpuOverrides.scala:2854)
at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:2814)
at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:2787)
at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:2776)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1(Columnar.scala:514)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1$adapted(Columnar.scala:513)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:513)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:482)
at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$1(QueryExecution.scala:324)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:324)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:112)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:138)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:138)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:112)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:105)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:126)
at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3200)
at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3198)
at org.apache.spark.sql.rapids.execution.InternalColumnarRddConverter$.convert(InternalColumnarRddConverter.scala:485)
at com.nvidia.spark.rapids.ColumnarRdd$.convert(ColumnarRdd.scala:47)
at com.nvidia.spark.rapids.ColumnarRdd.convert(ColumnarRdd.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuUtils$.toColumnarRdd(GpuUtils.scala:39)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainOnGpuInternal(GpuXGBoost.scala:240)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainDistributedOnGpu(GpuXGBoost.scala:186)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.trainOnGpu(GpuXGBoost.scala:91)
at ml.dmlc.xgboost4j.scala.spark.rapids.GpuXGBoost$.fitOnGpu(GpuXGBoost.scala:52)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.fit(XGBoostClassifier.scala:170)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.fit(XGBoostClassifier.scala:41)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: ai.rapids.cudf.ColumnView
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 82 more
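For what it's worth, java.lang.NoClassDefFoundError: ai/rapids/cudf/ColumnView usually means the cudf jar on the driver/executor classpath is missing or predates the release the rapids-4-spark plugin was built against (ColumnView is a newer cudf Java class). A sketch of a spark-submit that pins both jars explicitly follows; all paths, versions, and the master URL are assumptions to adapt to your cluster:

```shell
# Sketch only -- the jar versions must match: use the cudf release documented
# for your rapids-4-spark version (one new enough to contain ai.rapids.cudf.ColumnView).
CUDF_JAR=/opt/jars/cudf-<version>-cuda10-2.jar
RAPIDS_JAR=/opt/jars/rapids-4-spark_2.12-<version>.jar

spark-submit \
  --master spark://<master-host>:7077 \
  --jars ${CUDF_JAR},${RAPIDS_JAR} \
  --conf spark.driver.extraClassPath=${CUDF_JAR}:${RAPIDS_JAR} \
  --conf spark.executor.extraClassPath=${CUDF_JAR}:${RAPIDS_JAR} \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --py-files samples.zip \
  main.py
```

Pinning both extraClassPath entries, not just --jars, matters here because the plugin classes load before the application jars are distributed.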

jar file name error

In init-notebook-for-rapids-spark-xgboost-on-databricks-gpu-7.0-ml.ipynb, line 28, "wget -O rapids-4-spark_2.12-0.5.0.jar ……" should use rapids-4-spark_2.12-0.6.0.jar instead.
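For reference, the corrected line would look like the following sketch; the download URL (standard Maven Central layout) and the output path are assumptions, since the original command is truncated in the report:

```shell
# Assumed corrected form of line 28 of the init notebook:
wget -O /databricks/jars/rapids-4-spark_2.12-0.6.0.jar \
  https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.6.0/rapids-4-spark_2.12-0.6.0.jar
```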

Add issue template to this repository

Previously filed issues lack some key information, such as the running environment and the command and parameters used. We should add an issue template like other projects do, e.g. spark-rapids:

**Describe the bug**
A clear and concise description of what the bug is.

**Steps/Code to reproduce bug**
Please provide a list of steps or a code sample to reproduce the issue.
Avoid posting private or sensitive data.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Environment details (please complete the following information)**
 - Environment location: [Standalone, YARN, Kubernetes, Cloud(specify cloud provider)]
 - Spark configuration settings related to the issue
