
Comments (24)

thesuperzapper commented on August 23, 2024

I am also working on PR #59.

from spark-sas7bdat.

thesuperzapper commented on August 23, 2024

They should definitely be separate PRs (as long as Spark 3.0.0 works without those changes)


saurfang commented on August 23, 2024

Thanks to @thesuperzapper and others, a new version with Spark 3.x compatibility is now released. Please see the latest version in the README, give it a try, and see if it works for you.


pkolli-caredx commented on August 23, 2024

I also tried updating the Scala version and rebuilding the JAR, but I'm hitting compatibility issues during the rebuild.


srowen commented on August 23, 2024

I got it compiling with minor changes, and seemingly just one test failure, related to the SQLContext implicit (which is not really to be used in Spark 3). I haven't tested it but getting a compiled JAR is easy enough. I could open a PR (or branch) as an example if anyone wants to try it or run with it.


pkolli-caredx commented on August 23, 2024

@srowen could you please share the JAR? I will test it on Databricks 7.0 ML with Scala 2.12.


srowen commented on August 23, 2024

OK I'm curious to know if this Just Works:
https://drive.google.com/file/d/1VnF0gnG88ClyXWoHMlXFgvHcVUY7Yo-j/view?usp=sharing


pkolli-caredx commented on August 23, 2024

java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: com/epam/parso/impl/SasFileReaderImpl


Py4JJavaError Traceback (most recent call last)
in
----> 1 df=spark.read.format('com.github.saurfang.sas.spark').load("/mnt/tdp/External_DataRepository/USRDS/1_ESRD core/CORE/Datasets/capd.sas7bdat")

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
176 self.options(**options)
177 if isinstance(path, basestring):
--> 178 return self._df(self._jreader.load(path))
179 elif path is not None:
180 if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
126 def deco(*a, **kw):
127 try:
--> 128 return f(*a, **kw)
129 except py4j.protocol.Py4JJavaError as e:
130 converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(

Py4JJavaError: An error occurred while calling o227.load.
: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: com/epam/parso/impl/SasFileReaderImpl
at com.github.saurfang.sas.spark.SasRelation.inferSchema(SasRelation.scala:186)
at com.github.saurfang.sas.spark.SasRelation.<init>(SasRelation.scala:73)
at com.github.saurfang.sas.spark.SasRelation$.apply(SasRelation.scala:45)
at com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:209)
at com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:42)
at com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:27)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:364)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:366)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:355)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:355)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: com/epam/parso/impl/SasFileReaderImpl
... 23 more
Caused by: java.lang.ClassNotFoundException: com.epam.parso.impl.SasFileReaderImpl
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)


srowen commented on August 23, 2024

Oops, right, I needed to produce the assembly JAR. Try this:

https://drive.google.com/file/d/1N_i-I34rFzboJ7J_aIs2rirGKjjIawDv/view?usp=sharing


pkolli-caredx commented on August 23, 2024

df10 =spark.read.format('com.github.saurfang.sas.spark').load("/mnt/tdp/*/capd.sas7bdat")

java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError:


Py4JJavaError Traceback (most recent call last)
in
----> 1 df10 =spark.read.format('com.github.saurfang.sas.spark').load("/mnt/tdp/External_DataRepository/USRDS/1_ESRD core/CORE/Datasets/capd.sas7bdat")

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
176 self.options(**options)
177 if isinstance(path, basestring):
--> 178 return self._df(self._jreader.load(path))
179 elif path is not None:
180 if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
126 def deco(*a, **kw):
127 try:
--> 128 return f(*a, **kw)
129 except py4j.protocol.Py4JJavaError as e:
130 converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(

Py4JJavaError: An error occurred while calling o241.load.
: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError:
at com.github.saurfang.sas.spark.SasRelation.inferSchema(SasRelation.scala:186)
at com.github.saurfang.sas.spark.SasRelation.<init>(SasRelation.scala:73)
at com.github.saurfang.sas.spark.SasRelation$.apply(SasRelation.scala:45)
at com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:209)
at com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:42)
at com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:27)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:364)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:366)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:355)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:355)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)


srowen commented on August 23, 2024

Darn, OK, maybe this doesn't Just Work. That's a weird error though. It doesn't say what class isn't found. And it compiles, so that's surprising. hm, not sure I have other ideas, sorry. But the changes to compile are at least not hard!


pkolli-caredx commented on August 23, 2024

Thanks for the help :) looking for help from the core team @mulya @thadeusb @forest Fang


Tagar commented on August 23, 2024

@srowen thank you!

It says it didn't find the parso library:

Py4JJavaError: An error occurred while calling o227.load.
: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: com/epam/parso/impl/SasFileReaderImpl

Perhaps that JAR should also contain the parso library?
https://mvnrepository.com/artifact/com.epam/parso

spark-sas7bdat knows how to split sas7bdat files, but actual parsing happens in the parso library.


srowen commented on August 23, 2024

Oh yeah, ignore that. The Maven-published version declares parso as a dependency, so apps will correctly pick it up. For this test with only a JAR file, I needed to create the assembly JAR with dependencies bundled. The second error is what I'm now looking at.
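
One quick way to tell a thin JAR from an assembly JAR is to look inside it for the parso class named in the first error. This is only a minimal sketch: the helper name `jar_has_class` and the JAR filename in the usage note are hypothetical, not part of spark-sas7bdat. It relies only on the fact that a JAR is a ZIP archive in which class `a.b.C` is stored as the entry `a/b/C.class`.

```python
import zipfile


def jar_has_class(jar_path: str, class_name: str) -> bool:
    """Return True if the JAR contains the given fully qualified class.

    Hypothetical helper for illustration; not part of spark-sas7bdat.
    """
    # A JAR is a ZIP archive; class a.b.C lives at the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, `jar_has_class("spark-sas7bdat-assembly.jar", "com.epam.parso.impl.SasFileReaderImpl")` (the filename is an assumption) should be False for a thin JAR and True once dependencies are bundled.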


Tagar commented on August 23, 2024

Makes sense, got it now. Yep, it's weird that it didn't report the class name in the exception.

cc @saurfang


thesuperzapper commented on August 23, 2024

I bet this error is caused by us bypassing method protection via reflection. (We use it to access private methods of parso classes.)

I asked them to make the methods I needed public ages ago, see here


srowen commented on August 23, 2024

Could be, though I might expect a different error, not 'class not found'. It's entirely possible my assembly JAR isn't quite right. For example, I notice the assembly from the build includes the Scala libs (including scala-reflect), which may cause some problem, though the Scala version matches Spark 3's.
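
To check whether an assembly really does ship the Scala libraries, one option is to count the class files under each top-level package directory. This is a rough sketch with hypothetical helper names; it assumes only that Scala library classes (including scala-reflect) live under the `scala/` package.

```python
import zipfile
from collections import Counter


def top_level_packages(jar_path: str) -> Counter:
    """Count .class entries in a JAR by their top-level package directory."""
    counts = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            # Skip non-class entries (manifests, resources) and root-level classes.
            if name.endswith(".class") and "/" in name:
                counts[name.split("/", 1)[0]] += 1
    return counts


def bundles_scala_library(jar_path: str) -> bool:
    """True if the JAR ships classes under scala/ (Scala libs were bundled)."""
    # Counter returns 0 for a missing key, so this is safe for thin JARs.
    return top_level_packages(jar_path)["scala"] > 0
```

If `bundles_scala_library` returns True for the assembly, the bundled Scala classes sit on the classpath alongside the ones Spark already provides, which is the situation described above.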


saurfang commented on August 23, 2024

@srowen can you open a PR even if it is a WIP?


srowen commented on August 23, 2024

Sure, it's trivial, but: #58


Tagar commented on August 23, 2024

@thesuperzapper
Parso now has those fixes as part of 2.0.12-SNAPSHOT, deployed to Maven.
Would it make sense to adjust PR #59 to use this new snapshot, or does it have to wait for the full release? Thx!


pkolli-caredx commented on August 23, 2024

@srowen Could you please send us the updated JAR?


srowen commented on August 23, 2024

@pkolli-caredx can you just build a new JAR from #59?


pkolli-caredx commented on August 23, 2024

@srowen I'm getting build errors; the owner of the repo should merge.


Tagar commented on August 23, 2024

Thank you @thesuperzapper and @saurfang

