
Comments (6)

maguilarh commented on August 23, 2024

I had to add parso-1.2.1 to be able to run the Spark data source.

If I don't add this jar to my project, I get the following exception:

Name: java.lang.NoClassDefFoundError
Message: com/ggasoftware/parso/SasFileReader
StackTrace: com.github.saurfang.sas.spark.SasRelation.inferSchema(SasRelation.scala:97)
com.github.saurfang.sas.spark.SasRelation.<init>(SasRelation.scala:31)
com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:34)
com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:23)
com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:11)
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
$line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
$line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
$line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
$line23.$read$$iwC$$iwC$$iwC.<init>(<console>:29)
$line23.$read$$iwC$$iwC.<init>(<console>:31)
$line23.$read$$iwC.<init>(<console>:33)
$line23.$read.<init>(<console>:35)
$line23.$read$.<init>(<console>:39)
$line23.$read$.<clinit>(<console>)
$line23.$eval$.<init>(<console>:7)
$line23.$eval$.<clinit>(<console>)
$line23.$eval.$print(<console>)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:483)
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
org.apache.toree.kernel.interpreter.scala.ScalaInterpreter$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreter.scala:356)
org.apache.toree.kernel.interpreter.scala.ScalaInterpreter$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreter.scala:351)
org.apache.toree.global.StreamState$.withStreams(StreamState.scala:81)
org.apache.toree.kernel.interpreter.scala.ScalaInterpreter$$anonfun$interpretAddTask$1.apply(ScalaInterpreter.scala:350)
org.apache.toree.kernel.interpreter.scala.ScalaInterpreter$$anonfun$interpretAddTask$1.apply(ScalaInterpreter.scala:350)
org.apache.toree.utils.TaskManager$$anonfun$add$2$$anon$1.run(TaskManager.scala:140)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
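In an sbt-built project, declaring Parso explicitly is one way to put `com.ggasoftware.parso.SasFileReader` on the classpath. This is a minimal sketch; it assumes Parso 1.2.1 is published under the Maven coordinates `com.ggasoftware:parso:1.2.1`, so check that against your repository before relying on it.

```scala
// build.sbt (sketch): declare Parso explicitly so that
// com.ggasoftware.parso.SasFileReader is available at runtime.
// Assumption: Parso 1.2.1 is published as com.ggasoftware:parso:1.2.1.
libraryDependencies += "com.ggasoftware" % "parso" % "1.2.1"
```

When working from `spark-shell` rather than an sbt project, the same coordinates could instead be passed with `--packages com.ggasoftware:parso:1.2.1`, under the same assumption about where the artifact is published.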

from spark-sas7bdat.

AbdealiLoKo commented on August 23, 2024

I added Parso 1.2.1 to my project too (because of the NoClassDefFoundError mentioned above) and am getting a very similar ArrayIndexOutOfBoundsException.

My error is:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 2928
        at com.ggasoftware.parso.SasFileParser.trimBytesArray(SasFileParser.java:1312)
        at com.ggasoftware.parso.SasFileParser.processByteArrayWithData(SasFileParser.java:1111)
        at com.ggasoftware.parso.SasFileParser.access$3300(SasFileParser.java:33)
        at com.ggasoftware.parso.SasFileParser$DataSubheader.processSubheader(SasFileParser.java:857)
        at com.ggasoftware.parso.SasFileParser.readNext(SasFileParser.java:878)

Any solution?


saurfang commented on August 23, 2024

For a dataset as small as this, I would highly recommend converting it with a standalone program instead. https://github.com/tidyverse/haven is an excellent choice.


AbdealiLoKo commented on August 23, 2024

I got the error on a 10 GB data file...


saurfang commented on August 23, 2024

@AbdealiJK Sorry, but I'm afraid I don't have the capacity to work on #10, which could solve your issue. You are more than welcome to contribute a PR if you're interested; I'd be happy to review it.


nelson2005 commented on August 23, 2024

This may not be completely solved; it seems quite similar to #32.
