
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.

Home Page: http://www.aliyun.com/product/emapreduce


E-MapReduce DataSources

Requirements

  • Spark 1.3+

Introduction

  • This project supports interacting with Aliyun's base services (e.g. OSS, ODPS, LogService, and ONS) from the Spark runtime environment; a minimal read example is sketched below.
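
As a sketch of what this enables (the configuration key names follow the Hadoop OSS connector convention; the endpoint, bucket, and credentials are placeholders):

        import org.apache.spark.{SparkConf, SparkContext}

        val conf = new SparkConf().setAppName("OssReadSketch")
        // Placeholder credentials and endpoint; substitute your own values.
        conf.set("spark.hadoop.fs.oss.accessKeyId", "<accessKeyId>")
        conf.set("spark.hadoop.fs.oss.accessKeySecret", "<accessKeySecret>")
        conf.set("spark.hadoop.fs.oss.endpoint", "oss-cn-hangzhou.aliyuncs.com")
        val sc = new SparkContext(conf)

        // Read an OSS path as an RDD of lines, like any Hadoop-compatible filesystem.
        val lines = sc.textFile("oss://<your-bucket>/path/to/input")
        println(lines.count())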

Build and Install


        git clone https://github.com/aliyun/aliyun-emapreduce-datasources.git
        cd aliyun-emapreduce-datasources
        mvn clean package -DskipTests

Build emr-maxcompute with Spark 3.2.0

        git clone https://github.com/aliyun/aliyun-emapreduce-datasources.git
        cd aliyun-emapreduce-datasources/emr-maxcompute/
        mvn clean package -Pspark3 -DskipTests

Use SDK in Eclipse project directly

  • copy the SDK jar into your project
  • right-click the Eclipse project -> Properties -> Java Build Path -> Add JARs
  • choose and import the SDK jar
  • you can now use the SDK in your Eclipse project

Maven

        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-maxcompute_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>

        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-logservice_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>

        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-tablestore</artifactId>
            <version>2.2.0</version>
        </dependency>

        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-ons_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>

        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-mns_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        
        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-redis_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        
        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-hbase_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        
        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-jdbc_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        
        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-dts_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        
        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-kudu_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>

        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-datahub_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>

        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-druid_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        
        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-sql_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        
        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-oss</artifactId>
            <version>2.0.0</version>
        </dependency>
        
        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-common</artifactId>
            <version>2.2.0</version>
        </dependency>
        
        <dependency>
            <groupId>com.aliyun.emr</groupId>
            <artifactId>emr-kafka-client-metrics</artifactId>
            <version>2.2.0</version>
        </dependency>

Run tests

JindoFS/OSS support

MaxCompute support

ONS support

LogService support

TableStore support

License

Licensed under the Apache License 2.0

aliyun-emapreduce-datasources's People

Contributors

foresightyj, frankleaf, jiangmeng0606, jkaka, kexianda, legendtkl, liketic, liukaitj, noahli, okingniko, plusplusjiajia, powerwu, realxujiang, sundapeng, tianshuang, unclegen, waitinfuture, wenxuanguan, windpiger, wsu13


aliyun-emapreduce-datasources's Issues

Redefine OSS URI

oss:// and ossn:// are easily confused.

A better choice:

oss:// -> ossbfs://
ossn:// -> oss://
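
For context on the mechanics: Hadoop binds each URI scheme to a FileSystem class via fs.<scheme>.impl, so the proposed rename would amount to re-binding those keys (a sketch; the block-store class name below is hypothetical):

        import org.apache.hadoop.conf.Configuration

        val conf = new Configuration()
        // Proposed mapping: the block-based store moves to ossbfs://
        // (hypothetical class name for illustration only).
        conf.set("fs.ossbfs.impl", "com.aliyun.fs.oss.blk.OssFileSystem")
        // ...and oss:// points at the native store.
        conf.set("fs.oss.impl", "com.aliyun.fs.oss.nat.NativeOssFileSystem")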

Throw "com.aliyun.oss.ClientException: Read timed out " when copy large object

com.aliyun.oss.ClientException: Read timed out

at com.aliyun.oss.common.utils.ExceptionFactory.createNetworkException(ExceptionFactory.java:65)
at com.aliyun.oss.common.comm.DefaultServiceClient.sendRequestCore(DefaultServiceClient.java:62)
at com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:126)
at com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:72)
at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:92)
at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:133)
at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:111)
at com.aliyun.oss.internal.OSSObjectOperation.copyObject(OSSObjectOperation.java:298)
at com.aliyun.oss.OSSClient.copyObject(OSSClient.java:465)
at com.aliyun.oss.OSSClient.copyObject(OSSClient.java:459)

Could a sample SBT project be provided?

Could you provide a sample SBT project that automatically downloads the org.* package dependencies and makes it easy to integrate the com.* package dependencies at runtime, like osscmd, so that changes to the sdk and core modules can be tracked and interface changes verified?

That way, the tests would be useful at both the build and run stages.

One conf.set entry on the official site differs from the GitHub example

The official documentation sets one conf entry, as follows:

conf.set("spark.hadoop.fs.oss.impl", "com.aliyun.fs.oss.nat.NativeOssFileSystem")

This line has been removed from the GitHub README. Is the setting still in effect?
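
For reference, applying the documented setting in a Spark job looks like this (a sketch; whether it is still required is exactly what this issue asks):

        import org.apache.spark.SparkConf

        // Explicitly bind the oss:// scheme to the native OSS FileSystem,
        // as the official documentation describes.
        val conf = new SparkConf()
          .set("spark.hadoop.fs.oss.impl", "com.aliyun.fs.oss.nat.NativeOssFileSystem")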

[Bug]: GC overhead limit exceeded when listing a large number of OSS objects

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.write(ReflectiveTypeAdapterFactory.java:87)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.write(ReflectiveTypeAdapterFactory.java:195)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:68)
at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:96)
at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:60)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:68)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.write(ReflectiveTypeAdapterFactory.java:89)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.write(ReflectiveTypeAdapterFactory.java:195)
at com.google.gson.Gson.toJson(Gson.java:593)
at com.google.gson.Gson.toJson(Gson.java:572)
at com.google.gson.Gson.toJson(Gson.java:527)
at com.google.gson.Gson.toJson(Gson.java:507)
at com.aliyun.fs.oss.utils.OSSClientAgent.listObjects(OSSClientAgent.java:204)
at com.aliyun.fs.oss.nat.JetOssNativeFileSystemStore.list(JetOssNativeFileSystemStore.java:369)
at com.aliyun.fs.oss.nat.JetOssNativeFileSystemStore.list(JetOssNativeFileSystemStore.java:358)
at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy4.list(Unknown Source)
at com.aliyun.fs.oss.nat.NativeOssFileSystem.listStatus(NativeOssFileSystem.java:375)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1682)
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1681)
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1664)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPathRecursively(FileInputFormat.java:341)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPathRecursively(FileInputFormat.java:346)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPathRecursively(FileInputFormat.java:346)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPathRecursively(FileInputFormat.java:346)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPathRecursively(FileInputFormat.java:346)

java.lang.IllegalArgumentException: Path must be absolute: null/part-00009

java.lang.IllegalArgumentException: Path must be absolute: null/part-00009
at com.aliyun.fs.oss.nat.NativeOssFileSystem.pathToKey(NativeOssFileSystem.java:221)
at com.aliyun.fs.oss.nat.JetOssNativeFileSystemStore.storeFile(JetOssNativeFileSystemStore.java:256)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.storeFile(Unknown Source)
at com.aliyun.fs.oss.nat.NativeOssFileSystem$NativeOssFsOutputStream.close(NativeOssFileSystem.java:146)

Suppress some noisy logs

like:

15/11/26 13:05:54 WARN comm.ServiceClient: Unable to execute HTTP request: Not Found 
[ErrorCode]: NoSuchKey 
[RequestId]: 8239BAIJio
[HostId]: null 
15/11/26 13:05:54 WARN comm.ServiceClient: Unable to execute HTTP request: Not Found 
[ErrorCode]: NoSuchKey 
[RequestId]: 56569332119C58B90458429D 
[HostId]: null 
15/11/26 13:05:54 INFO nat.NativeOssFileSystem: OSS File Path can not start with "/", so we need to scratch the first "/". 
15/11/26 13:05:54 WARN comm.ServiceClient: Unable to execute HTTP request: Not Found 
[ErrorCode]: NoSuchKey 
[RequestId]: AJLSD7A12
[HostId]: null 
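
One way to quiet these (a sketch assuming log4j 1.x, which Spark of this era ships with) is to raise the log level for the OSS client packages at runtime:

        import org.apache.log4j.{Level, Logger}

        // Silence the chatty OSS HTTP-retry warnings.
        Logger.getLogger("com.aliyun.oss").setLevel(Level.ERROR)
        Logger.getLogger("com.aliyun.fs.oss").setLevel(Level.ERROR)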

Package dependencies and com.aliyun.fs.oss.nat.NativeOssFileSystem

The core module compiles and mvn test is fine, though tests can be skipped when producing the .jar.
SDK -> depends on Core

I built emr-sdk_2.0 with tests skipped, produced the jar, and installed it into the local repository.

I then added the emr-sdk dependency to the .pom of the client program that calls the emr-sdk jar:

<dependency>
  <groupId>com.aliyun</groupId>
  <artifactId>emr-sdk_2.10</artifactId>
  <version>1.0.0</version>
</dependency>

Since emr-sdk had already been mvn install-ed, the compile stage passes.
At runtime, however, even with core and sdk bundled into a single jar, the core class com.aliyun.fs.oss.nat.NativeOssFileSystem still cannot be found.

If core is instead packaged on its own and placed on the project classpath, a different exception appears:

com/aliyun/oss/ServiceException

org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body

Error: org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 24368795; received: 8317227)
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:180)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at com.aliyun.oss.event.ProgressInputStream.read(ProgressInputStream.java:116)
at com.aliyun.fs.oss.nat.NativeOssFileSystem$NativeOssFsInputStream.read(NativeOssFileSystem.java:74)

java.lang.NumberFormatException: multiple points

py4j.protocol.Py4JJavaError: An error occurred while calling o27.saveToTable.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 1.0 failed 1 times, most recent failure: Lost task 15.0 in stage 1.0 (TID 35, localhost): java.lang.NumberFormatException: multiple points
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1890)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.text.DigitList.getDouble(DigitList.java:169)
at java.text.DecimalFormat.parse(DecimalFormat.java:2056)
at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1869)
at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514)
at java.text.DateFormat.parse(DateFormat.java:364)
at org.apache.spark.aliyun.odps.PythonOdpsAPI$$anonfun$org$apache$spark$aliyun$odps$PythonOdpsAPI$$writeTransfer$2.apply(PythonOdpsAPI.scala:146)
at org.apache.spark.aliyun.odps.PythonOdpsAPI$$anonfun$org$apache$spark$aliyun$odps$PythonOdpsAPI$$writeTransfer$2.apply(PythonOdpsAPI.scala:136)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.aliyun.odps.PythonOdpsAPI.org$apache$spark$aliyun$odps$PythonOdpsAPI$$writeTransfer(PythonOdpsAPI.scala:136)
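
For context (a general observation, not a confirmed diagnosis of this code path): NumberFormatException: multiple points thrown out of SimpleDateFormat.parse is the classic symptom of a single SimpleDateFormat instance being shared across threads, since the class is not thread-safe. A common workaround is one instance per thread:

        import java.text.SimpleDateFormat

        // SimpleDateFormat is not thread-safe; keep one instance per thread.
        // The pattern string here is a placeholder.
        val fmt = new ThreadLocal[SimpleDateFormat] {
          override def initialValue(): SimpleDateFormat =
            new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
        }

        def parseDate(s: String): java.util.Date = fmt.get().parse(s)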

Running the examples

        spark-submit --class com.aliyun.emr.examples.TestOss --executor-memory 2G --total-executor-cores 4 --driver-class-path lib/emr-sdk_2.10-1.0.0.jar target/emr-examples_2.10-1.0.0.jar

In the example class TestOss, the parameters originally read from args:

        val accessKeyId = ""
        val accessKeySecret = ""
        val endpoint = "http://oss-cn-beijing.aliyuncs.com"
        val inputPath = "oss://xxxlab-dns/20201010"
        val numPartitions = 2

have all been replaced with fixed values.

    <dependency>
        <groupId>com.aliyun</groupId>
        <artifactId>emr-sdk_2.10</artifactId>
        <version>1.0.0</version>
    </dependency>

The newer version of the code in git includes this by default.

The top 10 lines are:
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.aliyun.fs.oss.nat.NativeOssFileSystem not found

Is it possible to access OSS through the SDK as a standalone dependency jar, rather than bundling the SDK source together with the client code?

Class not found

After packaging as officially documented, I added emr-core-1.3.0-SNAPSHOT.jar (from core) and emr-sdk_2.10-1.3.0-SNAPSHOT.jar (from sdk) to spark-shell, and then got the following error:

java.lang.NoClassDefFoundError: com/aliyun/oss/OSSException
at com.aliyun.fs.oss.nat.JetOssNativeFileSystemStore.initialize(JetOssNativeFileSystemStore.java:152)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy30.initialize(Unknown Source)
at com.aliyun.fs.oss.nat.NativeOssFileSystem.initialize(NativeOssFileSystem.java:135)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1380)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1370)
at org.apache.spark.rdd.RDD$$anonfun$top$1.apply(RDD.scala:1351)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.top(RDD.scala:1350)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
at $iwC$$iwC$$iwC.<init>(<console>:43)
at $iwC$$iwC.<init>(<console>:45)
at $iwC.<init>(<console>:47)
at <init>(<console>:49)
at .<init>(<console>:53)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.aliyun.oss.OSSException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 89 more

Do I need to change anything? In the pom I see that only the spark and hadoop dependencies are marked as provided, so everything else should in principle be packaged in together. Why is the class still not found? Any pointers would be appreciated.

FAILED: RuntimeException Cannot create staging directory when running a Hive script

[hadoop@emr-header-1 ~]$ hadoop fs -mkdir oss://id:[email protected]/test6/11
mkdir: Permission denied

I found that /mnt/disk1/data/oss has user:group root:root:

[hadoop@emr-header-1 ~]$ cd /mnt/disk1/data
[hadoop@emr-header-1 data]$ ll
total 4
drwxr-xr-x 2 root root 4096 Apr 18 18:23 oss

After deleting /mnt/disk1/data, running the following command succeeds:

[hadoop@emr-header-1 ~]$ hadoop fs -mkdir oss://id:[email protected]/test6/11

NativeOssFileSystem.listStatus returns null

NativeOssFileSystem.listStatus should not return null when the directory on OSS contains nothing (no files or directories); otherwise it causes NPEs in many places.

Please make sure it returns an empty FileStatus[] instead; a caller-side guard is sketched below.
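
Until that is fixed, callers can defend against the null listing (a minimal sketch):

        import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

        // Treat a null listing as an empty directory instead of letting
        // the null propagate into an NPE downstream.
        def safeListStatus(fs: FileSystem, dir: Path): Array[FileStatus] =
          Option(fs.listStatus(dir)).getOrElse(Array.empty[FileStatus])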


With master-2.x, Spark SQL fails: Failed to find data source: org.apache.spark.aliyun.maxcompute.datasource

The error is below. How can this be resolved? Thanks.

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.aliyun.maxcompute.datasource. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:549)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:301)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at ODPSBySPARK2XSQL$.main(ODPSBySPARK2XSQL.scala:19)
at ODPSBySPARK2XSQL.main(ODPSBySPARK2XSQL.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.aliyun.maxcompute.datasource.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21$$anonfun$apply$12.apply(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21$$anonfun$apply$12.apply(DataSource.scala:533)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21.apply(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21.apply(DataSource.scala:533)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:533)
... 7 more
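
This error generally means the emr-maxcompute jar, which provides that datasource's DefaultSource, is not on the driver/executor classpath. A usage sketch, assuming the jar has been added via --jars or spark.jars (the option names below are illustrative placeholders; check the module's documentation):

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().appName("OdpsReadSketch").getOrCreate()

        // Resolves only if the emr-maxcompute jar is on the classpath.
        val df = spark.read
          .format("org.apache.spark.aliyun.maxcompute.datasource")
          .option("odpsUrl", "<odps endpoint>")   // placeholder
          .option("project", "<project>")         // placeholder
          .option("table", "<table>")             // placeholder
          .load()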

NPE when inputfile does not exist in OSS

        def readOssFileWithJava(
            path: String,
            minPartitions: Int): JavaRDD[String] = {
          new JavaRDD(readOssFile(path, minPartitions))
        }

When the path does not exist in OSS, this throws an NPE; a defensive check is sketched below.
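
A minimal caller-side guard (a sketch; validating existence up front turns the NPE into a clear error):

        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        // Fail fast with a descriptive message instead of a late NPE.
        def requireOssPath(path: String): Unit = {
          val p = new Path(path)
          val fs = p.getFileSystem(new Configuration())
          require(fs.exists(p), s"Input path does not exist in OSS: $path")
        }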
