fast-avro-storage's Issues

com.linkedin.pig.FastAvroStorage fails to load enron.avro

enron.avro is available here: https://s3.amazonaws.com/rjurney.public/enron.avro
Documentation on the file and how it was created is here:

http://hortonworks.com/blog/the-data-lifecycle-part-one-avroizing-the-enron-emails/
https://github.com/rjurney/enron-avro

The error and script to reproduce are here:

grunt> REGISTER target/fastavrostorage-0.1-SNAPSHOT.jar
grunt> emails = LOAD '/me/Data/enron.avro' using com.linkedin.pig.FastAvroStorage();
2012-10-26 16:11:53,810 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245:
<line 1, column 9> Cannot get schema from loadFunc com.linkedin.pig.FastAvroStorage
2012-10-26 16:11:53,810 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2245:
<line 1, column 9> Cannot get schema from loadFunc com.linkedin.pig.FastAvroStorage
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:155)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110)
at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:219)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1635)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: avroSchemaToResourceSchema only processes records
at com.linkedin.pig.FastAvroStorageCommon.avroSchemaToResourceSchema(FastAvroStorageCommon.java:194)
at com.linkedin.pig.FastAvroStorageCommon.fieldToResourceFieldSchema(FastAvroStorageCommon.java:120)
at com.linkedin.pig.FastAvroStorageCommon.avroSchemaToResourceSchema(FastAvroStorageCommon.java:190)
at com.linkedin.pig.FastAvroStorageCommon.avroSchemaToResourceSchema(FastAvroStorageCommon.java:92)
at com.linkedin.pig.FastAvroStorage.getSchema(FastAvroStorage.java:118)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
... 22 more
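
The "only processes records" message suggests the schema converter reached a non-record type somewhere in the file's schema. One way to see what enron.avro actually declares is to read the schema straight out of the Avro container header. What follows is a minimal sketch in Python, standard library only, that assumes the standard Avro object container layout (magic bytes, then a metadata map with an `avro.schema` entry). The function names are made up for illustration and are not part of FastAvroStorage or Avro, and the script only prints field types; it does not identify the bug.

```python
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    acc = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated varint")
        acc |= (byte[0] & 0x7F) << shift
        if not (byte[0] & 0x80):
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)


def read_bytes(stream):
    """Decode one Avro bytes/string value: a long length, then the payload."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated value")
    return data


def read_header_metadata(stream):
    """Return the header metadata map as {str: bytes}."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def embedded_schema(stream):
    """Parse the JSON schema stored under the 'avro.schema' key."""
    return json.loads(read_header_metadata(stream)["avro.schema"].decode("utf-8"))


def summarize(avro_type):
    """Render an Avro type as a short string, e.g. 'array<record Addr>'."""
    if isinstance(avro_type, str):
        return avro_type
    if isinstance(avro_type, list):  # a JSON list is a union
        return "union[" + ", ".join(summarize(b) for b in avro_type) + "]"
    kind = avro_type["type"]
    if kind == "array":
        return "array<" + summarize(avro_type["items"]) + ">"
    if kind == "map":
        return "map<" + summarize(avro_type["values"]) + ">"
    if kind == "record":
        return "record " + avro_type.get("name", "?")
    return kind if isinstance(kind, str) else summarize(kind)


def describe_fields(schema):
    """Return (field name, type summary) pairs for a top-level record schema."""
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise ValueError("top-level schema is not a record")
    return [(f["name"], summarize(f["type"])) for f in schema["fields"]]
```

To use it against the file from this report, open it in binary mode and print the result, for example `with open('/me/Data/enron.avro', 'rb') as f: print(describe_fields(embedded_schema(f)))`. Fields that show up as unions or as arrays of something other than a record would be the first places to look.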

Unable to store Enron emails in TrevniStorage format

grunt> register target/fastavrostorage-0.1-SNAPSHOT.jar
grunt> emails = LOAD '/me/Data/enron.avro' using com.linkedin.pig.FastAvroStorage();
grunt> describe emails
emails: {message_id: chararray,date: chararray,from: (address: chararray,name: chararray),subject: chararray,body: chararray,tos: {(())},ccs: {(())},bccs: {(())}}
grunt> store emails into '/me/Data/enron.trevni' using com.linkedin.pig.TrevniStorage();
2012-10-30 12:58:20,138 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-10-30 12:58:20,188 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias emails
2012-10-30 12:58:20,188 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias emails
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1552)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.NullPointerException
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:220)
at com.linkedin.pig.FastAvroStorageCommon.resourceFieldSchemaToAvroSchema(FastAvroStorageCommon.java:295)
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:221)
at com.linkedin.pig.FastAvroStorageCommon.resourceFieldSchemaToAvroSchema(FastAvroStorageCommon.java:251)
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:221)
at com.linkedin.pig.FastAvroStorage.checkSchema(FastAvroStorage.java:281)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:65)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
at org.apache.pig.PigServer.execute(PigServer.java:1245)
at org.apache.pig.PigServer.access$400(PigServer.java:127)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1547)
... 13 more

Details also at logfile: /private/tmp/pig_1351627054390.log

mvn install fails :(

[ERROR] Failed to execute goal on project fastavrostorage: Could not resolve dependencies for project com.linkedin:fastavrostorage:jar:0.1-SNAPSHOT: The following artifacts could not be resolved: org.apache.hadoop:hadoop-core:jar:1.0.2-p1, voldemort:voldemort:jar:0.96.li3: Could not find artifact org.apache.hadoop:hadoop-core:jar:1.0.2-p1 in central (http://repo1.maven.org/maven2) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

Unable to store Enron emails in FastAvroStorage Format

test.pig:

register target/fastavrostorage-0.1-SNAPSHOT.jar
emails = LOAD '/me/Data/enron.avro' using com.linkedin.pig.FastAvroStorage();
store emails into '/me/Data/test.avro' using com.linkedin.pig.FastAvroStorage();
-- store emails into '/me/Data/enron.trevni' using com.linkedin.pig.TrevniStorage();

Error:

Russells-MacBook-Pro:fast-avro-storage rjurney$ pig -l /tmp -x local -v -w test.pig
Warning: $HADOOP_HOME is deprecated.

2012-10-30 13:15:56,702 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0-SNAPSHOT (rexported) compiled Aug 31 2012, 17:08:01
2012-10-30 13:15:56,702 [main] INFO org.apache.pig.Main - Logging error messages to: /private/tmp/pig_1351628156682.log
2012-10-30 13:15:56.796 java[1504:1203] Unable to load realm info from SCDynamicStore
2012-10-30 13:15:56,943 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2012-10-30 13:15:57,366 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-10-30 13:15:57,406 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
2012-10-30 13:15:57,406 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.NullPointerException
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:220)
at com.linkedin.pig.FastAvroStorageCommon.resourceFieldSchemaToAvroSchema(FastAvroStorageCommon.java:295)
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:221)
at com.linkedin.pig.FastAvroStorageCommon.resourceFieldSchemaToAvroSchema(FastAvroStorageCommon.java:251)
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:221)
at com.linkedin.pig.FastAvroStorage.checkSchema(FastAvroStorage.java:281)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:65)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
at org.apache.pig.PigServer.execute(PigServer.java:1245)
at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Details also at logfile: /private/tmp/pig_1351628156682.log
