josephadler / fast-avro-storage
Rewrite of Avro storage functions for Pig (with Trevni support)
grunt> register target/fastavrostorage-0.1-SNAPSHOT.jar
grunt> emails = LOAD '/me/Data/enron.avro' using com.linkedin.pig.FastAvroStorage();
grunt> describe emails
emails: {message_id: chararray,date: chararray,from: (address: chararray,name: chararray),subject: chararray,body: chararray,tos: {(())},ccs: {(())},bccs: {(())}}
grunt> store emails into '/me/Data/enron.trevni' using com.linkedin.pig.TrevniStorage();
2012-10-30 12:58:20,138 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-10-30 12:58:20,188 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias emails
2012-10-30 12:58:20,188 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias emails
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1552)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.NullPointerException
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:220)
at com.linkedin.pig.FastAvroStorageCommon.resourceFieldSchemaToAvroSchema(FastAvroStorageCommon.java:295)
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:221)
at com.linkedin.pig.FastAvroStorageCommon.resourceFieldSchemaToAvroSchema(FastAvroStorageCommon.java:251)
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:221)
at com.linkedin.pig.FastAvroStorage.checkSchema(FastAvroStorage.java:281)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:65)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
at org.apache.pig.PigServer.execute(PigServer.java:1245)
at org.apache.pig.PigServer.access$400(PigServer.java:127)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1547)
... 13 more
Details also at logfile: /private/tmp/pig_1351627054390.log
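The `describe emails` output shows `tos`, `ccs` and `bccs` as `{(())}`, which appears to be a bag of tuples whose inner tuple carries no field schema. The trace alternates between `resourceSchemaToAvroSchema` and `resourceFieldSchemaToAvroSchema` before the NPE at line 220. That pattern is consistent with the converter recursing into one of those nested tuples and dereferencing a null schema, though this is an inference from the trace, not something the report confirms. The library-free Java sketch below models that recursion, assuming the loader left the inner schema null, and adds a guard that names the offending field instead of throwing an NPE. `PigField`, `schemaToAvro`, `fieldToAvro` and the placeholder field name `t` are hypothetical; they are not FastAvroStorageCommon's real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, library-free model of a recursive Pig-to-Avro schema
 * conversion. Names are illustrative, not FastAvroStorageCommon's API.
 */
public class SchemaWalkSketch {

    enum PigType { CHARARRAY, TUPLE, BAG }

    /** A Pig field; 'inner' is null when the nested schema was not recovered. */
    static final class PigField {
        final String name;
        final PigType type;
        final List<PigField> inner;

        PigField(String name, PigType type, List<PigField> inner) {
            this.name = name;
            this.type = type;
            this.inner = inner;
        }
    }

    /** Converts a field list (a Pig schema) into an Avro-like type string. */
    static String schemaToAvro(String path, List<PigField> fields) {
        // The guard: report the offending field instead of dereferencing null.
        if (fields == null) {
            throw new IllegalArgumentException("No nested schema for field '" + path + "'");
        }
        List<String> parts = new ArrayList<>();
        for (PigField f : fields) {
            parts.add(f.name + ":" + fieldToAvro(path + "." + f.name, f));
        }
        return "record{" + String.join(",", parts) + "}";
    }

    /** Converts one field, recursing into tuples and bags. */
    static String fieldToAvro(String path, PigField f) {
        switch (f.type) {
            case CHARARRAY:
                return "string";
            case TUPLE:
                return schemaToAvro(path, f.inner);
            case BAG:
                // A bag of tuples maps to an Avro array of records; 'inner' holds the tuple's fields.
                return "array<" + schemaToAvro(path, f.inner) + ">";
            default:
                throw new IllegalStateException("Unhandled type " + f.type);
        }
    }

    public static void main(String[] args) {
        List<PigField> ok = Arrays.asList(
                new PigField("message_id", PigType.CHARARRAY, null),
                new PigField("from", PigType.TUPLE, Arrays.asList(
                        new PigField("address", PigType.CHARARRAY, null),
                        new PigField("name", PigType.CHARARRAY, null))));
        System.out.println(schemaToAvro("emails", ok));

        // Mirrors "tos: {(())}": a bag whose inner tuple has no recovered schema.
        List<PigField> broken = Arrays.asList(
                new PigField("tos", PigType.BAG, Arrays.asList(
                        new PigField("t", PigType.TUPLE, null))));
        try {
            schemaToAvro("emails", broken);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `record{message_id:string,from:record{address:string,name:string}}` for the well-formed schema, then `No nested schema for field 'emails.tos.t'` for the `{(())}` case. Whether the real fix is a guard like this, or recovering the lost inner schema on load, is left open here.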
enron.avro is available here: https://s3.amazonaws.com/rjurney.public/enron.avro
Documentation on the file, and on how it was created, is at these two links:
http://hortonworks.com/blog/the-data-lifecycle-part-one-avroizing-the-enron-emails/
https://github.com/rjurney/enron-avro
The error, and the script to reproduce it:
grunt> REGISTER target/fastavrostorage-0.1-SNAPSHOT.jar
grunt> emails = LOAD '/me/Data/enron.avro' using com.linkedin.pig.FastAvroStorage();
2012-10-26 16:11:53,810 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245:
<line 1, column 9> Cannot get schema from loadFunc com.linkedin.pig.FastAvroStorage
2012-10-26 16:11:53,810 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2245:
<line 1, column 9> Cannot get schema from loadFunc com.linkedin.pig.FastAvroStorage
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:155)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110)
at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:219)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1635)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: avroSchemaToResourceSchema only processes records
at com.linkedin.pig.FastAvroStorageCommon.avroSchemaToResourceSchema(FastAvroStorageCommon.java:194)
at com.linkedin.pig.FastAvroStorageCommon.fieldToResourceFieldSchema(FastAvroStorageCommon.java:120)
at com.linkedin.pig.FastAvroStorageCommon.avroSchemaToResourceSchema(FastAvroStorageCommon.java:190)
at com.linkedin.pig.FastAvroStorageCommon.avroSchemaToResourceSchema(FastAvroStorageCommon.java:92)
at com.linkedin.pig.FastAvroStorage.getSchema(FastAvroStorage.java:118)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
... 22 more
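This failure is on the load side. The `Caused by` frames show `avroSchemaToResourceSchema` (line 190) calling `fieldToResourceFieldSchema` (line 120), which hands a field's schema back to `avroSchemaToResourceSchema`, and that method rejects anything that is not a record. If a nested field is a non-record type, such as a nullable union `["null", record]` or an array, this is the error one would expect; the report does not say which Avro type in enron.avro actually triggered it. The library-free Java sketch below keeps the record-only check at the top level but dispatches per field, unwrapping `["null", X]` unions and mapping arrays to bags. All names (`AvroType`, `recordToPig`, `typeToPig`, `unwrapNullable`) are hypothetical, and the record shapes are loosely modeled on the `describe emails` output rather than on the actual enron.avro schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical, library-free model of Avro-to-Pig schema conversion. */
public class AvroToPigSketch {

    enum Kind { RECORD, STRING, ARRAY, UNION, NULL }

    /** Minimal stand-in for an Avro schema node; not the Avro library's Schema class. */
    static final class AvroType {
        final Kind kind;
        final List<String> fieldNames; // RECORD only
        final List<AvroType> children; // RECORD fields, ARRAY item, or UNION branches

        AvroType(Kind kind, List<String> fieldNames, List<AvroType> children) {
            this.kind = kind;
            this.fieldNames = fieldNames;
            this.children = children;
        }
    }

    static AvroType stringType() { return new AvroType(Kind.STRING, Collections.emptyList(), Collections.emptyList()); }
    static AvroType nullType() { return new AvroType(Kind.NULL, Collections.emptyList(), Collections.emptyList()); }
    static AvroType arrayOf(AvroType item) { return new AvroType(Kind.ARRAY, Collections.emptyList(), Collections.singletonList(item)); }
    static AvroType unionOf(AvroType... branches) { return new AvroType(Kind.UNION, Collections.emptyList(), Arrays.asList(branches)); }
    static AvroType recordOf(List<String> names, List<AvroType> types) { return new AvroType(Kind.RECORD, names, types); }

    /** Top-level entry point: like the original, it accepts only records. */
    static String recordToPig(AvroType t) {
        if (t.kind != Kind.RECORD) {
            throw new IllegalArgumentException("recordToPig only processes records, got " + t.kind);
        }
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < t.children.size(); i++) {
            parts.add(t.fieldNames.get(i) + ": " + typeToPig(t.children.get(i)));
        }
        return "(" + String.join(",", parts) + ")";
    }

    /** Returns X for a union ["null", X]; non-union types are returned unchanged. */
    static AvroType unwrapNullable(AvroType t) {
        if (t.kind != Kind.UNION) {
            return t;
        }
        AvroType nonNull = null;
        for (AvroType branch : t.children) {
            if (branch.kind == Kind.NULL) {
                continue;
            }
            if (nonNull != null) {
                throw new IllegalArgumentException("unions with several non-null branches are out of scope here");
            }
            nonNull = branch;
        }
        if (nonNull == null) {
            throw new IllegalArgumentException("union has no non-null branch");
        }
        return nonNull;
    }

    /** Per-field dispatch: only real records go back through recordToPig. */
    static String typeToPig(AvroType t) {
        AvroType r = unwrapNullable(t);
        switch (r.kind) {
            case STRING:
                return "chararray";
            case RECORD:
                return recordToPig(r);
            case ARRAY: {
                AvroType item = unwrapNullable(r.children.get(0));
                String inner = typeToPig(item);
                // A Pig bag holds tuples, so non-record items are wrapped in one.
                return "{" + (item.kind == Kind.RECORD ? inner : "(" + inner + ")") + "}";
            }
            default:
                throw new IllegalArgumentException("type not handled in this sketch: " + r.kind);
        }
    }

    public static void main(String[] args) {
        AvroType address = recordOf(Arrays.asList("address", "name"), Arrays.asList(stringType(), stringType()));
        AvroType email = recordOf(
                Arrays.asList("message_id", "from", "tos"),
                Arrays.asList(stringType(), unionOf(nullType(), address), arrayOf(unionOf(nullType(), address))));
        System.out.println(recordToPig(email));

        // What a record-only code path does when handed a nullable field type.
        try {
            recordToPig(unionOf(nullType(), address));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `(message_id: chararray,from: (address: chararray,name: chararray),tos: {(address: chararray,name: chararray)})`, followed by `recordToPig only processes records, got UNION`, which reproduces the shape of the reported exception. Unions with more than one non-null branch are deliberately rejected here, because Pig has no direct equivalent.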
[ERROR] Failed to execute goal on project fastavrostorage: Could not resolve dependencies for project com.linkedin:fastavrostorage:jar:0.1-SNAPSHOT: The following artifacts could not be resolved: org.apache.hadoop:hadoop-core:jar:1.0.2-p1, voldemort:voldemort:jar:0.96.li3: Could not find artifact org.apache.hadoop:hadoop-core:jar:1.0.2-p1 in central (http://repo1.maven.org/maven2) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
test.pig:
register target/fastavrostorage-0.1-SNAPSHOT.jar
emails = LOAD '/me/Data/enron.avro' using com.linkedin.pig.FastAvroStorage();
store emails into '/me/Data/test.avro' using com.linkedin.pig.FastAvroStorage();
-- store emails into '/me/Data/enron.trevni' using com.linkedin.pig.TrevniStorage();
Error:
Russells-MacBook-Pro:fast-avro-storage rjurney$ pig -l /tmp -x local -v -w test.pig
Warning: $HADOOP_HOME is deprecated.
2012-10-30 13:15:56,702 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0-SNAPSHOT (rexported) compiled Aug 31 2012, 17:08:01
2012-10-30 13:15:56,702 [main] INFO org.apache.pig.Main - Logging error messages to: /private/tmp/pig_1351628156682.log
2012-10-30 13:15:56.796 java[1504:1203] Unable to load realm info from SCDynamicStore
2012-10-30 13:15:56,943 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2012-10-30 13:15:57,366 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-10-30 13:15:57,406 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
2012-10-30 13:15:57,406 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.NullPointerException
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:220)
at com.linkedin.pig.FastAvroStorageCommon.resourceFieldSchemaToAvroSchema(FastAvroStorageCommon.java:295)
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:221)
at com.linkedin.pig.FastAvroStorageCommon.resourceFieldSchemaToAvroSchema(FastAvroStorageCommon.java:251)
at com.linkedin.pig.FastAvroStorageCommon.resourceSchemaToAvroSchema(FastAvroStorageCommon.java:221)
at com.linkedin.pig.FastAvroStorage.checkSchema(FastAvroStorage.java:281)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:65)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
at org.apache.pig.PigServer.execute(PigServer.java:1245)
at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Details also at logfile: /private/tmp/pig_1351628156682.log