Comments (7)
Dmitriy V. Ryaboy / @dvryaboy:
Is this an actual Apache requirement, or simply a convention?
I am not sure how this will help our users, but it will certainly cause migration pain.
from parquet-java.
Julien Le Dem / @julienledem:
As we're going to change the Maven coordinates, we need to change the package name as well.
Otherwise you can have conflicts where two artifacts that look distinct in Maven pull in the same classes.
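A minimal sketch of how this kind of conflict can be spotted at runtime: ask the class loader for every classpath location that provides a given class. The helper name `DuplicateClassCheck` is hypothetical and not part of Parquet, and `parquet.hadoop.ParquetFileReader` is used only as an example class name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Parquet.
class DuplicateClassCheck {

  // Converts a binary class name into the resource path a class loader looks up.
  static String resourceName(String className) {
    return className.replace('.', '/') + ".class";
  }

  // Returns every location visible to the system class loader that provides the class.
  static List<URL> locations(String className) {
    try {
      ClassLoader loader = ClassLoader.getSystemClassLoader();
      return Collections.list(loader.getResources(resourceName(className)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Example class name only; pass another name as the first argument.
    String name = args.length > 0 ? args[0] : "parquet.hadoop.ParquetFileReader";
    List<URL> found = locations(name);
    System.out.println(name + " is provided by " + found.size() + " location(s)");
    for (URL url : found) {
      System.out.println("  " + url);
    }
    if (found.size() > 1) {
      System.out.println("More than one artifact ships this class.");
    }
  }
}
```

If two jars with different Maven coordinates both contain the same package, the list has more than one entry, and which copy wins depends on classpath order.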
Ryan Blue / @rdblue:
I'm attaching my refactor script, which does most of the work and has been verified up through parquet-pig. The parquet-hadoop module needs a lot of changes outside of what is done in the script because there are a lot of code references that use full package names when there are collisions with parquet-format classes.
This is handled correctly (and probably more reliably) by IntelliJ refactoring, but that requires you to move the package in each module separately, for both test and main. This also causes unused import statements in tests, but that is less of a concern than getting the classes wrong in parquet-hadoop. Either way, the refactor is going to require manual steps and will take a couple of hours.
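The attached script is not reproduced in this thread. As a rough illustration of the mechanical part of such a rename, the sketch below rewrites only `package` and `import` declarations. It is not the script referred to above; the class name `PackageRename` is hypothetical, and the target prefix `org.apache.parquet` is an assumption that this thread does not state.

```java
import java.util.regex.Pattern;

// Hypothetical sketch for illustration; not the refactor script from this thread.
class PackageRename {

  // Matches "package parquet..." and "import [static] parquet..." at the start of a line.
  private static final Pattern DECLARATION =
      Pattern.compile("^(\\s*(?:package|import(?:\\s+static)?)\\s+)parquet([.;])");

  // Assumed target prefix: org.apache.parquet.
  static String rewriteLine(String line) {
    return DECLARATION.matcher(line).replaceFirst("$1org.apache.parquet$2");
  }

  public static void main(String[] args) {
    String[] samples = {
      "package parquet.hadoop;",
      "import parquet.hadoop.ParquetFileReader;",
      // A fully qualified reference inside code is left untouched by this sketch.
      "    parquet.format.FileMetaData meta = null;"
    };
    for (String line : samples) {
      System.out.println(rewriteLine(line));
    }
  }
}
```

The last sample shows the limitation described above: fully qualified names in code bodies are not declarations, so a declaration-only rewrite misses them and they need IDE refactoring or manual edits.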
Ryan Blue / @rdblue:
Opened PR for parquet-format: #18
Ryan Blue / @rdblue:
I think PR #18 is ready for review. I've also posted a parquet-mr branch that can build using a snapshot of #18. The build and tests pass, and I can use parquet-tools on existing Parquet files, so I don't think there is an incompatibility from the parquet-format changes.
The only problem I ran into and still have to track down is a bug in parquet-tools dump:
[cloudera@quickstart ~]$ hadoop jar parquet-tools-1.6.0rc3-SNAPSHOT.jar dump --debug test.parquet
java.lang.NoSuchMethodError: parquet.hadoop.metadata.ColumnChunkMetaData.getPath()Lparquet/common/schema/ColumnPath;
    at parquet.tools.util.MetadataUtils.showDetails(MetadataUtils.java:92)
    at parquet.tools.command.DumpCommand.dump(DumpCommand.java:173)
    at parquet.tools.command.DumpCommand.execute(DumpCommand.java:134)
    at parquet.tools.Main.main(Main.java:219)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
parquet.hadoop.metadata.ColumnChunkMetaData.getPath()Lparquet/common/schema/ColumnPath;
2014-11-26 10:59:33 PST
I don't think this is from the parquet-format changes, but I want to track it down before we merge.
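The descriptor in the error names a no-argument `getPath()` returning `parquet.common.schema.ColumnPath`. A `NoSuchMethodError` generally means the class loaded at runtime differs from the one the caller was compiled against, though this thread does not establish the cause here. A minimal diagnostic sketch, assuming a hypothetical helper named `MethodProbe`, lists the signatures a loaded class actually exposes and where it was loaded from:

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper for illustration; not part of parquet-tools.
class MethodProbe {

  // Lists "returnType name(paramTypes)" for every public method with the given name.
  static List<String> signatures(Class<?> cls, String methodName) {
    List<String> out = new ArrayList<>();
    for (Method m : cls.getMethods()) {
      if (!m.getName().equals(methodName)) {
        continue;
      }
      StringBuilder sb = new StringBuilder(m.getReturnType().getName());
      sb.append(' ').append(methodName).append('(');
      Class<?>[] params = m.getParameterTypes();
      for (int i = 0; i < params.length; i++) {
        if (i > 0) {
          sb.append(", ");
        }
        sb.append(params[i].getName());
      }
      out.add(sb.append(')').toString());
    }
    return out;
  }

  // Reports the jar or directory a class was loaded from, when that is known.
  static String loadedFrom(Class<?> cls) {
    CodeSource src = cls.getProtectionDomain().getCodeSource();
    if (src == null || src.getLocation() == null) {
      return "(bootstrap or unknown)";
    }
    return src.getLocation().toString();
  }

  public static void main(String[] args) throws Exception {
    // Defaults are placeholders; pass a class name and a method name as arguments.
    String className = args.length > 0 ? args[0] : "java.lang.String";
    String methodName = args.length > 1 ? args[1] : "length";
    Class<?> cls = Class.forName(className);
    System.out.println(className + " loaded from " + loadedFrom(cls));
    for (String signature : signatures(cls, methodName)) {
      System.out.println("  " + signature);
    }
  }
}
```

Run on the same classpath as the failing command with `parquet.hadoop.metadata.ColumnChunkMetaData` and `getPath` as arguments, a return type other than the one in the descriptor would point to mismatched jars.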
Ryan Blue / @rdblue:
Submitted PR #179 with the rename changes.
Ryan Blue / @rdblue:
Merged #182