Comments (8)
We're glad that you found our package as a way to improve the performance of your code. We hope it will eventually work in your scenario too.
Unfortunately, I see some unclarities in your example implementation, so it's hard to investigate this case without better insight into your real program/schemas (as I don't know what is an error in your code and what is just an error in code pasting). Similar routine (DataFileReader, FastSpecificDatumReader, 1M LargeAndFlat records) have 2x boost in my tests - so indeed it is very likely that your code is falling back to the default implementation.
Could you carefully verify that you don't have any error logs (especially at the start of processing) which could hint us what does not work?
from avro-fastserde.
I've updated the typos in the original post but thought it best to include the complete class for processing byte messages off a jms queue;
private class jmsListener implements MessageListener, ExceptionListener {
final FastSerdeCache serdeCache = new FastSerdeCache("./build/classes/main/com/schema/avro/hf_data");
final FastSpecificDatumReader<HfReadData> fastSpecificDatumReader = new FastSpecificDatumReader<HfReadData>(HfReadData.SCHEMA$, HfReadData.SCHEMA$, serdeCache);
DataFileReader<HfReadData> dataFileReader;
@Override
public void onMessage(Message message) {
BytesMessage bytesMessage = (BytesMessage) message;
try {
byte[] byteArray = new byte[(int) bytesMessage.getBodyLength()];
bytesMessage.readBytes(byteArray);
SeekableByteArrayInput byteStream = new SeekableByteArrayInput(byteArray);
dataFileReader = new DataFileReader<HfReadData>(byteStream, fastSpecificDatumReader);
while (dataFileReader.hasNext()) {
HfReadData hfRead = hfReadQueue.take();
hfRead = dataFileReader.next(hfRead);
parseAvroMessage(hfRead);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
The only difference between using fastserde and pure avro is replacing the Avro SpecificDatumReader
final SpecificDatumReader<HfReadData> avroHfReader = new SpecificDatumReader<HfReadData>(HfReadData.SCHEMA$);
with the fast serde FastSpecificDatumReader
final FastSpecificDatumReader<HfReadData> fastSpecificDatumReader = new FastSpecificDatumReader<HfReadData>(HfReadData.SCHEMA$, HfReadData.SCHEMA$, serdeCache);
and passing the instance of the Avro SpecificDatumReader created above
dataFileReader = new DataFileReader<HfReadData>(byteStream, avroHfReader);
with the instance of the fastserde FastSpecificDatumReader (also created above)
dataFileReader = new DataFileReader<HfReadData>(byteStream, fastSpecificDatumReader);
The HfReadData class referenced in the code was generated by Avro using the schema.
I can see no errors in my logs during startup nor do I see any exceptions at runtime.
from avro-fastserde.
Do you have to reuse HfReadData object? For now, we don't support object reuse (as stated in "Limitations" paragraph in the readme), so I think that may be some issue. You can verify, that plain "dataFileReader.next()" loop will work faster indeed.
from avro-fastserde.
Hi,
We’ve had great success with avro-fastserde in our own project. This is a truly fantastic piece of software. Thank you for that!
We have decided to invest in it and add some additional performance enhancements. Among those is support for object re-use. Our fork of this project is available here, if you’d like to take a look:
https://github.com/linkedin/avro-util
-F
from avro-fastserde.
Hi,
Glad to hear that you find this library useful! I briefly looked over your branch and the enhancements you made are really worth incorporating into this repository. Thank you for adding the object reuse. We haven't provided it to this day, because we didn't really need that feature and I wasn't sure if it won't spoil the overall performance gain. Would you mind if I asked you to provide the pull-request?
from avro-fastserde.
Sure, we can look into providing a PR. We have shied away from doing so so far because we were not sure whether our changes were useful and appropriate for you. In particular:
-
We have a mildly-annoying requirement to support many versions of Avro, because we need to support a mix of modern use cases written against Avro 1.7 and 1.8 as well as decade-old applications written against Avro 1.4. Unfortunately, Avro has poor API compatibility practices, which makes this tricky. Therefore, most of our infra projects (and fast-avro falls into this category) need to code against our
avro-migration-helper
shim in order to be compatible with many versions of Avro. This indirection adds a bit of complexity to the code, and is not really adding any value for users who have the luxury to depend on modern versions of Avro exclusively. -
We have removed the locally-cached code-generated classes in our fork, because we considered that the code-gen time seemed acceptable as is and, more importantly, because we wanted to maintain the ability to iterate over the code-generation such that we had guarantees that deploying a new version of the lib would not result in interference by previous runs of the code-gen. This is not an insurmountable problem, so if the cache is really desired, we could address the freshness guarantees concern by versioning the code-generation and including the version as part of the cached classes' name, but we have not bothered doing that so far.
-
Since code generation is quite complex in general, we have made some refactoring which we believe improves readability, but which may not merge super cleanly. This is not necessarily a big deal, as long as you agree that the refactoring we did does improve readability and not the other way around (: ...
cc @gaojieliu
LMK what you think.
-F
from avro-fastserde.
Thank you for the detailed answer and willingness to provide the PR. However upon further consideration, I decided to provide the pull request by myself, basing partially on your changes.
This will cause less hassle for you and for us as well.
To be more specific, I will surely add the reuse parameter support but I won't provide the support for legacy avro versions, because we surely don't need this and I think that majority of potential users don't need it as well. I will also consider adding the option to switch generated classes caching on/off.
And I also admit that the readability is rather poor as-is and has to be rethinked and improved.
From our perspective this is a side project, so it may take a while for us to introduce these changes, but nevertheless they will appear soon.
Anyways, thank you for sharing your fork and investing your time into this project!
from avro-fastserde.
Oh, I just saw this by chance. Not sure why I didn't get the notification for it... anyway.
BTW, we have a fix for the object re-use which is not yet merged in the other repo, here: linkedin/avro-util#10
That code (including the PR) runs in production and we have validated that object re-use works fully as expected.
-F
from avro-fastserde.
Related Issues (8)
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from avro-fastserde.