Comments (8)

dervan commented on June 22, 2024

We're glad that you found our package as a way to improve the performance of your code. We hope it will eventually work in your scenario too.

Unfortunately, some parts of your example implementation are unclear, so it's hard to investigate this case without better insight into your real program and schemas (I can't tell which errors are in your actual code and which were introduced when pasting). A similar routine (DataFileReader, FastSpecificDatumReader, 1M LargeAndFlat records) shows a 2x speedup in my tests, so it is indeed very likely that your code is falling back to the default implementation.

Could you carefully verify that there are no error logs (especially at the start of processing) that could hint at what is not working?

from avro-fastserde.

zcargnop commented on June 22, 2024

I've fixed the typos in the original post, but thought it best to include the complete class for processing byte messages off a JMS queue:

private class jmsListener implements MessageListener, ExceptionListener {

        final FastSerdeCache serdeCache = new FastSerdeCache("./build/classes/main/com/schema/avro/hf_data");
        final FastSpecificDatumReader<HfReadData> fastSpecificDatumReader = new FastSpecificDatumReader<HfReadData>(HfReadData.SCHEMA$, HfReadData.SCHEMA$, serdeCache);

        DataFileReader<HfReadData> dataFileReader;

        @Override
        public void onMessage(Message message) {
            BytesMessage bytesMessage = (BytesMessage) message;

            try {
                byte[] byteArray = new byte[(int) bytesMessage.getBodyLength()];
                bytesMessage.readBytes(byteArray);
                SeekableByteArrayInput byteStream = new SeekableByteArrayInput(byteArray);
                dataFileReader = new DataFileReader<HfReadData>(byteStream, fastSpecificDatumReader);
                while (dataFileReader.hasNext()) {
                    HfReadData hfRead = hfReadQueue.take();
                    hfRead = dataFileReader.next(hfRead);
                    parseAvroMessage(hfRead);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
}

The only difference between using fastserde and pure Avro is replacing the Avro SpecificDatumReader

final SpecificDatumReader<HfReadData> avroHfReader = new SpecificDatumReader<HfReadData>(HfReadData.SCHEMA$);

with the fast serde FastSpecificDatumReader

final FastSpecificDatumReader<HfReadData> fastSpecificDatumReader = new FastSpecificDatumReader<HfReadData>(HfReadData.SCHEMA$, HfReadData.SCHEMA$, serdeCache);

and replacing the instance of the Avro SpecificDatumReader (created above) that is passed to the DataFileReader

dataFileReader = new DataFileReader<HfReadData>(byteStream, avroHfReader);

with the instance of the fastserde FastSpecificDatumReader (also created above)

dataFileReader = new DataFileReader<HfReadData>(byteStream, fastSpecificDatumReader);

The HfReadData class referenced in the code was generated by Avro using the schema.

I can see no errors in my logs during startup nor do I see any exceptions at runtime.

dervan commented on June 22, 2024

Do you have to reuse the HfReadData object? For now, we don't support object reuse (as stated in the "Limitations" paragraph of the readme), so I think that may be the issue. You can check whether a plain "dataFileReader.next()" loop does indeed run faster.

FelixGV commented on June 22, 2024

Hi,

We’ve had great success with avro-fastserde in our own project. This is a truly fantastic piece of software. Thank you for that!

We have decided to invest in it and add some additional performance enhancements. Among those is support for object re-use. Our fork of this project is available here, if you’d like to take a look:

https://github.com/linkedin/avro-util

-F

flowenol commented on June 22, 2024

Hi,

Glad to hear that you find this library useful! I briefly looked over your branch, and the enhancements you made are really worth incorporating into this repository. Thank you for adding object reuse. We haven't provided it so far because we didn't really need that feature and I wasn't sure whether it would spoil the overall performance gain. Would you mind providing a pull request?

FelixGV commented on June 22, 2024

Sure, we can look into providing a PR. We have shied away from doing so until now because we were not sure whether our changes were useful and appropriate for you. In particular:

  • We have a mildly-annoying requirement to support many versions of Avro, because we need to support a mix of modern use cases written against Avro 1.7 and 1.8 as well as decade-old applications written against Avro 1.4. Unfortunately, Avro has poor API compatibility practices, which makes this tricky. Therefore, most of our infra projects (and fast-avro falls into this category) need to code against our avro-migration-helper shim in order to be compatible with many versions of Avro. This indirection adds a bit of complexity to the code, and is not really adding any value for users who have the luxury to depend on modern versions of Avro exclusively.

  • We have removed the locally-cached code-generated classes in our fork, because we considered that the code-gen time seemed acceptable as is and, more importantly, because we wanted to maintain the ability to iterate over the code-generation such that we had guarantees that deploying a new version of the lib would not result in interference by previous runs of the code-gen. This is not an insurmountable problem, so if the cache is really desired, we could address the freshness guarantees concern by versioning the code-generation and including the version as part of the cached classes' name, but we have not bothered doing that so far.

  • Since code generation is quite complex in general, we have done some refactoring which we believe improves readability, but which may not merge super cleanly. This is not necessarily a big deal, as long as you agree that the refactoring we did improves readability and not the other way around (:
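
As a rough illustration of the cache-freshness idea in the second bullet, one could fold a code-generation version into the name of each generated class, so that classes cached by an older generator are never picked up by a newer one. The sketch below is only an assumption about how that might look: `VersionedClassNames`, `deserializerClassName`, and the `Deserializer_v…` naming scheme are hypothetical and are not part of avro-fastserde or avro-util.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; not part of avro-fastserde or avro-util. */
final class VersionedClassNames {

    private VersionedClassNames() {
    }

    /**
     * Builds a class name that changes whenever the writer schema, the reader
     * schema, or the code generator version changes. A cache keyed by this
     * name cannot return a class produced by an older generator.
     */
    static String deserializerClassName(String writerSchemaJson,
                                        String readerSchemaJson,
                                        int codegenVersion) {
        return "Deserializer_v" + codegenVersion
                + "_" + fingerprint(writerSchemaJson)
                + "_" + fingerprint(readerSchemaJson);
    }

    /** First 8 bytes of the SHA-256 digest of the schema text, as lowercase hex. */
    static String fingerprint(String schemaJson) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(schemaJson.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                // Mask to 0..255 so every byte yields exactly two hex digits.
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory for every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

With such a scheme, bumping the version passed as `codegenVersion` on each release that changes the generator would make previously cached classes simply go unused rather than interfere with the new ones.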

cc @gaojieliu

LMK what you think.

-F

flowenol commented on June 22, 2024

Thank you for the detailed answer and your willingness to provide the PR. However, upon further consideration, I have decided to prepare the pull request myself, based partially on your changes.
This will cause less hassle for you and for us as well.

To be more specific, I will certainly add support for the reuse parameter, but I won't add support for legacy Avro versions, because we don't need it and I think the majority of potential users don't need it either. I will also consider adding an option to switch caching of generated classes on/off.
I also admit that the readability is rather poor as-is and has to be rethought and improved.

From our perspective this is a side project, so it may take a while for us to introduce these changes, but nevertheless they will appear soon.

Anyways, thank you for sharing your fork and investing your time into this project!

FelixGV commented on June 22, 2024

Oh, I just saw this by chance. Not sure why I didn't get the notification for it... anyway.

BTW, we have a fix for the object re-use which is not yet merged in the other repo, here: linkedin/avro-util#10

That code (including the PR) runs in production and we have validated that object re-use works fully as expected.

-F
