Comments (14)

tsloughter commented on September 23, 2024

Are you positive that the file size would get bumped partway through the writing process's flush to disk?

We may also want to look at using some other limiting factor, like a max send config.

evanmcc commented on September 23, 2024

My thought was something like this:

  1. logfile stands at 10 messages, all in the same file.
  2. fetch request for offset 4 arrives
  3. write request for message 11 arrives; it's a large message
  4. first chunk of 11 is flushed to disk
  5. fetch request finds the current file size
  6. second chunk of 11 is flushed
  7. sendfile sends 4-10 and first chunk of 11

Is there something that makes this interleaving impossible?

We should definitely allow max bytes. I'd like to add offset ranges for fetch, but they don't seem to be supported by the protocol.
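
A minimal sketch of the safer alternative, assuming hypothetical names (vg_fetch_sketch, send_from/4, CommittedPos) rather than vonnegut's actual API: bound the sendfile length by the last fully-written byte position instead of the raw file size, so step 7 can never pick up the first chunk of message 11.

    -module(vg_fetch_sketch).
    -export([send_from/4]).

    %% StartPos: byte position of the requested offset (index lookup elided).
    %% CommittedPos: byte position just past the last fully-written message.
    send_from(Socket, SegmentFile, StartPos, CommittedPos) when CommittedPos > StartPos ->
        Bytes = CommittedPos - StartPos,  %% never read past the commit point
        {ok, Fd} = file:open(SegmentFile, [read, raw, binary]),
        Result = file:sendfile(Fd, Socket, StartPos, Bytes, []),
        ok = file:close(Fd),
        Result;
    send_from(_Socket, _SegmentFile, _StartPos, _CommittedPos) ->
        {ok, 0}.  %% nothing safe to send yet

The open question in the rest of the thread is where CommittedPos would come from and how cheaply it can be kept up to date.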

tsloughter commented on September 23, 2024

That is my question: would a flush of message 11 result in this interleaving, or would the size and file not actually be updated until the whole chunk was written?

evanmcc commented on September 23, 2024

It's probably a per-filesystem thing. This suggests that anything over 4k is potentially a problem: http://stackoverflow.com/questions/29866047/whats-the-atomic-disk-write-for-a-linux-filesystem

tsloughter commented on September 23, 2024

Ok. Kafka also has a limit on the max message size.

evanmcc commented on September 23, 2024

urgh, complicated config: http://stackoverflow.com/questions/21020347/kafka-sending-a-15mb-message

anyway, that's way bigger than 4k, and I don't imagine a 4k limit on message size is going to make people happy. also for speed, we're likely going to want to concatenate messages to write them to disk, which could end up breaking them anywhere.
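
For context, the linked answer amounts to raising matching limits in three places, which is what makes it feel complicated (keys as recalled from that 0.8-era answer, shown here for a 15MB cap; check the docs for the exact names in newer releases):

    # broker (server.properties): largest message the broker will accept
    message.max.bytes=15728640
    # broker: replica fetches must also allow the larger size or replication stalls
    replica.fetch.max.bytes=15728640
    # consumer: must be at least message.max.bytes or large messages never arrive
    fetch.message.max.bytes=15728640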

tsloughter commented on September 23, 2024

I wouldn't be surprised if kafka just sends the extra data and the client is expected to drop it.

The only reason to use the size instead of the lookup is speed.

evanmcc commented on September 23, 2024

yeah, dropping it is a valid option, but inefficient for large messages, since you can potentially send and resend the data several times.
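
To make the "client drops it" option concrete, here is a sketch of the consumer-side truncation, assuming Kafka-style size-prefixed framing (8-byte offset, 4-byte length) and hypothetical names; vonnegut's real decoder may differ:

    -module(vg_client_trim).
    -export([complete_bytes/1]).

    %% Returns the number of leading bytes in Chunk that form complete messages;
    %% anything after that is a cut-off message the client discards and
    %% re-fetches from the next offset.
    complete_bytes(Chunk) ->
        complete_bytes(Chunk, 0).

    complete_bytes(<<_Offset:64, Size:32, _Msg:Size/binary, Rest/binary>>, Acc) ->
        complete_bytes(Rest, Acc + 12 + Size);
    complete_bytes(_PartialTail, Acc) ->
        Acc.

This is where the resend cost shows up: the partial tail of a large message can be shipped and discarded on every fetch until the whole message is on disk.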

tsloughter commented on September 23, 2024

Yea, it is what kafka does. Not that we have to do the same here (we wouldn't even have to consider whether we want to break protocol, since it would still be the same), but I don't know that doing the lookup is worth it.

evanmcc commented on September 23, 2024

A thought: if we're passing the latest ack'd txid back up the chain (to allow read load to spread out), we could also pass the offset limit back up and keep it handy.

tsloughter commented on September 23, 2024

We'd have to keep it handy in the gen_server (bottleneck) or an ets table (meh).
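
As a sketch of the ets variant (hypothetical table and function names, not a committed design): the tail of the chain publishes the latest ack'd offset and the byte position it ends at, and fetch handlers read it without going through the writer gen_server.

    -module(vg_offset_limits).
    -export([init/0, publish/4, safe_byte_pos/2]).

    %% Table maps {Topic, Partition} -> {LastAckedOffset, SafeBytePos}.
    init() ->
        ets:new(?MODULE, [named_table, set, public, {read_concurrency, true}]).

    publish(Topic, Partition, LastAckedOffset, SafeBytePos) ->
        ets:insert(?MODULE, {{Topic, Partition}, LastAckedOffset, SafeBytePos}).

    safe_byte_pos(Topic, Partition) ->
        case ets:lookup(?MODULE, {Topic, Partition}) of
            [{_Key, _Offset, Pos}] -> Pos;
            [] -> 0
        end.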

evanmcc commented on September 23, 2024

Store it in the state for the currently written file, which we have to fetch anyway.

tsloughter commented on September 23, 2024

The index file? We don't read any other file except via sendfile, and that goes straight to the client.

evanmcc commented on September 23, 2024

there was some offline discussion; closing as not really important right now, since the client can always limit its over-read by specifying smaller byte limits, and the read optimization I mentioned isn't likely to be needed for our use case.
