Comments (14)
Are you positive that the file size would get bumped partway through the writing process's flush to disk?
We may also want to look at using some other limiting factor, like a max send config, as well.
from vonnegut.
My thought was something like this:
- logfile stands at 10 messages, all in the same file.
- fetch request for offset 4 arrives
- write request for message 11 arrives, it's a large message
- first chunk of 11 is flushed to files
- fetch request finds the current file size
- second chunk of 11 is flushed
- sendfile sends 4-10 and first chunk of 11
Is there something that makes this interleaving impossible?
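The interleaving above can be demonstrated with plain file I/O standing in for the segment writer (everything here is illustrative, not vonnegut's actual code): if the fetch path snapshots the segment size between two partial flushes of a message, a sendfile over that range ships a torn message.

```python
import os
import tempfile

MSG = b"0123456789"  # ten 1-byte "messages" for illustration

with tempfile.NamedTemporaryFile(delete=False) as seg:
    seg.write(MSG)   # messages 1-10, fully flushed
    seg.flush()

    # write request for message 11 arrives; first chunk is flushed
    seg.write(b"AAAA")
    seg.flush()

    # fetch request finds the current file size *now*
    snapshot = os.fstat(seg.fileno()).st_size

    # second chunk of 11 is flushed afterwards
    seg.write(b"BBBB")
    seg.flush()

# sendfile(offset=3, count=snapshot-3) would ship messages 4-10
# plus only the first chunk of 11 -- a partial message on the wire
with open(seg.name, "rb") as f:
    f.seek(3)
    sent = f.read(snapshot - 3)
os.unlink(seg.name)

print(sent)  # b'3456789AAAA' -- ends mid-message
```

Whether a real flush exposes the intermediate size this way is exactly the open question; the sketch only shows what happens if it does.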
We should definitely allow max bytes. I'd like to add offset ranges for fetch, but it doesn't seem to be supported by the protocol.
That is my question: would a flush of message 11 result in this interleaving, or would the size and file not actually be updated until the whole chunk was written?
It's probably a per-filesystem thing. This suggests that anything over 4k is potentially a problem: http://stackoverflow.com/questions/29866047/whats-the-atomic-disk-write-for-a-linux-filesystem
Ok. Kafka also has a limit on the max message size.
Urgh, complicated config: http://stackoverflow.com/questions/21020347/kafka-sending-a-15mb-message
Anyway, that's way bigger than 4k, and I don't imagine a 4k limit on message size is going to make people happy. Also, for speed we're likely going to want to concatenate messages when writing them to disk, which could end up breaking them anywhere.
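For reference, the "complicated config" that SO thread walks through amounts to raising the size limit in several places at once (property names are from older Kafka releases; an assumption, so check the docs for the version in use):

```properties
# broker (server.properties) -- accept larger messages and replicate them
message.max.bytes=15728640
replica.fetch.max.bytes=15728640

# producer -- allow requests that large
max.request.size=15728640

# consumer -- fetch buffers must be at least the max message size
max.partition.fetch.bytes=15728640
```

The point for us: the practical limit is megabytes, not 4k, so atomic-write guarantees alone can't be relied on.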
I wouldn't be surprised if kafka just sends the extra data and the client is expected to drop it.
The only reason to use the size instead of the lookup is speed.
yeah, dropping it is a valid option, but inefficient for large messages, since you can potentially send and resend the data several times.
Yeah, it is what Kafka does. Not that we have to do the same here (we wouldn't even have to consider whether we want to break the protocol, since the wire format would stay the same either way), but I don't know that doing the lookup is worth it.
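Client-side dropping can be sketched like this (hedged: assuming a simple 4-byte big-endian length prefix per message for illustration, not vonnegut's or Kafka's exact record format). The client keeps the whole frames from an over-read and discards, or buffers, the trailing partial one:

```python
import struct

def split_frames(buf: bytes) -> tuple[list[bytes], bytes]:
    """Return (complete payloads, leftover partial-frame bytes)."""
    frames, pos = [], 0
    while pos + 4 <= len(buf):
        (size,) = struct.unpack_from(">I", buf, pos)
        if pos + 4 + size > len(buf):
            break  # incomplete trailing frame: drop (or buffer) it
        frames.append(buf[pos + 4 : pos + 4 + size])
        pos += 4 + size
    return frames, buf[pos:]

# Two whole messages plus the first chunk of a third (claims 8 bytes,
# only 3 arrived) -- the over-read the server may have sent us.
data = (b"\x00\x00\x00\x03foo"
        + b"\x00\x00\x00\x03bar"
        + b"\x00\x00\x00\x08par")
whole, partial = split_frames(data)
print(whole, partial)  # [b'foo', b'bar'] b'\x00\x00\x00\x08par'
```

The inefficiency mentioned above is visible here: the 8-byte message's bytes are thrown away and will be transferred again on the next fetch.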
A thought: if we're passing the latest ack'd txid back up the chain (to allow read load to spread out), we could also pass the offset limit back up and keep it handy.
We'd have to keep it handy in the gen_server (bottleneck) or an ets table (meh).
Store it in the state for the currently written file, which we have to fetch anyway.
The index file? We don't read any other file except via sendfile, and that goes straight to the client.
There was some offline discussion; closing as not really important right now, since the client can always limit its over-read by specifying smaller byte limits, and the read optimization I mentioned isn't likely to be needed for our use case.
Related Issues (20)
- add operational tooling
- reconnect on 131/129
- Use shackle pool for replication
- Use rendevous hash or lexical sort for topic chain placement
- Topic migration
- manually regress to test usefulness of proper test
- less encode/decode during replication
- Check vg_log_segments:last_in_index/3
- Loading hwm on startup before starting supervisor is too slow
- grafana dashboard
- Data integrity
- Partitions and chain layout
- terminate after
- metadata mix up
- verify that multi-fetch properly treats error and non-error cases
- Kafka 1.0 record format
- test is missing for manager recovery fix
- vg_client doesn't validate that the return from the server is what it asked for
- use grpoc instead of atoms
- get rid of global