Comments (14)
Are you positive that the file size would get bumped partway through the writing process's flush to disk?
We may also want to look at using some other limiting factor, like a max send config, as well.
from vonnegut.
My thought was something like this:
- logfile stands at 10 messages, all in the same file.
- fetch request for offset 4 arrives
- write request for message 11 arrives, it's a large message
- first chunk of 11 is flushed to files
- fetch request finds the current file size
- second chunk of 11 is flushed
- sendfile sends 4-10 and first chunk of 11
Is there something that makes this interleaving impossible?
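The interleaving above can be demonstrated with plain file I/O standing in for the segment writer (everything here is illustrative, not vonnegut's actual code): if the fetch path snapshots the segment size between two partial flushes of a message, a sendfile over that range ships a torn message.

```python
import os
import tempfile

MSG = b"0123456789"  # ten 1-byte "messages" for illustration

with tempfile.NamedTemporaryFile(delete=False) as seg:
    seg.write(MSG)   # messages 1-10, fully flushed
    seg.flush()

    # write request for message 11 arrives; first chunk is flushed
    seg.write(b"AAAA")
    seg.flush()

    # fetch request finds the current file size *now*
    snapshot = os.fstat(seg.fileno()).st_size

    # second chunk of 11 is flushed afterwards
    seg.write(b"BBBB")
    seg.flush()

# sendfile(offset=3, count=snapshot-3) would ship messages 4-10
# plus only the first chunk of 11 -- a partial message on the wire
with open(seg.name, "rb") as f:
    f.seek(3)
    sent = f.read(snapshot - 3)
os.unlink(seg.name)

print(sent)  # b'3456789AAAA' -- ends mid-message
```

Whether a real flush exposes the intermediate size this way is exactly the open question; the sketch only shows what happens if it does.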
We should definitely allow max bytes. I'd like to add offset ranges for fetch, but it doesn't seem to be supported by the protocol.
That is my question: would a flush of message 11 result in this interleaving, or would the size and file not actually be updated until the whole chunk was written?
It's probably a per-filesystem thing. This suggests that anything over 4k is potentially a problem: http://stackoverflow.com/questions/29866047/whats-the-atomic-disk-write-for-a-linux-filesystem
Ok. Kafka also has a limit on the max message size.
Urgh, complicated config: http://stackoverflow.com/questions/21020347/kafka-sending-a-15mb-message
Anyway, that's way bigger than 4k, and I don't imagine a 4k limit on message size is going to make people happy. Also, for speed we're likely going to want to concatenate messages when writing them to disk, which could end up breaking them anywhere.
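For reference, the "complicated config" that SO thread walks through amounts to raising the size limit in several places at once (property names are from older Kafka releases; an assumption, so check the docs for the version in use):

```properties
# broker (server.properties) -- accept larger messages and replicate them
message.max.bytes=15728640
replica.fetch.max.bytes=15728640

# producer -- allow requests that large
max.request.size=15728640

# consumer -- fetch buffers must be at least the max message size
max.partition.fetch.bytes=15728640
```

The point for us: the practical limit is megabytes, not 4k, so atomic-write guarantees alone can't be relied on.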
I wouldn't be surprised if kafka just sends the extra data and the client is expected to drop it.
The only reason to use the size instead of the lookup is speed.
yeah, dropping it is a valid option, but inefficient for large messages, since you can potentially send and resend the data several times.
Yeah, it is what Kafka does. Not that we have to do the same here (we wouldn't even have to consider whether we want to break the protocol, since the wire format would stay the same either way), but I don't know that doing the lookup is worth it.
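Client-side dropping can be sketched like this (hedged: assuming a simple 4-byte big-endian length prefix per message for illustration, not vonnegut's or Kafka's exact record format). The client keeps the whole frames from an over-read and discards, or buffers, the trailing partial one:

```python
import struct

def split_frames(buf: bytes) -> tuple[list[bytes], bytes]:
    """Return (complete payloads, leftover partial-frame bytes)."""
    frames, pos = [], 0
    while pos + 4 <= len(buf):
        (size,) = struct.unpack_from(">I", buf, pos)
        if pos + 4 + size > len(buf):
            break  # incomplete trailing frame: drop (or buffer) it
        frames.append(buf[pos + 4 : pos + 4 + size])
        pos += 4 + size
    return frames, buf[pos:]

# Two whole messages plus the first chunk of a third (claims 8 bytes,
# only 3 arrived) -- the over-read the server may have sent us.
data = (b"\x00\x00\x00\x03foo"
        + b"\x00\x00\x00\x03bar"
        + b"\x00\x00\x00\x08par")
whole, partial = split_frames(data)
print(whole, partial)  # [b'foo', b'bar'] b'\x00\x00\x00\x08par'
```

The inefficiency mentioned above is visible here: the 8-byte message's bytes are thrown away and will be transferred again on the next fetch.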
A thought: if we're passing the latest ack'd txid back up the chain (to allow read load to spread out), we could also pass the offset limit back up and keep it handy.
We'd have to keep it handy in the gen_server (bottleneck) or an ets table (meh).
Store it in the state for the currently written file, which we have to fetch anyway.
The index file? We don't read any other file except via sendfile, and that goes straight to the client.
There was some offline discussion; closing as not really important right now, since the client can always limit its over-read by specifying smaller byte limits, and the read optimization I mentioned isn't likely to be needed for our use case.
Related Issues (20)
- add operational tooling
- reconnect on 131/129
- Use shackle pool for replication
- Use rendevous hash or lexical sort for topic chain placement
- Topic migration
- manually regress to test usefulness of proper test
- less encode/decode during replication
- Check vg_log_segments:last_in_index/3
- Loading hwm on startup before starting supervisor is too slow
- grafana dashboard
- Data integrity
- Partitions and chain layout
- terminate after
- metadata mix up
- verify that multi-fetch properly treats error and non-error cases
- Kafka 1.0 record format
- test is missing for manager recovery fix
- vg_client doesn't validate that the return from the server is what it asked for
- use grpoc instead of atoms
- get rid of global