I'd like to see engine.io support a feature called conflation, i.e. the removal of some messages based on criteria like

- the current performance of the client targeted to receive the message
- the frequency of messages
- a newer message making all older messages of the same "topic" stale (a term used in messaging software, somewhat related to how socket.io uses "rooms")

In other words, a more general version of the `volatile` feature in socket.io.
Conflation is especially beneficial to the performance of both server and client when broadcasting and multicasting (e.g. rooms) are used frequently, and it protects the server from getting bogged down memory-wise by a single slow consumer, whose send buffer would otherwise grow without bound (or until the heartbeat kills the connection, which can still take a while depending on the message frequency, the heartbeat interval, and the number of slow consumers). [1]
In case my description above failed to convey the usefulness of conflation, http://magmasystems.blogspot.jp/2006/08/conflation.html has a brief description of the feature and its application to the distribution of price quotes in finance. IBM, too, uses conflation for the same purpose: http://publib.boulder.ibm.com/infocenter/imds/v3r0/index.jsp?topic=/com.ibm.imds.doc/welcome.html .
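To make the "topic" criterion above concrete, here is a minimal sketch of keep-latest-per-topic conflation. The `topic` property on each buffered message is my own assumption for illustration, not part of any engine.io or socket.io API:

```javascript
// Sketch: given a buffer of messages, keep only the newest message
// per topic, preserving the order in which topics first appeared.
function conflateByTopic(messages) {
  var latestByTopic = {};
  var topicOrder = [];
  messages.forEach(function (msg) {
    if (!(msg.topic in latestByTopic)) topicOrder.push(msg.topic);
    latestByTopic[msg.topic] = msg; // a newer message makes the older one stale
  });
  return topicOrder.map(function (topic) { return latestByTopic[topic]; });
}
```

With price quotes, for example, two buffered EURUSD updates would collapse into the most recent one, while quotes for other currency pairs are untouched.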
If engine.io is to enable a conflation feature based on the client's performance in consuming messages, support has to come from the engine.io layer, because that feature depends on the client's state (is the connection open and drained?), which is, understandably, hidden from the application layer. Conflation based on message frequency alone can obviously be done completely in the application layer, as the application has control over how often it calls `emit` and can throttle it without the help of engine.io.
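For illustration, such app-layer frequency conflation can be as simple as a time-based throttle wrapped around the emit call. This is a sketch under my own assumptions (the wrapper name and interval are mine; `emit` stands in for whatever send function the app uses):

```javascript
// Sketch: send at most one message per intervalMs; calls arriving
// within the window overwrite each other (keep-latest conflation).
function throttledEmitter(emit, intervalMs) {
  var lastSent = 0;
  var pending = null;
  var timer = null;
  return function (message) {
    var now = Date.now();
    if (now - lastSent >= intervalMs) {
      lastSent = now;
      emit(message);               // leading edge: send immediately
    } else {
      pending = message;           // conflate: newer replaces older
      if (timer === null) {
        timer = setTimeout(function () {
          timer = null;
          lastSent = Date.now();
          emit(pending);           // trailing edge: send the survivor
          pending = null;
        }, intervalMs - (now - lastSent));
      }
    }
  };
}
```

Note that this only throttles one sender's own calls; it cannot react to a slow consumer on the other end, which is exactly why the client-state-based variant needs engine.io's help.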
There's a rather straightforward way to implement it that is flexible in terms of the conflation logic, yet does not require complex logic inside engine.io itself, and I've actually already implemented it in socket.io v0.9.8.
Here's a simplified pseudo code diff, leaving out a couple of intermediate steps:
before:

```js
// in myApp
io.emit(myJavascriptObject);

// in socket.io/transport
transport.write(encodePacket(myJavascriptObject));
```

after:

```js
// in myApp, configuration
io.set('conflater', function (messages) {
  /* for example: */
  return [messages[messages.length - 1]];
});

// in myApp, runtime
io.emit(myJavascriptObject);

// in socket.io/transport
conflationBuffer.push(myJavascriptObject);
transport.onDrained(function () {
  transport.actualWriteMessages(encodePackets(io.get('conflater')(conflationBuffer)));
});
```
If no `conflater` function has been configured during initialization, no buffering happens and no conflater is called.
The `conflater` function can

- just return the buffer unchanged, in which case no conflation is performed
- simply remove elements from the array, performing conflation
- remove elements and replace them with fewer or different elements, i.e. perform aggregation
- even add elements, for whatever reason; I don't currently have a use case for that
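As hedged illustrations of the first three variants (the function names and the `value` field are my own inventions, not anything from socket.io):

```javascript
// No conflation: pass the buffer through unchanged.
function identityConflater(messages) {
  return messages;
}

// Conflation: keep only the most recent message.
function keepLatestConflater(messages) {
  return messages.length ? [messages[messages.length - 1]] : [];
}

// Aggregation: replace all buffered messages with a single summary.
// Assumes each message carries a numeric `value` field.
function sumConflater(messages) {
  var total = messages.reduce(function (sum, m) { return sum + m.value; }, 0);
  return [{ value: total, count: messages.length }];
}
```

Each of these has the same shape, `messages in, messages out`, which is what keeps the engine.io side of the contract trivial: it only ever buffers and then writes whatever array comes back.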
The above simplification of the algorithm is not the whole truth, however: in reality, the functions in socket.io/transport are NOT given the `myJavascriptObject` as provided by the client, but the already encoded version of it. They only get to see `encodePacket(myJavascriptObject)`. There are 2 good reasons for this:

- `myJavascriptObject` will be encoded only once (in `SocketNamespace.packet(..)`)
- the transports can be given serialized packet versions straight out of the RedisStore, or whatever other store there might be that needs to serialize messages
Now, I would not want to hand the encoded message to the application layer, for the following 2 reasons:

- the app layer shouldn't know how stuff gets encoded
- the app layer would have a hard time working on encoded strings rather than on proper JS objects
I have an idea for solving this in a way that

- hides the lower-level encoding internals from the app layer
- avoids multiple encodings/decodings of the same message (i.e. caches results)
- avoids having to maintain a 'cache hash table' or similar, with all the related problems (when to garbage collect the cache?)
- works for both scenarios: without the need to serialize until the point data is sent to the client (in socket.io: MemoryStore) and with that need (RedisStore)
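One conceivable shape for this, offered purely as a sketch under my own assumptions (it is not necessarily the idea referred to above; `encodePacket` stands in for the real socket.io encoder), is a packet object that exposes the raw JS object to the app layer and encodes lazily, caching the result on itself:

```javascript
// Sketch: a packet wrapper. The app layer (e.g. a conflater) works on
// `obj`; transports call `encoded()`. Encoding happens at most once,
// and the cache lives on the packet itself, so there is no global
// cache table to garbage collect.
function LazyPacket(obj, encodePacket) {
  this.obj = obj;
  this._encode = encodePacket;
  this._encoded = null;
}

LazyPacket.prototype.encoded = function () {
  if (this._encoded === null) {
    this._encoded = this._encode(this.obj);
  }
  return this._encoded;
};
```

A packet dropped by the conflater is then simply never encoded, while a packet broadcast to many sockets is encoded exactly once; for a serializing store, the already-serialized form coming out of the store could be placed directly into the cache slot.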
The question is: is the conflation feature deemed important enough, and is it, as I believe, impossible to implement without changing engine.io? In that case, I would actually like to prepare a pull request.