Comments (30)

jschuetze avatar jschuetze commented on May 22, 2024 3

Our local workaround (I work with @areicher) essentially does what I think @yreynhout outlined above.

We've updated ReadOnlyStreamStoreBase.ReadAllForwards a little:

/* --- REMOVED PREVIOUS FIX ---
// https://github.com/damianh/SqlStreamStore/issues/31
// Under heavy parallel load, gaps may appear in the position sequence due to sequence
// number reservation of in-flight transactions.
// Here we check if there are any gaps, and in the unlikely event there is, we delay a little bit
// and re-issue the read. This is expected
// We no longer short-circuit this way - it misses some page-boundary cases, as well as gaps
// in pages which are not the last page (which happens to us sometimes).
if (!page.IsEnd || page.Messages.Length <= 1)
{
    return await FilterExpired(page, ReadNext, cancellationToken).NotOnCapturedContext();
}
*/

// Only short-circuit now for empty pages or 'old' pages, where 'old' is defined as
// pages whose last message is older than _positionWriteDelayThreshold ago.
if (page.Messages.Length == 0 || ((DateTime.UtcNow - page.Messages[page.Messages.Length - 1].CreatedUtc) > _positionWriteDelayThreshold))
    return await FilterExpired(page, ReadNext, cancellationToken).NotOnCapturedContext();

// Check for a gap between the last page and this one.
if (page.Messages[0].Position != fromPositionInclusive)
{
    if (!page.IsEnd || page.Messages.Length == 1)
        Logger.DebugFormat("Gap detected at lower page boundary. Potentially could have lost {lostMessageCount} events if the gap is transient", page.Messages[0].Position - fromPositionInclusive);
    page = await ReloadAfterDelay(fromPositionInclusive, maxCount, prefetchJsonData, ReadNext, cancellationToken);
}

// Check for gaps within the messages collection.
for (int i = 0; i < page.Messages.Length - 1; i++)
{
    var expectedNextPosition = page.Messages[i].Position + 1;
    if (expectedNextPosition != page.Messages[i + 1].Position)
    {
        Logger.InfoFormat("Gap detected in " + (page.IsEnd ? "last" : "(NOT the last)") + " page. Returning partial page {fromPosition}-{toPosition}", fromPositionInclusive, fromPositionInclusive + i + 1);

        // Switched this to return the partial page, then re-issue the load starting at the gap.
        // This speeds up the retry instead of taking a 3-second delay immediately.
        var messagesBeforeGap = page.Messages.Take(i + 1).ToArray();
        return new ReadAllPage(page.FromPosition, expectedNextPosition, page.IsEnd, page.Direction, ReadNext, messagesBeforeGap);
    }
}

return await FilterExpired(page, ReadNext, cancellationToken).NotOnCapturedContext();

[edited to update formatting a little]

ryanjshaw avatar ryanjshaw commented on May 22, 2024 2

> > @nordfjord I'm not sure I follow why you are using SERIALIZABLE? You should execute ReadAllForward with READ COMMITTED and you should execute the count with READ UNCOMMITTED.
>
> I was testing this on Postgres where READ UNCOMMITTED is automatically upgraded to READ COMMITTED so the solution is a no-go for Postgres if it requires READ UNCOMMITTED. Would be lovely if it works for sql-server though 😄

Ah, that's really interesting. This seems like an oversight on the part of the PostgreSQL designers: they support something (SERIAL/GENERATED AS IDENTITY) which allows UNCOMMITTED and ROLLED BACK transactions to affect a resource in the scope of another transaction, but they give you no way to read from that UNCOMMITTED state.

I understand now why you are trying to insert into those positions - it's the only way you can probe the UNCOMMITTED state. I wonder if you can do something like SELECT 1 FROM Messages WITH (UPDLOCK) WHERE Position IN (@TVP), and check the row count, to avoid performing a modification?

Here is what happens in SQL Server -- works perfectly (requires ALLOW_SNAPSHOT_ISOLATION ON and READ_COMMITTED_SNAPSHOT ON or the 2nd transaction will block until the 1st transaction completes):

[screenshot of the two SQL Server sessions not captured in this mirror]

EDIT: Both MySql and Sqlite support READ UNCOMMITTED. Oracle and PostgreSQL don't, but I don't think SQLStreamStore supports Oracle. So it seems like there is a solution for 3 / 4 supported RDBMSs if somebody wants to implement it.

yreynhout avatar yreynhout commented on May 22, 2024 1

Hi, @areicher, we feel your pain, and I'm personally setting aside some time this weekend to perform a deep dive into this very problem. Trust me when I say, we want this this fixed sooner rather than later. I hope to come back to you with good news.

ryanjshaw avatar ryanjshaw commented on May 22, 2024 1

For anybody still interested in this topic, I stumbled across @evgeniy-khist and their PostgreSQL event sourcing reference implementation today.

They have an interesting solution that uses transaction IDs to solve this problem, and a thorough discussion of their solution with beautiful sequence diagrams: https://github.com/evgeniy-khist/postgresql-event-sourcing#4-7-1.

You should also note an important caveat that is not discussed in that section, but later in the document:

A long-running transaction in the same database will effectively "pause" all event handlers. pg_snapshot_xmin(pg_snapshot) will return the ID of this long-running transaction and events created by all later transactions will be read by the event subscription processor only after this long-running transaction is committed.
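To sketch the idea (illustrative only - the table and column names below are not the reference implementation's actual schema, and the xid8 functions require PostgreSQL 13+): each append records pg_current_xact_id() alongside the event, and readers only hand out events whose writing transaction can no longer be in flight:

-- assumed schema: events(id bigserial, transaction_id xid8 not null default pg_current_xact_id(), data jsonb)
SELECT id, data
FROM events
WHERE transaction_id < pg_snapshot_xmin(pg_current_snapshot())
  AND id > :last_processed_id            -- illustrative subscription checkpoint
ORDER BY transaction_id, id;

Since pg_snapshot_xmin returns the oldest transaction id still in flight, no later event becomes visible until every earlier writer has finished - which is also exactly why the long-running-transaction caveat above applies.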

aidapsibr avatar aidapsibr commented on May 22, 2024

In SQL Server at least, you may be able to handle the uncommitted (2nd) case by using an application lock (sp_getapplock) per sequence number. Since it is a reader/writer lock, readers are blocked while a position is being written; if you aren't blocked, the writer must have rolled back. Unfortunately the other DBs may not have similar capabilities.
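A rough sketch of that idea (the resource names and timeout are invented for illustration; transaction-owned locks are released automatically on COMMIT or ROLLBACK):

-- Writer, inside the append transaction (one lock per allocated position):
BEGIN TRAN;
DECLARE @result int;
EXEC @result = sp_getapplock
    @Resource  = N'position:42',
    @LockMode  = 'Exclusive',
    @LockOwner = 'Transaction';
-- ... insert the message at position 42 ...
COMMIT;

-- Reader, upon detecting a gap at position 42:
BEGIN TRAN;
DECLARE @probe int;
EXEC @probe = sp_getapplock
    @Resource    = N'position:42',
    @LockMode    = 'Shared',
    @LockOwner   = 'Transaction',
    @LockTimeout = 5000;
-- Blocked: the writing transaction is still in flight, so wait.
-- Acquired immediately (@probe >= 0) and the row still doesn't exist:
-- the writer rolled back, so the gap is permanent and can be skipped.
COMMIT;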

areicher avatar areicher commented on May 22, 2024

How is the thinking about this going? As we're running acceptance tests to incorporate SqlStreamStore into a new app, we're finding we can easily replicate the case under load where many events are missed by the subscriptions (~5-10 of 500,000). We haven't tried the proposed PR, but we have a very hacky workaround just to prove that this is really the problem.

While at a gut level I don't like the delay approach any more than you appear to, this is a really serious problem that results in data-integrity and trust issues, so I'm resigned to the delay. If we don't want to incorporate that into the base library, could we at least expose an IHandleEventStoreGaps or something we can provide a custom implementation for (so we don't have to fork the library)?

Also, we've found that there's another hole in the base implementation in addition to what's described here. When there's only a single message on a page, and the gap that's returned is just before it, that message is missed too. Our wrapper outputs these messages:

00:23:22 Returning contiguous page from base 20813..20815 <= page right before the problem
00:23:22 reading page starting at 20816 with maxCount 1000 <= calling base
00:23:22 Returning contiguous page from base 20817..20817 <= base returned a one-message page with a position higher than the ‘fromPositionInclusive’ it was told to load… and timestamps show that it didn’t wait and retry even though there was a gap

Aside from this issue - this is a really great library - very happy with its improvements over NEventStore.

spadger avatar spadger commented on May 22, 2024

@areicher
I fixed this issue from a functional perspective here.
Just one thing to bear in mind: when we started getting gaps, this implementation's performance bombed. While it's safe, I'm not sure it's a reliable solution for the main project. I didn't try @psibernetic's idea because we switched to Kafka for other reasons.

areicher avatar areicher commented on May 22, 2024

Thanks @yreynhout - that would be great! In case it's helpful: today we copied enough of the classes into our library to apply a temporary fix for the very short term (still based on the idea of delaying, but then returning the contiguous subset of the page). If you're interested in looking at that for your implementation for some reason, the file's attached (only ReadAllForwards has relevant changes). In addition to what I mentioned earlier, we think there's still another risk in only looking at the last page for this, since the page boundary is (for this purpose) arbitrary and a gap could appear across it. The attached impl uses a time-based filter instead of a page-based one, the idea being that in our case the event store is both the reader and the writer, so time synchronization shouldn't be a problem.
CustomReadonlyStreamStoreBase.zip

yreynhout avatar yreynhout commented on May 22, 2024

Before we start, it's useful to explain the actual problem with our SqlStreamStore implementation on top of SQL Server. The position of a message in the virtual all stream is based on an IDENTITY column. Each message that is appended to a stream gets its own unique position within the all stream. Under normal circumstances, you can think of the position as being part of a monotonically incrementing sequence. However ... each time a stream-append-transaction is rolled back, whatever may be the reason (e.g. wrong expected version, a crash or fail-over, etc.), there's a chance that those positions are lost (this is by design - blame Microsoft SQL Server). In essence, what this means is that positions are NOT monotonic. Phrased differently, you can expect gaps. For example, [0,1,2,3,4,5,6,7,1000,1001,1002,...] is a perfectly acceptable sequence of positions.

If only that was the end of our troubles. When stream-append-transactions are in-flight, they each allocate one or more positions. What we do not have control over is the order in which these transactions commit, and thus the order in which other - mostly implicit read - transactions observe the committed positions. This is due to the transaction isolation level being used and the ORDER BY [Position] clause. In other words, this is yet another source of gaps, but a temporary one. Once the transaction does commit, any later read transactions will observe the positions in the correct order. This is what the title of this issue alludes to - this only happens under higher concurrent load.

So to summarize, there are two issues that may cause gaps - one legitimate, the other only temporary - without many hints as to which one we're dealing with. Why is this a problem? Mostly because a lot of consumers, especially subscribers, will expect to observe all positions in ascending order.
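The permanent kind of gap is easy to demonstrate in isolation (a minimal T-SQL sketch; the table name is invented - the point is that IDENTITY reservations are not undone by a rollback):

CREATE TABLE #Demo (Id int IDENTITY(1,1) PRIMARY KEY, Payload int);

BEGIN TRAN;
INSERT INTO #Demo (Payload) VALUES (1);  -- reserves identity value 1
ROLLBACK;                                -- the reservation survives the rollback

INSERT INTO #Demo (Payload) VALUES (2);
SELECT Id FROM #Demo;                    -- returns 2: a permanent gap at 1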

The only thing I've found that comes close to giving us a certain degree of order are Log Sequence Numbers (LSN) but to be honest I know too little about them, nor how they behave over time and space, let alone how to reconcile them with what we have today. Change data capture builds on top of LSN, so that might be worth exploring in the future but for now, again, I think it'd take us too far away from what we have already in place.

Thus I set out to build a mental model that attempts to overcome this issue as best as I can, building on top of what some of you have already covered.

[diagram of the proposed window between a subscription's position and the head of the all stream]

When a subscription starts from a position that is "far away" from the head (I keep thinking tail), I think there's a window, either in time or number of positions between where the subscription is at and where the head is at, in which it's safe to say that we don't need to be detecting gaps, and can safely ignore them since, if they are there, it's probably because of rollbacks. Given that SQLStreamStore is in charge of the transaction life cycle I'm not envisioning transactions that have a very long lifespan. Nonetheless, I think our best option is to allow people to inject what window they feel comfortable with (think "it's a setting" with a reasonable default). As a subscription "enters" that window, the behavior it exhibits should switch from ignoring gaps to detecting gaps. If the positions being read are increasing monotonically, we're in what I'd call a gapless state. As soon as we detect a gap we enter into a gapped state where we may want to wait for that gap to be filled, albeit in a reasonable amount of time (think "another setting" people can leverage). When positions increase monotonically again, we can switch back to the gapless state. If we "exit" the window between the subscription position and the head we can go back to ignoring gaps (e.g. a subscriber that is slowing down).

I'll be the first to acknowledge that doing it this way does not provide absolute guarantees nor is it a fundamental shift from what has been proposed or implemented thus far. There are certainly edge cases to be covered, especially with regard to the position as of which one subscribes and / or gaps at the edges of a page. Because most of the subscription logic is written on top of the IReadonlyStreamStore abstraction, it's not that hard to emulate an implementation which exhibits gaps. It'll be my next order of business to spike these ideas, refine them and evaluate the result.

spadger avatar spadger commented on May 22, 2024

yreynhout avatar yreynhout commented on May 22, 2024

I doubt that's worth the effort and complexity, but, true, that's also an assumption on my part, and all paths are open at this point.

aidapsibr avatar aidapsibr commented on May 22, 2024

yreynhout avatar yreynhout commented on May 22, 2024

I think only serializable gives that guarantee (but I'll be sure to measure). While correctness is preferred over speed, I'm not convinced this particular problem warrants throwing out the advantages of snapshot isolation. Granted, at this point I'm willing to try anything.

I'm a bit puzzled as to how offset / fetch is going to improve matters, especially given the sorting is happening on ID (I assume you're querying the [Messages] table) - identity column gaps happen because the number allocation happens "outside" of the user transaction (unless maybe we turn off identity_cache at the database level, but then there's a performance penalty for that as well). Could you elaborate a bit more?

jschuetze avatar jschuetze commented on May 22, 2024

I am happy to submit this as a PR (or Aaron would, I'm sure), but given the hesitation around the previous PR, I think we wanted to avoid that unless it was likely to get accepted.

Would you like us to make a PR? To be honest, we would very much rather have this in a patched 1.1.x or in a trunk version than continue using a local workaround.

Swoorup avatar Swoorup commented on May 22, 2024

What's the current consensus on this?

bertzor avatar bertzor commented on May 22, 2024

What is the status of this issue? It seems like a high-priority one - SQLStreamStore is not reliable while it is unsolved.

Will this be fixed, or should you not use SQLStreamStore when you want a reliable store?

aidapsibr avatar aidapsibr commented on May 22, 2024

@bertzor I think one workaround that came up is running your database connection in serializable transaction mode. This will definitely hurt performance, but should give guarantees.

jschuetze avatar jschuetze commented on May 22, 2024

We had good luck with the changes I posted above, but ymmv. The missing piece from my snippet above is the backing field:

// new field: constant time window to look back for gaps
private TimeSpan _positionWriteDelayThreshold = TimeSpan.FromMinutes(5);

Keep in mind that my change above was applied to 1.1.3, and I haven't looked at updated sources. I am not actively using EventSourcing at the moment, and unfortunately I don't really have time to do an up-to-date PR for this against the latest sources with tests, but I will try to respond if there are specific questions.  It would be great if this made it into official sources, though, because I expect to be back to using SQLStreamStore soon.

I want to add that there were certain activities in our system that could create huge bursts of events (in some cases millions of them). With the patch above, we haven't lost events, but we definitely see the logged messages showing the fixes operating. We also don't see a significant performance hit with this fix vs. the original 'lossy' implementation.

ryanjshaw avatar ryanjshaw commented on May 22, 2024

An alternative idea: instead of guessing, ask the database. First, check whether there are gaps by comparing the row count returned against (last position - first position + 1) in the result set. If there are gaps, issue a READ UNCOMMITTED (e.g. WITH (NOLOCK)) COUNT over the same query you're using to retrieve the messages. The result of this query will (?) accurately account for rolled-back vs. uncommitted transactions. If the result matches the number of rows you previously received, you know you can process the page. If it does not match, you repeat the above process until it does. You could probably do this with a single Execute command to eliminate an additional roundtrip. Thoughts?
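Sketched in T-SQL (the table name follows the MsSql scripts; the exact batch is an assumption, not code from the library):

-- 1. Normal page read (READ COMMITTED / snapshot):
--    suppose it returned @rowCount rows spanning @fromPosition..@lastPosition.

-- 2. Only when (@lastPosition - @fromPosition + 1) <> @rowCount, probe the dirty state:
SELECT COUNT(*)
FROM dbo.Messages WITH (NOLOCK)   -- READ UNCOMMITTED
WHERE Position BETWEEN @fromPosition AND @lastPosition;

-- count = @rowCount -> every gap is a rolled-back position: process the page
-- count > @rowCount -> an uncommitted writer holds a position in range: retry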

nordfjord avatar nordfjord commented on May 22, 2024

@ryanjshaw That's a fine idea. I'm not sure it would work because the reads don't necessarily hit the serialization issues (at least not from my preliminary testing)

I did have success with just trying to append to the table with the missing position

-- transaction 1
BEGIN
-- this will be position 13 in the test
SELECT * from dbo.append_to_stream(
    'hashedid'::char(42), 
    'Test-123'::varchar(1000), 
    'hashedmetadataid'::char(42), 
    -2::integer, 
    now()::timestamp without time zone, 
    Array[(uuid_generate_v4(), 'Test', '{"test": 123}', NULL)]::dbo.new_stream_message[])

-- transaction 2
BEGIN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
insert into dbo.messages(stream_id_internal, stream_version, position, message_id, created_utc, type, json_data, json_metadata)
values (
        -- we insert position 13 triggering a wait for the 1st transaction
        1, 13, 13, uuid_generate_v4(), now(), '$$gapcheck', '{"position": 13}', null
       ) on conflict do nothing;
-- transaction 2 is now blocked
-- transaction 1
COMMIT;

-- transaction 2 is now unblocked
select * from dbo.read_all(10, 10, true, true);
commit;

So some pseudocode

function read_all(from, limit) {
  result = sql_read_all(from, limit)
  gaps = getGapPositions(result)
  if (gaps.count == 0) {
    return result
  }
  for (gap in gaps) {
    // blocks while an in-flight transaction still holds the position
    appendResult = insertGapCheckMessageAtPosition(gap)
    if (isSuccessful(appendResult)) {
      // this gap is fine (either a deleted message or a rolled-back trx)
      deleteMessage(appendResult)
    }
  }
  // every gap is now resolved one way or the other; re-read the page
  return read_all(from, limit)
}

ryanjshaw avatar ryanjshaw commented on May 22, 2024

@nordfjord I'm not sure I follow. My understanding of the issue, based on comments from @spadger and @yreynhout, is as follows:

We want to retrieve a page of messages where the range of rows is being determined by an IDENTITY column. Unfortunately using an IDENTITY column in a range query under heavy concurrent load has the problem of ROLLED BACK and UNCOMMITTED transactions affecting the query result in a way that can result in us missing rows if we are not careful.

Note: fromPosition is the parameter @position supplied to https://github.com/SQLStreamStore/SQLStreamStore/blob/master/src/SqlStreamStore.MsSql/ScriptsV3/ReadAllForward.sql.

Scenario 1 - no rolled back or uncommitted transactions in the range

Position 1 COMMITTED
Position 2 COMMITTED
Position 3 COMMITTED

Query Results: 3 rows returned, fromPosition == 1 and lastPosition == 3
Analysis: lastPosition - fromPosition + 1 == rowCount => 3 - 1 + 1 == 3 therefore there are no gaps in our page, we can process the page

Scenario 2 - rolled back/uncommitted exclusively following committed

Position 1 COMMITTED
Position 2 ROLLED BACK
Position 3 UNCOMMITTED

Query Results: 1 row returned, fromPosition == 1 and lastPosition == 1
Analysis: lastPosition - fromPosition + 1 == rowCount => 1 - 1 + 1 == 1 therefore there are no gaps in our page, we can process the page

Position 3, once committed, will be selected for in a subsequent query, which will then map to one of these scenarios depending on exactly how things play out.

Scenario 3 - a rolled back transaction between committed transactions

Position 1 COMMITTED
Position 2 ROLLED BACK
Position 3 COMMITTED

Query Results: 2 rows returned, fromPosition == 1 and lastPosition == 3
Analysis: lastPosition - fromPosition + 1 != rowCount => 3 - 1 + 1 != 2 therefore there are gaps in our page, we need to check if any of these rows are uncommitted rows or if they are all just rolled back transactions

So we issue a READ UNCOMMITTED (e.g. WITH (NOLOCK)) COUNT(*) WHERE Position BETWEEN @fromPosition AND @lastPosition
Query Results: 2 (i.e. there's only COMMITTED/UNCOMMITTED rows between the two positions)

2 matches our row count, therefore we can process the page.

Scenario 4 - an uncommitted transaction between committed transactions

Position 1 COMMITTED
Position 2 UNCOMMITTED
Position 3 COMMITTED

Query Results: 2 rows returned, fromPosition == 1 and lastPosition == 3
Analysis: lastPosition - fromPosition + 1 != rowCount => 3 - 1 + 1 != 2 therefore there are gaps in our page, we need to check if any of these rows are uncommitted rows or if they are all just rolled back transactions

As you can see, the analysis is the same as for scenario 3 -- we have a gap. But in this case we also have a problem, because if we process the page, we will start looking for @position >= 4 and miss @position == 2 forever.

So we issue the same READ UNCOMMITTED COUNT query as above
Query Results: 3 (i.e. there's THREE COMMITTED/UNCOMMITTED rows between the two positions supplied)

3 does not match our row count, therefore we need to wait for the row in the gap to be committed (or rolled back). We can implement a small delay before restarting our logic, at which point we may see something like:

Position 1 COMMITTED
Position 2 COMMITTED OR ROLLED BACK
Position 3 COMMITTED

which is covered by one of the previous scenarios

Scenario 5 - a mix of rolled back and uncommitted transactions between committed transactions

Position 1 COMMITTED
Position 2 UNCOMMITTED
Position 3 ROLLED BACK
Position 4 COMMITTED

Query Results: 2 rows returned, fromPosition == 1 and lastPosition == 4
Analysis: lastPosition - fromPosition + 1 != rowCount => 4 - 1 + 1 != 2 therefore there are gaps in our page, we need to check if any of these rows are uncommitted rows or if they are all just rolled back transactions

So we issue the same READ UNCOMMITTED COUNT query as above
Query Results: 3 (i.e. there's THREE COMMITTED/UNCOMMITTED rows between the two positions supplied)

3 does not match our row count, therefore we need to wait for the row in the gap to be committed (or rolled back). We can implement a small delay before restarting our logic, at which point we may see something like:

Position 1 COMMITTED
Position 2 COMMITTED OR ROLLED BACK
Position 3 ROLLED BACK
Position 4 COMMITTED

which is covered by one of the previous scenarios

Additional Notes

We use fromPosition and not messages[0].position (i.e. firstPosition) so that it doesn't matter whether the first result is COMMITTED or not.

As I mentioned previously, you could include the READ UNCOMMITTED COUNT(*) query in ReadAllForward as a second result set, but I'm not sure what the performance impact would be. I think this is such a rare scenario it makes sense to only perform the second query when you need it. You may not even need to implement a delay -- by the time the COUNT(*) result is returned, if the results indicate you need to refresh your page due to UNCOMMITTED data being in range, you may find the outstanding transaction(s) have already been COMMITTED or ROLLED BACK.

It also seems as though there is an optimisation desired to only retrieve the 'gap' rows rather than refresh the entire page -- I imagine you could achieve this using a TVP of gap positions supplied to the COUNT(*) command, and supply the same list of positions to a subsequent select. This would return some subset (or complete set) of UNCOMMITTED rows. You can repeat this process until all your missing rows are returned or identified as ROLLED BACK. I'm not sure the complexity cost justifies the performance benefit. If you're running in such a busy environment that is triggering these issues, it seems likely your gaps will be filled by the time you hit the database a second time -- the main problem, in my opinion, is detecting if there is an UNCOMMITTED vs ROLLED BACK transaction in your range, not efficiently filling the gap once you have an answer to that question.
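One hypothetical shape for that optimisation (the table type and variable names are invented for illustration):

-- One-off setup:
CREATE TYPE dbo.PositionList AS TABLE (Position bigint NOT NULL PRIMARY KEY);

-- Probe only the gap positions, passed in as a TVP @GapPositions:
SELECT COUNT(*)
FROM dbo.Messages m WITH (NOLOCK)
JOIN @GapPositions g ON g.Position = m.Position;
-- A non-zero count means at least one gap is an uncommitted row rather than a
-- rollback; the same TVP could drive a later SELECT to fetch those rows once committed.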

@jschuetze does your solution cater for both ROLLED BACK and UNCOMMITTED? As far as I can tell it only deals with the latter case.

nordfjord avatar nordfjord commented on May 22, 2024

@ryanjshaw Are you sure READ UNCOMMITTED gives you the guarantees you want? I didn't have success even at the SERIALIZABLE isolation level with the following test (on a fresh schema):

| transaction 1    | transaction 2                                           | result                 |
|------------------|---------------------------------------------------------|------------------------|
| BEGIN            |                                                         |                        |
| append_to_stream |                                                         | Appends at position 0  |
|                  | BEGIN                                                   |                        |
|                  | SET TRANSACTION ISOLATION LEVEL SERIALIZABLE            |                        |
|                  | read_all                                                | returns nothing        |
|                  | select count(*) where position >= 0 and position <= 10 | returns 0              |
| COMMIT           |                                                         |                        |
|                  | COMMIT                                                  |                        |

I get the same result with three transactions

| t 1              | t 2              | t 3                                                     |
|------------------|------------------|---------------------------------------------------------|
| BEGIN            |                  |                                                         |
| append_to_stream |                  |                                                         |
|                  | BEGIN            |                                                         |
|                  | append_to_stream |                                                         |
|                  | COMMIT           |                                                         |
|                  |                  | BEGIN                                                   |
|                  |                  | SET TRANSACTION ISOLATION LEVEL SERIALIZABLE            |
|                  |                  | read_all                                                |
|                  |                  | select count(*) where position >= 0 and position <= 10 |
|                  |                  | COMMIT                                                  |
| COMMIT           |                  |                                                         |

Note: I tested this on Postgres

ryanjshaw avatar ryanjshaw commented on May 22, 2024

@nordfjord I'm not sure I follow why you are using SERIALIZABLE?

You should execute ReadAllForward with READ COMMITTED and you should execute the count with READ UNCOMMITTED.

nordfjord avatar nordfjord commented on May 22, 2024

> @nordfjord I'm not sure I follow why you are using SERIALIZABLE?
>
> You should execute ReadAllForward with READ COMMITTED and you should execute the count with READ UNCOMMITTED.

I was testing this on Postgres where READ UNCOMMITTED is automatically upgraded to READ COMMITTED so the solution is a no-go for Postgres if it requires READ UNCOMMITTED. Would be lovely if it works for sql-server though 😄

ryanjshaw avatar ryanjshaw commented on May 22, 2024

@nordfjord I've been thinking about this and maybe those PostgreSQL guys are smart after all.

I think we could implement a new procedure, WaitForPageToCommit, which looks like this on SQL Server:
SELECT COUNT(*) FROM [Message] WITH (UPDLOCK) WHERE [Position] BETWEEN @fromPosition AND @toPosition

It seems that PostgreSQL has something similar, SELECT ... FOR UPDATE but I can't confirm the behaviour is the same - can you? See https://www.postgresql.org/docs/9.0/sql-select.html#SQL-FOR-UPDATE-SHARE: "if an UPDATE, DELETE, or SELECT FOR UPDATE from another transaction has already locked a selected row or rows, SELECT FOR UPDATE will wait for the other transaction to complete, and will then lock and return the updated row (or no row, if the row was deleted)."
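For reference, the PostgreSQL analogue being discussed would look roughly like this (a sketch only - and note that, as reported below, it does not help on Postgres, because rows of uncommitted transactions are invisible to the SELECT and so are never waited on):

SELECT position
FROM messages                      -- the Postgres store's messages table
WHERE position BETWEEN $1 AND $2
FOR UPDATE;                        -- locks and waits only on rows the snapshot can see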

I tested this with SQL Server and it works perfectly in snapshot isolation mode (default mode for SQLStreamStore).

Aside from the same pattern being applicable to all databases, what's really nice about this approach is you execute this SQL when you detect a gap, and the DBMS will return when your page is ready -- there is no repeated polling, no arbitrary delays, etc. To avoid a 2nd roundtrip to fetch the now committed rows, you can actually just implement WaitForPageToCommit as ReadAllForwards but WITH (UPDLOCK) / FOR UPDATE - potentially with a TVP containing the gap positions if you want to minimise traffic.

In fact it might be possible to resolve this issue just by putting the locking hint directly into the existing ReadAllForwards SQL without any C# code changes, because I can't imagine any reason to ever want to read an incomplete page, and WITH (UPDLOCK) / FOR UPDATE guarantees a complete page. There will be a performance hit, though, so I'm not sure this would be an acceptable solution, given that introducing WaitForPageToCommit retains existing performance levels while minimising wait time in the pathological case. But I suppose it's good for people facing this issue in production to know that there may be a simple solution until a permanent fix is developed?

nordfjord avatar nordfjord commented on May 22, 2024

Hey, I've tried using FOR UPDATE in postgres but that does not seem to work unfortunately. I don't think there's any way in postgres to make an uncommitted row visible other than probing at a constraint.

But it sounds like there's a decent way forward for the rest of the sql databases.

I think you might run into deadlock scenarios if you introduce locking into ReadAllForwards. Imagine you have an event at position 1 and an in-flight transaction at position 2, and you then query WITH (UPDLOCK) WHERE position BETWEEN 0 AND 10; before the transaction at position 2 completes, another one starts at position 3. Now the tx at pos 3 is waiting for the select to finish, and the select is waiting for the tx at pos 2 to finish 😕.

ryanjshaw avatar ryanjshaw commented on May 22, 2024

@nordfjord According to the PostgreSQL manual link I shared, FOR UPDATE only affects other "UPDATE, DELETE, or SELECT FOR UPDATE" transactions, so INSERT shouldn't be blocked? But then it seems FOR UPDATE doesn't work as expected, so maybe I'm missing something fundamental about the PostgreSQL MVCC model. Hopefully somebody else with PostgreSQL experience can spot something we're missing; for my part, I will do more SQL Server testing and see if I can contribute a PR.

nordfjord avatar nordfjord commented on May 22, 2024

Yet another idea for this. What if we used a separate table for the all stream that was updated via a deferrable constraint trigger?

CREATE TABLE all_messages
(
    position   bigserial not null,
    stream_id  text      not null,
    message_id uuid      not null,
    primary key (position)
);

create or replace function append_message_to_all()
    returns trigger
    language plpgsql
as
$$
begin
    insert into all_messages(stream_id, message_id) VALUES (NEW.stream_id_internal, NEW.id);
    return NEW;
end;

$$;

create constraint trigger append_to_all
    after insert
    on messages
    deferrable initially deferred
    for each row
execute procedure append_message_to_all();

The deferrable initially deferred means the trigger procedure executes at the very end of the transaction, just before it commits. This might drastically reduce the likelihood of gaps in the first place.

Alternatively making the all stream an eventually consistent projection is starting to sound like a better and better idea 😂

edit: This definitely wouldn't work because it would screw up the "append result" of append_to_stream

adamfur avatar adamfur commented on May 22, 2024

Think I found a solution: https://github.com/adamfur/Chaser. It runs 4 concurrent writers and one worker that chases the auto-increment value. A lot of gaps occur, but they don't cause any problems. I accumulated all the items' auto-increment values in the loop, then compared them to an aggregation of the actual content of the database. I always get the correct result.
