Comments (24)

sakno avatar sakno commented on July 22, 2024 1

It's hard to say what the root cause is because there is no stable repro, so I can only guess. Possibly network timeouts cancel the token that the WAL uses internally to perform I/O. Some I/O operations were not done in a cancellation-safe way. I've prepared a potential fix, but I can't release it right now.

from dotnext.

sakno avatar sakno commented on July 22, 2024

Have you set UseLegacyBinaryFormat property to true?

guillaume-chervet avatar guillaume-chervet commented on July 22, 2024

Yes, it works when I set it back to the old mode. I always erase the previous data storage when I test.
@sakno

sakno avatar sakno commented on July 22, 2024

Do you mean that it crashes on empty WAL with a new format?

guillaume-chervet avatar guillaume-chervet commented on July 22, 2024

It sometimes happens with an empty WAL, and sometimes after a while with an existing WAL @sakno.

sakno avatar sakno commented on July 22, 2024

Do you have a stable repro? I see that the second stack trace is from the tests in your repository.

guillaume-chervet avatar guillaume-chervet commented on July 22, 2024

The behavior is somewhat random.
But once it starts happening, it does not stop.

The first logs come from our production environment.
The second come from one of the 3 nodes in our dev environment, at startup.
@sakno

sakno avatar sakno commented on July 22, 2024

It could happen if you are trying to open a WAL produced by a version < 5.4.0 with a newer version >= 5.4.0 without UseLegacyBinaryFormat set to true. Are you sure that the dev environment starts clean, without older WAL files?

guillaume-chervet avatar guillaume-chervet commented on July 22, 2024

Yes, I am sure. My store for testing was completely erased.
Same for the updated production. @sakno

guillaume-chervet avatar guillaume-chervet commented on July 22, 2024

SlimFaas is compiled with AOT.

sakno avatar sakno commented on July 22, 2024

The second stack trace indicates that WAL is trying to read existing files:

at System.IO.RandomAccess.ValidateInput(SafeFileHandle, Int64, Boolean) + 0x5f
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.Initialize() + 0x2a2
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<.ctor>g__CreateTables|28_1(SortedSet`1, DirectoryInfo, Int32, Int32, PersistentState.BufferManager&, Int32, PersistentState.WriteMode, Int64) + 0x14f

Here is the code for Initialize:

internal override void Initialize()
{
    using var handle = File.OpenHandle(FileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, FileOptions.SequentialScan);

    // read header
    if (RandomAccess.Read(Handle, header.Span, fileOffset: 0L) < HeaderSize)
    {
        header.Span.Clear();
    }
    else if (IsSealed)
    {
        // partition is completed, read table
        var tableStart = RandomAccess.GetLength(Handle);
        RandomAccess.Read(Handle, footer.Span, tableStart - footer.Length);
    }
    else
    {
        // read sequentially every log entry
        int footerOffset;
        long fileOffset;
        if (PartitionNumber is 0L)
        {
            footerOffset = LogEntryMetadata.Size;
            fileOffset = HeaderSize + LogEntryMetadata.Size;
        }
        else
        {
            footerOffset = 0;
            fileOffset = HeaderSize;
        }

        for (Span<byte> metadataBuffer = this.metadataBuffer.Span, metadataTable = footer.Span; ; footerOffset += LogEntryMetadata.Size)
        {
            var count = RandomAccess.Read(Handle, metadataBuffer, fileOffset);
            if (count < LogEntryMetadata.Size)
                break;

            fileOffset = LogEntryMetadata.GetEndOfLogEntry(metadataBuffer);
            if (fileOffset <= 0L)
                break;

            metadataBuffer.CopyTo(metadataTable.Slice(footerOffset, LogEntryMetadata.Size));
        }
    }
}

To get an exception like the one in your stack trace, the program needs to enter the second or third if branch. That is possible only if the file exists in the file system.
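The argument above can be checked in isolation with nothing but the .NET base class library. The following is a minimal sketch, not dotNext code: PartitionProbe, Probe, and the footerLength parameter are hypothetical names introduced for illustration. It assumes the directory of the file exists, and it mimics only the "sealed partition" read. It shows that File.OpenHandle with FileMode.Open fails before any RandomAccess call when the file is absent, and that a negative file offset is rejected by RandomAccess.Read during input validation. Whether a file shorter than the footer is what actually happened in the reported crash is not confirmed by this thread.

```csharp
using System;
using System.IO;

public static class PartitionProbe
{
    // Hypothetical helper, not part of dotNext.
    // Returns "missing", "negative-offset" or "read".
    public static string Probe(string fileName, int footerLength)
    {
        try
        {
            // FileMode.Open throws FileNotFoundException when the file is absent,
            // so reaching any RandomAccess call implies that the file existed.
            using var handle = File.OpenHandle(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, FileOptions.SequentialScan);

            Span<byte> footer = new byte[footerLength];

            // Same arithmetic as the sealed-partition branch: length minus footer size.
            // If the file is shorter than the footer, the offset becomes negative.
            long offset = RandomAccess.GetLength(handle) - footerLength;

            // A negative offset is rejected with ArgumentOutOfRangeException
            // before any I/O is attempted.
            RandomAccess.Read(handle, footer, offset);
            return "read";
        }
        catch (FileNotFoundException)
        {
            return "missing";
        }
        catch (ArgumentOutOfRangeException)
        {
            return "negative-offset";
        }
    }
}
```

Calling Probe on a path that does not exist returns "missing". After writing 4 bytes to that path, Probe(path, 16) returns "negative-offset", and after writing 32 bytes it returns "read".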

guillaume-chervet avatar guillaume-chervet commented on July 22, 2024

Please disregard the latest logs @sakno, I may have made a mistake in our dev Kubernetes environment.

Here are the logs my colleagues sent me from the crash in production. It occurs with the new protocol (at random intervals of around 48 hours) and does not happen with the old one. I think it handles around 400,000 write operations per day.
slimfaas-1-slimfaas.log
slimfaas-2-slimfaas.log
slimfaas-0-slimfaas.log

I do not know where the negative number can come from.

sakno avatar sakno commented on July 22, 2024

How is the WAL configured? How many records per partition, parallel I/O, etc.? What's the target architecture, x86_64?

guillaume-chervet avatar guillaume-chervet commented on July 22, 2024

The target architecture is x86_64.
I do not know what the other options are. Here is the SlimData persistent state constructor: https://github.com/AxaFrance/SlimFaas/blob/2ca3a8c7589b87dcd560164d7ed643f8f17aa89b/src/SlimData/SlimPersistentState.cs#L19

Thank you @sakno for your help

sakno avatar sakno commented on July 22, 2024

Did you have a chance to check the fix?

guillaume-chervet avatar guillaume-chervet commented on July 22, 2024

Hi @sakno, do you have a way to publish an alpha?

guillaume-chervet avatar guillaume-chervet commented on July 22, 2024

My C# level is not the best 😜
It is my favorite language, but I do not code in it a lot (unfortunately).

sakno avatar sakno commented on July 22, 2024

You can reference the project explicitly from your csproj file, without a published alpha.
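As a rough sketch of what that could look like: a ProjectReference item in the consuming csproj, pointing at a local checkout of the dotNext sources that contains the fix. The relative path below is an assumption about where the repository was cloned and how it is laid out, so adjust it to the actual location of DotNext.Net.Cluster.csproj. Any existing PackageReference to the same package would need to be removed or disabled to avoid referencing both.

```xml
<ItemGroup>
  <!-- Assumed path to a local clone of dotNext; adjust to your checkout. -->
  <ProjectReference Include="..\..\dotNext\src\cluster\DotNext.Net.Cluster\DotNext.Net.Cluster.csproj" />
</ItemGroup>
```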

sakno avatar sakno commented on July 22, 2024

Release 5.7.0 has been published.

guillaume-chervet avatar guillaume-chervet commented on July 22, 2024

Thank you @sakno, I will test it today and tell you if it fixes the problem.