dbolin / Apex.Serialization
High performance contract-less binary serializer for .NET
License: MIT License
This is way longer than I wanted. (I thought it would be useful to be a little verbose, as I'm still digging into this.) The main question I have is: how suspicious is it that the serialized byte count is different with the same input?
.NET 6 w/ 4.0.3
So, I'm looking into another issue we are having.
The problem with this one is that it is truly random. We have in-depth integration tests, and on my machine, after running the tests over and over, we randomly get exceptions from the serializer that it can't cast object A to B (sometimes it takes 5 attempts, sometimes 20).
(In case you are curious, the exception looks like the following, but it's all inside the generated expressions and specific to our types, so it's not really helpful.)
System.InvalidCastException: Unable to cast object of type 'Engine.DataStructures.Values.SimpleValue' to type 'Engine.DataStructures.Values.Value'.
at Apex.Serialization.Read_System.Collections.Immutable.ImmutableSortedDictionary`2+Node[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Engine.DataStructures.Values.Value, Engine.DataStructures, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]](Closure , BufferedStream& , Binary`2 )
at Apex.Serialization.Binary`2.ReadSealedInternal[T](Boolean useSerializedVersionId)
at Apex.Serialization.Read_System.Collections.Immutable.ImmutableSortedDictionary`2+Node[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Engine.DataStructures.Values.Value, Engine.DataStructures, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]](Closure , BufferedStream& , Binary`2 )
at Apex.Serialization.Binary`2.ReadSealedInternal[T](Boolean useSerializedVersionId)
at Apex.Serialization.Read_Engine.DataStructures.Values.Collection(Closure , BufferedStream& , Binary`2 )
at Apex.Serialization.Binary`2.ReadInternal()
at Apex.Serialization.Read_Engine.Stateless.State.ExecutionStateRoot(Closure , BufferedStream& , Binary`2 )
at Apex.Serialization.Binary`2.ReadSealedInternal[T](Boolean useSerializedVersionId)
at Apex.Serialization.Binary`2.ReadObjectEntry[T]()
at Apex.Serialization.Binary`2.Read[T](Stream
One of the first things I noticed is that the number of bytes the serializer produces sometimes differs for the same input. (I need to check that this is 100% the same input; it's at least the same built-up objects, all from test builders.)
So my question is: is this serializer deterministic? What would make it not be?
I'm still digging into my actual problem, but I thought it was weird that the size of the bytes produced is sometimes different, and when the size differs it usually explodes while deserializing.
Since this is random, it's difficult for me to create a small reproducible case; I'm still trying, though.
The exception originates from reading data in the stream while trying to pull out an already-loaded reference, but it's the wrong type. (The refIndex used to read the LoadedObjectRefs is off by one.) I think the problem is on the writing side rather than the reading side, based on the differing number of bytes written.
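To make the off-by-one failure mode concrete, here is a hypothetical sketch (not Apex's actual code) of how back-reference tracking works in this style of serializer: the writer assigns each new object an index, and the reader must register objects in exactly the same order, so a single skipped or extra registration shifts every later refIndex onto the wrong object.

```csharp
using System.Collections.Generic;

// Hypothetical model of a loaded-object reference table. The names
// LoadedObjectRefs/refIndex mirror the exception described above, but the
// implementation here is illustrative only.
sealed class LoadedObjectRefs
{
    private readonly List<object> _objects = new List<object>();

    // Called as each object is deserialized; must match the writer's order.
    public int Register(object obj)
    {
        _objects.Add(obj);
        return _objects.Count - 1;
    }

    // Called when the stream contains a back-reference instead of a new object.
    public object Resolve(int refIndex) => _objects[refIndex];
}
```

If the reader skips one Register call that the writer performed (or vice versa), every subsequent refIndex resolves one slot off, which would surface exactly as an InvalidCastException between two unrelated types in the graph.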
This would allow better exceptions in error cases where deserialization of a mismatching type is attempted (instead of possibly crashing the program or deserializing incorrect data).
Hello there!
My name is Ana. I noticed that you use the mutation testing tool Stryker.NET in the project.
I am a postdoctoral researcher at the University of Seville (Spain), and my colleagues and I are studying how mutation testing tools are used in practice. With this aim in mind, we have analysed over 3,500 public GitHub repositories using mutation testing tools, including yours! This work has recently been published in a journal paper available at https://link.springer.com/content/pdf/10.1007/s10664-022-10177-8.pdf.
To complete this study, we are asking for your help to better understand how mutation testing is used in practice. We would be extremely grateful if you could contribute by answering a brief survey of 21 simple questions (no more than 6 minutes). This is the link to the questionnaire: https://forms.gle/FvXNrimWAsJYC1zB9.
Drop me an e-mail if you have any questions or comments ([email protected]). Thank you very much in advance!!
Hey Dominic.
I'm still looking into this, but I'm opening this issue (before I have a full fix PR) in case you know something off the top of your head that would help me diagnose it.
Basically, our application has crashed a few times now due to memory corruption. I've traced it down to Apex.Serialization while deserializing some state; it looks to me like it incorrectly serialized that state. (It's pretty hard to debug compiled expressions... more on that later.)
I think I've worked it down to a small reproducible sample. What is funny is that while serializing in debug mode, the library actually catches that it's about to write bad data and explodes.
Example exception while in DEBUG mode of the library
System.InvalidOperationException: Operation is not valid due to the current state of the object.
   at Apex.Serialization.Internal.BufferedStream.CheckReserved(Int32 size) in C:\Github\Apex.Serialization\Apex.Serialization\Internal\BufferedStream.cs:line 152
   at Apex.Serialization.Write_Apex.Serialization.Tests.Option(Closure , Option , BufferedStream& , Binary`2 )
   at Apex.Serialization.Binary`2.WriteSealedInternal[T](T value, Boolean useSerializedVersionId) in C:\Github\Apex.Serialization\Apex.Serialization\Binary.Internal.cs:line 628
Current thoughts:
- Could Apex.Serialization.Settings get a property that turns these sanity checks on even in release mode? (I would personally eat the perf loss to better know that the serialization was accurate and won't crash my app again, lol.)
- Settings.IsTypeSerializable() has some shadiness to it; I'm not sure it's actually needed. The version of the library we are using predates the whitelisting, so I'm probably doing something wrong.
- DynamicCode.HandleNullableWrite is writing a byte 0 when hasValueMethod is actually null, which is what sets off the InvalidOperationException.
I'll provide updates to this issue as I continue to work on this.
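For context on the HandleNullableWrite point above, here is a sketch of the conventional Nullable&lt;T&gt; wire pattern (this is not Apex's generated code, just the usual shape): one marker byte for HasValue, followed by the payload only when a value exists. The bug described amounts to emitting the marker byte even though the HasValue accessor (hasValueMethod) was never resolved.

```csharp
using System.IO;

// Conventional nullable encoding: 1 marker byte, then an optional payload.
static void WriteNullableInt32(BinaryWriter writer, int? value)
{
    writer.Write((byte)(value.HasValue ? 1 : 0)); // marker byte
    if (value.HasValue)
        writer.Write(value.Value);                // 4-byte payload
}

static int? ReadNullableInt32(BinaryReader reader)
{
    return reader.ReadByte() == 1 ? reader.ReadInt32() : (int?)null;
}
```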
Library Version Used & .NET Runtime Version:
Reproduced in 1.3.3 (the version we were on at the time) and the latest in master (4.0.2).
.NET Core 3.1 & .NET 6
Steps to Reproduce:
In my branch I added a test showing the problem.
My branch/commit: Zoxive@c254a0a
I commented out some [Conditional("DEV")] statements, so if you run it in RELEASE mode you see the problem as well.
Expected Behavior:
Works without crashing :)
Actual Behavior:
Creates serialized state which, when deserialized, causes memory corruption and crashes the CLR.
.NET 4.7.1 and v1.3.4
.NET Core 3.0 and v2.0.1
Steps to Reproduce:
ApexTest
Expected Behavior:
Serialize and Deserialize two objects
Actual Behavior:
Runtime Error: System.InvalidOperationException in CheckSize
I've tried to send a stream of objects over a NetworkStream to another system, but it crashes after a few objects.
Is this a bug or by design?
Hello,
would it be possible for version 2.x to still support .NET Standard 2.0?
Don't emit a null byte marker for non-nullable reference types
Based on metadata described in https://github.com/dotnet/roslyn/blob/master/docs/features/nullable-metadata.md
Arrays are problematic for this, because while they offer the most potential benefit, it's not uncommon to have T[] where some elements are actually null (reserved space, sparse array, etc).
Library Version Used & .NET Runtime Version:
Reproduced in 1.3.3 (version we were on at the time) and 2.0.1
.NET 3.0 and 3.1 tested as well
Steps to Reproduce:
I'm still attempting to create a reproducible project; the object we are serializing is quite a large graph, so I haven't been able to reproduce it in a simpler form yet.
Expected Behavior:
Deserializes without exploding.
Actual Behavior:
The process crashes during deserialization (stack overflow; see notes).
Notes
I'm hopeful that the dynamic code generation can be rewritten to be more stack efficient; this appears to be the problem. (Feels similar to dotnet/aspnetcore#2737, fixed by dotnet/extensions#570.)
If I manually set my stack size really high, it deserializes fine: set COMPlus_DefaultStackSize=10000000
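As a less invasive alternative to changing the process-wide COMPlus_DefaultStackSize, the read can be run on a dedicated thread with a larger stack. This is a workaround sketch, not part of the library; the work delegate stands in for the actual deserialization call.

```csharp
using System;
using System.Threading;

// Run a delegate on a thread with a custom stack size (default 16 MB here,
// vs. the usual ~1 MB), so deeply recursive generated code doesn't overflow.
static T RunWithLargeStack<T>(Func<T> work, int stackSizeBytes = 16 * 1024 * 1024)
{
    T result = default;
    Exception error = null;
    var thread = new Thread(() =>
    {
        try { result = work(); }
        catch (Exception e) { error = e; }
    }, stackSizeBytes);
    thread.Start();
    thread.Join();
    if (error != null) throw error;
    return result;
}
```

The Thread constructor overload taking maxStackSize is standard .NET; the trade-off is one extra thread per call, which is usually acceptable for an occasional large-graph deserialization.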
Hi,
I have run into some issues when combining Apex.Serialization and LZ4 compression.
Library Version Used & .NET Runtime Version:
.NET Framework 4.7.1
Apex.Serialization v1.3.3
Steps to Reproduce:
Expected Behavior:
It should work
Actual Behavior:
It does not work and throws an error:
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at Apex.Serialization.Internal.BufferedStream.Flush()
at Apex.Serialization.Read_APG.SimResults(Closure , BufferedStream& , Binary`1 )
at Apex.Serialization.Binary`1.ReadSealedInternal[T]()
at Apex.Serialization.Binary`1.ReadObjectEntry[T]()
at Apex.Serialization.Binary`1.Read[T](Stream inputStream)
I thought it was related to the LZ4 compression I was using and raised an issue with them, but they explained that Apex.Serialization does not handle some cases correctly:
MiloszKrajewski/K4os.Compression.LZ4#36 :
You can raise an issue of Apex.Serialization saying it does not work when underlying stream does not return full blocks at once (like network stream).
https://docs.microsoft.com/en-us/dotnet/api/system.io.stream.read?view=netcore-3.1
Note: "Returns: The total number of bytes read into the buffer. This can be less than the number of bytes allocated in the buffer if that many bytes are not currently available, or zero (0) if the end of the stream has been reached."
There are a lot of libraries which do not handle this case correctly.
In the meantime you can use K4os.Compression.LZ4.Streams 1.2.2-beta, which changed the default behaviour (and blocks until a full block is read).
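The partial-read contract quoted above can also be handled on the caller side. A sketch of the two usual fixes, assuming any Stream-based deserializer: loop until the requested byte count is actually filled, or buffer the whole payload into a MemoryStream (which never returns partial reads) before deserializing.

```csharp
using System.IO;

// Loop until exactly `count` bytes have been read; Stream.Read is allowed
// to return fewer bytes than requested (network/decompression streams do).
static void ReadExactly(Stream stream, byte[] buffer, int offset, int count)
{
    while (count > 0)
    {
        int read = stream.Read(buffer, offset, count);
        if (read == 0)
            throw new EndOfStreamException();
        offset += read;
        count -= read;
    }
}

// Alternative: drain the source fully first, then deserialize from memory.
static MemoryStream BufferFully(Stream stream)
{
    var buffered = new MemoryStream();
    stream.CopyTo(buffered); // CopyTo loops internally until Read returns 0
    buffered.Position = 0;
    return buffered;
}
```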
Thanks,
Titas
Hi,
we have a project in an MVC environment.
We use Apex.Serialization to serialize objects and then store the byte[] in our DB.
Everything works fine until we stop and restart the IIS application pool.
After that (but not every time...), deserializing an object saved beforehand results in an
"Index out of range. Non-negative value and less than the collection size required" exception.
Newly serialized objects work fine (until the next restart).
Subsequent restarts of the application pool lead to random results (the object can be deserialized again, or every deserialization throws the exception).
The same code, run as a Windows project, works fine every time.
Thanks
Andrea
Is there a way I can merge the two files without opening them?
Documentation doesn't mention thread safety but it says to reuse instances: "Always reuse serializer instances when possible, as the instance caches a lot of data to improve performance when repeatedly serializing or deserializing objects."
I have a small test case where two simultaneous tasks run serialization sharing the same instance; it either writes zero bytes or invalid bytes to the stream.
ApexBug.zip
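Assuming instances are indeed not thread-safe (the docs quoted above don't promise it either way), a common workaround is to keep the "reuse instances" advice per thread rather than globally. A sketch of a generic per-thread cache; Binary.Create()/IBinary are the Apex.Serialization API shown elsewhere in this thread, while the cache class itself is illustrative:

```csharp
using System;
using System.Threading;

// One cached instance per thread: reuse (for the expression caches) without
// sharing across concurrently running tasks.
sealed class PerThreadCache<T>
{
    private readonly ThreadLocal<T> _instance;

    public PerThreadCache(Func<T> factory)
    {
        _instance = new ThreadLocal<T>(factory);
    }

    public T Instance => _instance.Value;
}

// Intended usage (assumption: IBinary is not safe to share across threads):
// static readonly PerThreadCache<IBinary> Serializers =
//     new PerThreadCache<IBinary>(() => Binary.Create());
```

Note that Tasks can migrate between pool threads across awaits, so this fits synchronous serialization blocks; for async code a pool of instances rented per operation would be safer.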
Hi,
I am trying to use the Apex serializer to serialize some objects with data stored in them. Some of that data is stored as byte[], and I noticed that the byte[] data is missing from the deserialized objects. I tried to serialize just a byte[], and it seems to throw an OverflowException. Is there any way to work around this?
Many Thanks,
Titas
Library Version Used & .NET Runtime Version:
Apex.Serialization 1.3.4
.NET Framework 4.7.1
Platform Target: x64
Steps to Reproduce:
Expected Behavior:
It should work
Actual Behavior:
It throws OverflowException and fails to reload
Exception thrown: 'System.OverflowException' in Apex.Serialization.dll
An unhandled exception of type 'System.OverflowException' occurred in Apex.Serialization.dll
Array dimensions exceeded supported range.
Code to reproduce:
using Apex.Serialization;
using System;
using System.IO;

namespace ConsoleApp2
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create byte array
            Random rnd = new Random(22);
            byte[] byteData = new byte[10];
            rnd.NextBytes(byteData);

            // Save to file - this seems to work fine
            Save(byteData, @"D:\test1.bin");

            // Load from file - this throws OverflowException
            byte[] b1 = Load<byte[]>(@"D:\test1.bin");

            Console.WriteLine("Success");
            Console.ReadLine();
        }

        public static void Save(object inv, string filepath)
        {
            using (FileStream writeFile = File.Create(filepath))
            {
                using (IBinary apex = Binary.Create())
                {
                    apex.Write(inv, writeFile);
                }
            }
        }

        public static T Load<T>(string filepath)
        {
            T results;
            using (FileStream readFile = File.OpenRead(filepath))
            {
                using (IBinary apex = Binary.Create())
                {
                    results = apex.Read<T>(readFile);
                }
            }
            return results;
        }
    }
}
Hi,
While deserializing a byte array, it throws "The type initializer for 'PerTypeValues`1' threw an exception."
InnerException: Could not load file or assembly 'System.Runtime.CompilerServices.Unsafe, Version=4.0.4.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)
I have referenced the System.Runtime.CompilerServices.Unsafe NuGet package, but it did not help.
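On .NET Framework, this kind of manifest-mismatch error is usually resolved with an assembly binding redirect rather than just referencing the package. A sketch of what that could look like in app.config/web.config; the publicKeyToken matches the exception above, but the version numbers are illustrative and should match the assembly actually deployed:

```xml
<!-- Redirect all older references of the assembly to the deployed version. -->
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="System.Runtime.CompilerServices.Unsafe"
                          publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.4.1" newVersion="4.0.4.1" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```

Enabling automatic binding redirects in the project file (AutoGenerateBindingRedirects) often produces this entry for you.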
Hi.
Apex does not work with .NET 8!
Add an option to allow whitelisting types to serialize/deserialize. This may have to be a global option.
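To illustrate the requested feature, here is a hypothetical sketch of what a type whitelist could look like. This is NOT the library's actual API; all names below are invented for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical whitelist: only explicitly allowed types may be constructed
// from data read off the stream.
sealed class TypeWhitelist
{
    private readonly HashSet<Type> _allowed = new HashSet<Type>();

    public TypeWhitelist Allow<T>()
    {
        _allowed.Add(typeof(T));
        return this;
    }

    // Would be called before instantiating a type named in the stream.
    public void EnsureAllowed(Type type)
    {
        if (!_allowed.Contains(type))
            throw new InvalidOperationException(
                $"Type '{type.FullName}' is not whitelisted for deserialization.");
    }
}
```

Rejecting unknown types up front is the standard mitigation for deserialization of untrusted data, which is presumably why a global option is suggested.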
I also found this one:
https://github.com/rikimaru0345/Ceras
Maybe you could include a comparison? I'd like to change ours, and I'm still searching for which one to use.
Just looking for a clarification, to make sure the nuget package is representative of the code here.
Library Version Used & .NET Runtime Version:
.NET 5
Steps to Reproduce:
Nuget package: https://www.nuget.org/packages/Apex.Serialization/3.0.0
Expected Behavior:
Same version on the repo and nuget package.
Actual Behavior:
Not same version on the repo and nuget package.
Greetings and Regards.
I suggest that you add a compressor like Brotli, Zstd, or Snappy to your beautiful library.
These libraries have the highest compression and decompression speeds and are very small in size.
Thanks
https://github.com/oleg-st/ZstdSharp
https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.brotlistream?view=net-8.0
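Compression can also be layered outside the serializer rather than built in, using the BrotliStream that ships in .NET (System.IO.Compression, .NET Core 2.1+). A sketch operating on raw bytes; the serializer call itself is elided, since any Stream-writing serializer fits this shape (and note the partial-read caveat discussed earlier in this thread when deserializing directly from a decompression stream):

```csharp
using System.IO;
using System.IO.Compression;

// Compress an already-serialized payload with Brotli.
static byte[] Compress(byte[] payload)
{
    using var output = new MemoryStream();
    using (var brotli = new BrotliStream(output, CompressionLevel.Optimal, leaveOpen: true))
    {
        brotli.Write(payload, 0, payload.Length);
    }
    return output.ToArray();
}

// Decompress back to the original bytes before deserializing.
static byte[] Decompress(byte[] compressed)
{
    using var input = new MemoryStream(compressed);
    using var brotli = new BrotliStream(input, CompressionMode.Decompress);
    using var output = new MemoryStream();
    brotli.CopyTo(output);
    return output.ToArray();
}
```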
C Size | ratio% | C MB/s | D MB/s | Name |
---|---|---|---|---|
32823983 | 32.8 | 3.40 | 67.92 | lzma 9 |
32872154 | 32.8 | 0.31 | 315.27 | brotli 11d27 |
32925079 | 32.9 | 1.70 | 70.67 | lzturbo 49 |
33936389 | 33.9 | 2.57 | 1701.35 | lzturbo 39 |
34105370 | 34.1 | 3.32 | 952.59 | zstd 22 |
36751363 | 36.7 | 48.30 | 1701.59 | lzturbo 32 |
36920708 | 36.9 | 2.98 | 2355.32 | lzturbo 29 |
46546059 | 46.5 | 163.77 | 1489.57 | lzturbo 31 |
46805879 | 46.8 | 44.66 | 940.64 | zstd 9 |
48152545 | 48.1 | 52.94 | 349.62 | brotli 4 |
49497505 | 49.4 | 2.48 | 2299.20 | lizard 49 |
49773790 | 49.7 | 38.08 | 1952.73 | lzturbo 22 |
49860700 | 49.8 | 16.94 | 295.99 | zlib 9 |
49962678 | 49.9 | 35.70 | 294.24 | zlib 6 |
50278958 | 50.2 | 282.43 | 1372.91 | lzturbo 30 |
52509931 | 52.5 | 290.96 | 347.16 | brotli 1 |
52549655 | 52.5 | 239.35 | 2153.41 | lzturbo 21 |
52928477 | 52.9 | 69.17 | 276.75 | zlib 1 |
52983490 | 52.9 | 393.67 | 984.00 | zstd 1 |
54251482 | 54.2 | 2.60 | 4367.15 | lzturbo 19 |
54410769 | 54.4 | 46.37 | 3305.22 | lz4 9 |
55923645 | 55.9 | 188.40 | 4200.23 | lzturbo 12 |
57606731 | 57.6 | 386.90 | 3948.64 | lzturbo 11 |
59085723 | 59.0 | 698.39 | 2196.24 | lzturbo 20 |
61455711 | 61.4 | 800.71 | 4003.54 | lzturbo 10 |
61938605 | 61.9 | 730.46 | 3330.40 | lz4 1 |
100098564 | 100.0 | 8647.84 | 8408.10 | memcpy |