Comments (4)
Can you please add your query?
from machete.
The issue is actually separate from any query. The problem is in parser.ParseStream. Here's a full main function that reproduces the issue with the file I attached.
    private static async Task Main(string[] args) {
        var filePath = "HR7Message.txt";
        ISchema<HL7V26Entity> schema = Schema.Factory.CreateHL7<HL7V26Entity>(cfg => cfg.AddFromNamespaceContaining<HL7Version26>());
        IEntityParser<HL7V26Entity> parser = Parser.Factory.CreateHL7(schema);
        using var stream = File.OpenRead(filePath);
        StreamText text = await new StreamTextReader(stream).Text;
        var parse = await parser.ParseStream(text, new TextSpan(0, text.Length)); // <-- problem is here
    }
from machete.
I created a branch with your message and added a performance benchmark.
I am running a 2.2 GHz i7 2016 MacBook Pro, and here are the benchmark results I got:
// * Detailed results *
StreamingParserBenchmarks.StreamingParserBenchmark: Job-KKEHVM(Runtime=Core, InvocationCount=5, LaunchCount=1, RunStrategy=Throughput, TargetCount=5, UnrollFactor=1, WarmupCount=5)
Runtime = .NET Core 2.1.13 (CoreCLR 4.6.28008.01, CoreFX 4.6.28008.01), 64bit RyuJIT; GC = Concurrent Workstation
Mean = 284.0624 ms, StdErr = 0.6272 ms (0.22%); N = 5, StdDev = 1.4024 ms
Min = 282.4000 ms, Q1 = 282.9314 ms, Median = 284.0312 ms, Q3 = 285.2089 ms, Max = 286.2371 ms
IQR = 2.2775 ms, LowerFence = 279.5152 ms, UpperFence = 288.6251 ms
ConfidenceInterval = [278.6631 ms; 289.4617 ms] (CI 99.9%), Margin = 5.3993 ms (1.90% of Mean)
Skewness = 0.4, Kurtosis = 1.56, MValue = 2
-------------------- Histogram --------------------
[281.682 ms ; 286.955 ms) | @@@@@
---------------------------------------------------
Total time: 00:00:26 (26.77 sec)
// * Summary *
BenchmarkDotNet=v0.10.14, OS=macOS 10.14.2 (18C54) [Darwin 18.2.0]
Intel Core i7-4770HQ CPU 2.20GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.201
[Host] : .NET Core 2.1.13 (CoreCLR 4.6.28008.01, CoreFX 4.6.28008.01), 64bit RyuJIT
Job-KKEHVM : .NET Core 2.1.13 (CoreCLR 4.6.28008.01, CoreFX 4.6.28008.01), 64bit RyuJIT
Runtime=Core InvocationCount=5 LaunchCount=1
RunStrategy=Throughput TargetCount=5 UnrollFactor=1
WarmupCount=5
Method | Mean | Error | StdDev | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
------------------------- |---------:|---------:|---------:|---------:|---------:|----------:|---------:|---------:|----------:|
StreamingParserBenchmark | 284.1 ms | 5.399 ms | 1.402 ms | 282.4 ms | 286.2 ms | 1200.0000 | 800.0000 | 400.0000 | 15.46 MB |
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Min : Minimum
Max : Maximum
Gen 0 : GC Generation 0 collects per 1k Operations
Gen 1 : GC Generation 1 collects per 1k Operations
Gen 2 : GC Generation 2 collects per 1k Operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
1 ms : 1 Millisecond (0.001 sec)
// * Diagnostic Output - MemoryDiagnoser *
// ***** BenchmarkRunner: End *****
// * Artifacts cleanup *
from machete.
Right, this is the workaround. In your benchmark, you load the entire stream into memory first, then use a StringReader, and finally a TextReaderStreamTextReader. This works because the parser never needs to load another chunk from the stream: the whole file has already been loaded into memory by your call to File.ReadAllText(_largeFilePath).
If you change the benchmark to use the StreamTextReader directly, like this:
    [Benchmark]
    public async Task StreamingParserBenchmarkLargeFileDemo() {
        Console.WriteLine("Starting streaming parser benchmark run");
        using (var stream = File.OpenRead(_largeFilePath)) {
            //Console.WriteLine("About to parse stream");
            StreamText text = await new StreamTextReader(stream).Text;
            ParseResult<HL7Entity> result = await _hl7Parser.ParseStream(text, new TextSpan(0, text.Length));
            ...
then I think you'll run into the bug I'm seeing. So while avoiding the StreamTextReader is a potential workaround, there is still a bug in the stream parsing logic for really long lines.
from machete.