Comments (4)

ahives commented on July 20, 2024

Can you please add your query?

from machete.

bryanmatteson commented on July 20, 2024

The issue is actually separate from any query. The problem is in parser.ParseStream. Here's a full main function that can reproduce the issue with the file I attached.

private static async Task Main(string[] args) {
    var filePath = "HR7Message.txt";

    ISchema<HL7V26Entity> schema = Schema.Factory.CreateHL7<HL7V26Entity>(cfg => cfg.AddFromNamespaceContaining<HL7Version26>());
    IEntityParser<HL7V26Entity> parser = Parser.Factory.CreateHL7(schema);

    using var stream = File.OpenRead(filePath);
    StreamText text = await new StreamTextReader(stream).Text;
    var parse = await parser.ParseStream(text, new TextSpan(0, text.Length)); // <-- problem is here
}

dominiqueplante commented on July 20, 2024

Hi @bryanmatteson

I created a branch with your message and added a performance benchmark.

See this branch.

I am running on a 2016 MacBook Pro with a 2.2 GHz i7, and these are the benchmark results I got:

// * Detailed results *
StreamingParserBenchmarks.StreamingParserBenchmark: Job-KKEHVM(Runtime=Core, InvocationCount=5, LaunchCount=1, RunStrategy=Throughput, TargetCount=5, UnrollFactor=1, WarmupCount=5)
Runtime = .NET Core 2.1.13 (CoreCLR 4.6.28008.01, CoreFX 4.6.28008.01), 64bit RyuJIT; GC = Concurrent Workstation
Mean = 284.0624 ms, StdErr = 0.6272 ms (0.22%); N = 5, StdDev = 1.4024 ms
Min = 282.4000 ms, Q1 = 282.9314 ms, Median = 284.0312 ms, Q3 = 285.2089 ms, Max = 286.2371 ms
IQR = 2.2775 ms, LowerFence = 279.5152 ms, UpperFence = 288.6251 ms
ConfidenceInterval = [278.6631 ms; 289.4617 ms] (CI 99.9%), Margin = 5.3993 ms (1.90% of Mean)
Skewness = 0.4, Kurtosis = 1.56, MValue = 2
-------------------- Histogram --------------------
[281.682 ms ; 286.955 ms) | @@@@@
---------------------------------------------------

Total time: 00:00:26 (26.77 sec)

// * Summary *

BenchmarkDotNet=v0.10.14, OS=macOS 10.14.2 (18C54) [Darwin 18.2.0]
Intel Core i7-4770HQ CPU 2.20GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.201
  [Host]     : .NET Core 2.1.13 (CoreCLR 4.6.28008.01, CoreFX 4.6.28008.01), 64bit RyuJIT
  Job-KKEHVM : .NET Core 2.1.13 (CoreCLR 4.6.28008.01, CoreFX 4.6.28008.01), 64bit RyuJIT

Runtime=Core  InvocationCount=5  LaunchCount=1  
RunStrategy=Throughput  TargetCount=5  UnrollFactor=1  
WarmupCount=5  

                   Method |     Mean |    Error |   StdDev |      Min |      Max |     Gen 0 |    Gen 1 |    Gen 2 | Allocated |
------------------------- |---------:|---------:|---------:|---------:|---------:|----------:|---------:|---------:|----------:|
 StreamingParserBenchmark | 284.1 ms | 5.399 ms | 1.402 ms | 282.4 ms | 286.2 ms | 1200.0000 | 800.0000 | 400.0000 |  15.46 MB |

// * Legends *
  Mean      : Arithmetic mean of all measurements
  Error     : Half of 99.9% confidence interval
  StdDev    : Standard deviation of all measurements
  Min       : Minimum
  Max       : Maximum
  Gen 0     : GC Generation 0 collects per 1k Operations
  Gen 1     : GC Generation 1 collects per 1k Operations
  Gen 2     : GC Generation 2 collects per 1k Operations
  Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  1 ms      : 1 Millisecond (0.001 sec)

// * Diagnostic Output - MemoryDiagnoser *


// ***** BenchmarkRunner: End *****
// * Artifacts cleanup *

bryanmatteson commented on July 20, 2024

Right, this is the workaround. In your benchmark, you load the entire stream into memory first, and then use a StringReader and finally a TextReaderStreamTextReader. This works because the parser never needs to load another chunk from the stream: your call to File.ReadAllText(_largeFilePath) has already loaded everything into memory.

If you change the benchmark to use the StreamTextReader directly, like this:

[Benchmark]
public async Task StreamingParserBenchmarkLargeFileDemo() {
    Console.WriteLine("Starting streaming parser benchmark run");

    using (var stream = File.OpenRead(_largeFilePath)) {
        //Console.WriteLine("About to parse stream");
        StreamText text = await new StreamTextReader(stream).Text;
        ParseResult<HL7Entity> result = await _hl7Parser.ParseStream(text, new TextSpan(0, text.Length));
        ...

then I think you'll run into the bug I'm seeing. So while avoiding the StreamTextReader is a potential workaround, there is still a bug in the stream parsing logic for really long lines.
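
For anyone trying to reproduce this with their own files, it may help to measure the longest line first, to judge whether a file is likely to need more than one chunk per line. The sketch below is not part of Machete: `LongLineProbe`, `LongestLine`, and the 4096-character buffer size are hypothetical names and values chosen for illustration, and this thread does not state what chunk size StreamTextReader actually uses. It assumes a line ends at a carriage return or a line feed (HL7 v2 segments are normally terminated by a carriage return).

```csharp
using System;
using System.IO;

public static class LongLineProbe
{
    // Returns the length, in characters, of the longest line in the reader.
    // The input is consumed in fixed-size chunks, so a line that is longer
    // than bufferSize is counted across several Read calls.
    // bufferSize = 4096 is an arbitrary illustrative value, not Machete's.
    public static int LongestLine(TextReader reader, int bufferSize = 4096)
    {
        var buffer = new char[bufferSize];
        int longest = 0;
        int current = 0;
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (c == '\r' || c == '\n')
                {
                    // End of line: record its length and start counting again.
                    longest = Math.Max(longest, current);
                    current = 0;
                }
                else
                {
                    current++;
                }
            }
        }

        // The last line may not end with a terminator.
        return Math.Max(longest, current);
    }
}
```

It could be called as `LongLineProbe.LongestLine(File.OpenText(filePath))`. A result far larger than the reader's chunk size would suggest the file hits the multi-chunk path described above, though that is only an indicator and not a diagnosis of the bug itself.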
