TinyCsvParser Benchmark

First of all, congratulations on the nice blog post about CSV parsing. 👍 It was an interesting read, and it is nice to see how TinyCsvParser compares to other solutions. It's also often worth rolling your own stuff to be future-proof in case a maintainer loses interest in a project.

You have used the library correctly, so there is no problem with the correctness of your figures!

Hope you appreciate my comment.

I just want to show how the TinyCsvParser API could be used to speed up the parsing. I know benchmarks are always complicated. A comparison between CsvHelper and TinyCsvParser, for example, is unfair, because I use PLINQ. Take PLINQ out of the equation and both would be just as fast... with CsvHelper offering much, much more functionality.

See, what takes most of the time in your benchmarks isn't the mapping or the reading of the file... it is the Tokenizer used in my library. In TinyCsvParser you can swap out the default Tokenizer and plug in your own implementation.

With a custom Tokenizer, TinyCsvParser has a runtime of 120 ms for 100 000 lines. Now the unfair part: I am still doing all the counting in the PLINQ pipeline, so it is a parallel operation, and you may see different figures when evaluating the results sequentially.

Here is how I would do it.

using NUnit.Framework;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using TinyCsvParser.Mapping;
using TinyCsvParser.Tokenizer;

namespace TinyCsvParser.Test.Integration
{
    public static class MeasurementUtils
    {
        public static void MeasureElapsedTime(string description, Action action)
        {
            // Get the elapsed time as a TimeSpan value.
            TimeSpan ts = MeasureElapsedTime(action);

            // Format and display the TimeSpan value.
            string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
                ts.Hours, ts.Minutes, ts.Seconds,
                ts.Milliseconds / 10);

            TestContext.WriteLine("[{0}] Elapsed Time = {1}", description, elapsedTime);
        }

        private static TimeSpan MeasureElapsedTime(Action action)
        {
            Stopwatch stopWatch = new Stopwatch();
            stopWatch.Start();
            action();
            stopWatch.Stop();

            return stopWatch.Elapsed;
        }

    }

    public class TokenizerBenchmark
    {
        private class TestModel
        {
            public string Prop1 { get; set; }

            public string Prop2 { get; set; }

            public string Prop3 { get; set; }
        }

        private class TestModelMapping : CsvMapping<TestModel>
        {
            public TestModelMapping()
            {
                MapProperty(0, x => x.Prop1);
                MapProperty(1, x => x.Prop2);
                MapProperty(2, x => x.Prop3);
            }
        }

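        private class CustomTokenizer : ITokenizer
        {
            // Splits a single CSV line on commas, treating commas inside double quotes as data.
            public string[] Tokenize(string input)
            {
                var result = new List<string>();

                bool isInQuotes = false;

                StringBuilder str = new StringBuilder();

                // A string can be enumerated directly, so no char[] copy is needed.
                foreach (var t in input)
                {
                    if (t == '"')
                    {
                        // Quote characters only toggle the state; they are not part of the token.
                        isInQuotes = !isInQuotes;
                    }
                    else if (t == ',' && !isInQuotes)
                    {
                        // An unquoted comma ends the current token.
                        result.Add(str.ToString());

                        str.Clear();
                    }
                    else
                    {
                        str.Append(t);
                    }
                }

                // The last token has no trailing comma, so add it here.
                result.Add(str.ToString());

                return result.ToArray();
            }
        }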

        [Test]
        public void RunTest()
        {
            var options = new CsvParserOptions(false, new CustomTokenizer());
            var mapping = new TestModelMapping();
            var parser = new CsvParser<TestModel>(options, mapping);

            var testFilePath = GetTestFilePath();

            MeasurementUtils.MeasureElapsedTime("Reading 100 000 Lines", () =>
            {
                var lines = parser
                    .ReadFromFile(testFilePath, Encoding.UTF8)
                    .Where(x => x.IsValid)
                    .Count();

                Assert.AreEqual(100000, lines);
            });
        }

        [SetUp]
        public void SetUp()
        {
            StringBuilder stringBuilder = new StringBuilder();

            for(int i = 0; i < 100000; i++)
            {
                stringBuilder.AppendLine("1312452433443,93742834623543,234277237242");
            }

            var testFilePath = GetTestFilePath();

            File.WriteAllText(testFilePath, stringBuilder.ToString(), Encoding.UTF8);
        }

        [TearDown]
        public void TearDown()
        {
            var testFilePath = GetTestFilePath();

            File.Delete(testFilePath);
        }

        private string GetTestFilePath()
        {
#if NETCOREAPP1_1
            var basePath = AppContext.BaseDirectory;
#else
            var basePath = AppDomain.CurrentDomain.BaseDirectory;
#endif
            return Path.Combine(basePath, "test_file.txt");
        }
    }
}
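
To see what the CustomTokenizer above does with quoted fields, here is a minimal stand-alone sketch of the same loop without the TinyCsvParser dependency. The class name QuoteAwareSplitter and the method name Split are hypothetical and only used for illustration; the logic is meant to mirror the Tokenize method, not to replace the library's own tokenizers.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-alone copy of the CustomTokenizer loop (not part of TinyCsvParser).
public static class QuoteAwareSplitter
{
    public static string[] Split(string input)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool isInQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                // Every quote toggles the state and is dropped from the output.
                isInQuotes = !isInQuotes;
            }
            else if (c == ',' && !isInQuotes)
            {
                // An unquoted comma closes the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // Add the final field, which is not followed by a comma.
        fields.Add(current.ToString());
        return fields.ToArray();
    }
}
```

Calling QuoteAwareSplitter.Split on the line a,"b,c",d yields the three fields a, b,c and d, so a comma inside quotes stays in its field. The trade-off of this simple approach is that every quote character is dropped: for a field written as "say ""hi""" the result is say hi, where a parser that follows the RFC 4180 escaping convention would return say "hi". For the purely numeric test data in the benchmark this makes no difference.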
