RegExtract

Quick and dirty idiomatic C# line parser that emits typed ValueTuples.

Usage Examples

Date from email header

DateTime date = "Date: Mon, 7 Dec 2020 19:43:24 -0800".Extract<DateTime>(@"Date: (.*)");

List of words

List<string> words = "The quick brown fox jumped over the lazy dogs.".Extract<List<string>>(@"(?:(\w+)\W*)+");
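The list example presumably relies on a standard .NET feature: when a capture group sits inside a quantifier, Group.Captures keeps every value the group matched, while Group.Value keeps only the last one. The sketch below uses only System.Text.RegularExpressions (no RegExtract) to show that behavior. It illustrates the underlying mechanism and is not the library's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Plain System.Text.RegularExpressions, no RegExtract involved.
var match = Regex.Match("The quick brown fox jumped over the lazy dogs.", @"(?:(\w+)\W*)+");

// Groups[1].Value would be only the last capture ("dogs");
// Groups[1].Captures holds one entry per repetition of the group.
List<string> words = match.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToList();

Console.WriteLine(string.Join("|", words)); // prints "The|quick|brown|fox|jumped|over|the|lazy|dogs"
```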

Alternation

var (n,s) = "str".Extract<(int?,string)>(@"(\d+)|(.*)");
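In the alternation example only one branch can participate in the match, so the other group is left unmatched. In plain .NET an unmatched group reports Success == false and an empty Value. The sketch below (System.Text.RegularExpressions only, no RegExtract) maps that state to null by hand, which is presumably how the README's example ends up with a null int? for n.

```csharp
using System;
using System.Text.RegularExpressions;

// Plain System.Text.RegularExpressions, no RegExtract involved.
var m = Regex.Match("str", @"(\d+)|(.*)");

// (\d+) cannot match at position 0, so the alternation falls through to (.*).
// Group 1 then did not participate: Success is false and Value is "".
int? n = m.Groups[1].Success ? int.Parse(m.Groups[1].Value) : null;
string? s = m.Groups[2].Success ? m.Groups[2].Value : null;

string shown = n.HasValue ? n.Value.ToString() : "null";
Console.WriteLine("n=" + shown + ", s=" + s); // prints "n=null, s=str"
```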

Parsing fields

var (n1,y1,e1) = "Hello, earthling, from 2077!".Extract<ParseResult>(@"Hello, (.*), from (?:(\d+)|(.*))!");
var (n2,y2,e2) = "Hello, martian, from earth!".Extract<ParseResult>(@"Hello, (.*), from (?:(\d+)|(.*))!");

record ParseResult(string name, int? year, string loc);

Enums and Flags

using System.IO;          // FileMode
using System.Reflection;  // BindingFlags

var mode = "OpenOrCreate".Extract<FileMode>(@".*");
var flags = "Public,Static".Extract<BindingFlags>(@".*");
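The standard Enum.Parse method already understands both a single member name and a comma-separated list of names, which it combines with bitwise OR (the natural fit for a [Flags] enum such as BindingFlags). Whether RegExtract calls Enum.Parse internally is an assumption. This sketch only shows the .NET behavior the enum examples depend on.

```csharp
using System;
using System.IO;
using System.Reflection;

// Standard Enum.Parse, no RegExtract involved.
FileMode mode = Enum.Parse<FileMode>("OpenOrCreate");

// A comma-separated list of names is parsed and OR-ed together,
// which is what a [Flags] enum such as BindingFlags needs.
BindingFlags flags = Enum.Parse<BindingFlags>("Public,Static");

Console.WriteLine(mode); // prints "OpenOrCreate"
Console.WriteLine(flags == (BindingFlags.Public | BindingFlags.Static)); // prints "True"
```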

History

This project came about during day 2 of Advent of Code 2020. The task involved parsing strings that looked something like:

1-3 a: abcde
1-3 b: cdefg
2-9 c: ccccccccc

From each one, I needed two numbers, a character, and a string. 25 years ago I would have written the following absolutely trivial C code that would take care of parsing it:

int lo, hi; char ch; char pwd[50];

sscanf(line, "%d-%d %c: %s", &lo, &hi, &ch, pwd);

It bothered me that the C code to parse this was so much simpler than what I would write in C#. For contests like Advent of Code or Google Code Jam, I usually make extensive use of .Split(), à la:

(int lo, int hi, char ch, string pwd) ParseLine(string line)
{
    var splits = line.Split(" ");                                 // "1-3 a: abcde" -> ["1-3", "a:", "abcde"]
    var nums = splits[0].Split("-").Select(int.Parse).ToArray();  // "1-3" -> [1, 3]
    return (nums[0], nums[1], splits[1][0], splits[2]);
}

But that's fiddly to write, and even fiddlier to read. It's good enough for contest conditions, but leaves lots to be desired for sharing solutions and discussing approaches with other competitors.

Undoubtedly, a regular expression has most of the simplicity of that sscanf template from above. You can see what the extra characters in the template are, and you can see what's being extracted:

@"(\d+)-(\d+) (.): (.*)"

Unfortunately, a .NET Regex match leaves a lot of fiddling to do after matching before you get the simple types you want to compute with:

(int lo, int hi, char ch, string pwd) ParseLine(string line)
{
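    var match = Regex.Match(line, @"(\d+)-(\d+) (.): (.*)");

    // Groups[0] is the whole match; the captures start at Groups[1],
    // and each one arrives as a string that still has to be converted.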
    return (int.Parse(match.Groups[1].Value),
            int.Parse(match.Groups[2].Value),
            match.Groups[3].Value[0],
            match.Groups[4].Value);
}

So I set out to design the best possible C# syntax that would get me somewhere near the expressiveness and simplicity of that sscanf example from the 1970s.

Here's where I settled:

var (lo, hi, ch, pwd) = line.Extract<(int, int, char, string)>(@"(\d+)-(\d+) (.): (.*)");
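To make the idea concrete, here is a minimal sketch of how a tuple-typed extraction could work, assuming each capture group maps positionally to one ValueTuple element. ExtractTuple is a hypothetical local function written for this illustration, not RegExtract's API or implementation: it reads the tuple's type arguments, converts each group's text with TypeDescriptor.GetConverter, and builds the tuple with Activator.CreateInstance.

```csharp
using System;
using System.ComponentModel;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not RegExtract's API or implementation.
// Assumes T is a ValueTuple of at most 7 elements (larger tuples nest a TRest).
static T ExtractTuple<T>(string input, string pattern)
{
    Match match = Regex.Match(input, pattern);
    if (!match.Success)
        throw new FormatException($"'{input}' does not match '{pattern}'.");

    // For (int, int, char, string) this yields [int, int, char, string].
    Type[] elementTypes = typeof(T).GetGenericArguments();

    // Group i + 1 feeds element i (Groups[0] is the whole match).
    object[] values = elementTypes
        .Select((type, i) => TypeDescriptor.GetConverter(type)
                                           .ConvertFromInvariantString(match.Groups[i + 1].Value))
        .ToArray();

    // ValueTuple<T1, ..., Tn> has a public constructor taking one argument per element.
    return (T)Activator.CreateInstance(typeof(T), values);
}

var line = "1-3 a: abcde";
var (lo, hi, ch, pwd) = ExtractTuple<(int, int, char, string)>(line, @"(\d+)-(\d+) (.): (.*)");
Console.WriteLine($"{lo} {hi} {ch} {pwd}"); // prints "1 3 a abcde"
```

Because the element types come from T, each field's type is stated exactly once at the call site, which is what lets the Extract<(int, int, char, string)> form read like the sscanf template.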

From there I added support for types with friendly constructors:

var (lo, hi, ch, pwd) = line.Extract<template>(@"(\d+)-(\d+) (.): (.*)");

record template(int lo, int hi, char ch, string pwd);
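The same approach extends to constructors. The sketch below is hypothetical (ExtractViaConstructor is not part of RegExtract): it picks the only public constructor whose parameter count equals the number of capture groups and converts each group to the matching parameter type. A positional record like template fits this rule because the compiler gives it a public constructor with one parameter per positional member, while its copy constructor is not public. To keep the block free of type declarations, the demo uses TimeSpan, whose only three-parameter public constructor takes hours, minutes, and seconds.

```csharp
using System;
using System.ComponentModel;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not RegExtract's API or implementation.
static T ExtractViaConstructor<T>(string input, string pattern)
{
    Match match = Regex.Match(input, pattern);
    if (!match.Success)
        throw new FormatException($"'{input}' does not match '{pattern}'.");

    int groupCount = match.Groups.Count - 1; // Groups[0] is the whole match

    // Single() throws if zero or several public constructors have this arity;
    // a real implementation would need a tie-breaking rule.
    var ctor = typeof(T).GetConstructors()
                        .Single(c => c.GetParameters().Length == groupCount);

    // Convert group i + 1 to the type of parameter i.
    object[] args = ctor.GetParameters()
        .Select((p, i) => TypeDescriptor.GetConverter(p.ParameterType)
                                        .ConvertFromInvariantString(match.Groups[i + 1].Value))
        .ToArray();

    return (T)ctor.Invoke(args);
}

// TimeSpan's only three-parameter public constructor is (hours, minutes, seconds).
TimeSpan elapsed = ExtractViaConstructor<TimeSpan>("Elapsed 19:43:24", @"(\d+):(\d+):(\d+)");
Console.WriteLine(elapsed); // prints "19:43:24"
```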

And support for extracting named groups to properties:

var result = line.Extract<template>(@"(?<lo>\d+)-(?<hi>\d+) (?<ch>.): (?<pwd>.*)");

class template {
    public int lo { get; set; }
    public int hi { get; set; }
    public char ch { get; set; }
    public string pwd { get; set; }
}
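Named groups suggest a different strategy: create the object first, then assign each named group to the property with the same name. ExtractToProperties below is a hypothetical sketch of that idea, not RegExtract's implementation. It assumes a class with a public parameterless constructor and matches group names to property names case-sensitively. UriBuilder stands in for the README's template class so the block needs no type declarations.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not RegExtract's API or implementation.
// Restricted to classes: for a struct, SetValue would modify a boxed copy.
static T ExtractToProperties<T>(string input, string pattern) where T : class, new()
{
    var regex = new Regex(pattern);
    Match match = regex.Match(input);
    if (!match.Success)
        throw new FormatException($"'{input}' does not match '{pattern}'.");

    var result = new T();
    foreach (string name in regex.GetGroupNames())
    {
        // Case-sensitive lookup; numbered groups such as "0" find no property.
        PropertyInfo? prop = typeof(T).GetProperty(name);
        if (prop is null || !prop.CanWrite || !match.Groups[name].Success)
            continue;

        // Convert the captured text to the property's type before assigning it.
        object? value = TypeDescriptor.GetConverter(prop.PropertyType)
                                      .ConvertFromInvariantString(match.Groups[name].Value);
        prop.SetValue(result, value);
    }
    return result;
}

// UriBuilder stands in for the README's template class: it has a public
// parameterless constructor and settable Scheme, Host, Port, and Path properties.
UriBuilder uri = ExtractToProperties<UriBuilder>(
    "https://example.com:8443/docs",
    @"(?<Scheme>\w+)://(?<Host>[^:/]+):(?<Port>\d+)(?<Path>/.*)");
Console.WriteLine($"{uri.Scheme} {uri.Host} {uri.Port} {uri.Path}");
```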

Contributors

sblom, oparkerj, viceroypenguin