microsoft / ts-parsec

Writing a custom parser is a fairly common need. Although parser combinators already exist in other languages, TypeScript provides a powerful and well-structured foundation for building one. Error handling and ambiguity resolution are common weaknesses of parser combinators, but they are key features of ts-parsec. ts-parsec also provides an easy-to-use programming interface that can help people build programming-language-scale parsers in just a few hours. This technology is already used in Microsoft/react-native-tscodegen.

License: Other

TypeScript 100.00%

ts-parsec's Introduction

ts-parsec

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Using ts-parsec with npm

npm install -g typescript-parsec

Building this repo

yarn
yarn build
yarn test

Packages

  • ts-parsec: Parser combinator for TypeScript
  • tspc-test: Unit test project
  • tspc-utilities: Code generator for developing ts-parsec
    • At the moment, running npm run update will write the overloads for alt and seq for you

Introduction

ts-parsec is a parser combinator library for TypeScript. With it, you can create parsers quickly in just a few lines of code. It provides the following features:

  • Tokenizer based on regular expressions. This tokenizer is designed for convenience, so its performance may be unsatisfactory in some cases. If so, you can write your own tokenizer; it is easy to plug one into ts-parsec.
  • Parser combinators.
  • The ability to support recursive syntax.

We recommend learning EBNF before using this library.

Please read Getting Started for ramping up, or our document page for deeper understanding.
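
To make the "recursive syntax" feature listed above more concrete, here is a minimal, self-contained sketch of how a forward-declared rule lets a grammar refer to itself. It does not use ts-parsec; the Parser type and the helpers lazyRule, num, ch and evaluate are hypothetical names invented for this illustration, and the library's real interface may differ.

```typescript
// Toy parser type for illustration only; ts-parsec's real Parser interface differs.
type Result<T> = { value: T; next: number } | undefined;
type Parser<T> = (input: string, pos: number) => Result<T>;

// A forward-declared rule: other parsers can refer to it before its pattern is set.
function lazyRule<T>(): { parser: Parser<T>; setPattern(p: Parser<T>): void } {
  let pattern: Parser<T> | undefined;
  return {
    parser: (input, pos) => {
      if (pattern === undefined) throw new Error("rule used before setPattern");
      return pattern(input, pos);
    },
    setPattern: (p) => { pattern = p; },
  };
}

// Matches a run of digits and returns its numeric value.
const num: Parser<number> = (input, pos) => {
  const m = /^[0-9]+/.exec(input.substring(pos));
  return m ? { value: Number(m[0]), next: pos + m[0].length } : undefined;
};

// Matches one literal character.
const ch = (c: string): Parser<string> => (input, pos) =>
  input.charAt(pos) === c ? { value: c, next: pos + 1 } : undefined;

// EBNF: expr = NUMBER | "(" expr "+" expr ")"
const expr = lazyRule<number>();
const sum: Parser<number> = (input, pos) => {
  const open = ch("(")(input, pos);
  if (!open) return undefined;
  const left = expr.parser(input, open.next); // recursive reference
  if (!left) return undefined;
  const plus = ch("+")(input, left.next);
  if (!plus) return undefined;
  const right = expr.parser(input, plus.next); // recursive reference
  if (!right) return undefined;
  const close = ch(")")(input, right.next);
  return close ? { value: left.value + right.value, next: close.next } : undefined;
};
expr.setPattern((input, pos) => num(input, pos) ?? sum(input, pos));

// Parses the whole string and returns the computed sum, or undefined on failure.
function evaluate(text: string): number | undefined {
  const r = expr.parser(text, 0);
  return r && r.next === text.length ? r.value : undefined;
}
```

The EBNF comment in the sketch is the whole grammar; the rule has to be declared before its pattern is assigned because the pattern mentions the rule itself.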

More Examples

In the Future

The following combinators will be released soon:

  • A context-sensitive apply combinator.

A context-sensitive tokenizer is also coming.

ts-parsec's People

Contributors

acoates-ms, dependabot[bot], fuafa, microsoftopensource, msftgits, simon04, zihanchen-msft

ts-parsec's Issues

Maybe use JSDoc/TSDoc?

I would like to suggest adding JSDoc/TSDoc comments, so that when working with the framework you don't always have to consult the documentation because some of that information is already in the code.

Health of the project

Hi,

is this project still actively maintained?

I'm currently considering using this library in one of my projects, but the latest version on npm is 2 years old and I can't see any recent activity on the repo.

thx

Capturing first capture group in Lexer

The code below is untested, but before starting a PR I thought it best to start a discussion:

Foreword

In the code example below I have added a const to store the match of the current substring instead of just testing it for a match. This allows us to specify a capture group within the lexer's token definition.

Rationale

I am not sure if it's the responsibility of the lexer/tokeniser, but there seems to be an issue with collisions. Take...

[true, /^[a-z]+/g, TokenKind.FieldName],
[true, /^[a-zA-Z\s]+/g, TokenKind.FieldLabel],

... for the string "{name:label}".

We tried to implement a parser with three tokens: one for the FieldName, one for the FieldLabel, and one for the colon (LabelSeparator).

Unless it's just naivety on our part, that didn't work: before it even gets to the parsing stage, the lexer has already fallen over because of a regex collision. Unless I specifically make the label uppercase or include a space, the input matches both FieldName and FieldLabel, and the lexer picks the first one specified.

That forced us to specify the FieldLabel with a colon prefix, i.e. [true, /^:[a-zA-Z\s]+/g, TokenKind.FieldLabel],

What that now means is that we have to manually strip the colon off at the parsing stage. I was wondering if adding support for the capture group syntax would mitigate this problem.

[true, /^:([a-zA-Z\s]+)/g, TokenKind.FieldLabel],

Using this regex and (something similar to) the code below, it would match on the whole regex but only capture the part we want (if a capture group is specified).

If this is over-engineering of a problem that doesn't exist (which I have a suspicion it might be) by all means please let me know of the appropriate solution.

Code Example

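for (const [keep, regexp, kind] of this.rules) {
    regexp.lastIndex = 0;
    // Keep the match itself (not just a boolean test) so capture groups are available.
    const match = regexp.exec(subString);
    if (match) {
        // Row/column tracking covers the whole match, including text outside the capture group.
        const text = subString.substring(0, regexp.lastIndex);
        let rowEnd = rowBegin;
        let columnEnd = columnBegin;
        for (const c of text) {
            switch (c) {
                case '\r': break;
                case '\n': rowEnd++; columnEnd = 1; break;
                default: columnEnd++;
            }
        }

        // Use the first capture group as the token text when the rule defines one.
        const tokenText = match[1] !== undefined ? match[1] : match[0];
        const newResult = new TokenImpl<T>(this, input, kind, tokenText, { index: indexStart, rowBegin, columnBegin, rowEnd, columnEnd }, keep);
        // Caveat: this compares token text, which can be shorter than the consumed span when a capture group is used.
        if (result === undefined || result.text.length < newResult.text.length) {
            result = newResult;
        }
    }
}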
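
To explore the idea outside the library, here is a small standalone sketch of the matching step. It assumes a simplified rule shape and token type (LexRule, LexToken and matchToken are made-up names, not ts-parsec's lexer), and it tracks the consumed length separately from the captured text so that longest-match selection is unaffected by the capture group.

```typescript
// Hypothetical rule shape: [keep, anchored regexp, kind]. Not ts-parsec's real lexer.
type LexRule = [boolean, RegExp, string];
interface LexToken { kind: string; text: string; consumed: number }

// Returns the longest-matching token at the start of `input`, or undefined.
// `text` is the first capture group when the rule has one, else the whole match;
// `consumed` is always the length of the whole match.
function matchToken(rules: LexRule[], input: string): LexToken | undefined {
  let best: LexToken | undefined;
  for (const [, regexp, kind] of rules) {
    regexp.lastIndex = 0; // reset state in case the rule carries the g flag
    const match = regexp.exec(input);
    if (match === null || match.index !== 0) continue;
    const candidate: LexToken = {
      kind,
      text: match[1] !== undefined ? match[1] : match[0],
      consumed: match[0].length,
    };
    // Compare by consumed length, not by captured text, to keep longest-match semantics.
    if (best === undefined || best.consumed < candidate.consumed) best = candidate;
  }
  return best;
}

const fieldRules: LexRule[] = [
  [true, /^[a-z]+/g, "FieldName"],
  [true, /^:([a-zA-Z\s]+)/g, "FieldLabel"],
];
```

With these rules, the label token consumes the colon but exposes only the label as its text, so nothing needs to be stripped at the parsing stage.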

Is there a way to emit parser failure from my parsers?

I am trying to achieve this, but I can't figure out what's the proper solution:

const dateParser = apply(tok(TokenKind.ISODate), (t) => {
  const parsedDate = parseISO(t.text);
  if (isValid(parsedDate)) {
    return parsedDate;
  }
  return; //what?;
});

What is the proper idiomatic solution to this case?
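
For discussion, the behaviour being asked for can be sketched with a toy parser type. This is not ts-parsec's API: Outcome, applyOrFail and anyToken are hypothetical names, the "tokens" are plain strings, and Date.parse stands in for parseISO/isValid. The idea is only that the mapping callback may return an Error, which the wrapper turns into a parse failure.

```typescript
// Toy types for illustration; ts-parsec's real Parser and error types differ.
type Outcome<T> =
  | { ok: true; value: T; next: number }
  | { ok: false; message: string; pos: number };
type Parser<T> = (tokens: string[], pos: number) => Outcome<T>;

// Like apply, but the callback may reject the value by returning an Error.
function applyOrFail<A, B>(p: Parser<A>, f: (value: A) => B | Error): Parser<B> {
  return (tokens, pos) => {
    const r = p(tokens, pos);
    if (!r.ok) return r; // propagate the inner failure unchanged
    const mapped = f(r.value);
    if (mapped instanceof Error) return { ok: false, message: mapped.message, pos };
    return { ok: true, value: mapped, next: r.next };
  };
}

// Accepts any single token.
const anyToken: Parser<string> = (tokens, pos) =>
  pos < tokens.length
    ? { ok: true, value: tokens[pos], next: pos + 1 }
    : { ok: false, message: "unexpected end of input", pos };

// Fails the parse when the token text is not a valid date.
const dateParser = applyOrFail<string, Date>(anyToken, (text) => {
  const time = Date.parse(text);
  return isNaN(time) ? new Error("invalid date: " + text) : new Date(time);
});
```

Whether the library offers an equivalent hook is exactly the open question of this issue; the sketch only shows the shape such a combinator could take.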

Push to NPM

The latest changes have not been published on NPM yet.
Could one of the maintainers take care of that?

Thanks ✌️

Proposal: rep_n

I have come across scenarios where it's useful to parse an exact number of the same thing, where the number is not known until runtime. I propose the addition of a rep_n combinator, used like:

  rep_n(parseThing, n)

where parseThing is a parser of some type, and n is the exact number of times that parseThing should be called. This parser fails unless exactly n repetitions of parseThing succeed.

Note: I may be able to create a pull request if that would be welcome.
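
As a rough illustration of the proposal, here is a sketch over a toy parser type rather than ts-parsec's own (Parser, repN and letter are hypothetical names). It follows the wording above literally: the inner parser is called exactly n times, and the whole parser fails if any of those calls fails.

```typescript
// Toy parser type for illustration only; ts-parsec's real Parser interface differs.
type Result<T> = { value: T; next: number } | undefined;
type Parser<T> = (input: string, pos: number) => Result<T>;

// Succeeds only if `p` matches n times in a row starting at `pos`.
function repN<T>(p: Parser<T>, n: number): Parser<T[]> {
  return (input, pos) => {
    const values: T[] = [];
    let next = pos;
    for (let i = 0; i < n; i++) {
      const r = p(input, next);
      if (r === undefined) return undefined; // fewer than n matches: fail
      values.push(r.value);
      next = r.next;
    }
    return { value: values, next };
  };
}

// Matches one lowercase ASCII letter.
const letter: Parser<string> = (input, pos) => {
  const c = input.charAt(pos);
  return /^[a-z]$/.test(c) ? { value: c, next: pos + 1 } : undefined;
};

// n can be computed at runtime; 3 is used here only as an example.
const threeLetters = repN(letter, 3);
```

Note that the sketch does not check what follows the n-th match, so any remaining input is left for the next parser.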

Monadic bind operator

In parsec and fparsec, you can use the >>= operator (monadic bind) to use the parsed result from a parser to produce a new parser. There doesn't appear to be equivalent functionality in typescript-parsec. I propose the addition of a new combinator combine (since bind might be confusing to experienced javascript programmers):

function combine<TLeft, TRight>(pLeft: Parser<unknown, TLeft>, pRightApply: (val: TLeft) => Parser<unknown, TRight>): Parser<unknown, TRight> { /* implementation */ }
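
A possible implementation can be sketched over a simplified parser type. The type below is a toy stand-in (it drops the token-kind parameter from the signature above), and digit, take and lengthPrefixed are hypothetical helpers used only to show a length-prefixed field, where the second parser depends on the first parser's result.

```typescript
// Toy parser type for illustration only; ts-parsec's real Parser interface differs.
type Result<T> = { value: T; next: number } | undefined;
type Parser<T> = (input: string, pos: number) => Result<T>;

// Runs pLeft, builds a second parser from its result, and continues with that parser.
function combine<TLeft, TRight>(
  pLeft: Parser<TLeft>,
  pRightApply: (val: TLeft) => Parser<TRight>
): Parser<TRight> {
  return (input, pos) => {
    const left = pLeft(input, pos);
    if (left === undefined) return undefined;
    return pRightApply(left.value)(input, left.next);
  };
}

// Matches one decimal digit and returns it as a number.
const digit: Parser<number> = (input, pos) => {
  const c = input.charAt(pos);
  return /^[0-9]$/.test(c) ? { value: Number(c), next: pos + 1 } : undefined;
};

// Takes exactly n characters, failing if fewer remain.
const take = (n: number): Parser<string> => (input, pos) =>
  pos + n <= input.length
    ? { value: input.substring(pos, pos + n), next: pos + n }
    : undefined;

// A digit followed by that many characters, e.g. "3abc".
const lengthPrefixed = combine(digit, take);
```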

Worst-case performance of alt (perhaps need an alt_sc?)

The alt combinator has some worst-case performance behavior if there isn't a reasonably fast parse fail in any of its arguments. I discovered this during 2021's Advent of Code challenge where I used typescript-parsec on this problem. Using alt to determine the packet type is impossibly slow. I hand-wrote a parser to get past it, but it seems that an alt_sc may be useful to shortcut to the first succeeding argument rather than attempt parsing all arguments.
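
The difference between the two strategies can be shown with a toy model in which a parser returns every candidate result. This is only a sketch of the behaviour described in this issue, not ts-parsec's implementation; altAll, altFirst, counted and countCalls are hypothetical names.

```typescript
// Toy model: a parser returns all candidate results (empty array = failure).
type Candidate<T> = { value: T; next: number };
type Parser<T> = (input: string, pos: number) => Candidate<T>[];

// Ambiguity-preserving choice: runs every alternative and keeps all results.
function altAll<T>(...ps: Parser<T>[]): Parser<T> {
  return (input, pos) => {
    let all: Candidate<T>[] = [];
    for (const p of ps) all = all.concat(p(input, pos));
    return all;
  };
}

// Short-circuiting choice: stops at the first alternative that succeeds.
function altFirst<T>(...ps: Parser<T>[]): Parser<T> {
  return (input, pos) => {
    for (const p of ps) {
      const r = p(input, pos);
      if (r.length > 0) return r;
    }
    return [];
  };
}

// A prefix matcher that counts how often it is invoked.
let calls = 0;
const counted = (prefix: string): Parser<string> => (input, pos) => {
  calls++;
  return input.substring(pos, pos + prefix.length) === prefix
    ? [{ value: prefix, next: pos + prefix.length }]
    : [];
};

// Runs `choice` once on `input` and reports how many alternatives were tried.
function countCalls(choice: Parser<string>, input: string): number {
  calls = 0;
  choice(input, 0);
  return calls;
}
```

In this model the exhaustive choice always pays for every alternative, and that cost multiplies when choices are nested, while the short-circuit version gives up ambiguity detection in exchange for stopping early.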