
ghc-syntax-highlighter's Introduction

GHC syntax highlighter


This is a syntax highlighter library for Haskell using the lexer of GHC.

There is a blog post announcing the package; this readme is mostly derived from it.

Motivation

Parsing Haskell is hard because Haskell is a complex language with countless features. The only way to get it 100% right is to use GHC's own parser. Fortunately, the ghc package, as of version 8.4.1, exports enough of GHC's source code to let us use its lexer.

Alternative approaches, even decent ones like highlight.js, either don't support cutting-edge features or work without sufficient precision, so many tokens end up lumped together and the result is typically still hard to read.

How to use it in your blog

That depends on your Markdown processor. If you're an MMark user, good news: since version 0.2.1.0, mmark-ext includes the ghcSyntaxHighlighter extension. Thanks to MMark's flexibility, it's possible to use this highlighter for Haskell and skylighting as a fallback for everything else. Consult the docs for more information.

skylighting is what Pandoc uses, and as far as I can tell Pandoc is hardcoded to use only that library for highlighting, so some creativity may be needed to get this highlighter working with it.
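If your Markdown processor has no built-in hook for this library, one option is to call tokenizeHaskell yourself and turn its (token, chunk) pairs into HTML. The sketch below is illustrative only: it uses a trimmed local copy of the library's Token type and String instead of Text so that it runs with base alone, and the CSS class names (kw, va, and so on) are made up. With the real package you would import GHC.SyntaxHighlighter and pass renderHtml the list inside the Just returned by tokenizeHaskell, after converting each Text chunk to String.

```haskell
-- Trimmed local stand-in for the Token type exported by
-- GHC.SyntaxHighlighter (the real type has more constructors).
data Token
  = KeywordTok
  | PragmaTok
  | VariableTok
  | ConstructorTok
  | OperatorTok
  | StringTok
  | CommentTok
  | SpaceTok
  | OtherTok
  deriving (Eq, Show)

-- Hypothetical mapping from token kinds to CSS class names;
-- use whatever names your stylesheet defines.
tokenClass :: Token -> Maybe String
tokenClass t = case t of
  KeywordTok     -> Just "kw"
  PragmaTok      -> Just "pr"
  VariableTok    -> Just "va"
  ConstructorTok -> Just "co"
  OperatorTok    -> Just "op"
  StringTok      -> Just "st"
  CommentTok     -> Just "cm"
  SpaceTok       -> Nothing -- whitespace is emitted without a wrapper
  OtherTok       -> Nothing

-- Escape the characters that are special in HTML text.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Render (token, chunk) pairs as a single <pre><code> block,
-- wrapping each classified chunk in a <span>.
renderHtml :: [(Token, String)] -> String
renderHtml toks = "<pre><code>" ++ concatMap renderTok toks ++ "</code></pre>"
  where
    renderTok (t, chunk) = case tokenClass t of
      Nothing  -> escapeHtml chunk
      Just cls -> "<span class=\"" ++ cls ++ "\">" ++ escapeHtml chunk ++ "</span>"
```

When tokenizeHaskell returns Nothing (for example, on a snippet that does not lex), falling back to plain escaped text keeps the page readable.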

Limitations

CPP directives are not lexed correctly, because GHC's lexer is not designed to handle them; in GHC, CPP runs as a separate preprocessing pass before lexing.
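One possible workaround (an assumption, not something ghc-syntax-highlighter does) is to blank out directive lines before tokenizing. The sketch below treats a line as a directive if it starts with # in column 1 followed by a name such as if or endif; isCppDirective and blankCppDirectives are hypothetical helpers. Lines are replaced rather than deleted so that line numbers in any Loc values still match the original source.

```haskell
import Data.Char (isAlpha, isSpace)

-- Heuristic: a CPP directive is a line starting with '#' in column 1,
-- followed by optional spaces and a directive name ("if", "else",
-- "endif", ...). A line such as "#-}" does not match.
isCppDirective :: String -> Bool
isCppDirective ('#' : rest) =
  case dropWhile isSpace rest of
    (c : _) -> isAlpha c
    []      -> False
isCppDirective _ = False

-- Replace each directive with an empty line, keeping the line count
-- unchanged so that reported line numbers stay valid.
blankCppDirectives :: String -> String
blankCppDirectives = unlines . map blank . lines
  where
    blank l
      | isCppDirective l = ""
      | otherwise        = l
```

Both branches of every conditional remain in the text, so the lexer sees, for example, two alternative case patterns in a row. That may be enough for highlighting, but the result is no longer a meaningful program.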

Contribution

Issues, bugs, and questions may be reported in the GitHub issue tracker for this project.

Pull requests are also welcome.

License

Copyright © 2018–present Mark Karpov

Distributed under the BSD 3-clause license.

ghc-syntax-highlighter's Issues

File header pragmas not recognised

File header pragmas such as LANGUAGE and OPTIONS_GHC are given comment tokens instead of pragma tokens:

> :set -XOverloadedStrings
> tokenizeHaskellLoc "{-# OPTIONS_GHC -fno-warn-unused-matches #-}"
Just [(CommentTok,Loc 1 1 1 45)]
> tokenizeHaskellLoc "{-# LANGUAGE ScopedTypeVariables #-}"
Just [(CommentTok,Loc 1 1 1 37)]
> tokenizeHaskellLoc "{-# INLINE func #-}"
Just [(PragmaTok,Loc 1 1 1 11),(VariableTok,Loc 1 12 1 16),(PragmaTok,Loc 1 17 1 20)]

I can't tell whether this comes from the GHC lexer or not; I don't understand it well enough.
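Whatever the root cause, a caller can paper over it by re-tagging such chunks after tokenization. The sketch below assumes the (token, chunk) output of tokenizeHaskell, converted to String, and uses a trimmed local copy of the Token type; retagHeaderPragmas is a hypothetical helper, not part of the library. With tokenizeHaskellLoc you would first need to recover each chunk's text from its span.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- Trimmed local stand-in for the library's Token type.
data Token = PragmaTok | CommentTok | VariableTok | SpaceTok
  deriving (Eq, Show)

-- A chunk tagged CommentTok that is delimited by "{-#" and "#-}" is
-- really a pragma (e.g. LANGUAGE or OPTIONS_GHC in the file header),
-- so re-tag it as PragmaTok. Everything else passes through unchanged.
retagHeaderPragmas :: [(Token, String)] -> [(Token, String)]
retagHeaderPragmas = map retag
  where
    retag (CommentTok, s)
      | "{-#" `isPrefixOf` s && "#-}" `isSuffixOf` s = (PragmaTok, s)
    retag tok = tok
```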

wish: ghc-syntax-highlighter in ghci

This would be great!
I have no idea how difficult it would be to implement.
The BIG ADVANTAGE would be that things like shh would work with no modifications. Are there editors that support ghc-syntax-highlighter?

Improve readme

Some people requested a more complete readme, so I guess this must be done.

conflicting bounds on 0.0.6.0

Hi, the 0.0.6.0 release requires base >=4.12 (suggesting GHC 8.6+) but also ghc-lib-parser ==8.10.* (GHC 8.10 only). I think this mismatch is causing confusion, perhaps because base is special, for Stackage CI (failure, discussion) and perhaps others. A possible fix: add a revision to 0.0.6.0 restricting it to base ==4.14.* (GHC 8.10 only).

Provide both span locations and input chunks...

For some applications both locations and input chunks are needed.
Since we already provide each of them separately, it would be nice to be able to get both at once.

The naive attempt of using zip <$> tokenizeHaskell <*> tokenizeHaskellLoc will provide corrupt output.
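The corruption presumably comes from the two lists having different lengths: the tokenizeHaskellLoc output in the first issue above contains no whitespace entries, whereas tokenizeHaskell also returns whitespace chunks (SpaceTok). A minimal sketch of getting both at once is to derive the spans from the chunks. It assumes the chunks concatenate back to the original input, that tabs count as one column, and that spans use 1-based columns with an exclusive end, as the Loc values in that issue suggest; Token and Loc are local stand-ins and withLocs is hypothetical.

```haskell
-- Trimmed local stand-in for the library's Token type.
data Token = PragmaTok | VariableTok | SpaceTok
  deriving (Eq, Show)

-- Local stand-in for the library's Loc: start line, start column,
-- end line, end column (1-based, end column exclusive).
data Loc = Loc !Int !Int !Int !Int
  deriving (Eq, Show)

-- Advance a (line, column) position past one character.
-- Tabs count as a single column here, which may not match GHC.
step :: (Int, Int) -> Char -> (Int, Int)
step (l, _) '\n' = (l + 1, 1)
step (l, c) _    = (l, c + 1)

-- Walk the chunks in order, assuming their concatenation is the
-- original input, and attach a span to every non-whitespace chunk.
withLocs :: [(Token, String)] -> [(Token, Loc, String)]
withLocs = go (1, 1)
  where
    go _ [] = []
    go start@(sl, sc) ((t, s) : rest)
      | t == SpaceTok = go end rest
      | otherwise     = (t, Loc sl sc el ec, s) : go end rest
      where
        end@(el, ec) = foldl step start s
```

Applied to the chunks {-# INLINE, func and #-} with single spaces between them, withLocs yields Loc 1 1 1 11, Loc 1 12 1 16 and Loc 1 17 1 20, the same spans tokenizeHaskellLoc reports in that issue.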

VariableTok token too broad

I don't know if this is a limitation of how GHC lexes Haskell, but in my opinion the resulting image is undesirable for syntax highlighting. Without highlighting VariableTok, the code just looks barren.

[Screenshot not preserved]
(Note how each function in the body of tokenizeHaskell is highlighted as part of the VariableTok group.)
Could VariableTok be meaningfully broken up into smaller tokens?

Provide just span locations

For some use cases, returning chunks of the input stream is inconvenient. We should also provide a function that returns the start/end positions of code spans.
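Spans are cheap to return, and a caller that occasionally needs the underlying text can cut it out of the input. A sketch, assuming the Loc layout seen in the tokenizeHaskellLoc output in the first issue above (start line, start column, end line, end column; 1-based, end column exclusive), '\n' line endings and no tabs; Loc is a local stand-in and sliceLoc is a hypothetical helper.

```haskell
import Data.List (intercalate)

-- Local stand-in for the library's Loc type.
data Loc = Loc !Int !Int !Int !Int
  deriving (Eq, Show)

-- Extract the text covered by a span from the original source.
sliceLoc :: String -> Loc -> String
sliceLoc src (Loc sl sc el ec) =
  intercalate "\n" (zipWith cut [sl ..] covered)
  where
    -- The source lines the span touches, first to last.
    covered = take (el - sl + 1) (drop (sl - 1) (lines src))
    -- Trim the first and last lines to the span's columns.
    cut n line
      | n == sl && n == el = take (ec - sc) (drop (sc - 1) line)
      | n == sl            = drop (sc - 1) line
      | n == el            = take (ec - 1) line
      | otherwise          = line
```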

ifdefs on a case expression makes "tokenizeHaskellLoc" return "Nothing"

Sorry if the issue title is inaccurate; I'm quite new to Haskell at the moment, so I couldn't phrase it better.

Anyway, I'm finding that the following code does not lex properly (please ignore the missing definitions; they are not relevant). More specifically, the culprit is the case expression in the nargs binding; removing it seems to fix the problem.

{-# LANGUAGE NamedFieldPuns    #-}
{-# LANGUAGE TemplateHaskell   #-}
{-# LANGUAGE CPP               #-}

functionImplementation :: Name -> Q ([ArgType], Exp)
functionImplementation functionName = do
    fInfo <- reify functionName
    nargs <- mapM classifyArgType $ case fInfo of
#if __GLASGOW_HASKELL__ < 800
            VarI _ functionType _ _ ->
#else
            VarI _ functionType _ ->
#endif
                determineNumberOfArguments functionType

            x ->
                error $ "Value given to function is (likely) not the name of a function.\n" <> show x

    e <- topLevelCase nargs
    return (nargs, e)
