corollary's Introduction

Corollary: Haskell to Rust conversion

Corollary is a very experimental Haskell to Rust compiler. The goal is to automate the syntactic conversion of Haskell into Rust, letting users manually finish the conversion into idiomatic Rust code. Along with an (extremely loose) adaptation of Haskell methods in corollary-support, this can expedite the process of completing a full port.

Current status: Looking for maintainers. Corollary can parse and translate entire files, with varying results. Source-specific hacks, along with manual translation, were used to port language-c, Haskell's C parsing library.

Given this project was purpose-built for porting a single library, you'll find source-specific hacks throughout the codebase, though they should ultimately be removed. There are no solutions yet for the following problems:

  • Haskell's module and import system
  • Haskell's garbage collection (given that Haskell values are immutable, we liberally .clone() most values when they are passed around instead)
  • Top-level functions without explicit type declarations
  • Monads and HKT
  • Tail recursion
  • True laziness
  • Currying (lacking a better way to involve Haskell's type analysis)

Usage

Corollary can be used as a binary:

cargo install corollary
corollary input/file/path.hs -o target/output.rs

This converts a single source file from Haskell into Rust. You can omit the -o parameter to write the output to stdout. Additionally, you can run a file using the --run parameter.

Corollary will strip any code in a {-HASKELL-} ... {-/HASKELL-} block and include any code in a {-RUST ... /RUST-} block embedded in a file. (See corollary/test for examples.) This allows you to --run a Haskell file directly, given it is self-contained (does not rely on Haskell's module system).
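As a rough illustration of the marker handling described above, here is a minimal sketch. The function name `preprocess` is hypothetical and this is not Corollary's actual implementation; it only shows the idea of dropping `{-HASKELL-}` regions and keeping the contents of `{-RUST ... /RUST-}` regions.

```rust
/// Hypothetical helper (not Corollary's real code): drop
/// `{-HASKELL-} ... {-/HASKELL-}` regions and unwrap `{-RUST ... /RUST-}`
/// regions so that their contents are kept.
fn preprocess(src: &str) -> String {
    const OPEN: &str = "{-HASKELL-}";
    const CLOSE: &str = "{-/HASKELL-}";

    let mut out = String::new();
    let mut rest = src;
    // Copy everything outside the Haskell-only regions.
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]);
        match rest[start..].find(CLOSE) {
            Some(end) => rest = &rest[start + end + CLOSE.len()..],
            // Unterminated region: drop the remainder of the input.
            None => rest = "",
        }
    }
    out.push_str(rest);
    // Keep Rust-only regions by deleting just their markers.
    out.replace("{-RUST", "").replace("/RUST-}", "")
}
```

Calling `preprocess` on a file's contents before parsing would leave only the code meant for the Rust side.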

Development

Clone this repository including its test dependencies:

git clone http://github.com/tcr/corollary --recursive

These are the crates contained in this repo:

  • parser-haskell/, an original Haskell Parser written in LALRPOP.
  • corollary/, an experimental Haskell to Rust compiler.
  • corollary-support/, a support crate for converted Haskell code to use.

In addition, libraries to test Corollary against exist in the deps/ directory.

License

Corollary and parser-haskell are licensed as MIT or Apache-2, at your option.

corollary's People

Contributors

pshc, skade, tcr

corollary's Issues

Type signature wrong

This should extract a as a generic param (somehow) but also make the result Option<([a], a)> not just Maybe.

expr_explode doesn't take into account operator precedence

Right now it does a linear scan for the first operator, then splits an expression into two. Operator precedence would split at the highest-precedence operators first, then iterate on the subsections.

fn expr_explode(span: Vec<Expr>) -> Vec<Expr> {
    ...
    for i in 0..span.len() {
        if let &ast::Expr::Operator(ref op) = &span[i] {
            return vec![ast::Expr::Op(
                Box::new(Expr::Span(expr_explode(span[0..i].to_vec().clone()))),
                op.to_string(),
                Box::new(Expr::Span(expr_explode(span[i+1..].to_vec().clone()))),
            )];
        }
    }
    span
}
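One way to make the split precedence-aware is sketched below on a toy token type. `Tok`, `Tree`, `prec`, and `explode` are hypothetical names, not Corollary's AST. The sketch splits at the loosest-binding operator first so that tighter operators end up deeper in the tree, and it treats every operator as left-associative, which is a simplification of Haskell's fixity rules.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(i64),
    Op(String),
}

#[derive(Debug, PartialEq)]
enum Tree {
    Leaf(i64),
    Bin(Box<Tree>, String, Box<Tree>),
}

// Precedence levels follow the Haskell Prelude fixities; higher binds tighter.
// Unknown operators default to 9, as in Haskell.
fn prec(op: &str) -> u8 {
    match op {
        "||" => 2,
        "&&" => 3,
        "==" | "<" | ">" => 4,
        "+" | "-" => 6,
        "*" | "/" => 7,
        _ => 9,
    }
}

/// Split at the rightmost operator with the lowest precedence, then recurse
/// on both halves. Returns None for spans this toy grammar cannot handle.
fn explode(span: &[Tok]) -> Option<Tree> {
    let mut split: Option<(usize, u8)> = None;
    for (i, tok) in span.iter().enumerate() {
        if let Tok::Op(op) = tok {
            let p = prec(op);
            // `<=` keeps the rightmost candidate, giving left associativity.
            if split.map_or(true, |(_, best)| p <= best) {
                split = Some((i, p));
            }
        }
    }
    match split {
        Some((i, _)) => {
            let op = match &span[i] {
                Tok::Op(op) => op.clone(),
                Tok::Num(_) => return None,
            };
            let lhs = explode(&span[..i])?;
            let rhs = explode(&span[i + 1..])?;
            Some(Tree::Bin(Box::new(lhs), op, Box::new(rhs)))
        }
        None => match span {
            [Tok::Num(n)] => Some(Tree::Leaf(*n)),
            _ => None,
        },
    }
}
```

With this approach `1 + 2 * 3` splits at `+` first, so the multiplication becomes the right subtree.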

Main tracking issue for Corrode port to Rust

There's a lot of small tasks to get to the end, but I roughly think this is the roadmap. Comment below if you're interested in helping out!

Look at language_c.rs and corrode.rs for the state of the automated conversion.

  • Automate bulk cross-compilation as modules (see the out/ directory for current status)
  • Write a proof-of-concept parser for Haskell and compilation to Rust
  • Support functions, variables, literals, types
  • Support case statements and guards
  • Convert instances and data structures correctly
  • Detect pointfree code and convert it into pointwise
  • Properly convert $ and . operators
  • Convert rest of operators into Rust equivalents or fn wrappers
  • Successfully parse all files (except lexer.x parser.y) (failures currently output as // ERROR)
  • Find a way to parse flex-based lexer.x and parser.y files and cross-compile them
  • Switch to manual conversion for all remaining edge cases (tricky code segments, clever monad use, etc.)
  • Pass language-c test bench
  • Pass corrode test bench
  • Port literate Haskell comments into Rust
  • Feature-complete

Support generators

When a generator is encountered in code, we just output it as a /* TODO: Generator */ comment. These are used a few times around the libraries we're porting so we should handle them intelligently.

To repro, save this as test.hs:

module Test()
where

example :: ()
example = do
    let a = [(i,j) | i <- [1,2], j <- [1..4] ]

Running cargo run --manifest-path corollary/Cargo.toml -- test.hs currently outputs this:

// Original file: "test.hs"
// File auto-generated using Corollary.

#[macro_use] use corollary_support::*;

pub fn example() -> () {
    /*do*/ {
        let a = /* Expr::Generator */ Generator;

    }
}

Possibly expected output, using Rust's itertools crate:

pub fn example() -> () {
    /*do*/ {
        let a = iproduct!(vec![1, 2].into_iter(), (1..=4).into_iter()).map(|(i, j)| (i, j)).collect::<Vec<_>>();
    }
}
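A dependency-free alternative, sketched under the assumption that each extra generator clause can become one more nested `flat_map`. The function name `pairs` is only for illustration. Note that Haskell's `[1..4]` includes 4, so the Rust range is `1..=4`.

```rust
/// Sketch of `[(i, j) | i <- [1,2], j <- [1..4]]` using only the standard library.
fn pairs() -> Vec<(i32, i32)> {
    vec![1, 2]
        .into_iter()
        // The outer generator drives the inner one, as in the comprehension.
        .flat_map(|i| (1..=4).map(move |j| (i, j)))
        .collect()
}
```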

Looking for maintainers for Corollary

This project was instrumental in helping port over parser-c. I think there's a lot of potential in exploring what automated conversion from Haskell to Rust can do, given the similarities between their type systems and the richness of Haskell's ecosystem.

If you're interested in helping explore the space and contributing to Corollary, let me know in this thread. If there is sufficient interest, I'm open to moving this to a Github organization, radically restructuring how Corollary translates code, etc. Or feel free to use this as a template for your own transpiler!

Implement structs

This currently outputs an empty comment /* struct dev */ instead of a struct:

data FunctionContext = FunctionContext
    { functionReturnType :: Maybe CType
    , functionName :: Maybe String
    , itemRewrites :: ItemRewrites
    }
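A plausible target for this record is sketched below. `CType` and `ItemRewrites` are empty placeholder types here, since their real definitions live in the code being ported, and the derives are an assumption about what converted code would need.

```rust
// Placeholder stand-ins for the real types defined elsewhere.
#[derive(Debug, Clone, PartialEq)]
pub struct CType;

#[derive(Debug, Clone, PartialEq, Default)]
pub struct ItemRewrites;

// Field names are kept as written in Haskell, so silence the naming lint.
#[allow(non_snake_case)]
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionContext {
    pub functionReturnType: Option<CType>,
    pub functionName: Option<String>,
    pub itemRewrites: ItemRewrites,
}
```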

Modules

Not to get ahead of myself, but I think this is how the code eventually will go:

  • parser-haskell, breaking out the LALRPOP code and AST parsing into a subcrate (that can hopefully be useful as a basis for a real parser)
  • corroder, the Haskell conversion into Rust bits. Also could be its own subcrate, but it's much more dubious whether the code is useful outside this project
  • parser-c, a port of Haskell's language-c into Rust, mostly done with corroder and the rest with manual tweaks
  • corrode-but-in-rust (or some other name) which is the same for corrode

If corroder were broken out, "Haskell compiler to Rust", probably have to give it a cute name, I was thinking "corollary"

Lambda should parse ExprSpan as a body

Because Lambda is treated as an Expr, and the pattern for several exprs is Expr+, it would be ambiguous to accept ExprSpan as a body. A result of this is the format \lt -> let ... won't work.

`const ()`

What is this?

IntMap.fromSet (const ())
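For context: `const x` is the function that ignores its argument and returns `x`, and `IntMap.fromSet f s` builds a map whose keys come from `s` and whose values are `f key`. So this expression maps every key to `()`. A rough Rust analogue, using `BTreeSet` and `BTreeMap` as assumed stand-ins for IntSet and IntMap:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Rough analogue of `IntMap.fromSet (const ())`: every key maps to `()`.
/// The function name is hypothetical.
fn from_set_unit(keys: &BTreeSet<i64>) -> BTreeMap<i64, ()> {
    keys.iter().map(|&k| (k, ())).collect()
}
```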

Output is built up quadratically

One thing that I do feel strongly about: building up output by interpolating Strings recursively is expensive.

I'd want to rip that bandaid off ASAP, but also avoid making things overly complex. What do you think? Some suggestions:

  • tool PrintState appropriately
  • fn print_expr(PrintState, &ast::Expr) -> OutExpr where OutExpr: Display

I would tend toward the latter, but I don't know what shape of output structures we might want yet.

EDIT: tooling PrintState appropriately also means we could do indentation semi-automatically
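A minimal sketch of the second suggestion, on a toy `Expr` type that stands in for `ast::Expr` (an assumption, as is the shape of `OutExpr`). The wrapper writes straight into the caller's formatter, so no intermediate `String` is built per node.

```rust
use std::fmt;

// Toy AST standing in for ast::Expr.
enum Expr {
    Var(String),
    Call(Box<Expr>, Vec<Expr>),
}

// Borrowing wrapper that renders lazily when it is formatted.
struct OutExpr<'a>(&'a Expr);

impl<'a> fmt::Display for OutExpr<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.0 {
            Expr::Var(name) => f.write_str(name),
            Expr::Call(func, args) => {
                // Children write into the same formatter.
                write!(f, "{}(", OutExpr(&**func))?;
                for (i, arg) in args.iter().enumerate() {
                    if i > 0 {
                        f.write_str(", ")?;
                    }
                    write!(f, "{}", OutExpr(arg))?;
                }
                f.write_str(")")
            }
        }
    }
}
```

A `PrintState` carrying the indentation level could be added as a second field of the wrapper.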

Is laziness needed?

I wonder if language-c or corrode make use of laziness (other than in monads) in a way that we would have to simulate?

let-defined lambdas aren't combined.

Similar to how top-level functions with multiple definitions are folded into a match statement (see the comment "There are multiple impls of this function, so expand this into a case statement" in convert.rs for code that can be reused) lambdas with multiple definitions should be folded into one match statement also.

To repro, save this as test.hs:

module Test()
where

example :: ()
example = do
    let isDefault (Just condition) = Left condition
        isDefault Nothing = Right ()

Running cargo run --manifest-path corollary/Cargo.toml -- test.hs currently outputs this:

// Original file: "test.hs"
// File auto-generated using Corollary.

#[macro_use] use corollary_support::*;


pub fn example() -> () {
    /*do*/ {
        let isDefault = |Some(condition)| {
            Left(condition)
        };

        let isDefault = |None| {
            Right(())
        };
    }
}

Expected output:

pub fn example() -> () {
    /*do*/ {
        let isDefault = |_0| {
            match _0 {
                Some(condition) => {
                    Left(condition)
                }
                None => {
                    Right(())
                }
            }
        };
    }
}

. operator improperly translated

See this in CrateMap.hs:

parseCrateMap :: String -> Either String CrateMap
parseCrateMap = fmap root . foldrM parseLine (Map.empty, []) . filter (not . null) . map cleanLine . lines

yields

fn parseCrateMap() -> Either {
    fmap(root)foldrM(parseLine, (Map::empty, vec![]))filter((notnull))map(cleanLine)lines
}
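Since `f . g` applies `g` first, a composition chain reads right to left. Below is a sketch of what the tail of this pipeline, `filter (not . null) . map cleanLine . lines`, could look like once made pointful. `cleanLine` is a stand-in that trims whitespace, because its real definition is not shown here, and `nonEmptyLines` is a hypothetical name.

```rust
// Stand-in for the original cleanLine; the real definition may differ.
#[allow(non_snake_case)]
fn cleanLine(line: &str) -> String {
    line.trim().to_string()
}

// `lines` runs first, then `map cleanLine`, then `filter (not . null)`.
#[allow(non_snake_case)]
fn nonEmptyLines(input: &str) -> Vec<String> {
    input
        .lines()
        .map(cleanLine)
        .filter(|line| !line.is_empty())
        .collect()
}
```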

Remove parser expression errors

There are a few places in translated code in which /* Expr::Error */ is printed instead of whatever value it should have been. These should be diagnosed and cleaned up.

In parser-c:

src/analysis/ast_analysis.rs
62:    /* Expr::Error */ Error
688:            /* Expr::Error */ Error

src/analysis/const_eval.rs
260:            /* Expr::Error */ Error

src/analysis/decl_analysis.rs
95:    /* Expr::Error */ Error
376:    /* Expr::Error */ Error

src/analysis/trav_monad.rs
140:            /* Expr::Error */ Error

src/analysis/type_check.rs
139:    /* Expr::Error */ Error
209:            /* Expr::Error */ Error
276:            /* Expr::Error */ Error
279:            /* Expr::Error */ Error
282:            /* Expr::Error */ Error
285:            /* Expr::Error */ Error
301:            /* Expr::Error */ Error
407:                let charType = /* Expr::Error */ Error;

src/analysis/type_utils.rs
94:    /* Expr::Error */ Error
98:    /* Expr::Error */ Error

src/analysis/sem_rep.rs
347:    /* Expr::Error */ Error

src/data/input_stream.rs
79:            /* Expr::Error */ Error

src/data/node.rs
45:            /* Expr::Error */ Error

src/parser/lexer.rs
63:    /* Expr::Error */ Error
67:    /* Expr::Error */ Error

src/syntax/constants.rs
108:            /* Expr::Error */ Error
143:            /* Expr::Error */ Error

src/syntax/preprocess.rs
61:    /* Expr::Error */ Error

src/syntax/utils.rs
106:            /* Expr::Error */ Error

In rust-corrode:

src/corrode/c.rs
430:            /* Expr::Error */ Error
526:            /* Expr::Error */ Error
803:            /* Expr::Error */ Error
820:            /* Expr::Error */ Error
823:            /* Expr::Error */ Error

Truncate match arm

This match arm in C.lhs is truncated and just prints Result, for some reason:

CLndOp -> return Result { resultType = IsBool, resultMutable = Rust.Immutable, result = Rust.LAnd (toBool lhs) (toBool rhs) }

Layout to brace expansion broken?

When I try to run cargo test on corollary (after pointing it to a manually-fixed version of petgraph 0.1.18 since that broke on nightly recently), it fails to parse any file because there is no end brace inserted after then { "something.

If I understand the layout rules correctly, there should be no need to insert braces for if-then-else expressions.

View patterns

Simple situations seem possible to convert easily:

reply :: String -> String
reply "hi" = "Hello!"
reply (words -> ["how", "are", "you"]) = "Good, and you?"
reply _ = "Yes."

Possible transformation:

fn reply(s: String) -> String {
    match *s {
        "hi" => "Hello!",
        s if (match words(s) { ["how", "are", "you"] => true, _ => false }) => "Good, and you?",
        _ => "Yes.",
    }.into()
}

There might be more complex cases in the codebase though.
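A compilable variant of the same idea, sketched with `split_whitespace` as the assumed equivalent of Haskell's `words` and a slice pattern in the guard:

```rust
fn reply(s: &str) -> String {
    // The view function is evaluated once, up front.
    let words: Vec<&str> = s.split_whitespace().collect();
    let answer = match s {
        "hi" => "Hello!",
        // The view pattern becomes a guard over the viewed value.
        _ if matches!(words.as_slice(), ["how", "are", "you"]) => "Good, and you?",
        _ => "Yes.",
    };
    answer.to_string()
}
```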

How can you match on an empty array?

Say we're matching:

case value of 
    Container([]) => ...,
    Container(vector) => ...,

This won't work in Rust if you try to match against a container. Unless box [] pattern works. Does this need an extensive AST transform to work?
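One option that avoids box patterns is to match on a borrowed slice of the inner vector, since slice patterns work on `&[T]`. A small sketch with a hypothetical `Container` type:

```rust
struct Container(Vec<i32>);

fn describe(value: &Container) -> &'static str {
    // Borrow the Vec as a slice so `[]` can be used as a pattern.
    match value.0.as_slice() {
        [] => "empty",
        [_single] => "one element",
        _ => "several elements",
    }
}
```

This still needs a transform that turns the nested pattern into a match on the inner field, but it is a local rewrite rather than a change to the whole AST.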

Document strategy for Lexer.hs and Parser.hs

These files are generated using the Haskell-native parsers, Happy and Alex. So the original code is in Haskell and the output requires a Haskell compiler.

These files are then reformatted with hindent to avoid having to add braces to the parser (for now I guess).

Because these are part of language-c and not corrode, it doesn't seem important to translate these to Rust and Rust-based parsers yet, i.e. the goal is to have Corrode patchable in Rust, and thus language-c can be opaque. But eventually it'd be nice to have them not require Haskell preprocessing.

Use ghc-mod for inferring function types

When parsing code generated by Happy and Alex, that is, the generated Lexer and Parser code in gen/, we can see that a large number of functions have inferred type definitions, not explicit ones. The compiler currently turns these into Rust lambdas.

We could do some tricks to infer these types in the compiler, or hope Rust can handle the inference, but I think an easier win is to hook up the compilation stage to ghc-mod (if available), read out the type signatures of these functions, and use them to generate the resulting conversion.

Because the setup is more complex for ghc-mod and only is specifically used in function definitions, this should just be optional (if it fails to run ghc-mod or ghc-mod throws an error, it gets swallowed) so that it doesn't slow down development.
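The "optional, errors get swallowed" behavior could be sketched like this. The helper name `query_types` is hypothetical, and the program name and arguments are left to the caller, so nothing here assumes ghc-mod's actual command-line interface.

```rust
use std::process::Command;

/// Run an external type-query tool and return its stdout.
/// Any failure (tool missing, non-zero exit, invalid UTF-8) yields None,
/// so the caller can fall back to the current behavior.
fn query_types(program: &str, args: &[&str]) -> Option<String> {
    let output = Command::new(program).args(args).output().ok()?;
    if !output.status.success() {
        return None;
    }
    String::from_utf8(output.stdout).ok()
}
```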

let-defined lambdas are generated wrong.

Search for these two lines and you'll see they are generated improperly (not as top-level fns, but like two inline lambdas in a let statement)

    let isDefault (Just condition) = Left condition
        isDefault Nothing = Right ()

Fill out all symbolic ops:

For example, this is translated poorly without "||" being translated:

let duplicateLHS = isJust op' || demand
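A lookup table is one simple way to cover the operators that have a direct Rust spelling. The sketch below uses a hypothetical function `rust_infix` and returns `None` for operators that would need a wrapper fn instead; the set of operators listed is illustrative, not exhaustive.

```rust
/// Map a Haskell infix operator to its Rust spelling, if one exists.
fn rust_infix(op: &str) -> Option<&'static str> {
    match op {
        "||" => Some("||"),
        "&&" => Some("&&"),
        "==" => Some("=="),
        "/=" => Some("!="), // Haskell spells "not equal" as /=
        "<" => Some("<"),
        "<=" => Some("<="),
        ">" => Some(">"),
        ">=" => Some(">="),
        "+" => Some("+"),
        "-" => Some("-"),
        "*" => Some("*"),
        // Anything else (e.g. monadic operators) needs a wrapper fn.
        _ => None,
    }
}
```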

Print out the <todo> in these fn defs

compatibleInitializer :: CType -> CType -> Bool
compatibleInitializer (IsStruct name1 _) (IsStruct name2 _) = name1 == name2
compatibleInitializer IsStruct{} _ = False
compatibleInitializer _ IsStruct{} = False
compatibleInitializer _ _ = True

These print <todo> and they shouldn't.
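For reference, a plausible target is a single match over a tuple of both arguments. The `CType` enum below is a simplified stand-in with made-up payload types, not the real definition.

```rust
// Simplified stand-in for the real CType.
#[derive(Debug, Clone, PartialEq)]
enum CType {
    IsStruct(String, Vec<String>),
    IsBool,
}

#[allow(non_snake_case)]
fn compatibleInitializer(a: &CType, b: &CType) -> bool {
    // Arms are listed in the same order as the Haskell equations.
    match (a, b) {
        (CType::IsStruct(name1, _), CType::IsStruct(name2, _)) => name1 == name2,
        (CType::IsStruct(..), _) => false,
        (_, CType::IsStruct(..)) => false,
        _ => true,
    }
}
```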

Distinguish patterns from types

It looks like TypeSub parses match patterns, and these get turned into Tys. I think we should have a separate Pattern type which holds all the patterns.

Not all `Just` `Nothing` translated properly.

See Idiomatic.hs:

tailExpr :: Rust.Expr -> Maybe (Maybe Rust.Expr)
-- If the last statement in this block is a return statement, extract
-- its expression (if any) to the final expression position.
tailExpr (Rust.Return e) = Just e
-- If the last statement is a block, that's a tail-block.
tailExpr (Rust.BlockExpr b) = Just (Just (Rust.BlockExpr (tailBlock b)))
-- If the last statement is an if-expression, its true and false blocks
-- are themselves tail-blocks.
-- TODO: treat match-expressions like if-expressions.
tailExpr (Rust.IfThenElse c t f) = Just (Just (Rust.IfThenElse c (tailBlock t) (tailBlock f)))
-- Otherwise, there's nothing to rewrite.
tailExpr _ = Nothing

Print type structs as well

Similar to #31, this should print the full struct:

castTo target (Result { resultType = IsArray mut _ el, result = source }) =
    castTo target Result
        { resultType = IsPtr mut el
        , resultMutable = Rust.Immutable
        , result = Rust.MethodCall source (Rust.VarName method) []
        }

Actually print out lambdas

Right now lambdas are printed in their debug form, Lambda(...). An actual Rust lambda should be printed instead.

Enable Travis testing on PRs?

I'm expecting this would just run cargo test.

Until we reach 100% parsing support we might want to punt, since correctness fixes could regress in parsing. But once we reach that milestone it'd be worthwhile to enable for accidental regressions.

It would also be cool if PRs could automatically re-generate the out/ directory files; I'm unsure how that works, in practice. A githook makes sense I suppose.

Print all `where` statements

There are a few control groups (of do, let, and case) that don't handle where right.

  1. They parse their body as a list of statements and then where, so where first needs to be stripped from the body.
  2. The where clause should be asserted to be the last item in the body.
  3. Then the where clause should have its own parameter in the AST node. Then it should be checked that all of these are printed in main.rs

CLI

main needs to be modified to take command line arguments so this can be tested from the command line as well.

corrode-but-in-rust test/file.hs
corrode-but-in-rust --dir language/C
corrode-but-in-rust --run-script test/helloworld.hs

--dir should have support for adding (and renaming the paths of) files you want to inject, mainly Lexer.hs and Parser.hs.
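A dependency-free sketch of parsing the three invocations above. `Mode` and `parse_args` are hypothetical names, and the slice passed in is assumed to already exclude the program name.

```rust
#[derive(Debug, PartialEq)]
enum Mode {
    File(String),
    Dir(String),
    RunScript(String),
}

/// Recognize `<file>`, `--dir <path>`, and `--run-script <file>`.
fn parse_args(args: &[String]) -> Option<Mode> {
    match args {
        [flag, path] if flag.as_str() == "--dir" => Some(Mode::Dir(path.clone())),
        [flag, path] if flag.as_str() == "--run-script" => Some(Mode::RunScript(path.clone())),
        [path] if !path.starts_with("--") => Some(Mode::File(path.clone())),
        // Anything else is a usage error.
        _ => None,
    }
}
```

In `main`, the input could come from `std::env::args().skip(1).collect::<Vec<_>>()`.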

let... in struct is not translated right

There are only a few instances of these in the codebase (see C.lhs) but it should be parsed properly at least.

    CIntConst (CInteger v repr flags) _ ->
        let allow_signed = not (testFlag FlagUnsigned flags)
            allow_unsigned = not allow_signed || repr /= DecRepr
            widths =
                [ (32 :: Int,
                    if any (`testFlag` flags) [FlagLongLong, FlagLong]
                    then WordWidth else BitWidth 32)
                , (64, BitWidth 64)
                ]
            allowed_types =
                [ IsInt s w
                | (bits, w) <- widths
                , (True, s) <- [(allow_signed, Signed), (allow_unsigned, Unsigned)]
                , v < 2 ^ (bits - if s == Signed then 1 else 0)
                ]
            repr' = case repr of
                DecRepr -> Rust.DecRepr
                OctalRepr -> Rust.OctalRepr
                HexRepr -> Rust.HexRepr
        in case allowed_types of
        [] -> badSource expr "integer (too big)"
        ty : _ -> return (literalNumber ty (Rust.LitInt v repr'))

Monadic code

The elephant in the room.

-- | lookup an object, function or enumerator
lookupObject :: (MonadCError m, MonadSymtab m) => Ident -> m (Maybe IdentDecl)
lookupObject ident = do
    old_decl <- liftM (lookupIdent ident) getDefTable
    mapMaybeM old_decl $ \obj ->
        case obj of
        Right objdef -> addRef ident objdef >> return objdef
        Left _tydef  -> astError (nodeInfo ident) (mismatchErr "lookupObject" "an object" "a typeDef")

Right now I'm going to assume we're going to special-case every monad in the codebase so we can get some somewhat reasonable output...?

Translate colons in `case` arms properly

e.g. in this code, the colons will wrongly be treated as function arguments.

baseTypeOf :: [CDeclSpec] -> EnvMonad s (Maybe CStorageSpec, EnvMonad s IntermediateType)
baseTypeOf specs = do
    -- TODO: process attributes and the `inline` keyword
    let (storage, _attributes, basequals, basespecs, _inlineNoReturn, _align) = partitionDeclSpecs specs
    mstorage <- case storage of
        [] -> return Nothing
        [spec] -> return (Just spec)
        _ : excess : _ -> badSource excess "extra storage class specifier"
    base <- typedef (mutable basequals) basespecs
    return (mstorage, base)
    where
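The cons patterns in these arms line up with Rust slice patterns: `[]` stays `[]`, `[spec]` stays `[spec]`, and `_ : excess : _` becomes `[_, excess, ..]`. A sketch with a hypothetical function and `String` elements, using `Err` as a stand-in for `badSource`:

```rust
fn pick_storage(storage: &[String]) -> Result<Option<String>, String> {
    match storage {
        [] => Ok(None),
        [spec] => Ok(Some(spec.clone())),
        // `_ : excess : _` matches any list with at least two elements.
        [_, excess, ..] => Err(format!("extra storage class specifier: {}", excess)),
    }
}
```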

Translate into pointful function

This translates incorrectly. We should look at the type signature, and if there are extra arguments in its type that are not in the function arguments, it should be rewritten as pointful.

Save this as test.hs:

module Test()
where

-- error.hs:49
isHardError :: (Error ex) => ex -> Bool
isHardError = ( > LevelWarn) . errorLevel

Running cargo run --manifest-path corollary/Cargo.toml -- test.hs currently outputs this:

// Original file: "test.hs"
// File auto-generated using Corollary.

#[macro_use] use corollary_support::*;

pub fn isHardError() -> bool {
    ((() > LevelWarn(errorLevel)))
}

Expected output:

pub fn isHardError(ex: Error) -> bool {
   errorLevel(ex) > LevelWarn
}

Successive let statements don't work

This code:

addExternIdent
    :: Ident
    -> EnvMonad s IntermediateType
    -> (String -> (Rust.Mutable, CType) -> Rust.ExternItem)
    -> EnvMonad s ()
addExternIdent ident deferred mkItem = do
    action <- runOnce $ do
        itype <- deferred
        rewrites <- lift $ asks itemRewrites
        path <- case Map.lookup (Symbol, identToString ident) rewrites of
            Just renamed -> return ("" : renamed)
            Nothing -> do
                let name = applyRenames ident
                let ty = (typeMutable itype, typeRep itype)
                lift $ tell mempty { outputExterns = Map.singleton name (mkItem name ty) }
                return [name]
        return (typeToResult itype (Rust.Path (Rust.PathSegments path)))
    addSymbolIdentAction ident action

Will create code in its let block like this:

{
    {
        let A = ...;
    }
    {
        let B = ...;
    }
    ...
}

It should be creating successive nested lets.
