
unpackdev / solgo


Solidity parser in Go, designed to transform Solidity code into a structured format for enhanced analysis, particularly beneficial for developers using Go to analyze Solidity smart contracts.

Home Page: https://unpack.dev

License: Apache License 2.0

Shell 0.01% Go 60.32% Makefile 0.03% Solidity 39.64%
abi golang solidity solidity-compiler solidity-contracts syntatic-analysis abstract-syntax-tree control-flow-analysis program-analysis syntax-analysis smt ethereum bytecode-interpreter decompiler intermediate-representation vulnerability-detection static-analysis binance-smart-chain arbitrum optimism

solgo's Introduction


Ethereum and Solidity Toolkit in Go: Parser and Analyzer

SolGo - a robust tool crafted in Go, designed to dissect and analyze Solidity's source code.

The parser is generated from a Solidity grammar file using ANTLR, which produces a lexer, parser, and listener for the ANTLR Go runtime. This allows for syntactic analysis of Solidity code, transforming it into a parse tree that gives a detailed syntactic representation of the code and can be navigated and manipulated.

This project is ideal for those diving into data analysis, construction of robust APIs, developing advanced analysis tools, enhancing smart contract security, and anyone keen on harnessing Go for their Solidity endeavors.

Solidity Version Support

Currently, Solidity versions 0.6.0 and higher are supported.

Older versions may or may not work, because of syntax changes that the grammar file does not currently support. We plan to support all versions of Solidity in the future.

Disclaimer

Please be aware that this project is still under active development. While it is approaching a state suitable for production use, there may still be undiscovered issues or limitations. Over the next few months, extensive testing will be conducted to evaluate its performance and stability, and additional tests and documentation will be added. Most of the interfaces will stay as they are; however, there could be architectural changes that break your build in the future. I'll try to change as little as possible and announce every change in the release notes.

Once I am confident that the project is fully ready for production, this disclaimer will be removed. Until then, please use the software with caution and report any potential issues or feedback to help improve its quality.

Documentation

The SolGo basic documentation is hosted on GitHub, ensuring it's always up-to-date with the latest changes and features. You can access the full documentation here.

Getting Started

Detailed examples of how to install and use this package can be found in the Usage section.

Need help?

Want to use this library but have issues, questions or just want to join the wagon and follow the ride?

You can join our Discord server.

Solidity Language Grammar

A high-level overview and a detailed description of the latest Solidity language grammar can be found here.

ANTLR Grammar

We are using grammar files that are maintained by the Solidity team. Link to the grammar files can be found here.

ANTLR Go

We are using the ANTLR4 Go runtime library to generate the parser. Repository can be found here.

Crytic Slither

We are using Slither to detect vulnerabilities in smart contracts. Repository can be found here.

It makes no sense to rewrite all of that hard work just to have it written in Go, so a bit of Python will not hurt. In the future we may change direction.

Features

  • Protocol Buffers: Utilizing Protocol Buffers, SolGo offers a structured data format, paving the way for enhanced analysis and facilitating a unified interface for diverse tools. Currently, it supports Go and Javascript, with plans to incorporate Rust and Python in upcoming versions.
  • Abstract Syntax Tree (AST) Generation: Package ast is equipped with a dedicated builder that crafts an Abstract Syntax Tree (AST) tailored for Solidity code.
  • Intermediate Representation (IR) Generation: From the AST, SolGo generates an Intermediate Representation (IR). The ir package serves as a language-neutral depiction of the contract, covering components such as functions, state variables, and events, which broadens the scope for analysis and contract manipulation.
  • Control Flow Graph (CFG) Generation: Building upon the IR, SolGo provides tools for constructing and visualizing Control Flow Graphs (CFGs) of Solidity contracts, aiding in the analysis of contract execution paths and potential bottlenecks.
  • Application Binary Interface (ABI) Generation: SolGo's in-built abi package can interpret contract definitions, enabling the generation of ABI for a collective group of contracts or individual ones.
  • Opcode Tools: The opcode package in SolGo demystifies bytecode by decompiling it into opcodes. Additionally, it provides tools for the creation and visualization of opcode execution trees, granting a holistic perspective of opcode sequences in smart contracts.
  • Library Integration: SolGo is programmed to autonomously source and assimilate Solidity contracts from renowned libraries, notably OpenZeppelin. This feature enables users to seamlessly import and utilize contracts from these libraries without the need for manual integration.
  • EIP & ERC Registry: SolGo includes a standards package dedicated to Ethereum Improvement Proposals (EIPs) and Ethereum Request for Comments (ERCs). This package streamlines interactions with diverse contract standards by providing functions, events, and a registry system for managing them.
  • Solidity Compiler Detection & Compilation: SolGo intelligently identifies the Solidity version employed for contract compilation. This not only streamlines the process of determining the compiler version but also equips users with the capability to seamlessly compile contracts.
  • Security Audit Package: Prioritizing security, SolGo has incorporated an audit package. This specialized package leverages Slither's sophisticated algorithms to scrutinize and pinpoint potential vulnerabilities in Solidity smart contracts, ensuring robust protection against adversarial threats.
  • Contract Bytecode Validation: Enhanced validation package ensures the integrity and authenticity of contract bytecode. By comparing the bytecode of a deployed contract with the expected bytecode generated from its source code, SolGo can detect any discrepancies or potential tampering. This feature is crucial for verifying that a deployed contract's bytecode corresponds accurately to its source code, providing an added layer of security and trust for developers and users alike.
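
As a rough illustration of the bytecode comparison described in the last bullet, the sketch below compares two runtime bytecodes after dropping the CBOR metadata section that solc appends to the end of the code (its length is stored in the final two bytes, big-endian). The names stripMetadata and sameCode are hypothetical; they are not the API of SolGo's validation package, and a real check would also have to account for things like immutables and linked libraries.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// stripMetadata removes the trailing CBOR metadata section from runtime
// bytecode. The final two bytes hold the big-endian length of that section.
// If the length does not fit the input, the input is returned unchanged.
func stripMetadata(code []byte) []byte {
	if len(code) < 2 {
		return code
	}
	n := int(binary.BigEndian.Uint16(code[len(code)-2:]))
	if n+2 > len(code) {
		return code
	}
	return code[:len(code)-n-2]
}

// sameCode reports whether two bytecodes match once metadata is ignored.
func sameCode(deployed, expected []byte) bool {
	return bytes.Equal(stripMetadata(deployed), stripMetadata(expected))
}

func main() {
	// Same executable prefix, different (made up) two-byte metadata payloads.
	deployed := []byte{0x60, 0x80, 0x60, 0x40, 0xa1, 0x01, 0x00, 0x02}
	expected := []byte{0x60, 0x80, 0x60, 0x40, 0xa2, 0x02, 0x00, 0x02}
	fmt.Println("match ignoring metadata:", sameCode(deployed, expected))
}
```

Ignoring the metadata section matters because it embeds a hash of the compiler input, so two builds of the same logic can differ only in that trailer.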

External Projects / Extensions / Plugins

List of the projects that use SolGo:

  • {Un}pack - Solidity (Ethereum) Smart Contracts Analysis Toolchain.
  • Solidity-Gas-Optimizoor - A high-performance automated tool that optimizes gas usage in Solidity smart contracts, focusing on storage and function call efficiency.

If you wish to add your repository to the list, make sure to submit a new PR :)

Contributing

Contributions to SolGo are always welcome! Please visit Contributing for more information on how to get started.

License

SolGo is licensed under the Apache License 2.0. See LICENSE for the full license text.

solgo's People

Contributors

0x19, dependabot[bot], omahs


solgo's Issues

Ethereum Improvement Proposals (EIP) Package

If we want to extract the types of contracts, we need this information to determine precisely whether a contract implements an EIP/ERC standard, and at what confidence level.
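
One simple way to think about a confidence level is the share of a standard's required function signatures that the contract actually declares. The sketch below assumes exactly that; the confidence function and the plain signature strings are hypothetical and only illustrate the idea, they are not how the package computes its score.

```go
package main

import "fmt"

// confidence returns the fraction of a standard's required signatures that
// are present in the contract, as a value between 0 and 1.
func confidence(required []string, contract map[string]bool) float64 {
	if len(required) == 0 {
		return 0
	}
	found := 0
	for _, sig := range required {
		if contract[sig] {
			found++
		}
	}
	return float64(found) / float64(len(required))
}

func main() {
	// The six functions defined by ERC-20.
	erc20 := []string{
		"totalSupply()",
		"balanceOf(address)",
		"transfer(address,uint256)",
		"allowance(address,address)",
		"approve(address,uint256)",
		"transferFrom(address,address,uint256)",
	}
	// Signatures discovered in some contract (made up for the example).
	contract := map[string]bool{
		"totalSupply()":             true,
		"balanceOf(address)":        true,
		"transfer(address,uint256)": true,
	}
	fmt.Printf("ERC-20 confidence: %.2f\n", confidence(erc20, contract))
}
```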

AST and IR Node Visitor

We need functionality to visit AST and IR nodes efficiently. This is possible today by building a custom node visitor yourself on top of the AST tree, but it should be implemented directly in the solgo library.
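
A minimal sketch of what such a visitor could look like, assuming a deliberately simplified node shape. The Node type, Walk, and kinds are hypothetical and do not mirror solgo's real AST types.

```go
package main

import "fmt"

// Node is a hypothetical, minimal tree node with a kind and children.
type Node struct {
	Kind     string
	Children []*Node
}

// Walk visits n and its descendants depth-first. Returning false from visit
// skips the children of the current node.
func Walk(n *Node, visit func(*Node) bool) {
	if n == nil || !visit(n) {
		return
	}
	for _, c := range n.Children {
		Walk(c, visit)
	}
}

// kinds returns the kinds of all nodes in the order they are visited.
func kinds(root *Node) []string {
	var out []string
	Walk(root, func(n *Node) bool {
		out = append(out, n.Kind)
		return true
	})
	return out
}

func main() {
	root := &Node{Kind: "Contract", Children: []*Node{
		{Kind: "StateVariable"},
		{Kind: "Function", Children: []*Node{{Kind: "Block"}}},
	}}
	fmt.Println(kinds(root))
}
```

The boolean return value lets a caller prune whole subtrees, which is usually what makes a built-in visitor cheaper than hand-written traversal.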

Wiki Documentation & Examples

Work on the current wiki is ongoing and needs to be completed. Examples and documentation exist for installation and usage, but more in-depth, package-specific documentation is lacking and should be added.

Topological Sort Omitted Sources

The problem with the current visited-based topological sort is that if two files share the same name but have different content, only one of them is extracted and sorted, resulting in corrupted sources. For example, an IERC20.sol interface and an IERC20.sol token contract.
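
One possible way around the name collision is to key every source by its full path instead of its base name before sorting. The sketch below does that with Kahn's algorithm; sortSources and the example paths are hypothetical and are not taken from the solgo code base.

```go
package main

import (
	"fmt"
	"sort"
)

// sortSources orders source files so that every file comes after the files
// it imports. Files are keyed by full path rather than base name, so two
// different IERC20.sol files stay distinct. It returns an error on cycles.
func sortSources(imports map[string][]string) ([]string, error) {
	pending := make(map[string]int)         // unmet imports per file
	dependents := make(map[string][]string) // import -> files that need it
	for file, deps := range imports {
		if _, ok := pending[file]; !ok {
			pending[file] = 0
		}
		for _, d := range deps {
			if _, ok := pending[d]; !ok {
				pending[d] = 0
			}
			pending[file]++
			dependents[d] = append(dependents[d], file)
		}
	}

	var ready []string
	for f, n := range pending {
		if n == 0 {
			ready = append(ready, f)
		}
	}
	sort.Strings(ready) // stable output regardless of map iteration order

	var order []string
	for len(ready) > 0 {
		f := ready[0]
		ready = ready[1:]
		order = append(order, f)
		var next []string
		for _, dep := range dependents[f] {
			pending[dep]--
			if pending[dep] == 0 {
				next = append(next, dep)
			}
		}
		sort.Strings(next)
		ready = append(ready, next...)
	}
	if len(order) != len(pending) {
		return nil, fmt.Errorf("import cycle detected")
	}
	return order, nil
}

func main() {
	order, err := sortSources(map[string][]string{
		"token/Token.sol":       {"interfaces/IERC20.sol", "token/IERC20.sol"},
		"token/IERC20.sol":      {"interfaces/IERC20.sol"},
		"interfaces/IERC20.sol": {},
	})
	fmt.Println(order, err)
}
```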

EIP package to standards

The current EIP package is designed to support Ethereum Improvement Proposals. However, I'll be extending it and renaming it to standards. One of the extensions is detecting exchanges such as PancakeSwap, Uniswap, SushiSwap and so on.

Later on, this could help to more quickly understand whether the current contract supports liquidity or not.

Statistics

The most important statistics about a processed contract: from the number of functions to the number of calls, external calls, and different types of references. Basically all of the data that can help gain statistical information about the source code.

This is just a placeholder idea. The real idea will follow in the future.

Sub contracts license bug

Licenses can differ in parent contracts, and those are not taken into consideration; only the entry contract's license is. This is in the AST.

Bump parser grammar file

Changes to the event definition were added 3 days ago and need to be picked up, so the grammar needs to be regenerated.

Control Flow Graph (CFG) Support

Just notes for now:

  • Ability to turn AST into CFG.
  • Ability to render dot file for graph (graphviz).
  • Ability to render image (png, jpeg, svg) from the graph.
  • Graph node itself...
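
For the dot file part of these notes, a sketch along the following lines might be enough to start with. It assumes the graph is already available as an adjacency list of block names; toDot is a hypothetical helper, not an existing solgo function.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// toDot renders a graph given as an adjacency list into Graphviz DOT text.
// Sources and targets are emitted in sorted order so the output is stable.
func toDot(name string, edges map[string][]string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "digraph %s {\n", name)

	from := make([]string, 0, len(edges))
	for n := range edges {
		from = append(from, n)
	}
	sort.Strings(from)

	for _, src := range from {
		dsts := append([]string(nil), edges[src]...) // copy before sorting
		sort.Strings(dsts)
		for _, dst := range dsts {
			fmt.Fprintf(&b, "  %q -> %q;\n", src, dst)
		}
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	// A made-up control flow: an entry block branching into two paths.
	fmt.Print(toDot("cfg", map[string][]string{
		"entry": {"then", "else"},
		"then":  {"exit"},
		"else":  {"exit"},
	}))
}
```

Saved to a file, the output can be turned into an image with Graphviz, for example dot -Tsvg cfg.dot -o cfg.svg.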

Function treated as state variable

Old versions of Solidity contracts use the following function syntax for fallback and receive (sometimes even containing a revert):

function () payable { ... }

Current AST parsing will fail if a function like the one above is reached. It fails during state variable discovery (which is weird).

So far, I am only investigating potential solutions. Most likely this is related to the grammar file not supporting functions like that. If that's the case, we'll branch off the grammar from the Solidity externals and start maintaining our own (or just apply patches once we sync the external repo).

If it's a grammar issue, it's in fact not an issue but functionality that is no longer supported, and therefore we cannot file any bugs.
If it's due to antlr-go, then we need to file a bug with ANTLR or fork it.

It happens very rarely, only with old contracts.

AST Reference Discovery Ideas

Forward statement resolution in the AST seems to be a bit more complicated than it should be. However, I am not yet sure how to handle it in the future.

The problem with forward statements is that the node can be anywhere. That means the reference for any particular statement can be:

  • In any object.
  • Defined in the same object but after it is executed.
  • Globally defined outside of any object at any point prior or after the statement execution.
  • Some types, for example events, structs, and variables, can be defined literally anywhere in the code and referenced from literally anywhere.

Knowing the problem, finding a solution becomes troublesome when type descriptions are involved. We need to understand the type itself, and I regularly have to come back to references during testing because some type is not discovered properly. For example, a node is not found at all, resulting in skewed AST results and panics because the type description object is missing.

For now I am patching the code to resolve it. However, in the future I'd like to find a proper way to traverse the tree and figure out all of the types without doing ad-hoc kumbaya patches to the resolver.
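
One common way to make declaration order irrelevant is a two-pass approach: index every declaration first, then resolve references against that index. The sketch below shows only that idea. Decl, Ref, and resolve are hypothetical, and the flat name index deliberately ignores scoping, which is exactly the part that makes the real problem hard.

```go
package main

import "fmt"

// Decl is a hypothetical declaration (struct, event, contract, ...).
type Decl struct {
	ID   int
	Name string
}

// Ref is a hypothetical use of a name somewhere in the source.
type Ref struct {
	Name   string
	Target int // ID of the resolved declaration, 0 while unresolved
}

// resolve links references to declarations in two passes: the first pass
// indexes every declaration, the second looks each reference up. The order
// of declarations in the source therefore does not matter. It returns the
// names that could not be resolved.
func resolve(decls []Decl, refs []Ref) []string {
	index := make(map[string]int, len(decls))
	for _, d := range decls {
		index[d.Name] = d.ID
	}
	var missing []string
	for i := range refs {
		id, ok := index[refs[i].Name]
		if !ok {
			missing = append(missing, refs[i].Name)
			continue
		}
		refs[i].Target = id
	}
	return missing
}

func main() {
	// The reference to Transfer is listed before its declaration is known.
	refs := []Ref{{Name: "Transfer"}, {Name: "Vault"}, {Name: "Ghost"}}
	decls := []Decl{{ID: 11, Name: "Vault"}, {ID: 42, Name: "Transfer"}}
	missing := resolve(decls, refs)
	fmt.Println(refs, missing)
}
```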

Moreover, even if a node is found, are we certain that this particular node corresponds to the proper reference and not just to a globally defined one? This calls for a node reverse lookup.

Anyway, I am just posting ideas here so this story can be upgraded in the future and perhaps a better solution can be found.

Just so we are on the same page: this is a serious amount of work and will break the entire code resolution once touched.

Storage Package

Ability to load contract storage information at particular block from upstream.

Abstract Syntax Tree (AST) Forward Statement Discovery

Right now, the package is capable, up to a point, of discovering statements and references for definitions that occur later. This is a major blocker to processing any kind of code without requiring the source code to be sorted first.

When processing directly from metadata, the code usually won't be sorted in order, resulting in panics on all fronts because type descriptions are not discovered.

This is the highest priority and the biggest blocker for the ast package to be useful to anyone.

Examples

Having them only in the README is not enough; we can do better. So adding just a general issue here...

More about it will be defined later.

Standards ABI interfaces

Right now the standards package can calculate confidence and handle general standards management; however, there is no easy way to use the interfaces from this package.

For example, if you wish to load an ERC20 token and interact with its interface, that's not possible. Going to add that.

Abstract Syntax Tree Cleanups and Resolver Rewrite

I have no idea what was in my head when I thought the current approach could be useful (it still will be for edge cases), but in order to resolve forward statements with the current approach, the logic of the code needs to become so complex that I just dislike the idea of moving forward with it. Therefore, I will introduce the following changes into the code base:

  • Sources sorted using graph and topological sorting algorithms, ordering imports from the lowest requirement to the highest. This will help up to a point.
  • To be defined, not sure what yet...
  • Get at least a bit more code coverage and documentation done in the process... It's ridiculous that AST, which is a base package, is becoming less and less documented and tested.

Currently we get panics because type descriptions cannot be found, including forward contract creations that cannot be discovered in time, which forces us to write t_unknown_{nodeId} and process them later in the resolver. You see, I bored myself just by typing this down. It would become so complex that anyone trying to look into the codebase, including me, would just evaporate from the project. Therefore, let's build something performant and nice to look at.

For example, this one right here -> https://bscscan.com/address/0x7a4af156379f512de147ed3b96393047226d923f#code, jumps from contract/statement to contract/statement like it's candy season, which is where I discovered numerous panics and well... Let's fix it.

Fun to talk with myself.

Thread safe solc switch tool

The thing is, solc-select is a great tool for its purpose. It gives a very nice way of switching the global solc version with at most two commands. It is, however, not designed to be thread safe or to be used by APIs.

The current solgo version, as a prototype, uses solc-select. We should move away from it as soon as possible, as it's becoming a blocker on multiple fronts:

  • It is written in Python, so it's slower than it should be.
  • We have to use the cmd package to interact with it, which makes no sense for us except in extreme conditions when there's no viable alternative.
  • It changes the global Solidity version on the machine. Consider what happens when even 2 concurrent requests occur...
  • It requires time to download/build solc, which we can fix by having a command that, when executed, downloads all of the versions for the specified operating system at once.
  • Sometimes it fails to switch, even in the CI/CD pipeline, as multiple jobs are running.

There are probably a few more good things we can sort out by doing something more concrete here.
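
To make the concurrency concern concrete: instead of switching one global version, each request could ask for the binary of the exact version it needs. The sketch below assumes a directory with one solc binary per version; the Registry type and the solc-<version> file naming are hypothetical, not an existing tool's layout.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sync"
)

// Registry maps solc versions to binary paths. Callers ask for the binary
// of a specific version instead of switching one global version.
type Registry struct {
	mu   sync.RWMutex
	dir  string
	bins map[string]string
}

// NewRegistry creates a registry for binaries stored under dir.
func NewRegistry(dir string) *Registry {
	return &Registry{dir: dir, bins: make(map[string]string)}
}

// Register records that the given version is available on disk.
func (r *Registry) Register(version string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.bins[version] = filepath.Join(r.dir, "solc-"+version)
}

// Path returns the binary path for version. It is safe for concurrent use.
func (r *Registry) Path(version string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.bins[version]
	return p, ok
}

func main() {
	reg := NewRegistry("/opt/solc") // hypothetical install directory
	reg.Register("0.6.12")
	reg.Register("0.8.19")

	// Concurrent requests for different versions do not affect each other.
	var wg sync.WaitGroup
	for _, v := range []string{"0.6.12", "0.8.19", "0.6.12"} {
		wg.Add(1)
		go func(version string) {
			defer wg.Done()
			if p, ok := reg.Path(version); ok {
				fmt.Println(version, "->", p)
			}
		}(v)
	}
	wg.Wait()
}
```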

I was thinking of having it as solgo/solc, but that's probably not the best solution, so instead I'll build and maintain a separate repository for this endeavor.

Clients package - Ethereum RPC client and smart load balancer

This whole package has its functionality; however, it lacks a smarter client management package that would take it to the next level. This issue is therefore reserved for introducing a clients package that can do the following:

  • Ability to load multiple clients for multiple networks supported by the eth client
  • Ability to load balance multiple clients
  • Ability to failover between full and archive nodes
  • Ability to query clients based on different criteria
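
The load balancing and failover items could start from something as small as a round-robin pool that skips unhealthy clients. In the sketch below, Client, Pool, and Pick are hypothetical names, the URLs are placeholders, and the health flag is static; a real package would update it from health checks under proper synchronization.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Client is a hypothetical handle to one RPC endpoint.
type Client struct {
	URL     string
	Healthy bool // static in this sketch; no concurrent writes happen
}

// Pool hands out clients in round-robin order and skips unhealthy ones.
type Pool struct {
	clients []*Client
	next    atomic.Uint64
}

// Pick returns the next healthy client, or an error if none is healthy.
// The atomic counter keeps the rotation safe under concurrent callers.
func (p *Pool) Pick() (*Client, error) {
	n := uint64(len(p.clients))
	for i := uint64(0); i < n; i++ {
		c := p.clients[(p.next.Add(1)-1)%n]
		if c.Healthy {
			return c, nil
		}
	}
	return nil, errors.New("no healthy client available")
}

func main() {
	pool := &Pool{clients: []*Client{
		{URL: "https://rpc-a.example", Healthy: true},
		{URL: "https://rpc-b.example", Healthy: false},
		{URL: "https://rpc-c.example", Healthy: true},
	}}
	for i := 0; i < 4; i++ {
		c, err := pool.Pick()
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println("using", c.URL)
	}
}
```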

Multi contract support

Right now, if multiple contracts or dependency contracts are involved, it really doesn't work. We should change that.

Once the AST package is completed, I will rewrite ABI and sort this issue out.

Onchain Interceptor Package

To be defined in the future, but the basic idea is to intercept on-chain data based on specific criteria and transform it into a specific concept. For example, a token transformer that fetches the data of any token and returns basic information, potentially including some heavyweight information such as liquidity.

This is a rough idea and will probably change by the end of development.

Accounts package

Ability to manage existing or new accounts...

#140 has the latest development ideas... It's not yet ready...

Introduce complex types ABI parsing for methods and variables

The last missing piece of functionality in ABIs is parsing types more complex than mappings.

This issue should resolve:

  • Parsing structs including nested structs.
  • Parsing enums.
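
In the ABI, a struct is represented as a tuple of its member types and an enum as uint8, so nested structs become nested tuples. The sketch below builds the canonical type string under that rule; the Field type and abiType function are hypothetical and only illustrate the mapping, not the parser's actual data model.

```go
package main

import (
	"fmt"
	"strings"
)

// Field is a hypothetical struct member: an elementary type, an enum, or a
// nested struct described by its own fields.
type Field struct {
	Type   string  // elementary ABI type such as "uint256"
	IsEnum bool    // enums are encoded as uint8 in the ABI
	Fields []Field // non-empty for a nested struct
}

// abiType returns the canonical ABI type string of a field. Structs are
// rendered as tuples, recursively.
func abiType(f Field) string {
	if f.IsEnum {
		return "uint8"
	}
	if len(f.Fields) == 0 {
		return f.Type
	}
	parts := make([]string, len(f.Fields))
	for i, sub := range f.Fields {
		parts[i] = abiType(sub)
	}
	return "(" + strings.Join(parts, ",") + ")"
}

func main() {
	// struct Order { address maker; Status status; Price price; }
	// struct Price { uint256 amount; bool fixed; }
	order := Field{Fields: []Field{
		{Type: "address"},
		{IsEnum: true},
		{Fields: []Field{{Type: "uint256"}, {Type: "bool"}}},
	}}
	fmt.Println(abiType(order))
}
```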

Besides this, everything else "should" operate as expected. Note that we need to test this against a large number of smart contracts to ensure the ABI parser is complete.

ABI (Application Binary Interface) 2.0

Instead of using the parser directly, I am going to change it entirely to use the IR package.

This will give us the ability to fine-tune responses as we wish, plus multiple-contract support out of the box.

More info will be provided later.

Performance Stress Testing

Currently there's only one benchmark available in the system, which is far too few. Enough said, right?

What I want to test the most:

  • How solc-select behaves concurrently. I think I already know the answer and will need to understand what to do here.
  • How many errors will we discover in parsing AST, ABI, IR? -> Address them completely.
  • See how the system behaves with contracts on Solidity versions lower than 0.8. Address issues that come up.
  • Introduce Prometheus and Grafana metrics as well, so we can build a nice dashboard.
  • Introduce pprof to see memory/CPU allocation and what can be improved if there's a real need.

More information in the upcoming weeks.

Operation Codes (OpCode) Package

Ability to parse contract, transaction, log, any type of bytecode and get back EVM opcode information including a tree for future inspections.
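
As a small illustration of turning bytecode into opcode information, the sketch below walks the bytes and handles the PUSH1 to PUSH32 range (0x60 to 0x7f), whose immediate data must be skipped rather than decoded as instructions. The disassemble function and its tiny opcode table are hypothetical and far from complete; they are not the API of the opcode package.

```go
package main

import "fmt"

// names is a deliberately small subset of EVM opcodes for the example.
var names = map[byte]string{
	0x00: "STOP",
	0x01: "ADD",
	0x52: "MSTORE",
	0x56: "JUMP",
	0x57: "JUMPI",
	0x5b: "JUMPDEST",
	0xf3: "RETURN",
	0xfd: "REVERT",
}

// disassemble converts bytecode into a list of readable instructions.
// PUSHn instructions consume the n bytes that follow them as data.
func disassemble(code []byte) []string {
	var out []string
	for pc := 0; pc < len(code); pc++ {
		op := code[pc]
		if op >= 0x60 && op <= 0x7f {
			n := int(op-0x60) + 1
			end := pc + 1 + n
			if end > len(code) {
				end = len(code) // truncated push data at the end of the code
			}
			out = append(out, fmt.Sprintf("PUSH%d 0x%x", n, code[pc+1:end]))
			pc = end - 1
			continue
		}
		name, ok := names[op]
		if !ok {
			name = fmt.Sprintf("UNKNOWN(0x%02x)", op)
		}
		out = append(out, name)
	}
	return out
}

func main() {
	// 0x6080604052 is the usual start of solc output, followed by STOP here.
	for _, ins := range disassemble([]byte{0x60, 0x80, 0x60, 0x40, 0x52, 0x00}) {
		fmt.Println(ins)
	}
}
```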

Simulator package

Ability to simulate transactions on local network as well as on the main net.

Syntax Errors Improvements

It works, but not the way it should for production. Just adding a placeholder issue here for future work.

Exchanges package

The idea of this package is to provide a unification mechanism on top of which multiple exchanges such as PancakeSwap, Uniswap, SushiSwap, etc. can be loaded, registered and used throughout the unpack tools.

Challenges

  • Multiple client types. Having only one would be great, but we can have multiple clients spawning at any point in time, and on top of that there are simulation clients that need to be taken into consideration.
  • Resource management: if we initiate too much, we're not on the right track.

AST Node Source Code

Right now there's no way of getting the actual source code of a specific node. This is going to be cumbersome, but it has to be done. I am thinking about providing the context itself for every node and a ToString method on each of the 100+ nodes in the AST...
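
An alternative to keeping the full context on every node might be to store only byte offsets and slice the original source on demand. The sketch below assumes each node knows its start and end offset; SrcNode and its Source method are hypothetical and not part of the current ast package.

```go
package main

import "fmt"

// SrcNode is a hypothetical node that remembers where it sits in the source.
type SrcNode struct {
	Start int // byte offset of the first character
	End   int // byte offset one past the last character
}

// Source returns the exact text the node covers in src, or an error if the
// offsets do not fit the given source.
func (n SrcNode) Source(src string) (string, error) {
	if n.Start < 0 || n.End > len(src) || n.Start > n.End {
		return "", fmt.Errorf("offsets [%d:%d] out of range for source of length %d",
			n.Start, n.End, len(src))
	}
	return src[n.Start:n.End], nil
}

func main() {
	src := "contract A { uint x; }"
	stateVar := SrcNode{Start: 13, End: 20} // offsets chosen for the example
	text, err := stateVar.Source(src)
	fmt.Printf("%q %v\n", text, err)
}
```

Storing two integers per node keeps memory use flat, at the cost of needing the original source string around whenever text is requested.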
