Personal Semgrep Server

I created this personal Semgrep server to learn Rust. It is suitable for local deployment by folks who cannot use the Semgrep SaaS App because their custom Semgrep rules and proprietary code have to stay local.

  1. Unlimited local policies: A policy is a collection of rules.
  2. Serve rules and policies to the Semgrep CLI app over HTTP.

It was inspired by wahyuhadi/semgrep-server-rules.

I will try to keep the main branch usable. The dev branch is used for development.

Quickstart

$ git clone https://github.com/parsiya/personal-semgrep-server
$ cd personal-semgrep-server
$ git submodule update --init --recursive
$ cargo build
$ ./target/debug/personal-semgrep-server -r tests/rules/ -p tests/policies/
# run all rules against your code
$ semgrep --config http://localhost:9090/c/p/all path/to/code

Note: Passing a policy path with -p is optional. The only mandatory option is -r, which points to the location of the rules. Without -p, the server only serves individual rules and the built-in all policy/rule.

How to Use

Run the server like this:

./personal-semgrep-server -r path/to/rules/ -p path/to/policies/

Then navigate to http://localhost:9090. The landing page has a link to every rule and policy indexed by the server. Clicking on each link will show you the complete YAML file. This server uses the same path structure as the Semgrep App.

  • Policy URL: /c/p/{policyid}
  • Rule URL: /c/r/{ruleid}

Pass these URLs directly to the Semgrep CLI app.
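As a minimal sketch of the path structure above, the two helpers below build the URLs you would pass to `semgrep --config`. The function names are hypothetical and are not part of the server's code; the base address is the one used in the Quickstart.

```rust
/// Builds a policy URL following the `/c/p/{policyid}` layout.
/// `base` is the server address, e.g. "http://localhost:9090".
/// Hypothetical helper, not taken from the server's source.
fn policy_url(base: &str, policy_id: &str) -> String {
    // Trim a trailing slash so the result never contains "//c/p/".
    format!("{}/c/p/{}", base.trim_end_matches('/'), policy_id)
}

/// Builds a rule URL following the `/c/r/{ruleid}` layout.
/// Hypothetical helper, not taken from the server's source.
fn rule_url(base: &str, rule_id: &str) -> String {
    format!("{}/c/r/{}", base.trim_end_matches('/'), rule_id)
}
```

For example, `policy_url("http://localhost:9090", "all")` yields the same URL that the Quickstart passes to the Semgrep CLI.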

Policies

Policies are collections of rules. A local policy is a YAML file like this:

name: policy-name # this should be unique
rules:
- ruleID-1
- ruleID-2
- arrays-out-of-bounds-access
- potentially-uninitialized-pointer
- snprintf-insecure-use

Create as many as you want. After passing the path to the server, it will search for all .yaml and .yml files in that path recursively. This allows you to store your policies in subdirectories for better organization:

tests
└── policies
     ├── cpp
     |   ├── cpp-policy1.yaml
     |   └── cpp-policy2.yaml
     └── rust
         ├── rust-policy1.yaml
         └── rust-policy2.yaml
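The recursive search described above can be sketched with the standard library alone. This is an illustration under assumptions, not the server's actual implementation: the function names are hypothetical, and the extension check is case-sensitive, so a file ending in `.YAML` would be skipped.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true when the path ends in `.yaml` or `.yml` (case-sensitive).
fn is_yaml(path: &Path) -> bool {
    matches!(
        path.extension().and_then(|ext| ext.to_str()),
        Some("yaml") | Some("yml")
    )
}

/// Recursively collects every YAML file under `dir`.
/// Hypothetical helper; the real server may walk directories differently.
fn find_yaml_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into subdirectories such as `cpp` and `rust`.
            found.extend(find_yaml_files(&path)?);
        } else if is_yaml(&path) {
            found.push(path);
        }
    }
    // `read_dir` order is platform-dependent, so sort for stable output.
    found.sort();
    Ok(found)
}
```

Calling `find_yaml_files(Path::new("tests/policies"))` on the tree above would return the four policy files.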

Note: Policy names must be unique. If you have duplicate policy names, one will be overwritten by another.
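To illustrate why duplicates get overwritten, here is a sketch that indexes policies by name in a `HashMap`. It assumes the policies have already been parsed from YAML; the `Policy` struct and `index_policies` function are hypothetical and only mirror the `name` and `rules` keys shown earlier. Which duplicate wins depends on the order the files are read in.

```rust
use std::collections::HashMap;

/// Mirrors the `name` and `rules` keys of a policy file (hypothetical type).
#[derive(Debug)]
struct Policy {
    name: String,
    rules: Vec<String>,
}

/// Indexes policies by name. A later policy with the same name replaces the
/// earlier one; the names of replaced policies are returned so that a caller
/// could warn about them.
fn index_policies(policies: Vec<Policy>) -> (HashMap<String, Policy>, Vec<String>) {
    let mut index = HashMap::new();
    let mut duplicates = Vec::new();
    for policy in policies {
        // `insert` returns the previous value when the key already existed.
        if let Some(previous) = index.insert(policy.name.clone(), policy) {
            duplicates.push(previous.name);
        }
    }
    (index, duplicates)
}
```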

The 'all' Policy and Rule

The semgrep-rs library creates a built-in policy and rule named all, even if you do not pass a policy path. The all rule/policy contains every rule indexed by the server. It's useful when you want to run all rules against a code base. If you have a custom rule or policy named all, it will be overwritten.

The Semgrep CLI app only runs specific rules against a file based on its extension so don't shy away from throwing the kitchen sink at your code with all. See Language extensions and tags in the Semgrep documentation.
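The overwrite behavior of the all policy can be sketched as follows. This is not how semgrep-rs is actually written; it is an assumption-laden illustration in which policies are a map from name to rule IDs and `add_all_policy` is a hypothetical function.

```rust
use std::collections::HashMap;

/// Reserved name of the built-in policy.
const ALL: &str = "all";

/// Inserts a policy named `all` that lists every indexed rule ID.
/// Returns the custom `all` policy that was displaced, if there was one.
/// Hypothetical sketch, not the semgrep-rs implementation.
fn add_all_policy(
    policies: &mut HashMap<String, Vec<String>>,
    rule_ids: &[&str],
) -> Option<Vec<String>> {
    let all_rules: Vec<String> = rule_ids.iter().map(|id| id.to_string()).collect();
    // `insert` replaces any existing entry under the same key.
    policies.insert(ALL.to_string(), all_rules)
}
```

Returning the displaced entry makes it possible to warn the user that their own all policy was replaced.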

Complete Rule IDs

Similar to policy names, rule IDs must also be unique. The Semgrep SaaS App uses complete rule IDs that are based on the path to avoid collisions. To create a complete rule ID, replace the path separator (/ or \) with ., then append the rule's internal ID (the value of the id key in the rule file).

For example, the complete rule ID for a rule with id: double-free in the rules/c/lang/security/double-free.yaml file is: rules.c.lang.security.double-free.double-free.
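The construction described above can be sketched in a few lines. The function name is hypothetical, and the sketch assumes a relative path with no leading `./` and a file name with a single extension, which is dropped before the separators are replaced.

```rust
use std::path::Path;

/// Builds a complete rule ID from the rule file's path and its internal ID.
/// Hypothetical helper; assumes `rule_file` is relative and has one extension.
fn complete_rule_id(rule_file: &str, rule_id: &str) -> String {
    // Remove the file extension, e.g. ".yaml".
    let without_ext = Path::new(rule_file).with_extension("");
    // Replace both kinds of path separator with a dot.
    let dotted = without_ext
        .to_string_lossy()
        .replace(|c: char| c == '/' || c == '\\', ".");
    // Append the value of the rule's `id` key.
    format!("{}.{}", dotted, rule_id)
}
```

With a rule file path of `rules/c/lang/security/double-free.yaml` (the extension is an assumption) and the internal ID `double-free`, this returns `rules.c.lang.security.double-free.double-free`.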

My underlying library semgrep-rs supports creating complete rule IDs, but I have not added it to the current iteration of the server because:

  1. You have to include the complete rule ID in the policy file.
  2. The rule ID will be dependent on the path passed to the server.

I can change this if we can come up with a solution to get consistent rule IDs and a way to write policies automatically.

Security

lol wut?! Only run it on localhost and don't expose this to the internet.

Why not Use the Semgrep App?

The Semgrep SaaS App is awesome and you should use it (and buy it) if you can. But my custom rules and code had to stay local, so I had to create a directory structure for rules to simulate policies.

Another issue was lack of local policies. For example, to run all C++ rules against a target, your only realistic option is to store all rules in a directory named cpp and pass it to Semgrep CLI with --config path/to/cpp/.

If you want to run a specific set of C++ rules, you can either copy/paste the rule files somewhere else or use the --exclude-rule command line switch a bajillion times. Now if you modify a rule (good idea to keep them in a git repository), you have to manually update all copies.

Another issue is the directory structure of the Semgrep Rules on GitHub. It doesn't work for me. See "Audit Shouldn't be Under Security in the Semgrep Rules Repository".

This server and local policies solve all of these problems for me. I can keep a single copy of my custom rules in a git repository with the Semgrep Rules repository as a git submodule and have custom policies to mix and match rules. The policies are also in the same repository.

Features

I would like to keep this server as simple as possible. I don't want to create a Semgrep App competitor. The only things I would add are a simple UI similar to the Semgrep Playground, so people can run it locally for proprietary rules/code, and maybe some simple commands for filtering rules and creating policies (e.g., create a policy from specific keys in the rules' metadata).

License

Rust likes dual-licensing so here we go.

Licensed under either of Apache License, Version 2.0 or MIT license.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this repository by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

personal-semgrep-server's Issues

Landing page

Host a simple landing page in root.

The page can have a list of all rules. Technically, we could have a list of rules/rulesets as hyperlinks, where clicking a rule or ruleset opens its contents. This is simple because we want to serve them at specific URLs anyway.

It should also list how to access each rule and ruleset.

Have multiple commands

For example a command to generate a ruleset and write it to disk.

Another command to split a ruleset and write all the rules to disk in a target directory.

A command to start the server.

Maybe a command to serve files from disk like Hugo server?

I can use Hugo server as an example.

Rule ID should include path

Just like the Semgrep server, we should use the complete path because it will reduce name collisions.

The path should start from the root of the directory that is passed to the server. So if we have a rule with id buffer-overflow in rules/cpp/security/some-file-name.yaml, the complete rule ID in the index should be rules.cpp.security.some-file-name.buffer-overflow.

The filename and ruleID usually match but we cannot guarantee that.

Make the policy path optional

A server without policies is kinda useless but it should be possible to run the server just to serve individual rules.

Web UI

This is more than I can do by myself but I can create a simple UI where people can pass files and get the results?

browse rules and stuff?

Point to rule path on disk or online?

Upload a zip file with code to scan?

Point to a git repository to scan?

Point to a rule directory and get a ruleset?

Or better yet, a personal playground where users can paste a rule and a file and then run that rule on the file and get highlights and the message.
