plsyssec / cargo-scan Goto Github PK

Static analysis-assisted security for the Rust supply chain

License: MIT License

Python 1.00% Makefile 0.53% Rust 94.85% TypeScript 3.63%

cargo-scan's Introduction

Cargo Scan

cargo scan is a supply chain auditing tool for Cargo (Rust) dependencies using static analysis. It can also be used in tandem with cargo vet.

⚠️ cargo scan is under active development. All interfaces are currently unstable.

Installation

Make sure you have Rust
Run rustup update -- the build has been known to crash on older versions of Rust.
Run make install.

This installs cargo-download and builds the Rust source. Installation has been tested on Mac OS (Monterey) and Linux (Arch Linux).

Quick-start

To use Cargo Scan you first need a Rust crate somewhere on your system. To scan a crate, you run the binary (from this repository), and provide it a path to the crate.

Running an audit

The following runs an audit of the crate:

cargo run <path to crate> mycrate.audit -i

The tool will show you dangerous effects found in the crate, one at a time. To go to the next effect, type l.

For example, you can download a crate and run

cargo download -x fs-extra
cargo run fs_extra-1.3.0/ fs-extra.audit -i

Or you can run on a provided test crate in data/test-packages:

cargo run data/test-packages/permissions-ex permissions-ex.audit -i

If the command is run a second time, it continues the existing audit. To instead overwrite the existing audit, use -f. To review the audit, use -r.

Scan with CSV output

If you don't want to perform an audit, you can also simply get the list of effects found in a basic CSV format for further inspection and analysis. To run the tool this way, use the scan binary:

cargo run --bin scan <path to crate>

This should print a list of effects, one per line. The last four items on each line give the directory, file, line, and column where the effect occurs. The beginning of the line gives the crate name, the function body and callee that contains the effect, and the effect type or pattern that it matches.

Detailed instructions

Please see the file AUDITING.md for further instructions about auditing.

Other usage

Running the unit tests

Run cargo test to run Rust unit tests
Run make test to re-run the tool on all our test packages, whose results are in data/results and placed under version control to check for any regressions.

Running an experiment

You can also run ./scripts/scan.py -h to see options for running an experiment; this is useful for running a scan on a large list of crates, e.g. the top 100 crates on crates.io or your own provided list. Alternatively, see Makefile for some pre-defined experiments to run, such as make top10.

cargo-scan's People

Contributors

Stargazers

Watchers

Forkers

ambadidi muhammad-hassnain

cargo-scan's Issues

Rust CLI for `scan.py`

Since a bunch of functionality is duplicated in scan.py and rst-src/src/effect.rs, it may be worth moving the CLI to Rust. It would be nice at least to have the basic CLI working: modify the file
rust-src/src/main.rs using clap to generate a basic CLI matching the main options of scan.py (see the bottom of scan.py: all the argparse code.)
The MIRAI options can be ignored for now.

UX improvements

From discussion today,

Make audit the default binary, ~~rename scan to scan_csv~~
For audit: add a default argument for <AUDIT_FILE_PATH>
Provide a bit more info to stdout: continuing an existing audit, saving audit to ,etc. (tool already does a good job of some of this, telling you it's loading an audit file, creating a new audit file)
Provide a way to run audit and create a new scan, listing all effects (without auditing -- matching the behavior of scan)

Currently the -r flag is supposed to do the third thing, but I can't quite get it to do what I want. Maybe the solution is just, if the audit file doesn't exist, create it with the default scan?

caleb@caleb-mac cargo-scan % cargo run --bin audit data/test-packages/fns-closures audit.audit2 -r
    Finished dev [unoptimized + debuginfo] target(s) in 0.40s
     Running `target/debug/audit data/test-packages/fns-closures audit.audit2 -r`
Error: Audit file to review doesn't exist
caleb@caleb-mac cargo-scan % 
caleb@caleb-mac cargo-scan % 
caleb@caleb-mac cargo-scan % cargo run --bin audit data/test-packages/fns-closures audit.audit3 -r
    Finished dev [unoptimized + debuginfo] target(s) in 0.53s
     Running `target/debug/audit data/test-packages/fns-closures audit.audit3 -r`
Error: Audit file to review doesn't exist

track information flow for arguments

Currently we are doing this only on the MIRAI backend, not the syn backend:
track possible value(s) that can flow into effectful sinks

Add instructions for how to use the `chain` binary

Having some trouble figuring out how to use the chain binary. cargo run --bin chain -- create works OK:

% cargo run --bin chain -- create data/packages/num_cpus policy.manifest
    Finished dev [unoptimized + debuginfo] target(s) in 0.29s
     Running `target/debug/chain create data/packages/num_cpus policy.manifest`
Creating audit chain
Loading audit package lockfile
Creating dependency graph
Making default policy for libc v0.2.65
Making default policy for hermit-abi v0.1.3
Making default policy for num_cpus v1.13.1
Finished creating policy chain

But then for cargo run --bin chain -- audit I am not sure how to pass the right arguments:

caleb@caleb-mac cargo-scan % cargo run --bin chain -- audit policy.manifest num_cpus-1.13.1
  Finished dev [unoptimized + debuginfo] target(s) in 0.16s
   Running `target/debug/chain audit policy.manifest num_cpus-1.13.1`
Auditing crate: num_cpus-1.13.1
Error running command: Path is not a crate; not a directory: ".audit_crates/num_cpus-1.13.1"
caleb@caleb-mac cargo-scan % cargo run --bin chain -- audit policy.manifest num_cpus    
  Finished dev [unoptimized + debuginfo] target(s) in 0.16s
   Running `target/debug/chain audit policy.manifest num_cpus`
Auditing crate: num_cpus
Error running command: We require exactly one policy matching the crate name

Name resolution panics on hashbrown

I'm getting another interesting error on the hashbrown crate:

% RUST_LOG=info ./scripts/scan.py -c hashbrown
INFO:root:=== Scanning hashbrown in data/packages ===
[2023-04-25T00:31:56Z INFO  cargo_scan::scanner] Scanning crate: "data/packages/hashbrown"
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/name_resolution.rs:391:39
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Looks possibly different than the errors from before

effect model

We are working on updating the effect model so that unsafe code effects are also considered effects. (But “dangerous” code blocks that aren’t themselves effects, such as unsafe trait decls, won’t be effects.) Basically we need to update the definition of
struct Effect
so that unsafe code elements are part of it rather than accumulated by Scanner in a separate vector. I think it would be better for it to remain a struct, not an enum, so that it's clear what commonalities all effects share.

syn backend: report all instances of unsafe blocks and FFI calls

Crate versioning + crashes

(Making an issue for myself to track / revisit later)

Currently, scan.py doesn't worry about the version of the crate being fetched from crates.io. This should be updated so that it fetches a version using an input CLI arg and passes it to the appropriate corresponding arg of cargo download.

Related problem: deal with missing crates (yanked from crates.io) better by catching and reporting cargo-download errors instead of crashing.

crash on `serde_json`

This one is probably a bug in Scanner or in syn. Leaving an issue here to revisit at some point later

% RUST_LOG=info ./scripts/scan.py -c serde_json
INFO:root:=== Scanning serde_json in data/packages ===
[2023-04-25T00:37:49Z INFO  cargo_scan::scanner] Scanning crate: "data/packages/serde_json"
Error: expected one of: `fn`, `extern`, `use`, `static`, `const`, `unsafe`, `mod`, `type`, `struct`, `enum`, `union`, `trait`, `auto`, `impl`, `default`, `macro`, identifier, `self`, `super`, `crate`, `::`

LoC tracking + conditionally compiled code

Add code in scanner to gather stats about the total amount of code skipped (in LoC) vs. scanned.

In addition to macros, skip platform-specific and conditionally compiled code
Add a count of skipped # of lines and processed # of lines to scanner and scan results
Add LoC for macros and conditionally compiled code throughout scanner
Add LoC for effects
Add LoC for transitive effects: using --scan binary
In auditing toolchain: report skipped blocks of code in a separate set of code blocks for the user to audit
collect summary in scan.py

name resolution test cases

New rust-analyzer name resolution module just landed! 🎉 Thanks to @lzoghbi for the hard work!!

It works in most cases. This thread for a few rarer bugs/issues with it to be resolved.

The plan is to soon 🔥 🗑️ deprecate and delete 🗑️ 🔥 the old resolution code, currently in hacky_resolution.rs -- in the interim, I will use hacky_resolution as a fallback for the rare cases when the new resolver fails.

Generate default policies

Add a file rust-src/src/bin/generate_default.rs that should be a CLI which does the following: given input (1) a path to a Rust crate, (2) a path to a list of of_interest effect patterns (data/of_interest.txt) it should output <crate name>.toml that represents a default policy for that crate.

Look at find_calls.rs and check_policy.rs for existing binaries that do related functions; a lot of the same logic should be reusable. The following should be useful:

scanner.rs scans a Rust file (only a single file, not the whole crate) and returns all call sites in the form of Effect objects -- see effect.rs.
policy.rs contains the Policy object. It includes functions to add an allow or require statement (probably needs more methods to be added), and it can be serialized to a TOML file via toml::to_string(&policy). This can then be saved to a file.
util.rs: feel free to add useful utility functions like saving to a file.

Tests

The following check should pass for existing crates: if we generate a default policy for the crate, then check against the policy (via the logic in check_policy.rs), it should pass. It would be great to have some tests written to this effect.

Technical Notes

There are two default policies that make sense for a crate: first, the policy that allows everything, and second the policy that requires everything. It would be awesome to have a flag that can be turned on/off to toggle which default policy is generated.

add argument patterns and trust statements to policies

add arg patterns and trust statements to the policy language in policy.rs (see Policy struct) and implement all associated methods for them.

Roadmap for `cfg` attributes

Current infrastructure for skipping cfg attributes is hacky, see this code in scanner.rs:

    // Quickfix to decide when to skip a CFG attribute
    // TODO: we need to use rust-analyzer or similar to more robustly parse attributes
    pub fn skip_cfg(&self, args: &str) -> bool {
        args.starts_with("target_os = \"linux\"") || args.starts_with("not (feature =")
    }

    // Return true if the attributes imply the code should be skipped
    pub fn skip_attr(&self, attr: &'a syn::Attribute) -> bool {
        let path = attr.path();
        // if path.is_ident("cfg_args") || path.is_ident("cfg") {
        if path.is_ident("cfg") {
            let syn::Meta::List(l) = &attr.meta else { return false };
            let args = &l.tokens;
            if self.skip_cfg(args.to_string().as_str()) {
                info!("Skipping cfg attribute: {}", args);
                return true;
            } else {
                debug!("Scanning cfg attribute: {}", args);
                return false;
            }
        }
        false
    }

    // Return true if the attributes imply the code should be skipped
    pub fn skip_attrs(&self, attrs: &'a [syn::Attribute]) -> bool {
        attrs.iter().any(|x| self.skip_attr(x))
    }

Probably eventually we want to add code to the rust-analyzer wrapper to get information about attributes that are currently enabled for the current build. Relates to the story for build-time effects #13 as well.

Dependency fencing

another idea to support: fencing dependencies -- crate U is restricted and can only be used by certain trusted dependencies X, Y, Z

in the workspace Cargo.toml, say "this dep can only be used by crate X, Y, Z"
(you can do this as a postprocessing step. maybe we should just make cargo fence for this down the road)

e.g.:

we're pulling in yubikey which pulls in pcsc-sys; this is only for thing son the cli side but we have a workspace where our server-side stuff lives too (but should never ever use this)

Integrate call graph

Looks like the petgraph::graph::DiGraph call graph constructed in ScanResults is not currently used, and I think there's some overlap with the various collected info in pub_fns, fn_locs, unsafe_effect_blocks_set, and get_callers. @DavidThien doesn't seem that high priority, but might be something for you to look into at some point if you want to do some cleanup.

Type resolution crash on `proc_macro2`

thread 'main' panicked at 'Definition should have a type', src/name_resolution.rs:456:6

backtrace:

./scripts/scan.py -c proc-macro2
INFO:root:=== Scanning proc-macro2 in data/packages ===
thread 'main' panicked at 'Definition should have a type', src/name_resolution.rs:456:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
INFO:root:=== Results ===
INFO:root:===== All results =====

INFO:root:===== Crate Summary =====
Number of dangerous imports by crate:
===== Crate Totals =====
0 crates with 1 or more dangerous imports
1 crates with 0 dangerous imports

caleb@caleb-mac cargo-scan % RUST_BACKTRACE=1 ./scripts/scan.py -c proc-macro2
INFO:root:=== Scanning proc-macro2 in data/packages ===
thread 'main' panicked at 'Definition should have a type', src/name_resolution.rs:456:6
stack backtrace:
   0: rust_begin_unwind
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/panicking.rs:64:14
   2: core::panicking::panic_display
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/panicking.rs:147:5
   3: core::panicking::panic_str
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/panicking.rs:131:5
   4: core::option::expect_failed
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/option.rs:1924:5
   5: core::option::Option<T>::expect
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/option.rs:786:21
   6: cargo_scan::name_resolution::get_canonical_type
             at ./src/name_resolution.rs:436:14
   7: cargo_scan::name_resolution::Resolver::resolve_type
             at ./src/name_resolution.rs:188:9
   8: cargo_scan::resolve::FileResolver::resolve_type_core
             at ./src/resolve.rs:88:9
   9: cargo_scan::resolve::FileResolver::resolve_type_or_else
             at ./src/resolve.rs:111:15
  10: <cargo_scan::resolve::FileResolver as cargo_scan::resolve::Resolve>::resolve_field_type
             at ./src/resolve.rs:188:9
  11: cargo_scan::scanner::Scanner::scan_deref::{{closure}}
             at ./src/scanner.rs:560:26
  12: core::iter::traits::iterator::Iterator::for_each::call::{{closure}}
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/iter/traits/iterator.rs:834:29
  13: core::iter::traits::iterator::Iterator::fold
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/iter/traits/iterator.rs:2438:21
  14: core::iter::traits::iterator::Iterator::for_each
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/iter/traits/iterator.rs:837:9
  15: cargo_scan::scanner::Scanner::scan_deref
             at ./src/scanner.rs:558:9
  16: cargo_scan::scanner::Scanner::scan_expr
             at ./src/scanner.rs:528:21
  17: cargo_scan::scanner::Scanner::scan_expr_call_args
             at ./src/scanner.rs:596:13
  18: cargo_scan::scanner::Scanner::scan_expr
             at ./src/scanner.rs:474:17
  19: cargo_scan::scanner::Scanner::scan_fn_local
             at ./src/scanner.rs:360:13
  20: cargo_scan::scanner::Scanner::scan_fn_statement
             at ./src/scanner.rs:344:36
  21: cargo_scan::scanner::Scanner::scan_expr
             at ./src/scanner.rs:397:21
  22: cargo_scan::scanner::Scanner::scan_expr
             at ./src/scanner.rs:420:17
  23: cargo_scan::scanner::Scanner::scan_expr_call_args
             at ./src/scanner.rs:596:13
  24: cargo_scan::scanner::Scanner::scan_expr
             at ./src/scanner.rs:474:17
  25: cargo_scan::scanner::Scanner::scan_fn_statement
             at ./src/scanner.rs:345:42
  26: cargo_scan::scanner::Scanner::scan_fn
             at ./src/scanner.rs:327:13
  27: cargo_scan::scanner::Scanner::scan_method
             at ./src/scanner.rs:290:9
  28: cargo_scan::scanner::Scanner::scan_impl
             at ./src/scanner.rs:257:21
  29: cargo_scan::scanner::Scanner::scan_item
             at ./src/scanner.rs:174:37
  30: cargo_scan::scanner::Scanner::scan_file
             at ./src/scanner.rs:164:13
  31: cargo_scan::scanner::scan_file
             at ./src/scanner.rs:727:5
  32: cargo_scan::scanner::try_scan_file
             at ./src/scanner.rs:740:5
  33: cargo_scan::scanner::scan_crate_with_sinks
             at ./src/scanner.rs:776:13
  34: cargo_scan::scanner::scan_crate
             at ./src/scanner.rs:809:5
  35: scan::main
             at ./src/bin/scan.rs:29:19
  36: core::ops::function::FnOnce::call_once
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
INFO:root:=== Results ===
INFO:root:===== All results =====

INFO:root:===== Crate Summary =====
Number of dangerous imports by crate:
===== Crate Totals =====
0 crates with 1 or more dangerous imports
1 crates with 0 dangerous imports

Build-time effects

run on build.rs scripts and macro invocations and produce reasonable output

installing tool fails

I'm finding the readme a bit unhelpful in how I would use the tool (eg I don't care about data/packages/), so figured I'd try to install the tool and run the binary. Installing fails:

cargo install --path . --locked

error[E0658]: use of unstable library feature 'once_cell'
   --> /home/d/.cargo/registry/src/github.com-1ecc6299db9ec823/cargo-0.72.2/src/cargo/util/mod.rs:215:9
    |
215 |     use std::sync::OnceLock;
    |         ^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #74465 <https://github.com/rust-lang/rust/issues/74465> for more information

error[E0658]: use of unstable library feature 'once_cell'
   --> /home/d/.cargo/registry/src/github.com-1ecc6299db9ec823/cargo-0.72.2/src/cargo/util/mod.rs:216:19
    |
216 |     static UMASK: OnceLock<libc::mode_t> = OnceLock::new();
    |                   ^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #74465 <https://github.com/rust-lang/rust/issues/74465> for more information

error[E0658]: use of unstable library feature 'once_cell'
   --> /home/d/.cargo/registry/src/github.com-1ecc6299db9ec823/cargo-0.72.2/src/cargo/util/mod.rs:216:44
    |
216 |     static UMASK: OnceLock<libc::mode_t> = OnceLock::new();
    |                                            ^^^^^^^^
    |
    = note: see issue #74465 <https://github.com/rust-lang/rust/issues/74465> for more information

error[E0658]: use of unstable library feature 'once_cell'
   --> /home/d/.cargo/registry/src/github.com-1ecc6299db9ec823/cargo-0.72.2/src/cargo/util/mod.rs:216:44
    |
216 |     static UMASK: OnceLock<libc::mode_t> = OnceLock::new();
    |                                            ^^^^^^^^^^^^^
    |
    = note: see issue #74465 <https://github.com/rust-lang/rust/issues/74465> for more information

error[E0658]: use of unstable library feature 'once_cell'
   --> /home/d/.cargo/registry/src/github.com-1ecc6299db9ec823/cargo-0.72.2/src/cargo/util/mod.rs:221:12
    |
221 |     *UMASK.get_or_init(|| unsafe {
    |            ^^^^^^^^^^^
    |
    = note: see issue #74465 <https://github.com/rust-lang/rust/issues/74465> for more information

For more information about this error, try `rustc --explain E0658`.
error: could not compile `cargo` due to 5 previous errors
error: failed to compile `cargo-scan v0.1.0 (/home/d/hack/cargo-scan)`, intermediate artifacts can be found at `/home/d/hack/cargo-scan/target`

improve FFI call effect

Currently, we add the FFICall effect when an FFI function is called, not when it is declared. This seems wrong because when an FFI function is public, e.g.

pub extern "C" fn do_some_shady_stuff(...) { ... }

The function is not marked as having any effects, but calling it is clearly dangerous.

This shows up on the system-configuration-sys crate (v 0.5.0) as found by @deian leading to, confusingly, no effects in this crate, and I also copied the relevant source code file to the test-crates/ffi-ex example crate so we can track its results on make test

v0 tasks:

add an FFIDecl effect when an FFI function is declared

v1 tasks:

make sure FFI functions are showing up in the call graph / audit tree (running an audit)
remove the FFICall effect when FFI functions are called, as this would now be redundant
check the results on make test and make sure the changes look reasonable

Performance improvements

Performance isn't prohibitive for running on a single crate, so this is fine to revisit later, probably roadmap-v1.

Basically two things that would help would be

Loading rust-analyzer once for the entire workspace of several crates, rather than one instance per-crate
Caching results so that we don't have to recompute them for known versions of dependent crates -- This may come for free if we support pulling in a sanctioned existing policy from a database somewhere rather than always generating your own.

assertion failures in WalkDir: `src/util.rs:78:9`

[2023-04-25T19:57:02Z INFO  cargo_scan::scanner] Scanning crate: "data/packages/fnv"
thread 'main' panicked at 'assertion failed: p.is_dir()', src/util.rs:78:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

[2023-04-25T19:58:06Z INFO  cargo_scan::scanner] Scanning crate: "data/packages/matches"
thread 'main' panicked at 'assertion failed: p.is_dir()', src/util.rs:78:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

HackyResolver warnings: skipping `impl dyn Trait` block

occuring on lots of crates, e.g.

[2023-04-25T20:04:53Z WARN  cargo_scan::hacky_resolver] HackyResolver: skipping 'impl dyn Trait' block (TypeTraitObject { dyn_token: Some(Dyn), bounds: [TypeParamBound::Trait(TraitBound { paren_token: None, modifier: TraitBoundModifier::None, lifetimes: None, path: Path { leading_colon: None, segments: [PathSegment { ident: Ident { sym: Subscriber, span: bytes(200058..200068) }, arguments: PathArguments::None }] } })] }) (504:5)

i don't remember why we skip these

hackyresolver eventually deprecated so we can just ignore this for now

Tracking issue: crashes in `chain` binary

Binary chain crashes with stack overflow on tokio:

caleb@caleb-mac cargo-scan % cargo run --bin chain -- create data/packages/tokio policy.manifest
    Finished dev [unoptimized + debuginfo] target(s) in 0.38s
     Running `target/debug/chain create data/packages/tokio policy.manifest`
Creating audit chain
Loading audit package lockfile
Creating dependency graph
Making default policy for unicode-ident v1.0.4
Making default policy for proc-macro2 v1.0.46
Making default policy for quote v1.0.21
Making default policy for syn v1.0.102

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
zsh: abort      cargo run --bin chain -- create data/packages/tokio policy.manifest

MIRAI call path is missing one call

I think there’s something off about the call path that is printed when running the tool: it is skipping one level of the call path (it shows the callee, but not the caller function). For example when I run it on num_cpus here’s the current output:

% ./scan.py -c num_cpus -m -g
INFO:root:=== Scanning num_cpus in data/packages ===
INFO:root:MIRAI:new function
INFO:root:MIRAI:new call path
INFO:root:MIRAI effect found: libc[5fb3]::unix::bsd::apple::{extern#0}::sysctlbyname, src, lib.rs:319:17
INFO:root:MIRAI effect found: num_cpus[1818]::get_physical, src, lib.rs:109:5
INFO:root:MIRAI:new function
INFO:root:MIRAI:new call path
INFO:root:MIRAI effect found: libc[5fb3]::unix::{extern#1}::sysconf, src, lib.rs:347:25
INFO:root:MIRAI effect found: num_cpus[1818]::get, src, lib.rs:74:5
INFO:root:MIRAI:new call path
INFO:root:MIRAI effect found: libc[5fb3]::unix::{extern#1}::sysconf, src, lib.rs:347:25
INFO:root:MIRAI effect found: num_cpus[1818]::get_num_physical_cpus, src, lib.rs:324:20
INFO:root:MIRAI effect found: num_cpus[1818]::get_physical, src, lib.rs:109:5
INFO:root:=== Generating call graph as a PNG ===

focusing in on the first call path, we have:
num_cpus::get_physical --> libc::sysctlbyname

but the call graph shows that it should be:
num_cpus::get_physical --> num_cpus::get_num_physical_cpus --> libc::sysctlbyname

so I think it’s just skipping one level of the call path here.

VS Code extension

make a vscode thing to integrate cargo scan into your vscode project

Add example policies in `policies`

This is a follow-up issue to #7.
Once you have added a test crate, write a policy for the crate in policies/<crate name>.toml The policy should have the same name as the crate. You can look at the existing policies to see the expected policy syntax.

To check your policy against the crate: run

./scan.py -c <crate name> -t -p policies/<crate name>.toml

and look at the output. It should inform you of any effects found in the source code that are not noted in the policy. Report anything weird or errors that you find!

more context?

I'd want to see how buff is used, but the default lines make it kind of hard; probably worth making this configurable

MIRAI feature request: take list of functions of interest from an input/config file

it would be nice to have the list of functions of interest be parsed from a config file, something like a .json input. That way we can configure it to look for whatever list of functions we want and the “cargo scan” tool can have options to add/remove functions.

Sanity check: check that call graph is complete

Check that for each node in the call graph, we also have scanned its definition (or can find it in a dependency, or it's an FFI call)

This is a good sanity check to make sure we're not silently missing large sections of the call graph or mapping function calls to an identifier we don't recognize.

Updated roadmap for traits

New roadmap for traits. Replaces #30

For v0:

add tracking code to maintain, for each trait, the total set of caller-checked effects and unsafety for that trait (in the current package scope, among all implementations)
add information to our audit/policy files: in addition to the public functions, we also need the traits (both implemented by and exposed by/declared by that crate). For each trait: whether caller-checked or unsafe, which specific caller-checked effects
In the call graph in scanner, make sure trait methods are added to the call graph
In the call graph in scanner, make sure name resolution for traits points to the abstract trait as added to the call graph.

Possibly for v0, or save for v1:

In the call graph in scanner, add specific trait implementations in addition to the abstract trait method
In a non-generic context when we know the trait that is pointed to, point to the specific trait implementation instead of the abstract trait method.
For dynamic trait objects, just point to the abstract trait method

For v1:

For dynamic trait objects, track which specific possible trait impls it could point to
For traits called in a generic context, track which specific possible trait impls it could point to

syn backend: use rust-analyzer to get type information and map identifiers to source code

The idea is to expose two functions,

fn type_of(s: SourceCodeLocation) -> syn::Type
fn def_of(s: SourceCodeLocation) -> SourceCodeLocation

where SourceCodeLocation is of the appropriate type to make these work (maybe just a syn blob, like syn::Expr?)

This is kinda necessary to do before we have any further interesting information on the syn backend and before we gather empirical data about a lot of crates. I will take a look at it when I next get the chance.

Roadmap for closures, fn pointers, and traits

This is a work-in-progress, discussion welcome!

A possible short-term roadmap is:

implement UnknownCall: an additional type of unsafety that must be audited. This is used for calls to closures and calls to function pointers.
additionally, implement UnknownTraitCall: this applies to calls to dynamic trait objects, and calls to traits in a generic context. In other words, if we call t.foo() where we don't know the exact type of t.
for traits in a non-generic context (t.foo() where we do know the type of t), fix resolution: currently we are resolving them to the trait path, we need to resolve them to the path where the trait is implemented instead.

Longer term:

for closures and function pointers: add closure locations to the call graph. (i) define anonymous path names for closures (ii) use type resolution to try to resolve where closures are defined. If type resolution fails, fall back on UnknownCall
for traits: track the set of possible implementations and be smarter about delegating responsibility. David had some ideas here and said it requires changes to the policy stuff

improve data abstraction + CSV output for effects

and make it consistent between grep & mirai backends

tracking branch: improve_effect_csv

Ident/Path confusion

Ident should represent an alphanumeric string with underscores like foo_bar; Path should represent a fully qualified path like my_crate::my_mod::foo_bar.

If Idents are being created with ::, these should be Paths instead
These should have nice abstractions over them as needed. An Ident can be turned into a Path, not vice versa; Path should expose functions like .iter_idents(&self) -> impl Iterator<Item = Ident>
I guess we should also document whether Ident and Path are allowed to contain keywords like crate, super, etc. And whether Path should be fully qualified: can it be a relative path, or must it be absolute?

@DavidThien something to fix if it's easy to do on your end, else I will get to this at some point when I have the lock on all files

Also the [ and ] should eventually be disallowed; currently I was using them as a hacky way to encode function calls that we were unable to name lookup, i.e. methods and call expressions. I can fix that later.

resolver quickfixes

@lzoghbi take a look at the function resolve::resolve_core -- I've put in two temporary fixes for the minor issues we discussed, to get the tests passing.

s.add(1) moves the column one over for some cases where a whitespace token wasn't resolved as expected
result.replace_empty_idents() replaces all instances of :::: with ::

Please go ahead and remove these lines when resolved!

    fn resolve_core(&self, i: &syn::Ident) -> Result<CanonicalPath> {
        let mut s = SrcLoc::from_span(self.filepath, i);
        // TODO Lydia remove
        s.add1();
        let i = Ident::from_syn(i);
        let mut result = self.resolver.resolve_ident(s, i)?;
        // TODO Lydia remove
        result.replace_empty_idents();
        Ok(result)
    }

Updated roadmap for closures and fn pointers

New roadmap for closures and function pointers. Replaces #30

For v0:

add two types of unsafe effect to our effect model: ClosureCreation and FnPointerCreation
in scanner.rs, whenever a closure or function pointer is created, add this to the list of effects associated with that function that need to be audited.

For v1, the plan is probably something like the following:

add closures to the call graph
for effects in a closure, we need a new audit model: effects can be caller-checked, creator-checked, safe, or unsafe.
for each closure variable or function pointer variable, track which possible closures/function pointers it might refer to (an overapproximation). If we don't have enough information to identify, could add an UnsafeCall effect to track this.

Performance regression on `rand`, `syn`

We seem to have a performance regression, currently hanging (not an infinite loop, just running really slow) on rand.

Sometime between 7294a89 and the present, so probably my fault -- introduced when I updated rust-analyzer!

here's --release time, checking debug now as well

RUST_LOG=info cargo run --release --bin scan -- data/packages/rand  210.47s user 2.46s system 98% cpu 3:36.76 total
RUST_LOG=info cargo run --bin scan -- data/packages/syn  1633.82s user 24.07s system 95% cpu 28:53.81 total

Add example crates in `data/test-packages`

This is an open issue that would be good for everyone involved to do at least once to start out!

To add an example crate:

make a directory in data/test-packages and run cargo init inside it
add whatever Rust source files you want and make sure it compiles
check the output of ./scan.py -c <my crate> -t and ./scan.py -c <my crate> -t -m to see how it looks. Report anything you think is weird/should be fixed

Finally, add the crate to automatically be updated on make test so that we can track the output. To do that:

add the crate name to data/crates/test-crates.csv
run make test
run git diff to check the output, then git add . and `git commit -m "add test crate" to add everything to version control.

base32

I scanned base32. Got promoted a few times, with things along the lines of:

For safe-to-run, it's probably worth checking #[cfg(test)] code but might be okay to not too; not clear?

Cosmetic changes

Two cosmetic changes we might want to consider

rename policy files to audit files -- better matches the intended use case
make --scan print out more helpful output by default -- could reuse the infrastructure for printing effects in --bin audit so that the two outputs would match and be analogous

assertion failures at `src/effect.rs:218:13`

another one for me to come back to

[2023-04-25T20:06:49Z INFO  cargo_scan::scanner] Scanning crate: "data/packages/uuid"
thread 'main' panicked at 'assertion failed: is_unsafe', src/effect.rs:218:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

rethinking EffectBlock abstraction

It's unclear what we want/need from the EffectBlock abstraction -- recent changes in scanner::push_effect and scanner::push_independent_effect have made it clear this is a source of confusion/bugs. It's possible we can get rid of this abstraction entirely in favor of plain effects.

name resolution crash on `serde_json`

Another name resolution crash, I think different from before:

% RUST_BACKTRACE=1 cargo run --bin scan data/packages/serde_json
    Finished dev [unoptimized + debuginfo] target(s) in 0.51s
     Running `target/debug/scan data/packages/serde_json`
[2023-05-31T20:17:38Z WARN  cargo_scan::scanner] Failed to scan file: data/packages/serde_json/src/features_check/error.rs (expected one of: `fn`, `extern`, `use`, `static`, `const`, `unsafe`, `mod`, `type`, `struct`, `enum`, `union`, `trait`, `auto`, `impl`, `default`, `macro`, identifier, `self`, `super`, `crate`, `::`)
thread 'main' panicked at '

Failed to lookup [email protected] in this Semantics.
Make sure to use only query nodes, derived from this instance of Semantics.
root node:   [email protected]
known nodes: [email protected]

backtrace:

', /Users/caleb/.cargo/registry/src/github.com-1ecc6299db9ec823/ra_ap_hir-0.0.149/src/semantics.rs:1371:13
stack backtrace:
   0: rust_begin_unwind
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/panicking.rs:64:14
   2: ra_ap_hir::semantics::SemanticsImpl::find_file::{{closure}}
   3: ra_ap_hir::semantics::SemanticsImpl::find_file
   4: ra_ap_hir::semantics::SemanticsImpl::analyze_impl
   5: ra_ap_hir::semantics::SemanticsImpl::type_of_expr
   6: ra_ap_hir::semantics::Semantics<DB>::type_of_expr
             at /Users/caleb/.cargo/registry/src/github.com-1ecc6299db9ec823/ra_ap_hir-0.0.149/src/semantics.rs:345:9
   7: cargo_scan::name_resolution::get_canonical_type::{{closure}}
             at ./src/name_resolution.rs:493:30
   8: core::option::Option<T>::and_then
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/option.rs:1309:24
   9: cargo_scan::name_resolution::get_canonical_type
             at ./src/name_resolution.rs:491:36
  10: cargo_scan::name_resolution::Resolver::resolve_type
             at ./src/name_resolution.rs:192:9
  11: cargo_scan::resolve::FileResolver::resolve_type_core
             at ./src/resolve.rs:118:9
  12: cargo_scan::resolve::FileResolver::resolve_type_or_else::{{closure}}
             at ./src/resolve.rs:147:36
  13: cargo_scan::resolve::FileResolver::resolve_or_else
             at ./src/resolve.rs:127:9
  14: cargo_scan::resolve::FileResolver::resolve_type_or_else
             at ./src/resolve.rs:147:9
  15: <cargo_scan::resolve::FileResolver as cargo_scan::resolve::Resolve>::resolve_path_type
             at ./src/resolve.rs:167:9
  16: cargo_scan::scanner::Scanner::scan_path
             at ./src/scanner.rs:590:18
  17: cargo_scan::scanner::Scanner::scan_expr
             at ./src/scanner.rs:516:17
  18: cargo_scan::scanner::Scanner::scan_fn_statement
             at ./src/scanner.rs:375:42
  19: cargo_scan::scanner::Scanner::scan_expr
             at ./src/scanner.rs:471:21
  20: cargo_scan::scanner::Scanner::scan_fn_statement
             at ./src/scanner.rs:375:42
  21: cargo_scan::scanner::Scanner::scan_fn
             at ./src/scanner.rs:357:13
  22: cargo_scan::scanner::Scanner::scan_fn_decl
             at ./src/scanner.rs:315:9
  23: cargo_scan::scanner::Scanner::scan_item
             at ./src/scanner.rs:185:35
  24: cargo_scan::scanner::Scanner::scan_mod
             at ./src/scanner.rs:204:17
  25: cargo_scan::scanner::Scanner::scan_item
             at ./src/scanner.rs:180:34
  26: cargo_scan::scanner::Scanner::scan_file
             at ./src/scanner.rs:174:13
  27: cargo_scan::scanner::scan_file
             at ./src/scanner.rs:829:5
  28: cargo_scan::scanner::try_scan_file
             at ./src/scanner.rs:842:5
  29: cargo_scan::scanner::scan_crate_with_sinks
             at ./src/scanner.rs:878:13
  30: cargo_scan::scanner::scan_crate
             at ./src/scanner.rs:911:5
  31: scan::main
             at ./src/bin/scan.rs:29:19
  32: core::ops::function::FnOnce::call_once
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Similar error to #21 but those are now fixed, I wonder if a similar fix can be applied here?

Same crate as #26, but different error

There's no .unwrap() in the line in question so I am not sure exactly how to suppress and report the error.

pub fns vs pub fns with effects

The number of pub_fns and pub_fns_with_effects is the same for all the test crates and top 10 crates. Seems suspicious, we should add a test case to double check this.

make test:

crate, total, loc_lb, loc_ub, macros, loc_lb, loc_ub, conditional_code, loc_lb, loc_ub, skipped_calls, loc_lb, loc_ub, skipped_fn_ptrs, loc_lb, loc_ub, skipped_other, loc_lb, loc_ub, unsafe_trait, loc_lb, loc_ub, unsafe_impl, loc_lb, loc_ub, pub_fns, pub_fns_with_effects, pub_total_effects
toy-crates, 6, 285, 285, 2, 0, 2, 4, 33, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 19
fns-closures, 5, 136, 136, 9, 0, 9, 0, 0, 0, 5, 0, 5, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 7
serde_hex_minimal, 4, 634, 634, 16, 185, 195, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 7, 0, 0, 0, 0, 0, 0, 5, 5, 5
caller-checked, 3, 70, 71, 3, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2
parsing-ex, 2, 146, 146, 7, 8, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0
unsafe-test, 2, 119, 119, 12, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1
cfg-ex, 1, 55, 55, 2, 0, 2, 4, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
num_cpus_minimal, 1, 37, 37, 3, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
trait-ex, 1, 27, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
dependency-ex, 1, 23, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 3
permissions-ex, 1, 17, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 10
libc-ex, 1, 15, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
fnv_minimal, 1, 135, 135, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
inline-ex, 1, 13, 13, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
dependency-parent, 1, 13, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1
resolution-ex, 1, 124, 124, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0
dummy, 1, 103, 103, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

make top10:

crate, total, loc_lb, loc_ub, macros, loc_lb, loc_ub, conditional_code, loc_lb, loc_ub, skipped_calls, loc_lb, loc_ub, skipped_fn_ptrs, loc_lb, loc_ub, skipped_other, loc_lb, loc_ub, unsafe_trait, loc_lb, loc_ub, unsafe_impl, loc_lb, loc_ub, pub_fns, pub_fns_with_effects, pub_total_effects
quote, 7, 2558, 2558, 70, 1441, 1488, 1, 9, 9, 0, 0, 0, 0, 0, 0, 16, 0, 16, 0, 0, 0, 0, 0, 0, 1, 1, 1
rand_core, 6, 1591, 1591, 20, 0, 20, 11, 215, 215, 0, 0, 0, 0, 0, 0, 10, 60, 65, 0, 0, 0, 0, 0, 0, 1, 1, 1
syn, 52, 43768, 43768, 1582, 4607, 5932, 10, 138, 138, 0, 0, 0, 0, 0, 0, 47, 15, 59, 0, 0, 0, 3, 0, 3, 42, 42, 180
autocfg, 4, 752, 752, 57, 31, 82, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 18, 219
unicode-xid, 3, 1646, 1646, 4, 0, 4, 5, 29, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
rand, 26, 8366, 8366, 152, 1018, 1104, 24, 2431, 2431, 0, 0, 0, 0, 0, 0, 24, 3, 24, 0, 0, 0, 0, 0, 0, 1, 1, 5
libc, 220, 109520, 109520, 593, 40876, 40880, 1, 3, 3, 0, 0, 0, 0, 0, 0, 16, 1, 16, 0, 0, 0, 2, 0, 2, 29, 29, 97
serde, 20, 15672, 15672, 364, 2299, 2506, 13, 110, 110, 0, 0, 0, 0, 0, 0, 235, 70, 269, 0, 0, 0, 0, 0, 0, 1, 1, 1
proc-macro2, 10, 4717, 4717, 39, 180, 203, 58, 544, 544, 0, 0, 0, 0, 0, 0, 17, 0, 17, 0, 0, 0, 0, 0, 0, 5, 5, 10
cfg-if, 1, 175, 175, 1, 56, 56, 1, 87, 87, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

Cargo chain crash on `serde-hex`

$ chain create serde-hex-0.1.0 okay
Creating audit chain
Loading audit package lockfile
Creating dependency graph
Making default policy for nodrop v0.1.14
Making default policy for array-init v0.0.4
Making default policy for unicode-ident v1.0.11
Making default policy for proc-macro2 v1.0.66
Making default policy for quote v1.0.33
Making default policy for syn v2.0.29
Error running command: Audit chain creation failed: Failed to read Cargo metadata from Cargo.toml file /home/d/hack/cargo-scan/.audit_crates/syn-2.0.29/Cargo.toml, Some(Version { major: 1, minor: 72, patch: 0 })

Severity labels + continuous integration

another idea from @deian

For automated scans, would be cool to classify severity: e.g. Critical, High, Moderate, Low.

This could be integrated with a CI run of cargo scan to determine if any prior audits/policy saves are critically broken and help triage errors.

Similar to this:

A useful closure + fn pointer optimization

(moving this a new thread so we can tag it roadmap-v1, c.f. #34)

Just discussed this with David, false positives for closure creation seem to come up commonly, e.g. for closures like |x| x + 1 which are quite common.

An extremely useful optimization we should do here is:

if the closure contains no effects inside it -- so absolutely no effects are possible when called -- skip adding the ClosureCreation effect.

Update audits on diff

Note to look at later once initial experiments and audits are complete:
we want a semantically meaningful diff calculator that uses static analysis, in tandem with the existing audit, to determine which parts of the code need to be looked at again.

MIRAI fork: run clippy and fmt

cargo clippy catches some helpful lint errors here!
cargo fmt fixes code formatting

It's also best practice, in general, not to have commented-out code in the main branch of a shared repository -- it usually doesn't end up being very useful later. But it can come in handy to annotate parts of the code where something could be added later.