
galoisinc / mate


MATE is a suite of tools for interactive program analysis with a focus on hunting for bugs in C and C++ code using Code Property Graphs.

Home Page: https://galoisinc.github.io/MATE/

License: BSD 3-Clause "New" or "Revised" License

Shell 0.14% Dockerfile 0.22% Makefile 0.20% Python 42.80% C++ 5.79% C 8.91% GDB 0.01% LLVM 25.35% Jupyter Notebook 6.23% CMake 0.12% ANTLR 0.46% Haskell 1.23% Ruby 0.21% HTML 0.04% TypeScript 7.94% SCSS 0.34%
code-property-graph llvm program-analysis security security-research static-analysis symbolic-execution

mate's Introduction

MATE

MATE is a suite of tools for interactive program analysis with a focus on hunting for bugs in C and C++ code. MATE unifies application-specific and low-level vulnerability analysis using code property graphs (CPGs), enabling the discovery of highly application-specific vulnerabilities that depend on both implementation details and the high-level semantics of target C/C++ programs.

See the online documentation for more information.

Acknowledgements

This material is based upon work supported by the United States Air Force and Defense Advanced Research Project Agency (DARPA) under Contract No. FA8750-19-C-0004. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force or DARPA. Approved for Public Release, Distribution Unlimited.

mate's People

Contributors

langston-barrett, thinkmoore, woodruffw


mate's Issues

doc: Make notebook tutorial into a Jupyter notebook

The notebook tutorial is currently a sequence of code blocks and their outputs embedded into markup. But this is exactly what Jupyter notebooks are for! We should convert the tutorial into a notebook and ship it like Usagefinder. Jupyter even has an option to export to ReST, so we could easily keep it in sync with the version in the docs.

Manticore UI: Checkboxes to enable additional Mantiserve detectors

We should have configuration settings (checkboxes?) that allow us to enable or disable Manticore's various detectors (e.g. concrete OOB) for a particular under-constrained Manticore run.

Detectors can be enabled/disabled with a checkbox, but they usually take some parameters. One that is shared among all detectors is a boolean switch, fast, which tells Manticore whether to stop exploring a state as soon as a detector is triggered. It seems reasonable to me to always enable fast for under-constrained tasks without exposing it in the UI.

The underconstrained OOB detector is always enabled by default in UC mode, so we don't need to bother about that one.
Other detectors that can be exposed are:

  • the concrete heap OOB detector which doesn't take any arguments, so a checkbox would work for enabling/disabling it
  • possibly the VariableBoundsAccess, UninitializedVars, and UseAfterFree detectors (although they haven't really been tested in an under-constrained context). All three take an optional list of POIs as a parameter. I'm not sure what would be the best way to select and send POI information from the UI, cc @william.woodruff @ted

The data format for detector options is documented in the Mantiserve REST API documentation.

bug: Blight journal is empty

Hello,

I started using MATE today and was able to compile and go through the notes.c example. However, I wasn't able to compile my own example project, which is built by running 'make' after generating the Makefile with CMake. The Makefile is created under the build directory in the root source directory.

 mate-cli oneshot exampleProject/                                                          
✖ bb83b24f0e4242648ab3f9592ea1e0f7: failed

=======================
EXITED WITH: 2
=======================

=======================
STDOUT:

=======================

=======================
STDERR:
make: /usr/bin/cmake: Command not found
make: *** [Makefile:4932: cmake_check_build_system] Error 127

=======================
However, I already have cmake at /usr/bin/cmake:

whereis cmake                                                                     
cmake: /usr/bin/cmake /usr/lib/x86_64-linux-gnu/cmake /usr/lib/cmake /usr/share/cmake /usr/share/man/man1/cmake.1.gz /usr/src/googletest/googletest/cmake

I have also tried creating a tarball out of the exampleProject/ directory:
tar czf exampleProject.tar.gz exampleProject/
mate-cli artifact create compile-target:tarball exampleProject.tar.gz
mate-cli compile create --wait --make-targets exampleProject.elf --artifact-id 6226e34a987942589c5c9cadb089c9ea

Still, it didn't work!

Any idea what I could have done wrong? Thank you!

Issues compiling bitcode for redis

A user reached out about an error ingesting redis (https://github.com/redis/redis) using mate-cli oneshot redis.

The initial issue appears to be compatibility with -flto, which redis uses at its default optimization level. At link time, the compilation logs include the error:

/usr/bin/ld: /opt/mate/llvm-wedlock/bin/../lib/LLVMgold.so: error loading plugin: /opt/mate/llvm-wedlock/bin/../lib/LLVMgold.so: cannot open shared object file: No such file or directory

Commenting out the relevant lines of the Makefile improves things:

diff --git a/src/Makefile b/src/Makefile
index 7e17f1f83..bd6d11a22 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -16,10 +16,10 @@ release_hdr := $(shell sh -c './mkreleasehdr.sh')
 uname_S := $(shell sh -c 'uname -s 2>/dev/null || echo not')
 uname_M := $(shell sh -c 'uname -m 2>/dev/null || echo not')
 OPTIMIZATION?=-O3
-ifeq ($(OPTIMIZATION),-O3)
-       REDIS_CFLAGS+=-flto
-       REDIS_LDFLAGS+=-flto
-endif
+#ifeq ($(OPTIMIZATION),-O3)
+#      REDIS_CFLAGS+=-flto
+#      REDIS_LDFLAGS+=-flto
+#endif
 DEPENDENCY_TARGETS=hiredis linenoise lua hdr_histogram fpconv
 NODEPS:=clean distclean

But the compilation still fails. It looks like this is because redis' build process uses separate invocations for compiling and linking each final executable:

    LINK redis-server
INFO:Entering CC [-g -ggdb -rdynamic -o redis-server adlist.o quicklist.o ae.o anet.o dict.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o crc16.o endianconv.o slowlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o redis-check-rdb.o redis-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script_lua.o script.o functions.o function_lua.o commands.o strl.o connection.o unix.o ../deps/hiredis/libhiredis.a ../deps/lua/src/liblua.a ../deps/hdr_histogram/libhdrhistogram.a ../deps/fpconv/libfpconv.a ../deps/jemalloc/lib/libjemalloc.a -lm -ldl -pthread -lrt -g3 -grecord-gcc-switches]
DEBUG:Compile using parsed arguments:
InputList:         [-g -ggdb -rdynamic -o redis-server adlist.o quicklist.o ae.o anet.o dict.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o crc16.o endianconv.o slowlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o redis-check-rdb.o redis-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script_lua.o script.o functions.o function_lua.o commands.o strl.o connection.o unix.o ../deps/hiredis/libhiredis.a ../deps/lua/src/liblua.a ../deps/hdr_histogram/libhdrhistogram.a ../deps/fpconv/libfpconv.a ../deps/jemalloc/lib/libjemalloc.a -lm -ldl -pthread -lrt -g3 -grecord-gcc-switches]
InputFiles:        []
ObjectFiles:       [adlist.o quicklist.o ae.o anet.o dict.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o crc16.o endianconv.o slowlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o redis-check-rdb.o redis-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script_lua.o script.o functions.o function_lua.o commands.o strl.o connection.o unix.o ../deps/hiredis/libhiredis.a ../deps/lua/src/liblua.a ../deps/hdr_histogram/libhdrhistogram.a ../deps/fpconv/libfpconv.a ../deps/jemalloc/lib/libjemalloc.a]
OutputFilename:    redis-server
CompileArgs:       [-g -ggdb -g3 -grecord-gcc-switches]
LinkArgs:          [-rdynamic adlist.o quicklist.o ae.o anet.o dict.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o crc16.o endianconv.o slowlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o redis-check-rdb.o redis-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script_lua.o script.o functions.o function_lua.o commands.o strl.o connection.o unix.o ../deps/hiredis/libhiredis.a ../deps/lua/src/liblua.a ../deps/hdr_histogram/libhdrhistogram.a ../deps/fpconv/libfpconv.a ../deps/jemalloc/lib/libjemalloc.a -lm -ldl -pthread -lrt]
ForbiddenFlags:    []
IsVerbose:         false
IsDependencyOnly:  false
IsPreprocessOnly:  false
IsAssembleOnly:    false
IsAssembly:        false
IsCompileOnly:     false
IsEmitLLVM:        false
IsLTO:             false
IsPrintOnly:       false

DEBUG: We are skipping bitcode generation because we did not see any input files.

After a quick inspection, I wasn't able to track down what emits the final DEBUG message (clang itself?).

`test_poi_use_after_free` fails

On 4ce8832,

docker-compose -f docker-compose.yml -f docker-compose.test.yml run -v "$(pwd):/mate" test -- -- -n=1 -k use_after_free

passes. On 39d9de7, it fails:

>       assert results == expected
E       assert {(69, 88)} == {(76, 95)}
E         Extra items in the left set:
E         (69, 88)
E         Extra items in the right set:
E         (76, 95)
E         Full diff:
E         - {(76, 95)}
E         + {(69, 88)}

tests/postgres/poi_analysis/use_after_free_test.py:61: AssertionError

Test for isolated subgraphs

We might want to define and test for some high level CPG invariants, such as:

  • Do we expect our CPG to be connected, i.e., for there to be a path of edges from any node to any other node? (A sketch of such a check follows this list.)
  • If not, do we expect certain nodes or communities of nodes to always be orphaned/unconnected?
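
For the connectivity invariant, here is a minimal sketch of what such a test could look like, assuming only that the CPG's edges can be enumerated as (source ID, target ID) pairs (that enumeration step is elided); networkx is used purely for illustration and is not part of MATE:

import networkx as nx

def assert_cpg_connected(edge_pairs):
    """Fail if the CPG, viewed as an undirected graph, splits into isolated subgraphs."""
    graph = nx.Graph()
    graph.add_edges_from(edge_pairs)  # edge_pairs: iterable of (source_id, target_id)
    components = list(nx.connected_components(graph))
    assert len(components) == 1, f"CPG has {len(components)} disconnected components"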

Migrated from internal (Gitlab) MATE issue number(s) 624

Support for variable-arity functions

MATE's support for varargs functions is limited. We don't create Argument or ParamBinding CPG nodes for these functions.

Migrated from internal (Gitlab) MATE issue number(s) 149, 178.

REST API: Status endpoint

The REST API should have an endpoint that reports the overall health of the MATE instance, along with perhaps a few top-level statistics about the DB.
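
A purely illustrative sketch of the shape such an endpoint might take; FastAPI is used here only for illustration, and the route, field names, and statistics are placeholders rather than MATE's actual API:

from fastapi import FastAPI

app = FastAPI()

@app.get("/api/v1/status")
def status() -> dict:
    """Report overall instance health plus a few top-level DB statistics."""
    return {
        "healthy": True,
        "builds": {"total": 0, "failed": 0},
        "compilations": {"total": 0},
        "cpgs": {"total": 0},
    }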

Relate DWARF types to their corresponding LLVM types

This came up during yesterday's dogfooding session: DWARFTypes are a little bit easier to query for, but they're difficult to relate back to the LLVM-level CPG components. To get from a DWARFType to an LLVMType, you currently need to do a dance like this:

  • Retrieve some DWARFType
  • Get one or more local variables/arguments of that type with DWARFType.local_variables
  • Get the LLVMType of one of those LocalVariables/Arguments with .llvm_type

We should either supply a helper that does that behind the scenes, or find a way to link them directly.
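
A small sketch of that dance packaged as a helper, using only the attributes named above (DWARFType.local_variables and LocalVariable/Argument.llvm_type); anything beyond those names is an assumption:

def llvm_types_for(dwarf_type):
    """Collect the LLVMTypes reachable from a DWARFType via its local variables/arguments."""
    return {var.llvm_type for var in dwarf_type.local_variables}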

Migrated from internal (Gitlab) MATE issue number(s) 969.

Debugging a failing build

We are trying to run MATE on an exact pull of dnsmasq v2.77. The project compiles but fails to build after several hours. We attempted to follow the build debugging suggestions on https://galoisinc.github.io/MATE/debugging-builds.html, but the Docker logs only show the most recent activity and we didn't see information about the failure. There apparently were no out-of-memory issues (checked with dmesg | grep -i "OOM"). We'd appreciate any ideas about the likely cause or potential fixes.

Our machine runs Ubuntu 22.04.2 LTS with 4 threads and 64GB of RAM.

(Screenshot attached: "Screen Shot 2023-07-06 at 2 09 13 PM".)

getting "invalid label" while executing "docker-compose"

Hi,

I followed every step on the "quickstart" page successfully, but I still get this error:

1 error(s) decoding:

* error decoding '[mate].labels': invalid label ["com.galois.mate.ci-safe-to-remove"]

while executing:

docker-compose -f docker-compose.yml -f docker-compose.ui.yml -f docker-compose.notebook.yml up
The working directory is MATE's source directory.

SIGKILL during the build process

Hello, Team,
Great analysis tool, thank you!

Could you please help me with analyzing haproxy, or at least with getting more diagnostics to find the errors?
I adapted the haproxy Makefile to compile with MATE (haproxy.tar.gz attached), then created an artifact:

mate-cli artifact create compile-target:tarball haproxy.tar.gz

and got:
{ "artifact_id": "76e78253db7c4dc89fa43e4f0ec31ae9", "attributes": { "filename": "haproxy.tar.gz" }, "build_ids": [], "compilation_ids": [], "has_object": true, "kind": "compile-target:tarball" }

Then I compiled haproxy successfully:
mate-cli compile create --wait --artifact-id 76e78253db7c4dc89fa43e4f0ec31ae9 --make-targets all

and got state "compiled":
{ "artifact_ids": [ "76e78253db7c4dc89fa43e4f0ec31ae9", "07e624b817d14ae297532714c916a1ab", "c7f74f4f1b734677be79b7ca84f1d873", "da921fd94c804a5a8ef4470af043f49f", "a19a76750b8b4862a0384c99d323c18e" ], "build_ids": [], "compilation_id": "99c3197dfbfa4feda4ed7ca82987c59a", "log_artifact": { "artifact_id": "07e624b817d14ae297532714c916a1ab", "attributes": { "filename": "compile.log" }, "build_ids": [], "compilation_ids": [ "99c3197dfbfa4feda4ed7ca82987c59a" ], "has_object": true, "kind": "compile-output:compile-log" }, "options": { "containerized": false, "containerized_infer_build": true, "docker_image": null, "experimental_embed_bitcode": false, "extra_compiler_flags": [], "make_targets": [ "all" ], "testbed": null }, "source_artifact": { "artifact_id": "76e78253db7c4dc89fa43e4f0ec31ae9", "attributes": { "filename": "haproxy.tar.gz" }, "build_ids": [], "compilation_ids": [ "99c3197dfbfa4feda4ed7ca82987c59a" ], "has_object": true, "kind": "compile-target:tarball" }, "state": "compiled" }

But when I tried to build haproxy, I got a failed state.
What should I do to make the process finish successfully?

After running
docker container logs mate_executor_1
I got these error messages:

[2022-10-23 10:07:06,466: DEBUG/ForkPoolWorker-1] waiting for build: build.uuid='9e66837900734413a413d50900b709e9' build.state=<BuildState.Building: 'building'>

[2022-10-23 10:07:11,372: ERROR/ForkPoolWorker-8] Task mate.tasks.build.build_artifact[c1ff0258-7649-479d-8bd6-bb4d59d577bf] raised unexpected: MateError('9e66837900734413a413d50900b709e9', '[PosixPath('/opt/mate/llvm-wedlock/bin/opt'), '-load', PosixPath('/opt/mate/local/lib/libSoufflePA.so'), '-load', PosixPath('/opt/mate/local/lib/libPAPass.so'), '-load', PosixPath('/opt/mate/local/lib/libMATE.so'), '-disable-output', '-signatures=/tmp/tmpytz40zm7.json', '-time-passes=false', '-ast-graph-writer', '-pretty-llvm-value=true', '-datalog-pointer-analysis=true', '-mem-dep-edges=false', '-control-dep-edges=true', '-datalog-analysis=unification', '-debug-datalog=false', '-debug-datalog-dir=/tmp/tmpfgbuum3s/pa_results', '-check-datalog-assertions=false', '-context-sensitivity=2-callsite', '-cpg-file', '/tmp/tmp0gfyovjm.jsonl', PosixPath('/tmp/tmpub_vmjt6.bc')] exited due to signal SIGKILL (9);\n\nSTDOUT:\nb''\n\nSTDERR:\nb'Writing facts to: "/tmp/tmpfgbuum3s/pa_results"...\nfneg: Unhandled instruction\n'')
Traceback (most recent call last):
File "/opt/mate/lib/python3.8/site-packages/celery/app/trace.py", line 451, in trace_task
R = retval = fun(*args, **kwargs)
File "/opt/mate/lib/python3.8/site-packages/celery/app/trace.py", line 734, in protected_call
return self.run(*args, **kwargs)
File "/opt/mate/local/lib/python3.8/site-packages/mate/tasks/build.py", line 49, in build_artifact
mate_build.build_artifact(artifact, build, self.session, opts)
File "/opt/mate/local/lib/python3.8/site-packages/mate/build/build.py", line 1002, in build_artifact
raise e
File "/opt/mate/local/lib/python3.8/site-packages/mate/build/build.py", line 992, in build_artifact
(_new_artifacts, _graph) = builder.build_artifact(artifact, session)
File "/opt/mate/local/lib/python3.8/site-packages/mate/build/build.py", line 617, in build_artifact
mate_jsonl = self._build_mate_jsonl(Path(canonicalized_bc.name))
File "/opt/mate/local/lib/python3.8/site-packages/mate/build/build.py", line 458, in _build_mate_jsonl
raise CPGBuildError.from_process_error(e, build_id=self._build.uuid)
mate_common.error.MateError: ('9e66837900734413a413d50900b709e9', '[PosixPath('/opt/mate/llvm-wedlock/bin/opt'), '-load', PosixPath('/opt/mate/local/lib/libSoufflePA.so'), '-load', PosixPath('/opt/mate/local/lib/libPAPass.so'), '-load', PosixPath('/opt/mate/local/lib/libMATE.so'), '-disable-output', '-signatures=/tmp/tmpytz40zm7.json', '-time-passes=false', '-ast-graph-writer', '-pretty-llvm-value=true', '-datalog-pointer-analysis=true', '-mem-dep-edges=false', '-control-dep-edges=true', '-datalog-analysis=unification', '-debug-datalog=false', '-debug-datalog-dir=/tmp/tmpfgbuum3s/pa_results', '-check-datalog-assertions=false', '-context-sensitivity=2-callsite', '-cpg-file', '/tmp/tmp0gfyovjm.jsonl', PosixPath('/tmp/tmpub_vmjt6.bc')] exited due to signal SIGKILL (9);\n\nSTDOUT:\nb''\n\nSTDERR:\nb'Writing facts to: "/tmp/tmpfgbuum3s/pa_results"...\nfneg: Unhandled instruction\n'')

[2022-10-23 10:07:11,381: DEBUG/ForkPoolWorker-8] http://storage:9000 "PUT /artifacts/6cfabc5ea6fe4c66973e1cfefbf9a6b0 HTTP/1.1" 200 0

[2022-10-23 10:07:11,474: WARNING/ForkPoolWorker-1] got failed state for build, not running analyses: build.uuid='9e66837900734413a413d50900b709e9' build.state=<BuildState.Failed: 'failed'>

[2022-10-23 10:07:11,483: INFO/ForkPoolWorker-1] Task mate.tasks.build.await_built_state_and_start_all_analyses[1de85dcd-507c-4fac-ae8d-3633cbccd8ab] succeeded in 2292.0805339869694s: None

[2022-10-23 10:07:11,484: ERROR/ForkPoolWorker-1] probable API misuse: task didn't take a build_id kwarg

Flowfinder: Show incoming/outgoing edges by kind

I think it would be useful and quite general to add two new context menu items that work on any kind of node, one for showing incoming edges and one for showing outgoing edges. An initial version might just show all edges in the given direction, whereas a more advanced iteration might allow (but not require) filtering by edge kinds, possibly even limited to edge kinds that are valid in said direction from the given node kind, according to the schema (endpoints.json).

This would not only make it easy to find very particular paths (e.g. just a subset of data flow entering/exiting from a given node) to reduce clutter, but would also expose a whole bunch of nodes and edges that aren't yet visible in Flowfinder, like LLVM type nodes for instance.
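
A sketch of the schema-driven filtering, under the (unchecked) assumption that endpoints.json can be read as a mapping from edge kind to a list of (source node kind, target node kind) pairs; the real file format may differ:

import json

def valid_edge_kinds(schema_path, node_kind, outgoing):
    """Edge kinds the schema allows in the given direction from node_kind."""
    with open(schema_path) as f:
        schema = json.load(f)  # assumed shape: {edge_kind: [[source_kind, target_kind], ...]}
    index = 0 if outgoing else 1
    return {
        edge_kind
        for edge_kind, endpoints in schema.items()
        if any(pair[index] == node_kind for pair in endpoints)
    }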

Manticore UI: Indicate states that terminated in an error

We should add a visual indicator that a state ended with an error, the same way we indicate states that contain potential bugs.
I'm thinking of either adding text like State N: exploration error to the state banner, colouring the banner orange/red, or even adding an icon like a red exclamation mark at the beginning of the state banner.

Reminder: a state that ends with an error means that the error_msg field of the exploration node will be set.

Support LLVM exception handling

Update the CPG to better account for exceptions.

Some tasks:

  • Add CallReturn nodes for exceptional return values (perhaps CallThrow?)
  • Distinguish between regular and exceptional return in control flow graph
  • Connect exceptional return values with landing pads
  • Consider whether to add implicit exceptions for arithmetic, etc.

Migrated from internal (Gitlab) MATE issue number(s) 735

Doc: Link to blog post

There is a forthcoming release announcement on the Galois blog. We should stick a link to that somewhere in the docs.

DWARF nodes: APIs for common type derivations

Right now, if you have a DWARFType foo and you want to get some of its common derivations (e.g. foo *, const foo *), you need to jump through some hoops:

next(typ for typ in foo.deriving_types if typ.is_pointer)
next(typ for typ in foo.deriving_types if typ.is_const)

This becomes especially annoying when you want multiple levels of derivation. Instead, we should support APIs like these:

# foo -> const foo *
foo.const_of.pointer_of
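
A hypothetical sketch of what such helpers could look like, built on the existing deriving_types/is_pointer/is_const attributes; const_of and pointer_of are proposed names, not the current API:

def pointer_of(dwarf_type):
    """Return the `T *` derivation of dwarf_type, if the CPG contains one."""
    return next((t for t in dwarf_type.deriving_types if t.is_pointer), None)

def const_of(dwarf_type):
    """Return the `const T` derivation of dwarf_type, if the CPG contains one."""
    return next((t for t in dwarf_type.deriving_types if t.is_const), None)

# Multiple levels of derivation (assuming both exist in the CPG): foo -> const foo -> const foo *
const_foo_ptr = pointer_of(const_of(foo))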

UI: Show help text when there are no compilations/builds/etc.

When there are no compilations, builds, or snapshots, the MATE UI is pretty bare: it just shows some empty tables. If I were a novice user, it might not be obvious to me how to go about populating those tables - perhaps we could link to the documentation? Or just show some canned text about how to kick off a build?

doc: Deployment security considerations

MATE currently isn't safe to expose to the public-facing internet, and won't be without a decent amount of service hardening. We should document the security considerations that a deployment (even a private one) currently requires.

An incomplete list of aspects that need to be documented (and potentially fixed):

  • MATE uses an access token and secret to establish a connection with the storage service (MinIO). It uses hard-coded ones by default, but users should update their .env file to set each to a long random string (one way to generate these is sketched after this list).

  • MATE's executor (Celery) service runs as root, with C_FORCE_ROOT=1 and Pickle-based serialization enabled. We do this to marshal Pydantic models correctly, but it's a potential security problem. We should either figure out a way to run as non-root (tricky?) or to get rid of our Pickle dependency (maybe less tricky?)

  • Compilation and build processes are not meaningfully isolated from MATE's core runtime: a non-containerized build/compilation can modify the executor service underneath it, including in ways that'll completely hose the system. Containerized builds are slightly more isolated, but still have access to the Docker socket and could potentially pivot to other services/escape their container.
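
For the first bullet, one way to generate suitable long random strings for the MinIO access token and secret (the specific .env variable names are deployment-specific and not shown here):

import secrets

# Run once per credential and paste the output into the corresponding .env entry.
print(secrets.token_hex(32))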

Document CPG schema guidelines

We repeatedly have questions like these, which I haven't thought through the answers to thoroughly enough. I'd like to take the time to consider them more deeply, list pros and cons, record our thoughts and conclusions, and enact them by changing the schema where necessary.

  • What direction should AST edges go? (InstructionToParentBlock, or BlockToChildInstruction)
  • Sometimes, we have edges that represent relationships that conceptually could hold between many node types (especially in ASTs), like InstructionToParentBlock. Should we use the same edge kind and name for all of these, or separate edge kinds depending on the endpoints (e.g. one kind for LLVM, one for ASM, etc.)?
  • When we have two analyses that provide the same sort of information, but with varying precision or other factors, like the LLVM and Datalog pointer analyses, should we use the same or different edge kinds?
  • How should the structure of the CPG reflect superclass-subclass relations, e.g. those that appear in LLVM? @scott and I identified four possibilities here: https://gitlab-ext.galois.com/mate/MATE/-/merge_requests/499

Additionally, the following are guidelines that we already follow and we could document:

  • The LLVM AST is primary, so node and edge kinds in this AST are un-prefixed, e.g., Function and Instruction refer to LLVM-level functions and instructions.

Migrated from internal (Gitlab) MATE issue number(s) 550

Doc: List contributors

Initial list:

Galois:

  • Andrei Stefanescu
  • Andrew Kent
  • Ankita Singh
  • Annie Cherkaev
  • Ben Davis
  • Ben Selfridge
  • Jason Graalum
  • Karl Smeltzer
  • Langston Barrett
  • Michelle Cheatham
  • Niki Carroll
  • P.C. Shyamshankar
  • Richard Jones
  • Scott Moore
  • Ted Hille

Trail of Bits:

  • Alex Cameron
  • Artem Dinaburg
  • Boyan Milanov
  • Brad Swain
  • Carson Harmon
  • Eric Hennenfent
  • Eric Kilmer
  • Rory Mackie
  • Sonya Schriner
  • Trent Brunson
  • Weston Hopkins
  • William Woodruff

Harvard:

  • Aaron Bembenek
  • Kevin Zhang
  • Steve Chong

Support compilations that use configure scripts

This would be a low-effort way to vastly expand the set of programs that the compilation pipeline can handle.

For example, many of the programs in the gllvm examples would be good candidates, including coreutils and binutils.

Migrated from internal (Gitlab) MATE issue number(s) 854.

Test program with C and C++ sources

We've been testing MATE on codebases that are either C++ or C, but many real-world codebases are both. We should confirm that, particularly on the machine code mapping side, having multiple compilation units of different origin languages doesn't cause problems.

The recompilation pipeline inside of quotidian needs to know how best to recompile the assembly it's given. It currently deduces the right compiler frontend to use (clang or clang++) based on the DWARF language ID(s) for each compilation unit, which aren't guaranteed to all be the same.
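
A sketch of that kind of per-compilation-unit dispatch; this is not the actual quotidian implementation, and it matches on DWARF language names rather than numeric codes:

def frontend_for(dwarf_language):
    """Pick a compiler frontend for a single compilation unit."""
    if dwarf_language.startswith("DW_LANG_C_plus_plus"):
        return "clang++"
    if dwarf_language.startswith("DW_LANG_C"):
        return "clang"
    raise ValueError(f"unsupported DWARF language: {dwarf_language}")

# A mixed C/C++ binary then needs a frontend chosen per unit, e.g.:
# {cu.name: frontend_for(cu.language) for cu in compilation_units}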

Migrated from internal (Gitlab) MATE issue number(s) 440.

Tests: Reinstate `example_1.c`

This is a test program developed by Apogee Research that we used extensively in MATE's tests. They intend to go through the necessary steps to get approval for us to publish it, but haven't yet, so many of the MATE tests had to be disabled. We should reinstate these tests whenever we can.

No POIs in vulnerable program

Hi,

I am trying to test this code with MATE:

#include <stdlib.h>
#include <string.h>

int main(void) {
	int x = 1;
	if (x = 0) {	/* intentional: assignment used as a condition */
		x = 3;
	}
	int y = x / 0;	/* intentional: division by zero; y is never used */

	int size = 5;
	char *src = "xxxxxxxxxx";
	int *dest = malloc(size + 1);
	memcpy(dest, src, size);

	int *alc_mem__ptr = malloc(5);
	memcpy(alc_mem__ptr, src, 9);	/* intentional: heap buffer overflow */
	free(alc_mem__ptr);
	memcpy(alc_mem__ptr, src, 9);	/* intentional: use after free */

	return 0;
}

This code has several issues (redundancy, use after free, overflow, ...), but I am not getting any reports in the POI section or anywhere else.

To make sure that I installed MATE correctly, I analyzed the "authentication.c" file, and two problems were shown in the POI section.

P.S.: How do I delete a scanned project from the build section?

Flowfinder: Toggle for constant nodes

If I'm looking at a dataflow query from user input to some dangerous sink... I probably don't care about constant nodes. We could have a toggle for these similar to the one for memory locations.

Manticore UI: Link to DSL docs

We should provide a link to the MATE documentation for the constraint DSL, to help users when writing custom constraints.

Clean up tests

The tests for all the different MATE packages live in a top-level tests/ directory. Some should be moved into {package_name}/test subdirectories, depending on which package they test.

Project-wide/integration tests (such as CPG tests) should remain in the top level tests directory.

Migrated from internal (Gitlab) MATE issue number(s) 1498.

Doc: Add Flowfinder tutorial

I wrote this Flowfinder tutorial; we should add it to the documentation.

Flowfinder Hands-On Demo

Setup

After setting up a MATE system (see the Quickstart):

  • Download notes.c
  • Install the MATE CLI: pip install -r cli-requirements.txt
  • Upload notes.c to MATE: mate-cli oneshot -p frontend/test/programs/notes.c
  • Navigate to the MATE web UI to check the status of the build

Background

The target program is a simple server that allows users to create notes (i.e.,
store binary blobs). When a note is written, the user is given a completely
random key. They can retrieve the note using this key.

The server supports three commands: write, read, and quit.

Example use:

$ clang -Wall -Werror -o notes -O1 -g notes.c
$ ./notes
Listening on port 8894

In a separate terminal:

$ nc localhost 8894
notes> write very secret data
<server will send back a long alphanumeric key here>
notes> read <key that the server sent back>
very secret data

Tutorial

First, open the program in Flowfinder by clicking the "analyze in Flowfinder"
link. You should see a mostly empty screen with a sidebar on the right.

Exploring a Function

Let's start by looking at where user input enters the program from the network,
via recv. Use the "Select a Function Node to Start..." box to add the recv
function to the screen. Then, find add all the callsites of recv to see how
user input can enter the program.

  • Type recv into the "Select a Function Node to Start..." box
  • Click the "Add Node" button. You should see a box labeled recv.
  • Right-click the recv box and select "Show callsites".

Feel free to re-arrange nodes and edges at any point by dragging and dropping
them. Some nodes can be collapsed or expanded by double-clicking them.

At this point, you should see an arrow from a large box labeled handle_loop to
a small box labeled recv, indicating that the instruction %t8 = call i64 @recv ... in handle_loop calls recv.

Exploring the CFG

Now we know that network input enters the program at this call to recv in
handle_loop. What happens after that? Try taking a look at the slice of the
control-flow graph that follows this call: right-click the call instruction
and select "Show control flow from this node".

At this point, you should see a fairly large graph. What's going on here? If you
follow enough arrows, you should be able to convince yourself that the recv is
inside of a loop, and so the control-flow graph following the call is exactly
the CFG of the handle_loop function, which consists entirely of this loop.

Hide or remove the control-flow slice by pressing the "x" or the slider in the
upper-left or upper-right corner of the corresponding card (the card should
have "Kind: Control flow" on it).

Exploring the DFG

The CFG was a little overwhelming, with a suboptimal signal-to-noise ratio.
Let's just look at the places where the data from the recv call gets used.
Right-click on the call to recv and click "Show uses". You should see a single
store instruction show up. This isn't too helpful - we've just taken a single
step through the dataflow graph. Let's try taking a few at once.

Try adding the slice of the dataflow graph that starts at this call to
recv: right-click the call instruction and select "Show data flow from this
node".

This graph seems a little sparse. First of all, the targets are all in
handle_loop, but surely user-provided data flows to other functions. If you
examine the source lines carefully, you can see that this slice actually shows
the data flow from the return value of recv. If we want to look for how
user-provided data flows through the program, we'll have to try something else.

Hide or remove all the "Dataflow" and "Uses" cards.

Signatures

The problem is that we really want to track the flow of data originating
outside of the program. The mechanism MATE uses for this purpose is called an
"input signature". There are also corresponding "output signatures" which
represent the effect of the program on the external world (printing messages,
creating files, etc.).

Try right-clicking the call to recv and select "Show dataflow and I/O
signatures". Right-click the leftmost input signature that appears (the node is
pentagonal and pink), and click "Show data flow from this node". You should now
see a much more interesting data flow graph. Can you see the vulnerability?
Hint: it's a path traversal.

The problem is that the user input from this call to recv flows to the path
argument of a call to fopen: the key that the user gives to the read command
is used as a path, with no sanitization. This means the user can input a key
like ../../../super/secret/file and read the contents of that path.
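
To make the traversal concrete, here is a minimal client sketch; the protocol framing is inferred from the nc transcript above, and the target path is the illustrative one from the previous paragraph:

import socket

with socket.create_connection(("localhost", 8894)) as sock:
    sock.recv(4096)  # consume the initial "notes> " prompt
    sock.sendall(b"read ../../../super/secret/file\n")
    print(sock.recv(4096).decode(errors="replace"))  # contents of the traversed path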

Right-click the output signature for fopen (which represents the file that may
be created by fopen), and click "show callsites". You should be able to see
that the vulnerable call occurs in the cmd_read function. Congratulations,
you found the vulnerability!

If you'd like to understand how the data flows from the recv to the fopen in
more detail, try disabling the "Hide Nodes - memory" slider in the sidebar. A
circular, green node labeled nil*stack_alloc@handle_loop[[1024 x i8]* %t1][0][*] should appear between the input signature for recv and the output
signature for fopen, which indicates that the data flows through a stack
allocation of size 1024 that was allocated in handle_loop. You can right-click
the memory node and click "Show allocation site" to show the LLVM alloca
instruction which allocates this buffer (corresponding to a local variable at
the C level). If you "Show operands" on the call to recv and then "Show
operands" on the getelementptr instruction, you can see that this is the
buffer passed as the second argument of recv. (You could also try establishing
this by walking the other direction in the dataflow graph, by clicking "Show
uses" on the alloca and so on.) Nice!

Remove TraceLogger

This functionality has bitrotted. The associated documentation (doc/traces.rst) should also go.

Migrated from internal (Gitlab) MATE issue number(s) 476.
