provbpf's Introduction

camflow-bpf

Use this vagrant VM.

Setting things up

Building libbpf:

make build_libbpf

Building the latest Fedora kernel:

make build_kernel

Running all of the build steps in one go:

make prepare

Delete the dependencies' build folders:

make delete_dependency

You should reboot the machine and make sure you boot under the correct kernel.

Building and running

Building the BPF kernel program and the user-space program:

make all

Installing CamFlowBPF:

make install

Starting the service:

make start

Stopping the service:

make stop

The configuration file is installed at /etc/provbpf.ini and can be edited there.

Running the BPF program:

make run

provbpf's People

Contributors

tfjmp, bstelea

provbpf's Issues

Policy setting

From the paper, I understand that ProvBPF supports custom capture policies that select the kernel objects and system events relevant to a specific analysis. How do I configure it if I only want to record events related to objects associated with a specific container on the host? Could you provide a detailed configuration template? Thank you for taking the time to look at my question; I look forward to your reply.

SOCK_INODE as a bpf helper

We discussed this problem before. I think the cleanest way to do that is to provide a bpf helper that performs this operation for us. It will help support the same functionality as standard LSM (and I have some further ideas beyond this immediate project).

The following function:

struct inode *SOCK_INODE(struct socket *socket)
{
	return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
}

needs to be transformed into something like:

struct inode *bpf_sock_inode(struct socket *socket)
{
	return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
}

This needs to be implemented in the kernel.
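
As a rough illustration of what the container_of arithmetic in SOCK_INODE does, here is a user-space sketch. The mock_ types and the mock_container_of macro are stand-ins invented for this example, not the kernel's definitions; the only assumption carried over from the kernel is that the socket and its inode are embedded in the same enclosing structure.

```c
#include <stddef.h>

/* Mock stand-ins for the kernel types (hypothetical, for illustration only). */
struct mock_socket {
	int state;
};

struct mock_inode {
	unsigned long i_ino;
};

/* Mimics the idea behind struct socket_alloc: socket and inode live side by side. */
struct mock_socket_alloc {
	struct mock_socket socket;
	struct mock_inode vfs_inode;
};

/* Step back from a member pointer to the start of the enclosing structure. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as SOCK_INODE, applied to the mock types. */
struct mock_inode *mock_sock_inode(struct mock_socket *socket)
{
	return &mock_container_of(socket, struct mock_socket_alloc, socket)->vfs_inode;
}
```

The lookup is pure pointer arithmetic on a known layout, which is why it is a natural candidate for a small helper.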

My understanding is that the function needs to be added to the list here:
https://github.com/torvalds/linux/blob/master/include/linux/bpf.h#L1833

It then needs to be implemented in https://github.com/torvalds/linux/blob/master/kernel/bpf/helpers.c

An example here:
https://github.com/torvalds/linux/blob/master/kernel/bpf/helpers.c#L492

Question: instead of putting it in helpers.c, could we create our own file, such as lsm-helpers.c? That would make the code easier to maintain in the longer term.

@s00y33 that will need to be implemented in this repo: https://github.com/tfjmp/provbpf-kernel

Memory protection and mmap

I have been tweaking the code around mmap.

We should probably support every possible combination of read/write/execute permissions when mmapping.

This means modifying the code in the following places:
https://github.com/tfjmp/camflow-bpf/blob/225493e7e7a175c2f2a98f63bfeadbe8765c8228/kern.c#L1201 <<< test for all combinations.
https://github.com/tfjmp/camflow-bpf/blob/225493e7e7a175c2f2a98f63bfeadbe8765c8228/include/shared/prov_types.h#L134 <<< add more types (e.g. RDONLY, WRONLY, EXECONLY, RDWR, RDEXEC, WREXEC, RDWREXEC).
https://github.com/tfjmp/camflow-bpf/blob/225493e7e7a175c2f2a98f63bfeadbe8765c8228/types.c#L77 <<< add the stringification here.
https://github.com/tfjmp/camflow-bpf/blob/225493e7e7a175c2f2a98f63bfeadbe8765c8228/types.c#L259 <<< matching here.
https://github.com/tfjmp/camflow-bpf/blob/225493e7e7a175c2f2a98f63bfeadbe8765c8228/types.c#L419 <<< matching here.

Do not hesitate to rename things for consistency. Maybe we should follow the access mode naming convention (https://www.gnu.org/software/libc/manual/html_node/Access-Modes.html) to make things easier for users? @michael-hahn, any opinion?

Some combinations may be impossible depending on the architecture (e.g. x86 does not support write-only memory), but we can put the logic in place for all of them.
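
A minimal sketch of how the seven combinations could be mapped and stringified. The enum, function names and strings below are hypothetical and do not come from prov_types.h or types.c; only the PROT_* flags are real.

```c
#include <sys/mman.h>

/* Hypothetical relation types, one per non-empty combination of PROT_* flags. */
enum mmap_relation {
	MMAP_NONE = 0,
	MMAP_RDONLY,
	MMAP_WRONLY,
	MMAP_EXECONLY,
	MMAP_RDWR,
	MMAP_RDEXEC,
	MMAP_WREXEC,
	MMAP_RDWREXEC,
};

/* Map the protection bits passed to mmap onto exactly one relation type. */
enum mmap_relation mmap_relation_from_prot(int prot)
{
	switch (prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) {
	case PROT_READ:                          return MMAP_RDONLY;
	case PROT_WRITE:                         return MMAP_WRONLY;
	case PROT_EXEC:                          return MMAP_EXECONLY;
	case PROT_READ | PROT_WRITE:             return MMAP_RDWR;
	case PROT_READ | PROT_EXEC:              return MMAP_RDEXEC;
	case PROT_WRITE | PROT_EXEC:             return MMAP_WREXEC;
	case PROT_READ | PROT_WRITE | PROT_EXEC: return MMAP_RDWREXEC;
	default:                                 return MMAP_NONE; /* PROT_NONE */
	}
}

/* Stringification; the strings are placeholders, not the project's names. */
const char *mmap_relation_str(enum mmap_relation rel)
{
	switch (rel) {
	case MMAP_RDONLY:   return "mmap_read";
	case MMAP_WRONLY:   return "mmap_write";
	case MMAP_EXECONLY: return "mmap_exec";
	case MMAP_RDWR:     return "mmap_read_write";
	case MMAP_RDEXEC:   return "mmap_read_exec";
	case MMAP_WREXEC:   return "mmap_write_exec";
	case MMAP_RDWREXEC: return "mmap_read_write_exec";
	default:            return "mmap_none";
	}
}
```

Masking the flags first means unrelated bits cannot select the wrong case, and combinations an architecture never produces simply go unused.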

Memory leak

There is a memory leak on baf99db.

This is the head of Bogdan's branch as of January 7th.

How to reproduce?

Build and install. Reboot. The service should be running.

Run the following command several times, spaced over time:
sudo systemctl status provbpfd.service

You should notice that the memory usage increases monotonically.
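
To track the growth programmatically rather than by eye, one option is to sample the daemon's resident set size from procfs. The sketch below is an assumption about how one might do this, not part of the project; read_vmrss_kb is a hypothetical helper, and VmRSS is a related but different measure from the memory figure reported by systemctl.

```c
#include <stdio.h>

/*
 * Return the VmRSS value (in kB) from a /proc/<pid>/status style file,
 * or -1 if the file cannot be opened or has no VmRSS line.
 */
long read_vmrss_kb(const char *status_path)
{
	char line[256];
	long kb = -1;
	FILE *f = fopen(status_path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		/* The line looks like "VmRSS:     1234 kB". */
		if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
			break;
	}
	fclose(f);
	return kb;
}
```

Calling it periodically with the status file of the provbpfd process and logging the values gives a series that can be compared across output formats.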

What to look for?

I looked through the code and found no obvious alloc that is not followed by a free.

Test with the different output formats, SPADE and W3C. Is the leak caused by a bug in one of the serializations?
Test with the null output. Is the leak caused by a bug in how disk writing is handled?
etc.

Implement graph "compression" strategies

At the moment it seems we are recording everything all the time and not compressing edges or nodes.

In https://github.com/tfjmp/camflow-bpf/blob/dev/include/kern/relation.h we need to implement the code corresponding to the following policies:

struct capture_policy {
	// Whether nodes should be compressed into one if possible.
	bool should_compress_node;
	// Whether edges should be compressed into one if possible.
	bool should_compress_edge;
	// every time a relation is recorded the two end nodes will be recorded
	// again if set to true.
	bool should_duplicate; // will probably still be needed by spade
};

There is example code for this in the CamFlow repository. I would also consider simplifying the code in this file if/when possible.
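
One possible reading of should_compress_edge, sketched below: an edge is skipped when it repeats the edge most recently recorded from the same source node. The node_state structure and should_record_edge function are hypothetical and are not taken from relation.h or from CamFlow; the policy struct repeats the fields listed above.

```c
#include <stdbool.h>
#include <stdint.h>

struct capture_policy {
	bool should_compress_node;
	bool should_compress_edge;
	bool should_duplicate;
};

/* Hypothetical per-node state: the last edge recorded out of this node. */
struct node_state {
	bool has_last_edge;
	uint64_t last_edge_type;
	uint64_t last_dst_id;
};

/*
 * Return true if the edge must be written out, false if it can be folded
 * into the identical edge recorded just before it.
 */
bool should_record_edge(const struct capture_policy *policy,
			struct node_state *src,
			uint64_t edge_type, uint64_t dst_id)
{
	bool repeat = src->has_last_edge &&
		      src->last_edge_type == edge_type &&
		      src->last_dst_id == dst_id;

	/* Remember this edge for the next comparison. */
	src->has_last_edge = true;
	src->last_edge_type = edge_type;
	src->last_dst_id = dst_id;

	return !(policy->should_compress_edge && repeat);
}
```

With compression disabled the function always returns true, so the current record-everything behaviour is preserved.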

Network family support

At the moment the IPv4, IPv6 and Unix families are supported. We could extend this to support more.

New families should follow exactly the template here:
https://github.com/tfjmp/camflow-bpf/blob/dbbf06de719d569af3ead5cde6d07dbffd29f374/include/kern/net.h#L53

The corresponding userspace code also needs to be added.

Here for Spade:
https://github.com/tfjmp/camflow-bpf/blob/tfjmp/spade.c#L245

Here for W3C:
https://github.com/tfjmp/camflow-bpf/blob/tfjmp/w3c.c#L559

https://github.com/tfjmp/camflow-bpf/blob/tfjmp/w3c.c#L589

Simplify graph element naming convention

We should consider simplifying the graph naming convention to be more intelligible for the average user.

cred -> process
task -> thread
etc.

I will discuss with @michael-hahn and come up with something that makes sense.

Does local storage work?

While running the capture mechanism, I checked whether versions are being created:

cat provbpf-2021-03-27_15-18-30.log | grep object_type\":\"task

Example output:

{"type":"Activity","id":"AQAAAAAAAEA6AwAAAAAAAAQAAABJnZVSAAAAAAAAAAA=","annotations": {"object_id":"826","object_type":"task","boot_id":4,"cf:machine_id":"cf:1385536841","version":0,"cf:date":"2021:03:27T15:28:13","cf:taint":"0","cf:jiffies":"0","cf:epoch":0,"pid":3964,"vpid":3964,"utime":"0","stime":"0","vm":"4072","rss":"0","hw_vm":"16320","hw_rss":"4224","secid":0}}
{"type":"Activity","id":"AQAAAAAAAEA8AwAAAAAAAAQAAABJnZVSAAAAAAAAAAA=","annotations": {"object_id":"828","object_type":"task","boot_id":4,"cf:machine_id":"cf:1385536841","version":0,"cf:date":"2021:03:27T15:28:13","cf:taint":"0","cf:jiffies":"0","cf:epoch":0,"pid":3964,"vpid":3964,"utime":"0","stime":"0","vm":"4072","rss":"0","hw_vm":"16320","hw_rss":"4224","secid":0}}
{"type":"Activity","id":"AQAAAAAAAEBIAwAAAAAAAAQAAABJnZVSAAAAAAAAAAA=","annotations": {"object_id":"840","object_type":"task","boot_id":4,"cf:machine_id":"cf:1385536841","version":0,"cf:date":"2021:03:27T15:28:19","cf:taint":"0","cf:jiffies":"0","cf:epoch":0,"pid":3964,"vpid":3964,"utime":"0","stime":"0","vm":"4072","rss":"0","hw_vm":"16320","hw_rss":"4224","secid":0}}
{"type":"Activity","id":"AQAAAAAAAEBKAwAAAAAAAAQAAABJnZVSAAAAAAAAAAA=","annotations": {"object_id":"842","object_type":"task","boot_id":4,"cf:machine_id":"cf:1385536841","version":0,"cf:date":"2021:03:27T15:28:19","cf:taint":"0","cf:jiffies":"0","cf:epoch":0,"pid":3964,"vpid":3964,"utime":"0","stime":"0","vm":"4072","rss":"0","hw_vm":"16320","hw_rss":"4224","secid":0}}

All tasks seem to have their version set to 0 (which seems very unlikely), while the pid is the same. That seems to indicate that the code currently always returns a new provenance object here: https://github.com/tfjmp/camflow-bpf/blob/bfc3279a0a1548abf773ec440ad7c3c797cb659e/include/kern/task.h#L75

If it were working, we would not expect to see nodes with different object_id values and the same pid/vpid.

The problem is likely to affect all local storages at the moment.

The local storage lifecycle is expected to be that of the associated object (https://lwn.net/Articles/826858/), so we should not be seeing this behaviour.
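
The symptom can be checked mechanically once the (object_id, pid) pairs have been extracted from the log. The sketch below is a hypothetical checker, not project code: it counts records whose pid was already seen with a different object_id. Note that pid reuse over a long capture could produce false positives.

```c
#include <stddef.h>
#include <stdint.h>

/* One task record as extracted from the log (parsing is left out). */
struct task_record {
	uint64_t object_id;
	int pid;
};

/* Count records whose pid appeared earlier with a different object_id. */
size_t count_duplicate_tasks(const struct task_record *recs, size_t n)
{
	size_t dup = 0;

	for (size_t i = 1; i < n; i++) {
		for (size_t j = 0; j < i; j++) {
			if (recs[j].pid == recs[i].pid &&
			    recs[j].object_id != recs[i].object_id) {
				dup++;
				break;
			}
		}
	}
	return dup;
}
```

For the four records shown above (object_id 826, 828, 840 and 842, all with pid 3964) the count is 3; with working local storage it should be 0.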

Decoupling provenance metadata from vertex attributes

We should decouple provenance metadata from vertex attributes.

Provenance metadata:

  • information used to build the graph (was_recorded etc.);
  • IDs (node, machine, boot);

Attributes:

  • task pid, gid etc.
  • inode number etc.

Provenance metadata needs to be kept around as it is now. Attributes need to be accessed only when we are recording the provenance (i.e. writing to the ring buffer).

This will need a significant code refactoring.

Check graph correctness

It would be nice to develop a technique to verify the graph correctness.

Two options:

  • we re-implement selective capture (i.e. we capture the provenance of a specific program, and build a small one whose graph we can check manually).
  • we implement a means of querying the graph to extract the same subgraph from a whole-system graph, and check that manually.

Handling nested kernel structure

Thomas: I think the way we are handling a number of nested data structures is broken. I am not convinced that copying the pointer address via bpf_probe_read is supposed to work. It was clearly crashing the kernel on this branch after Soo updated a number of things (with a clear reference to an unauthorized VM access). There are two possible solutions:

  • figuring out how to cleanly handle nested structures (e.g. task->cred) @bstelea;
  • failing that, adding helpers in the kernel @s00y33.

This needs a bit of further investigation.
@bstelea could clone commit bdcb13c and sync with Soo on how to trigger the kernel crash.

The problem extends beyond cred, as the technique of calling bpf_probe_read on pointers is used in other places.
