
transparent-release's Issues

What is the right SLSA Schema Format?

We presently use a JSON schema to specify the shape of the expected SLSA buildType. I went with JSON schema initially just to have some clearly defined format; it doesn't have to be JSON schema specifically, and there are a few pros & cons to it:

Pros:

  • Machine readable, hence clearly defined
  • Existing tools for both validating schemas and for generating example data from them
  • Open standard

Cons:

  • They're actually quite hard for humans to read
  • We can't use them to parse JSON files, only to validate them. Hence we require Go structs that essentially duplicate the schema (in less detail); see PR #16 for details, and the sketch below.
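
To illustrate the duplication in the second con: a minimal sketch (with hypothetical field names) of a Go struct mirroring part of a buildType schema, since encoding/json needs a struct to parse into while the schema itself can only validate.

package slsa

import "encoding/json"

// BuildConfig is a hypothetical struct duplicating part of the JSON schema:
// the schema can validate a buildType config, but parsing it still needs a
// Go type like this one.
type BuildConfig struct {
	Command        []string `json:"command"`
	OutputPath     string   `json:"outputPath"`
	ExpectedSha256 string   `json:"expectedSha256Hash"`
}

func parseBuildConfig(data []byte) (*BuildConfig, error) {
	var c BuildConfig
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}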

Thoughts? Ideas? :) cc @rbehjati @tiziano88

[This issue is related to #15, probably a superset of it].

Remove bazel and convert the repo to a standard go project

We now have go.mod and go.sum, and can use most go commands, e.g., for building. Can we now remove all bazel-related files and convert this repo to a standard Go project?

Currently, when running go test ./..., some auth-logic tests fail. I think this is because of the genrules that fetch and build the dependencies. Is it possible to replace those bazel targets with go commands? (go generate ./... might be relevant; a sketch is below.)
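
For instance, a genrule that fetches and builds a dependency could become //go:generate directives, which go generate ./... would run before the tests. This is only a hypothetical sketch; the repository URL and build command here are placeholders, not the actual dependency.

package authlogic

// Hypothetical replacement for a bazel genrule: `go generate ./...` runs
// these commands to fetch and build the auth logic dependency.
//go:generate git clone --depth=1 https://github.com/example/auth-logic-compiler deps/auth-logic-compiler
//go:generate make -C deps/auth-logic-compiler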

WDYT? @aferr @tiziano88

Refactor Auth Logic Wrappers to Mostly Use Bytes

There are several functions, like GetAppNameFromEndorsement, that take file paths from which bytes are parsed. The files end up being read more than once, so it would be better to refactor as many of these calls as possible to operate on bytes, and to parse the files into bytes just once.
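
A minimal sketch of the refactoring, with hypothetical signatures (the endorsement is assumed to be a JSON statement with a subject list): the path-based function becomes a thin shim, and the real logic operates on bytes so callers can share a single read.

package wrappers

import (
	"encoding/json"
	"fmt"
	"os"
)

// GetAppNameFromEndorsement reads the file once and delegates to the
// bytes-based function.
func GetAppNameFromEndorsement(endorsementFilePath string) (string, error) {
	endorsementBytes, err := os.ReadFile(endorsementFilePath)
	if err != nil {
		return "", err
	}
	return GetAppNameFromEndorsementBytes(endorsementBytes)
}

// GetAppNameFromEndorsementBytes does the actual parsing, so other callers
// can reuse bytes they have already read.
func GetAppNameFromEndorsementBytes(endorsementBytes []byte) (string, error) {
	var endorsement struct {
		Subject []struct {
			Name string `json:"name"`
		} `json:"subject"`
	}
	if err := json.Unmarshal(endorsementBytes, &endorsement); err != nil {
		return "", err
	}
	if len(endorsement.Subject) == 0 {
		return "", fmt.Errorf("endorsement has no subject")
	}
	return endorsement.Subject[0].Name, nil
}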

Refactor to Enforce Principal Name Sanitization

Goal: make sure all principal names are sanitized using the SanitizeName function.

Proposed solution:

  • Make the Principal struct unexported so that its initializer cannot be called outside a wrappers_interface package which just includes wrappers.go. Move the implementations of the specific wrappers into a new package (like transparentReleaseWrappers).
  • Make a factory method for constructing principals that runs sanitization like the following:
func ConstructPrincipal(contents string) principal {
	return principal{Contents: sanitizeName(contents)}
}

This way, the only way external packages can make a principal is by calling the constructor.

If making the principal struct unexported does not work, another way to enforce this could be to introduce a SanitizedName type produced by SanitizeName(), and make the Principal initializer take one of these.
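
A sketch of that alternative: because SanitizedName's field is unexported, the only way to obtain one is through SanitizeName, so the type system enforces the invariant. The hyphen-stripping sanitizer here is only a stand-in for the real function.

package wrappers

import "strings"

// SanitizedName can only be constructed by SanitizeName, because its field
// is unexported.
type SanitizedName struct {
	value string
}

// SanitizeName is a stand-in for the real sanitizer, shown here as stripping
// hyphens for illustration only.
func SanitizeName(raw string) SanitizedName {
	return SanitizedName{value: strings.ReplaceAll(raw, "-", "")}
}

// Principal now cannot be built from an unsanitized string.
type Principal struct {
	Contents SanitizedName
}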

Extend provenance file wrapper to read the builder ID

The authorization logic policies will depend on which trusted builder produced the provenance file. The builder ID shows up under the builder predicate here. The other tooling around transparent release does not support this field yet, so this issue depends on adding support for reading this builder field from provenance files.
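
A minimal sketch of reading the field, following the SLSA v0.2 layout where the builder ID lives at predicate.builder.id (the function name is hypothetical):

package wrappers

import "encoding/json"

// provenance models just the path to the builder ID in a SLSA v0.2
// statement: predicate.builder.id.
type provenance struct {
	Predicate struct {
		Builder struct {
			ID string `json:"id"`
		} `json:"builder"`
	} `json:"predicate"`
}

// GetBuilderID extracts the builder ID from raw provenance bytes.
func GetBuilderID(provenanceBytes []byte) (string, error) {
	var p provenance
	if err := json.Unmarshal(provenanceBytes, &p); err != nil {
		return "", err
	}
	return p.Predicate.Builder.ID, nil
}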

(information provided by @rbehjati )

Change Wrapper Interface to Return an Error

Right now the interface for wrappers returns just an UnattributedStatement, but because wrappers can fail, it would be more idiomatic to return an (UnattributedStatement, error) pair.

The same is true for the IdentifiableWrappers.
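
A hedged sketch of what the revised interfaces might look like (UnattributedStatement and Principal are the existing types, stubbed here so the sketch is self-contained; note Go spells the built-in type as error, with the error conventionally the second return value):

package wrappers

// Existing types, stubbed for illustration.
type UnattributedStatement struct{ Contents string }
type Principal struct{ Contents string }

// Wrapper now reports failures instead of only returning a statement.
type Wrapper interface {
	EmitStatement() (UnattributedStatement, error)
}

// IdentifiableWrapper gets the same treatment.
type IdentifiableWrapper interface {
	Wrapper
	Identify() (Principal, error)
}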

Authorization Logic for Endorsement File Release

At present, we have an authorization logic tool that runs the client-side verification process: it produces an authorization logic policy that collects evidence for verifying whether a previously released binary "is safe" to be consumed by the client.

In addition to this, we would like tooling that works similarly for parties releasing a software binary. This tooling will be used by the product team releasing a binary instead of the client. It gathers the evidence needed to generate an endorsement file, includes a policy for deciding if the evidence gathered is sufficient, and uses this policy to decide if the release is acceptable. If the tool succeeds, an endorsement file is generated. If it fails, it throws an error.

This is a meta-issue for tracking the work to build the release tool to be used by the product team. Here are the sub-tasks:

Use sha256 instead of sha1 to refer to source code

@waywardgeek pointed out that sha1 should not be used for cryptographic purposes, so we should switch to using sha256 in endorsements of the source code and also in provenance generation. I don't know if it's possible to get a sha256 commit ID from git, but failing that, we can take the repository files, tar them, and sha256 that.
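
As a sketch of the fallback: hash a deterministic tar stream of the sources with SHA-256, e.g. by piping the output of git archive --format=tar HEAD into a small program like this.

package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

// hashStream computes the SHA-256 digest of everything read from r.
func hashStream(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}

func main() {
	// Pipe a deterministic archive in, e.g.:
	//   git archive --format=tar HEAD | go run hashsources.go
	digest, err := hashStream(os.Stdin)
	if err != nil {
		panic(err)
	}
	fmt.Println(digest)
}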

Check signedEntryTimestamp using Rekor's public key in Rekor Log Wrapper

The comments here suggest that one of the steps for verifying a Rekor log entry is to verify the signature in signedEntryTimestamp (which is part of the struct for a Rekor log entry). So presumably the Rekor log wrapper should also do this.

@rbehjati Did you happen to already look into what type of signature / format is used for signedEntryTimestamp? I couldn't find any of Rekor's libraries that use this part of the struct. I can try to figure this out, but if you have already looked into it, any info that could speed this up would help.

Write an authorization logic policy for releasing endorsements

This policy should gather the evidence needed to decide if an endorsement file can be released. This evidence includes output from one or more trusted builders, such as GitHub Actions and Google Cloud Build.

Here are some "product requirements":

Different projects might have different needs w.r.t. the trusted builders. For example:

  • one team might only trust GitHub Actions, but not any other builder
  • another team might be OK with getting output from either builder
  • another team might insist on getting the same output from both builders
  • another team might want the same output from k out of n trusted builders
  • other teams might have different needs entirely

Most teams will not want to write authorization logic code at all: it's based on logic programming, and this is not a common skill set. We should minimize how much they have to interact with it.

Proposed solution:

To meet the needs of different product teams and to reduce interaction with authorization logic, we will write a few "Policy Principals" that each represent a different policy for deciding if the builder output is acceptable. For example, one policy principal will specify "just GitHub Actions is trusted", another will specify that "getting the expected output from either builder is fine", and so on.

These policy principals are sort of like libraries -- they remove the need for product teams to specify these policies themselves. The policy for releasing an endorsement file will include a delegation (using the authorization logic features for delegation) to one of these policy principals. Most of the time, product teams will just change this delegation, requiring only a small 1-2 LOC change to the endorsement generation policy. Of course, teams that have more specific needs (and are willing to do so) can choose not to delegate to one of these policy principals and instead write the entire policy themselves -- we expect that this will not be a common case, though.

Some implementation details:

As is the case with the other wrappers, the policy principals and the policy for releasing an endorsement file will be generated from a Go template so that variables can be filled in.
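
A minimal sketch of this generation step using text/template. The template text, principal names, and fact names here are all hypothetical; the repo's real templates will differ.

package main

import (
	"os"
	"text/template"
)

// Hypothetical policy-principal template.
const policyTmpl = `"{{.PolicyPrincipal}}" says {
    "{{.Builder}}" canSay some_binary has_expected_hash.
}
`

func main() {
	t := template.Must(template.New("policy").Parse(policyTmpl))
	data := struct{ PolicyPrincipal, Builder string }{
		PolicyPrincipal: "GitHubActionsOnlyPolicy", // hypothetical principal
		Builder:         "GitHubActionsBuilder",    // hypothetical builder name
	}
	if err := t.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}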

Add trusted builder key to Endorsement Release Authorization Logic Policy

From what @rbehjati tells me, the outputs of trusted builders like GitHub Actions and Google Cloud Build will be signed using a unique keypair per trusted builder. This signing does not exist yet (even independently of authorization logic), but we should add the relevant keys to the authorization logic policy once the signing is set up.

I think I heard that @mariaschett is working on adding key management for this (?).

Do either of you have more relevant details for this?

Can We Take Hyphens Out of Subject Names of Endorsement Files?

The current format for endorsement files uses a hyphen in the application name:
https://github.com/project-oak/transparent-release/blob/main/schema/amber-endorsement/v1/example.json

has

oak_functions_loader-0f2189703c57845e09d8ab89164a4041c0af0a62

I'm using the app name to define several principal names in the generated auth logic. The authorization logic syntax for principal names and predicate arguments doesn't include `-`, and I don't want to add it because it would probably conflict with numeric minus, which we might want (and the grammar is also fully whitespace-agnostic). Could you instead use a different character, like:

oak_functions_loader_0f2189703c57845e09d8ab89164a4041c0af0a62

or

oak_functions_loader:0f2189703c57845e09d8ab89164a4041c0af0a62

or

oak_functions_loader.0f2189703c57845e09d8ab89164a4041c0af0a62

or

oak_functions_loader0f2189703c57845e09d8ab89164a4041c0af0a62

Possibly simplify `rekor_verifier_policy.auth.tmpl`

At the moment, the auth logic code generated by this template includes base facts and rules that depend on just those base facts, all attributed to the same principal. We should decide whether:

  • splitting up these facts is really useful for making the steps involved in verification clearer, or
  • this is just adding unnecessary complexity

Context: link

Sanitized Names Can Collide

Problem: The function SanitizeName can cause name collisions which may not be detected.

Example: If there are two sources FooBar and Foo-Bar, SanitizeName will map both to FooBar when they should have remained distinct.

Potential solution:

Define: s is a sanitizable name (SN) when s != SanitizeName(s), and call SanitizeName(s) its sanitization.
Maintain a map sanMap from sanitizations to the sanitizable names that generated them.

Then, whenever a sanitizable name s is encountered (see the sketch after this list):

  • if sanMap.get(SanitizeName(s)) is empty, add the entry sanMap[SanitizeName(s)] = s
  • else: check sanMap.get(SanitizeName(s)) == s
    • if equal: do nothing
    • if not equal: there is an erroneous name collision; throw an error.
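
A Go sketch of this check. The hyphen-stripping SanitizeName is only a stand-in for the real sanitizer.

package wrappers

import (
	"fmt"
	"strings"
)

// SanitizeName is a stand-in sanitizer for illustration.
func SanitizeName(s string) string {
	return strings.ReplaceAll(s, "-", "")
}

// sanMap maps each sanitization to the sanitizable name that produced it.
var sanMap = map[string]string{}

// checkSanitizable records a sanitizable name and errors on collisions.
func checkSanitizable(s string) error {
	san := SanitizeName(s)
	if san == s {
		return nil // not a sanitizable name; nothing to record
	}
	if prev, ok := sanMap[san]; ok && prev != s {
		return fmt.Errorf("name collision: %q and %q both sanitize to %q", prev, s, san)
	}
	sanMap[san] = s
	return nil
}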

Generate endorsement file if authorization logic query passes

We need a way to generate an endorsement file with the right contents if an authorization logic policy query is true (or to produce an error if it is not). We will use this to check whether a predicate in the policy for releasing an endorsement file can be proved (is true), and to generate the endorsement file if so.

Add builder, invocation and metadata to generated SLSA provenances

The SLSA provenance v0.2 format has a few optional fields, in particular invocation and metadata, that we currently don't fill out in the provenances we generate. To be able to fill these fields out when generating provenances using cmd/build, the input .toml should be adjusted to provide invocation-related information.

In particular, in the case of this GitHub workflow, the invocation could be filled out as follows:

"invocation.configSource.uri": "https://github.com/project-oak/oak/blob/f8c4c96375ca1d87a74238fc4450a81703d1d147/scripts/generate_provenance"
"invocation.configSource.digest": "<cryptographic digests of the file above>"
"invocation.configSource.entryPoint": ""
"invocation.parameters": "GitHub" 
"invocation.environment": "<info about the GitHub Actions runner>"

More generally, invocation.configSource.entryPoint, invocation.parameters and invocation.environment are optional and can be omitted.

The metadata should be set as follows:

  • metadata.buildStartedOn = The timestamp of when the build started, filled out by cmd/build.
  • metadata.buildFinishedOn = The timestamp of when the build completed, filled out by cmd/build.
  • metadata.completeness and metadata.reproducible: For now we don't fill these out.

The builder.id is another field that should be added as a command line flag, instead of being included in the toml file.
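
A minimal sketch of cmd/build recording the build window itself, with a hypothetical local metadata type mirroring the SLSA v0.2 field names (not the repo's actual provenance type):

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// metadata mirrors the two SLSA v0.2 metadata fields we want to fill out.
type metadata struct {
	BuildStartedOn  time.Time `json:"buildStartedOn"`
	BuildFinishedOn time.Time `json:"buildFinishedOn"`
}

func main() {
	started := time.Now().UTC()
	// ... cmd/build would run the docker build here ...
	m := metadata{BuildStartedOn: started, BuildFinishedOn: time.Now().UTC()}
	out, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}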

Transparent Release CLI producing different builds than manual Oak builds

Expected Behavior

Running the transparent release CLI on the testdata in https://github.com/project-oak/transparent-release/blob/main/testdata/build.toml yields the same binary hash as building the binary using Oak's ./scripts/docker_run command with the same arguments.

Actual Behavior

Running the testdata yields the binary hash 15dc16c42a4ac9ed77f337a4a3065a63e444c29c18c8cf69d6a6b4ae678dca5c, as expected in the testdata.

Running the same build using ./scripts/docker_run yields a binary with the hash of e4c5930c6c0e81f8d743da1597905df0fa66b0daf69c6bf92a5cbc561c79b127.

Steps to Reproduce the Problem

For the testdata: see the expected hash in the testdata, or run the testdata as outlined in this repo's README.

For running the same build with Oak's ./scripts/docker_run:

  1. Clone a fresh copy of the repo
  2. Checkout the commit from testdata
  3. Run ./scripts/docker_run with the arguments provided in testdata's command key.

For convenience, that's this command in a fresh clone: git checkout 0f2189703c57845e09d8ab89164a4041c0af0a62 && ./scripts/docker_run ./scripts/runner build-functions-server && sha256sum ./oak_functions/loader/bin/oak_functions_loader.

Why this is happening

Further investigation is needed, but I'm fairly certain it boils down to the fact that this CLI runs docker with different ENV variables and config than ./scripts/docker_run.

In my case, ./scripts/docker_run uses the following docker run command

docker run --rm --tty --env=TERM=xterm-256color --env=BAZEL_REMOTE_CACHE_ENABLED --env=BAZEL_GOOGLE_CREDENTIALS --env=HOST_UID --env=HOST_GID --volume=/usr/local/google/home/julsh/oak-tmp/bazel-cache:/home/docker/.cache/bazel --volume=/usr/local/google/home/julsh/oak-tmp/cargo-cache:/home/docker/.cargo --volume=/usr/local/google/home/julsh/oak-tmp/sccache-cache:/home/docker/.cache/sccache --volume=/usr/local/google/home/julsh/oak-tmp:/workspace --workdir=/workspace --network=host --volume=/var/run/docker.sock:/var/run/docker.sock --group-add=995 --interactive sha256:3ff688de7071c92700eb08462d846f92847f2a8ef33cb20638e52ae4133473c4 ./scripts/fix_docker_user_and_run './scripts/runner build-functions-server'

Whereas this CLI runs

/usr/bin/docker run --volume=/tmp/release/oak:/workspace --workdir=/workspace --rm --tty gcr.io/oak-ci/oak@sha256:53ca44b5889e2265c3ae9e542d7097b7de12ea4c6a33785da8478c7333b9a320 ./scripts/runner build-functions-server

Possible fixes

  • More fully capture the docker env used?
  • Use the CLI for Oak builds in general, or use it when generating provenances? Is it acceptable for provenances to capture a different build than the canonical docker build config?

Integrate trusted builder wrappers with endorsement generation policy

For example, make an executable program that can be given a few command line options to wrap the relevant trusted builders, combine their output with a supplied endorsement generation policy, run that policy on the outputs of the trusted builders, and either throw an error or produce an endorsement file accordingly.

Refactor `slsa` package

@jul-sh Following your suggestion in #21 (review), I've added this test. I am not sure if this is the right place for the test though.

I think we could rename the slsa package to amber, and have separate go files in it for provenance and endorsement, and their corresponding tests. WDYT?

Originally posted by @rbehjati in #22 (comment)

Generate test coverage reports

A test coverage report can be generated using go test -cover (see https://tip.golang.org/doc/go1.2#cover). However, we cannot use it directly on the top-level repo, since we cannot run the authorization logic tests without bazel.

A test coverage report can also be generated with bazel (https://docs.bazel.build/versions/main/coverage.html), but I was not able to generate HTML reports by running genhtml on the output of the bazel coverage --combined_report=lcov [target] command.

Improve logging

Currently, we log errors from running the command (exec.Command) to a tempfile.
These files are difficult to access on GitHub in case a CI step fails.

Should we instead log everything to console?

The problem with that is that the logs may get very messy and difficult to use.

Refactor Packages for Auth Logic for Transparent Release

Refactor the auth logic for transparent release into the following separate packages:

  • authLogicWrapper -- includes just wrapper_interface.go and is used by all the projects implementing wrappers
  • binaryTransparencyWrappers -- includes the specific wrappers used for this binary transparency verification step (such as unix_epoch_time_wrapper.go). These implement EmitStatement but not Identify
  • binaryTransparencyVerification -- includes binary_transparency_verification_top_level.go, which imports the above wrappers, implements Identify() for each of them, and calls them all to produce the auth logic used to check if the verification step is satisfied.

The idea is that when this is done, the code in the ..._wrapper.go files will be reusable by different consumers (not just the verification process, for example) with different principals attached to the statements they emit. The code for the top level (which isn't implemented yet) will include all of the Identify() functions along with calls to all the wrappers.

Restore Working Directory in FetchSourcesFromRepo and Verify

cc: @rbehjati

The current implementations of FetchSourcesFromRepo in common/common.go and Verify in verify/verify.go change the working directory but do not restore it to whatever it was before they ran. This could cause issues for callers that expect the working directory not to change. These functions would be more predictable if they either avoided changing the working directory altogether, or saved the working directory at the beginning of the call, changed directory as needed, and then changed back to the saved directory before returning.
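
A sketch of the save-and-restore pattern as a hypothetical helper; FetchSourcesFromRepo and Verify could wrap their bodies in something like this.

package common

import "os"

// withWorkingDir runs f with the working directory set to dir, then restores
// the previous working directory regardless of how f exits.
func withWorkingDir(dir string, f func() error) error {
	prev, err := os.Getwd()
	if err != nil {
		return err
	}
	if err := os.Chdir(dir); err != nil {
		return err
	}
	// Best-effort restore; a real implementation may want to surface this error.
	defer os.Chdir(prev)
	return f()
}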

Explicit start and end validity fields in validityPeriod

"validityPeriod": {
"type": "object",
"required": ["releaseTime"],
"properties": {
"releaseTime": {
"type": "string",
"description": "Timestamp in RFC 3339 format"
},
"expiryTime": {
"type": "string",
"description": "Timestamp in RFC 3339 format"
}
}
}

The releaseTime field seems specific to releases, but the validityPeriod object may well be used for arbitrary time-bounded trusted files, so I think the name is too specific. Also, the time of a release may not correspond to the time from which it should be trusted; e.g., a release may be made today but only be trusted from some future time (e.g., to leave time for reviewers to inspect it).

Additionally, I think we should always require the expiry time; the author of this object may use a time far in the future if necessary.

Finally, I think it may be useful to use terminology from existing systems, like x509 certs, in which such fields are called NotBefore / NotAfter (see https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.x509certificates.x509certificate2.notbefore?view=net-6.0)
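
A hypothetical Go shape of the renamed object, borrowing the x509-style names (the package name and struct are illustrative, not the repo's actual types):

package amber

import "time"

// Validity bounds when the endorsed artifact should be trusted.
type Validity struct {
	NotBefore time.Time `json:"notBefore"` // RFC 3339 timestamp
	NotAfter  time.Time `json:"notAfter"`  // RFC 3339 timestamp; always required under this proposal
}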

Add Integration Test

@rbehjati pointed out in #9 (comment) that it would be good to have more tests.

We currently have a fair number of unit tests and snapshot tests for individual pieces of code, such as sha256 hashing or the parsing of a given config file.

I think it would be good to add an integration test that tests the functionality of our tooling end to end: parsing a provenance file, running a build, and verifying the hash. This would be helpful as it tests the functionality of the code, as opposed to its specific implementation.

An advantage of integration tests is that they test a lot of code at once. This is a good fit here, as the codebase is relatively small and hence straightforward to debug.
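
A sketch of what such a test could look like. LoadBuildConfig, Build, VerifyBinaryHash, and the ExpectedBinarySha256Hash field are all hypothetical names, not helpers from the codebase.

package verify_test

import "testing"

func TestParseBuildVerify(t *testing.T) {
	cfg, err := LoadBuildConfig("../testdata/build.toml") // hypothetical helper
	if err != nil {
		t.Fatalf("parsing build config: %v", err)
	}
	binaryPath, err := Build(cfg) // hypothetical helper running the docker build
	if err != nil {
		t.Fatalf("running build: %v", err)
	}
	// Hypothetical helper comparing the binary's sha256 to the expected hash.
	if err := VerifyBinaryHash(binaryPath, cfg.ExpectedBinarySha256Hash); err != nil {
		t.Errorf("verifying binary hash: %v", err)
	}
}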

Comparison to the in-toto CLI

@rbehjati suggested tracking this as an issue, to be able to discuss it more here. :)

Basically, in-toto has its own CLI that is distinct from (but adjacent to) the in-toto attestation format: https://github.com/in-toto/in-toto. It looks very similar to our CLI, stating that it is used to "Create layout, run supply chain steps and verify final product".

Command line tool for signing provenances

We need a command line tool (in cmd/sign) for signing provenances.

The command line tool takes as input the path to a file containing the provenance, and signs it.
To be able to sign, the tool must be able to verify the identity of the signer and provision signing keys. For this we plan to use Fulcio. Additional input should be provided to the tool to identify the signer. How can we do this?

For now, we only want to sign SLSA provenances, so the tool should first check that the input file is in fact a valid SLSA provenance statement (by parsing the file content as JSON).
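
A minimal sketch of that check, parsing just enough of the in-toto statement envelope to confirm the predicate type (the URI is the published SLSA v0.2 predicate type; the package and function names are hypothetical):

package sign

import (
	"encoding/json"
	"fmt"
)

// statementHeader models just the envelope fields we need to inspect.
type statementHeader struct {
	Type          string `json:"_type"`
	PredicateType string `json:"predicateType"`
}

// validateProvenance rejects inputs that don't parse as a SLSA v0.2 provenance.
func validateProvenance(data []byte) error {
	var s statementHeader
	if err := json.Unmarshal(data, &s); err != nil {
		return fmt.Errorf("provenance is not valid JSON: %v", err)
	}
	if s.PredicateType != "https://slsa.dev/provenance/v0.2" {
		return fmt.Errorf("unexpected predicateType: %q", s.PredicateType)
	}
	return nil
}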

Fix quirks with directory and file paths in experimental/auth-logic

The way working directories work in experimental/auth-logic is quirky at the moment.

Most file paths are resolved under the assumption that the working directory is the transparent-release directory. This is partly done because it gives us a way to locate all the necessary files and built binaries within bazel.

The awkward part is that the wrapper tests only work because provenance_wrapper_test.go changes the working directory to transparent-release; the other wrapper tests are assumed to run after that one, and will fail if that is not the case.

This is a bit brittle. However, this setup partly exists to work within bazel and is partly a product of the current directory structure; we are moving away from bazel anyway (#67), and we also need to restructure the modules here eventually (#53).

Allow specifying docker run flags in the input TOML file

When building binaries, we currently pass some default flags to docker run:

defaultDockerRunFlags := []string{
	// TODO(razieh): Check that b.DockerRunFlags does not set similar flags.
	// Mount the current working directory to workspace.
	fmt.Sprintf("--volume=%s:/workspace", cwd),
	"--workdir=/workspace",
	// Remove the container file system after the container exits.
	"--rm",
	// Get a pseudo-tty to the docker container.
	// TODO(razieh): We probably don't need it for presubmit.
	"--tty",
}

For more flexibility, and to ensure the builds are reproducible, we should instead allow users to specify the flags that should be passed to docker run. The flags could either be specified as a comma-separated list directly in the build-config TOML file, or in a separate file whose path is included in the build-config TOML file. Either way, we probably need to be able to parse these flags and validate them.
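
A sketch of the first option, assuming the BurntSushi/toml parser and a hypothetical docker_run_flags key in the build-config TOML file:

package main

import (
	"fmt"

	"github.com/BurntSushi/toml"
)

// buildConfig holds the hypothetical docker_run_flags key.
type buildConfig struct {
	DockerRunFlags []string `toml:"docker_run_flags"`
}

func main() {
	input := `docker_run_flags = ["--network=host", "--rm"]`
	var cfg buildConfig
	if _, err := toml.Decode(input, &cfg); err != nil {
		panic(err)
	}
	// A real implementation would validate each flag against an allowlist here.
	fmt.Println(cfg.DockerRunFlags)
}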

Is there a set of flags that we should ban for security?

See also #25.
