
transparent-release's Issues

What is the right SLSA Schema Format?

We presently use a JSON schema to specify the shape of the expected SLSA buildType. I went with JSON schema initially just to have some clearly defined format; it doesn't have to be JSON schema specifically, and there are a few pros & cons to it:

Pros:

  • Machine readable, hence clearly defined
  • Existing tools for both validating schemas and for generating example data from them
  • Open standard

Cons:

  • They're actually quite hard for humans to read
  • We can't use them to parse JSON files, only to validate them. Hence we require Go structs that essentially duplicate the schema (in less detail); see PR #16 for details, and the sketch below.
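
To illustrate the duplication in the second con: a minimal sketch (with hypothetical field names) of a Go struct mirroring part of a buildType schema, since encoding/json needs a struct to parse into while the schema itself can only validate.

package slsa

import "encoding/json"

// BuildConfig is a hypothetical struct duplicating part of the JSON schema:
// the schema can validate a buildType config, but parsing it still needs a
// Go type like this one.
type BuildConfig struct {
	Command        []string `json:"command"`
	OutputPath     string   `json:"outputPath"`
	ExpectedSha256 string   `json:"expectedSha256Hash"`
}

func parseBuildConfig(data []byte) (*BuildConfig, error) {
	var c BuildConfig
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}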

Thoughts? Ideas? :) cc @rbehjati @tiziano88

[This issue is related to #15, probably a superset of it].

Remove bazel and convert the repo to a standard go project

We now have go.mod and go.sum, and can use most go commands, e.g., for building. Can we now remove all bazel-related files and convert this repo to a standard Go project?

Currently, when running go test ./..., some auth-logic tests fail. I think this is because of the genrules that fetch and build the dependencies. Is it possible to replace those bazel targets with go commands? (go generate ./... might be relevant; a sketch is below.)
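
For instance, a genrule that fetches and builds a dependency could become //go:generate directives, which go generate ./... would run before the tests. This is only a hypothetical sketch; the repository URL and build command here are placeholders, not the actual dependency.

package authlogic

// Hypothetical replacement for a bazel genrule: `go generate ./...` runs
// these commands to fetch and build the auth logic dependency.
//go:generate git clone --depth=1 https://github.com/example/auth-logic-compiler deps/auth-logic-compiler
//go:generate make -C deps/auth-logic-compiler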

WDYT? @aferr @tiziano88

Refactor Auth Logic Wrappers to Mostly Use Bytes

There are several functions, like GetAppNameFromEndorsement, that take file paths from which bytes are parsed. The files end up being read more than once, so it would be better to refactor as many of these calls as possible to operate on bytes, and to parse the files into bytes just once.
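
A minimal sketch of the refactoring, with hypothetical signatures (the endorsement is assumed to be a JSON statement with a subject list): the path-based function becomes a thin shim, and the real logic operates on bytes so callers can share a single read.

package wrappers

import (
	"encoding/json"
	"fmt"
	"os"
)

// GetAppNameFromEndorsement reads the file once and delegates to the
// bytes-based function.
func GetAppNameFromEndorsement(endorsementFilePath string) (string, error) {
	endorsementBytes, err := os.ReadFile(endorsementFilePath)
	if err != nil {
		return "", err
	}
	return GetAppNameFromEndorsementBytes(endorsementBytes)
}

// GetAppNameFromEndorsementBytes does the actual parsing, so other callers
// can reuse bytes they have already read.
func GetAppNameFromEndorsementBytes(endorsementBytes []byte) (string, error) {
	var endorsement struct {
		Subject []struct {
			Name string `json:"name"`
		} `json:"subject"`
	}
	if err := json.Unmarshal(endorsementBytes, &endorsement); err != nil {
		return "", err
	}
	if len(endorsement.Subject) == 0 {
		return "", fmt.Errorf("endorsement has no subject")
	}
	return endorsement.Subject[0].Name, nil
}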

Refactor to Enforce Principal Name Sanitization

Goal: make sure all principal names are sanitized using the SanitizeName function.

Proposed solution:

  • Make the Principal struct unexported so that its initializer cannot be called outside a wrappers_interface package which just includes wrappers.go. Move the implementations of the specific wrappers into a new package (like transparentReleaseWrappers).
  • Make a factory method for constructing principals that runs sanitization like the following:
func ConstructPrincipal(contents string) principal {
	return principal{Contents: sanitizeName(contents)}
}

This way, the only way external packages can make a principal is by calling the constructor.

If making the principal struct unexported does not work, another way to enforce this could be to introduce a SanitizedName type produced by SanitizeName(), and make the Principal initializer take one of these.
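
A sketch of that alternative: because SanitizedName's field is unexported, the only way to obtain one is through SanitizeName, so the type system enforces the invariant. The hyphen-stripping sanitizer here is only a stand-in for the real function.

package wrappers

import "strings"

// SanitizedName can only be constructed by SanitizeName, because its field
// is unexported.
type SanitizedName struct {
	value string
}

// SanitizeName is a stand-in for the real sanitizer, shown here as stripping
// hyphens for illustration only.
func SanitizeName(raw string) SanitizedName {
	return SanitizedName{value: strings.ReplaceAll(raw, "-", "")}
}

// Principal now cannot be built from an unsanitized string.
type Principal struct {
	Contents SanitizedName
}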

Extend provenance file wrapper to read the builder ID

The authorization logic policies will depend on which trusted builder produced the provenance file. The builder ID shows up under the builder predicate here. The other tooling around transparent release does not support this field yet, so this issue depends on adding support for reading this builder field from provenance files.
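
A minimal sketch of reading the field, following the SLSA v0.2 layout where the builder ID lives at predicate.builder.id (the function name is hypothetical):

package wrappers

import "encoding/json"

// provenance models just the path to the builder ID in a SLSA v0.2
// statement: predicate.builder.id.
type provenance struct {
	Predicate struct {
		Builder struct {
			ID string `json:"id"`
		} `json:"builder"`
	} `json:"predicate"`
}

// GetBuilderID extracts the builder ID from raw provenance bytes.
func GetBuilderID(provenanceBytes []byte) (string, error) {
	var p provenance
	if err := json.Unmarshal(provenanceBytes, &p); err != nil {
		return "", err
	}
	return p.Predicate.Builder.ID, nil
}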

(information provided by @rbehjati )

Change Wrapper Interface to Return an Error

Right now the interface for wrappers returns just an UnattributedStatement, but because wrappers can fail, it would be more idiomatic to return an (UnattributedStatement, error) pair.

The same is true for the IdentifiableWrappers.
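
A hedged sketch of what the revised interfaces might look like (UnattributedStatement and Principal are the existing types, stubbed here so the sketch is self-contained; note Go spells the built-in type as error, with the error conventionally the second return value):

package wrappers

// Existing types, stubbed for illustration.
type UnattributedStatement struct{ Contents string }
type Principal struct{ Contents string }

// Wrapper now reports failures instead of only returning a statement.
type Wrapper interface {
	EmitStatement() (UnattributedStatement, error)
}

// IdentifiableWrapper gets the same treatment.
type IdentifiableWrapper interface {
	Wrapper
	Identify() (Principal, error)
}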

Authorization Logic for Endorsement File Release

At present, we have an authorization logic tool that runs the client-side verification process: it produces an authorization logic policy that collects evidence for verifying whether a previously released binary "is safe" to be consumed by the client.

In addition to this, we would like tooling that works similarly for parties releasing a software binary. This tooling will be used by the product team releasing a binary instead of the client. It gathers the evidence needed to generate an endorsement file, includes a policy for deciding if the evidence gathered is sufficient, and uses this policy to decide if the release is acceptable. If the tool succeeds, an endorsement file is generated. If it fails, it throws an error.

This is a meta-issue for tracking the work to build the release tool to be used by the product team. Here are the sub-tasks:

Use sha256 instead of sha1 to refer to source code

@waywardgeek pointed out that sha1 should not be used for cryptographic purposes, so we should switch to using sha256 in endorsements of the source code and also in provenance generation. I don't know if it's possible to get a sha256 commit ID from git, but failing that, we can take the repository files, tar them, and sha256 that.
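
As a sketch of the fallback: hash a deterministic tar stream of the sources with SHA-256, e.g. by piping the output of git archive --format=tar HEAD into a small program like this.

package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

// hashStream computes the SHA-256 digest of everything read from r.
func hashStream(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}

func main() {
	// Pipe a deterministic archive in, e.g.:
	//   git archive --format=tar HEAD | go run hashsources.go
	digest, err := hashStream(os.Stdin)
	if err != nil {
		panic(err)
	}
	fmt.Println(digest)
}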

Check signedEntryTimestamp using Rekor's public key in Rekor Log Wrapper

The comments here suggest that one of the steps for verifying a Rekor log entry is to verify the signature in signedEntryTimestamp (which is part of the struct for a Rekor log entry). So presumably the Rekor log wrapper should also do this.

@rbehjati Did you happen to already look into what type of signature / format is used for signedEntryTimestamp? I couldn't find any of Rekor's libraries that use this part of the struct. I can try to figure this out, but if you have already looked into it, any info that could speed this up would help.

Write an authorization logic policy for releasing endorsements

This policy should gather the evidence needed to decide if an endorsement file can be released. This evidence includes output from one or more trusted builders, such as GitHub Actions and Google Cloud Build.

Here are some "product requirements":

Different projects might have different needs w.r.t. the trusted builders. For example:

  • one team might only trust GitHub Actions, but not any other builder
  • another team might be OK with getting output from either builder
  • another team might insist on getting the same output from both builders
  • another team might want the same output from k out of n trusted builders
  • other teams might have different needs entirely

Most teams will not want to write authorization logic code at all: it's based on logic programming, and this is not a common skill set. We should minimize how much they have to interact with it.

Proposed solution:

To meet the needs of different product teams and to reduce interaction with authorization logic, we will write a few "Policy Principals" that each represent a different policy for deciding if the builder output is acceptable. For example, one policy principal will specify "just GitHub Actions is trusted", another will specify that "getting the expected output from either builder is fine", and so on.

These policy principals are sort of like libraries -- they remove the need for product teams to specify these policies themselves. The policy for releasing an endorsement file will include a delegation (using the authorization logic features for delegation) to one of these policy principals. Most of the time, product teams will just change this delegation, requiring only a small 1-2 LOC change to the endorsement generation policy. Of course, teams that have more specific needs (and are willing to do so) can choose not to delegate to one of these policy principals and instead write the entire policy themselves -- we expect that this will not be a common case, though.

Some implementation details:

As is the case with the other wrappers, the policy principals and the policy for releasing an endorsement file will be generated from a Go template so that variables can be filled in.
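
A minimal sketch of this generation step using text/template. The template text, principal names, and fact names here are all hypothetical; the repo's real templates will differ.

package main

import (
	"os"
	"text/template"
)

// Hypothetical policy-principal template.
const policyTmpl = `"{{.PolicyPrincipal}}" says {
    "{{.Builder}}" canSay some_binary has_expected_hash.
}
`

func main() {
	t := template.Must(template.New("policy").Parse(policyTmpl))
	data := struct{ PolicyPrincipal, Builder string }{
		PolicyPrincipal: "GitHubActionsOnlyPolicy", // hypothetical principal
		Builder:         "GitHubActionsBuilder",    // hypothetical builder name
	}
	if err := t.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}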

Add trusted builder key to Endorsement Release Authorization Logic Policy

From what @rbehjati tells me, the outputs of trusted builders like GitHub Actions and Google Cloud Build will be signed using a unique keypair per trusted builder. This signing does not exist yet (even independently of authorization logic), but we should add the relevant keys to the authorization logic policy once the signing is set up.

I think I heard that @mariaschett is working on adding key management for this (?).

Do either of you have more relevant details for this?

Can We Take Hyphens Out of Subject Names of Endorsement Files?

The current format for endorsement files uses a hyphen in the application name:
https://github.com/project-oak/transparent-release/blob/main/schema/amber-endorsement/v1/example.json

has

oak_functions_loader-0f2189703c57845e09d8ab89164a4041c0af0a62

I'm using the app name to define several principal names in the generated auth logic. The authorization logic syntax for principal names and predicate arguments doesn't include `-`, and I don't want to add it because it would probably conflict with numeric minus, which we might want (and the grammar is also fully whitespace-agnostic). Could you instead use a different character, like:

oak_functions_loader_0f2189703c57845e09d8ab89164a4041c0af0a62

or

oak_functions_loader:0f2189703c57845e09d8ab89164a4041c0af0a62

or

oak_functions_loader.0f2189703c57845e09d8ab89164a4041c0af0a62

or

oak_functions_loader0f2189703c57845e09d8ab89164a4041c0af0a62

Possibly simplify `rekor_verifier_policy.auth.tmpl`

At the moment, the auth logic code generated by this template includes base facts and rules that depend on just those base facts, all attributed to the same principal. We should decide whether:

  • splitting up these facts is really useful for making the steps involved in verification clearer, or
  • this is just adding unnecessary complexity

Context: link

Sanitized Names Can Collide

Problem: The function SanitizeName can cause name collisions which may not be detected.

Example: If there are two sources FooBar and Foo-Bar, SanitizeName will map both to FooBar when they should have remained distinct.

Potential solution:

Define: s is a sanitizable name (SN) when s != SanitizeName(s), and call SanitizeName(s) its sanitization.
Maintain a map sanMap from sanitizations to the sanitizable names that generated them.

Then, whenever a sanitizable name s is encountered (see the sketch after this list):

  • if sanMap.get(SanitizeName(s)) is empty, add the entry sanMap[SanitizeName(s)] = s
  • else: check sanMap.get(SanitizeName(s)) == s
    • if equal: do nothing
    • if not equal: there is an erroneous name collision; throw an error.
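
A Go sketch of this check. The hyphen-stripping SanitizeName is only a stand-in for the real sanitizer.

package wrappers

import (
	"fmt"
	"strings"
)

// SanitizeName is a stand-in sanitizer for illustration.
func SanitizeName(s string) string {
	return strings.ReplaceAll(s, "-", "")
}

// sanMap maps each sanitization to the sanitizable name that produced it.
var sanMap = map[string]string{}

// checkSanitizable records a sanitizable name and errors on collisions.
func checkSanitizable(s string) error {
	san := SanitizeName(s)
	if san == s {
		return nil // not a sanitizable name; nothing to record
	}
	if prev, ok := sanMap[san]; ok && prev != s {
		return fmt.Errorf("name collision: %q and %q both sanitize to %q", prev, s, san)
	}
	sanMap[san] = s
	return nil
}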

Generate endorsement file if authorization logic query passes

We need a way to generate an endorsement file with the right contents if an authorization logic policy query is true (or to produce an error if it is not). We will use this to check whether a predicate in the policy for releasing an endorsement file can be proved (is true), and to generate the endorsement file if so.

Add builder, invocation and metadata to generated SLSA provenances

The SLSA provenance v0.2 format has a few optional fields, in particular invocation and metadata, that we currently don't fill out in the provenances we generate. To be able to fill these fields out when generating provenances using cmd/build, the input .toml should be adjusted to provide invocation-related information.

In particular, in the case of this GitHub workflow, the invocation could be filled out as follows:

"invocation.configSource.uri": "https://github.com/project-oak/oak/blob/f8c4c96375ca1d87a74238fc4450a81703d1d147/scripts/generate_provenance"
"invocation.configSource.digest": "<cryptographic digests of the file above>"
"invocation.configSource.entryPoint": ""
"invocation.parameters": "GitHub" 
"invocation.environment": "<info about the GitHub Actions runner>"

More generally, invocation.configSource.entryPoint, invocation.parameters and invocation.environment are optional and can be omitted.

The metadata should be set as follows:

  • metadata.buildStartedOn = The timestamp of when the build started, filled out by cmd/build.
  • metadata.buildFinishedOn = The timestamp of when the build completed, filled out by cmd/build.
  • metadata.completeness and metadata.reproducible: For now we don't fill these out.

The builder.id is another field that should be added as a command line flag, instead of being included in the toml file.
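
A minimal sketch of cmd/build recording the build window itself, with a hypothetical local metadata type mirroring the SLSA v0.2 field names (not the repo's actual provenance type):

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// metadata mirrors the two SLSA v0.2 metadata fields we want to fill out.
type metadata struct {
	BuildStartedOn  time.Time `json:"buildStartedOn"`
	BuildFinishedOn time.Time `json:"buildFinishedOn"`
}

func main() {
	started := time.Now().UTC()
	// ... cmd/build would run the docker build here ...
	m := metadata{BuildStartedOn: started, BuildFinishedOn: time.Now().UTC()}
	out, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}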

Transparent Release CLI producing different builds than manual Oak builds

Expected Behavior

Running the transparent release CLI on the testdata in https://github.com/project-oak/transparent-release/blob/main/testdata/build.toml yields the same binary hash as building the binary using Oak's ./scripts/docker_run command with the same arguments.

Actual Behavior

Running the testdata yields the binary hash 15dc16c42a4ac9ed77f337a4a3065a63e444c29c18c8cf69d6a6b4ae678dca5c, as expected in the testdata.

Running the same build using ./scripts/docker_run yields a binary with the hash of e4c5930c6c0e81f8d743da1597905df0fa66b0daf69c6bf92a5cbc561c79b127.

Steps to Reproduce the Problem

For the testdata: see the expected hash in the testdata, or run the testdata as outlined in this repo's README.

For running the same build with Oak's ./scripts/docker_run:

  1. Clone a fresh copy of the repo
  2. Checkout the commit from testdata
  3. Run ./scripts/docker_run with the arguments provided in testdata's command key.

For convenience, that's this command in a fresh clone: git checkout 0f2189703c57845e09d8ab89164a4041c0af0a62 && ./scripts/docker_run ./scripts/runner build-functions-server && sha256sum ./oak_functions/loader/bin/oak_functions_loader.

Why this is happening

Further investigation is needed, but I'm fairly certain it boils down to the fact that this CLI runs docker with different ENV variables and config than ./scripts/docker_run.

In my case, ./scripts/docker_run uses the following docker run command

docker run --rm --tty --env=TERM=xterm-256color --env=BAZEL_REMOTE_CACHE_ENABLED --env=BAZEL_GOOGLE_CREDENTIALS --env=HOST_UID --env=HOST_GID --volume=/usr/local/google/home/julsh/oak-tmp/bazel-cache:/home/docker/.cache/bazel --volume=/usr/local/google/home/julsh/oak-tmp/cargo-cache:/home/docker/.cargo --volume=/usr/local/google/home/julsh/oak-tmp/sccache-cache:/home/docker/.cache/sccache --volume=/usr/local/google/home/julsh/oak-tmp:/workspace --workdir=/workspace --network=host --volume=/var/run/docker.sock:/var/run/docker.sock --group-add=995 --interactive sha256:3ff688de7071c92700eb08462d846f92847f2a8ef33cb20638e52ae4133473c4 ./scripts/fix_docker_user_and_run './scripts/runner build-functions-server'

Whereas this CLI runs

/usr/bin/docker run --volume=/tmp/release/oak:/workspace --workdir=/workspace --rm --tty gcr.io/oak-ci/oak@sha256:53ca44b5889e2265c3ae9e542d7097b7de12ea4c6a33785da8478c7333b9a320 ./scripts/runner build-functions-server

Possible fixes

  • More fully capture the docker env used?
  • Use the CLI for Oak builds in general, or use it when generating provenances? Is it acceptable for provenances to capture a different build than the canonical docker build config?

Integrate trusted builder wrappers with endorsement generation policy

For example, make an executable program that can be given a few command line options to wrap the relevant trusted builders, combine their output with a supplied endorsement generation policy, run that policy on the outputs of the trusted builders, and either throw an error or produce an endorsement file accordingly.

Refactor `slsa` package

@jul-sh Following your suggestion in #21 (review), I've added this test. I am not sure if this is the right place for the test though.

I think we could rename the slsa package to amber, and have separate go files in it for provenance and endorsement, and their corresponding tests. WDYT?

Originally posted by @rbehjati in #22 (comment)

Generate test coverage reports

A test coverage report can be generated using go test -cover (see https://tip.golang.org/doc/go1.2#cover). However, we cannot use it directly on the top-level repo, since we cannot run the authorization logic tests without bazel.

A test coverage report can also be generated with bazel (https://docs.bazel.build/versions/main/coverage.html), but I was not able to generate HTML reports by running genhtml on the output of the bazel coverage --combined_report=lcov [target] command.

Improve logging

Currently, we log errors from running the command (exec.Command) to a tempfile.
These files are difficult to access on GitHub in case a CI step fails.

Should we instead log everything to console?

The problem with that is that the logs may get very messy and difficult to use.

Refactor Packages for Auth Logic for Transparent Release

Refactor the auth logic for transparent release into the following separate packages:

  • authLogicWrapper -- includes just wrapper_interface.go and is used by all the projects implementing wrappers
  • binaryTransparencyWrappers -- includes the specific wrappers used for this binary transparency verification step (such as unix_epoch_time_wrapper.go). These implement EmitStatement but not Identify
  • binaryTransparencyVerification -- includes binary_transparency_verification_top_level.go, which imports the above wrappers, implements Identify() for each of them, and calls them all to produce the auth logic used to check if the verification step is satisfied.

The idea is that when this is done, the code in the ..._wrapper.go files will be reusable by different consumers (not just the verification process, for example) with different principals attached to the statements they emit. The code for the top level (which isn't implemented yet) will include all of the Identify() functions along with calls to all the wrappers.

Restore Working Directory in FetchSourcesFromRepo and Verify

cc: @rbehjati

The current implementations of FetchSourcesFromRepo in common/common.go and Verify in verify/verify.go change the working directory but do not restore it to whatever it was before they ran. This could cause issues for callers that expect the working directory not to change. These functions would be more predictable if they either avoided changing the working directory altogether, or saved the working directory at the beginning of the call, changed directory as needed, and then changed back to the saved directory before returning.
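
A sketch of the save-and-restore pattern as a hypothetical helper; FetchSourcesFromRepo and Verify could wrap their bodies in something like this.

package common

import "os"

// withWorkingDir runs f with the working directory set to dir, then restores
// the previous working directory regardless of how f exits.
func withWorkingDir(dir string, f func() error) error {
	prev, err := os.Getwd()
	if err != nil {
		return err
	}
	if err := os.Chdir(dir); err != nil {
		return err
	}
	// Best-effort restore; a real implementation may want to surface this error.
	defer os.Chdir(prev)
	return f()
}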

Explicit start and end validity fields in validityPeriod

"validityPeriod": {
"type": "object",
"required": ["releaseTime"],
"properties": {
"releaseTime": {
"type": "string",
"description": "Timestamp in RFC 3339 format"
},
"expiryTime": {
"type": "string",
"description": "Timestamp in RFC 3339 format"
}
}
}

The releaseTime field seems specific to releases, but the validityPeriod object may well be used for arbitrary time-bounded trusted files, so I think the name is too specific. Also, the time of a release may not correspond to the time from which it should be trusted; e.g., a release may be made today but only be trusted from some future time (e.g., to leave time for reviewers to inspect it).

Additionally, I think we should always require the expiry time; the author of this object may use a time far in the future if necessary.

Finally, I think it may be useful to use terminology from existing systems, like x509 certs, in which such fields are called NotBefore / NotAfter (see https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.x509certificates.x509certificate2.notbefore?view=net-6.0)
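
A hypothetical Go shape of the renamed object, borrowing the x509-style names (the package name and struct are illustrative, not the repo's actual types):

package amber

import "time"

// Validity bounds when the endorsed artifact should be trusted.
type Validity struct {
	NotBefore time.Time `json:"notBefore"` // RFC 3339 timestamp
	NotAfter  time.Time `json:"notAfter"`  // RFC 3339 timestamp; always required under this proposal
}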

Add Integration Test

@rbehjati pointed out in #9 (comment) that it would be good to have more tests.

We currently have a fair number of unit tests and snapshot tests for individual pieces of code, such as sha256 hashing or the parsing of a given config file.

I think it would be good to add an integration test that tests the functionality of our tooling end to end: parsing a provenance file, running a build, and verifying the hash. This would be helpful as it tests the functionality of the code, as opposed to its specific implementation.

An advantage of integration tests is that they test a lot of code at once. This is a good fit here, as the codebase is relatively small and hence straightforward to debug.
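
A sketch of what such a test could look like. LoadBuildConfig, Build, VerifyBinaryHash, and the ExpectedBinarySha256Hash field are all hypothetical names, not helpers from the codebase.

package verify_test

import "testing"

func TestParseBuildVerify(t *testing.T) {
	cfg, err := LoadBuildConfig("../testdata/build.toml") // hypothetical helper
	if err != nil {
		t.Fatalf("parsing build config: %v", err)
	}
	binaryPath, err := Build(cfg) // hypothetical helper running the docker build
	if err != nil {
		t.Fatalf("running build: %v", err)
	}
	// Hypothetical helper comparing the binary's sha256 to the expected hash.
	if err := VerifyBinaryHash(binaryPath, cfg.ExpectedBinarySha256Hash); err != nil {
		t.Errorf("verifying binary hash: %v", err)
	}
}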

Comparison to the in-toto CLI

@rbehjati suggested tracking this as an issue, to be able to discuss it more here. :)

Basically, in-toto has its own CLI that is distinct from (but adjacent to) the in-toto attestation format: https://github.com/in-toto/in-toto. It looks very similar to our CLI, stating that it is used to "Create layout, run supply chain steps and verify final product".

Command line tool for signing provenances

We need a command line tool (in cmd/sign) for signing provenances.

The command line tool takes as input the path to a file containing the provenance, and signs it.
To be able to sign, the tool must be able to verify the identity of the signer and provision signing keys. For this we plan to use Fulcio. Additional input should be provided to the tool to identify the signer. How can we do this?

For now, we only want to sign SLSA provenances, so the tool should first check that the input file is in fact a valid SLSA provenance statement (by parsing the file content as JSON).
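
A minimal sketch of that check, parsing just enough of the in-toto statement envelope to confirm the predicate type (the URI is the published SLSA v0.2 predicate type; the package and function names are hypothetical):

package sign

import (
	"encoding/json"
	"fmt"
)

// statementHeader models just the envelope fields we need to inspect.
type statementHeader struct {
	Type          string `json:"_type"`
	PredicateType string `json:"predicateType"`
}

// validateProvenance rejects inputs that don't parse as a SLSA v0.2 provenance.
func validateProvenance(data []byte) error {
	var s statementHeader
	if err := json.Unmarshal(data, &s); err != nil {
		return fmt.Errorf("provenance is not valid JSON: %v", err)
	}
	if s.PredicateType != "https://slsa.dev/provenance/v0.2" {
		return fmt.Errorf("unexpected predicateType: %q", s.PredicateType)
	}
	return nil
}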

Fix quirks with directory and file paths in experimental/auth-logic

The way working directories work in experimental/auth-logic is quirky at the moment.

Most file paths are resolved under the assumption that the working directory is the transparent-release directory. This is partly done because it gives us a way to locate all the necessary files and built binaries within bazel.

The awkward part is that the wrapper tests only work because provenance_wrapper_test.go changes the working directory to transparent-release; the other wrapper tests are assumed to run after that one, and will fail if that is not the case.

This is a bit brittle. However, this setup partly exists to work within bazel and is partly a product of the current directory structure; we are moving away from bazel anyway (#67), and we also need to restructure the modules here eventually (#53).

Allow specifying docker run flags in the input TOML file

When building binaries, we currently pass some default flags to docker run:

defaultDockerRunFlags := []string{
	// TODO(razieh): Check that b.DockerRunFlags does not set similar flags.
	// Mount the current working directory to workspace.
	fmt.Sprintf("--volume=%s:/workspace", cwd),
	"--workdir=/workspace",
	// Remove the container file system after the container exits.
	"--rm",
	// Get a pseudo-tty to the docker container.
	// TODO(razieh): We probably don't need it for presubmit.
	"--tty",
}

For more flexibility, and to ensure the builds are reproducible, we should instead allow users to specify the flags that should be passed to docker run. The flags could either be specified as a comma-separated list directly in the build-config TOML file, or in a separate file whose path is included in the build-config TOML file. Either way, we probably need to be able to parse these flags and validate them.
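
A sketch of the first option, assuming the BurntSushi/toml parser and a hypothetical docker_run_flags key in the build-config TOML file:

package main

import (
	"fmt"

	"github.com/BurntSushi/toml"
)

// buildConfig holds the hypothetical docker_run_flags key.
type buildConfig struct {
	DockerRunFlags []string `toml:"docker_run_flags"`
}

func main() {
	input := `docker_run_flags = ["--network=host", "--rm"]`
	var cfg buildConfig
	if _, err := toml.Decode(input, &cfg); err != nil {
		panic(err)
	}
	// A real implementation would validate each flag against an allowlist here.
	fmt.Println(cfg.DockerRunFlags)
}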

Is there a set of flags that we should ban for security?

See also #25.
