project-oak / transparent-release
Making transparency normal!
License: Apache License 2.0
transparent-release/schema/amber-endorsement/v1/predicate.json
Lines 8 to 21 in 4c5e62a
The releaseTime field seems specific to releases, but the validityPeriod object may well be used for arbitrary time-bounded trusted files, so I think the name is too specific. Also, the time of a release may not correspond to the time from which it should be trusted: e.g., a release may be made today but only be trusted in the future (to leave time for reviewers to inspect it).
Additionally, I think we should always require the expiry time; the author of this object may use a time far in the future if necessary.
Finally, I think it may be useful to use terminology from existing systems, like x509 certs, in which such fields are called NotBefore / NotAfter (see https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.x509certificates.x509certificate2.notbefore?view=net-6.0).
This policy should gather evidence needed to decide if an endorsement file can be released. This evidence includes output from one or more trusted builders such as GitHub Actions and Google Cloud Build.
Different projects might have different needs w.r.t. the trusted builders. For example:
Most teams will not want to write authorization logic code at all because it's based on logic programming, and this is not a common skill set. We should minimize how much they have to interact with this.
To meet the needs of different product teams and to reduce interaction with authorization logic, we will write a few "Policy Principals" that each represent a different policy for deciding if the builder output is acceptable. For example, one policy principal will specify "just Github actions is trusted", another will specify that "getting the expected output from either builder is fine", and so on.
These policy principals are sort of like libraries -- they remove the need for product teams to specify these policies themselves. The policy for releasing an endorsement file will include a delegation (using the authorization logic features for delegation) to one of these policy principals. Most of the time, product teams will just change this delegation, requiring only a small 1-2 LOC change to the endorsement generation policy. Of course, teams that have more specific needs and are willing to do so can choose not to delegate to one of these policy principals and write the entire policy themselves -- we expect that this will not be a common case, though.
As is the case with the other wrappers, the policy principals and the policy for releasing an endorsement file will be generated using a go template so that variables can be filled in.
This will be used as the name of the subject when generating the Amber provenance file. See #50.
We need to support additional types of verifiers:
It should be possible to combine verifiers. For instance to build a verifier that both checks metadata and verifies signatures.
For example, make an executable program that can be given a few command line options to wrap the relevant trusted builders, combine this with a supplied endorsement generation policy, run this on the outputs of the trusted builders, and either throw an error or produce an endorsement file accordingly.
We need a command line tool (in cmd/sign) for signing provenances.
The command line tool takes as input the path to a file containing the provenance, and signs it.
To be able to sign, the tool must be able to verify the identity of the signer and provision signing keys. For this we plan to use Fulcio. Additional input should be provided to the tool for identifying the identity of the signer. How can we do this?
For now, we only want to sign SLSA provenances, so the tool should first check that the input file is in fact a valid SLSA provenance statement (by parsing the file content as a json string).
The code needed to parse and verify SLSA files exists, but the functionality is not yet accessible from the command-line.
Regardless of the high-level approach w.r.t. SLSA, TOML, etc., I definitely need that functionality for the plan of using this tool outlined in: project-oak/oak#2519 (comment)
How should we move forward with this? :) cc @rbehjati @tiziano88
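As a sketch of the validity check mentioned above (parsing the file content as JSON and checking the statement and predicate types), assuming the standard in-toto/SLSA type URIs; the function name is hypothetical:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// statementHeader holds just the fields needed to recognize a SLSA
// provenance; the full in-toto Statement has more fields.
type statementHeader struct {
	Type          string `json:"_type"`
	PredicateType string `json:"predicateType"`
}

// checkSLSAProvenance rejects input that is not a valid v0.2 SLSA
// provenance statement. The expected type URIs come from the
// in-toto and SLSA specs.
func checkSLSAProvenance(data []byte) error {
	var h statementHeader
	if err := json.Unmarshal(data, &h); err != nil {
		return fmt.Errorf("not valid JSON: %v", err)
	}
	if h.Type != "https://in-toto.io/Statement/v0.1" {
		return errors.New("not an in-toto statement")
	}
	if h.PredicateType != "https://slsa.dev/provenance/v0.2" {
		return errors.New("not a SLSA v0.2 provenance")
	}
	return nil
}

func main() {
	good := []byte(`{"_type":"https://in-toto.io/Statement/v0.1","predicateType":"https://slsa.dev/provenance/v0.2"}`)
	fmt.Println(checkSLSAProvenance(good)) // <nil>
	fmt.Println(checkSLSAProvenance([]byte(`{}`)))
}
```

A fuller check would also validate the subject and predicate contents, but a header check like this is enough to fail fast on files that are not SLSA provenances at all.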
Currently, we log errors from running the command (exec.Command) in a tempfile.
These files are difficult to access on GitHub when a CI step fails.
Should we instead log everything to console?
The problem with that is that the logs may get very messy and difficult to use.
At the moment the auth logic code generated by this template includes base facts and rules depending on just these base facts, all attributed to the same principal. We should decide if:
Context: link
In the schema, we currently have the time fields as strings; in the Go implementation we could use time.Time for these fields. This may require implementing a custom serializer, but would allow time comparisons directly on this struct.
See the CustomMarshalJSON example in https://pkg.go.dev/encoding/json
Related issue: #30.
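A minimal sketch of the custom-serializer approach, using a wrapper type with MarshalJSON/UnmarshalJSON; the Timestamp type and the releaseTime field name are illustrative, not the repo's actual definitions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Timestamp wraps time.Time so schema fields can stay RFC 3339 strings
// on the wire while allowing time comparisons directly in Go.
type Timestamp struct{ time.Time }

// MarshalJSON serializes the timestamp as an RFC 3339 string.
func (t Timestamp) MarshalJSON() ([]byte, error) {
	return json.Marshal(t.Format(time.RFC3339))
}

// UnmarshalJSON parses an RFC 3339 string into the timestamp.
func (t *Timestamp) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	parsed, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return err
	}
	t.Time = parsed
	return nil
}

func main() {
	var v struct {
		ReleaseTime Timestamp `json:"releaseTime"`
	}
	json.Unmarshal([]byte(`{"releaseTime":"2022-03-01T12:00:00Z"}`), &v)
	fmt.Println(v.ReleaseTime.Year()) // time methods available directly
}
```

Because Timestamp embeds time.Time, methods like Before and After work directly on the struct fields, which is exactly the comparison ability the issue asks for.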
Because auth logic statements are often concatenated, add a library function for doing this, like:
func ConcatenateStatements(s []Statement) string { .. }
cc: @tiziano88
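A minimal sketch of such a library function, assuming Statement is a string-like type (the real type lives in the auth logic packages):

```go
package main

import (
	"fmt"
	"strings"
)

// Statement is a stand-in for the real auth logic statement type.
type Statement string

// ConcatenateStatements joins auth logic statements with newlines,
// the way generated fragments are typically glued together.
func ConcatenateStatements(s []Statement) string {
	parts := make([]string, len(s))
	for i, st := range s {
		parts[i] = string(st)
	}
	return strings.Join(parts, "\n")
}

func main() {
	fmt.Println(ConcatenateStatements([]Statement{
		`"Builder" says built("app").`,
		`"Verifier" says verified("app").`,
	}))
}
```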
@waywardgeek pointed out that sha1 should not be used for cryptographic purposes, so we should switch to using sha256 in endorsements of the source code and also provenance generation. I don't know if it's possible to get the sha256 commit id from git, but failing that we can take the repository files, tar them, and sha256 that.
It should be more like this: https://github.com/golang-standards/project-layout
cc @rbehjati
Currently we have our own definition of in-toto types. We should be able to use the types defined in https://github.com/in-toto/in-toto-golang/tree/master/in_toto.
There are several functions, like GetAppNameFromEndorsement, that take file paths from which bytes are parsed. The files are read more than once, so it is better to refactor as many calls as possible to operate on bytes, and to parse the files into bytes just once.
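A sketch of the bytes-first refactoring; the endorsement struct and the function name here are illustrative, not the repo's actual types:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// endorsement holds only the field needed here; the field layout is
// a simplified stand-in for the real endorsement format.
type endorsement struct {
	Subject []struct {
		Name string `json:"name"`
	} `json:"subject"`
}

// getAppNameFromEndorsementBytes parses already-read bytes, so callers
// can read the file once and reuse the buffer for every helper.
func getAppNameFromEndorsementBytes(data []byte) (string, error) {
	var e endorsement
	if err := json.Unmarshal(data, &e); err != nil {
		return "", err
	}
	if len(e.Subject) == 0 {
		return "", fmt.Errorf("endorsement has no subject")
	}
	return e.Subject[0].Name, nil
}

func main() {
	data, err := os.ReadFile("endorsement.json") // single read
	if err != nil {
		fmt.Println(err)
		return
	}
	name, err := getAppNameFromEndorsementBytes(data)
	fmt.Println(name, err)
}
```

Path-taking wrappers can remain for convenience, implemented as a ReadFile call followed by the bytes-based function, so each file is still read only once per caller.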
We need at least some guidelines related to installing mcpp; see #32.
In the comments here, it suggests that one of the steps for verifying a Rekor log entry is to verify the signature in signedEntryTimestamp (which is part of the struct for a Rekor log entry). So presumably the Rekor log wrapper should also do this.
@rbehjati Did you happen to already look into what type of signature / format is used for signedEntryTimestamp? I couldn't find any of Rekor's libraries that use this part of the struct. I can try to figure this out, but if you already looked into it and have info that can speed this up, that would help.
cc: @rbehjati
The current implementations of FetchSourcesFromRepo in common/common.go and Verify in verify/verify.go change the working directory but do not restore it to whatever it was before they ran. This could cause issues for callers that expect the working directory not to change. These functions would be more predictable if they either avoided changing the working directory altogether, or saved the working directory at the beginning of the call, changed it as needed, and changed it back before returning.
@rbehjati pointed out in #9 (comment) that it would be good to have more tests.
We currently have a fair number of unit tests and snapshot tests for individual pieces of code, such as sha256 hashing or the parsing of a given config file.
I think it would be good to add an integration test that tests the functionality of our tooling: parsing a provenance file, running a build, and verifying the hash. This would be helpful as it tests the functionality of the code, as opposed to its specific implementation.
An advantage of integration tests is that they test a lot of code at once. This is a good fit here, as the codebase is relatively small and hence straightforward to debug.
Refactor the auth logic for transparent release into the following separate packages:
- authLogicWrapper: includes just wrapper_interface.go and is used by all the projects implementing wrappers.
- binaryTransparencyWrappers: includes the specific wrappers used for this binary transparency verification step (such as unix_epoch_time_wrapper.go). These implement EmitStatment but not Identify.
- binaryTransparencyVerification: includes binary_transparency_verification_top_level.go, which imports the above wrappers, implements Identify() for each of them, and calls them all to produce the auth logic used to check if the verification step is satisfied.
The idea is that when this is done, the code in the ..._wrapper.go files will be reusable by different consumers (not just the verification process, for example) with different principals attached to the statements they emit. The code for the top level (which isn't implemented yet) will include all of the Identify() functions along with calls to all the wrappers.
We now have go.mod and go.sum, and can use most go commands, e.g., for building. Can we now remove all bazel-related files and convert this repo to a standard Go project?
Currently, when running go test ./... some auth-logic tests fail. I think this is because of the genrules that fetch and build the dependencies. Is it possible to replace those bazel targets with go commands? (go generate ./... might be relevant.)
WDYT? @aferr @tiziano88
Here is more context around this: link
The product team is also related to a cryptographic key, and that should be involved as well.
Problem: The function SanitizeName can cause name collisions which may not be detected.
Example: If there are two sources FooBar and Foo-Bar, this will map both to FooBar when they should have been different.
Potential solution:
Define: s is a sanitizable name (SN) when s != SanitizeName(s), and call SanitizeName(s) its sanitization.
Maintain a map, sanMap, from sanitizations to the sanitizable names that generated them. Then, whenever a sanitizable name s is encountered, check whether sanMap already maps SanitizeName(s) to a different original name; if it does, report a collision, otherwise record the pair.
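A minimal sketch of this collision check; the sanitization rule here is deliberately simplified (strip hyphens) just to demonstrate the map-based detection:

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeName is a simplified stand-in for the real SanitizeName:
// it only strips hyphens, which is enough to show the collision problem.
func sanitizeName(s string) string {
	return strings.ReplaceAll(s, "-", "")
}

// sanitizer remembers which original name produced each sanitized form
// (the sanMap from the issue) and rejects colliding originals.
type sanitizer struct {
	sanMap map[string]string // sanitization -> original name
}

func newSanitizer() *sanitizer {
	return &sanitizer{sanMap: map[string]string{}}
}

// sanitize returns the sanitization of s, or an error if a different
// original name already produced the same sanitization.
func (z *sanitizer) sanitize(s string) (string, error) {
	out := sanitizeName(s)
	if prev, ok := z.sanMap[out]; ok && prev != s {
		return "", fmt.Errorf("name collision: %q and %q both sanitize to %q", prev, s, out)
	}
	z.sanMap[out] = s
	return out, nil
}

func main() {
	z := newSanitizer()
	fmt.Println(z.sanitize("FooBar"))
	fmt.Println(z.sanitize("Foo-Bar")) // collision detected here
}
```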
The way working directories work in experimental/auth-logic is quirky at the moment.
Most file paths are imported under the assumption that the working directory is the transparent-release directory. This is partly done because it gives us a way to locate all the files and built binaries necessary within bazel.
The awkward part is that the wrapper tests work because provenance_wrapper_test.go changes the working directory to transparent-release, and the other wrapper tests are assumed to run after that one; they will fail if this is not true.
This is a bit brittle. However, this is partly set up this way to work within bazel and partly related to the current directory structure; we are moving away from bazel anyway (#67) and we also need to restructure the modules here eventually (#53).
Right now the interface for wrappers returns just an UnattributedStatement, but because wrappers can fail, it would be more idiomatic to return an (UnattributedStatement, error) pair.
The same is true for the IdentifiableWrappers.
@rbehjati suggested tracking this in an issue, to be able to discuss it more here. :)
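A sketch of the proposed interface change; the type and method names approximate, rather than reproduce, the repo's actual wrapper interface:

```go
package main

import (
	"errors"
	"fmt"
)

// UnattributedStatement is a stand-in for the real auth logic type.
type UnattributedStatement string

// Wrapper is a sketch of the proposed interface: the emit method
// returns an error alongside the statement instead of only the
// statement, so failures surface idiomatically.
type Wrapper interface {
	EmitStatement() (UnattributedStatement, error)
}

// failingWrapper demonstrates a wrapper that can now report failure.
type failingWrapper struct{}

func (failingWrapper) EmitStatement() (UnattributedStatement, error) {
	return "", errors.New("could not fetch evidence")
}

func main() {
	var w Wrapper = failingWrapper{}
	if _, err := w.EmitStatement(); err != nil {
		fmt.Println("wrapper failed:", err)
	}
}
```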
Basically, in-toto has its own CLI that is distinct from (but adjacent to) the in-toto attestation format: https://github.com/in-toto/in-toto. It looks very similar to our CLI, stating that it is used to: "Create layout, run supply chain steps and verify final product".
To be able to test this, we need a new Rekor log entry that matches the unexpired endorsement file used in tests.
There are currently instructions for generating a Rekor entry in experimental/auth-logic/test_data/rekor_entry_instructions.md. Instructions that use Google Cloud KMS for key management for the entry should also be added.
See also this context
We need a way to generate an endorsement file with the right contents if an authorization logic policy query is true (or generate an error if it is not true). We will use this to check if a predicate in the policy for releasing an endorsement file can be proved (is true), and generate the endorsement file if so.
We presently use a JSON schema to specify the shape of the expected SLSA buildType. I went with a JSON schema initially just to have some clearly defined format; it doesn't have to be JSON schema specifically, and there are a few pros & cons to it:
Pros:
Cons:
Thoughts? Ideas? :) cc @rbehjati @tiziano88
[This issue is related to #15, probably a superset of it].
When building binaries, we currently pass some default flags to docker run:
transparent-release/common/common.go
Lines 170 to 179 in 2afb748
For more flexibility, and to ensure the builds are reproducible, we should instead allow users to specify the flags that should be passed to docker run. The flags can either be specified as a comma-separated list directly in the build-config TOML file, or they could be specified in a separate file, with the file path included in the build-config TOML file. Either way, we probably need to be able to parse these flags and validate them.
Is there a set of flags that we should ban for security?
See also #25.
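A sketch of parsing and validating user-supplied docker run flags from the build config; the denylist entry is only an example of a flag one might ban, since which flags to actually forbid is the open question above:

```go
package main

import (
	"fmt"
	"strings"
)

// bannedFlags is an example denylist; deciding which flags to ban
// for security is exactly the open question in this issue.
var bannedFlags = map[string]bool{
	"--privileged": true,
}

// parseDockerFlags splits a comma-separated flag list, as it might
// appear in the build-config TOML file, and rejects banned flags.
func parseDockerFlags(raw string) ([]string, error) {
	var flags []string
	for _, f := range strings.Split(raw, ",") {
		f = strings.TrimSpace(f)
		if f == "" {
			continue
		}
		// Compare only the flag name, so e.g. --privileged=true is also caught.
		name := strings.SplitN(f, "=", 2)[0]
		if bannedFlags[name] {
			return nil, fmt.Errorf("flag %q is not allowed", f)
		}
		flags = append(flags, f)
	}
	return flags, nil
}

func main() {
	flags, err := parseDockerFlags("--rm, --tty, --workdir=/workspace")
	fmt.Println(flags, err)
}
```

Validating by flag name rather than by full string keeps the check robust to `--flag=value` forms; an allowlist instead of a denylist would be the stricter design if reproducibility is the priority.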
The authorization logic policies will depend on which trusted builder produced the provenance file. The builder ID shows up under the builder predicate here. The other tooling around transparent release does not support this field yet, so this issue depends on adding support for getting this builder field from provenance files.
(information provided by @rbehjati )
@jul-sh Following your suggestion in #21 (review), I've added this test. I am not sure if this is the right place for the test though.
I think we could rename the slsa package to amber, and have separate Go files in it for provenance and endorsement, and their corresponding tests. WDYT?
Originally posted by @rbehjati in #22 (comment)
The current library for unpacking Rekor logs here assumes that the log format will be rekord, based on the example rekord entry for endorsement files we need. If for some reason we do need to use more than one format, we can make this code more flexible. If we can figure out one format that works for our software, then perhaps we can avoid this to keep our code simpler.
Look into using templates for generating the text in some of the more complicated wrappers. For example, the one in verifier_wrapper.go.
Examples of two "trusted builders" are:
The current format for endorsement files uses a hyphen in the application name:
https://github.com/project-oak/transparent-release/blob/main/schema/amber-endorsement/v1/example.json has
oak_functions_loader-0f2189703c57845e09d8ab89164a4041c0af0a62
I'm using the app name to define several principal names in the generated auth logic. The authorization logic syntax for principal names and predicate arguments doesn't include -, and I don't want to add it because it will probably conflict with numeric minus, which we might want (and the grammar is also fully whitespace agnostic). Can you instead use a different character, like:
oak_functions_loader_0f2189703c57845e09d8ab89164a4041c0af0a62
or
oak_functions_loader:0f2189703c57845e09d8ab89164a4041c0af0a62
or
oak_functions_loader.0f2189703c57845e09d8ab89164a4041c0af0a62
or
oak_functions_loader0f2189703c57845e09d8ab89164a4041c0af0a62
At present, we have an authorization logic tool that runs the client-side verification process. Our tool for doing this produces an authorization logic policy that collects evidence for verifying if a previously released binary "is safe" to be consumed by the client.
In addition to this, we would like tooling that works similarly for parties releasing a software binary. This tooling will be used by the product team releasing a binary instead of the client. It gathers the evidence needed to generate an endorsement file, includes a policy for deciding if the evidence gathered is sufficient, and uses this policy to decide if the release is acceptable. If the tool succeeds, an endorsement file is generated. If it fails, it throws an error.
This is a meta-issue for tracking the work to build the release tool to be used by the product team. Here are the sub-tasks:
The SLSA provenance v0.2 format has a few optional fields, in particular invocation and metadata, that we currently don't fill out in the provenances that we generate.
To be able to fill these fields out when generating the provenances using cmd/build, the input .toml should be adjusted to provide invocation-related information.
In particular, in the case of this GitHub workflow, the invocation could be filled out as follows:
"invocation.configSource.uri": "https://github.com/project-oak/oak/blob/f8c4c96375ca1d87a74238fc4450a81703d1d147/scripts/generate_provenance"
"invocation.configSource.digest": "<cryptographic digests of the file above>"
"invocation.configSource.entryPoint": ""
"invocation.parameters": "GitHub"
"invocation.environment": "<info about the GitHub Actions runner>"
More generally, invocation.configSource.entryPoint, invocation.parameters and invocation.environment are optional and can be omitted.
The metadata should be set as follows:
- metadata.buildStartedOn = the timestamp of when the build started, filled out by cmd/build.
- metadata.buildFinishedOn = the timestamp of when the build completed, filled out by cmd/build.
- metadata.completeness and metadata.reproducible: for now we don't fill these out.
The builder.id is another field that should be added, as a command line flag instead of being included in the toml file.
From what @rbehjati tells me, the outputs of trusted builders like GitHub Actions and Google Cloud Build will be signed using a unique keypair for these trusted builders. This signing does not yet exist (even independently of authorization logic), but we should add the relevant keys to the authorization logic policy once the signing is set up.
I think I heard that @mariaschett is working on adding key management for this (?).
Do either of you have more relevant details for this?
For example, should predicates have snake or camel case? How about predicate arguments? When are the first letters capitalized?
Goal: make sure all principal names are sanitized using the SanitizeName function.
Proposed solution: make the Principal struct unexported, so that its initializer cannot be called outside a wrappers_interface package which just includes wrappers.go. Move the implementations of the specific wrappers into a new package (like transparentReleaseWrappers). Then expose a constructor:
func ConstructPrincipal(contents string) principal {
    return principal{Contents: sanitizeName(contents)}
}
This way, the only way external packages can make a principal is by calling the constructor.
If making the principal struct unexported does not work, another way to enforce this could be to introduce a SanitizedName type produced by SanitizeName(), and make the Principal initializer take one of these.
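A sketch of that alternative: a SanitizedName type that only SanitizeName produces, so the Principal initializer cannot receive an unsanitized string (the names and the sanitization rule here are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// SanitizedName is only produced by SanitizeName, so any Principal
// built from one is sanitized by construction.
type SanitizedName string

// SanitizeName is a simplified stand-in for the real function: it just
// replaces hyphens, which is enough to show the type-level enforcement.
func SanitizeName(s string) SanitizedName {
	return SanitizedName(strings.ReplaceAll(s, "-", "_"))
}

// Principal's field has type SanitizedName, so callers cannot pass a
// plain string without going through SanitizeName.
type Principal struct {
	Contents SanitizedName
}

func main() {
	p := Principal{Contents: SanitizeName("oak-functions-loader")}
	fmt.Println(p.Contents)
}
```

With this design, `Principal{Contents: "raw-string"}` with a variable of type string fails to compile, so the invariant is enforced by the type system rather than by package visibility.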
We want to write a guide for the maintainer of a repo to get started with Transparent Release.
Expected: running the transparent release CLI on the testdata in https://github.com/project-oak/transparent-release/blob/main/testdata/build.toml yields the same binary hash as building the binary using Oak's ./scripts/docker_run command with the same arguments.
Observed: running the testdata yields the binary hash 15dc16c42a4ac9ed77f337a4a3065a63e444c29c18c8cf69d6a6b4ae678dca5c, as expected in the testdata, while running the same build using ./scripts/docker_run yields a binary with the hash e4c5930c6c0e81f8d743da1597905df0fa66b0daf69c6bf92a5cbc561c79b127.
For the testdata: see the expected hash in the testdata, or run the testdata as outlined in this repo's README.
For running the same build with Oak's ./scripts/docker_run: check out the pinned commit and run ./scripts/docker_run with the arguments provided in the testdata's command key. For convenience, that's this command in a fresh clone: git checkout 0f2189703c57845e09d8ab89164a4041c0af0a62 && ./scripts/docker_run ./scripts/runner build-functions-server && sha256sum ./oak_functions/loader/bin/oak_functions_loader.
Further investigation is needed, but I'm fairly certain it boils down to the fact that this CLI runs docker with different ENV variables and config than ./scripts/docker_run.
In my case, ./scripts/docker_run uses the following docker run command:
docker run --rm --tty --env=TERM=xterm-256color --env=BAZEL_REMOTE_CACHE_ENABLED --env=BAZEL_GOOGLE_CREDENTIALS --env=HOST_UID --env=HOST_GID --volume=/usr/local/google/home/julsh/oak-tmp/bazel-cache:/home/docker/.cache/bazel --volume=/usr/local/google/home/julsh/oak-tmp/cargo-cache:/home/docker/.cargo --volume=/usr/local/google/home/julsh/oak-tmp/sccache-cache:/home/docker/.cache/sccache --volume=/usr/local/google/home/julsh/oak-tmp:/workspace --workdir=/workspace --network=host --volume=/var/run/docker.sock:/var/run/docker.sock --group-add=995 --interactive sha256:3ff688de7071c92700eb08462d846f92847f2a8ef33cb20638e52ae4133473c4 ./scripts/fix_docker_user_and_run './scripts/runner build-functions-server'
Whereas this CLI runs
/usr/bin/docker run --volume=/tmp/release/oak:/workspace --workdir=/workspace --rm --tty gcr.io/oak-ci/oak@sha256:53ca44b5889e2265c3ae9e542d7097b7de12ea4c6a33785da8478c7333b9a320 ./scripts/runner build-functions-server
A test coverage report can be generated using go test -cover (see https://tip.golang.org/doc/go1.2#cover). However, we cannot use it directly on the top-level repo, since we cannot run the authorization logic tests without bazel.
A test coverage report can also be generated with bazel (https://docs.bazel.build/versions/main/coverage.html), but I was not able to generate HTML reports using genhtml on the output generated by the bazel coverage --combined_report=lcov [target] command.