
Introduction


in-toto provides a framework to protect the integrity of the software supply chain. It does so by verifying that each task in the chain is carried out as planned, by authorized personnel only, and that the product is not tampered with in transit.

in-toto requires a project owner to create a layout. A layout lists the sequence of steps of the software supply chain, and the functionaries authorized to perform these steps. When a functionary performs a step, in-toto gathers information about the command that was run and the files involved, and stores it in a link metadata file. As a consequence, link files provide the evidence required to establish a continuous chain that can be validated against the steps defined in the layout.

The layout, signed by the project owner(s), and the links, signed by the designated functionaries, are released as part of the final product, and can be validated manually or via automated tooling, e.g. in a package manager.

Getting Started

Installation

in-toto is available on PyPI and can be installed via pip. See in-toto.readthedocs.io to learn about system dependencies and installation alternatives and recommendations.

pip install in-toto

Create layout, run supply chain steps and verify final product

Layout

The in-toto software supply chain layout consists of the following parts:

  • expiration date
  • readme (an optional description of the supply chain)
  • functionary keys (public keys, used to verify link metadata signatures)
  • signatures (one or more layout signatures created with the project owner key(s))
  • software supply chain steps correspond to steps carried out by a functionary as part of the software supply chain. The steps defined in the layout list the functionaries who are authorized to carry out the step (by key id). Steps require a unique name to associate them (upon verification) with link metadata that is created when a functionary carries out the step using the in-toto tools. Additionally, steps must have material and product rules which define the files a step is supposed to operate on. Material and product rules are described in the section below.
  • inspections define commands to be run during the verification process and can also list material and product rules.

Take a look at the demo layout creation example for further information on how to create an in-toto layout.
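
To give a rough idea of its shape, here is a heavily abbreviated sketch of a layout's signed portion. Field names follow the in-toto metadata format, but the key ID, step names, commands, and rules below are hypothetical:

```json
{
  "_type": "layout",
  "expires": "2030-01-01T00:00:00Z",
  "readme": "Demo supply chain: clone, then package",
  "keys": {
    "<bob-keyid>": {"keytype": "rsa", "keyval": {"public": "..."}}
  },
  "steps": [
    {
      "_type": "step",
      "name": "package",
      "threshold": 1,
      "pubkeys": ["<bob-keyid>"],
      "expected_command": ["tar", "czf", "foo.tar.gz", "foo.py"],
      "expected_materials": [
        ["MATCH", "foo.py", "WITH", "PRODUCTS", "FROM", "clone"],
        ["DISALLOW", "*"]
      ],
      "expected_products": [
        ["CREATE", "foo.tar.gz"],
        ["ALLOW", "foo.py"],
        ["DISALLOW", "*"]
      ]
    }
  ],
  "inspect": []
}
```

In the released layout file, an object like this appears alongside the project owner's signatures; see the specification for the exact schema.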

Artifact Rules

A software supply chain usually operates on a set of files, such as source code, executables, packages, or the like. in-toto calls these files artifacts. A material is an artifact that will be used when a step or inspection is carried out. Likewise, a product is an artifact that results from carrying out a step.

The in-toto layout provides a simple rule language to authorize or enforce the artifacts of a step and to chain them together. This adds the following guarantees for any given step or inspection:

  • Only artifacts authorized by the project owner are created, modified or deleted,
  • each defined creation, modification or deletion is enforced, and also
  • restricted to the scope of its definition, which chains subsequent steps and inspections together.

Note that it is up to you to properly secure your supply chain by authorizing, enforcing, and chaining materials and products using any, and usually several, of the following rules:

  • CREATE <pattern>
  • DELETE <pattern>
  • MODIFY <pattern>
  • ALLOW <pattern>
  • DISALLOW <pattern>
  • REQUIRE <file>
  • MATCH <pattern> [IN <source-path-prefix>] WITH (MATERIALS|PRODUCTS) [IN <destination-path-prefix>] FROM <step>

Rule arguments specified as <pattern> allow for Unix shell-style wildcards as implemented by Python's fnmatch.
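
As a quick, runnable illustration of those wildcard semantics (standard library only; the file names are made up):

```python
from fnmatch import fnmatch

# Wildcard semantics used for <pattern> rule arguments:
# "*" matches everything, "?" a single character, "[seq]" any character in seq.
assert fnmatch("dist/foo-1.0.tar.gz", "dist/*.tar.gz")
assert fnmatch("src/main.c", "src/*.[ch]")
assert not fnmatch("dist/foo.zip", "dist/*.tar.gz")
```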

in-toto's Artifact Rules, by default, allow artifacts to exist if they are not explicitly disallowed. As such, a DISALLOW * invocation is recommended as the final rule for most step definitions. To learn more about the different rule types, their guarantees and how they are applied, take a look at the Artifact Rules section of the in-toto specification.

Carrying out software supply chain steps

in-toto-run

in-toto-run is used to execute a step in the software supply chain. This can be anything relevant to the project, such as tagging a release with git, running a test, or building a binary. The relevant step name and command are passed as arguments, along with materials, which are files required for that step's command to execute, and products, which are files expected as a result of the execution of that command. These, and other relevant details pertaining to the step, are stored in a link file, which is signed using the functionary's key.

If materials are not passed to the command, the link file generated just doesn't record them. Similarly, if the execution of a command via in-toto-run doesn't result in any products, they're not recorded in the link file. Any files that are modified or used in any way during the execution of the command are not recorded in the link file unless explicitly passed as artifacts. Conversely, any materials or products passed to the command are recorded in the link file even if they're not part of the execution of the command.
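
For illustration, the signed portion of a link file looks roughly like the following sketch (the step name, command, and hashes are hypothetical and truncated; the exact schema is defined in the specification):

```json
{
  "_type": "link",
  "name": "package",
  "command": ["tar", "czf", "foo.tar.gz", "foo.py"],
  "materials": {
    "foo.py": {"sha256": "2a0ffef5e9709e6164c629e8b31bae0d..."}
  },
  "products": {
    "foo.tar.gz": {"sha256": "78a73f2e55ef15930b137e43b9e90a0a..."}
  },
  "byproducts": {"stdout": "", "stderr": "", "return-value": 0},
  "environment": {}
}
```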

See this simple usage example from the demo application for more details. For a detailed list of all the command line arguments, run in-toto-run --help or look at the online documentation.

in-toto-record

in-toto-record works similarly to in-toto-run, but can be used for multi-part software supply chain steps, i.e. steps that are not carried out by a single command. Use in-toto-record start ... to create a preliminary link file that records only the materials, then run the commands of that step or edit files manually, and finally use in-toto-record stop ... to record the products and generate the actual link metadata file. For a detailed list of all command line arguments and their usage, run in-toto-record start --help or in-toto-record stop --help, or look at the online documentation.
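
A typical session might look like this sketch (the step name and key path are placeholders):

```
in-toto-record start --step-name edit-docs --key path/to/functionary_key --materials .
# ... edit files manually or run any number of commands ...
in-toto-record stop --step-name edit-docs --key path/to/functionary_key --products .
```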

Release final product

In order to verify the final product with in-toto, the verifier must have access to the layout, the *.link files, and the project owner's public key(s).

Verification

Use in-toto-verify on the final product to verify that

  • the layout was signed with the project owner's private key(s),
  • the layout has not expired,
  • each step was performed and signed by the authorized functionary,
  • the functionaries used the commands they were supposed to use,
  • materials and products of each step were in place as defined by the rules, and
  • the inspections defined in the layout pass.

For a detailed list of all command line arguments and their usage, run in-toto-verify --help or look at the online documentation.

Signatures

in-toto-sign is a metadata signature helper tool to add, replace, and verify signatures within in-toto Link or Layout metadata, with options to:

  • replace (default) or add signature(s), with layout metadata able to be signed by multiple keys at once while link metadata can only be signed by one key at a time
  • write signed metadata to a specified path (if no output path is specified, layout metadata is written to the path of the input file while link metadata is written to <name>.<keyid prefix>.link)
  • verify signatures

This tool serves well to re-sign test and demo data. For example, it can be used if metadata formats or signing routines change.

For a detailed list of all command line arguments and their usage, run in-toto-sign --help or look at the online documentation.

in-toto demo

You can try in-toto by running the demo application. The demo outlines three users, Alice (project owner), Bob (functionary), and Carl (functionary), and shows how in-toto helps to specify a project layout and verify that the layout has been followed correctly.

Specification

You can read more about how in-toto works by taking a look at the specification.

Security Issues and Bugs

See SECURITY.md.

Governance and Contributing

For information about in-toto's governance and contributing guidelines, see GOVERNANCE.md and CONTRIBUTING.md.

Acknowledgments

This project is managed by Prof. Santiago Torres-Arias at Purdue University. It is worked on by many folks in academia and industry, including members of the Secure Systems Lab at NYU and the NJIT Cybersecurity Research Center.

This research was supported by the Defense Advanced Research Projects Agency (DARPA), the Air Force Research Laboratory (AFRL), and the US National Science Foundation (NSF). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, AFRL, and NSF. The United States Government is authorized to reproduce and distribute reprints notwithstanding any copyright notice herein.


attestation's Issues

Should the predicateType's hash be included as a digest?

Per https://github.com/in-toto/attestation/blob/main/spec/README.md, the predicateType value should be a TypeURI that defines both the shape and meaning of the predicate.

Unlike most other URIs used in in-toto, the hash of the predicateType does not seem to be recorded in the attestation.

I would be curious why this is the case, as this seems like a security vulnerability. The host of the URI referenced in the predicateType may silently update the content it responds with. This then creates a situation in which the meaning of the predicate object in the attestation is now different, with potentially unwelcome consequences. A consumer of the attestation must therefore trust the host of the predicateType to serve the same content that the attestation author referenced.

It seems adding a DigestSet for its content would remedy this?

Question: Support of multiple SBOM formats.

Hi,
I'm new to the conversation; I hope I'm not repeating a closed issue (I didn't find one).
The current SPDX predicate is SPDX-specific. This does not comply with the "separating statement from predicate" goal: suppose one wishes to use CycloneDX; she would need to define a new "cyclonedx" predicate. Thus a policy engine cannot query "was an SBOM attestation created?" without parsing the actual SBOM object.

An alternative would be:

```
predicateType: sbom
predicate: {
    sbom_type: <spdx, cyclonedx, ...>
    sbom_body: ...
}
```

What am I missing?

Add ability to differentiate between different types of `materials`

Currently all materials are lumped together (except recipe.definedInMaterial). It would be useful to be able to differentiate between different types of materials. We can probably look to SPDX Relationships for prior art and inspiration.

Example classifications:

  • "source" = application-specific code
  • "library" / "dependency" = code that was compiled in but part of another project
  • "build tool" / "dev dependency" = thing that was used as part of the build invocation but did not get "compiled in"
  • "base image" = starting point for the build invocation
  • "build orchestrator" = thing that ran on the build orchestrator (not sure if this is even in scope; see #25)

Example use cases:

  • Supply chain integrity / SLSA: better prioritize the "most important" materials.
  • Licensing: identify how the license of the material transfers to the product.
  • Vulnerability tracing: better identify which materials are likely to have affected the product.

Currently this can be done ad-hoc using extension fields, but it is probably valuable to standardize this. The main challenge is coming up with something that works well for most cases and can be done in practice by generic builders like GitHub Actions and Google Cloud Build.

Defining a generalized predicate format for "human reviews" of artifacts

We started discussing what code review attestations should look like, and @iamwillbar suggested checking if we could define a generalized predicate for reviews. We could then derive code review and other formats (perhaps vuln scans like #58) from this general predicate. As a starting point, we've potentially got the following sections in the predicate.

Edit: the general consensus is that we should only handle attestations for human reviews here, so that's what we're going to be focusing on.

Meta Information

In the case of code reviews, this could identify the person performing the review, but in the case of a vuln scan, it could identify the scanner used. One thing to note is that this may be unnecessary in the case of some reviews, because we could use the signature to identify the functionary as we do with links. Further, we could capture temporal information about when the review was performed.

Artifact Information

This section essentially identifies the artifacts the review applies to. We can incorporate ITE-4 here as well. One project we're looking to work with is crev, so we could use something like purl as well to identify package-level reviews as a whole. One open question may be reviews for diffs, and how they'd chain together to apply to a file as a whole.

Edit: we can probably lean on VCS semantics via ITE-4 and tie CRs to the underlying VCS changes / commits they correspond to. Also, as noted below, this would be part of the statement's subject rather than the predicate, but it was included here to nail down exactly what we're likely to be capturing in a review.

Review Result

This is pretty self explanatory for both CRs and vuln scans. I'm not sure if a negative result CR should exist, except maybe for package-results as they're used in crev.


These are all early thoughts in defining a generalized review format, and I'm curious to hear what people think about this. Also open to hearing about other projects like crev we should be paying attention to when working on this.

Link to known attestations in repository

In today's in-toto community call while discussing open ITE's we discussed it being useful for both adopters and maintainers to have a collection of different in-toto attestations that exist in the wild.

Some examples I know of:

Support subjects that are not digests

We have use cases where we need to refer to subjects that cannot be described by a cryptographic hash. Example: a Subversion repository revision, which is identified only by URI. We'd only want to support things that are semantically immutable.

Cut initial release

We plan to cut an initial release before the end of April. Note that each layer can have separate revisions.

Questions:

  • How should we number the revision? 1? 0.1? 2021-04? Something else?
  • Should the first one be some sort of alpha / draft / provisional, or a full release?

If we use integers, the decision is simple. First release is 1.
If we use 0.x/1.x (or "draft"), which of Statement / Provenance / SPDX is 1.0 and which is 0.1 / draft?
If we date, what date should we use.

My suggestion:

  • Statement: v1 (we're fairly confident that it is good enough, and any changes could be reasonably considered v2)
  • Provenance: v1-beta1 (I'm not super confident that we won't make changes)

Thoughts?

Add support for Cyclonedx as a predicate type

in-toto attestations currently document the SPDX predicate type. SBOMs, and BOMs in general, are a diverse space as of now, and CycloneDX is the other leading industry alternative to SPDX for SBOMs, recognized by NTIA as an SBOM format.

CycloneDX supports other capabilities apart from just SBOMs. A particularly interesting one is VEX, which introduces a standard format to attach vulnerability information.

in-toto should document and introduce well-defined predicate types for the various CycloneDX BOM formats (not just SBOM).

cc: @stevespringett @coderpatros

Support attestation revocation

It would be useful to have a mechanism for revoking specific sets of attestations without having to revoke an entire key. A real-world use case: a builder had a bad release and generated bad provenance for a short period of time. We'd like to revoke only the provenance generated by that bad release, without doing a full key revocation, since the latter would have a much larger negative impact.

Note that signature revocation was mentioned in secure-systems-lab/dsse#39, where we said it would be a better fit inside the payload. That's why I filed the issue here.

It's also possible we push this down further into the predicate and have predicate-specific methods. For the use case above, https://slsa.dev/provenance could have a builderVersion field and we could revoke based on that. But I don't particularly like that idea since revocation seems like it would apply equally to all attestations.

I don't have good ideas for solutions, but wanted to mention this here since it is a real issue that has already come up.

Define a "attestation bundle" data structure and naming convention

Copied from secure-systems-lab/dsse#20.

We need a data structure and file naming convention to associate multiple attestations to a single software artifact.

Motivating use case

Suppose file foo.out is associated with two attestations, both of which are required by the layout of foo.out:

  • Provenance: the build system generates a link saying that foo.out was produced from material foo.c.
  • Vulnerability scan: a scanner generates a link saying that foo.out is free of known vulnerabilities.

The build system and the scanner need to know where to place the links on the filesystem, and the verifier needs to know how to find those links when evaluating foo.out.

Data structure

Define a "bundle" data structure containing multiple attestations (i.e. multiple DSSEs).

Ideas:

  • JSON Lines
  • ZIP file with particular naming convention
  • JSON object or array

File naming convention

Define a convention for locating the attestation bundle related to a given file on disk.

Proposal: <file>.<bundle_extension>

Example: If we choose JSON Lines, perhaps the attestations for foo.out could be found in foo.out.intoto.jsonl?
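
With the JSON Lines option, a bundle would hold one serialized DSSE envelope per line, along these lines (payloads and signatures abbreviated, key IDs hypothetical):

```
{"payloadType": "application/vnd.in-toto+json", "payload": "<base64 provenance statement>", "signatures": [{"keyid": "builder-key", "sig": "<base64>"}]}
{"payloadType": "application/vnd.in-toto+json", "payload": "<base64 vuln-scan statement>", "signatures": [{"keyid": "scanner-key", "sig": "<base64>"}]}
```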

Ability to refer to one attestation from another

There are likely use cases where you want to refer to one attestation from another. For example, you could have a "Policy Decision" predicate that says "I allow subject x to run in environment Y, based on input attestation Z."

Since this is just theoretical at the moment, we'll wait until we have a few concrete use cases to actually design and implement this. Please add any use cases you might have to this issue.

The straightforward solution is to treat the entire attestation as an artifact, and thus you refer to the attestation as a hash over the envelope. The downside to this approach is that it prevents one from re-encoding the envelope, such as if you add a signature to it, because doing so changes the hash. (Counter-point: don't do that.) Perhaps this is the best option.
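
The straightforward solution, and its downside, can be sketched in a few lines (the envelope content below is illustrative only):

```python
import hashlib
import json

# Identify an attestation by the SHA-256 of its serialized envelope bytes.
envelope = {
    "payloadType": "application/vnd.in-toto+json",
    "payload": "eyJfdHlwZSI6ICIuLi4ifQ==",  # base64 statement (placeholder)
    "signatures": [{"keyid": "alice", "sig": "abc..."}],
}
digest = hashlib.sha256(json.dumps(envelope, sort_keys=True).encode()).hexdigest()

# Re-encoding the envelope, e.g. by adding a second signature,
# changes the hash and breaks any existing reference to it.
envelope["signatures"].append({"keyid": "bob", "sig": "def..."})
digest2 = hashlib.sha256(json.dumps(envelope, sort_keys=True).encode()).hexdigest()
assert digest != digest2
```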

Alternative ideas:

  • Refer to the hash of the statement, rather than the envelope. Downside: that loses who signed it, which is critical information.
  • Refer to the hash of some canonicalization of the envelope. Downside: that relies on canonicalization (frowned upon, see ITE-5), needs to be invented, and adds complexity.
  • Each attestation has a UUID. Downside: adds complexity and could be error prone. For example, what if someone else creates one with the same UUID, either accidentally or on purpose?
  • Store attestations in some ledger, then you can refer to the location within the ledger (transaction ID or leaf hash). Downside: requires a ledger, and is seemingly not any better than the straightforward solution.

Clarify that any envelope can be used

The Envelope layer is not specified in this spec. We recommend a particular one (signing-spec) but the attestation format can be used with any envelope, so long as the producer and consumer agree.

Provenance: consider splitting `builder.id`

"Builder" primarily refers to the (id)entity generating the provenance, not the "runner" that actually did the build. This feels like it could easily be broken into two pieces to make things more clear (esp. at the lower SLSA levels): 1) authority that attests to the result of the build / content of provenance 2) runner that executes the build steps.

Originally posted by @msuozzo in #39 (comment)

Explain "completeness"

Explain that "completeness" is ultimately an agreement between producer and consumer, usually through some intermediary "accreditor" as in SLSA. Counterexamples would help.

Provide guidance on level of granularity for `recipe`

We need to provide guidance on how granular a recipe should be because the question has already come up. Most builds end up executing a series of steps. We want to discourage listing every single step that was run because it makes writing policies too difficult (duplication between policy and workflow definition, and the need to chain ephemeral artifacts between steps). At the same time, if the recipe is too coarse, it can lose valuable security information, such as a "prod" vs "test". This is a bit of a judgement call.

Proposed guidance: a recipe SHOULD be the smallest unit of work that a policy would reasonably want to identify. In a CI/CD scenario, each execution SHOULD be independent: the execution of one recipe SHOULD NOT affect the starting state of the next execution.

Example: A GitHub Actions Workflow has three levels: workflow → job → step. A yaml file defines a workflow, which may contain multiple jobs, each of which may contain multiple steps. Jobs are independent, meaning that each is run within a fresh VM. Steps are dependent, meaning that each step uses the state of the previous step. Therefore, the right level of granularity is the "job", and the recipe.entryPoint is ":".

Add support for SCAI predicate

Per my discussions with @SantiagoTorres and @MarkLodato, I'm opening an issue to get this discussion going on a broader forum.

CDI (or Code Deployment Integrity) is a framework for high-integrity provenance for software artifacts. CDI enables verifiers to make trust decisions about artifacts based on attested code properties, and enables verifiers to additionally establish trust in the attesters generating provenance metadata. (See our position paper)

In more detail, CDI can enhance in-toto in three key areas:

  1. Capture metadata about code security properties/behavior.
    CDI attestations contain not only information about a step in the supply chain, but also authenticated claims about specific security properties (e.g. claim: "this binary enforces strict memory bounds checks", evidence: builder is a verified Wasm AoT compiler). CDI also supports obtaining attestations from static/binary analysis tools.

  2. Integrity for the attesters: Tool endorsements
    In order to be able to substantiate claims about code properties inserted by a specific supply chain step, CDI obtains additional attestations or endorsements (signed claims) about attesters. Endorsements about the tools may come from a variety of endorsers such as CAs or third-party auditors, or from running tools on trusted execution environments (e.g., Intel SGX). These endorsements provide additional integrity metadata about the tools, reducing trust in the tools (i.e., attesters) and allowing verifiers to determine the confidence they have in attestations coming from certain attesters.

  3. Contextual deployment policies
    Attesting to code properties throughout the supply chain enables developers to specify fine-grained contextual policies about artifacts that can be enforced at deployment time. For instance, I may wish for a container running my binary to be deployed with co-tenants so long as they are side channel-free. This last feature requires post facto auditing, but I envision these policies being attached as part of the metadata for an artifact, and enforced by a verifier (e.g., a container orchestrator) at deployment time. I also envision eventually being able to specify such policies for supply chain steps (e.g., "build my artifact with an AES library that is attested to be side channel free").

A couple extra notes. Some of these features are more well fleshed out than others. I'm also currently working on a prototype to demonstrate running compilation inside of a TEE like SGX, but supporting other TEEs will be important as well.

Where I'm uncertain is the most effective way to integrate support for CDI with in-toto. On the one hand, amending the in-toto Link predicate may make sense since CDI is complementary. At the same time, it may be better for backward compatibility to develop an in-toto compatible predicate schema for CDI, especially considering that in-toto also supports attestation bundles now. Or perhaps a combination of the two.

I'd love some input on this. Thoughts?

Provenance: indicate completeness of `recipe`

We currently have a metadata.materialsComplete field that indicates whether materials is complete or not. We need the equivalent for recipe.arguments and recipe.reproducibility. One minor thing is that it doesn't make sense for reproducibility to be complete but arguments is not.

Ideas:

  • Add metadata.argumentsComplete and metadata.reproducibilityComplete.
  • Add completeness.arguments, completeness.reproducibility, and completeness.materials (moved from metadata).
  • Use a field mask to list complete fields, as in complete: ["materials", "recipe.arguments"].
  • Do not support set-but-incomplete, and instead say that unset/null means "unknown" and set (but possibly empty) means "complete".
  • Use an enum for NONE, ARGUMENTS_ONLY, and ARGUMENTS_AND_REPRODUCIBILITY.

My inclination is the completeness.* option. That seems like it's the most straightforward. It also moves a security-critical field out of metadata.

Provide better guidance on `materials`

Currently it's unclear what "complete" means for materials. Does it include materials used by the builder itself, e.g. the GitHub Actions orchestrator, or just the "runner"? To minimize the list, do we recommend that builds execute within a container? Is a container a sufficient security boundary? How confident do you need to be that it's complete?

My thinking is that we're shooting for "everything inside the runner container," and it's OK if malicious builds break out of the container, but you're capturing the intent of a "good" build.

Does this mean completeness.materials should be an enum instead of a boolean?

@TomHennen FYI who raised this question.

Consider adding a "digest kind" and/or "content type" to subject

Currently, the attestation subject is a pure content digest. There are two related pieces of information we may want to consider adding:

  1. "digest kind": how to serialize the artifact into the hashing algorithm. Examples:
    • PE file: straight hash or the hash used for Authenticode?
    • JAR: straight hash or the hash used for JAR signing?
    • Git: commit ID or tag ID (if an annotated tag)?
  2. "content type": how the content is intended to be interpreted. Examples:
    • Docker image
    • Git commit
    • JPEG file
    • ZIP file

The main question is: Are there any compelling security or implementation reasons why we need this? If not, we leave it out to simplify things. If there are, then we need to come up with a design that overcomes the challenges below.

Background

Originally we had included something like this into the property name of DigestSet, e.g. "subject": {"gitCommit": "<sha1>"}. If you used "sha256", it meant straight file hash (no particular content type), but specific content types would use a digest kind appropriate for that type. Ultimately we decided against that because it added complexity without an obvious benefit.

Prior art: Rekor has a type registry that appears to be the same as "digest kind".

Potential benefits

  1. "Digest Kind" could tell verifiers how to compute the hash.

    • Counterpoint: They can just compute it in some canonical way or in multiple ways, without having to record it.
    • Counterpoint: In many applications, the hashing is done prior to reading the attestation, so having this information wouldn't help.
  2. There could be security vulnerabilities if a producer intended one kind/type and a consumer interpreted it as another. For example, the producer signed a PE Authenticode hash, but then the consumer interpreted it as a raw file hash.

    • Counterpoint: This is too abstract to warrant the cost. We need some sort of proof-of-concept to show that this would be exploitable in a somewhat realistic scenario.

Challenges

  1. Do you include the kind/type in the matching? For example, if one Provenance attestation lists a material sha256: X, type: Y and another attestation has subject: X, type: Z, does it match?

  2. What if the producer and consumer don't agree on the content type, or don't know the content type? For example, if a build system just produces files, it might not know that it is a docker image vs a zip file vs something else. Adding that configuration would add a lot of friction and room for mistakes. And if producer and consumer disagree, is that always a security issue? What if one thinks it's application/json and the other thinks it is application/vnd.oci.image.manifest.v1+json?

  3. How do you register these various digest kinds? If someone has a new one, how do they use it? What if that new type is private, e.g. a company-internal format?

    • If we use content type (which uses MediaType), do we have some mapping from media type to digest kind? What if that's wrong or incomplete?

Document convention for versioning

It would be good to come up with some convention for version numbers, using that for official specs (Statement, Provenance, etc.) and recommending it for other predicates.

Proposal

Strawman based on Semantic Versioning 2.0.0: "vMAJOR.MINOR". A change MAY increment only the minor version if consumers can parse the version as an earlier minor revision and have it still be considered accurate according to that earlier revision. For example, parsing a 1.1 message as 1.0 results in something that has the same semantic meaning as if the producer produced 1.0 directly. (I don't see a need for PATCH number since this is just a spec, not code.)

Examples for Provenance, going from 1.0 to 1.1 vs 2.0:

  • Adding a new buildFinished timestamp may increment MINOR because the field does not affect the meaning of any other field. The predicate still makes sense as a 1.0 message ignoring that field.
  • Adding a new recipe.extraArgs field requires incrementing MAJOR because the overall meaning of the recipe changes. If you ignore the field, it is NOT semantically a 1.0 message because the recipe assumed that the existing 1.0 fields fully defined the recipe.
  • Modifying the meaning of builder.id requires incrementing MAJOR because it's changing an existing field. Interpreting it as 1.0 is not valid.
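The compatibility rule above can be checked mechanically. Here is a minimal sketch of the proposed "vMAJOR.MINOR" convention; the function names are illustrative, not part of any spec:

```python
import re

def parse_version(v: str):
    """Parse a 'vMAJOR.MINOR' string, e.g. 'v1.1' -> (1, 1)."""
    m = re.fullmatch(r"v(\d+)\.(\d+)", v)
    if not m:
        raise ValueError(f"not a vMAJOR.MINOR version: {v!r}")
    return int(m.group(1)), int(m.group(2))

def compatible(producer: str, consumer: str) -> bool:
    """True if a consumer understanding `consumer` can safely read a
    `producer` message: same MAJOR, any MINOR, because by the proposal's
    rule the extra minor fields are ignorable."""
    p_major, _ = parse_version(producer)
    c_major, _ = parse_version(consumer)
    return p_major == c_major

print(compatible("v1.1", "v1.0"))  # True: a 1.1 message parses as 1.0
print(compatible("v2.0", "v1.0"))  # False: a major bump breaks old consumers
```

This makes the trade-off concrete: consumers only need updating on MAJOR bumps.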

Alternatives

An alternative would be to have only MAJOR versions (v1, v2, v3, ...) and increment on every single change. The downside is that even minor changes would then require all consumers to update.

Any thoughts?

Container Native Provenance Predicate?

Starting a rough brainstorm here to capture some ideas I had around the existing Provenance Predicate in this repo. The existing In-Toto Link format, Grafeas BuildProvenance and new In-Toto Attestation formats are all slightly off from my mental model.

At a high level, there are three things we care about: recipe (set of steps), materials (inputs), and final artifact (output). They each differ in how many of these you can have, and how they're defined.

Here's a comparison of their cardinality, and what I think I want:

Provenance

https://gist.github.com/dlorenc/2f31fcb3c9a5d0a06ad944f8b831b213

  • One Subject
  • One Recipe
    • Recipe only contains pointer to a material
    • Entrypoint (top level script)
    • Arguments
    • Environment
  • Multiple Materials

Link

  • Multiple Products
  • Multiple Byproducts
  • Multiple Materials
  • Single Environment

Grafeas BuildProvenance

  • Multiple Artifacts
  • Multiple Commands
    • Environment
    • Command
    • Args
  • Single Source

What I want

  • One Subject
  • One Recipe
    • Multiple Steps
      • Environment
      • Entrypoint
      • Arguments
  • Multiple Materials

In Graphviz format: https://gist.github.com/dlorenc/84fb062d0f0fd532b2cb603dc8648543


We can have multiple materials and one recipe that contains a set of steps. Each step can have its own environment, typically a container image. These steps run against the materials and produce an output. If the steps produce multiple output artifacts, the system should generate one of these attestations for each individual output artifact.

This is all pretty rough and high level still, but I think this more closely matches the models of GCB, GitHub Actions, and TektonCD.

Create an index of predicates

Related to #54

With the provenance predicate moved to the SLSA repository, there was talk of creating an index of predicates that we can link to here. While we have few predicates at the moment, it's worth discussing now how this index should work, along with the formal processes for adding predicates to the index and keeping them updated long term. What sort of review mechanisms do you envision for predicates indexed in the in-toto/attestation repository?

cc @MarkLodato @TomHennen @SantiagoTorres @joshuagl anyone I missed?

A good TLDR for attestations

It wasn't clear to me until the community meeting today that attestations are a superset of classic in-toto links. Each attestation is essentially like a standardized protobuf for a certain type of additional information. Would it be good to clarify this as a TL;DR in the README?

Provenance: add ability to add extensions?

We almost definitely will not get the Provenance schema right on the first try. When there is some new information that producers want to add to the provenance, it would be nice if they had a way to do so without having to either fork the spec or wait for a new version that adds that feature.

A few ideas:

  • Not allowed.
  • Allow additional fields to be added to any object, with any name.
    • Example: {"materials": [{..., "someNewField": "yay!"}]}
    • Pro: Simple.
    • Con: Possibility of name clash, where two producers use the same name but with different meaning.
    • Con: Precludes the "minor version" idea from #4, since a field of that name may be added in a future version.
  • Allow additional fields to be added to any object, with constraints:
    • Field name MUST be a URI (to address the two cons from above).
    • The meaning of all other fields MUST be unchanged if that field is ignored. For example, if you add a {"recipe": {..., "https://example.com/foo": true}}, then it should be perfectly safe for a consumer to ignore that new field.
  • Add an extensions field to each object, with basically the same constraints as above.

I'm leaning towards the second approach (URI fields) but would be happy to hear opinions.
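The second approach can be sketched from the consumer's side: before interpreting an object, a consumer strips any field whose name is a URI, which is safe by the proposed constraint that ignoring such fields never changes the meaning of the rest. This is an illustrative sketch, not a spec:

```python
def strip_extensions(obj):
    """Recursively drop URI-named extension fields from a JSON-like object.

    The "://" test is a crude stand-in for a real absolute-URI check; it is
    enough to show the mechanism.
    """
    if isinstance(obj, dict):
        return {
            k: strip_extensions(v)
            for k, v in obj.items()
            if "://" not in k
        }
    if isinstance(obj, list):
        return [strip_extensions(v) for v in obj]
    return obj

recipe = {"type": "make", "https://example.com/foo": True}
print(strip_extensions(recipe))  # {'type': 'make'}
```

Because extension names are URIs, two producers cannot clash accidentally, and no future spec version can collide with them either.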

Compositional Notion On Attestation Types

It'd be nice to describe a strategy for different types of attestations within the same action.

Should we have these two types of attestations live under the same object? or would they be separate attestations attached to the same action. I.e., you could have 1 subject with multiple predicates.

F.e., Having a provenance type that includes a measured boot record:

{
  "subject": [
    { "name": "curl-7.72.0.tar.bz2",
      "digest": { "sha256": "ad91970864102a59765e20ce16216efc9d6ad381471f7accceceab7d905703ef" }}
  ],
  "predicateType": "https://in-toto.io/Provenance/v1",
  "predicate": {
    "builder": { "id": "https://github.com/Attestations/GitHubHostedActions@v1" },
    "recipe": {
      "type": "https://github.com/Attestations/GitHubActionsWorkflow@v1",
      "definedInMaterial": 0,
      "entryPoint": "build.yaml:maketgz"
    },
    "metadata": {
      "buildStartedOn": "2020-08-19T08:38:00Z"
    },
    "materials": [
      {
        "uri": "git+https://github.com/curl/curl-docker@master",
        "digest": { "sha1": "d6525c840a62b398424a78d792f457477135d0cf" },
        "mediaType": "application/vnd.git.commit",
        "tags": ["source"]
      }
    ],
    "tpm-measured-boot": {
       "PCR0": "xxxx",
       "PCR1": "yyyy",
       ...
    }
  }
}

This way, we would be able to know provenance information of the build + information about the host's integrity.
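The other option raised above, one subject with multiple predicates, can be sketched as two independent statements that share a subject. The measured-boot predicateType URI below is a made-up placeholder, not an existing type:

```python
subject = [{
    "name": "curl-7.72.0.tar.bz2",
    "digest": {"sha256": "ad91970864102a59765e20ce16216efc9d6ad381471f7accceceab7d905703ef"},
}]

provenance = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": subject,
    "predicateType": "https://in-toto.io/Provenance/v1",
    "predicate": {"builder": {"id": "https://github.com/Attestations/GitHubHostedActions@v1"}},
}

measured_boot = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": subject,
    "predicateType": "https://example.com/TPMMeasuredBoot/v1",  # hypothetical type
    "predicate": {"PCR0": "xxxx", "PCR1": "yyyy"},
}

# Each statement would be signed and distributed independently; a policy
# engine joins them on the shared subject digest.
assert provenance["subject"] == measured_boot["subject"]
```

The trade-off: separate attestations keep each predicate type clean and independently verifiable, at the cost of the consumer having to correlate them by subject.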

Reproducible builds expressed as Provenance predicate

There are a few open questions for how to support reproducible builds in the Provenance predicate:

Q1. Should each rebuilder produce its own attestation, or should all rebuilders sign the same attestation?

(option A) With the current schema, builder is required so each rebuilder must produce a unique attestation. The consumer would then verify that all the fields are identical except builder.

(option B) An alternate idea would be for all rebuilders to sign the same statement. This would only work if builder is optional and implicit from the signing key.
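The consumer-side check in option A can be sketched as follows; the helper names are illustrative, but the field name builder follows the Provenance predicate:

```python
import copy

def normalized(predicate: dict) -> dict:
    """Return a copy of the predicate with the builder field removed."""
    p = copy.deepcopy(predicate)
    p.pop("builder", None)
    return p

def all_agree(predicates: list) -> bool:
    """True if all rebuilder predicates are identical once `builder` is ignored."""
    stripped = [normalized(p) for p in predicates]
    return all(p == stripped[0] for p in stripped[1:])

a = {"builder": {"id": "https://rebuilder-1.example"}, "recipe": {"type": "make"}}
b = {"builder": {"id": "https://rebuilder-2.example"}, "recipe": {"type": "make"}}
print(all_agree([a, b]))  # True: they differ only in builder
```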

Q2. Should the Provenance attestation be sufficient to reproduce a build?

I think that's a good idea, but I'm not sure we're there yet.

At a minimum, we should document that notion and perhaps have a reproducible: true field to indicate that the builder thinks it is reproducible.

Even then, the rebuilder needs to understand the recipe. For example, if GitHub Actions is used, they need to understand how to parse and run the workflow. It would be nice if we had some standardized convention for recipes, so that all builds speak the same language. Too big a project for the attestation format, but something to think about as part of SLSA.

Provide better guidance on `subject.name`

The documentation currently does not give clear guidance on how to properly use subject[*].name. For example, is it OK to consider one attestation {subject: [A, B], predicate: P} identical to two attestations {subject: [A], predicate: P} and {subject: [B], predicate: P}? That is what the Processing Model implies, but I don't think that is what we want in the general case. There might be cases where you want to know that A and B were both produced by the same process.

Thanks @nenaddedic for reporting.

Consideration: where to place "workload" attestations

I'm thinking of something like a SPIRE SVID/Bundle to attest that a build step was actually carried out by a particular workload, on a particular node.

In the case of provenance generated by a separate orchestrator - the orchestrator could obviously include some cryptographic proof of where it was running. But we could also potentially include an attestation from the build process itself.

Where would make the most sense to include this?

[Help Wanted] Determine attestation format for vuln scans

This is a follow-up issue for the sigstore/cosign#442.

I thought it would be more appropriate to continue the discussion for the aforementioned issue, since the spec is non-cosign-related.


With @developer-guy, we are currently trying to determine a vuln scan spec as best we can. We can iterate on the following brand-new attestation spec:

{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "alpine",
      "git_commit": "a1b2c3",
      "digest": {
        "sha256": "c201c331d6142766c866"
      }
    }
  ],
  "predicateType": "SARIF",
  "predicate": {
    "timestamp": "1627564731",
    "owner": {
      "name": "<WHO_RAN_THIS_SCAN>"
    },
    "environment": {
      "name": "GitHub",
      "by": "<PIPELINE_ID>",
      "type": "<CI/CD?> (i.e., GitLab Runner)",
      "cluster": "us-east1-a.prod-cluster",
      "namespace": "<namespace>"
    },
    "success": true,
    "scanners": [
      {
        "name": "trivy",
        "version": "0.19.2",
        "db": {
          "name": "trivydb",
          "version": "v1-2021072912"
        },
        "timestamp": "1627564731",
        "result": "<SARIF RESULT HERE?>"
      },
      {
        "name": "dockle",
        "version": "v0.3.15",
        "timestamp": "1627564731",
        "result": "<SARIF RESULT HERE?>"
      }
    ]
  }
}

We called the predicateType SARIF, but I think that name does not fit this type since the content is not actually in SARIF format. We may have to reconsider the name.

It's obviously a bit hard to settle on best practices while implementing the first version of the spec. It would be great if you maintainers got involved and gave us a hand improving the overall structure, so we can implement the model in the in-toto project to validate and generate the attestation. Does that make sense to you? Thanks! We look forward to your feedback on this.

FYI @dlorenc @NitinJain2 @trishankatdatadog

Proposal: support subjects that have no digests

Currently, an in-toto Attestation subject has the form {name, digest}, where both fields must be present. A name field can have the value "_" when a subject has no meaningful name, but it is currently impossible to specify a subject that has no meaningful digest.

We would like to support subjects of the form {name, uri} for subjects that do not have a meaningful digest. Examples:

  • A subject URI that identifies a builder (example: https://build.example.com/[email protected]). This could be the subject of an attestation that the builder meets a certain SLSA level, and it could be referenced in the builder section of a provenance attestation.
  • A subject URI that identifies a specific revision of a source-code repository (for example, svn+ssh://<host>/<repo-name>/<revision-number>). This subject could be referenced in the materials section of a provenance attestation, and it could match the subject of an attestation that the repository meets a certain SLSA level.

Why not use a content digest?

  • Computing a digest may not be feasible. For example, computing a digest over a (large) source repository at a specific revision number; or over the components that make up a build system stack, from the applications that manage the execution of compilers and other tools, to the operating systems and virtual machines that the software runs on.
  • A digest may not have useful semantics. For example, if a build system's content digest were hypothetically computed over only the builder-specific components of the build system stack, even a trivial software update would change that digest without affecting any of the build system's SLSA security properties, i.e. the things we actually care about.

Proposal

  • In a Statement subject:
    • A subject can have the new form {name, uri}, as an alternative to the existing form {name, digest}.
    • The subject uri field uses resource URI syntax. See "What makes a good subject URI?" below for desirable properties.
    • When searching for an attestation, require an exact match with a subject URI.
    • No change is needed in the subject name field spec. This field may contain "_", or additional information to evaluate the attestation (for example to select between "production" vs "testing").
  • No change is needed in the Provenance predicate spec. The fields of interest, builder and materials, already support a URI without a digest.
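The proposed lookup rule can be sketched as follows. The structure is an assumption based on the proposal text: a subject carries either a digest map or a uri string, and URI subjects match only on exact string equality. The example URI follows the https://build.example.com/worker@<version> pattern from the proposal:

```python
def subject_matches(subject: dict, digest=None, uri=None) -> bool:
    """Match a subject against either a digest map or an exact URI."""
    if uri is not None and "uri" in subject:
        return subject["uri"] == uri  # exact match, per the proposal
    if digest is not None and "digest" in subject:
        # match if any shared algorithm has an equal value
        return any(subject["digest"].get(alg) == val for alg, val in digest.items())
    return False

builder = {"name": "_", "uri": "https://build.example.com/worker@v1"}  # hypothetical subject
print(subject_matches(builder, uri="https://build.example.com/worker@v1"))  # True
print(subject_matches(builder, uri="https://build.example.com/worker@v2"))  # False: no fuzzy match
```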

What makes a good subject URI?

  • A good subject URI has immutable semantics. If a resource is semantically changed, then its subject must also change. For example, a source repository URI svn+ssh://<host>/<repo-name>/<revision-number> has a new revision number after each committed change, and a build system URI https://build.example.com/worker@<version> has a new version after a change in its SLSA security posture.
  • A good subject URI has universal semantics. If a subject has different semantics for different observers, then it is not a good subject.

Potential extensions: subject matching

For now, we require exact matches when searching for an attestation subject URI. In some cases it may be desirable that an attestation can apply to a collection instead of a specific instance. For example, an attestation that all revisions of a source repository meet a certain SLSA level as of some revision number or point in time. This potential extension would require a way to match a specific URI instance in provenance, against a collection (or class) specified in an attestation subject.

Add complete examples

Add complete examples that show all the steps, including the crypto (with dummy public keys checked in).

Authenticated how?

The README says:

This repository defines the in-toto attestation format, which represents authenticated metadata about a set of software artifacts. Attestations are intended for consumption by automated policy engines, such as in-toto and Binary Authorization.

How is the metadata authenticated?

Explain why we recommend PURL and SPDX Download Location

Right now the docs simply recommend PURL and SPDX Download Location without explaining why. We have had a question for the rationale behind this suggestion.

Draft:

  • SPDX Download Location covers version control systems. There is no standard way to reference a git repo via a URI (and identify that it is git). SPDX Download Location is one of the most popular ways of doing so, so we chose that.
  • PURL covers most packaging ecosystems. If one is missing, you can add it. It is really the only universal option. I haven't found any other scheme that does this.
  • Both SPDX and PURL are used extensively within the SBOM ecosystem, which fits nicely with attestations and SLSA.
  • Regular https can be used when it's literally just fetching a file with a GET request.

You can use a different URI scheme if needed. For example, within Google we'll use some internal URIs for systems that are not public. But on the internet, I suspect PURL and SPDX will cover most cases.

How do attestations change what verifiers are expected to support?

An issue raised in the community meeting today is that it's not yet entirely clear how ITE-6 aka attestations change what verifiers are expected to support by default.

For example, Santiago suggested that verifiers should support classic link "attestations" by default, and the community should decide how to add support for the rest.

Make predicateType optional

For the case of plain code signing (e.g. via cosign), there is no predicate type. (More accurately, there is some implicit predicate implied by the public key.) To support such use cases, we should make predicateType optional if predicate is not present.

Provenance: add a policy section

In the provenance spec, add a policy section documenting how the provenance is intended to be consumed. This should aid readability, just as the Processing Model helped with the Envelope/Statement spec.

Only require 1 approver

We keep finding ourselves lacking two separate reviewers for PRs. IMO it's unnecessarily burdensome to require two reviews for every change. Any objections to decreasing this to 1?

Move Provenance to SLSA repo

The current Provenance predicate is described as a generic way to express provenance, but it was designed expressly for SLSA. It makes certain assumptions and trade-offs, such as carefully designing the fields to avoid mistakes when applying a SLSA policy. Other use cases of "provenance" may make different trade-offs, such as including the list of build steps that were performed to allow policies to detect curl | bash, which for SLSA is unnecessary and may lead to confusion.

To avoid these issues, it might be best to move all of the predicates out of this repo and instead maintain an index of links.

  • Provenance -> SLSA
  • Link -> in-toto (a different repo)
  • SPDX -> something maintained by SPDX team

That would make it more clear that (a) other definitions of "provenance" are OK for different use cases, and (b) not all predicates need to be defined in this repo.

Any thoughts? cc @adityasaky @TomHennen @dlorenc

Timestamp somewhere?

We've been discussing support for vulnerability scans as a type of "attestation" over here: sigstore/cosign#442

and it's clear that these will need some form of timestamp to work correctly. A vulnerability scan is timely, and should only be considered valid for a specific period of time after it is generated. This also helps align with the principle of "monotonicity", where the absence of an attestation should never move a decision from DISALLOW to ALLOW.

This could be done with a timestamp inside a custom scan predicate, but it might also be useful to place this at the statement layer. I'm not convinced either way yet.
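The monotonicity property can be made concrete with a validity-window check. This sketch assumes the hypothetical timestamp field (Unix seconds) from the draft scan predicate above; where that field should live is exactly the open question here:

```python
import time

MAX_SCAN_AGE_SECONDS = 7 * 24 * 3600  # policy choice: scans expire after a week

def scan_is_fresh(predicate: dict, now=None) -> bool:
    """Monotonic policy: a missing or stale timestamp means DISALLOW."""
    ts = predicate.get("timestamp")
    if ts is None:
        return False  # absence of evidence never moves the decision toward ALLOW
    if now is None:
        now = time.time()
    return 0 <= now - int(ts) <= MAX_SCAN_AGE_SECONDS

print(scan_is_fresh({"timestamp": "1627564731"}, now=1627564731 + 3600))  # True
print(scan_is_fresh({}, now=1627564731))  # False: no timestamp at all
```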

cc @joshuagl @SantiagoTorres

Provenance: rename `reproducibility`

The name recipe.reproducibility sounds like whether it's a reproducible build or not, rather than the set of builder-controlled values that affected the build.

Maybe recipe.environment?

Provenance: remove `mediaType` and `tags` in favor of extensions

In Provenance, the materials[*].mediaType and materials[*].tags fields are not well defined. It is unclear exactly how they should be used or what the conventional values are. Given that we have a way to add extension fields (#8), let's remove these fields for now until we have a better idea on how to standardize them.

Attestation Context\Meta-data\Meta-information

Following #58 here, opening the mentioned issue.

The statement-level meta-data should hold enough information to enable:

  1. Simple policy decisions that are agnostic to the predicate details.
  2. Enable a first level of indexing of the attestations for later recall.
  3. Enable parsing of the predicate.

The current mandatory fields in the statement level are the subject and the predicate-type, which is, as a matter of fact, the predicate-media-type.

Fields that could be of use at the statement level include:
  • Predicate abstract type - "sbom", "provenance"
  • Predicate media type - the exact format (URI) (for SBOM: SPDX, SPDX-Lite, CycloneDX; for provenance: slsa-provenance)
  • When the attestation was taken: Timestamp #46
  • Where the attestation was taken: location in the pipeline. I suggest an abstract location and a specific location: the abstract context could be a string with recommended values (user workstation, git server, build machine, etc.), and the specific context could be some machine ID.
  • Project ID - could be a URL such as https://github/myproject or simply a string set by the entity creating the attestation. There is a difference between the project ID and the subject; the subject would typically be an artifact, but a project may produce many subjects. One could of course use multiple subject fields (as is supposed to be supported), but that is not natural.
  • An application-specific object field - it is always convenient to have a placeholder for a generic object holding implementation-specific data. As I understand it, this is supposed to be supported (see the parsing rules), but it would be better not to rely on the "undefined" and instead explicitly define an application-specific object placeholder.

Such fields enable elaborate policies at the statement level (for example: require an SBOM produced at build, without caring about the SBOM details), and would enable indexing to support searching attestations: by project, subject, time, part of pipeline, etc.

What are the attestation community thoughts about this?

Predicate-agnostic graph representation

Multiple predicate types may include "links" to other artifacts, which effectively forms a graph. For example, Provenance has materials and a future "PolicyDecision" may have "input policy". Right now, to traverse the graph, one needs to understand the predicate type in order to pull out the graph edges and labels.

Several people, such as @tiziano88 and @SantiagoTorres, have expressed a desire to have a generic mechanism to walk the graph without having to understand the predicate type. Prior discussion: in-toto/ITE#15 (comment)

As explained in that discussion, there are several major open questions that prevent such a common feature from being useful, such as:

  • What is the real-world use case for doing such predicate-agnostic graph traversal?
  • How would the graph traversal work if the size of the graph is intractably large and you don't understand the edge labels? If you trim at, say, depth N, why is that meaningful?
  • What is the abstract model and terminology for how this works?
  • If there are other predicates that don't support this, e.g. SPDX which has its own link convention, is there still value in doing this? In other words, if you're going to have to support multiple predicates anyway, why do we need a standard?

In my opinion, all these details need to be worked out before we add such a feature.

In the meantime, we can suggest a convention based on what the Provenance predicate does. Other predicate types can use the same data structure and perhaps even the same field name.

Where to put implementations?

Does anyone have an opinion on where implementations of the in-toto Attestation should go?

We need to create a Java implementation that handles the 'Provenance' predicate (soon to move to the SLSA repo), in an in-toto Statement (this repo), wrapped in a DSSE envelope.

The existing in-toto-java repo seems to handle the existing link format (but not DSSE, etc...).

FYI @Alos who is looking into implementation options.
