Background
The OSV schema has been adopted by Go, OSV, Python, Rust, and UVI to describe vulnerabilities in open-source software. The OSV schema’s key advantage over the CVE format is that it identifies the specific affected packages and versions in a precise, computable way.
For example, suppose we wanted to check whether a particular software package, as described by an SBOM, made use of any open-source components with known vulnerabilities. An SBOM for a given package ecosystem would be a list of its packages and versions. A tool can test whether each SBOM entry is affected by a database entry written to the OSV schema, without any additional information (such as a version or commit graph or access to the repository containing the source code for the open-source software). This is what we mean when we say the package and version identification is computable.
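A minimal sketch of that check, assuming a deliberately simplified data model: the SBOM is a list of (package, version) pairs, and each database entry carries an explicit list of affected versions. The field names, the entry ID, and the second SBOM item are hypothetical and are not taken from either schema.

```python
# Hypothetical, simplified data: an SBOM as (package, version) pairs and
# database entries that each carry an explicit list of affected versions.
sbom = [("pikepdf", "2.9.1"), ("examplepkg", "1.0.0")]

database = [
    {
        "id": "EXAMPLE-0001",  # made-up identifier, for illustration only
        "package": "pikepdf",
        "versions": ["2.8.0", "2.8.0.post1", "2.8.0.post2",
                     "2.9.0", "2.9.1", "2.9.2"],
    },
]

def scan(sbom, database):
    """Return (package, version, entry id) for every affected SBOM item."""
    findings = []
    for name, version in sbom:
        for entry in database:
            # Only the entry itself is consulted: no repository access,
            # no version graph, no ecosystem-specific tooling.
            if entry["package"] == name and version in entry["versions"]:
                findings.append((name, version, entry["id"]))
    return findings

print(scan(sbom, database))
```

The point of the sketch is that the decision needs nothing beyond the SBOM and the database entry.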
We propose that the new CVE JSON schema be changed to make its package and version identification computable too. This would make it possible for vulnerability-checking tools to check SBOMs against the CVE database as easily as they can currently check SBOMs against OSV-schema databases. Adjusting the CVE JSON schema would also allow OSV-schema databases to embed their information into CVE format, allowing all their vulnerability information to be pushed upstream to the CVE database and then propagated to any CVE-aware software, a net benefit for the entire software ecosystem.
This issue focuses on computable version identification. See issue #86 for computable package identification.
Computable version identification
After identifying that a particular package listed in an SBOM matches a package in a CVE database entry (#NNN), a vulnerability scanner must next identify whether the specific version in the SBOM is considered affected by the CVE. The entry must include self-contained information sufficient to make this decision algorithmically. The current schema does not satisfy this requirement (or else it is unclear how it does).
What is the algorithm for deciding if a version is considered affected? The current spec does not provide details on how to evaluate the rules. To begin with, it is unclear whether the “versions” list must be grouped by “versionGroup” before further processing, so we’ll suppose there is a single group in our examples. It is also unclear which logical operator to apply to the version entries. Issue #12 says that rules should be evaluated with AND, which makes it impossible to list individual versions. For example:
"versions": [
  {"versionAffected": "=", "versionValue": "1.0.0"},
  {"versionAffected": "=", "versionValue": "1.1.0"}
]
The explanation in #12 is that this means “version = 1.0.0 AND version = 1.1.0”, which doesn’t match any version at all.
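To make the consequence concrete, here is a small sketch of an evaluator that applies the AND reading described in #12. The function name `matches_all` is hypothetical, and only the "=" operator is implemented, since that is all this example needs.

```python
def matches_all(version, rules):
    """Hypothetical evaluator: every rule in the list must hold (AND)."""
    ops = {"=": lambda v, x: v == x}  # only "=" is needed for this example
    return all(ops[r["versionAffected"]](version, r["versionValue"])
               for r in rules)

rules = [
    {"versionAffected": "=", "versionValue": "1.0.0"},
    {"versionAffected": "=", "versionValue": "1.1.0"},
]

# Neither of the two versions the author meant to list is matched.
affected = [v for v in ["1.0.0", "1.1.0"] if matches_all(v, rules)]
print(affected)  # []
```

No version string can equal both values at once, so under AND the list can never match anything.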
According to the answer in #12, expressing multiple disjoint ranges of versions is also not possible. For example:
"versions": [
  {"versionAffected": ">=", "versionValue": "1.0.0"},
  {"versionAffected": "<", "versionValue": "1.2.0"},
  {"versionAffected": ">=", "versionValue": "1.5.0"},
  {"versionAffected": "<", "versionValue": "1.6.0"}
]
Here it seems clear the intended interpretation would be
(version >= 1.0.0 AND version < 1.2.0) OR (version >= 1.5.0 AND version < 1.6.0),
but there is no obvious way to encode this. Using the ! operators would not help either: no boolean normal form uses only one logical operator (that is, only AND, or only OR).
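The sketch below evaluates the four rules from the disjoint-range example three ways: all rules ANDed, all rules ORed, and the intended OR of two AND-ed pairs. It assumes plain dotted numeric versions, and the pairing of consecutive rules is a guess at the author's intent, not something the current schema defines.

```python
import operator

def parse(v):
    # Assumption: plain dotted numeric versions such as "1.5.2".
    return tuple(int(p) for p in v.split("."))

OPS = {">=": operator.ge, "<": operator.lt}

rules = [
    {"versionAffected": ">=", "versionValue": "1.0.0"},
    {"versionAffected": "<", "versionValue": "1.2.0"},
    {"versionAffected": ">=", "versionValue": "1.5.0"},
    {"versionAffected": "<", "versionValue": "1.6.0"},
]

def holds(version, rule):
    op = OPS[rule["versionAffected"]]
    return op(parse(version), parse(rule["versionValue"]))

def and_all(version):
    return all(holds(version, r) for r in rules)

def or_all(version):
    return any(holds(version, r) for r in rules)

def or_of_pairs(version):
    # Hypothetical intended reading: consecutive (>=, <) pairs form ranges,
    # and the ranges are ORed together.
    pairs = [rules[i:i + 2] for i in range(0, len(rules), 2)]
    return any(all(holds(version, r) for r in pair) for pair in pairs)

for v in ["1.1.0", "1.3.0", "1.5.2"]:
    print(v, and_all(v), or_all(v), or_of_pairs(v))
# 1.1.0 False True True
# 1.3.0 False True False
# 1.5.2 False True True
```

Pure AND rejects every version and pure OR accepts the unaffected 1.3.0. Only the mixed form gives the intended answer.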
A second, related problem with the current schema is that even the definitions of operators like “>=” are not algorithmically precise. Clearly these are not string comparisons: 1.2.0 < 1.10.0. But neither are they simple element-wise comparisons: in packagers using Semver, 1.2.0 > 1.2.0-alpha. In Maven, even the alphabetic parts do not compare with strict regularity. In particular, this ordering applies:
"alpha" < "beta" < "milestone" < "rc" = "cr" < "snapshot" < "" = "final" = "ga" < "sp"
An operator like “>=” cannot be applied without reference to a particular version ordering algorithm, and the CVE schema omits that information.
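As an illustration, the sketch below compares the same version strings under three orderings: raw string comparison, element-wise numeric comparison, and a simplified Semver precedence key. The helper names are hypothetical, and `semver_key` is a reduced reading of the Semver 2.0.0 precedence rules, not a validating parser. Maven's ordering is not covered.

```python
def numeric_key(v):
    # Element-wise numeric ordering; assumes dotted integers only.
    return tuple(int(p) for p in v.split("."))

def semver_key(v):
    """Simplified Semver precedence key (hypothetical helper).

    Build metadata is ignored; a version without a pre-release sorts after
    the same version with one; numeric pre-release identifiers sort before
    alphanumeric ones.
    """
    v = v.split("+", 1)[0]
    core, _, pre = v.partition("-")
    nums = tuple(int(p) for p in core.split("."))
    if not pre:
        return (nums, 1, ())
    ids = tuple((0, int(p), "") if p.isdigit() else (1, 0, p)
                for p in pre.split("."))
    return (nums, 0, ids)

# String comparison gets multi-digit components wrong.
print("1.2.0" < "1.10.0")                            # False
print(numeric_key("1.2.0") < numeric_key("1.10.0"))  # True

# Element-wise numeric comparison cannot handle pre-releases at all,
# while the Semver key orders them before the release.
print(semver_key("1.2.0") > semver_key("1.2.0-alpha"))  # True
```

The same operator gives different answers depending on which ordering is assumed, which is why an entry has to name its ordering.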
The different operator variants are also confusing. For example, is there any difference between these two?
"versions": [
  {"versionAffected": ">=", "versionValue": "1.0.0"},
  {"versionAffected": "<", "versionValue": "1.2.0"}
]

"versions": [
  {"versionAffected": ">=", "versionValue": "1.0.0"},
  {"versionAffected": "!>=", "versionValue": "1.2.0"}
]
Or is this one any different from those two?
"versions": [
  {"versionAffected": ">=", "versionValue": "1.0.0"},
  {"versionAffected": "<", "versionValue": "1.2.0"},
  {"versionAffected": "!>=", "versionValue": "1.3.0"}
]
The result of “is this version affected?” should be a boolean yes/no, or at worst yes/no/maybe, but the current operators allow yes/no/maybe/undocumented, with no guidance as to what CVEs should do. Should tools treat “no” differently from “undocumented”? Is it a best practice to document all the negative ranges too? Why?
The CVE schema needs to address these deficiencies so that tools have clear algorithms for deciding whether a particular version is affected by a particular CVE.
OSV’s solution
The OSV schema addresses all these ambiguities as follows, and we suggest that CVE adopt its basic ideas. This is not the only possible solution, but we believe it is a good one.
The OSV schema supports both an enumeration of specific affected versions and an enumeration of specific affected ranges. The set of affected versions is the OR of the entries in these lists - there is never an AND.
A range specifies a contiguous range of versions according to some defined version ordering. Today, those are “SEMVER” (preferred), “GIT”, and “ECOSYSTEM”. The “GIT” and “ECOSYSTEM” (meaning “packager-defined ordering”) range types are not directly understandable by general-purpose tools; such ranges are extra information understandable only by special-purpose tools. Each entry is required to ensure that every affected version is either listed in the explicit enumeration or covered by a “SEMVER” range, both of which can be processed by standard, packager-independent algorithms.
Each range is an object with three fields: type (the ordering), introduced, and fixed. The affected versions are those >= introduced and < fixed. If either introduced or fixed is omitted, that end of the range is left open.
For packagers that use Semver ordering, such as Go, NPM, and Rust, it suffices to specify only ranges:
"affects": {
  "ranges": [
    {"type": "SEMVER", "introduced": "1.0.0", "fixed": "1.14.14"},
    {"type": "SEMVER", "introduced": "1.15.0", "fixed": "1.15.17"}
  ]
}
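A sketch of how a general-purpose tool might evaluate the ranges above: a version is affected if it falls in any “SEMVER” range, where each range is half-open (>= introduced, < fixed) and a missing bound leaves that end open. The helpers `semver_key`, `in_range`, and `is_affected` are hypothetical names, and the ordering key is a simplified reading of Semver precedence rather than a full parser.

```python
def semver_key(v):
    # Simplified Semver precedence: build metadata ignored, a release sorts
    # after its pre-releases, numeric identifiers before alphanumeric ones.
    v = v.split("+", 1)[0]
    core, _, pre = v.partition("-")
    nums = tuple(int(p) for p in core.split("."))
    if not pre:
        return (nums, 1, ())
    ids = tuple((0, int(p), "") if p.isdigit() else (1, 0, p)
                for p in pre.split("."))
    return (nums, 0, ids)

def in_range(version, rng):
    """True if introduced <= version < fixed; a missing bound is open."""
    v = semver_key(version)
    introduced, fixed = rng.get("introduced"), rng.get("fixed")
    if introduced is not None and v < semver_key(introduced):
        return False
    if fixed is not None and v >= semver_key(fixed):
        return False
    return True

def is_affected(version, affects):
    # The ranges are combined with OR; there is never an AND.
    return any(in_range(version, r)
               for r in affects.get("ranges", [])
               if r["type"] == "SEMVER")

affects = {
    "ranges": [
        {"type": "SEMVER", "introduced": "1.0.0", "fixed": "1.14.14"},
        {"type": "SEMVER", "introduced": "1.15.0", "fixed": "1.15.17"},
    ]
}

for v in ["1.14.13", "1.14.14", "1.15.3", "1.15.17"]:
    print(v, is_affected(v, affects))
# 1.14.13 True
# 1.14.14 False
# 1.15.3 True
# 1.15.17 False
```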
For packagers that use other orderings, a packager-specific range can be listed, but the packager’s own vulnerability database tooling must “compile out” the range into an explicit list as well, for consumption by general-purpose tools, as in this Python example:
"affects": {
  "ranges": [
    {
      "type": "GIT",
      "repo": "https://github.com/pikepdf/pikepdf",
      "fixed": "3f38f73218e5e782fe411ccbb3b44a793c0b343a"
    },
    {
      "type": "ECOSYSTEM",
      "introduced": "2.8.0",
      "fixed": "2.10.0"
    }
  ],
  "versions": [
    "2.8.0", "2.8.0.post1", "2.8.0.post2", "2.9.0", "2.9.1", "2.9.2"
  ]
}
(The “GIT” range has an additional field “repo” to specify the URL of the source repository containing the given commits.)
The “versions” list specifies the same versions as in the “ECOSYSTEM” range, just in a more accessible way. General-purpose tooling would ignore the “GIT” and “ECOSYSTEM” ranges, relying instead on the “versions” list in this case.
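A sketch of that behavior, using the pikepdf entry above. The function name is hypothetical. It assumes the entry has no “SEMVER” ranges (a fuller tool would test those too) and that version strings are compared exactly, with no normalization.

```python
affects = {
    "ranges": [
        {
            "type": "GIT",
            "repo": "https://github.com/pikepdf/pikepdf",
            "fixed": "3f38f73218e5e782fe411ccbb3b44a793c0b343a",
        },
        {"type": "ECOSYSTEM", "introduced": "2.8.0", "fixed": "2.10.0"},
    ],
    "versions": [
        "2.8.0", "2.8.0.post1", "2.8.0.post2", "2.9.0", "2.9.1", "2.9.2",
    ],
}

def is_affected_generic(version, affects):
    """Decide using only packager-independent information.

    GIT and ECOSYSTEM ranges are never consulted here, because ordering
    them needs a commit graph or packager-specific rules. Only the
    explicit "versions" list is used.
    """
    return version in affects.get("versions", [])

print(is_affected_generic("2.8.0.post1", affects))  # True
print(is_affected_generic("2.10.0", affects))       # False
```

Note that the tool never has to know how Python orders "2.8.0.post1" relative to "2.9.0"; the database tooling has already done that work.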
Potential CVE adaptation
We propose to change the current version schema from:
"versions": [{
  "versionGroup": string,
  "versionValue": string,
  "versionAffected": string,
  "platforms": [string],
  "references": [...]
}],
to:
"versions": [{
  "list": [string],
  "range": {
    "type": string, // semver, git, or packager
    "fixed": string,
    "introduced": string,
    "repo": string // for type git only
  },
  "unsure": bool,
  "platforms": [string],
  "references": [...]
}],
The only combining operator is OR, making the algorithm for matching much clearer. A particular version would be considered affected if it is matched by any of the entries in the overall “versions” object list. A version is matched by an entry if it appears directly in the “list” or if it is in the “range”. This structure allows non-standard ranges to include their version lists in the same object, which is an improvement over the OSV schema, and it allows a particular range or list to be qualified by a “platforms” list as well.
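The matching rule just described can be sketched as follows. This is one possible reading, not a reference implementation. It assumes plain dotted numeric versions for the semver range, skips git and packager ranges as a general-purpose tool would, and ignores the “platforms” and “unsure” fields. All function names are hypothetical.

```python
def key(v):
    # Assumption: plain dotted numeric versions; no pre-release handling.
    return tuple(int(p) for p in v.split("."))

def entry_matches(version, entry):
    """A version matches an entry if it is in "list" or inside "range"."""
    if version in entry.get("list", []):
        return True
    rng = entry.get("range")
    if rng and rng.get("type") == "semver":
        v = key(version)
        lo, hi = rng.get("introduced"), rng.get("fixed")
        return (lo is None or v >= key(lo)) and (hi is None or v < key(hi))
    return False  # git and packager ranges are left to special-purpose tools

def is_affected(version, versions):
    # The only combining operator is OR.
    return any(entry_matches(version, e) for e in versions)

# The two problem cases from earlier, restated in the proposed form.
individual = [{"list": ["1.0.0", "1.1.0"]}]
disjoint = [
    {"range": {"type": "semver", "introduced": "1.0.0", "fixed": "1.2.0"}},
    {"range": {"type": "semver", "introduced": "1.5.0", "fixed": "1.6.0"}},
]

print(is_affected("1.1.0", individual))  # True
print(is_affected("1.3.0", disjoint))    # False
print(is_affected("1.5.2", disjoint))    # True
```

Both cases that were inexpressible under AND, individual versions and disjoint ranges, fall out of the single OR rule.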
The “unsure” entry allows a range or list to be marked as unsure, equivalent to using the current ?>= etc. operators.
The current !>= etc. operators are removed: to say that a version is unaffected, leave it unlisted.
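With “unsure” in place and the negative operators gone, the answer to “is this version affected?” reduces to yes, maybe, or no. The sketch below shows one way a tool could compute it. The rule that a match in a sure entry outranks a match in an unsure one is an assumption, not part of the proposal, and for brevity only the “list” field is consulted.

```python
def affected_status(version, versions):
    """Return "yes", "maybe", or "no" for a version.

    Assumption: a match in a sure entry outranks a match in an unsure one.
    An unlisted version is simply "no"; there is no separate
    "undocumented" outcome.
    """
    status = "no"
    for entry in versions:
        if version in entry.get("list", []):
            if not entry.get("unsure", False):
                return "yes"
            status = "maybe"
    return status

# Hypothetical entries for illustration.
versions = [
    {"list": ["1.0.0", "1.1.0"]},
    {"list": ["0.9.0", "1.1.0"], "unsure": True},
]

for v in ["1.0.0", "0.9.0", "1.1.0", "2.0.0"]:
    print(v, affected_status(v, versions))
# 1.0.0 yes
# 0.9.0 maybe
# 1.1.0 yes
# 2.0.0 no
```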