cveproject / cve-schema

This repository is used for the development of the CVE JSON record format. Releases of the CVE JSON record format will also be published here. This repository is managed by the CVE Quality Working Group.

License: Creative Commons Zero v1.0 Universal

JavaScript 2.44% HTML 85.44% CSS 0.95% Python 10.75% Shell 0.05% Perl 0.37%

cve-schema's Introduction

Current Version of CVE Record Format

Major changes to the cve-schema repo architecture! If you have integrations that rely on the cve-schema repo structure, please review the changes here. The latest version of the CVE JSON record format is 5.1.0. A single schema file with bundled dependencies is available here.

Note: The ADP functionality in the current schema is not yet deployed in CVE Services. The ADP functionality is currently under development and is for future use.

Note: Please refer to the CVE Services page here for known issues with the schema.

CVE Record Format Overview

cve-schema specifies the CVE record format. This is the blueprint for a rich set of JSON data that can be submitted by CVE Numbering Authorities (CNAs) and Authorized Data Publishers (ADPs) to describe a CVE record. Some examples of CVE record data include the CVE ID number, affected product(s), affected version(s), and public references. While those specific items are required when assigning a CVE, there are many other optional fields in the schema that can be used to enrich CVE records for community benefit.

Learn

Learn more about the CVE program at: https://www.cve.org/

This CVE record format is defined using JSON Schema. Learn more about JSON Schema at: https://json-schema.org/

Latest

The latest version of the record format is 5.1.0. It is specified in the JSON schema at https://github.com/CVEProject/cve-schema/blob/master/schema/CVE_Record_Format.json

A single schema file with bundled dependencies is at https://github.com/CVEProject/cve-schema/blob/master/schema/docs/CVE_Record_Format_bundled.json

Documentation and Guidance

Documentation about this format is available at https://cveproject.github.io/cve-schema/schema/docs/

A mindmap version of the CVE record structure is at https://cveproject.github.io/cve-schema/schema/docs/mindmap.html

More details about Product and Version Encodings in CVE JSON 5.1.0 records are available at https://github.com/CVEProject/cve-schema/blob/master/schema/docs/versions.md

Examples

A basic example of a full record in 5.1.0 format with minimally required fields is available at https://github.com/cveproject/cve-schema/blob/master/schema/docs/full-record-basic-example.json

An advanced example of a full record in 5.1.0 format is available at https://github.com/cveproject/cve-schema/blob/master/schema/docs/full-record-advanced-example.json

A basic example of a cnaContainer, to be used with CVE Services, is available at https://github.com/cveproject/cve-schema/blob/master/schema/docs/cnaContainer-basic-example.json

An advanced example of a cnaContainer, to be used with CVE Services, is available at https://github.com/cveproject/cve-schema/blob/master/schema/docs/cnaContainer-advanced-example.json


cve-schema's Issues

is vendor == CNA? Capture a way to say a product is from a different origin/source/upstream

create a new tag?

product tags? type of product?

affected vendors/affectsCpe could cause ambiguity

I am not sure of all the semantics behind choosing one of affected.affectsCpe, affected.vendors, or affected.affectsSwid, but currently the schema allows all three to be submitted. However, affected.vendors can itself contain multiple products, each of which could have separate CPEs. Can/should we move affectsCpe elsewhere?

[Question] Why is problemTypes.descriptions an array of objects?

According to the problemTypes description:

This is problem type information (e.g. CWE identifier). Must contain: At least one entry, can be text, OWASP, CWE, please note that while only one is required you can use more than one (or indeed all three) as long as they are correct). (CNA requirement: [PROBLEMTYPE])

This makes sense, for the most part. What we don't understand is why problemTypes.descriptions is also an array of objects. Why would it be necessary to specify multiple problemTypes items which each have multiple descriptions items? It seems that it would make more sense to have one descriptions item per problemTypes item, since we already have the ability to specify multiple problemTypes items.

[Question] Should we use REJECTED instead of REJECT?

Throughout the schema, the "REJECT" keyword is used alongside the RESERVED keyword. Should we change REJECT to REJECTED, or RESERVED to RESERVE, for consistency's sake?

v4.0 DRAFT mentions workaround, but cvelist uses work_around

DRAFT-JSON-file-format-v4 mentions a workaround JSON container, but cvelist uses "work_around".

Remove MITRE reference from dataFormat

The header of the current 5.0 draft looks like this:

{
    "dataType": "CVE",
    "dataFormat": "MITRE",
    "dataVersion": "5.0",
    "cveDataMeta": {},
    "containers": {}
}

The dataFormat is required to be MITRE. This should be changed to CVE_RECORD or something similar.

Add missing descriptions.

Currently marked with TODO

Proposal for cleaning up supporting tooling in this repo

This repo contains a lot of random scripts and supporting tooling that doesn't seem to have been updated as the schema evolves. So the following is a series of questions and/or proposals for what to do with these files.

Here is the current list:

cve-schema $ tree schema/v5.0/support/ | head -n -2
schema/v5.0/support/
├── CVE_4_to_5_converter
│   ├── 2020run.txt
│   ├── all22years.txt
│   └── cve4to5up.py
├── Node_Validator
│   ├── cvss-v2.0.js
│   ├── cvss-v3.0.js
│   ├── cvss-v3.1.js
│   ├── jsonSchema.js
│   ├── JsonValidator.js
│   ├── package.json
│   ├── package-lock.json
│   └── README.md
└── Python3.x_Validator
    ├── cvss-v2.0.json
    ├── cvss-v3.0.json
    ├── cvss-v3.1.json
    └── D7Validator.py

cve-schema $ tree tools | head -n -2
tools
├── cmdlinejsonvalidator.py
├── cna-assignment-info-to-json.pl
├── McAfee PSIRT Assigned CVEs Spreadsheet - 22 Dec 2016.xlsx
└── mitre-cna-assignment-info.js

CVE_4_to_5_converter: I assume this hasn't been updated with new schema changes so probably does not work very well. Do we still want to keep it? Should an issue be filed to update it? Or is MITRE working on a different version internally to do this in cve-services? What is the purpose of the .txt files? Can they be removed?
Node_Validator: the core of this script is a duplicate (and now outdated) version of the CVE schema in jsonSchema.js and duplicate CVSS schema copies. The validation itself uses the jsonschema dependency to validate a specific CVE record against the CVE schema. At best, we should update this to use schema files in this repo, but I would probably just recommend removing this piece altogether and have a single Python script for the purpose of validation.
Python3.x_Validator: this looks to be an inferior version of tools/cmdlinejsonvalidator.py, so I'd recommend removing it and improving the version in tools/. It too includes duplicate copies of the CVSS schema.
tools/cmdlinejsonvalidator.py: rename to cve_schema_lint.py (other suggestions welcome) and refactor it to validate files against any schema in the repo.
tools/cna-assignment-info-to-json.pl: I'm not sure what the purpose of this script is, it expects some values as input but I'm not sure what those values are. It supposedly outputs v4.0 schema records. Unless someone finds this useful, I'd remove it. If we need to keep it, I would propose moving it under schema/v4.0/tools to indicate it's 4.0-specific.
tools/McAfee PSIRT Assigned CVEs Spreadsheet - 22 Dec 2016.xlsx: unsure why this is included under tools/. Remove?
tools/mitre-cna-assignment-info.js: this is another form-based generator for v4.0 records. Move under schema/v4.0/tools?

Lastly, this repo desperately needs a GitHub action to validate each schema on every change, along with a restriction on merging PRs that don't pass validation. We could also add a couple of example CVE records under examples/ for each schema version to validate against. This has the added benefit of requiring a real-world example to be updated alongside each schema change.

If no one else volunteers, I'd be happy to make these changes and submit a PR after they are agreed to.
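As a starting point for such an action, a CI step could first confirm that every schema and example file parses as JSON at all, before running full schema validation. The stdlib-only Python below is a minimal sketch under assumptions: the function names `find_invalid_json_files` and `main`, the default `schema` directory, and the file patterns are hypothetical, not existing repo tooling.

```python
import json
from pathlib import Path

# Hypothetical file patterns: the v5.0 draft used a ".schema" extension alongside ".json".
PATTERNS = ("*.json", "*.schema")


def find_invalid_json_files(root: str) -> list[tuple[str, str]]:
    """Return (path, error) for every matching file under root that is not valid JSON."""
    failures = []
    paths = sorted({p for pattern in PATTERNS for p in Path(root).rglob(pattern)})
    for path in paths:
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            # lineno/colno point at the offending spot, e.g. a missing comma
            failures.append((str(path), f"line {exc.lineno}, column {exc.colno}: {exc.msg}"))
    return failures


def main(root: str = "schema") -> int:
    """Print each failure and return a process exit status (non-zero fails the CI job)."""
    failures = find_invalid_json_files(root)
    for path, error in failures:
        print(f"{path}: {error}")
    return 1 if failures else 0
```

A workflow step could call `main()` and exit with its return value. Parsing alone does not catch schema violations; a full JSON Schema validator run against the example records would still be needed on top of this.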

supportingMedia improve descriptions, encoding

supportingMedia: explain type and the encoding of value.
Change encoding to an enum of [base64, utf8].

Allow supporting media for description-like attributes

#/definitions/descriptions defines an array of items that have properties of lang, value, and supportingMedia that allows the representation of value in additional formats. The exploits, workarounds, and mitigations definitions are also defined as arrays with attributes lang and value. If we extracted the definition of a single description into its own definition, we could use it as a ref for the list of items for all three of the aforementioned attributes. References already follow this pattern:

"references": {
    "type": "array",
    "description": "<description truncated>",
    "items": {
        "$ref": "#/definitions/reference"
    },
    "minItems": 1,
    "maxItems": 500,
    "uniqueItems": true
},
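To picture the proposal, define a single description object once and point the items of all four arrays at it with a local `$ref`. The Python sketch below models a hypothetical `#/definitions/description` (the name and its properties are assumptions based on the lang/value/supportingMedia attributes described above) and a minimal resolver for same-document JSON Pointer references (RFC 6901). It confirms that exploits, workarounds, and mitigations would all resolve to the same definition.

```python
from typing import Any

# Hypothetical schema fragment: one "description" definition reused via "$ref".
SCHEMA: dict[str, Any] = {
    "definitions": {
        "description": {
            "type": "object",
            "properties": {
                "lang": {"type": "string"},
                "value": {"type": "string"},
                "supportingMedia": {"type": "array"},
            },
            "required": ["lang", "value"],
        },
        "descriptions": {"type": "array", "items": {"$ref": "#/definitions/description"}},
        "exploits": {"type": "array", "items": {"$ref": "#/definitions/description"}},
        "workarounds": {"type": "array", "items": {"$ref": "#/definitions/description"}},
        "mitigations": {"type": "array", "items": {"$ref": "#/definitions/description"}},
    }
}


def resolve_local_ref(schema: dict[str, Any], ref: str) -> Any:
    """Follow a same-document "$ref" such as "#/definitions/description".

    The fragment is treated as a JSON Pointer (RFC 6901); percent-encoded
    characters are not decoded in this sketch.
    """
    if not ref.startswith("#"):
        raise ValueError(f"only same-document references are supported: {ref!r}")
    pointer = ref[1:]
    if pointer == "":
        return schema
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer fragment: {ref!r}")
    node: Any = schema
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 decoding order
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node
```

With this layout, a change to the description object (for example, adding supportingMedia) applies to all four attributes at once, the same way references already share #/definitions/reference.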

[Question] What tag do we use in a reference object to a CVE page?

A pretty common use case for the #/definitions/reference definition's tags array attribute will be to link to a CVE web page. Of the existing tags:

"examples": [
    "Broken Link",
    "Exploit",
    "Issue Tracking",
    "Mailing List",
    "Mitigation",
    "Not Applicable",
    "Patch",
    "Permissions Required",
    "Press/Media Coverage",
    "Product",
    "Release Notes",
    "Technical Description",
    "Third Party Advisory",
    "Tool Signature",
    "VDB Entry",
    "Vendor Advisory"
]

There doesn't seem to be a good choice for that, though. We considered Vendor Advisory, but that might link to a different page that displays the advisory fixing the CVE. As an example, for CVE-2018-1111, we might want to refer to:

https://access.redhat.com/security/cve/cve-2018-1111

while this:

https://access.redhat.com/errata/RHSA-2018:1453

would be a reference with tag Vendor Advisory. Do we just make up a tag for the former CVE page? Technical Description is also not exactly accurate since our CVE pages are more of executive summaries with technical details included in the related Bugzilla bug or in advisories.

It's also not exactly clear what some of these tags are supposed to represent. What does Product mean? What kind of a reference is Not Applicable (might as well not list it)?

Schema uses DataMeta across many properties instead of MetaData

Properties are labeled as XXXDataMetaXXX, which appears to be non-standard phrasing.
Request relabeling to "MetaData" to avoid possible confusion.

Occurrences in cve-schema/schema/v5.0/CVE_JSON_5.0.schema at commit d493c6b:

Line 279: "cveDataMetaPublic": {
Line 343: "cveDataMetaReserved": {
Line 379: "cveDataMetaReject": {
Line 412: "providerDataMeta": {
Lines 436 to 437: "providerDataMeta": { "$ref": "#/definitions/providerDataMeta"
Line 477: "providerDataMeta",
Lines 491 to 492: "providerDataMeta": { "$ref": "#/definitions/providerDataMeta"
Line 532: "providerDataMeta"
Lines 1010 to 1011: "cveDataMeta": { "$ref": "#/definitions/cveDataMetaPublic"
Line 1021: "cveDataMeta",
Lines 1037 to 1038: "cveDataMeta": { "$ref": "#/definitions/cveDataMetaReserved"
Line 1048: "cveDataMeta"
Lines 1063 to 1064: "cveDataMeta": { "$ref": "#/definitions/cveDataMetaReject"
Line 1074: "cveDataMeta"

v4.0 DRAFT mentions both !>= and !=> version_affected operators

I noticed that the version_affected section in the v4.0 DRAFT CVE JSON Schema document mentions the !=> operator but lists the !>= operator in its table. Additionally, some CVE files contain the !=> operator (ex: 2018/6xxx/CVE-2018-6339.json), while others contain the !>= operator (ex: 2019/1xxx/CVE-2019-1573.json). Is this a typo, or is !=> not fully documented?

Add `format` constraint to URIs and dates where appropriate

There was consensus on previous QWG calls to add format constraints to URI and date properties. Need to review the 5.0 format to make sure all appropriate properties have been updated.
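To illustrate what these constraints would check, here is a rough Python approximation of `date-time` and URL validation. Both helpers are hypothetical: `is_date_time` only accepts the `YYYY-MM-DDTHH:MM:SS[.fraction](Z|+HH:MM|-HH:MM)` shape, and `is_http_url` is deliberately narrower than JSON Schema's `uri` format, which accepts any absolute URI. Note also that many JSON Schema validators only enforce `format` when that is explicitly enabled.

```python
import re
from datetime import datetime
from urllib.parse import urlparse

# Approximates the RFC 3339 date-time shape that JSON Schema's "date-time" format refers to.
_DATE_TIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})")


def is_date_time(value: str) -> bool:
    """Check the shape, then let strptime reject impossible dates such as 2021-02-30."""
    if not _DATE_TIME.fullmatch(value):
        return False
    try:
        datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True


def is_http_url(value: str) -> bool:
    """Accept absolute http(s) URLs with a host, such as reference URLs."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```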

Create a tag registry as a simple JSON file with list of tag values in an array

Add tag schema definitions for:
(1) ADP
(2) CNA

For example:

[
    "unsupported-when-assigned",
    "exclusively-hosted-service",
    "disputed"
]
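A registry in that shape is easy to consume. The sketch below assumes a hypothetical registry file holding a plain JSON array of strings (the three values are the ones from the example) and a hypothetical `tags` array on a CNA or ADP container; it rejects malformed registries and reports tags that are not registered.

```python
import json

# Hypothetical registry file contents: a plain JSON array of allowed tag values.
REGISTRY_JSON = '["unsupported-when-assigned", "exclusively-hosted-service", "disputed"]'


def load_tag_registry(text: str) -> frozenset[str]:
    """Parse the registry and reject anything that is not a duplicate-free array of strings."""
    tags = json.loads(text)
    if not isinstance(tags, list) or not all(isinstance(tag, str) for tag in tags):
        raise ValueError("tag registry must be a JSON array of strings")
    if len(set(tags)) != len(tags):
        raise ValueError("tag registry contains duplicate values")
    return frozenset(tags)


def unknown_tags(container: dict, registry: frozenset[str]) -> list[str]:
    """Return tags in a container's (hypothetical) "tags" array that the registry does not list."""
    return [tag for tag in container.get("tags", []) if tag not in registry]
```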

Create a tag registry for URL/reference tags

Product Object needs clarification and potential changes

We decided to test out the current v5.0 schema in a practical application with a real CVE, and we discovered that the product object's properties are at least confusing, and may need to be refactored.

Let's use an example. Let's say we are shipping package dnsmasq in product Red Hat Enterprise Linux 7. Looking at the current schema, the immediately obvious thing to do would be to provide Red Hat Enterprise Linux 7 as the productName, as indeed, it is the product's name. However, moving on to the other properties is where the confusion begins.

The modules array has an item description of:

Name of the affected component, feature, module, sub-component, sub-product, API, command, utility, program, or functionality (optional).

programRoutines is described as:

Name of the affected source code file, function, method, subroutine, or procedure (optional).

There is potentially some ambiguity there with modules, because "functionality" and "API" are listed in modules and "function" is listed in programRoutines.

Moving on, as we understand it, packageName is supposed to be the name of the package only if it differs from the productName. IIRC, the intention was that if, for example, the CVE was in the tool opj_compress but that tool lived in https://github.com/uclouvain/openjpeg, we could set productName to opj_compress and packageName to openjpeg. The packageName would then route correctly, via the collectionUrl, to the location of the product even if the two names are different.

Going back to my original example, this creates a lot of confusion and I'm not sure how we could use this in practice. By way of exercise:

Our product is Red Hat Enterprise Linux 7. This would mean that in the dnsmasq example, dnsmasq must be listed as a module of our product. The issue with this is that the collectionUrl and packageName for the dnsmasq package would be separate from the productName. However, IIUC, as mentioned above, packageName is supposed to map to the product itself. So with those semantics, I would expect the collectionUrl + packageName for RHEL7 to lead me to the download page for the entire RHEL7 image. Finally, modules is an array, which means it cannot be paired with packageName, which is just a string. There would be no way to map modules to their packages in a repository, and therefore modules is not suitable for this type of usage.

For example, the RHEL7 dnsmasq package would be at https://access.redhat.com/downloads/content/dnsmasq/2.76-17.el7_9.1/x86_64/fd431d51/package

Additionally, we would need to represent both the Red Hat Enterprise Linux product version and the dnsmasq package version. How would this be represented in the current product schema? It's not clear. In our example, we actually put "Red Hat Enterprise Linux 7" in the productName property. However, there is also the versions property, and we are not sure whether it corresponds to the entity named in productName or to the module. And since modules is an array, this gets even more confusing and is likely to be misinterpreted or implemented incorrectly.

Having said all of this, I propose the following:

(1) Disambiguate the descriptions of the productName, modules, and programRoutines items.
(2) Determine whether versions applies to the entity named in productName or to the module.
(3) Depending on (2), provide a way to specify both a product version and a module version. If modules is going to be an array, maybe the versions should be defined once and then referenced both there and at the product level?

There may be even more complex chains of products that would need to be represented here. For example, one can install gcc-toolset 10, which contains the binutils package, inside of Red Hat Enterprise Linux 8. How would this be represented? Now we have a product and its version, a module and its version, and finally a package and its version, and so on.

Since modules is an array, we could put gcc-toolset-10 in it, along with binutils, and hopefully corresponding versions, but there may be a better approach.

Descriptions for dateRequested and dateAssigned need clarification.

Many of the dates in the cveMetadataPublished section are ambiguous.

dateRequested says "the date/time this issue was requested." Does this mean it is the date the CVE ID was reserved or is there some other request involved?

dateAssigned says "the date/time this was assigned." What is "this?" Is it the CVE record or the ID? What is it assigned to?

Review CVE Record 5.0 fields to ensure maximum length constraints are sufficiently enforced

Currently, some fields (e.g., description/value) have no maximum length constraints. All fields should be reviewed to ensure that reasonable length constraints are enforced.
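Part of this review could be automated. The Python sketch below walks a loaded schema and lists every node whose `type` is (or includes) `"string"` but which has no `maxLength`; nodes constrained by `enum` or `const` are skipped because their values are already bounded. The function names and skip rules are assumptions, and the output is a to-do list for manual review rather than a verdict.

```python
import json
from typing import Any, Iterator


def strings_without_max_length(node: Any, path: str = "#") -> Iterator[str]:
    """Yield JSON Pointer paths of string-typed schema nodes that lack "maxLength"."""
    if isinstance(node, dict):
        types = node.get("type")
        is_string = types == "string" or (isinstance(types, list) and "string" in types)
        # enum/const values are already bounded, so they are not reported
        if is_string and not (node.keys() & {"maxLength", "enum", "const"}):
            yield path
        for key, child in node.items():
            token = key.replace("~", "~0").replace("/", "~1")  # RFC 6901 escaping
            yield from strings_without_max_length(child, f"{path}/{token}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from strings_without_max_length(child, f"{path}/{index}")


def report(schema_path: str) -> list[str]:
    """Load a schema file (path supplied by the caller) and list candidates for review."""
    with open(schema_path, encoding="utf-8") as handle:
        return list(strings_without_max_length(json.load(handle)))
```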

[Question] What is the correct way to specify the vendor's security impact rating?

Red Hat uses the Low/Moderate/Important/Critical impact scale for rating the severity of every security issue (see Understanding Red Hat security ratings). We'd like to include this information in the CVE record but are unclear where to best include this.

Do we include it in impacts as:

"impacts": [
  {
    "descriptions": [
      {
        "lang": "en",
        "value": "Moderate"
      }
    ]
  }
],

But that provides absolutely no context around the value itself.

Do we use metrics?

"metrics": [
  {
    "other": {
      "type":"Red Hat severity rating",
      "content": {
        "value": "Moderate",
        "namespace": "https://access.redhat.com/security/updates/classification"
      }
    }
  }
],

A bit better, since at least type tells you what the value is. But the content object has no defined restrictions at all, so I imagine it will be used differently by every vendor out there.

Do we define a custom x_redhat_severity attribute in the cna/adp container? Also not ideal.

An additional hurdle is that we may also provide product-specific impacts when they are lower than those of the vulnerability in general. As an example, CVE-2019-10161 had an impact of Important except for libvirt as shipped with RHEL 6 where the impact was lowered to Moderate. Could we perhaps improve the product container to contain product-specific impact ratings?

v4.0 DRAFT says affects is mandatory at the root-level, but often is not

I noticed that the v4.0 DRAFT marks affects as Mandatory at the root level. However, many CVE files in cvelist omit "affects".

Discrepancy between v4.0 .schema file and DRAFT markdown document

After reviewing the CVE_JSON_4.0_min_public.schema JSON Schema definition and then reading the DRAFT-JSON-file-format-v4.md document, I noticed that the .schema file was missing many of the keys/Objects described in DRAFT-JSON-file-format-v4.md (ex: configuration, impact, version_affected). Is this intentional or a mistake? Is CVE_JSON_4.0_min_public.schema intended to be a minimal subset of the CVE v4.0 JSON Schema? If so, does a complete JSON Schema definition file exist for CVE v4.0?

language definition pattern not allowed in ECMAScript

Thanks to @mprpic for pointing this out.

cve-schema/schema/v5.0/CVE_JSON_5.0.schema, line 1017 at commit a7a19cc:

"pattern":"^[[:alpha:]]{2,3}(?:$|-[[:alnum:]]{2,3}$)"

This pattern uses POSIX character classes, which are not allowed in the ECMAScript regex flavor. JSON Schema itself recommends only this regex syntax: https://json-schema.org/understanding-json-schema/reference/regular_expressions.html. The pattern also seems to be inconsistent with other similar regex patterns already in the schema.
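For comparison, a pattern with the same intent can be written using only plain character ranges, which ECMAScript, Python, and most other regex engines understand. The pattern below is a hypothetical replacement, not necessarily the one the schema adopted. `re.fullmatch` is used because Python's `$` also matches just before a trailing newline, unlike ECMAScript's.

```python
import re

# Hypothetical ECMAScript-compatible replacement: two or three ASCII letters,
# optionally followed by "-" and two or three ASCII letters or digits.
LANGUAGE_PATTERN = r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,3})?$"


def is_valid_language(value: str) -> bool:
    """Apply the pattern to the whole string, as a JSON Schema validator would with ^...$."""
    return re.fullmatch(LANGUAGE_PATTERN, value) is not None
```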

CVSS schema error

Hi,

There's an error in the CVSS 3.0/3.1 schemas with the privilegesRequiredType and modifiedPrivilegesRequiredType properties: there is an extra U in their regex patterns (I think; or perhaps an enum member is missing).

PR:[UNLH] → [ "HIGH", "LOW", "NONE" ]
MPR:[XUNLH] → [ "HIGH", "LOW", "NONE", "NOT_DEFINED" ]

cve-schema/schema/v5.0/imports/cvss/cvss-v3.1.json, line 111 at commit f25944d:

"pattern": "^CVSS:3.1/((AV:[NALP]|AC:[LH]|PR:[UNLH]|UI:[NR]|S:[UC]|[CIA]:[NLH]|E:[XUPFH]|RL:[XOTWU]|RC:[XURC]|[CIA]R:[XLMH]|MAV:[XNALP]|MAC:[XLH]|MPR:[XUNLH]|MUI:[XNR]|MS:[XUC]|M[CIA]:[XNLH])/)*(AV:[NALP]|AC:[LH]|PR:[UNLH]|UI:[NR]|S:[UC]|[CIA]:[NLH]|E:[XUPFH]|RL:[XOTWU]|RC:[XURC]|[CIA]R:[XLMH]|MAV:[XNALP]|MAC:[XLH]|MPR:[XUNLH]|MUI:[XNR]|MS:[XUC]|M[CIA]:[XNLH])$"

Disclaimer: I know from #17 that this is not upstream, but I didn't find a public CVSS repository at @FIRSTdotorg or elsewhere.

cc: @ViperGeek @dariuswiles
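A corrected pattern would drop `U` from both character classes, giving PR:[NLH] and MPR:[XNLH] to match the enums. The Python sketch below rebuilds the 3.1 pattern that way (it also escapes the `.` in `3.1`, which the original leaves unescaped) and checks that a vector with `PR:U` is now rejected. Like the original, it checks only the syntax of each metric, not that required metrics are present or appear only once.

```python
import re

# Metric alternatives from the 3.1 pattern, with the stray "U" removed from PR and MPR.
METRIC = (
    r"AV:[NALP]|AC:[LH]|PR:[NLH]|UI:[NR]|S:[UC]|[CIA]:[NLH]|E:[XUPFH]|RL:[XOTWU]"
    r"|RC:[XURC]|[CIA]R:[XLMH]|MAV:[XNALP]|MAC:[XLH]|MPR:[XNLH]|MUI:[XNR]|MS:[XUC]|M[CIA]:[XNLH]"
)

# Same overall shape as the schema pattern: "CVSS:3.1/" then slash-separated metrics.
CVSS_31_VECTOR = re.compile(rf"^CVSS:3\.1/(({METRIC})/)*({METRIC})$")
```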

[Question] Per-product exploit, workaround, and mitigation attributes

Currently, the exploit, workaround, and mitigation attributes are defined on the top level of the CNA and ADP containers. Oftentimes though, this information is specific to a particular affected product version (i.e. a mitigation that applies to a particular version, or a particular platform). Would it make sense to allow them to be used within the product object instead?

taxonomyMappings needs to be in adp and cna containers

taxonomyMappings is currently only under definitions. Needs to have a field in cna and adp containers to make it appear in the schema.

Improve credit section in the schema

Example (each value names one person or one org):

"credits": [
    {
        "value": "John Doe",
        "user": "account in registry",
        "type": [
            "researcher",
            "patch",
            "developer",
            "reporter",
            "tool",
            "sponsor"
        ]
    }
]

remove affectsSwid

Mostly a reminder to not forget this one - we discussed removing affectsSwid from v5.0 as it's not really in use.

ReplacedBy better served as an array

The ReplacedBy property is currently a string with expected comma separation. Wouldn't this be better served as an array instead of having users parse out each value between commas?

cve-schema/schema/v5.0/CVE_JSON_5.0.schema, lines 323 to 326 at commit d493c6b:

"replacedBy": {
    "type": "string",
    "description": "a single CVE ID or list of CVE IDs (comma separated)",
    "pattern": "^(CVE-[0-9]{4}-[0-9]{4,})\\s*(,\\s*CVE-[0-9]{4}-[0-9]{4,})*$"

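A minimal sketch of the proposed change: a hypothetical array-valued definition for the property, plus a helper that converts the current comma-separated string into that array using the existing pattern. The names `REPLACED_BY_ARRAY_SCHEMA` and `replaced_by_to_array` are illustrative only.

```python
import re

# The existing string pattern from the 5.0 draft (JSON "\\s" is regex "\s").
REPLACED_BY_STRING = re.compile(r"^(CVE-[0-9]{4}-[0-9]{4,})\s*(,\s*CVE-[0-9]{4}-[0-9]{4,})*$")

# Hypothetical array form: one CVE ID per item, no parsing of separators needed.
REPLACED_BY_ARRAY_SCHEMA = {
    "type": "array",
    "items": {"type": "string", "pattern": "^CVE-[0-9]{4}-[0-9]{4,}$"},
    "minItems": 1,
    "uniqueItems": True,
}


def replaced_by_to_array(value: str) -> list[str]:
    """Convert "CVE-2021-1234, CVE-2021-5678" into ["CVE-2021-1234", "CVE-2021-5678"]."""
    if not REPLACED_BY_STRING.fullmatch(value):
        raise ValueError(f"not a valid replacedBy value: {value!r}")
    return [part.strip() for part in value.split(",")]
```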
Add a pattern to ISO 639-2 properties to enforce the language code format

Add a pattern constraint to all ISO 639-2 properties to enforce the language code format.

Add descriptions where it says TODO

JSON user guide, provide common scenarios and examples

use wiki

[Question] Clarification of the `source` attribute

Both the CNA and ADP containers include a source attribute that is defined as:

"source": {
    "type": "object",
    "description": "This is the source information (who discovered it, who
        researched it, etc.) and optionally a chain of CNA information (e.g.
        the originating CNA and subsequent parent CNAs who have processed it
        before it arrives at the MITRE root).\n Must contain: IF this is in the
        root level it MUST contain a CNA_chain entry, IF this source entry is
        NOT in the root (e.g. it is part of a vendor statement) then it must
        contain at least one type of data entry.",
    "minProperties": 1
},

What is the use case for this object? Can we get an example of its intended values? Vulnogram seems to use it to generate:

"source": {
    "advisory": "<CNA specific bug tracking IDs>",
    "defect": [<CNA specific advisory IDs (Optional)>],
    "discovery": "<some value>"
}

but none of that is defined in the schema and the values seem fairly arbitrary (assuming they will remain the same for 5.0).

define affectsCpe

URLs should have JSON format "uri"

cve-schema/schema/v5.0/CVE_JSON_5.0.schema, line 17 at commit 9d03606:

"type": "string",

Invalid JSON in master

The current master branch has invalid JSON due to a missing comma that was introduced in bdca526. I submitted a PR to fix it: #33

Update up-converter for JSON 4.0 to 5.0

Use of the term vendor does not encompass open source projects effectively

Please see oasis-tcs/csaf#247

The same applies here. Thanks to @kestewart for bringing this up.

computable open-source version information

Background

The OSV schema has been adopted by Go, OSV, Python, Rust, and UVI to describe vulnerabilities in open-source software. The OSV schema’s key advantage over the CVE format is that it identifies the specific affected packages and versions in a precise, computable way.

For example, suppose we wanted to check whether a particular software package, as described by an SBOM, made use of any open-source components with known vulnerabilities. An SBOM for a given package ecosystem would be a list of its packages and versions. A tool can test whether each SBOM entry is affected by a database entry written to the OSV schema without any additional information (such as a version or commit graph, or access to the repository containing the source code for the open-source software). This is what we mean when we say the package and version identification is computable.

We propose that the new CVE JSON schema be changed to make its package and version identification computable too. This would make it possible for vulnerability-checking tools to check SBOMs against the CVE database as easily as they can currently check SBOMs against OSV-schema databases. Adjusting the CVE JSON schema would also allow OSV-schema databases to embed their information into CVE format, allowing all their vulnerability information to be pushed upstream to the CVE database and then propagated to any CVE-aware software, a net benefit for the entire software ecosystem.

This issue focuses on computable version identification. See issue #86 for computable package identification.

Computable version identification

After identifying that a particular package listed in an SBOM matches a package in a CVE database entry (#NNN), a vulnerability scanner must next identify whether the specific version in the SBOM is considered affected by the CVE. The entry must include self-contained information sufficient to make this decision algorithmically. The current schema does not satisfy this requirement (or else it is unclear how it does).

What is the algorithm for deciding if a version is considered affected? The current spec does not provide details on how to evaluate the rules. At the start, it is unclear whether the “versions” list must be grouped by “versionGroup” before further processing, so we’ll suppose there is a single group in our examples. It was also unclear which logical operator to apply to the version entries. Issue #12 says that rules should be evaluated with AND, which makes it impossible to list individual versions. For example:

"versions": [
  {"versionAffected": "=", "versionValue": "1.0.0"},
  {"versionAffected": "=", "versionValue": "1.1.0"}
]

The explanation in #12 is that this means “version = 1.0.0 AND version = 1.1.0”, which doesn’t match any version at all.

According to the answer in #12, expressing multiple disjoint ranges of versions is also not possible. For example:

"versions": [
  {"versionAffected": ">=", "versionValue": "1.0.0"},
  {"versionAffected": "<", "versionValue": "1.2.0"},
  {"versionAffected": ">=", "versionValue": "1.5.0"},
  {"versionAffected": "<", "versionValue": "1.6.0"}
]

Here it seems clear the intended interpretation would be

(version >= 1.0.0 AND version < 1.2.0) OR (version >= 1.5.0 AND version < 1.6.0),

but there is no obvious way to encode this. Using ! operators would also not work. There is no boolean normal form with only one logical operator (that is, only AND, or only OR).
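The difference between the two readings can be made concrete. The Python sketch below evaluates the disjoint-range example both ways: as a flat AND of every rule (the reading from #12) and as an OR of AND-groups, the disjunctive form the current flat list cannot express. The grouping structure, the helper names, and the plain numeric version parsing are all assumptions for illustration.

```python
from typing import Callable

Version = tuple[int, ...]
Rule = dict[str, str]


def parse(version: str) -> Version:
    # Simplification: dotted numeric versions with the same number of parts.
    return tuple(int(part) for part in version.split("."))


OPS: dict[str, Callable[[Version, Version], bool]] = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
}


def matches_all(version: str, rules: list[Rule]) -> bool:
    """Flat AND over every rule, as described in #12."""
    v = parse(version)
    return all(OPS[r["versionAffected"]](v, parse(r["versionValue"])) for r in rules)


def matches_any_group(version: str, groups: list[list[Rule]]) -> bool:
    """OR across groups, AND within each group (disjunctive normal form)."""
    return any(matches_all(version, group) for group in groups)


DISJOINT = [
    {"versionAffected": ">=", "versionValue": "1.0.0"},
    {"versionAffected": "<", "versionValue": "1.2.0"},
    {"versionAffected": ">=", "versionValue": "1.5.0"},
    {"versionAffected": "<", "versionValue": "1.6.0"},
]
# Hypothetical grouping of the same four rules into two ranges.
GROUPS = [DISJOINT[:2], DISJOINT[2:]]
```

Under the flat AND, no version can be both below 1.2.0 and at least 1.5.0, so nothing matches; with the grouping, 1.1.0 and 1.5.3 match while 1.3.0 does not.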

A second, related problem with the current schema is that even the definitions of operators like “>=” are not algorithmically precise. Clearly these are not string comparisons: 1.2.0 < 1.10.0. But neither are they simple element-wise comparisons: in packagers using Semver, 1.2.0 > 1.2.0-alpha. In Maven, even the alphabetic parts do not compare with strict regularity. In particular, this ordering applies:

"alpha" < "beta" < "milestone" < "rc" = "cr" < "snapshot" < "" = "final" = "ga" < "sp"

An operator like “>=” cannot be applied without reference to a particular version ordering algorithm, and the CVE schema omits that information.
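As an example of what a version ordering algorithm means in practice, here is a sketch of SemVer 2.0.0 precedence in Python. It follows the rules in the SemVer specification (numeric core compared numerically, a pre-release sorts before its release, pre-release identifiers compared field by field, build metadata ignored) and does not attempt Maven's ordering, which would need its own comparator. The function names are hypothetical.

```python
from functools import cmp_to_key


def _split(version: str) -> tuple[tuple[int, int, int], list[str]]:
    """Split "1.2.3-beta.2+build.7" into ((1, 2, 3), ["beta", "2"])."""
    version = version.split("+", 1)[0]  # build metadata does not affect precedence
    core, _, pre = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch), (pre.split(".") if pre else [])


def compare_semver(a: str, b: str) -> int:
    """Return -1, 0, or 1 according to SemVer 2.0.0 precedence."""
    core_a, pre_a = _split(a)
    core_b, pre_b = _split(b)
    if core_a != core_b:
        return -1 if core_a < core_b else 1
    if not pre_a or not pre_b:
        # A release outranks its pre-releases: 1.2.0 > 1.2.0-alpha
        return int(not pre_a) - int(not pre_b)
    for x, y in zip(pre_a, pre_b):
        if x == y:
            continue
        if x.isdigit() and y.isdigit():
            return -1 if int(x) < int(y) else 1
        if x.isdigit() or y.isdigit():
            # Numeric identifiers sort before alphanumeric ones
            return -1 if x.isdigit() else 1
        return -1 if x < y else 1  # ASCII (lexical) order
    # All shared identifiers are equal: the longer pre-release list sorts later
    return (len(pre_a) > len(pre_b)) - (len(pre_a) < len(pre_b))


def sort_semver(versions: list[str]) -> list[str]:
    return sorted(versions, key=cmp_to_key(compare_semver))
```

A plain string comparison gets `"1.2.0" > "1.10.0"`, while `compare_semver("1.2.0", "1.10.0")` returns -1. A Maven-ordered ecosystem would need a different comparator, which is why a record has to say which ordering applies.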

The different operator variants are also confusing. For example, is there any difference between these two?

"versions": [
  {"versionAffected": ">=", "versionValue": "1.0.0"},
  {"versionAffected": "<", "versionValue": "1.2.0"}
]

"versions": [
  {"versionAffected": ">=", "versionValue": "1.0.0"},
  {"versionAffected": "!>=", "versionValue": "1.2.0"}
]

Or is this one any different from those two?

"versions": [
  {"versionAffected": ">=", "versionValue": "1.0.0"},
  {"versionAffected": "<", "versionValue": "1.2.0"},
  {"versionAffected": "!>=", "versionValue": "1.3.0"}
]

The result of “is this version affected?” should be a boolean yes/no, or at worst yes/no/maybe, but the current operators allow yes/no/maybe/undocumented, with no guidance as to what CVEs should do. Should tools treat “no” differently from “undocumented”? Is it a best practice to document all the negative ranges too? Why?

The CVE schema needs to address these deficiencies so that tools have clear algorithms for deciding whether a particular version is affected by a particular CVE.

OSV’s solution

The OSV schema addresses all these ambiguities as follows, and we suggest that CVE adopt its basic ideas. This is not the only possible solution, but we believe it is a good one.

The OSV schema supports both an enumeration of specific affected versions and an enumeration of specific affected ranges. The set of affected versions is the OR of the entries in these lists; there is never an AND.

A range specifies a contiguous range of versions according to some defined version ordering. Today, the supported orderings are “SEMVER” (preferred), “GIT”, and “ECOSYSTEM”. The “GIT” and “ECOSYSTEM” (meaning “packager-defined ordering”) range types are not directly understandable by general-purpose tools; such ranges are extra information understandable only by special-purpose tools. Each entry is required to ensure that every affected version is either listed in the explicit enumeration or covered by a Semver-type range, both of which can be processed by standard, packager-independent algorithms.

Each range is an object with three fields: type (the ordering), introduced, and fixed. The affected versions are those >= introduced and < fixed. If introduced or fixed is omitted, that end of the range is left open.

For packagers that use Semver ordering, such as Go, NPM, and Rust, it suffices to specify only ranges:

"affects": {
  "ranges": [
    {"type": "SEMVER", "introduced": "1.0.0", "fixed": "1.14.14"},
    {"type": "SEMVER", "introduced": "1.15.0", "fixed": "1.15.17"}
  ]
}

For packagers that use other orderings, a packager-specific range can be listed, but the packager’s own vulnerability database tooling must also “compile out” the range into an explicit list for consumption by general-purpose tools, as in this example for the Python package pikepdf:

"affects": {
  "ranges": [
    {
      "type": "GIT",
      "repo": "https://github.com/pikepdf/pikepdf",
      "fixed": "3f38f73218e5e782fe411ccbb3b44a793c0b343a"
    },
    {
      "type": "ECOSYSTEM",
      "introduced": "2.8.0",
      "fixed": "2.10.0"
    }
  ],
  "versions": [
    "2.8.0", "2.8.0.post1", "2.8.0.post2", "2.9.0", "2.9.1", "2.9.2"
  ]
}

(The “GIT” range has an additional field “repo” to specify the URL of the source repository containing the given commits.)

The “versions” list specifies the same versions as in the “ECOSYSTEM” range, just in a more accessible way. General-purpose tooling would ignore the “GIT” and “ECOSYSTEM” ranges, relying instead on the “versions” list in this case.
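A general-purpose consumer of this format might look roughly like the following Python sketch: a version is affected if it appears in the explicit list or falls inside any “SEMVER” range (>= introduced, < fixed, with an omitted bound left open), and “GIT” and “ECOSYSTEM” ranges are skipped. The function names are hypothetical, and the version parser assumes plain numeric MAJOR.MINOR.PATCH strings with no pre-release tags, so a version like "2.8.0.post1" can only be matched through the list.

```python
from typing import Optional

def parse_release(version: str) -> tuple:
    # Assumption: SEMVER ranges use plain numeric MAJOR.MINOR.PATCH strings.
    return tuple(int(part) for part in version.split("."))

def in_range(version: str, introduced: Optional[str], fixed: Optional[str]) -> bool:
    """Affected iff introduced <= version < fixed; an omitted bound is open."""
    v = parse_release(version)
    if introduced is not None and v < parse_release(introduced):
        return False
    if fixed is not None and v >= parse_release(fixed):
        return False
    return True

def is_affected(affects: dict, version: str) -> bool:
    """OR over the explicit "versions" list and every SEMVER range."""
    if version in affects.get("versions", []):
        return True
    for rng in affects.get("ranges", []):
        # GIT and ECOSYSTEM ranges need special-purpose tooling; skip them.
        if rng.get("type") != "SEMVER":
            continue
        if in_range(version, rng.get("introduced"), rng.get("fixed")):
            return True
    return False

go_style = {"ranges": [
    {"type": "SEMVER", "introduced": "1.0.0", "fixed": "1.14.14"},
    {"type": "SEMVER", "introduced": "1.15.0", "fixed": "1.15.17"},
]}
print(is_affected(go_style, "1.14.13"), is_affected(go_style, "1.14.14"))  # True False
```

Applied to the pikepdf record, the same function answers purely from the “versions” list, because neither of its ranges is of type “SEMVER”.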

Potential CVE adaptation

We propose to change the current version schema from:

"versions": [{
  "versionGroup": string,
  "versionValue": string,
  "versionAffected": string,
  "platforms": [string],
  "references": [...],
}],

to:

"versions": [{
  "list": [string],
  "range": {
    "type": string,  // semver, git, or packager
    "fixed": string,
    "introduced": string,
    "repo": string,  // for type git only
  },
  "unsure": bool,
  "platforms": [string],
  "references": [...],
}],

The only combining operator is OR, making the algorithm for matching much clearer. A particular version would be considered affected if it is matched by any of the entries in the overall “versions” object list. A version is matched by an entry if it appears directly in the “list” or if it is in the “range”. This structure allows non-standard ranges to include their version lists in the same object, which is an improvement over the OSV schema, and it allows a particular range or list to be qualified by a “platforms” list as well.

The “unsure” entry allows a range or list to be marked as unsure, equivalent to using the current ?>= (and similar) operators.

The current !>= (and similar) operators would be removed: to say that a version is unaffected, leave it unlisted.
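Under the proposed structure, a matcher could be sketched in Python as follows. Two readings here are assumptions, not part of the proposal: a version matched only by “unsure” entries yields "maybe", and a non-empty “platforms” list restricts the entry to those platforms when the caller supplies one. Only “semver” ranges are evaluated, with plain numeric MAJOR.MINOR.PATCH parsing; “git” and “packager” entries are expected to carry their own “list”. All function names are hypothetical.

```python
def parse_release(version):
    # Assumption: "semver" ranges use plain numeric MAJOR.MINOR.PATCH strings.
    return tuple(int(part) for part in version.split("."))

def entry_matches(entry, version, platform=None):
    """True if one proposed "versions" entry matches via its "list" or its "range"."""
    # Assumed reading: a non-empty "platforms" list restricts the entry.
    platforms = entry.get("platforms")
    if platform is not None and platforms and platform not in platforms:
        return False
    if version in entry.get("list", []):
        return True
    rng = entry.get("range")
    if not rng or rng.get("type") != "semver":
        return False  # git/packager ranges: rely on the entry's "list" instead
    try:
        v = parse_release(version)
    except ValueError:
        return False  # not a plain numeric version, so this sketch cannot place it
    introduced, fixed = rng.get("introduced"), rng.get("fixed")
    if introduced is not None and v < parse_release(introduced):
        return False
    return fixed is None or v < parse_release(fixed)

def check_version(versions, version, platform=None):
    """Return "yes", "maybe", or "no"; entries are combined only with OR."""
    matched = [e for e in versions if entry_matches(e, version, platform)]
    if any(not e.get("unsure", False) for e in matched):
        return "yes"
    return "maybe" if matched else "no"
```

Since there is no AND and no negation, the answer never depends on entry order, and "no" simply means "not listed".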

Add referenced JSON subschemas to repository

The 5.0 JSON format references a few external JSON schemas. These should be added to the repository so that the references can be resolved.

For example:

"cvssV3_1": {
    "$ref": "file:cvss-v3.1.json"
},
"cvssV3_0": {
    "$ref": "file:cvss-v3.0.json"
},
"cvssV2_0": {
    "$ref": "file:cvss-v2.0.json"
}

These references should be updated to point to resolvable locations.
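One way to apply such an update mechanically is to rewrite every “file:” reference in the schema tree, as in this Python sketch. The base URL is a placeholder, not a real location; where the vendored CVSS schemas should actually live is for the maintainers to decide, and relative references to files added to the repository would work equally well.

```python
import json

# Placeholder base URL (hypothetical); substitute the real resolvable location.
BASE_URL = "https://example.org/cve-schema/imports/"

def rewrite_file_refs(node, base_url=BASE_URL):
    """Return a copy of a JSON Schema tree with "file:" $ref values made absolute."""
    if isinstance(node, dict):
        return {
            key: (base_url + value[len("file:"):]
                  if key == "$ref" and isinstance(value, str) and value.startswith("file:")
                  else rewrite_file_refs(value, base_url))
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rewrite_file_refs(item, base_url) for item in node]
    return node

fragment = json.loads("""{
  "cvssV3_1": {"$ref": "file:cvss-v3.1.json"},
  "cvssV2_0": {"$ref": "file:cvss-v2.0.json"},
  "other": {"$ref": "#/definitions/local"}
}""")
print(rewrite_file_refs(fragment)["cvssV3_1"]["$ref"])
# https://example.org/cve-schema/imports/cvss-v3.1.json
```

Internal references such as "#/definitions/..." are left untouched.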

Use consistent plural attribute names for array fields

The following attribute names in both the CNA and ADP containers should be pluralized since they refer to definitions that are themselves arrays:

"configuration": {
    "$ref": "#/definitions/configurations"
},
"workaround": {
    "$ref": "#/definitions/workarounds"
},
"exploit": {
    "$ref": "#/definitions/exploits"
},
"credit": {
    "$ref": "#/definitions/credits"
},
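A small lint could catch this class of inconsistency automatically. The Python sketch below flags container properties whose $ref points at an array-typed definition but whose name is not plural. It assumes the referenced definitions declare "type": "array", and it uses a naive trailing-"s" test for plurality; irregular plurals would need a word list. The function name is hypothetical.

```python
def singular_array_properties(properties: dict, definitions: dict) -> list:
    """Names of properties whose $ref targets an array-typed definition but are not plural."""
    prefix = "#/definitions/"
    flagged = []
    for name, spec in properties.items():
        ref = spec.get("$ref", "")
        if not ref.startswith(prefix):
            continue
        target = definitions.get(ref[len(prefix):], {})
        # Heuristic: a trailing "s" counts as plural.
        if target.get("type") == "array" and not name.endswith("s"):
            flagged.append(name)
    return flagged

container = {
    "configuration": {"$ref": "#/definitions/configurations"},
    "workaround": {"$ref": "#/definitions/workarounds"},
    "exploit": {"$ref": "#/definitions/exploits"},
    "credit": {"$ref": "#/definitions/credits"},
}
defs = {name: {"type": "array"} for name in ("configurations", "workarounds", "exploits", "credits")}
print(singular_array_properties(container, defs))
# ['configuration', 'workaround', 'exploit', 'credit']
```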

issue templates for Tag request/proposal

Completeness of the affected field.

A field to capture whether the affected field is complete with respect to vendors/products/versions.

Give example of overlapping version_data

Applicable to 4.0 and 5.0: the version_data description includes "...This also allows more complex statements such as "Product X between versions 10.2 and 10.8" to be put in a machine-readable format." But it's not entirely clear how you'd represent that. For example, say 2.4.x is affected between versions 2.4.33 and 2.4.43 (the fix is in 2.4.44); would you write

{ version_value:"2.4", version_affected:"<",version_value:"2.4.44" },
{ version_value:"2.4", version_affected:">=",version_value:"2.4.33" }

or is it expected to use !<:

{ version_value:"2.4", version_affected:"<",version_value:"2.4.44" },
{ version_value:"2.4", version_affected:"!<",version_value:"2.4.33" }

Do we know if anything is actually machine-parsing the affected versions, given that the MITRE entries are mostly plaintext (e.g. simply { "version_value": "2.4.33 to 2.4.43" })?

CVSS schema $id

(Comes from #57)

I agree that the bugfix is a very minor change and unlikely to break any workflow, so it probably doesn't justify a version bump. But the query is part of the canonical URI, so the two are effectively different schemas, even though the URI of the first one leads to the second.

This kind of redirection is not an issue at all for an HTML page, which can easily be consolidated with a rel=canonical link or a 301 status code. However, for an application built around a JSON Schema with a reachable URI (i.e. a URI that is also a URL), it can easily become a problem (e.g. when dereferencing pointers).

Two simple solutions could be:

1. Treat the query as the patch number and just serve both versions.
2. Remove the query and rely on server headers to avoid caching:
   - HTTP 1.0: Pragma: no-cache
   - HTTP 1.1: Cache-Control: no-cache, no-store, must-revalidate
   - Proxies: Expires: 0

IMHO, the first option is the most reliable.

add dateRejected to Published.cveMetadata

In the Published.cveMetadata object, when “state” is “PUBLISHED”, “datePublished” can be used to find the date when the CVE was published. But when “state” is “REJECTED”, there is no corresponding date.

We propose to add “dateRejected” to fill this gap.
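With the proposed field, a consumer could look up the date of the record's current state symmetrically, as in this short Python sketch. The function name is hypothetical, and "dateRejected" is the field proposed here, not assumed to exist in any particular schema release.

```python
from typing import Optional

def state_date(cve_metadata: dict) -> Optional[str]:
    """Date matching the record's state: datePublished, or the proposed dateRejected."""
    field = {"PUBLISHED": "datePublished", "REJECTED": "dateRejected"}.get(cve_metadata.get("state"))
    return cve_metadata.get(field) if field else None
```

Without "dateRejected", the REJECTED branch would have nothing to return.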

Include support for root cause CVE tags

We previously discussed including support for CVE-level tags (which can be applied to the CNA or ADP containers) that help identify the root cause.

Tag	Definition
Hardware Root Cause	Tag this to a CVE if the primary root cause of the security vulnerability originates in the hardware component of the affected product(s). The intent is to help hardware designers learn how to prevent similar weaknesses. Even when a hardware vulnerability can be addressed by a software workaround, the “Hardware Root Cause” tag should still be applied, since the focus is on how the issue is introduced, not how it is remediated.
Software Root Cause	Tag this to a CVE if the primary root cause of the security vulnerability originates in the software component of the affected product(s). The intent is to help software developers learn how to prevent similar weaknesses.

This could be expanded to include other concepts such as protocol or specification root causes. For example:

Tag	Definition
Specification Root Cause	Tag this to a CVE if the primary root cause of the security vulnerability originates in the industry specification that the affected product(s) comply with. The intent is to help industry specification groups learn how to prevent similar weaknesses. If the root cause of the CVE is inappropriate adoption of an industry standard (e.g., use of an obsolete cryptographic algorithm) or incorrect implementation of an industry standard (e.g., the product does not implement the error recovery flow described in the protocol specification) in the affected product(s), the appropriate “Hardware Root Cause” or “Software Root Cause” tag should be applied instead.

Product mappings not machine readable?

Hello, I was reviewing the new CVE schema, and it appears that the product mappings are still not machine readable. For consumers of the data, this means that they typically need to have humans translate the strings in the product section into something actionable.

Is this something that the project would consider addressing? There's another effort going on around software descriptions within the SPDX project where they have identified a number of existing formats that could be used to describe a piece of software. I think there's a lot of value in supporting these kinds of mappings. https://spdx.github.io/spdx-spec/v3-draft/external-repository-identifiers/

Revisit the idea of having version required in the description

I continue to question the reason for requiring version information in the description. It should be optional. I've been told it's so readers can quickly see the version info without having to look in the version section. To me, that's just a display issue. Anyone who is purposely going to read the JSON will know how to find the version info. Anyone who's looking at the record in an application can easily be shown the description and the version info, in whatever manner they prefer. When the version info is simple, we're just putting the same info in the JSON twice, which is nonsensical. When the version info is complex, you are putting version info in two places, and that version info is unlikely to match, because we just said that it's complex.

I propose that version info remain required in the structured part of the record, as it is already, and that the only version info in the description be clarification, as needed, to explain complex version info to a human reader. In both cases, when an application is showing a rendered view of the record, it should decide where and how to present the version info, the description, and anything else it deems important. Carrying the same information in two places is always a red flag, and I feel like we're mixing presentation-layer info into our data. In MVC terms, we are mixing the Model (data) and View (presentation) layers, which is also always a red flag.

cve-schema/schema/v5.0/CVE_JSON_5.0.schema

Line 608 in 9d03606

    
           "description": "Affected products defined by CPE. This is an array of CPE values (vulnerable and not), we use an array so that we can make multiple statements about the same version and they are separate (if we used a JSON object we'd essentially be keying on the CPE name and they would have to overlap). Also this allows things like cveDataVersion or cveDescription to be applied directly to the product entry. This also allows more complex statements such as \"Product X between versions 10.2 and 10.8\" to be put in a machine-readable format. As well since multiple statements can be used multiple branches of the same product can be defined here.",

I'm going to "cc" a few folks to hopefully spur some conversation
@mprpic @mattrbianchi @chandanbn @jwhitmore-mitre @david-waltermire-nist (I don't think this will work for Dave as GitHub doesn't recognize him on this repo, but whatevs)

computable open-source package information

Background

This issue focuses on computable package identification. See issue #87 for computable version identification.

Computable package identification

The lack of computable package identification was also raised in issue #79. In that discussion, it was suggested to use the combination of collectionUrl and packageName as a precise identifier. This could be sufficient, provided each ecosystem publishes the exact spelling of its standard “collectionUrl” and the syntax of its “packageName”. To avoid misspellings and other problems, ideally there should be a canonical list of “collectionUrl” values, or a canonical list of links to the pages where ecosystems have defined their own “collectionUrl” and “packageName” syntaxes. (Presumably there is an equivalent list of canonical “vendorName” values.)

It is unclear, however, why the collectionUrl and packageName are nested under the “vendor” and “product” keys. What would it mean for the same collectionUrl/packageName to appear with different “vendorName” or “productName” values? The packager’s URL should be sufficient to identify a collection of packages. “Vendor” and “product” are attributes that make sense for commercial software identified by a plain-language name, but not for URL-scoped open-source software.

If CVE is to support robust open-source vulnerability tooling, it should name the packages clearly and simply. To correct the scoping problem, the top-level structure of the “affected” object needs to be changed to not be so vendor-centric.

One possible solution would be to simplify affected > vendors > products and affected > cpes nesting down to just “affected”. That is, replace:

{
  "affected": {
    "vendors": [{
      "vendorName": string,
      "products": [{
        "productName": string,
        "modules": [string],
        "programFiles": [string],
        "programRoutines": [string],
        "packageName": string,
        "collectionURL": string,
        ...
      }]
    }],
    "cpes": [...],
  }
}

with

{
  "affected": [{
    "vendorName": string,
    "productName": string,
    "packagerUrl": string,
    "packageName": string,
    "cpe": string,
    "modules": [string],
    "programFiles": [string],
    "programRoutines": [string],
    ...
  }]
}

Each affected object would be required to have at least one of (1) vendorName and productName, (2) packagerUrl and packageName, or (3) cpe. It would be fine to list more than one of these if there are multiple clear ways to identify the package, but open-source vulnerability scanners would use (2).
We have renamed collectionUrl to packagerUrl to make the connection to packageName clearer.
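The proposed replacement and its identification rule can be sketched together in Python: a hypothetical migration from the nested shape to the flat list (renaming collectionURL to packagerUrl), a check for “at least one of the three identifications”, and a key function that treats (packagerUrl, packageName) as the unique open-source identifier. All names are illustrative, and the sketch assumes the old “cpes” array holds plain CPE strings.

```python
# Hypothetical migration map: old nested product key -> proposed flat key.
FIELD_MAP = {
    "productName": "productName",
    "packageName": "packageName",
    "collectionURL": "packagerUrl",  # renamed per the proposal
    "modules": "modules",
    "programFiles": "programFiles",
    "programRoutines": "programRoutines",
}

def flatten_affected(old):
    """Turn affected > vendors > products (plus cpes) into the proposed flat list."""
    flat = []
    for vendor in old.get("vendors", []):
        for product in vendor.get("products", []):
            entry = {}
            if "vendorName" in vendor:
                entry["vendorName"] = vendor["vendorName"]
            for old_key, new_key in FIELD_MAP.items():
                if old_key in product:
                    entry[new_key] = product[old_key]
            flat.append(entry)
    # Assumption: each old top-level CPE string becomes its own entry.
    flat.extend({"cpe": cpe} for cpe in old.get("cpes", []))
    return flat

def identification_ok(entry):
    """At least one of (vendorName and productName), (packagerUrl and packageName), or cpe."""
    has_product = bool(entry.get("vendorName")) and bool(entry.get("productName"))
    has_package = bool(entry.get("packagerUrl")) and bool(entry.get("packageName"))
    return has_product or has_package or bool(entry.get("cpe"))

def package_key(entry):
    """Open-source identifier (packagerUrl, packageName), or None if either is missing."""
    if entry.get("packagerUrl") and entry.get("packageName"):
        return (entry["packagerUrl"], entry["packageName"])
    return None
```

An open-source scanner would index records by package_key and ignore entries for which it returns None.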

Of course, there may be other ways to present this data. For example, perhaps it would make the requirements clearer to group the vendor and open-source package info as in:

{
  "affected": [{
    "product": {
      "vendorName": string,
      "productName": string,
    },
    "package": {
      "packagerUrl": string,
      "packageName": string,
    },
    "cpe": string,
    "modules": [string],
    "programFiles": [string],
    "programRoutines": [string],
    ...
  }]
}

and then the requirement would be more simply stated as “at least one of product, package, or cpe must be present.”

Another possibility would be to say that each packager is itself a vendor, but that still leaves the question of ensuring that open-source packagers have a canonical identification, as well as what “productName” means versus “packageName”.

This general topic is also raised by #70, #78, and #79. What is important is that the CVE schema make clear how to write and access a record that treats the combination of packager URL and package name as the unique identifier for an open-source package.

I would be happy to prepare a PR if there is consensus here on the general direction of the path forward.

reintroduce packageName.

Should we have separate fields for a package's name in a repo and for productName?
Should they be mutually exclusive,
or is there a use for supporting both for the same package?

	"providerDataMeta": {
		"$ref": "#/definitions/providerDataMeta"
	}

	"replacedBy": {
		"type": "string",
		"description": "a single CVE ID or list of CVE IDs (comma separated)",
		"pattern": "^(CVE-[0-9]{4}-[0-9]{4,})\\s(,\\sCVE-[0-9]{4}-[0-9]{4,})*$"
	}
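As quoted, the replacedBy pattern appears to contradict its description: it requires whitespace immediately after the first ID, so a lone ID matches only with a trailing space, and the usual "ID, ID" form is rejected. The Python sketch below checks a few sample values against the quoted pattern and against one possible correction; the correction is an assumption, not an adopted fix. Python's re is used as a stand-in for the ECMA-262 regex dialect JSON Schema specifies; for this pattern the two behave the same.

```python
import re

# The pattern as quoted (JSON's "\\s" is the regex "\s").
QUOTED = r"^(CVE-[0-9]{4}-[0-9]{4,})\s(,\sCVE-[0-9]{4}-[0-9]{4,})*$"
# One possible correction (assumption): optional whitespace around each comma,
# none required after a lone ID.
CORRECTED = r"^CVE-[0-9]{4}-[0-9]{4,}(\s*,\s*CVE-[0-9]{4}-[0-9]{4,})*$"

samples = [
    "CVE-2021-1234",
    "CVE-2021-1234 ",
    "CVE-2021-1234, CVE-2021-56789",
    "CVE-2021-1234 , CVE-2021-56789",
]
for value in samples:
    # JSON Schema "pattern" is not implicitly anchored, so re.search mirrors it here.
    print(repr(value), bool(re.search(QUOTED, value)), bool(re.search(CORRECTED, value)))
```

The quoted pattern accepts only the second and fourth samples; the sketched correction accepts all but the one with a trailing space.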
