Current Implementation
For every component within a BOM uploaded to the API server, the API server will publish an event to the `EventNew` Kafka topic. Those events currently have the form:
| Key | Value |
|-----|-------|
| Project UUID | Component Details |

Example

Key: `ebb10845-8f95-4194-85c0-0ff6c5ab3cdf`

```json
{
  "uuid": "445dc140-5638-4eb7-9409-53204d7f3cae",
  "group": "xerces",
  "name": "xercesImpl",
  "version": "2.12.2",
  "purl": "pkg:maven/xerces/[email protected]?type=jar",
  "cpe": null
}
```
The Kafka producer used by the API server utilizes the default partitioner, meaning that events with the same key will always end up in the same topic partition.
Kafka Streams applications (read: consumer groups) in the analyzer application consume from the `EventNew` topic. At the time of writing, those applications are:
| Application Name | Class |
|------------------|-------|
| OSSConsumer | org.acme.consumer.OSSIndexBatcher |
| SnykAnalyzer | org.acme.consumer.SnykAnalyzer |
Quoting the streams architecture documentation:
> Kafka Streams creates a fixed number of stream tasks based on the input stream partitions for the application, with each task being assigned a list of partitions from the input streams (i.e., Kafka topics). The assignment of stream partitions to stream tasks never changes, hence the stream task is a fixed unit of parallelism of the application.
Applied to our current implementation, this means that events for the same project UUID will always be processed by the same streams task (which maps to a JVM thread) within a streams application.
Example
- The `EventNew` topic is created with 3 partitions
- A BOM with 200 components is uploaded to the DT project with UUID `ebb10845-8f95-4194-85c0-0ff6c5ab3cdf`
- The API server sends 200 messages with key `ebb10845-8f95-4194-85c0-0ff6c5ab3cdf` to the `EventNew` topic
- The default partitioner assigns all 200 events to partition `1`
- The streams applications OSSConsumer and SnykAnalyzer are started with 3 threads each
- Thread `1` of each streams application is assigned to partition `1` of the `EventNew` topic
- Both threads `1` process the 200 events, while threads `0` and `2` of both streams applications remain idle
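The partition skew described in the example above can be sketched in a few lines. This is an illustrative stand-in, not Kafka's actual default partitioner (which hashes keys with murmur2); the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// All 200 messages share the same key (the project UUID), so the
// partitioner maps every one of them to the same partition while the
// other partitions receive nothing. hashCode() stands in for murmur2.
public class PartitionSkewDemo {

    static int partitionFor(String key, int numPartitions) {
        // Deterministic hash of the key, mapped onto the partition range
        return Math.abs(key.hashCode() % numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        String projectUuid = "ebb10845-8f95-4194-85c0-0ff6c5ab3cdf";

        Map<Integer, Integer> eventsPerPartition = new HashMap<>();
        for (int i = 0; i < 200; i++) {
            eventsPerPartition.merge(partitionFor(projectUuid, numPartitions), 1, Integer::sum);
        }

        // Prints a single entry: one partition holds all 200 events
        System.out.println(eventsPerPartition);
    }
}
```

Because the partition is a pure function of the key, only a different choice of keys can spread the load; adding partitions or threads alone does not help.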
Analyzers perform lookups against external services (the OSS Index, Snyk, and VulnDB APIs), unless they experience a cache hit for the component at hand. They emit messages of the following form to the `vuln-result` topic:
| Key | Value |
|-----|-------|
| Component UUID | Vulnerability Details (may be null when no vuln has been found) |

Example

Key: `445dc140-5638-4eb7-9409-53204d7f3cae`

```json
{
  "vulnerability": {
    "vulnId": "CVE-2017-10355",
    "source": "NVD",
    "description": "sonatype-2017-0348 - xerces:xercesImpl - Denial of Service (DoS)\n\nThe software contains multiple threads or executable segments that are waiting for each other to release a necessary lock, resulting in deadlock.",
    "references": "* [https://ossindex.sonatype.org/vulnerability/sonatype-2017-0348?component-type=maven&component-name=xerces%2FxercesImpl&utm_source=unknown&utm_medium=integration&utm_content=Alpine](https://ossindex.sonatype.org/vulnerability/sonatype-2017-0348?component-type=maven&component-name=xerces%2FxercesImpl&utm_source=unknown&utm_medium=integration&utm_content=Alpine)\n* [https://blogs.securiteam.com/index.php/archives/3271](https://blogs.securiteam.com/index.php/archives/3271)",
    "cwes": [
      {
        "cweId": 833,
        "name": "Deadlock"
      }
    ],
    "severity": "MEDIUM",
    "affectedProjectCount": 0
  },
  "identity": "OSSINDEX_ANALYZER"
}
```
Using the component UUID from the message key, the API server can easily correlate the message with a specific component in the portfolio.
Benefits
- ✅ Each event in `EventNew` represents a component in DT and thus a nicely encapsulated unit of work for the analyzer
- ✅ Easy correlation of `vuln-result` events to components in the portfolio
Drawbacks
- ⛔ Projects with many components can clog a topic partition, keeping one streams task super busy while others run idle
  - Parallelization of the analysis work effectively happens at the project level rather than at the component level
- ⛔ DT can consider components to be different despite them having identical PURLs or CPEs. OSS Index, Snyk, etc. don't do that, so triggering a scan for each DT component will result in many redundant calls
- ⛔ Because the same PURL or CPE may be analyzed in multiple stream tasks at once, there will be race conditions in cache lookups, again causing redundant calls to external services
- ⛔ The cache lookup issues and redundant calls mentioned above contribute to exceeding the rate limits imposed by the external services faster
- ⛔ If we want to support ad-hoc scanning of components or BOMs for which no project in DT exists, we can't rely on a project or component UUID always being available for message keys
Note
The second through fourth drawbacks exist in vanilla DT, too.
Proposed Solution
Both of the following variants are based on option 2 in Alioune's comment here: DependencyTrack/dependency-track#2023 (comment) (and I think it is also what he was referring to in syalioune#1 (comment)):

> - Using a combination of object pools and sharding based on queues (in-memory or not): having a pool of analyzer objects with the proper logic fetching components to process from a dedicated queue.
> - Upstream processes have to be updated to always send (or it can be wrapped) identical PURLs to the same queue (using some kind of hash-based partitioning). This way, there would be no need for synchronization between the analyzer objects, as identical PURLs would always be processed by the same analyzer.
Variant 1: Only work with PURL / CPE / SWID Tag ID
Instead of using the project UUID as message key, we use the identifiers used for vulnerability analysis:
- PURL
- CPE
- SWID Tag ID (at a later point in time)
- etc.
Further, the entire analysis process will happen without any relation to component identities in DT. No IDs or UUIDs of components or projects will be transmitted.
Note
There is a complication regarding PURLs, in that they can contain qualifiers and sub-paths. For example, `pkg:maven/com.acme/[email protected]` and `pkg:maven/com.acme/[email protected]?type=jar` are technically different, but describe the same component and are treated as equal by all (currently known) analyzers.
We could implement a custom Kafka partitioner that ensures `pkg:maven/com.acme/[email protected]` and `pkg:maven/com.acme/[email protected]?type=jar` end up in the same partition. The partitioner would treat PURLs as equal as long as their coordinates (type, namespace, name, version) are the same. A similar strategy will be necessary for CPEs, too.
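A minimal sketch of the key normalization such a partitioner could perform, assuming PURLs follow the `pkg:type/namespace/name@version?qualifiers#subpath` layout; the class and method names are hypothetical, not existing DT code:

```java
// Strips qualifiers and sub-paths from a PURL so that PURLs with identical
// coordinates (type, namespace, name, version) produce identical keys and
// therefore hash to the same Kafka partition.
public class PurlCoordinates {

    static String canonicalize(String purl) {
        // The sub-path follows '#', qualifiers follow '?'; drop both
        int subPathIdx = purl.indexOf('#');
        if (subPathIdx >= 0) {
            purl = purl.substring(0, subPathIdx);
        }
        int qualifierIdx = purl.indexOf('?');
        if (qualifierIdx >= 0) {
            purl = purl.substring(0, qualifierIdx);
        }
        return purl;
    }

    public static void main(String[] args) {
        String a = "pkg:maven/com.acme/[email protected]";
        String b = "pkg:maven/com.acme/[email protected]?type=jar";
        // Both normalize to the same string, so the default hashing
        // assigns their events to the same partition
        System.out.println(canonicalize(a).equals(canonicalize(b))); // true
    }
}
```

A real implementation would plug this into Kafka's `Partitioner` interface, or normalize the key before producing, which keeps the default partitioner untouched.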
| Key | Value |
|-----|-------|
| PURL / CPE / SWID Tag ID | Nothing / Additional Details |

Example keys

- `pkg:maven/com.acme/[email protected]?type=jar`
- `cpe:2.3:a:apache:xerces2_java:*:*:*:*:*:*:*:*`
Results emitted by the analyzers would then have the following form:

| Key | Value |
|-----|-------|
| PURL / CPE / SWID Tag ID | Vulnerability Details (may be null when no vuln has been found) |

Example

Key: `pkg:maven/com.acme/[email protected]?type=jar`

```json
{
  "vulnerability": {
    "vulnId": "CVE-2017-10355",
    "source": "NVD",
    "description": "sonatype-2017-0348 - xerces:xercesImpl - Denial of Service (DoS)\n\nThe software contains multiple threads or executable segments that are waiting for each other to release a necessary lock, resulting in deadlock.",
    "references": "* [https://ossindex.sonatype.org/vulnerability/sonatype-2017-0348?component-type=maven&component-name=xerces%2FxercesImpl&utm_source=unknown&utm_medium=integration&utm_content=Alpine](https://ossindex.sonatype.org/vulnerability/sonatype-2017-0348?component-type=maven&component-name=xerces%2FxercesImpl&utm_source=unknown&utm_medium=integration&utm_content=Alpine)\n* [https://blogs.securiteam.com/index.php/archives/3271](https://blogs.securiteam.com/index.php/archives/3271)",
    "cwes": [
      {
        "cweId": 833,
        "name": "Deadlock"
      }
    ],
    "severity": "MEDIUM",
    "affectedProjectCount": 0
  },
  "identity": "OSSINDEX_ANALYZER"
}
```
Benefits
- ✅ Kafka Streams guarantees that the same PURL will be processed by the same streams task, solving the problem of race conditions in cache lookups
- ✅ Consequently, processing the same PURL multiple times is not an issue, because caching is more effective
- ✅ The API server can perform best-effort de-duplication of those identifiers before sending them off to Kafka. That way, a BOM upload to the same project should never result in duplicate PURL / CPE events. This contributes to less overall load on the system.
- ✅ Streams tasks additionally get a chance to perform further de-duplication, so they don't process the same PURL multiple times within a window / batch. Duplicate PURL events can simply be discarded.
- ✅ Simplification of the recurring analysis of the entire portfolio: instead of iterating over all individual components every X hours, iterate over all unique PURLs, CPEs, and SWID Tag IDs in the entire portfolio and send them to Kafka
  - This has the potential to drastically reduce the effort and time needed to analyze the entire portfolio
- ✅ Vulnerability analysis results will be applied to all affected components in the portfolio in one go, whereas the current approach only applies them to a single component at a time
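The best-effort de-duplication on the API server could be as simple as collecting the BOM's identifiers into a set before producing. A hedged sketch with hypothetical names, not existing DT code:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Collects the vulnerability-analysis identifiers (PURLs, CPEs, ...) of an
// uploaded BOM into a set, so each unique identifier yields exactly one event.
public class IdentifierDeduplication {

    static Set<String> uniqueIdentifiers(List<String> componentIdentifiers) {
        // LinkedHashSet keeps first-seen order while dropping duplicates
        return new LinkedHashSet<>(componentIdentifiers);
    }

    public static void main(String[] args) {
        List<String> fromBom = List.of(
                "pkg:maven/xerces/[email protected]",
                "pkg:maven/xerces/[email protected]", // duplicate within the BOM
                "cpe:2.3:a:apache:xerces2_java:*:*:*:*:*:*:*:*"
        );
        // Two events would be produced instead of three
        System.out.println(uniqueIdentifiers(fromBom).size()); // 2
    }
}
```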
Drawbacks
- ⛔ More responsibility on the API server: messages from the `vuln-result` topic will no longer be tied to a specific project UUID or component UUID
  - Instead, the API server will have to apply the results to all components matching the given PURL / CPE
  - This can be an expensive operation, but it can be optimized with proper indexes and efficient use of transactions. It should be tested, though
- ⛔ Batching of `EventNew` events (as required for OSS Index) will be harder (#50 (comment))
- ⛔ May not be efficient for use cases where the system is only exposed to little load, or where BOMs are uploaded only sporadically (#50 (comment))
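The fan-out the first drawback describes can be pictured as follows. This sketch is purely illustrative (hypothetical names, in-memory filtering); in the API server it would rather be a single indexed database query over the component table:

```java
import java.util.List;
import java.util.stream.Collectors;

// Given a vuln-result keyed by PURL, find every component in the portfolio
// with that PURL, so the result can be applied to all of them in one go.
public class ResultFanOut {

    record Component(String uuid, String purl) {}

    static List<String> componentsAffectedBy(String resultPurl, List<Component> portfolio) {
        return portfolio.stream()
                .filter(c -> resultPurl.equals(c.purl()))
                .map(Component::uuid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Component> portfolio = List.of(
                new Component("u1", "pkg:maven/xerces/[email protected]"),
                new Component("u2", "pkg:maven/xerces/[email protected]"),
                new Component("u3", "pkg:maven/other/[email protected]"));
        // Both u1 and u2 receive the result
        System.out.println(componentsAffectedBy("pkg:maven/xerces/[email protected]", portfolio));
    }
}
```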
Variant 2: Only change the key for `EventNew` messages
As a compromise between the current solution and variant 1 described above: still set the message key to the PURL / CPE, but include the component UUID in the message body.
Example

Key: `pkg:maven/com.acme/[email protected]?type=jar`

```json
{
  "uuid": "445dc140-5638-4eb7-9409-53204d7f3cae",
  "group": "xerces",
  "name": "xercesImpl",
  "version": "2.12.2"
}
```

Key: `cpe:2.3:a:apache:xerces2_java:*:*:*:*:*:*:*:*`

```json
{
  "uuid": "445dc140-5638-4eb7-9409-53204d7f3cae",
  "group": "xerces",
  "name": "xercesImpl",
  "version": "2.12.2"
}
```
Benefits
TBD
Drawbacks
TBD