nexb / dejacode
Automate open source license compliance and ensure software supply chain integrity
Home Page: https://dejacode.readthedocs.io
License: GNU Affero General Public License v3.0
When adding new packages to DejaCode, there are often several URLs for the same package (for instance, when adding PyPI packages there can be multiple wheels and an sdist for the same package).
Then once the scans are completed, one may update one of the packages, usually to fill in fields that are blank.
Once this is done for one package it would be great to have a convenient way to compare, sync and possibly merge the new attribute values of the updated package with the other packages created for the same version, or perhaps even gather attribute values from previous versions of the same package.
Otherwise, the user typically needs to follow an error prone and work intensive sequence of actions using multiple browser tabs and copy/paste:
Some detailed design is needed, of course, but this would be a great enhancement to package maintenance usability. The recent enhancements to PurlDB for the package_set feature might be helpful here.
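As a starting point for that design discussion, here is a minimal sketch of the "sync" step, assuming packages are represented as plain dictionaries. The function name `fill_blank_fields` and the field names are hypothetical and are not part of the DejaCode code base. Blank fields on a sibling package are filled from the updated package, and differing non-blank values are reported as conflicts instead of being overwritten.

```python
def fill_blank_fields(source, target, fields):
    """Return (merged, conflicts) for copying `fields` from source to target.

    Only blank target values (None or "") are filled. Non-blank values that
    differ from the source are left untouched and reported as conflicts so
    that a user can review them.
    """
    merged = dict(target)
    conflicts = {}
    for name in fields:
        new_value = source.get(name)
        if new_value in (None, ""):
            continue  # nothing to propagate for this field
        current = target.get(name)
        if current in (None, ""):
            merged[name] = new_value
        elif current != new_value:
            conflicts[name] = (current, new_value)
    return merged, conflicts


# Hypothetical data: an updated sdist and a wheel of the same version.
updated_sdist = {
    "filename": "foo-1.0.tar.gz",
    "license_expression": "mit",
    "homepage_url": "https://example.org/foo",
}
wheel = {
    "filename": "foo-1.0-py3-none-any.whl",
    "license_expression": "",
    "homepage_url": "https://example.com/foo",
}

merged, conflicts = fill_blank_fields(
    updated_sdist, wheel, ["license_expression", "homepage_url"]
)
```

Returning the conflicts separately would let a UI present a compare view before anything is merged.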
The results of a DejaCode Product Compare can be very useful in the ongoing analysis and communication activities that happen during product SCA or the defining of a new product version.
It would be extremely helpful for the User to be able to save the results in Excel format. Other nice formats would be a Word document (where the results would be in a Word document table, if possible), HTML, or PDF. But the most important and useful output format would probably be Excel, in order to support analysis, editing, and annotation among product stakeholders.
Note that there are some non-editable workarounds:
When displaying a package vulnerability, also provide a link to the Vulnerability definition in the VulnerableCode public site (VCIO) at https://public.vulnerablecode.io/
When you view the Vulnerabilities tab of a Package (see example screenshot), it presents the purl(s) of Fixed package(s) when available. If the Fixed package is not defined in your dataspace, it activates a + icon to enable an "Add Package" process, which currently presents the Add Package form with only the available purl fields populated. An improved process would do the following (or something better and equivalent):
This improved process takes advantage of available integrations (VCIO, SCIO) and data resources when adding a new Package to DejaCode.
It would be very useful to enable a DejaCode superuser administrator to make a copy of the standard attribution template and style sheet used by DejaCode and modify it to meet specific business requirements. Such a feature might include:
Ideally, this enhancement would use much of the same code as ScanCode.io (SCIO), such as:
The documentation should also explain the differences between product attribution documents and SBOMs, including the purpose of each.
Perhaps this is a user "pilot" error, but when I create a Package in DejaCode from a SourceForge download URL, I get strange results. A recent Add Package using
https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
resulted in a Package with a filename of "download" rather than "scribus-1.6.0.tar.gz".
It also resulted in the rather verbose PURL value of
pkg:generic/download?download_url=https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
I scanned the package, using the same download URL, directly in SCIO v32.0.8, and it returned a PURL value of
pkg:autotools/scribus-1.6.0
in the key_files_packages section.
So it appears that the rather eccentric download conventions of SourceForge are messing things up a bit.
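One possible workaround, sketched below, is to special-case the trailing "/download" segment when deriving a filename from the URL. This is only an illustration: the function name `filename_from_download_url` is hypothetical, and the rule that the segment before "/download" is the real filename is an assumption based on the single URL reported above, not on documented SourceForge behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_download_url(url):
    """Guess a filename from a download URL.

    Assumption: on sourceforge.net a trailing "/download" path segment is a
    redirect marker, and the real filename is the segment just before it.
    """
    parsed = urlparse(url)
    # Split the path into decoded segments, dropping the leading "/".
    segments = [unquote(s) for s in PurePosixPath(parsed.path).parts if s != "/"]
    if not segments:
        return ""
    host = parsed.netloc.lower()
    is_sourceforge = host == "sourceforge.net" or host.endswith(".sourceforge.net")
    if is_sourceforge and len(segments) > 1 and segments[-1] == "download":
        return segments[-2]
    return segments[-1]


url = "https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download"
print(filename_from_download_url(url))  # prints "scribus-1.6.0.tar.gz"
```

URLs from other hosts fall through to the usual "last path segment" rule, so existing behavior would be unchanged for them.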
Let's also add a workflow to publish DejaCode Docker images on GHCR, see: https://github.com/nexB/scancode.io/blob/main/.github/workflows/publish-docker.yml
For background discussion see nexB/scancode.io#176 and nexB/scancode.io#333.
Celery appears to work fine in the Docker context so far, so this is not urgent.
We should store Dependencies as Packages in DejaCode. Also, in addition to simply creating Product Packages, we really need to provide the necessary qualifiers for Dependencies, especially whether they are declared as required or optional. Needs design. The processes that import Product Inventory Items from ScanCode results, or from an SBOM that provides dependency details, need to be enhanced as well as the model and the corresponding UI presentation in DejaCode.
As we do for Package, the Dependency model should be aligned with the ScanCode-toolkit and ScanCode.io ones:
Note that this improvement would enhance both license compliance and vulnerability management processes in DejaCode.
When adding a Package to DejaCode from a Download URL scan, it would be very useful for the package authors to be populated automatically. See also:
Our current instructions for installing DjC provide references to PURLDB and VulnerableCode in the Application Settings section, but there is no mention of the settings for SCIO integration.
The current documentation does not provide any information about which DejaCode features require integration with one or more of SCIO, VulnerableCode or PURLDB or the current options for installing these modules.
We need to document the functionality that depends on the installation and integration of other AboutCode projects and the current options for integration. We do not need to provide installation details that are better handled by the documentation for each project, but we need the big picture perspective from DejaCode.
To get more value out of our VulnerableCodeDB integration, it would be great if we could add a "has_vulnerability" property to both the Package model and the Component model to support queries and column templates.
This is complicated by the fact that the DejaCode Report system is made to work on the DejaCode database values. I'm not sure how we'll be able to accomplish this, since the Vulnerability data is stored in an external DB.
One idea would be to fetch the lists of all vulnerable PURL and CPE references in the VulnerableCodeDB and store them in the DejaCode cache. These lists could be updated in the cache on a daily basis. This would require new specialized API endpoints on the VulnerableCode side. (Also, as a first step before implementing any of this, we should get some stats about the amount of data stored in the VulnerableCodeDB and how it will evolve.)
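The caching idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the class `VulnerablePurlCache` is hypothetical, the `fetch` callable stands in for the VulnerableCode endpoint that does not exist yet, and an in-memory set replaces the real DejaCode cache backend.

```python
import time


class VulnerablePurlCache:
    """In-memory stand-in for a periodically refreshed list of vulnerable PURLs."""

    def __init__(self, fetch, max_age_seconds=24 * 60 * 60, clock=time.monotonic):
        # `fetch` is a hypothetical callable returning an iterable of PURL strings.
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._purls = frozenset()
        self._loaded_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._purls = frozenset(self._fetch())
            self._loaded_at = now

    def has_vulnerability(self, purl):
        """Return True if the PURL is in the cached list of vulnerable PURLs."""
        self._refresh_if_stale()
        return purl in self._purls


def fake_fetch():
    # Placeholder data; a real implementation would call VulnerableCode.
    return ["pkg:pypi/example@1.0.0"]


cache = VulnerablePurlCache(fake_fetch)
print(cache.has_vulnerability("pkg:pypi/example@1.0.0"))  # prints "True"
```

Refreshing lazily on lookup is only one option. A scheduled task could instead repopulate the cache once a day, which is where the periodic task infrastructure comes in.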
We have a working prototype for this, but we do not have the infrastructure in place for periodic async tasks: the celerybeat worker service needs to be set up, or alternatively we could complete the migration to RQ, which has direct support for periodic tasks. (@tdruez please update this remark if the RQ migration is now complete.)
Also we might consider a few additional things (maybe they belong in different issues, but perhaps best discussed in this context first):
We need a way for a DejaCode Superuser, who is also an Atlassian JIRA administrator, to use the DejaCode UI to configure integration between DejaCode Requests and JIRA Issues (requests, tickets, whatever). Design needed of course. Potential approaches include the following:
We could use the DejaCode Webhook system (already supported for Slack), to add a mapping for JIRA.
A JIRA Webhook needs to be configured, providing a DejaCode URL that would receive the data from JIRA and map it into a DejaCode issue creation. See
DejaCode currently shows the full purl as the identifier on the PurlDB form.
If possible, this should also be done on the DejaCode User Packages list.
Currently when we have a Report Query that returns a field with multiple values (for instance multiple tags for a license), they are presented as a list of values separated by commas such as in True, False, True, True
When two columns have similarly "aligned" values such as a license key, it is therefore hard to see which license has which tag:
Component   License       Redistribution
ABC         apache-2.0,   False, False,
            mit, bsd,     False, True
            gpl-2.0
An alternative approach would be to separate each of the multi-values with a new line instead of a comma, enabling us to present a visual alignment of the values:
Component   License       Redistribution
ABC         apache-2.0    False
            mit           False
            bsd           False
            gpl-2.0       True
Note that the solution is a bit complicated because of the multiple Report output formats supported:
Any solution needs to work with all supported formats.
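To make the two presentation options concrete, here is a small format-agnostic sketch using the Component / License / Redistribution example above. The helper names `render_cell` and `render_rows` are hypothetical and do not come from the DejaCode reporting code. The first joins the values of one field with newlines, for formats that support multi-line cells. The second expands parallel multi-value columns into one row per value, for formats that do not.

```python
def render_cell(values, multiline=True):
    """Join the values of a multi-value report field into one cell string."""
    separator = "\n" if multiline else ", "
    return separator.join(str(v) for v in values)


def render_rows(columns):
    """Expand parallel multi-value columns into one aligned row per value.

    Shorter columns are padded with empty strings so that values stay
    aligned by position.
    """
    height = max((len(c) for c in columns), default=0)
    padded = [list(c) + [""] * (height - len(c)) for c in columns]
    return [tuple(str(v) for v in row) for row in zip(*padded)]


rows = render_rows([
    ["ABC"],
    ["apache-2.0", "mit", "bsd", "gpl-2.0"],
    [False, False, False, True],
])
for row in rows:
    print(row)
```

The alignment relies on both multi-value fields being returned in the same order, which would need to be confirmed for the actual report queries.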
When adding a package to a product after upgrading the version in the codebase:
ProductPackage entry and I lost field values such as the purpose
We could add an option to "Add to Product" to enable automatic replacement of the ProductPackage entry, keeping some of the field values such as purpose.
This would only be applied in case there's 1 entry with the same purl (different version).
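The "exactly one entry with the same purl, different version" rule could be checked along these lines. This is a sketch under simplifying assumptions: the helper names are hypothetical, the PURL parsing is deliberately naive (a real implementation would rely on a proper PURL parser), and qualifiers and subpaths are ignored when comparing.

```python
def split_purl_version(purl):
    """Split a PURL string into (purl without version, version).

    Simplified for illustration: qualifiers ("?...") and subpath ("#...")
    are dropped before looking for the "@version" suffix.
    """
    base = purl.split("#", 1)[0].split("?", 1)[0]
    name_part, separator, version = base.rpartition("@")
    if not separator:
        return base, ""  # no version in this PURL
    return name_part, version


def find_replaceable_entry(existing_purls, new_purl):
    """Return the single existing PURL that differs only by version, else None."""
    new_base, new_version = split_purl_version(new_purl)
    matches = []
    for purl in existing_purls:
        base, version = split_purl_version(purl)
        if base == new_base and version != new_version:
            matches.append(purl)
    # Only replace automatically when the match is unambiguous.
    return matches[0] if len(matches) == 1 else None


product_purls = ["pkg:pypi/django@4.2.0", "pkg:pypi/requests@2.31.0"]
print(find_replaceable_entry(product_purls, "pkg:pypi/django@5.0.0"))
# prints "pkg:pypi/django@4.2.0"
```

When zero or several candidates are found, the function returns None, so the regular "Add to Product" behavior would apply.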
@DennisClark Let me know your thoughts on this.
From #26 (comment)
one other observation, which is not directly related to this issue, but something that is somewhat perplexing. DejaCode found the existing scans that I created yesterday for the 4 packages (good) and apparently they did not get re-scanned (fine I think) but it did not perform any of the auto-updates to fields on the package (not so good), such as the license-expression, even though 3 of the 4 scans have a declared license. See attached.
Note that this is not urgent as this is not a usual use case as it's more a side effect of a testing behavior (removing and re-adding already scanned packages)
Introduce VEX Support to DejaCode
Here are a few suggested details (subject to improvement upon review):
A VEX (Vulnerability Exploitability Exchange) is an assertion about the status of a vulnerability in specific products.
In DejaCode a VEX exists only in the context of a Product. Our first implementation of VEX support will apply only to Product Packages.
The standard VEX Status can be:
● Not affected – No remediation is required regarding this vulnerability.
● Affected – Actions are recommended to remediate or address this vulnerability.
● Fixed – Represents that these product versions contain a fix for the vulnerability.
● Under Investigation – It is not yet known whether these product versions are affected by the vulnerability. An update will be provided in a later release.
DejaCode should support the standard VEX Status list. To avoid adding too much complexity to the data model, this could simply be coded into DejaCode, rather than creating a new VEX Status code table.
Given that a Product Package can have more than one vulnerability (VCID) and that a vulnerability can apply to more than one Product Package, it is probably best to consider defining a VEX in DejaCode as relating to an overall Product. Consider an on-demand process (button or command) in DejaCode that collects all the Vulnerabilities currently associated with Product Packages and creates or refreshes a list that we can call “Product VEX List” (working title) and presents them on a new tab (“VEX List”) of the Product User View.
The “logical” key of a Product VEX List is Product+VCID+PackageID, and the presentation should be in that order, with one row for each Product VEX. Supporting data elements should include:
VEX Status: modifiable; default value "Under Investigation".
VEX Action: modifiable, free-form text. If the status is Affected, a valid VEX must have an action statement that tells the product user what to do.
VEX Impact: modifiable, free-form text. If the status is Not affected, a valid VEX must have an impact statement to further explain details.
VEX Notes: modifiable, free-form text. Additional notes to explain the VEX.
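The data elements and validation rules above can be sketched as a plain Python structure. This is not a proposed Django model: the names `VEXStatus` and `ProductVEX`, the string-typed identifiers, and the error messages are all assumptions made for illustration. The status list is hard-coded, in line with the suggestion to avoid a separate VEX Status code table.

```python
from dataclasses import dataclass
from enum import Enum


class VEXStatus(Enum):
    """The standard VEX statuses, hard-coded rather than stored in a table."""

    NOT_AFFECTED = "Not affected"
    AFFECTED = "Affected"
    FIXED = "Fixed"
    UNDER_INVESTIGATION = "Under Investigation"


@dataclass
class ProductVEX:
    product: str
    vcid: str
    package_id: str
    status: VEXStatus = VEXStatus.UNDER_INVESTIGATION
    action: str = ""
    impact: str = ""
    notes: str = ""

    @property
    def key(self):
        """Logical key, in presentation order: Product + VCID + PackageID."""
        return (self.product, self.vcid, self.package_id)

    def validation_errors(self):
        """Return a list of messages for rules the VEX does not satisfy."""
        errors = []
        if self.status is VEXStatus.AFFECTED and not self.action.strip():
            errors.append("An Affected VEX needs an action statement.")
        if self.status is VEXStatus.NOT_AFFECTED and not self.impact.strip():
            errors.append("A Not affected VEX needs an impact statement.")
        return errors


# Hypothetical identifiers, for illustration only.
vex = ProductVEX(product="MyProduct 1.0", vcid="VCID-0000-0000-0000", package_id="pkg-1")
vex.status = VEXStatus.AFFECTED
print(vex.validation_errors())
```

Sorting a list of such objects by `key` would give the Product, VCID, PackageID ordering described for the Product VEX List.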
DejaCode Processing:
From the Product VEX list, ability to open a Product VEX detail form.
From the Product VEX list, provide a navigation link to the Product Package details.
Provide full support for Product VEX in Reporting.
Provide full support for Product VEX in the DejaCode API.
(future) Generate DejaCode Notifications when a Product VEX is created and when the VEX Status is modified. Provide a link to the Product VEX from the Notification.
Some useful files, background, and links:
See the example VEX at
https://github.com/CycloneDX/bom-examples/blob/master/VEX/vex.json
There is a descriptive overview of the CycloneDX approach to VEX here
https://github.com/CycloneDX/bom-examples/tree/master/VEX
Note that we are primarily interested in what they call the "Independent BOM and VEX BOM" rather than an SBOM with embedded VEX info. It is always important to remember that an SBOM is essentially static, associated with a specific Version of a package (or in our case a Product defined in DejaCode), while the VEX is intended to report time-critical information about the potential impact of a software vulnerability and how it is being addressed.
The CSAF standard format, recommended by the CycloneDX team, is described here:
https://www.oasis-open.org/2022/11/28/common-security-advisory-framework-version-2-0-oasis-standard-is-now-published/
OASIS also provides a downloadable package of the CSAF spec here:
https://docs.oasis-open.org/csaf/csaf/v2.0/os/csaf-v2.0-os.zip
The most useful file in that package for us is probably csaf_json_schema.json
Additional guidelines from CISA 2023-11-06 attached.
When-to-Issue-a-VEX-508c.pdf.zip
Interesting commentary from Tom Alrich attached.
When will there be VEX tools.pdf
DejaCode currently hides empty fields in the Component User Details view (although this needs to be reviewed and confirmed), but it should be enhanced to do that in all of the major application objects (Owners, Licenses, Packages, Products), perhaps with a button to "Show unused fields" in the UI.
While analyzing a Package in DejaCode recently created from https://github.com/VictoriaMetrics/VictoriaMetrics/archive/refs/tags/v1.93.9.tar.gz, I used the "Download Scan data" button to get a copy of the scan results. When I examined the downloaded file, I discovered that there was no summary information in it, such as the license clarity scoring details. I submitted the same package directly to ScanCode.io and discovered that it produces two scan results files, summary-<timestamp>.json and scancode-<timestamp>.json, and that the summary file has the information that I needed.
Currently the scan results page on Package only gives me the ability to download the scan result details. I think we should improve that by either:
"Download Scan summary" and "Download Scan details"
The summary has a lot of information that is useful to the user wanting to know more about the Package.
We need to add pagination on the Product Inventory tab, since the number of Inventory Items can easily be well over 100.
Also, we need to fix the default ordering (which is currently based on pk) and provide sorting options (perhaps on the column headers).
Introduce VEX Import (VEX Ingest) capability to DejaCode
Refer to #15 for background details, especially the suggested improvements to the DejaCode Product and Product Package models.
The initial challenge is to identify the specific standard VEX formats to support. We want to support all the commonly implemented formats.
More details to follow.