nexB / dejacode
Automate open source license compliance and ensure software supply chain integrity
Home Page: https://dejacode.readthedocs.io
License: GNU Affero General Public License v3.0
I am trying to run both ScanCode.io and DejaCode locally (macOS).
Up to this step, I was able to follow the given instructions: Install and configure ScanCode.io
However, when I look at the 'Integrations' section within DejaCode, it shows the following error:
ScanCode.io instance is reachable via configured URL: http://localhost:8080
Can somebody please help me with this issue?
When adding new packages to DejaCode, there are often several URLs for the same package (for instance when adding PyPI packages there can be multiple wheels and sdist for the same package).
Then once the scans are completed, one may update one of the packages, usually to fill in fields that are blank.
Once this is done for one package it would be great to have a convenient way to compare, sync and possibly merge the new attribute values of the updated package with the other packages created for the same version, or perhaps even gather attribute values from previous versions of the same package.
Otherwise, the user typically needs to follow an error prone and work intensive sequence of actions using multiple browser tabs and copy/paste:
Some detail design is needed of course, but this would be a great enhancement to package maintenance usability. The recent enhancements to PurlDB for the package_set feature might be helpful here.
We should store Dependencies as Packages in DejaCode. Also, in addition to simply creating Product Packages, we really need to provide the necessary qualifiers for Dependencies, especially whether they are declared as required or optional. Needs design. The processes that import Product Inventory Items from ScanCode results, or from an SBOM that provides dependency details, need to be enhanced as well as the model and the corresponding UI presentation in DejaCode.
As we do for Package, the Dependency model should be aligned with the ScanCode-toolkit and ScanCode.io ones:
Note that this improvement would enhance both license compliance and vulnerability management processes in DejaCode.
When a package is in DejaCode and has been further scanned, or is in the PurlDB, I would like to drill down (that is, navigate) to its scan details either in ScanCode.io or the PurlDB.
In particular I would like to see details about license detection results (and scores), license clarity scores and navigate to the resource details
DejaCode Reports are being presented now with the columns compressed to squeeze them into the available screen space, rather than a more natural column width that spreads them out more legibly. This is a recently occurring problem. See attached screenshot of the 2-Product Package SBOM report run in the nexB dataspace. The problem exists with all the Reports.
Importing an SBOM into a DejaCode Product can be disappointing if the SBOM does not have much license information. A nice feature would be to provide a new command option to "Improve Packages from PurlDB" on the Product "Scan" dropdown:
Step through the Product Packages
Use the PURL to find an entry in the PurlDB
Apply PurlDB field values to empty fields in the Product Package and corresponding Package definitions.
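The three steps above can be sketched as follows. This is a minimal illustration using plain dictionaries, not DejaCode or PurlDB code: the function names, the `purldb_lookup` callable, and the sample data are all hypothetical.

```python
from typing import Callable, Optional

EMPTY_VALUES = (None, "", [], {})


def fill_empty_fields(package: dict, purldb_entry: dict) -> list:
    """Copy PurlDB values into fields that are empty on the package.

    Returns the names of the updated fields. Existing values are never
    overwritten.
    """
    updated = []
    for field, value in purldb_entry.items():
        if field == "purl" or value in EMPTY_VALUES:
            continue
        if package.get(field) in EMPTY_VALUES:
            package[field] = value
            updated.append(field)
    return updated


def improve_packages(packages: list, purldb_lookup: Callable[[str], Optional[dict]]) -> dict:
    """Step through the packages and improve each one that PurlDB knows."""
    report = {}
    for package in packages:
        entry = purldb_lookup(package["purl"])
        if entry:
            report[package["purl"]] = fill_empty_fields(package, entry)
    return report


# Hypothetical in-memory stand-in for a PurlDB query, with made-up data.
FAKE_PURLDB = {
    "pkg:pypi/example@1.0.0": {
        "purl": "pkg:pypi/example@1.0.0",
        "declared_license_expression": "mit",
        "homepage_url": "https://purldb.example/homepage",
    }
}

packages = [
    {
        "purl": "pkg:pypi/example@1.0.0",
        "declared_license_expression": "",
        "homepage_url": "https://kept.example",
    },
    {"purl": "pkg:pypi/unknown@0.1", "declared_license_expression": ""},
]
report = improve_packages(packages, FAKE_PURLDB.get)
```

Only blank fields are touched, so any curation already entered by a user survives the update.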
To get more value out of our VulnerableCodeDB integration, it would be great if we could add a "has_vulnerability" property to both the Package model and the Component model to support queries and column templates.
This is complicated by the fact that the DejaCode Report system is made to work on the DejaCode Database values, I'm not sure how we'll be able to accomplish since the Vulnerability data is stored in an external DB.
One idea would be to fetch the lists of all vulnerable PURL and CPE references in the VulnerableCodeDB and store them in the DejaCode cache. These lists could be updated in the cache on a daily basis. This would require new specialized API endpoints on the VulnerableCode side. (Also, as a first step before implementing any of this, we should get some stats about the amount of data stored in the VulnerableCodeDB and how it will evolve.)
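The caching idea could look roughly like the sketch below. It is an assumption-laden illustration: `fetch_vulnerable_purls` stands in for a bulk VulnerableCode endpoint that does not exist yet, and the class is plain Python rather than the DejaCode cache.

```python
import time


class VulnerablePurlCache:
    """Hold the set of vulnerable PURLs and refresh it when it is stale."""

    def __init__(self, fetch_vulnerable_purls, ttl_seconds=24 * 60 * 60, clock=time.time):
        # fetch_vulnerable_purls is a hypothetical callable returning an
        # iterable of PURL strings; no such bulk endpoint is assumed to exist.
        self._fetch = fetch_vulnerable_purls
        self._ttl = ttl_seconds
        self._clock = clock
        self._purls = frozenset()
        self._loaded_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._purls = frozenset(self._fetch())
            self._loaded_at = now

    def has_vulnerability(self, purl):
        """Return True when the PURL is in the cached vulnerable list."""
        self._refresh_if_stale()
        return purl in self._purls
```

A lookup like this answers a single package quickly, but it would not by itself make `has_vulnerability` usable in database-backed report queries; that part still needs design.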
We have a working prototype for this but we do not have the infrastructure in place for periodic async tasks: the celerybeat worker service needs to be set up, or alternatively we could complete the migration to RQ, which has direct support for periodic tasks. ( @tdruez please update this remark if the RQ migration is now complete. )
Also we might consider a few additional things (maybe they belong in different issues, but perhaps best discussed in this context first):
Perhaps this is a user "pilot" error, but when I create a Package in DejaCode from a SourceForge download URL, I get strange results. A recent Add Package using
https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
resulted in a Package with a filename of download rather than scribus-1.6.0.tar.gz.
It also resulted in the rather verbose PURL value of
pkg:generic/download?download_url=https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
I scanned the package, using the same download URL, directly in SCIO v32.0.8, and it returned a PURL value of pkg:autotools/scribus-1.6.0 in the key_files_packages section.
So it appears that the rather eccentric download conventions of SourceForge are messing things up a bit.
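One possible way to handle the SourceForge convention is sketched below. It is only an illustration of the idea, not how DejaCode derives filenames; the function name is hypothetical.

```python
from urllib.parse import urlparse


def filename_from_download_url(url: str) -> str:
    """Return the last meaningful path segment of a download URL.

    SourceForge file URLs end with a literal "/download" segment, so the
    segment just before it is used instead.
    """
    parsed = urlparse(url)
    segments = [segment for segment in parsed.path.split("/") if segment]
    if not segments:
        return ""
    is_sourceforge = parsed.netloc.lower().endswith("sourceforge.net")
    if is_sourceforge and segments[-1] == "download" and len(segments) > 1:
        return segments[-2]
    return segments[-1]


url = "https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download"
print(filename_from_download_url(url))  # scribus-1.6.0.tar.gz
```

The special case is limited to sourceforge.net hosts so that a file genuinely named "download" on another site is left alone.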
We should use the ComponentKeyword as suggestions in the keyword autocomplete fields and remove the validation to allow any keywords values.
In DejaCode, at the moment, only keywords existing as ComponentKeyword in the Dataspace are allowed. This approach is not compatible with automatically adding arbitrary keywords from Scan results or other integration processes.
Background: Scan results from SCTK returns keywords such as:
wrapper
proxy
decorator
Development Status :: 5 - Production/Stable
Programming Language :: Python :: 2
Programming Language :: Python :: 2.7
Programming Language :: Python :: 3
Programming Language :: Python :: 3.5
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Programming Language :: Python :: Implementation :: CPython
Programming Language :: Python :: Implementation :: PyPy
The Admin "Activity Log" reports are ignoring the entered "Days of Activity", and appear to always use the default of 90 days.
Also, the reports do not appear to be sorted in any logical order. The best sort order would be by "Action time" descending.
Example of an Owner Activity Log (which was requested to be for 30 days of activity) attached.
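The expected behavior can be expressed as a small sketch. It uses plain dictionaries with an `action_time` key as a stand-in for log entries; the function name and data shape are assumptions, not the DejaCode implementation.

```python
from datetime import datetime, timedelta


def recent_activity(entries, days, now=None):
    """Keep entries from the last `days` days, newest first."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    recent = [entry for entry in entries if entry["action_time"] >= cutoff]
    return sorted(recent, key=lambda entry: entry["action_time"], reverse=True)
```

With a requested window of 30 days, an entry from 45 days ago is excluded, and the remaining entries come back in "Action time" descending order.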
DejaCode currently shows the full purl as the identifier on the PurlDB form.
If possible, this should also be done on the DejaCode User Packages list.
From #26 (comment)
one other observation, which is not directly related to this issue, but something that is somewhat perplexing. DejaCode found the existing scans that I created yesterday for the 4 packages (good) and apparently they did not get re-scanned (fine I think) but it did not perform any of the auto-updates to fields on the package (not so good), such as the license-expression, even though 3 of the 4 scans have a declared license. See attached.
Note that this is not urgent as this is not a usual use case as it's more a side effect of a testing behavior (removing and re-adding already scanned packages)
While analyzing a Package in DejaCode recently created from https://github.com/VictoriaMetrics/VictoriaMetrics/archive/refs/tags/v1.93.9.tar.gz I used the Download Scan data button to get a copy of the scan results, and discovered when I examined the downloaded file that there was no summary information in it, such as the license clarity scoring details. I submitted the same package directly to ScanCode.io and discovered that it produces two scan results files, summary-<timestamp>.json and scancode-<timestamp>.json, and that the summary file has the information that I needed.
Currently the scan results page on Package only gives me the ability to download the scan result details. I think we should improve that by either:
Download Scan summary
and Download Scan details
The summary has a lot of information that is useful to the user wanting to know more about the Package.
Introduce VEX Import (VEX Ingest) capability to DejaCode
Refer to #15 for background details, especially the suggested improvements to the DejaCode Product and Product Package models.
The initial challenge is to identify the specific standard VEX formats to support. We want to support all the commonly implemented formats.
More details to follow.
It would be very useful to enable a DejaCode superuser administrator to make a copy of the standard attribution template and style sheet used by DejaCode and modify it to meet specific business requirements. Such a feature might include:
Ideally, this enhancement would use much of the same code as ScanCode.io (SCIO), such as:
The documentation should also explain the differences between product attribution documents and SBOMs, including the purpose of each.
ABOUT File Visibility and the DejaCode UI
DejaCode currently provides a simple and convenient interface that enables a user to generate an ABOUT file (and associated files) for a Package or Component; however, the DejaCode user does not actually see what the ABOUT file is going to look like or what it looks like after the file is generated. The ABOUT file is in the user’s file system, separate from the DejaCode application and its database; only the single user sees the ABOUT file unless it is shared with the user’s team by some other technology.
Consider a new tab in the Package details user view, perhaps called “ABOUT”, that offers a rendering of its ABOUT file exactly as it will appear in a generated .yml file. This “preview” would allow the user to evaluate the Package data as it exists in DejaCode, complete with any curations, before it is actually generated. The same “preview” would also provide an appropriate means to confirm the successful Import ( see #43 ) of an ABOUT file into the DejaCode database, where the Import would either create or update a Package identified by the PURL.
If a DejaCode user updates Package details, then after those changes are saved the proposed ABOUT tab would show any impact on the ABOUT file fields as a preview of an ABOUT file that can potentially be generated.
When you view the Vulnerabilities tab of a Package (see example screenshot) it presents the purl(s) of Fixed package(s) when available. If the Fixed package is not defined in your dataspace, it activates a + icon to enable an "Add Package" process, which currently presents the Add Package form with only the available purl fields populated. An improved process would do the following (or something better and equivalent):
This improved process takes advantage of available integrations (VCIO, SCIO) and data resources when adding a new Package to DejaCode.
We should raise the default timeout value, 6 minutes is not always enough while importing SBOMs and Manifests.
When adding a package to a product after upgrading the version in the codebase:
ProductPackage entry and I lost field values such as the purpose
We could add an option to the "Add to Product", to enable automatic replacement of the ProductPackage entry, keeping some of the field values such as purpose.
This would only be applied in case there's 1 entry with the same purl (different version).
@DennisClark Let me know your thoughts on this.
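As a starting point for discussion, the replacement rule could be sketched like this. Entries are plain dictionaries, the purl handling is deliberately simplified, and every name here is hypothetical rather than taken from the DejaCode models.

```python
def split_purl(purl):
    """Return (purl_without_version, version) for a simple purl string.

    Simplification: qualifiers and subpath are dropped, and anything after
    the last "@" counts as the version unless it contains a "/".
    """
    base = purl.split("?", 1)[0].split("#", 1)[0]
    head, sep, tail = base.rpartition("@")
    if sep and "/" not in tail:
        return head, tail
    return base, ""


def find_entry_to_replace(entries, new_purl):
    """Return the single entry with the same purl but another version, else None."""
    new_base, new_version = split_purl(new_purl)
    candidates = []
    for entry in entries:
        base, version = split_purl(entry["purl"])
        if base == new_base and version != new_version:
            candidates.append(entry)
    return candidates[0] if len(candidates) == 1 else None


def add_to_product(entries, new_purl, keep_fields=("purpose",)):
    """Add new_purl, replacing a lone other-version entry and keeping selected fields."""
    new_entry = {"purl": new_purl}
    old = find_entry_to_replace(entries, new_purl)
    if old is not None:
        for field in keep_fields:
            if field in old:
                new_entry[field] = old[field]
        entries.remove(old)
    entries.append(new_entry)
    return new_entry
```

When two or more versions of the same package are already present, nothing is replaced and the new entry is simply appended, which matches the "only 1 entry" condition above.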
Currently when we have a Report Query that returns a field with multiple values (for instance multiple tags for a license), they are presented as a list of values separated by commas, such as True, False, True, True.
When two columns have similarly "aligned" values such as a license key, it is therefore hard to see which license has which tag:

Component  License                        Redistribution
ABC        apache-2.0, mit, bsd, gpl-2.0  False, False, False, True

An alternative approach would be to separate each of the multi-values with a new line instead of a comma, enabling us to present a visual alignment of the values:

Component  License     Redistribution
ABC        apache-2.0  False
           mit         False
           bsd         False
           gpl-2.0     True
Note that the solution is a bit complicated because of the multiple Report output formats supported:
Any solution needs to work with all supported formats.
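A format-neutral way to think about the aligned presentation is to expand one report row into several rows before any output format is rendered. The sketch below illustrates that idea only; the function name and the list-of-cells row shape are assumptions about how a report row might be represented.

```python
from itertools import zip_longest


def expand_multivalue_row(row):
    """Turn one report row with list-valued cells into several aligned rows.

    Single-valued cells appear on the first line only; shorter columns are
    padded with empty strings.
    """
    columns = [cell if isinstance(cell, list) else [cell] for cell in row]
    return [list(line) for line in zip_longest(*columns, fillvalue="")]


row = ["ABC", ["apache-2.0", "mit", "bsd", "gpl-2.0"], [False, False, False, True]]
for line in expand_multivalue_row(row):
    print(line)
```

Each output format could then decide whether to emit the expanded lines as separate rows or to join them with newlines inside a single cell.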
The SBOM parts were moved into a new specialized "load_sbom" pipeline.
We need to use the newer ScanCode.io pipeline names... OR keep aliases to avoid breaking the API.
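The alias option could be as small as a lookup table, sketched below. Only `load_sbom` comes from this issue; the old name used as the key is made up for illustration, and the real mapping would have to be taken from the ScanCode.io release notes.

```python
# Map of old pipeline names to their current names.
# "legacy_sbom_pipeline" is a made-up old name used only for illustration.
PIPELINE_ALIASES = {
    "legacy_sbom_pipeline": "load_sbom",
}


def resolve_pipeline_name(name, aliases=PIPELINE_ALIASES):
    """Return the current pipeline name for an old or current name."""
    return aliases.get(name, name)
```

Callers that already send a current name are unaffected, since unknown names pass through unchanged.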
For background discussion see
nexB/scancode.io#176
and
nexB/scancode.io#333
Celery appears to work fine in the Docker context so far, so this is not urgent.
Describe the bug
Following the installation docs, the web component will not start correctly, leading to a "Bad Gateway" error in nginx.
To Reproduce
Steps to reproduce the behavior:
gunicorn: error: unrecognized arguments: --workers 4
Expected behavior
With the default parameter configuration/installation docs, the service should start cleanly.
Context
Removing the continuation backslash in
Line 33 in 024697b
The installation docs and the Makefile are using the docker compose command. This is unknown to the Docker version shipped on Ubuntu 22.04 (Docker version 24.0.5, build 24.0.5-0ubuntu1~22.04.1), which only knows docker-compose.
With this, the default installation and all Makefile commands will not work out of the box.
Following recent changes in SCIO, we can no longer import a product's data from a package manifest.
I would like to have the ability to simply upload a package manifest or lockfile as a product like in the former "import_manifest" feature.
IMHO this would best be done as two imports, import SBOM and import_manifest, that would otherwise behave the same.
DejaCode currently hides empty fields in the Component User Details view (although this needs to be reviewed and confirmed), but it should be enhanced to do that in all of the major application objects (Owners, Licenses, Packages, Products), perhaps with a button to "Show unused fields" in the UI.
Is your enhancement request related to a problem? Please describe.
When navigating through the hierarchy, it is not apparent which package/dep has further deps.
For example in the following screenshot all the packages seem the same.
But if you click on Qt, it clearly has several dependencies.
What are the benefits of the requested enhancement?
It would make navigating the package hierarchy easier.
Describe the solution you would like
I think a visual indicator – e.g. a connector dot on the right-edge of the package bubble – would be enough.
Introduce VEX Support to DejaCode
Here are a few suggested details (subject to improvement upon review):
A VEX (Vulnerability Exploitability Exchange) is an assertion about the status of a vulnerability in specific products.
In DejaCode a VEX exists only in the context of a Product. Our first implementation of VEX support will apply only to Product Packages.
The standard VEX Status can be:
● Not affected – No remediation is required regarding this vulnerability.
● Affected – Actions are recommended to remediate or address this vulnerability.
● Fixed – Represents that these product versions contain a fix for the vulnerability.
● Under Investigation – It is not yet known whether these product versions are affected by the vulnerability. An update will be provided in a later release.
DejaCode should support the standard VEX Status list. To avoid adding too much complexity to the data model, this could simply be coded into DejaCode, rather than creating a new VEX Status code table.
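Coding the status list directly, as suggested above, might look like the sketch below. The stored string values and the helper name are assumptions for illustration; only the four status labels come from the list above.

```python
from enum import Enum


class VEXStatus(str, Enum):
    """The four standard VEX statuses, hard-coded instead of a code table."""

    NOT_AFFECTED = "not_affected"
    AFFECTED = "affected"
    FIXED = "fixed"
    UNDER_INVESTIGATION = "under_investigation"


VEX_STATUS_LABELS = {
    VEXStatus.NOT_AFFECTED: "Not affected",
    VEXStatus.AFFECTED: "Affected",
    VEXStatus.FIXED: "Fixed",
    VEXStatus.UNDER_INVESTIGATION: "Under Investigation",
}


def vex_status_choices():
    """Return (value, label) pairs suitable for a form or model field."""
    return [(status.value, VEX_STATUS_LABELS[status]) for status in VEXStatus]
```

Keeping the values in code means a new status requires a release rather than a data change, which is the trade-off accepted in exchange for a simpler data model.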
Given that a Product Package can have more than one vulnerability (VCID) and that a vulnerability can apply to more than one Product Package, it is probably best to consider defining a VEX in DejaCode as relating to an overall Product. Consider an on-demand process (button or command) in DejaCode that collects all the Vulnerabilities currently associated with Product Packages and creates or refreshes a list that we can call “Product VEX List” (working title) and presents them on a new tab (“VEX List”) of the Product User View.
The “logical” key of a Product VEX List is Product+VCID+PackageID, and the presentation should be in that order, with one row for each Product VEX. Supporting data elements should include:
DejaCode Processing:
Some useful files, background, and links:
See the example VEX at
There is a descriptive overview of the CycloneDX approach to VEX here
Note that we are primarily interested in what they call the "Independent BOM and VEX BOM" rather than an SBOM with embedded VEX info, mainly because it is always important to remember that an SBOM is essentially static, associated with a specific Version of a package (or in our case a Product defined in DejaCode) while the VEX is intended to report time-critical information about potential impact of a software vulnerability and how it is being addressed.
The CSAF standard format, recommended by the CycloneDX team, is described here:
https://www.oasis-open.org/2022/11/28/common-security-advisory-framework-version-2-0-oasis-standard-is-now-published/
The CSAF also provides a downloadable package of the spec here:
https://docs.oasis-open.org/csaf/csaf/v2.0/os/csaf-v2.0-os.zip
The most useful file in that package for us is probably csaf_json_schema.json
Additional guidelines from CISA 2023-11-06 attached.
When-to-Issue-a-VEX-508c.pdf.zip
Interesting commentary from Tom Alrich attached.
When will there be VEX tools.pdf
I would like to have a way to contribute additional custom reporting as new code in DejaCode.
This would not be code that is part of the standard DejaCode, but rather a separate project that I could install as its own wheel.
We could reuse the same approach as in ScanCode toolkit plugins or for ScanCode.io pipelines
Let's also add a workflow to publish DejaCode Docker images on GHCR, see: https://github.com/nexB/scancode.io/blob/main/.github/workflows/publish-docker.yml
See related issue #42
DejaCode currently provides a simple and convenient interface that enables a user to generate an ABOUT file (and associated files) for a Package or Component; however, the direction is only outbound from DejaCode.
Consider enhancing DejaCode to support the ability to import an ABOUT file, with the result being either the creation of a new Package or updates to a Package already defined in DejaCode.
Once the import is completed, then the DejaCode user would be able to see all the Package details in the standard DejaCode details view, and also reproduced in the ABOUT tab.
Problem: provide more clarity for "Declared License" vs "Concluded License" .
Benefit: support the completeness of an SBOM.
Create an additional declared_license field on Package. When a package scan is completed update both the current assigned_license field and this new declared_license field with the same values. The intention is to retain the declared_license as an historical record, so that the assigned_license field essentially becomes the "concluded license" (we can change the help text on that field).
Store the additional licenses from the scan results on the package model as well. This will support deeper analysis and reporting, enabling users to comment on why specific additional licenses impact or do not impact the licensing terms as the package is expected to be used in an organization.
More design details to follow.
See related issue nexB/scancode-toolkit#2897
We should assign a category+usage-policy to a license-expression to clarify license WITH exception cases.
It could be that all exceptions to a Copyleft license turn the license expression into Copyleft-limited.
It's important to clarify that the scope of this improvement is limited to "license WITH exception" cases and not more complex license expressions that express multiple licenses connected by the "AND" operator; that is, the "(license WITH exception)", ideally surrounded by parentheses, can be thought of as its own unit (a molecule?) and we can apply a category to that. Since the most common cases exist with the general rule that the category of the exception prevails over the category of the target license, we can make that the default behavior, but ultimately this should be controlled by SCTK detection rules to handle odd cases where that is not what is actually happening, for example, "exceptions" that simply tell you what you are allowed to do but don't really modify the target license terms.
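The default rule and its escape hatch can be sketched as follows. The category table and the override mechanism are illustrative assumptions; in practice the categories would come from the license data and the odd cases from SCTK detection rules.

```python
# Illustrative category table; not taken from the DejaCode license library.
CATEGORIES = {
    "gpl-2.0": "Copyleft",
    "classpath-exception-2.0": "Copyleft-limited",
    "mit": "Permissive",
}


def category_for_with_expression(expression, categories, overrides=None):
    """Return the category of a single "license WITH exception" unit.

    Default rule: the category of the exception prevails over the category
    of the target license. An explicit override wins over the default.
    """
    if " AND " in expression or " OR " in expression:
        raise ValueError("only a single 'license WITH exception' unit is supported")
    unit = expression.strip().strip("()").strip()
    if overrides and unit in overrides:
        return overrides[unit]
    license_key, sep, exception_key = unit.partition(" WITH ")
    if not sep:
        return categories[license_key]
    return categories[exception_key]
```

For example, `category_for_with_expression("(gpl-2.0 WITH classpath-exception-2.0)", CATEGORIES)` yields the exception's category, while an entry in `overrides` can pin a different category for an exception that does not really modify the target license terms.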
When displaying a package vulnerability, also provide a link to the Vulnerability definition in the VulnerableCode public site (VCIO) at https://public.vulnerablecode.io/
The results of a DejaCode Product Compare can be very useful in the ongoing analysis and communication activities that happen during product SCA or the defining of a new product version.
It would be extremely helpful for the User to be able to save the results in Excel format. Other nice formats would be a Word document (where the results would be in a Word document table, if possible) and/or html and/or pdf. But the most important and useful output format would probably be Excel, in order to support analysis, editing, and annotation among product stakeholders.
Note that there are some non-editable workarounds:
When adding a Package to DejaCode from a Download URL scan, it would be very useful for the package authors to be populated automatically. See also:
We need a way for a DejaCode Superuser, who is also an Atlassian JIRA administrator, to use the DejaCode UI to configure integration between DejaCode Requests and JIRA Issues (requests, tickets, whatever). Design needed of course. Potential approaches include the following:
We could use the DejaCode Webhook system (already supported for Slack), to add a mapping for JIRA.
A JIRA Webhook needs to be configured, providing a DejaCode URL that would receive the data from JIRA and map it into a DejaCode issue creation. See
In the context of automation, we should add the following Product features in the REST API:
This would allow new DejaCode Products to be fed automatically from a CI workflow, such as one using https://github.com/nexB/scancode-action, for example:
scancode-action is triggered to run a scan_codebase pipeline
Our primary DejaCode documentation is now at: https://dejacode.readthedocs.io/en/latest/, but there is no link to it from the DjC Home page.
The working idea here is to come up with the best way to identify cross-package relationships, especially to be able to get to
(1) the source code
and
(2) more complete copyright+license data, which usually comes from the source code.
We could start by displaying the values for contains_source_code, source_packages, code_view_url, and vcs_url in the "Detected Package" section of the Scan tab (when a value is available). The ScanCode package model has this support for source code relationships (which are also in PurlDB):
the `contains_source_code` boolean flag tells if the package itself contains source code: https://github.com/nexB/scancode-toolkit/blob/0465269543eb338086c10bdeb1e81d3013522b4d/src/packagedcode/models.py#L452
the `source_packages` field is a list of Package URLs that may exist for this package https://github.com/nexB/scancode-toolkit/blob/0465269543eb338086c10bdeb1e81d3013522b4d/src/packagedcode/models.py#L457
the `code_view_url` and `vcs_url` provide reference URLs to view or fetch actual source code https://github.com/nexB/scancode-toolkit/blob/0465269543eb338086c10bdeb1e81d3013522b4d/src/packagedcode/models.py#L414
Now that we have standardized on PURL as the package identifier, we should be able to pursue this DejaCode improvement using package-set values via integration with the PurlDB.
We need to add pagination on the Product Inventory tab, since the number of Inventory Items can easily be well over 100.
Also, we need to fix the default ordering (which is currently based on pk) and provide sorting options (perhaps on the column headers).
On the DejaCode PurlDB view, if I use the Filter dropdown and enter gem and click the Filter button (or simply press return), the application is updating the URL with all the possible Filter parameters like this
https://public.dejacode.com/purldb/?type__iexact=gem&namespace__iexact=&name__iexact=&version__iexact=&download_url__iexact=&filename__iexact=&sha1=&md5=&size=&release_date=&purl=&q=&sort=
which is not actually filtering by a type of gem, and it continues to show all of the packages in the PurlDB.
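Whatever the root cause of the filter being ignored turns out to be, the URL itself could be kept readable by dropping the blank parameters before it is submitted or forwarded. The sketch below shows that clean-up only, using the standard library; the function name is hypothetical and this is not a confirmed fix for the filtering problem.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_empty_params(url):
    """Return the URL with all blank query parameters removed."""
    parts = urlsplit(url)
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if value != ""
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = (
    "https://public.dejacode.com/purldb/?type__iexact=gem&namespace__iexact="
    "&name__iexact=&version__iexact=&download_url__iexact=&filename__iexact="
    "&sha1=&md5=&size=&release_date=&purl=&q=&sort="
)
print(drop_empty_params(url))
# https://public.dejacode.com/purldb/?type__iexact=gem
```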
Our current instructions for installing DjC provide references to PURLDB and VulnerableCode in the Application Settings section, but there is no mention of the settings for SCIO integration.
The current documentation does not provide any information about which DejaCode features require integration with one or more of SCIO, VulnerableCode or PURLDB or the current options for installing these modules.
We need to document the functionality that depends on the installation and integration of other AboutCode projects and the current options for integration. We do not need to provide installation details that are better handled by the documentation for each project, but we need the big picture perspective from DejaCode.
A Reference section in the DejaCode User Guide is needed to explain the differences between Components and Packages.