
nexb / scancode.io


ScanCode.io is a server to script and automate software composition analysis with ScanPipe pipelines. This project is sponsored by the NLnet project https://nlnet.nl/project/vulnerabilitydatabase/, Google Summer of Code, nexB, and other generous sponsors!

Home Page: https://scancodeio.readthedocs.io

License: Apache License 2.0

Makefile 0.34% Python 84.90% HTML 13.57% Dockerfile 0.20% JavaScript 0.91% Java 0.06% C++ 0.02%
sca software-composition-analysis open-source license scancode docker virtual-machine cyclonedx package-url purl

scancode.io's Introduction

ScanCode.io

ScanCode.io is a server to script and automate software composition analysis with ScanPipe pipelines.

The first application is Docker container and VM composition analysis.

Getting started

The ScanCode.io documentation is available here: https://scancodeio.readthedocs.io/

If you have questions that are not covered by our Documentation or FAQs, please ask them in Discussions.

If you want to contribute to ScanCode.io, start with our Contributing page.

A new GitHub action is now available at scancode-action to run ScanCode.io pipelines from your GitHub Workflows. Visit https://scancodeio.readthedocs.io/en/latest/automation.html to learn more about automation.

Build and tests status

[CI Tests Status badge]
[Documentation Build Status badge]

License

SPDX-License-Identifier: Apache-2.0

The ScanCode.io software is licensed under the Apache License version 2.0. Data generated with ScanCode.io is provided as-is without warranties. ScanCode is a trademark of nexB Inc.

You may not use this software except in compliance with the License. You may obtain a copy of the License at: http://apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Data generated with ScanCode.io is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. No content created from ScanCode.io should be considered or used as legal advice. Consult an attorney for any legal advice.

scancode.io's People

Contributors

0xmpij, aalexanderr, avishrantssh, ayansinhamahapatra, divyansh044, hritik14, hyounes4560, jayanth-kumar-morem, jonoyang, keshav-space, lf32, philcali, pombredanne, swastkk, tdruez, tg1999, xerrni


scancode.io's Issues

Provide a way to use scanpipe with an in-process database

Depending on the environment, using Postgres can be a little unwieldy (requires a daemon, root access, etc.).

Is the DB used for much more than just storing pipeline statuses and results? Is there any reason a solution like SQLite couldn't work here?
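
For illustration, a minimal sketch of what pointing Django at SQLite could look like, assuming nothing in the scanpipe models or queries is PostgreSQL-specific (the settings snippet below follows stock Django, not necessarily ScanCode.io's actual settings module):

    # A minimal, illustrative SQLite-backed DATABASES setting.
    from pathlib import Path

    BASE_DIR = Path(__file__).resolve().parent

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            # In-process, file-based database: no daemon or root access needed.
            "NAME": BASE_DIR / "scancodeio.sqlite3",
        }
    }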

Assignment of Download URL to a detected package needs to be improved

A recent scan (using ScanCode.io) of vscode-1.33.1.tar.gz (from https://github.com/microsoft/vscode/archive/1.33.1.tar.gz ) resulted in the assignment of invalid download URLs to detected packages. An example is the detected package clojure-1.0.0.tgz, which was assigned a download URL of https://registry.npmjs.org/clojure/-/clojure-1.0.0.tgz that does not work.

An archive of the scan output is attached.

vscode-1.33.1.tar.gz_scan.json.zip

Add support for "failed" task_output in Run.get_run_id method

The following may occur when a task has failed early, before running a pipeline. The task_output will then have some content but no "run-id 123456789" string to be found, raising an exception on get_run_id calls.

Exception raised in callable attribute "get_run_id"; 
original exception was: 'NoneType' object has no attribute 'group'
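
A defensive sketch of the extraction, assuming the run id is pulled out of task_output with a regular expression as the error suggests (the pattern and names below are illustrative):

    import re

    # The "run-id 123456789" string only appears once a pipeline has started.
    RUN_ID_PATTERN = re.compile(r"run-id (\d+)")

    def get_run_id(task_output):
        """Return the run id from `task_output`, or None for tasks that
        failed before a pipeline was started."""
        match = RUN_ID_PATTERN.search(task_output or "")
        return match.group(1) if match else None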

Support newer versions of python

Is there a reason python3.6 is specifically required? On some Debian environments, it is impossible to use python -m venv without the newest version of Python.

design-needed: Create app to display licenses and license rules

These should be the scancode-toolkit licenses.
Ideally we should also have an easy way to add new records, which should then trigger a PR in scancode-toolkit.
There should also ideally be a simple API that could be used by aboutcode-toolkit to fetch license texts for attribution generation.

Improve the short description of this scancode.io project

Here is a suggested short description for the scancode.io project:

ScanCode.io is a server that manages and performs scancode-toolkit scans, and it enables you to automate ScanCode analysis with ScanPipe pipelines and Container Analysis.

Scan a single text for license, report detection details and quality

Using scancode-toolkit I would like to

  1. have a screen where I can paste a text to trigger a license detection

  2. in the results, see which parts of the text were detected, with the matching text highlighted

  3. the quality of the detection should be analyzed by https://github.com/nexB/scancode-results-analyzer/

  4. if the quality or results are not good I should have an option to:
    4.1 automatically create a ticket at GitHub filled with the issue data (and the possible corrected suggestion)
    4.2 OR create a PR with new license rules and/or a rule update with the fix to resolve this issue

First Resource in scanpipe JSON results does not have a path

I set up scanpipe locally, ran the scan_codebase pipeline on https://registry.npmjs.org/@uifabric/charting/-/charting-2.7.5.tgz, and downloaded the results in JSON when it was done. I tried to upload the scan directly to matchcode to see what happens, and matchcode runs into an error because of the first Resource in the results:

{
  "for_packages": [],
  "path": "",
  "size": 0,
  "sha1": "",
  "md5": "",
  "copyrights": [],
  "holders": [],
  "authors": [],
  "licenses": [],
  "license_expressions": [],
  "emails": [],
  "urls": [],
  "status": "",
  "type": "directory",
  "extra_data": {},
  "name": "codebase",
  "extension": "",
  "programming_language": "",
  "mime_type": "",
  "file_type": ""
},

I'd expect there to be a path value for the codebase directory (which all the files sit in), and I would also expect all the subsequent Resource paths to be prefixed by codebase as well. Conversely, since the rest of the paths are not prefixed with codebase, you could just opt to remove this Resource; I wouldn't think it would mess anything up in the results.
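
For what it's worth, a minimal sketch of that removal workaround, assuming the resources sit under a top-level "files" key as in SCTK JSON output (file names are illustrative):

    import json

    with open("charting-2.7.5_scan.json") as f:
        results = json.load(f)

    # Drop any Resource with an empty path, such as the codebase root entry.
    results["files"] = [r for r in results["files"] if r.get("path")]

    with open("charting-2.7.5_scan-cleaned.json", "w") as f:
        json.dump(results, f, indent=2)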

Add a basic extract and scan pipeline

This would eventually replace the scanner API.

  • We have today an older "scanner" API that can scan a single package, and this is NOT using the scanpipe pipelines. It would make sense to use pipelines for this as well.

Select codebase content from SCIO-DB for JSON or CSV output

When we have a fully extracted codebase in SCIO-DB, we will need to be able to extract subsets of that codebase for (at least) two reasons:

  1. The size of output files - the current practical limits for the number of files that you can effectively manage (search, filter, etc.) in Excel/Calc (CSV) or SCWB (JSON) are about 500K and 150K rows, respectively. These limits are also relative to the amount of column/field data in a file, but the primary constraint seems to be the number of rows.
  2. The purpose of an analysis step - for the current D2D tracing of Deploy code to Devel code you need to create separate CSV files for the Deploy and Devel subsets of the codebase.

The general principle for defining the codebase file data (row) to be extracted is top-down - i.e. by specifying higher-level directories. It would be ideal to have some tree view of the codebase where you can check off the subsets of the codebase that you want to extract for analysis in Excel/Calc or SCWB.

Assignment of package filenames needs to be improved

A recent scan (using ScanCode.io) of bootstrap-4.3.1.tar.gz (from https://github.com/twbs/bootstrap/archive/v4.3.1.tar.gz ) resulted in the identification of a couple of packages that are unique (different URLs), but the assigned filenames are confusingly the same:

Package 1:
Filename 4.3.0
Download URL https://www.nuget.org/api/v2/package/bootstrap/4.3.0
Package URL pkg:nuget/bootstrap@4.3.0

Package 2:
Filename 4.3.0
Download URL https://www.nuget.org/api/v2/package/bootstrap.sass/4.3.0
Package URL pkg:nuget/bootstrap.sass@4.3.0

In each case, the Download URL and Package URL appear to be quite good, but the Filename could be improved to reflect what that filename would actually be if downloaded.

Attaching an archive of the scan results output.

bootstrap-4.3.1.tar.gz_scan.json.zip

Select field/column content from SCIO-DB for XLSX output

In addition to selecting codebase subsets from a Project in SCIO-DB (see #48), we often want to extract a specific subset of columns for a particular type of analysis in Excel/Calc. For example, you might want just the Copyright and License fields/columns for Files and Packages without the license match_rule data and without the other Package data.
It would be ideal to have some way to select the fields/columns you want to extract from a list with check boxes or similar UI. We will, of course, want to combine this with the capability to select Codebase rows in one UI. That combination would essentially be the first version of an SCIO Reports module.

design-needed: Comprehensive pipeline with reports

From a chat with @daniel-eder

In fact, it may be very interesting to create components in the pipeline that produce such output: SBOMs, lists of to-dos and don'ts, a full "compliance" file including all copyrights, licenses, notices, disclaimers, etc. as required by each license, with each step handling one specific type of output.

Add new pipeline for basic codebase scan

The input to that pipeline would be code archive(s).
The pipeline would:

  1. extract the archives (not recursively at first)
  2. run the equivalent of a scancode -clipeu scan on that code
  3. create one or more inventory analysis "workfile" reports as CSV, JSON (or XLS, TBD) listing all the captured resources and packages in a format TBD.

The reports would be created by the pipeline and stored on disk to be retrieved by the API
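
A minimal sketch of step 2, assuming the scancode CLI from scancode-toolkit is available on the PATH (the paths and output file name are illustrative):

    import subprocess

    def scan_extracted_code(codebase_dir, output_file):
        """Run the equivalent of a `scancode -clipeu` scan on the extracted code."""
        subprocess.run(
            [
                "scancode",
                "-clipeu",  # copyrights, licenses, info, packages, emails, urls
                "--json-pp", str(output_file),
                str(codebase_dir),
            ],
            check=True,
        )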

Failed to scan Debian Docker image

While scanning https://hub.docker.com/_/mongo with the docker.py pipeline we get:

        "01-debian-agpl-sspl-mongo-latest.tar"
    ],
    "next_run": null,
    "runs": [
        {
            "url": "http://127.0.0.1:8001/api/runs/c9337b56-04b6-45c4-a1e8-aa82c85edb19/",
            "pipeline": "scanpipe/pipelines/docker.py",
            "description": "A pipeline to analyze a Docker image.",
            "project": "http://127.0.0.1:8001/api/projects/28f6738c-f2f8-49d9-8299-ab6ff9e9987f/",
            "uuid": "c9337b56-04b6-45c4-a1e8-aa82c85edb19",
            "run_id": "1599755581029165",
            "created_date": "2020-09-10T16:32:58.966347Z",
            "task_id": "7771f598-6b51-434e-b663-96b09b4488e1",
            "task_start_date": "2020-09-10T16:32:59.017750Z",
            "task_end_date": "2020-09-10T16:38:01.455060Z",
            "task_exitcode": 1,
            "task_output": [
                "Validating your flow...",
                "    The graph looks good!",
                "Running pylint...",
                "    Pylint is happy!",
                "2020-09-10 16:33:01.031 Workflow starting (run-id 1599755581029165):",
                "2020-09-10 16:33:01.035 [1599755581029165/start/1 (pid 26030)] Task is starting.",
                "2020-09-10 16:33:01.990 [1599755581029165/start/1 (pid 26030)] Task finished successfully.",
                "2020-09-10 16:33:01.995 [1599755581029165/extract_images/2 (pid 26037)] Task is starting.",
                "2020-09-10 16:33:04.129 [1599755581029165/extract_images/2 (pid 26037)] Task finished successfully.",
                "2020-09-10 16:33:04.134 [1599755581029165/extract_layers/3 (pid 26043)] Task is starting.",
                "2020-09-10 16:33:05.773 [1599755581029165/extract_layers/3 (pid 26043)] Task finished successfully.",
                "2020-09-10 16:33:05.838 [1599755581029165/find_images_linux_distro/4 (pid 26049)] Task is starting.",
                "2020-09-10 16:33:06.812 [1599755581029165/find_images_linux_distro/4 (pid 26049)] Task finished successfully.",
                "2020-09-10 16:33:06.816 [1599755581029165/collect_images_information/5 (pid 26055)] Task is starting.",
                "2020-09-10 16:33:07.782 [1599755581029165/collect_images_information/5 (pid 26055)] Task finished successfully.",
                "2020-09-10 16:33:07.786 [1599755581029165/collect_and_create_codebase_resources/6 (pid 26062)] Task is starting.",
                "2020-09-10 16:33:54.536 [1599755581029165/collect_and_create_codebase_resources/6 (pid 26062)] Task finished successfully.",
                "2020-09-10 16:33:54.541 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)] Task is starting.",
                "2020-09-10 16:38:00.696 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)] <flow DockerPipeline step collect_and_create_system_packages> failed:",
                "2020-09-10 16:38:00.697 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     Internal error",
                "2020-09-10 16:38:00.697 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)] Traceback (most recent call last):",
                "2020-09-10 16:38:01.054 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/metaflow/cli.py\", line 883, in main",
                "2020-09-10 16:38:01.054 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     start(auto_envvar_prefix='METAFLOW', obj=state)",
                "2020-09-10 16:38:01.054 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/click/core.py\", line 829, in __call__",
                "2020-09-10 16:38:01.054 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     return self.main(args, kwargs)",
                "2020-09-10 16:38:01.054 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/click/core.py\", line 782, in main",
                "2020-09-10 16:38:01.054 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     rv = self.invoke(ctx)",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/click/core.py\", line 1259, in invoke",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     return _process_result(sub_ctx.command.invoke(sub_ctx))",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/click/core.py\", line 1066, in invoke",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     return ctx.invoke(self.callback, ctx.params)",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/click/core.py\", line 610, in invoke",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     return callback(args, kwargs)",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/click/decorators.py\", line 33, in new_func",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     return f(get_current_context().obj, args, kwargs)",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/metaflow/cli.py\", line 444, in step",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     max_user_code_retries)",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/metaflow/task.py\", line 394, in run_step",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     self._exec_step_function(step_func)",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/metaflow/task.py\", line 47, in _exec_step_function",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     step_function()",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"scanpipe/pipelines/docker.py\", line 87, in collect_and_create_system_packages",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     docker_pipes.scan_image_for_system_packages(self.project, image)",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/scanpipe/pipes/docker.py\", line 105, in scan_image_for_system_packages",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     for i, (purl, package, layer) in enumerate(installed_packages):",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/container_inspector/image.py\", line 329, in get_installed_packages",
                "2020-09-10 16:38:01.055 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     for purl, package in layer.get_installed_packages(packages_getter):",
                "2020-09-10 16:38:01.056 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/scanpipe/pipes/debian.py\", line 16, in package_getter",
                "2020-09-10 16:38:01.056 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     for package in packages:",
                "2020-09-10 16:38:01.056 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/packagedcode/debian.py\", line 211, in get_installed_packages",
                "2020-09-10 16:38:01.056 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     for package in parse_status_file(base_status_file_loc, distro=distro):",
                "2020-09-10 16:38:01.056 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]   File \"/tmp/scancode.io/lib/python3.6/site-packages/packagedcode/debian.py\", line 228, in parse_status_file",
                "2020-09-10 16:38:01.056 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)]     raise FileNotFoundError('[Errno 2] No such file or directory: {}'.format(repr(location)))",
                "2020-09-10 16:38:01.056 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)] FileNotFoundError: [Errno 2] No such file or directory: '/tmp/scancode.io/var/projects/sspl-28f6738c/codebase/01-debian-agpl-sspl-mongo-latest.tar-extract/6f90c94ad68f6b08882985f9884f3154469709ca8af796d52726ac7562f7ff1c/var/lib/dpkg/status'",
                "2020-09-10 16:38:01.056 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)] ",
                "2020-09-10 16:38:01.057 [1599755581029165/collect_and_create_system_packages/7 (pid 26104)] Task failed.",
                "2020-09-10 16:38:01.057 Workflow failed.",
                "2020-09-10 16:38:01.057 Terminating 0 active tasks...",
                "2020-09-10 16:38:01.057 Flushing logs...",
                "    Step failure:",
                "    Step collect_and_create_system_packages (task-id 7) failed.",
                ""
            ],
            "execution_time": 302

Create license scan quality improvement campaigns for specific ecosystems

Doing massive scans of all the packages of a given ecosystem (say Maven, PyPI, etc.), I would like to:

Some candidates for these could be:

Create/generate documentation for "pipes"

To help a pipeline creator, knowing which "pipes" are available, what they do, and how to use them would be mighty useful.
Since a "pipe" is just a plain Python function (that we organize by module for clarity), the best approach would be to use docstrings and generate clean documentation from them.

Use project name as argument to run a pipeline

We already use the name in all the management commands, but the UUID is used on the Pipeline class.

scanpipe run --project <NAME> and pipelines/docker.py run --project <UUID>

Let's use name everywhere for consistency.

Process rootfs one at a time

In the root_filesystems.py pipeline in the step:

    @step
    def match_not_analyzed_to_system_packages(self):
        """
        Match not-yet-analyzed files to files already related to system packages.
        """
        rootfs.match_not_analyzed(
            self.project,
            reference_status="system-package",
            not_analyzed_status="",
        )
        self.next(self.match_not_analyzed_to_application_packages)

... we should consider processing one rootfs at a time if there are several rootfs at once in the project, e.g. for rfs in self.root_filesystems: ..., as in the sketch below.
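
A sketch of that per-rootfs variant, assuming rootfs.match_not_analyzed can be scoped to a single root filesystem (the extra keyword argument is hypothetical):

    @step
    def match_not_analyzed_to_system_packages(self):
        """
        Match not-yet-analyzed files to files already related to system
        packages, one root filesystem at a time.
        """
        for rfs in self.root_filesystems:
            rootfs.match_not_analyzed(
                self.project,
                reference_status="system-package",
                not_analyzed_status="",
                root_filesystem=rfs,  # hypothetical scoping argument
            )
        self.next(self.match_not_analyzed_to_application_packages)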

Improve/add documentation

The documentation we have is sparse. We would need:

  • a basic and descriptive README that explains what this is
  • installation documentation
  • usage documentation
  • scanpipe pipelines tutorials and documentation

Generate standard JSON files from SCIO-DB

When you use ScanPipe to analyze a codebase and load the Scan data into the SCIO-DB, you should also be able to extract the Scan data from the SCIO-DB according to any combination of the basic ScanCode runtime parameters:
--info
--copyright
--license
--package
--email
--url
The primary output format should be standard SCTK JSON.
I am not sure what we should put in the output file header, but we would at least want to know what version of SCTK was used for the original Scan. In any case, this JSON file must be compatible with ScanCode Workbench, since the primary use case is to view Scan data there.

Track which Docker image/layer a resource or package is found in

In a docker analysis pipeline, we have the layer information, as this is always part of the resource path. We may also have the image name in the path, but that's not guaranteed. We need a way to explicitly attach to a discovered package and a codebase resource which Docker image it is found in, and also to determine what the base image/base image layers are.
In the simplest way, this would be stored as extra_data attached to each of these objects and extra_data should also be returned with the JSON results.
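
A minimal sketch of that extra_data approach; the helper, field names, and values are illustrative assumptions:

    def tag_with_image_provenance(obj, image_name, layer_id, is_base_layer):
        """Record on `obj` (a discovered package or codebase resource) which
        Docker image/layer it was found in, via its extra_data field."""
        obj.extra_data.update(
            {
                "image": image_name,        # e.g. "library/mongo:latest"
                "layer_id": layer_id,       # e.g. "6f90c94ad68f"
                "is_base_image_layer": is_base_layer,
            }
        )
        obj.save(update_fields=["extra_data"])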

Generate streamlined analysis workfile (CSV) from SCIO-DB

A streamlined CSV workfile will be very useful for SCA planning. The columns we need are listed below by SCTK runtime option, using current SCTK CSV output column names. If there are multiple values in a JSON field, we want all of the values in one cell ("flattened"), as in the sketch after the column list.

Info:
Resource
type
name
base_name
extension
size
sha1
mime_type
file_type
programming_language

Copyrights:
copyright
copyright_holder
author

Licenses:
license_expression
license__key
license__score
license__category
license__owner

Emails and URLs:
email
url

Packages:
package__type
package__namespace
package__name
package__version
package__primary_language
package__description
package__release_date
package__homepage_url
package__download_url
package__size
package__sha1
package__vcs_url
package__copyright
package__license_expression
package__declared_license
package__notice_text
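
A minimal sketch of the "flattened" convention; the separator is an illustrative choice:

    def flatten(values, separator="\n"):
        """Return a single CSV cell value from a multi-valued JSON field."""
        return separator.join(str(v) for v in values or [])

    flatten(["mit", "apache-2.0"])  # one cell containing "mit\napache-2.0"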

Add support for RPM-based distros in the docker and rootfs images scanpipe pipelines

There is no easy way to access the RPM database but through librpm and the rpm executable.
The installed RPMs database comes in three formats:

  1. bdb: a legacy Berkeley DB hash used as a key/value store where the value is a binary blob that contains all the RPM data. The format of this blob should be the same as the RPM header format, and scancode-toolkit can parse the headers. This is the format that was/is used in older RH, CentOS, Fedora and nearly every other RPM distro.
  2. sqlite: a SQLite database where one table is used as a key/value store where the value is a binary blob that contains all the RPM data in the same binary format as in 1., the RPM header format. This is the format used in newer RH, CentOS and Fedora versions.
  3. ndb: a new key/value store that is built into librpm. This is the format used by newer openSUSE distros.

librpm provides support for each of these formats and also contains a built-in read-only handler for the bdb format (1.), such that librpm can be built without Berkeley DB and still read an older RPM db (for instance, to convert it to a newer format).

It needs to be built with specific flags to enable all these formats (typically, a given build of a distro does not need to support all the formats).

The installed DBs locations are:

Distro | Path | Format
CentOS 8 | /var/lib/rpm/Packages | Berkeley DB (Hash, version 9, native byte-order)
CentOS 5 | /var/lib/rpm/Packages | Berkeley DB (Hash, version 8, native byte-order)
Fedora 30 | /var/lib/rpm/rpmdb.sqlite | SQLite 3.x database
Fedora 20 | /var/lib/rpm/Packages | Berkeley DB (Hash, version 9, native byte-order)
openMandriva | /var/lib/rpm/Packages | Berkeley DB (Hash, version 10, native byte-order)
RHEL 8 | /var/lib/rpm/Packages | Berkeley DB (Hash, version 9, native byte-order)
openSUSE 20200528 | /usr/lib/sysimage/rpm/Packages.db | ndb format (file(1) reports it only as "data")

In addition, on Fedora distros there are files under /etc/yum.repos.d/* that contain base and mirror URLs for the repos used to install RPMs. Each file is in .ini format. On openSUSE and SLES, these are under /etc/zypp/repos.d.

The licenses (when not deleted as in some CentOS Docker images) are found in /usr/share/licenses/<package name>/<license files> or /usr/share/doc/<package name>/<license files>

If using the rpm CLI, this can create XML-like output:
./rpm --query --all --qf '[%{*:xml}\n]' --rcfile=./rpmrc --dbpath=<path to>/var/lib/rpm > somefile.xml
The --rcfile option may not be needed, but when using a fresh RPM build it is.

The RPM db may need to be rebuilt first when it is in a bdb format from an older version than the bdb with which librpm was built.
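
A minimal wrapper around that rpm invocation (the binary location and paths are illustrative):

    import subprocess

    def dump_rpmdb_xml(output_path, dbpath, rpm_bin="./rpm", rcfile=None):
        """Dump all installed RPMs from `dbpath` as XML-like output, using
        the query format shown above."""
        # "[%{*:xml}\n]" is passed with a literal backslash-n, as in the CLI example.
        args = [rpm_bin, "--query", "--all", "--qf", "[%{*:xml}\\n]"]
        if rcfile:
            args.append(f"--rcfile={rcfile}")
        args.append(f"--dbpath={dbpath}")
        with open(output_path, "w") as output:
            subprocess.run(args, stdout=output, check=True)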

Add reporting features

Being able to report in CSV, SPDX, and so on would be useful. We could either:

  1. support all scancode-toolkit plugins
  2. add a reporting module
