anthonyharrison / sbom4python

A tool to generate a SBOM (Software Bill of Materials) for an installed Python module

License: Apache License 2.0

Python 100.00%
cyclonedx devsecops python sbom sbom-generator security spdx

sbom4python's Introduction

SBOM4Python

SBOM4Python is a free, open source tool to generate a SBOM (Software Bill of Materials) for an installed Python module in a number of formats including SPDX and CycloneDX. It identifies all of the dependent components, whether they are explicitly defined (typically via a requirements.txt file) or pulled in implicitly as hidden dependencies.

It is intended to be used as part of a continuous integration system to enable accurate records of SBOMs to be maintained and also to support subsequent audit needs to determine if a particular component (and version) has been used.

Installation

To install use the following command:

pip install sbom4python

Alternatively, just clone the repo and install dependencies using the following command:

pip install -U -r requirements.txt

The tool requires Python 3 (3.7+). A virtual Python environment is recommended, especially if you use several versions of Python. virtualenv is a tool for setting up virtual Python environments; it lets you keep all the dependencies for the tool in a single environment, or maintain separate environments for testing against different versions of Python.

Issues with Windows Installation

When running on Windows, you may get the following error:

ImportError: failed to find libmagic. Check your installation

This is caused by a mismatch in the installation of the magic library. To resolve it, issue the following commands:

pip uninstall python-magic
pip uninstall python-magic-bin

pip install python-magic
pip install python-magic-bin

Usage

usage: sbom4python [-h] [-m MODULE] [--exclude-license] [--include-file] [-d]
                   [--sbom {spdx,cyclonedx}] [--format {tag,json,yaml}]
                   [-o OUTPUT_FILE] [-g GRAPH] [-V]

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit

Input:
  -m MODULE, --module MODULE
                        identity of python module
  --exclude-license     suppress detecting the license of components
  --include-file        include reporting files associated with module

Output:
  -d, --debug           add debug information
  --sbom {spdx,cyclonedx}
                        specify type of software bill of materials (sbom) to
                        generate (default: spdx)
  --format {tag,json,yaml}
                        specify format of software bill of materials (sbom) (default: tag)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        output filename (default: output to stdout)
  -g GRAPH, --graph GRAPH
                        filename for dependency graph

Operation

The --module option is used to identify the Python module.

The --sbom option is used to specify the type of the generated SBOM (the default is SPDX). The --format option can be used to specify the formatting of the SBOM (the default is Tag Value format for a SPDX SBOM). JSON format is supported for both SPDX and CycloneDX SBOMs.

The --output-file option is used to control the destination of the output generated by the tool. By default the output is reported to the console; it can instead be stored in a file by naming the file with the --output-file option.

The tool attempts to determine the license of each module. This can be suppressed using the --exclude-license option in which case all licences are reported as 'NOASSERTION'.

The tool can optionally include the files associated with the installed module. This can be specified using the --include-file option. As the filenames are relative to the directory in which the tool is invoked, it is recommended that the tool is launched in a directory where the source files are available.

The --graph option is used to generate a dependency graph of the components within the SBOM. The format of the graph file is compatible with the DOT language used by the GraphViz application.
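As a rough illustration of how these options combine in a continuous integration job, the sketch below builds the argument list for one run. The helper name build_sbom_command and the file names are hypothetical; only the flags themselves come from the usage text above.

```python
def build_sbom_command(module, sbom="spdx", fmt="tag", output_file=None, graph=None):
    """Build the argument list for one sbom4python run (hypothetical helper)."""
    cmd = ["sbom4python", "--module", module, "--sbom", sbom, "--format", fmt]
    if output_file:
        # Without --output-file the tool writes to the console.
        cmd += ["--output-file", output_file]
    if graph:
        # Optional dependency graph in DOT format.
        cmd += ["--graph", graph]
    return cmd


cmd = build_sbom_command(
    "requests", sbom="cyclonedx", fmt="json", output_file="requests.cdx.json"
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) in an environment where sbom4python is installed.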

Licence

Licenced under the Apache 2.0 Licence.

The tool uses a local copy of the SPDX Licenses List which is released under Creative Commons Attribution 3.0 (CC-BY-3.0).

Limitations

This tool is meant to support software development and security audit functions. However the usefulness of the tool is dependent on the SBOM data which is provided to the tool. Unfortunately, the tool is unable to determine the validity or completeness of such a SBOM file; users of the tool are therefore reminded that they should assert the quality of any data which is provided to the tool.

When processing and validating licenses, the application will use a set of synonyms to attempt to map some license identifiers to the correct SPDX License Identifiers. However, the user of the tool is reminded that they should assert the quality of any data which is provided by the tool particularly where the license identifier has been modified.

Whilst PURL and CPE references are automatically generated for each Python module, the accuracy of such references cannot be guaranteed as they are dependent on the validity of the data associated with the Python module.

Feedback and Contributions

Bugs and feature requests can be made via GitHub Issues.

sbom4python's People

Contributors

anthonyharrison, eyu-dev, vargenau, you-ne

sbom4python's Issues

Dependency Tree vs Dependency Graph

As far as I have understood the implementation of the scanner.py module, it builds a dependency tree. In the end, every package A has at most one unique parent, which requires A.

IMO this assumption is not correct, because multiple packages may require A as a first-level dependency:

Example (obtained using pip show XY):

  • flask requires: click, itsdangerous, Jinja2, Werkzeug
  • Jinja2 requires: MarkupSafe
  • Werkzeug requires: MarkupSafe

Here, MarkupSafe has multiple parents and thus one relationship is missed.

sbom4python should build a dependency graph rather than a dependency tree, so that no relationships are missed in the SBOMs.
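A minimal sketch of the graph idea, using only the requirements listed in the example above. This is not the scanner.py implementation; the function names are made up for illustration. Each relationship is stored as a (parent, child) edge, so a package can have any number of parents.

```python
from collections import defaultdict

# Requirements copied from the example above (as reported by pip show).
requires = {
    "flask": ["click", "itsdangerous", "Jinja2", "Werkzeug"],
    "Jinja2": ["MarkupSafe"],
    "Werkzeug": ["MarkupSafe"],
}


def build_edges(requires):
    """Return every (parent, child) relationship as a set of edges."""
    edges = set()
    for parent, children in requires.items():
        for child in children:
            edges.add((parent, child))
    return edges


def parents_of(edges):
    """Index the edges by child, keeping all parents of each package."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return parents


edges = build_edges(requires)
print(sorted(parents_of(edges)["MarkupSafe"]))  # ['Jinja2', 'Werkzeug']
```

With a tree, only one of the two MarkupSafe edges would survive; with an edge set, both end up in the SBOM relationships.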

failed to use this tool to generate SBOM from utf-8 files

Thank you for your work. I encountered the following error when using the tool to generate an SBOM; could you please give me some advice? The logs are as follows:
D:\tool\software\anaconda\envs\KNsbom\Scripts>sbom4python -m D:\the_code\python --sbom spdx --format json -o D:\the_code\STM32project\2-1STM32-22112801\output_sbom.json
Traceback (most recent call last):
  File "D:\tool\software\anaconda\envs\KNsbom\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\tool\software\anaconda\envs\KNsbom\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\tool\software\anaconda\envs\KNsbom\Scripts\sbom4python.exe\__main__.py", line 7, in <module>
  File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\sbom4python\cli.py", line 131, in main
    sbom_scan = SBOMScanner(
  File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\sbom4python\scanner.py", line 27, in __init__
    self.sbom_package = SBOMPackage()
  File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\lib4sbom\data\package.py", line 13, in __init__
    self.license = LicenseScanner()
  File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\lib4sbom\license.py", line 18, in __init__
    self.licenses = json.load(licfile)
  File "D:\tool\software\anaconda\envs\KNsbom\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 87462: illegal multibyte sequence

It seems that the tool fails when it has to read files that are not encoded in 'gbk'. I am not sure whether I used the wrong command; the command is as follows:
sbom4python -m D:\the_code\python --sbom spdx --format json -o D:\the_code\STM32project\2-1STM32-22112801\output_sbom.json

Could you please give me some advice? Thank you very much.
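The traceback ends in json.load reading a file that was opened with the platform default encoding ('gbk' on this machine). A common way to avoid this class of error is to open JSON files with an explicit UTF-8 encoding. The sketch below shows the pattern on a temporary file; the helper name load_json_utf8 and the file contents are illustrative, and this is not the actual lib4sbom code.

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Read a JSON file as UTF-8 regardless of the platform default encoding."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration with a file that contains a non-ASCII character (an en dash).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "licenses.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"name": "Example licence \u2013 version 1"}, handle, ensure_ascii=False)
    data = load_json_utf8(path)

print(data["name"])
```

On the user side, running Python in UTF-8 mode (setting the environment variable PYTHONUTF8=1, available since Python 3.7) changes the default encoding used by open() and may work around the error without touching the library.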

Is there any support for `conda` modules?

I have a Python project that I would like to create an SBOM for. I currently have it set up using conda, but that means that some requirements have a different name in my requirements file (e.g. opencv instead of opencv-python). As a result, if I try to build the module, I can't pip install the resulting wheels, because pip tries to install the requirements and fails. What is the right way to use sbom4python for a conda project?

Problems running sbom4python on Windows

Hi Anthony,

I tried to run sbom4python on Windows. The result was

PS E:\Software\Python\sbom4python> sbom4python -m capycli --sbom cyclonedx --format json -o sbom_AH.json
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Program Files\Python311\Scripts\sbom4python.exe\__main__.py", line 4, in <module>
  File "C:\Program Files\Python311\Lib\site-packages\sbom4python\cli.py", line 14, in <module>
    from sbom4python.scanner import SBOMScanner
  File "C:\Program Files\Python311\Lib\site-packages\sbom4python\scanner.py", line 12, in <module>
    from sbom4files.filescanner import FileScanner
  File "C:\Program Files\Python311\Lib\site-packages\sbom4files\filescanner.py", line 7, in <module>
    import magic
  File "C:\Program Files\Python311\Lib\site-packages\magic\__init__.py", line 209, in <module>
    libmagic = loader.load_lib()
               ^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\magic\loader.py", line 49, in load_lib
    raise ImportError('failed to find libmagic.  Check your installation')
ImportError: failed to find libmagic.  Check your installation

I assume all libraries have been installed:

pip list
Package                  Version     Editable project location
------------------------ ----------- ----------------------------------------
aiofiles                 22.1.0
aiosqlite                0.18.0
anyio                    3.6.2
argon2-cffi              21.3.0
argon2-cffi-bindings     21.2.0
arrow                    1.2.3
asttokens                2.2.1
attrs                    22.1.0
Babel                    2.12.1
backcall                 0.2.0
beautifulsoup4           4.11.2
binaryornot              0.4.4
black                    23.3.0
bleach                   5.0.1
BomConverter             0.1         E:\Siemens\Software\Python\bom-converter
boolean.py               4.0
certifi                  2022.12.7
cffi                     1.15.1
chardet                  5.1.0
charset-normalizer       2.1.1
click                    8.1.3
colorama                 0.4.6
comm                     0.1.2
commonmark               0.9.1
contourpy                1.0.7
cycler                   0.11.0
cyclonedx-bom            3.11.0
cyclonedx-python-lib     3.1.5
dateparser               1.1.7
debugpy                  1.6.6
decorator                5.1.1
defusedxml               0.7.1
docutils                 0.19
et-xmlfile               1.1.0
executing                1.2.0
fastjsonschema           2.16.3
filelock                 3.11.0
flake8                   5.0.4
fonttools                4.39.0
fqdn                     1.5.1
idna                     3.4
importlib-metadata       5.2.0
ipykernel                6.21.3
ipython                  8.11.0
ipython-genutils         0.2.0
isoduration              20.11.0
isort                    5.12.0
jaraco.classes           3.2.3
jedi                     0.18.2
Jinja2                   3.1.2
json5                    0.9.11
jsonpointer              2.3
jsonschema               4.17.3
jupyter_client           8.0.3
jupyter_core             5.2.0
jupyter-events           0.6.3
jupyter_server           2.4.0
jupyter_server_fileid    0.8.0
jupyter_server_terminals 0.4.4
jupyter_server_ydoc      0.6.1
jupyter-ydoc             0.2.3
jupyterlab               3.6.1
jupyterlab-pygments      0.2.2
jupyterlab_server        2.20.0
keyring                  23.13.1
kiwisolver               1.4.4
lib4sbom                 0.4.0
license-expression       30.1.0
MarkupSafe               2.1.2
matplotlib               3.7.1
matplotlib-inline        0.1.6
mccabe                   0.7.0
mistune                  2.0.5
more-itertools           9.0.0
mpmath                   1.3.0
mypy                     1.3.0
mypy-extensions          1.0.0
nbclassic                0.5.3
nbclient                 0.7.2
nbconvert                7.2.9
nbformat                 5.7.3
nest-asyncio             1.5.6
networkx                 3.1
notebook                 6.5.3
notebook_shim            0.2.2
numdifftools             0.9.41
numpy                    1.24.2
openpyxl                 3.1.2
packageurl-python        0.10.4
packaging                23.0
pandas                   1.5.3
pandocfilters            1.5.0
parso                    0.8.3
pathspec                 0.11.1
pickleshare              0.7.5
Pillow                   9.4.0
pip                      22.3.1
pip-requirements-parser  32.0.1
pkginfo                  1.9.2
platformdirs             3.1.1
prometheus-client        0.16.0
prompt-toolkit           3.0.38
psutil                   5.9.4
pure-eval                0.2.2
pycodestyle              2.9.1
pycparser                2.21
pyflakes                 2.5.0
Pygments                 2.13.0
pyparsing                3.0.9
pyrsistent               0.19.2
pyrsistent               0.19.2
python-dateutil          2.8.2
python-debian            0.1.49
python-json-logger       2.0.7
python-magic             0.4.27
python-magic-bin         0.4.14
pytz                     2022.7.1
pytz-deprecation-shim    0.1.0.post0
pywin32                  305
pywin32-ctypes           0.2.0
pywinpty                 2.0.10
PyYAML                   6.0
pyzmq                    25.0.0
readme-renderer          37.3
regex                    2022.10.31
regex                    2022.10.31
requests                 2.28.1
requests-toolbelt        0.10.1
reuse                    1.1.2
rfc3339-validator        0.1.4
rfc3986                  2.0.0
rfc3986-validator        0.1.1
rich                     12.6.0
sbom2dot                 0.3.0
sbom4files               0.3.0
sbom4python              0.10.0
scipy                    1.10.1
semantic-version         2.10.0
Send2Trash               1.8.0
setuptools               67.6.0
six                      1.16.0
sniffio                  1.3.0
sortedcontainers         2.4.0
soupsieve                2.4
stack-data               0.6.2
standard-bom-validator   0.1
StandardBomValidator     0.1
sympy                    1.11.1
terminado                0.17.1
tinycss2                 1.2.1
toml                     0.10.2
toml                     0.10.2
tomli                    2.0.1
torch                    2.0.0
tornado                  6.2
traitlets                5.9.0
twine                    4.0.2
types-colorama           0.4.15.12
typing_extensions        4.5.0
tzdata                   2022.7
tzdata                   2022.7
tzlocal                  4.2
tzlocal                  4.2
uri-template             1.2.0
urllib3                  1.26.13
wcwidth                  0.2.6
webcolors                1.12
webencodings             0.5.1
websocket-client         1.5.1
wheel                    0.34.2
y-py                     0.5.9
ypy-websocket            0.8.2
zipp                     3.11.0

I am running Python 3.11.0 on Windows 10.

Feature Request: License URL from product source

I noticed the new version has the SPDX short-form license and a url like so:

            "id": "PSF-2.0",
            "url": "https://opensource.org/licenses/Python-2.0"

I've been told by a few of our teams that they'd rather have a link to the license where it appears in the source for the component, rather than a central site with license texts. Apparently we've had some issues where the reported license doesn't match the one in the code, and one of our legal reps now requires everyone to dig up the source code license file to verify and validate.

Obviously this isn't your problem, since you don't have to work with a grumpy legal rep, but I figured I'd put it in as a feature request just in case you can come up with a genius way to make this an option in the future. (Since github has a standard location for licenses, it might be easy to find in some cases but likely not all.)

BUGFIX: various bugs when a line of `pip show module` does not contain a ":"-delimited entry.

Description

The result of pip show module is stored in out and parsed line by line, with the split fields stored in the array entry. A bug arises whenever that output contains a line that:

  1. either cannot be split with ":" as a delimiter, because it does not contain it,
  2. or contains the ":" character, but not as the delimiter of a field relevant to sbom4python.

In case 1, the execution stops with a "list index out of range" error.
In case 2, the resulting SBOM contains meaningless fields.

CASE 1 error trace:

Traceback (most recent call last):
  File "/home/user/project/.venv/bin/sbom4python", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/cli.py", line 134, in main
    sbom_scan.process_python_module(module_name)
  File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/scanner.py", line 213, in process_python_module
    self.analyze(self.get("Name"), self.get("Requires"))
  File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/scanner.py", line 207, in analyze
    if self.process_module(r, parent):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/scanner.py", line 78, in process_module
    line.split(f"{entry[0]}:", 1)[1].strip().rstrip("\n")
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

It seems that some modules, for example kiwisolver, return an unusually formatted value for pip show module.

Reproduction

To reproduce this error, create a fresh .venv (Python 3.11.2 for me) and run:

pip install sbom4python kiwisolver
sbom4python -d -m kiwisolver
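One possible way to guard against both cases is sketched below. It is not the patch applied to scanner.py: the function name parse_pip_show, the KNOWN_FIELDS set and the sample output are all assumptions made for illustration. Lines without ":" are skipped (case 1), and lines whose text before the first ":" is not a known field name are ignored (case 2).

```python
# Field names assumed to be relevant; the set sbom4python really uses may differ.
KNOWN_FIELDS = {
    "Name", "Version", "Summary", "Home-page", "Author", "Author-email",
    "License", "Location", "Requires", "Required-by",
}


def parse_pip_show(output):
    """Parse `pip show` output into a dict, ignoring malformed lines."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # case 1: no ":" on the line, nothing to split
        if key not in KNOWN_FIELDS:
            continue  # case 2: ":" present but not delimiting a known field
        fields.setdefault(key, value.strip())  # keep the first occurrence
    return fields


# Made-up output that mixes well-formed and awkward lines.
sample = """Name: example
Version: 1.0
License: Custom
 See https://example.org/terms: details inside
=========================
Requires: MarkupSafe
"""

print(parse_pip_show(sample))
```

Using str.partition instead of indexing the result of split avoids the IndexError shown in the trace, because it always returns three parts.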

Invalid SPDX generated

The SPDX file is in some cases invalid because of incorrect license identifiers.

scancode-toolkit.spdx.txt

Examples in the above scan:

PackageLicenseConcluded: Apache-2
PackageLicenseConcluded: ASL 2.0
PackageLicenseConcluded: BSD
PackageLicenseConcluded: LGPL
PackageLicenseConcluded: MIT/X

I understand that the information is taken from package metadata that is not in SPDX format, but it should not be output as it is.
Either map it to a correct SPDX identifier, or create a custom LicenseRef- identifier.
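The two options could look roughly like the sketch below. The synonym table and the short list of known identifiers are tiny stand-ins chosen for this example (a real implementation would consult the full SPDX License List), and the function name to_spdx_license is hypothetical. Ambiguous values such as BSD or LGPL are deliberately not guessed; they become LicenseRef- identifiers.

```python
import re

# Assumed synonyms for illustration only.
SYNONYMS = {"apache-2": "Apache-2.0", "asl 2.0": "Apache-2.0"}
# Stand-in for the SPDX License List.
KNOWN_IDS = {"Apache-2.0", "BSD-3-Clause", "MIT"}


def to_spdx_license(reported):
    """Map a reported license string to an SPDX id or a LicenseRef- id."""
    text = reported.strip()
    if text in KNOWN_IDS:
        return text
    mapped = SYNONYMS.get(text.lower())
    if mapped:
        return mapped
    # LicenseRef ids may only contain letters, digits, "." and "-".
    safe = re.sub(r"[^A-Za-z0-9.-]+", "-", text).strip("-")
    return f"LicenseRef-{safe}" if safe else "NOASSERTION"


for reported in ["Apache-2", "ASL 2.0", "BSD", "LGPL", "MIT/X"]:
    print(reported, "->", to_spdx_license(reported))
```

Note that an SPDX document using a LicenseRef- identifier is also expected to describe that license in its extracted licensing information, which this sketch does not cover.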

Feature request: Including optional feature's dependencies

I recently noticed a case where a package that should have appeared as a dependency of twisted was not listed in the SBOM. After careful review, I found that twisted had been installed as twisted[tls] and that, as a consequence, additional sub-dependencies were installed. I tried generating an SBOM for twisted[tls], without success. As a workaround, I had to generate SBOMs for the additional sub-dependencies and merge them. It would be great if sbom4python could add these automatically given the right command-line input.
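For context, the requirements that belong to an extra are marked in the package metadata with an environment marker such as extra == "tls". The sketch below filters requirement strings of that shape, similar to what importlib.metadata.requires returns. The function name requirements_for_extra is hypothetical, and the sample list is illustrative rather than Twisted's real metadata. Compound markers are not evaluated here, only the extra clause is inspected.

```python
import re

# Matches the extra clause of an environment marker, e.g. extra == "tls".
_EXTRA = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def requirements_for_extra(requirement_lines, extra):
    """Return unconditional requirements plus those tied to the given extra."""
    selected = []
    for line in requirement_lines:
        spec, _, marker = line.partition(";")
        match = _EXTRA.search(marker)
        if match is None or match.group(1) == extra:
            selected.append(spec.strip())
    return selected


# Illustrative requirement strings (not taken from a real package).
sample = [
    "zope.interface>=5",
    'pyopenssl>=21.0.0; extra == "tls"',
    "idna>=2.4; extra == 'tls'",
    'pyserial>=3.0; extra == "serial"',
]

print(requirements_for_extra(sample, "tls"))
```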

SPDX Relationships Semantics

First of all, thanks for the work on the nice and lightweight cli tool for creating SBOMs for Python projects.

Regarding SPDX SBOMs, I believe that sbom4python currently generates dependencies with the wrong semantics. If I read the SPDX documentation on relationships correctly, a DEPENDS_ON relationship is more appropriate than a CONTAINS relationship for expressing the build and run dependency between two packages. CONTAINS is suitable for archives, which physically contain other files.

Example:

pip show jinja2
Name: Jinja2
Version: 3.1.2
Summary: A very fast and expressive template engine.
Home-page: https://palletsprojects.com/p/jinja/
Author: Armin Ronacher
Author-email: [email protected]
License: BSD-3-Clause
Location: /Users/david/repos/python/sbom4python/env/lib/python3.10/site-packages
Requires: MarkupSafe
Required-by: Flask

Extract from the generated sbom.spdx.json:

{
      "spdxElementId": "SPDXRef-Package-4-jinja2",
      "relatedSpdxElement": "SPDXRef-Package-5-markupsafe",
      "relationshipType": "CONTAINS"
}
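For comparison, the same relationship expressed with DEPENDS_ON would look like the output of the sketch below. The helper name depends_on is made up for illustration; the element identifiers are the ones from the extract above.

```python
import json


def depends_on(source_id, target_id):
    """Build an SPDX JSON relationship stating that source depends on target."""
    return {
        "spdxElementId": source_id,
        "relatedSpdxElement": target_id,
        "relationshipType": "DEPENDS_ON",
    }


relationship = depends_on("SPDXRef-Package-4-jinja2", "SPDXRef-Package-5-markupsafe")
print(json.dumps(relationship, indent=2))
```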

Regressions in v0.4.0

I have regenerated SBOM for cve-bin-tool and I see a couple of regressions.

The most obvious one is that whitespaces between words are now replaced with underscores:

-PackageSupplier: Person: Terri Oda
+PackageSupplier: Person: Terri_Oda
-PackageSupplier: Organization: Andrew Svetlov <[email protected]>
+PackageSupplier: Organization: Andrew_Svetlov_<[email protected]>

Some PackageSuppliers have gone missing but I guess it could be some real-life change? I don't know how you get this info.
Example from aiosignal and idna:

-PackageSupplier: Person: Nikolay Kim
+PackageSupplier: NOASSERTION
-PackageSupplier: Person: Kim Davies
+PackageSupplier: NOASSERTION

Some licences were lost. Example from idna:

-##### Reported license BSD-3-Clause
-PackageLicenseConcluded: BSD-3-Clause
-PackageLicenseDeclared: BSD-3-Clause
+##### Reported license
+PackageLicenseConcluded: NOASSERTION
+PackageLicenseDeclared: NOASSERTION
