
Comments (4)

pombredanne commented on September 1, 2024

@ivanayov Thanks for the report. license-expression is not exactly a license detection library. For detection you want to use ScanCode; otherwise you need to feed license-expression with known license symbols and how they map to SPDX or ScanCode license keys.
Let me illustrate this with a snippet:

>>> from license_expression import Licensing
>>> expression = 'LGPLv2.1 and GPLv2 or GPL2'

I can parse any expression with a valid syntax:

>>> Licensing().parse(expression)
OR(AND(LicenseSymbol('LGPLv2.1', is_exception=False), 
LicenseSymbol('GPLv2', is_exception=False)), 
LicenseSymbol('GPL2', is_exception=False))

But this expression will not validate as I did not specify what my license symbols are:

>>> Licensing().validate(expression)
ExpressionInfo(
    original_expression='LGPLv2.1 and GPLv2 or GPL2',
    normalized_expression=None,
    errors=['Unknown license key(s): LGPLv2.1, GPLv2, GPL2'],
    invalid_symbols=['LGPLv2.1', 'GPLv2', 'GPL2']
)

If I feed the Licensing with license symbols (here a simple list of strings), then things will validate alright:

>>> symbols = ['LGPLv2.1', 'GPLv2', 'GPL2']
>>> Licensing(symbols=symbols).parse(expression)
OR(AND(LicenseSymbol('LGPLv2.1', is_exception=False), 
LicenseSymbol('GPLv2', is_exception=False)), 
LicenseSymbol('GPL2', is_exception=False))
>>> Licensing(symbols=symbols).validate(expression)
ExpressionInfo(
    original_expression='LGPLv2.1 and GPLv2 or GPL2',
    normalized_expression='(LGPLv2.1 AND GPLv2) OR GPL2',
    errors=[],
    invalid_symbols=[]
)

and unknown symbols will be reported:

>>> Licensing(symbols=symbols).parse('GPL2 and foobar')
AND(LicenseSymbol('GPL2', is_exception=False), LicenseSymbol('foobar', is_exception=False))
>>> Licensing(symbols=symbols).validate('GPL2 and foobar')
ExpressionInfo(
    original_expression='GPL2 and foobar',
    normalized_expression=None,
    errors=['Unknown license key(s): foobar'],
    invalid_symbols=['foobar']
)

Based on your message above, I assume that you want to get properly detected and normalized licenses from RPM packages in VMware photon?

If so, the right solution would be a combo of:

  • establish a mapping of the individual license keys used in RPMs to ScanCode keys (which means also implicitly to SPDX keys)
  • normalize the RPM-side expression syntax to a valid expression syntax. For instance, replace a comma with an 'AND'; if there is no 'AND' or 'OR' in the original expression, replace the spaces with an 'AND'; and if there are symbols that contain spaces, normalize them so they do not contain spaces (though this library can handle symbols with spaces too)
  • create a licensing with the symbols mapping
  • do a first lightweight parsing of the expression and check for any unknown symbols with validate
  • if some symbols are unknown, run ScanCode license detection on them, replace them in the parsed expression with the detected expression, and validate again
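
The first two steps above can be sketched in plain Python. This is only a rough illustration under stated assumptions: the `RPM_TO_SCANCODE` table and the `normalize_rpm_expression` helper are hypothetical names made up for this example, the mapping entries are illustrative rather than an authoritative RPM-to-ScanCode table, and 'WITH' exceptions and symbols containing spaces are not handled.

```python
import re

# Hypothetical mapping from RPM-side license names to ScanCode license keys.
# These entries are illustrative assumptions, not an authoritative table.
RPM_TO_SCANCODE = {
    "GPLv2": "gpl-2.0",
    "LGPLv2.1": "lgpl-2.1",
    "MIT": "mit",
}

# Matches an explicit boolean operator written as a standalone word.
_OPERATOR = re.compile(r"\b(and|or)\b", re.IGNORECASE)


def normalize_rpm_expression(raw, mapping=RPM_TO_SCANCODE):
    """Return (expression, unknown_symbols) for an RPM license string."""
    # A comma is treated as an AND.
    text = raw.replace(",", " and ")
    if not _OPERATOR.search(text):
        # No explicit operator: treat whitespace-separated names as AND-ed.
        text = " and ".join(text.split())

    # Split into parentheses and whitespace-delimited words.
    tokens = re.findall(r"[()]|[^\s()]+", text)
    out, unknown = [], []
    for tok in tokens:
        if tok in ("(", ")"):
            out.append(tok)
        elif tok.lower() in ("and", "or"):
            out.append(tok.upper())
        elif tok in mapping:
            out.append(mapping[tok])
        else:
            # Keep the unmapped symbol in place and report it to the caller.
            out.append(tok)
            unknown.append(tok)
    return " ".join(out), unknown


expression, unknown = normalize_rpm_expression("LGPLv2.1 and GPLv2 or foobar")
print(expression)
print(unknown)
```

The returned expression can then be passed to `Licensing(symbols=...).validate(...)` as in the snippets above, with the mapped keys as the known symbols. Anything left in `unknown` is a candidate for a ScanCode license detection run.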

Eventually this should be what https://github.com/nexB/scancode-toolkit/blob/4be4ba976d8d732538e72db97b311af39ca81432/src/packagedcode/rpm.py#L381 does, and there is an attempt by @adii21-Ux to improve this in aboutcode-org/scancode-toolkit#2894

The general case is in https://github.com/nexB/scancode-toolkit/blob/4be4ba976d8d732538e72db97b311af39ca81432/src/packagedcode/licensing.py#L109 and https://github.com/nexB/scancode-toolkit/blob/4be4ba976d8d732538e72db97b311af39ca81432/src/licensedcode/match_spdx_lid.py

The main issue to track RPM-related license detection is in aboutcode-org/scancode-toolkit#2412 "Improve license detection of declared RPM licenses"

So in conclusion, this is something that would benefit from some love... Can I interest you in helping make this work for RPM packages in general and photon packages in particular? If we do it in ScanCode, this would be available to everyone, including any tool that uses ScanCode (such as tern, which may be of direct interest to you since you mentioned images above)

from license-expression.

pombredanne commented on September 1, 2024

@ivanayov gentle ping... did my explanation make sense?


ivanayov commented on September 1, 2024

Thank you very much @pombredanne! It was a very helpful and detailed explanation.
I'd need to do an estimate first, and then I would be happy to help on the issue, depending on how long it would take.


pombredanne avatar pombredanne commented on September 1, 2024

gentle ping

