ajinabraham / libsast

Generic SAST Library

Home Page: https://opensecurity.in

License: GNU Lesser General Public License v3.0

Python 98.62% Java 1.08% Handlebars 0.30%

appsec sast semanticgrep semgrep patternmatch regex codeanalysis staticanalysis security static-analyzer

libsast's Introduction

libsast

Generic SAST for Security Engineers. Powered by a regex-based pattern matcher and semantic-aware semgrep.

Made with in India

Install

pip install libsast

Pattern Matcher is cross-platform, but Semgrep supports only Mac and Linux.

Command Line Options

$ libsast
usage: libsast [-h] [-o OUTPUT] [-p PATTERN_FILE] [-s SGREP_PATTERN_FILE]
               [--sgrep-file-extensions SGREP_FILE_EXTENSIONS [SGREP_FILE_EXTENSIONS ...]]
               [--file-extensions FILE_EXTENSIONS [FILE_EXTENSIONS ...]]
               [--ignore-filenames IGNORE_FILENAMES [IGNORE_FILENAMES ...]]
               [--ignore-extensions IGNORE_EXTENSIONS [IGNORE_EXTENSIONS ...]]
               [--ignore-paths IGNORE_PATHS [IGNORE_PATHS ...]]
               [--show-progress] [-v]
               [path [path ...]]

positional arguments:
  path                  Path can be file(s) or directories

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output filename to save JSON report.
  -p PATTERN_FILE, --pattern-file PATTERN_FILE
                        YAML pattern file, directory or url
  -s SGREP_PATTERN_FILE, --sgrep-pattern-file SGREP_PATTERN_FILE
                        sgrep rules directory
  --sgrep-file-extensions SGREP_FILE_EXTENSIONS [SGREP_FILE_EXTENSIONS ...]
                        File extensions that should be scanned with sgrep
  --file-extensions FILE_EXTENSIONS [FILE_EXTENSIONS ...]
                        File extensions that should be scanned with pattern
                        matcher
  --ignore-filenames IGNORE_FILENAMES [IGNORE_FILENAMES ...]
                        File name(s) to ignore
  --ignore-extensions IGNORE_EXTENSIONS [IGNORE_EXTENSIONS ...]
                        File extension(s) to ignore in lower case
  --ignore-paths IGNORE_PATHS [IGNORE_PATHS ...]
                        Path(s) to ignore
  --show-progress       Show scan progress
  -v, --version         Show libsast version

Example Usage

$ libsast -s tests/assets/rules/semantic_grep/ -p tests/assets/rules/pattern_matcher/ tests/assets/files/
{
  "pattern_matcher": {
    "test_regex": {
      "files": [
        {
          "file_path": "tests/assets/files/test_matcher.test",
          "match_lines": [
            28,
            28
          ],
          "match_position": [
            1141,
            1149
          ],
          "match_string": ".close()"
        }
      ],
      "metadata": {}
    },
    "test_regex_and": {
      "files": [
        {
          "file_path": "tests/assets/files/test_matcher.test",
          "match_lines": [
            3,
            3
          ],
          "match_position": [
            52,
            66
          ],
          "match_string": "webkit.WebView"
        },
        {
          "file_path": "tests/assets/files/test_matcher.test",
          "match_lines": [
            7,
            7
          ],
          "match_position": [
            194,
            254
          ],
          "match_string": ".loadUrl(\"file:/\" + Environment.getExternalStorageDirectory("
        }
      ],
      "metadata": {}
    },
    "test_regex_and_not": {
      "files": [
        {
          "file_path": "tests/assets/files/test_matcher.test",
          "match_lines": [
            42,
            42
          ],
          "match_position": [
            1415,
            1424
          ],
          "match_string": "WKWebView"
        },
        {
          "file_path": "tests/assets/files/test_matcher.test",
          "match_lines": [
            40,
            40
          ],
          "match_position": [
            1363,
            1372
          ],
          "match_string": "WKWebView"
        }
      ],
      "metadata": {}
    },
    "test_regex_and_or": {
      "files": [
        {
          "file_path": "tests/assets/files/test_matcher.test",
          "match_lines": [
            50,
            50
          ],
          "match_position": [
            1551,
            1571
          ],
          "match_string": "telephony.SmsManager"
        },
        {
          "file_path": "tests/assets/files/test_matcher.test",
          "match_lines": [
            58,
            58
          ],
          "match_position": [
            1973,
            1988
          ],
          "match_string": "sendTextMessage"
        }
      ],
      "metadata": {}
    },
    "test_regex_multiline_and_metadata": {
      "files": [
        {
          "file_path": "tests/assets/files/test_matcher.test",
          "match_lines": [
            52,
            52
          ],
          "match_position": [
            1586,
            1684
          ],
          "match_string": "public void onRequestPermissionsResult(int requestCode,String permissions[], int[] grantResults) {"
        },
        {
          "file_path": "tests/assets/files/test_matcher.test",
          "match_lines": [
            10,
            11
          ],
          "match_position": [
            297,
            368
          ],
          "match_string": "public static ForgeAccount add(Context context, ForgeAccount account) {"
        }
      ],
      "metadata": {
        "cwe": "CWE-1051 Initialization with Hard-Coded Network Resource Configuration Data",
        "description": "This is a rule to test regex",
        "foo": "bar",
        "masvs": "MSTG-STORAGE-3",
        "owasp-mobile": "M1: Improper Platform Usage",
        "owasp-web": "A10: Insufficient Logging & Monitoring",
        "severity": "info"
      }
    },
    "test_regex_or": {
      "files": [
        {
          "file_path": "tests/assets/files/test_matcher.test",
          "match_lines": [
            26,
            26
          ],
          "match_position": [
            1040,
            1067
          ],
          "match_string": "Context.MODE_WORLD_READABLE"
        }
      ],
      "metadata": {}
    }
  },
  "semantic_grep": {
    "errors": [
      {
        "code": 3,
        "level": "warn",
        "message": "Semgrep Core WARN - Lexical error in file tests/assets/files/test_matcher.test:40\n\tunrecognized symbols: !",
        "path": "tests/assets/files/test_matcher.test",
        "type": "Lexical error"
      }
    ],
    "matches": {
      "boto-client-ip": {
        "files": [
          {
            "file_path": "tests/assets/files/example_file.py",
            "match_lines": [
              4,
              4
            ],
            "match_position": [
              24,
              31
            ],
            "match_string": "c = boto3.client(host='8.8.8.8')"
          }
        ],
        "metadata": {
          "cwe": "CWE-1050 Excessive Platform Resource Consumption within a Loop",
          "description": "boto client using IP address",
          "owasp-web": "A8: Insecure Deserialization",
          "severity": "ERROR"
        }
      }
    }
  }
}

Python API

>>> from libsast import Scanner
>>> options = {'match_rules': '/Users/ajinabraham/Code/njsscan/njsscan/rules/pattern_matcher', 'sgrep_rules': '/Users/ajinabraham/Code/njsscan/njsscan/rules/semantic_grep', 'sgrep_extensions': {'', '.js'}, 'match_extensions': {'.hbs', '.sh', '.ejs', '.toml', '.mustache', '.tmpl', '.jade', '.json', '.ect', '.vue', '.yml', '.hdbs', '.tl', '.html', '.haml', '.dust', '.pug', '.tpl'}, 'ignore_filenames': {'bootstrap.min.js', '.DS_Store', 'bootstrap-tour.js', 'd3.min.js', 'tinymce.js', 'codemirror.js', 'tinymce.min.js', 'react-dom.production.min.js', 'react.js', 'jquery.min.js', 'react.production.min.js', 'codemirror-compressed.js', 'axios.min.js', 'angular.min.js', 'raphael-min.js', 'vue.min.js'}, 'ignore_extensions': {'.7z', '.exe', '.rar', '.zip', '.a', '.o', '.tz'}, 'ignore_paths': {'__MACOSX', 'jquery', 'fixtures', 'node_modules', 'bower_components', 'example', 'spec'}, 'show_progress': False}
>>> paths = ['../njsscan/tests/assets/dot_njsscan/']
>>> scanner = Scanner(options, paths)
>>> scanner.scan()
{'pattern_matcher': {'handlebar_mustache_template': {'files': [{'file_path': '../njsscan/tests/assets/dot_njsscan/ignore_ext.hbs', 'match_string': '{{{html}}}', 'match_position': (52, 62), 'match_lines': (1, 1)}], 'metadata': {'id': 'handlebar_mustache_template', 'description': 'The Handlebar.js/Mustache.js template has an unescaped variable. Untrusted user input passed to this variable results in Cross Site Scripting (XSS).', 'type': 'Regex', 'pattern': '{{{.+}}}|{{[ ]*&[\\w]+.*}}', 'severity': 'ERROR', 'input_case': 'exact', 'cwe': "CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')", 'owasp': 'A1: Injection'}}}, 'semantic_grep': {'matches': {'node_aes_ecb': {'files': [{'file_path': '../njsscan/tests/assets/dot_njsscan/lorem_scan.js', 'match_position': (16, 87), 'match_lines': (14, 14), 'match_string': "let decipher = crypto.createDecipheriv('aes-128-ecb', Buffer.from(ENCRYPTION_KEY), iv);"}], 'metadata': {'owasp': 'A9: Using Components with Known Vulnerabilities', 'cwe': 'CWE-327: Use of a Broken or Risky Cryptographic Algorithm', 'description': 'AES with ECB mode is deterministic in nature and not suitable for encrypting large amount of repetitive data.', 'severity': 'ERROR'}}, 'node_tls_reject': {'files': [{'file_path': '../njsscan/tests/assets/dot_njsscan/skip_dir/skip_me.js', 'match_position': (9, 58), 'match_lines': (9, 9), 'match_string': "        process.env['NODE_TLS_REJECT_UNAUTHORIZED'] = '0';"}, {'file_path': '../njsscan/tests/assets/dot_njsscan/skip_dir/skip_me.js', 'match_position': (9, 55), 'match_lines': (18, 18), 'match_string': '        process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0";'}], 'metadata': {'owasp': 'A6: Security Misconfiguration', 'cwe': 'CWE-295: Improper Certificate Validation', 'description': "Setting 'NODE_TLS_REJECT_UNAUTHORIZED' to 0 will allow node server to accept self signed certificates and is not a secure behaviour.", 'severity': 'ERROR'}}, 'node_curl_ssl_verify_disable': {'files': 
[{'file_path': '../njsscan/tests/assets/dot_njsscan/skip_dir/skip_me.js', 'match_position': (5, 11), 'match_lines': (45, 51), 'match_string': '    curl(url,\n\n        {\n\n            SSL_VERIFYPEER: 0\n\n        },\n\n        function (err) {\n\n            response.end(this.body);\n\n        })'}], 'metadata': {'owasp': 'A6: Security Misconfiguration', 'cwe': 'CWE-599: Missing Validation of OpenSSL Certificate', 'description': 'SSL Certificate verification for node-curl is disabled.', 'severity': 'ERROR'}}, 'regex_injection_dos': {'files': [{'file_path': '../njsscan/tests/assets/dot_njsscan/lorem_scan.js', 'match_position': (5, 37), 'match_lines': (25, 27), 'match_string': '    var key = req.param("key");\n\n    // Regex created from user input\n\n    var re = new RegExp("\\\\b" + key);'}], 'metadata': {'owasp': 'A1: Injection', 'cwe': 'CWE-400: Uncontrolled Resource Consumption', 'description': 'User controlled data in RegExp() can make the application vulnerable to layer 7 DoS.', 'severity': 'ERROR'}}, 'express_xss': {'files': [{'file_path': '../njsscan/tests/assets/dot_njsscan/skip.js', 'match_position': (9, 55), 'match_lines': (7, 10), 'match_string': '        var str = new Buffer(req.cookies.profile, \'base64\').toString();\n\n        var obj = serialize.unserialize(str);\n\n        if (obj.username) {\n\n            res.send("Hello " + escape(obj.username));'}], 'metadata': {'owasp': 'A1: Injection', 'cwe': "CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')", 'description': 'Untrusted User Input in Response will result in Reflected Cross Site Scripting Vulnerability.', 'severity': 'ERROR'}}, 'generic_path_traversal': {'files': [{'file_path': '../njsscan/tests/assets/dot_njsscan/lorem_scan.js', 'match_position': (5, 35), 'match_lines': (36, 37), 'match_string': "    var filePath = path.join(__dirname, '/' + req.query.load);\n\n    fileSystem.readFile(filePath); // ignore: generic_path_traversal"}, {'file_path': 
'../njsscan/tests/assets/dot_njsscan/lorem_scan.js', 'match_position': (5, 35), 'match_lines': (42, 43), 'match_string': "    var filePath = path.join(__dirname, '/' + req.query.load);\n\n    fileSystem.readFile(filePath); // detect this"}], 'metadata': {'owasp': 'A5: Broken Access Control', 'cwe': 'CWE-23: Relative Path Traversal', 'description': 'Untrusted user input in readFile()/readFileSync() can endup in Directory Traversal Attacks.', 'severity': 'ERROR'}}, 'express_open_redirect': {'files': [{'file_path': '../njsscan/tests/assets/dot_njsscan/lorem_scan.js', 'match_position': (5, 26), 'match_lines': (49, 51), 'match_string': '    var target = req.param("target");\n\n    // BAD: sanitization doesn\'t apply here\n\n    res.redirect(target); //ignore: express_open_redirect'}], 'metadata': {'owasp': 'A1: Injection', 'cwe': "CWE-601: URL Redirection to Untrusted Site ('Open Redirect')", 'description': 'Untrusted user input in redirect() can result in Open Redirect vulnerability.', 'severity': 'ERROR'}}, 'node_deserialize': {'files': [{'file_path': '../njsscan/tests/assets/dot_njsscan/skip.js', 'match_position': (19, 45), 'match_lines': (8, 8), 'match_string': '        var obj = serialize.unserialize(str);'}], 'metadata': {'owasp': 'A8: Insecure Deserialization', 'cwe': 'CWE-502: Deserialization of Untrusted Data', 'description': "User controlled data in 'unserialize()' or 'deserialize()' function can result in Object Injection or Remote Code Injection.", 'severity': 'ERROR'}}}, 'errors': [{'type': 'SourceParseError', 'code': 3, 'short_msg': 'parse error', 'long_msg': 'Could not parse .njsscan as javascript', 'level': 'warn', 'spans': [{'start': {'line': 2, 'col': 20}, 'end': {'line': 2, 'col': 21}, 'source_hash': 'c60298be568bfb1325d92cbb3c0bc1450a25b85bb2e4000bdc3267c05f1c8c73', 'file': '.njsscan', 'context_start': None, 'context_end': None}], 'help': 'If the code appears to be valid, this may be a semgrep bug.'}, {'type': 'SourceParseError', 'code': 3, 
'short_msg': 'parse error', 'long_msg': 'Could not parse no_ext_scan as javascript', 'level': 'warn', 'spans': [{'start': {'line': 1, 'col': 3}, 'end': {'line': 1, 'col': 5}, 'source_hash': 'f002e2a715be216987dd1b134e7b9fa6eef28e3caa82dead0109c4cdc489e089', 'file': 'no_ext_scan', 'context_start': None, 'context_end': None}], 'help': 'If the code appears to be valid, this may be a semgrep bug.'}]}}
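
The dictionary returned by scan() nests findings differently per engine: pattern matcher rules sit directly under 'pattern_matcher', while semgrep rules sit under 'semantic_grep' and then 'matches'. The sketch below flattens both into simple tuples. The helper name iter_findings and the trimmed sample data are illustrative assumptions, not part of libsast.

```python
def iter_findings(results):
    """Yield (engine, rule_id, file_path, first_line, last_line) tuples."""
    # Pattern matcher findings sit directly under 'pattern_matcher';
    # semgrep findings sit under 'semantic_grep' -> 'matches'.
    sections = {
        'pattern_matcher': results.get('pattern_matcher', {}),
        'semantic_grep': results.get('semantic_grep', {}).get('matches', {}),
    }
    for engine, rules in sections.items():
        for rule_id, finding in rules.items():
            for hit in finding.get('files', []):
                first_line, last_line = hit['match_lines']
                yield engine, rule_id, hit['file_path'], first_line, last_line


# Trimmed sample shaped like the scan() output above (values shortened).
sample = {
    'pattern_matcher': {
        'handlebar_mustache_template': {
            'files': [{
                'file_path': 'ignore_ext.hbs',
                'match_string': '{{{html}}}',
                'match_position': (52, 62),
                'match_lines': (1, 1),
            }],
            'metadata': {'severity': 'ERROR'},
        },
    },
    'semantic_grep': {
        'matches': {
            'node_deserialize': {
                'files': [{
                    'file_path': 'skip.js',
                    'match_position': (19, 45),
                    'match_lines': (8, 8),
                    'match_string': 'var obj = serialize.unserialize(str);',
                }],
                'metadata': {'severity': 'ERROR'},
            },
        },
        'errors': [],
    },
}

for row in iter_findings(sample):
    print(row)
```

In real use, the sample dictionary would be replaced by the value returned from Scanner(options, paths).scan().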

Write your own Static Analysis tool

With libsast, you can write your own static analysis tools. libsast provides two matching engines:

Pattern Matcher
Semantic Grep

Pattern Matcher

Currently Pattern Matcher supports any language.

Use Regex 101 to write simple Python Regex rule patterns.

A sample rule looks like

- id: test_regex_or
  message: This is a rule to test regex_or
  input_case: exact
  pattern:
  - MODE_WORLD_READABLE|Context\.MODE_WORLD_READABLE
  - openFileOutput\(\s*".+"\s*,\s*1\s*\)
  severity: error
  type: RegexOr
  metadata:
    owasp-web: a1
    reference: http://foo.bar
    foo: Some extra metadata

A rule consists of

id : A unique id for the rule.
message: A description for the rule.
input_case: It can be exact, upper or lower. Data will be used as it is, converted to upper case, or converted to lower case, respectively, before comparing with the regex.
pattern: List of patterns depends on type.
severity: It can be error, warning or info.
type: Pattern Matcher supports Regex, RegexAnd, RegexOr, RegexAndOr, RegexAndNot.
metadata (optional): Define your own custom fields that you can use as metadata along with standard mappings.
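
The field list above can be turned into a quick sanity check before a rule file is handed to the scanner. This is a minimal sketch: check_rule is a hypothetical helper, libsast's own validation may behave differently, and comparing severity case-insensitively is an assumption.

```python
REQUIRED_FIELDS = ('id', 'message', 'input_case', 'pattern', 'severity', 'type')
INPUT_CASES = {'exact', 'upper', 'lower'}
SEVERITIES = {'error', 'warning', 'info'}
RULE_TYPES = {'Regex', 'RegexAnd', 'RegexOr', 'RegexAndOr', 'RegexAndNot'}


def check_rule(rule):
    """Return a list of human-readable problems found in one rule dict."""
    problems = [f'missing field: {name}'
                for name in REQUIRED_FIELDS if name not in rule]
    if 'input_case' in rule and rule['input_case'] not in INPUT_CASES:
        problems.append(f"bad input_case: {rule['input_case']}")
    # Assumption: severity is compared case-insensitively.
    if 'severity' in rule and str(rule['severity']).lower() not in SEVERITIES:
        problems.append(f"bad severity: {rule['severity']}")
    if 'type' in rule and rule['type'] not in RULE_TYPES:
        problems.append(f"bad type: {rule['type']}")
    return problems


# The sample rule from this section, written as a Python dict.
sample_rule = {
    'id': 'test_regex_or',
    'message': 'This is a rule to test regex_or',
    'input_case': 'exact',
    'pattern': [
        r'MODE_WORLD_READABLE|Context\.MODE_WORLD_READABLE',
        r'openFileOutput\(\s*".+"\s*,\s*1\s*\)',
    ],
    'severity': 'error',
    'type': 'RegexOr',
    'metadata': {'owasp-web': 'a1'},
}

print(check_rule(sample_rule))
print(check_rule({'id': 'incomplete', 'type': 'Glob'}))
```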

1. Regex - if regex1 in input
2. RegexAnd - if regex1 in input and regex2 in input
3. RegexOr - if regex1 in input or regex2 in input
4. RegexAndOr -  if regex1 in input and (regex2 in input or regex3 in input)
5. RegexAndNot - if regex1 in input and not regex2 in input
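
The five rule types above can be read as boolean combinations of regex searches. The sketch below spells that out in plain Python; it is not libsast's implementation, and passing the patterns as one flat list (first pattern, then the alternatives or the excluded pattern) is an assumption made for illustration.

```python
import re


def found(pattern, data):
    """True if the regex occurs anywhere in the input."""
    return re.search(pattern, data) is not None


def rule_matches(rule_type, patterns, data):
    """Evaluate one rule type against the input text."""
    if rule_type == 'Regex':
        return found(patterns[0], data)
    if rule_type == 'RegexAnd':
        return all(found(p, data) for p in patterns)
    if rule_type == 'RegexOr':
        return any(found(p, data) for p in patterns)
    if rule_type == 'RegexAndOr':
        # regex1 and (regex2 or regex3 or ...)
        first, *alternatives = patterns
        return found(first, data) and any(found(p, data) for p in alternatives)
    if rule_type == 'RegexAndNot':
        # regex1 and not regex2
        first, excluded = patterns
        return found(first, data) and not found(excluded, data)
    raise ValueError(f'unknown rule type: {rule_type}')


code = 'import android.telephony.SmsManager; sms.sendTextMessage(a, b)'
print(rule_matches(
    'RegexAndOr',
    [r'telephony\.SmsManager', r'sendMultipartTextMessage', r'sendTextMessage'],
    code,
))
```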

Example: Pattern Matcher Rule

Test your pattern matcher rules

$ libsast -p tests/assets/rules/pattern_matcher/patterns.yaml tests/assets/files/

Inbuilt Standard Mapping Support

Metadata fields also support libsast standard mapping.

For example, the metadata field owasp-web: a1 will get expanded at runtime as owasp-web: 'A1: Injection'.
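
The expansion described above amounts to a lookup on known metadata keys. The following sketch shows the idea; the table contains only the single mapping named in this section, and the names STANDARD_MAPPINGS and expand_metadata are hypothetical rather than taken from libsast.

```python
# Only the mapping stated in the text above; a real table would hold more.
STANDARD_MAPPINGS = {
    'owasp-web': {'a1': 'A1: Injection'},
}


def expand_metadata(metadata):
    """Replace short codes with their long form where a mapping is known."""
    expanded = {}
    for key, value in metadata.items():
        table = STANDARD_MAPPINGS.get(key, {})
        if isinstance(value, str):
            # Unknown codes and unmapped keys are passed through unchanged.
            value = table.get(value.lower(), value)
        expanded[key] = value
    return expanded


print(expand_metadata({
    'owasp-web': 'a1',
    'reference': 'http://foo.bar',
    'foo': 'Some extra metadata',
}))
```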

Currently Supports

Semantic Grep

Semantic Grep uses semgrep, a fast and syntax-aware semantic code pattern search for many languages: like grep but for code.

Currently it supports Python, Java, JavaScript, Go and C.

Use semgrep.dev to write semantic grep rule patterns.

A sample rule for Python code looks like

rules:
  - id: boto-client-ip
    patterns:
      - pattern-inside: boto3.client(host="...")
      - pattern-regex: '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
    message: "boto client using IP address"
    languages: [python]
    severity: ERROR
    metadata:
      owasp-web: a2
      owasp-mobile: m7
      cwe: cwe-1048
      foo: Some extra metadata
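
The pattern-regex line in the rule above can be tried on its own with Python's re module. This only exercises the regular expression; the pattern-inside part is evaluated by semgrep and is not reproduced here.

```python
import re

# Same expression as the rule's pattern-regex entry.
IP_PATTERN = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

line = "c = boto3.client(host='8.8.8.8')"
match = IP_PATTERN.search(line)
print(match.group(0) if match else None)
```

Note that the expression also accepts out-of-range octets such as 999.999.999.999, so it is a coarse filter rather than an IP address validator.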

See semgrep documentation here.

Example: Semantic Grep Rule

Test your semgrep rules

$ libsast -s tests/assets/rules/semantic_grep/sgrep.yaml tests/assets/files/

Realworld Implementations

njsscan SAST is built with libsast pattern matcher and semantic grep.
nodejsscan nodejsscan is a static security code scanner for Node.js applications.
MobSF Static Code Analyzer for Android and iOS mobile applications.
mobsfscan mobsfscan is a static security code scanner for Mobile applications built for Android (Java, Kotlin) & iOS (Swift, Objective C).

libsast's Issues

I have a problem with MobSF

My app contains FFmpeg, and the customer needs the RELRO safe-compilation option added to the FFmpeg library. After I added the option during compilation, I ran checksec on Linux against the library and confirmed that RELRO has been added. However, when I package the library into the app and submit it to MobSF for detection, MobSF reports that my library does not contain RELRO. I have no good way around this so far and hope I can get help.

Fix bug in --ignore-paths option on Windows

Bug description

The implementation of the --ignore-paths option is not working correctly on Windows.
Let’s assume our working directory is C:\Users\Administrator\Documents\project and includes the following directories and files:

rules\test.yaml
src\to_ignore\file_to_ignore.txt

Contents of test.yaml:

- id: test_rule
  message: >-
    test message
  type: Regex
  pattern: test
  severity: INFO
  input_case: exact

Contents of file_to_ignore.txt:

test

Libsast command executed in PowerShell:

libsast -p .\rules\test.yaml .\src\ --ignore-paths src\to_ignore

Expected behavior

Libsast should ignore the path and not display any output.

Actual behavior

Libsast does not ignore the path and outputs the following:

{
  "pattern_matcher": {
    "test_rule": {
      "files": [
        {
          "file_path": "src/to_ignore/file_to_ignore.txt",
          "match_lines": [
            1,
            1
          ],
          "match_position": [
            1,
            4
          ],
          "match_string": "test"
        }
      ],
      "metadata": {
        "description": "test message",
        "severity": "INFO"
      }
    }
  }
}

Solution

The validate_file() function within scanner.py is implemented in the following way:

ignore_paths = any(pp in path.as_posix() for pp in self.ignore_paths)

As a result, you check if an ignored path (=string) is found in a posix representation of the file path. This check will work on *nix systems, but not on Windows because backslashes are used to separate directories and files within the path.

Please consider converting the pp variable to a Path and using the posix representation for pp as well so you compare the same path representations with each other.
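
A minimal sketch of the suggested normalization follows. It uses PureWindowsPath so that the Windows behaviour can be reproduced on any operating system; the function names are hypothetical, and the real validate_file() would work with the scanner's own Path objects.

```python
from pathlib import PureWindowsPath


def naive_is_ignored(path, ignore_paths):
    """Current behaviour: raw ignore strings vs. the POSIX form of the path."""
    posix = PureWindowsPath(path).as_posix()
    return any(pp in posix for pp in ignore_paths)


def normalized_is_ignored(path, ignore_paths):
    """Proposed behaviour: convert both sides to POSIX form before comparing."""
    posix = PureWindowsPath(path).as_posix()
    return any(PureWindowsPath(pp).as_posix() in posix for pp in ignore_paths)


target = r'src\to_ignore\file_to_ignore.txt'
ignored = [r'src\to_ignore']

print(naive_is_ignored(target, ignored))       # backslashes never match
print(normalized_is_ignored(target, ignored))  # both sides use forward slashes
```

The comparison is still a substring test, as in the original line, so an ignore path of src\to_ignore would also match a sibling directory such as src\to_ignore_too.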

ImportError: Cannot import name 'semgrep_main' from 'semgrep' in libsast

Issue Description:

I encountered an ImportError when trying to use libsast with semgrep. It seems like the semgrep_main function is no longer available or has been moved in the latest version of semgrep, causing libsast to fail when invoking it.

Steps to Reproduce:

Ensure Python, semgrep, and libsast are installed.
Run libsast with a command similar to: $ libsast -s ./log4j.yaml ./log4j-injection.java

Expected Behavior:

libsast should successfully scan the specified files without any import errors.

Actual Behavior:

Received an ImportError indicating that semgrep_main cannot be imported from the 'semgrep' package. Here's the traceback for reference:

$ libsast -s ./log4j.yaml ./log4j-injection.java
Traceback (most recent call last):
  File "/home/cpuu/anaconda3/bin/libsast", line 8, in <module>
    sys.exit(main())
  File "/home/cpuu/anaconda3/lib/python3.10/site-packages/libsast/__main__.py", line 93, in main
    result = Scanner(options, args.path).scan()
  File "/home/cpuu/anaconda3/lib/python3.10/site-packages/libsast/scanner.py", line 65, in scan
    self.options).scan(valid_paths)
  File "/home/cpuu/anaconda3/lib/python3.10/site-packages/libsast/core_sgrep/semantic_sgrep.py", line 40, in scan
    sgrep_out = invoke_semgrep(paths, self.scan_rules)
  File "/home/cpuu/anaconda3/lib/python3.10/site-packages/libsast/core_sgrep/helpers.py", line 13, in invoke_semgrep
    from semgrep import semgrep_main
ImportError: cannot import name 'semgrep_main' from 'semgrep' (/home/cpuu/anaconda3/lib/python3.10/site-packages/semgrep/__init__.py)

To further investigate the issue, I conducted a basic test to verify the importability of semgrep in Python. Here are my findings:

Launching Python and importing semgrep as a module works without any issues, indicating that the semgrep package is installed correctly and is recognized by Python.

$ python
Python 3.10.9 (main, Mar  1 2023, 18:23:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import semgrep

This command completes without any errors, confirming that the basic installation of semgrep is intact and functional.
However, when attempting to specifically import semgrep_main from semgrep, I encounter an ImportError, which suggests that the issue is not with the semgrep package installation but rather with the accessibility or existence of the semgrep_main function within the package.

>>> from semgrep import semgrep_main
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'semgrep_main' from 'semgrep' (/home/cpuu/anaconda3/lib/python3.10/site-packages/semgrep/__init__.py)

This test highlights that the issue specifically revolves around the semgrep_main import, aligning with the error encountered when using libsast.

These observations suggest that there may have been changes in the semgrep package that affected the availability of semgrep_main, leading to compatibility issues with libsast. This additional information should help in diagnosing the root cause of the ImportError and in determining the appropriate version compatibility between libsast and semgrep.
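
The manual check above can be wrapped in a small helper that reports whether a name is importable from a package without raising. This is a generic sketch built on the standard importlib module; can_import_name is a hypothetical helper, and the demonstration uses standard-library modules so that it runs without semgrep installed.

```python
import importlib


def can_import_name(module_name, name):
    """Mimic 'from module_name import name' and report success as a bool."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    # The name may be an attribute of the package ...
    if hasattr(module, name):
        return True
    # ... or a submodule that has not been imported yet.
    try:
        importlib.import_module(f'{module_name}.{name}')
    except ImportError:
        return False
    return True


print(can_import_name('json', 'decoder'))
print(can_import_name('json', 'no_such_name_xyz'))
```

In the environment described in this report, can_import_name('semgrep', 'semgrep_main') would be expected to return False.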

Environment:

OS: Ubuntu 22.04 (WSL Linux)
Python version: 3.10.9
semgrep version: 1.62.0
libsast version: v2.0.3

Attempts to Resolve:

I've checked for updates to both libsast and semgrep but haven't found any specific information regarding changes to semgrep_main. It appears that recent updates to semgrep may have deprecated or moved this function, causing compatibility issues with libsast.

Compatibility Question:

In addition to the above issue, I would like to inquire about the compatibility of libsast with semgrep versions. Given the ImportError encountered, it appears there might be a mismatch in version compatibility between libsast and the current version of semgrep I am using (1.62.0).

Could you please provide guidance on which version of semgrep is optimized for use with libsast v2.0.3? Understanding the recommended version could help in resolving the import error and ensure smooth operation of libsast for my projects.

Thank you for your support and looking forward to your advice on this matter.

Parallelizing the code

Hi there,

Thank you for your code. It's quite useful.
I am wondering whether it could be improved by parallelizing the for loops over the rules. When there are many rules, it is quite slow.
Any thoughts on this?

Regards!

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

The `files` array order is not consistent

MobSF/mobsfscan#30

bump semgrep

bump to https://github.com/returntocorp/semgrep/releases/tag/v0.117.0 or 0.111.0

ValueError: need at most 63 handles, got a sequence of length 66

Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Python3\lib\threading.py", line 980, in _bootstrap_inner
self.run()
File "C:\Python3\lib\threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "C:\Python3\lib\multiprocessing\pool.py", line 519, in _handle_workers
cls._wait_for_updates(current_sentinels, change_notifier)
File "C:\Python3\lib\multiprocessing\pool.py", line 499, in _wait_for_updates
wait(sentinels, timeout=timeout)
File "C:\Python3\lib\multiprocessing\connection.py", line 884, in wait
ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
File "C:\Python3\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
res = _winapi.WaitForMultipleObjects(L, False, timeout)
ValueError: need at most 63 handles, got a sequence of length 66
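
The traceback is consistent with a process pool that has more workers than Windows can wait on in a single call: WaitForMultipleObjects accepts at most 64 handles, and Python's concurrent.futures limits ProcessPoolExecutor to 61 workers on Windows for that reason. The sketch below caps a requested worker count accordingly. The helper safe_worker_count is hypothetical and is not taken from libsast.

```python
import os
import sys

# Same ceiling that concurrent.futures applies to ProcessPoolExecutor on Windows.
WINDOWS_MAX_WORKERS = 61


def safe_worker_count(requested=None, platform=sys.platform):
    """Return a worker count that stays within the Windows handle limit."""
    count = requested or os.cpu_count() or 1
    if platform == 'win32':
        count = min(count, WINDOWS_MAX_WORKERS)
    return max(1, count)


print(safe_worker_count(66, platform='win32'))
print(safe_worker_count(66, platform='linux'))
```

The returned value could then be passed as the processes argument of multiprocessing.Pool on a machine with many cores.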

Improve operation efficiency

Thank you very much for open-sourcing this project. While using it, I found a problem: it is too slow. Can we use multithreading or multiprocessing to improve efficiency? It takes 30 minutes or more to scan a 20 MB APK file when libsast is used in MobSF.

This is the Java source code that you can test with:
java_source.zip

This is the matching rule I use:
https://github.com/MobSF/Mobile-Security-Framework-MobSF/blob/master/mobsf/StaticAnalyzer/views/android/rules/android_rules.yaml

libsast-1.3.5-py3.9.egg-info/requires.txt

Would it be possible to update this from --
5:semgrep==0.34.0

to --
5:semgrep>=0.34.0

?? I am trying to include njsscan in the BlackArch repos and this is failing my initial build.

---- snip ----
File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3251, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 569, in _build_master
return cls._build_from_requirements(requires)
File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 582, in _build_from_requirements
dists = ws.resolve(reqs, Environment())
File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 770, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'semgrep==0.34.0' distribution was not found and is required by libsast

Update: our install includes semgrep 0.35.0

Much appreciated.
~!>D

Mobile Top 10 2023: Updates

https://owasp.org/www-project-mobile-top-10/

semgrep v0.84.0 support

Can libsast be updated to support semgrep v0.84.0?

Interested in using mobsfscan for an existing project that already uses semgrep v0.84.0

(Related to MobSF/mobsfscan#32)

OSError: [Errno 24] Too many open files

[ERROR] 2023-09-01 09:54:34 - libsast scan
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\venv\lib\site-packages\libsast\core_matcher\pattern_matcher.py", line 90, in pattern_matcher
data = file_path.read_text('utf-8', 'ignore')
File "C:\Python3\lib\pathlib.py", line 1266, in read_text
with self.open(mode='r', encoding=encoding, errors=errors) as f:
File "C:\Python3\lib\pathlib.py", line 1252, in open
return io.open(self, mode, buffering, encoding, errors, newline,
File "C:\Python3\lib\pathlib.py", line 1120, in _opener
return self._accessor.open(self, flags, mode)
OSError: [Errno 24] Too many open files: 'M:\uploads\b1436c59b24ebc3ab8632f24aaa16cb4\java_source\com\taobao\login4android\membercenter\security\SecurityEntranceActivity.java'

res = scanner.scan()

File "D:\venv\lib\site-packages\libsast\scanner.py", line 58, in scan
results['pattern_matcher'] = PatternMatcher(
File "D:\venv\lib\site-packages\libsast\core_matcher\pattern_matcher.py", line 58, in scan
results = pool.map(
File "C:\Python3\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Python3\lib\multiprocessing\pool.py", line 771, in get
raise self._value
libsast.exceptions.RuleProcessingError: Rule processing error.

Cannot detect "else:" condition in "choice.yaml"

ENVIRONMENT

OS and Version: Ubuntu 22.04.3 LTS (Jammy Jellyfish) on WSL2
Python Version: 3.10.12
MobSF Version: v3.7.9 beta

EXPLANATION OF THE ISSUE

Detection patterns like "-id: rule3" in "choice.yaml" are not working.
The "else:" condition in "choice_matcher.py" may not be working.

STEPS TO REPRODUCE THE ISSUE

Upload the Android app "AndroGoa.apk" to MobSF.
"NIAP ANALYSIS v1.3" only detects 10 locations
Note: MobSF v3.7.6 detected 13 locations.

ADDITIONAL INFORMATION

It may be fixed by the following change.

Current situation:

def add_finding(self, results):
    """Add Choice Findings."""
    for res_list in results:
        if not res_list:
            continue
        for match_dict in res_list:
            all_matches = match_dict['all_matches']
            matches = match_dict['matches']
            rule = match_dict['rule']
            if all_matches:
                selection = rule['selection'].format(list(all_matches))
            elif matches:
                select = rule['choice'][min(matches)][1]
                selection = rule['selection'].format(select)
            elif rule.get('else'):
                selection = rule['selection'].format(rule['else'])
            else:
                continue
            self.findings[rule['id']] = self.get_meta(rule, selection)

Potential Issues

Evaluation of all_matches and matches:
all_matches and matches are evaluated as False when they are empty, which should lead to the else condition being executed if there are no matching items.
However, if all_matches or matches are empty sets or lists, both elif matches: and elif rule.get('else'): may be evaluated as False, preventing the else condition from being executed.

Placement of elif Conditions:
The placement of elif rule.get('else'): after all_matches and matches might lead to situations where the else condition is not appropriately evaluated, even when they are empty.

Use of continue:
The continue statement in the final else block is used to move to the next iteration if rule.get('else') is False (i.e., the else key doesn't exist). However, this might lead to scenarios where the else key exists but is still skipped.

Improvement proposal

def add_finding(self, results):
    """Add Choice Findings."""
    for res_list in results:
        if not res_list:
            continue
        for match_dict in res_list:
            all_matches = match_dict['all_matches']
            matches = match_dict['matches']
            rule = match_dict['rule']

            # Check the else condition if all_matches and matches are empty
            if all_matches:
                selection = rule['selection'].format(list(all_matches))
            elif matches:
                select = rule['choice'][min(matches)][1]
                selection = rule['selection'].format(select)
            else:
                # Use the else condition if both all_matches and matches are empty
                selection = rule['selection'].format(rule.get('else', ''))

            self.findings[rule['id']] = self.get_meta(rule, selection)

With this change, the else condition will be properly evaluated when both all_matches and matches are empty, ensuring the operation works as expected.
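
The branch logic of the proposal can be exercised in isolation. In the sketch below, choose_selection is a hypothetical standalone function that mirrors the three branches, and the rule dictionary is invented for illustration (its choice entries are assumed to be [pattern, label] pairs indexed by the values in matches).

```python
def choose_selection(rule, all_matches, matches):
    """Mirror the proposed branches: all_matches, then matches, then else."""
    if all_matches:
        return rule['selection'].format(list(all_matches))
    if matches:
        # Pick the label of the lowest-index choice that matched.
        label = rule['choice'][min(matches)][1]
        return rule['selection'].format(label)
    # Fall back to the rule's 'else' text, or an empty string if absent.
    return rule['selection'].format(rule.get('else', ''))


# Hypothetical rule, not copied from choice.yaml.
rule = {
    'id': 'rule3',
    'selection': 'The application {}.',
    'choice': [['patternA', 'does A'], ['patternB', 'does B']],
    'else': 'does neither',
}

print(choose_selection(rule, [], {1, 0}))
print(choose_selection(rule, [], set()))
```

One behavioural difference from the current code is worth noting: a rule without an else key now always produces a finding (formatted with an empty string) instead of being skipped.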

LOG FILE

none

api Scanner.scan() returns empty results randomly

Running exactly the same code sometimes returns results and sometimes an empty dict.
Tried with 1.5.2 version on Ubuntu-20.04 with python 3.8

libsast throws an error when scanning a Java file on Mac M1

The error stack:
semgrep_main.main(
File "/opt/homebrew/lib/python3.9/site-packages/semgrep/semgrep_main.py", line 202, in main
) = CoreRunner(
File "/opt/homebrew/lib/python3.9/site-packages/semgrep/core_runner.py", line 346, in invoke_semgrep
) = self._run_rules_direct_to_semgrep_core(rules, target_manager, profiler)
File "/opt/homebrew/lib/python3.9/site-packages/semgrep/core_runner.py", line 291, in _run_rules_direct_to_semgrep_core
core_run = sub_run(cmd, stdout=subprocess.PIPE, stderr=stderr)
File "/opt/homebrew/lib/python3.9/site-packages/semgrep/util.py", line 130, in sub_run
result = subprocess.run(cmd, **kwargs) # nosem: python.lang.security.audit.dangerous-subprocess-use.dangerous-subprocess-use
File "/opt/homebrew/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/opt/homebrew/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/opt/homebrew/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/opt/homebrew/lib/python3.9/site-packages/semgrep/bin/semgrep-core'