bearer / bearer

Code security scanning tool (SAST) to discover, filter and prioritize security and privacy risks.

Home Page: https://docs.bearer.com

License: Other

Go 89.78% JavaScript 0.06% C 2.77% Shell 0.58% Open Policy Agent 0.95% Dockerfile 0.04% HTML 5.53% CSS 0.28%

appsec compliance devsecops devsecops-tools security security-tools dataflow gdpr privacy sast

bearer's Introduction

Scan your source code against top security and privacy risks.

Bearer CLI is a static application security testing (SAST) tool that scans your source code and analyzes your data flows to discover, filter and prioritize security and privacy risks.

Currently supporting: JavaScript/TypeScript (GA), Ruby (GA), PHP (GA), Java (GA), Go (GA), Python (Alpha) - Learn more

Getting Started - FAQ - Documentation - Report a Bug - Discord Community

Developer friendly static code analysis for security and privacy

Bearer CLI scans your source code for:

Security risks and vulnerabilities using built-in rules covering the OWASP Top 10 and CWE Top 25, such as:
- A01: Access control (e.g. Path Traversal, Open Redirect, Exposure of Sensitive Information).
- A02: Cryptographic Failures (e.g. Weak Algorithm, Insecure Communication).
- A03: Injection (e.g. SQL Injection, Input Validation, XSS, XPath).
- A04: Design (e.g. Missing Encryption of Sensitive Data, Persistent Cookies Containing Sensitive Information).
- A05: Security Misconfiguration (e.g. Cleartext Storage of Sensitive Information in a Cookie or JWT).
- A07: Identification and Authentication Failures (e.g. Use of Hard-coded Password, Improper Certificate Validation).
- A08: Data Integrity Failures (e.g. Deserialization of Untrusted Data).
- A09: Security Logging and Monitoring Failures (e.g. Insertion of Sensitive Information into Log File).
- A10: Server-Side Request Forgery (SSRF).
Note: all the rules and their code patterns are accessible through the documentation.
Privacy risks with the ability to detect sensitive data flow such as the use of PII, PHI in your app, and components processing sensitive data (e.g. databases like pgSQL, third-party APIs such as OpenAI, Sentry, etc.). This helps generate a privacy report relevant for:
- Privacy Impact Assessment (PIA).
- Data Protection Impact Assessment (DPIA).
- Records of Processing Activities (RoPA) input for GDPR compliance reporting.

🚀 Getting started

Discover your most critical security risks and vulnerabilities in only a few minutes. In this guide, you will install Bearer CLI, run a security scan on a local project, and view the results. Let's get started!

Install Bearer CLI

The quickest way to install Bearer CLI is with the install script. It auto-selects the best build for your architecture, installs to ./bin by default, and uses the latest release version:

curl -sfL https://raw.githubusercontent.com/Bearer/bearer/main/contrib/install.sh | sh

Other install options

Homebrew

Using Bearer CLI's official Homebrew tap:

brew install bearer/tap/bearer

Update an existing installation with the following:

brew update && brew upgrade bearer/tap/bearer

Debian/Ubuntu

sudo apt-get install apt-transport-https
echo "deb [trusted=yes] https://apt.fury.io/bearer/ /" | sudo tee -a /etc/apt/sources.list.d/fury.list
sudo apt-get update
sudo apt-get install bearer

Update an existing installation with the following:

sudo apt-get update
sudo apt-get install bearer

RHEL/CentOS

Add repository setting:

$ sudo vim /etc/yum.repos.d/fury.repo
[fury]
name=Gemfury Private Repo
baseurl=https://yum.fury.io/bearer/
enabled=1
gpgcheck=0

Then install with yum:

  sudo yum -y update
  sudo yum -y install bearer

Update an existing installation with the following:

sudo yum -y update bearer

Docker

Bearer CLI is also available as a Docker image on Docker Hub and ghcr.io.

With docker installed, you can run the following command with the appropriate paths in place of the examples.

docker run --rm -v /path/to/repo:/tmp/scan bearer/bearer:latest-amd64 scan /tmp/scan

Additionally, you can use docker compose. Add the following to your docker-compose.yml file and replace the volumes with the appropriate paths for your project:

version: "3"
services:
  bearer:
    platform: linux/amd64
    image: bearer/bearer:latest-amd64
    volumes:
      - /path/to/repo:/tmp/scan

Then, run the docker compose run command to run Bearer CLI with any specified flags:

docker compose run bearer scan /tmp/scan --debug

The Docker configurations above will always use the latest release.

Binary

Download the archive file for your operating system/architecture from here.

Unpack the archive, and put the binary somewhere in your $PATH (on UNIX-y systems, /usr/local/bin or the like). Make sure it has permission to execute.

To update Bearer CLI when using the binary, download the latest release and overwrite your existing installation location.

Scan your project

The easiest way to try out Bearer CLI is with the OWASP Juice Shop example project. It simulates a realistic JavaScript application with common security flaws. Clone or download it to a convenient location to get started.

git clone https://github.com/juice-shop/juice-shop.git

Now, run the scan command with bearer scan on the project directory:

bearer scan juice-shop

A progress bar will display the status of the scan.

Once the scan is complete, Bearer CLI will output, by default, a security report with details of any rule findings, as well as where in the codebase the infractions happened and why.

By default, the scan command uses the SAST scanner; other scanner types are available.

Analyze the report

The security report is an easily digestible view of the security issues detected by Bearer CLI. A report is made up of:

The list of rules run against your code.
Each detected finding, containing the file location and lines that triggered the rule finding.
A stat section with a summary of rules checks, findings and warnings.

The OWASP Juice Shop example application will trigger rule findings and output a full report. Here's a section of the output:

...
HIGH: Sensitive data stored in HTML local storage detected. [CWE-312]
https://docs.bearer.com/reference/rules/javascript_lang_session
To skip this rule, use the flag --skip-rule=javascript_lang_session

File: juice-shop/frontend/src/app/login/login.component.ts:102

 102       localStorage.setItem('email', this.user.email)


=====================================

59 checks, 40 findings

CRITICAL: 0
HIGH: 16 (CWE-22, CWE-312, CWE-798, CWE-89)
MEDIUM: 24 (CWE-327, CWE-548, CWE-79)
LOW: 0
WARNING: 0
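
If you want to act on the summary in a script, for example to stop a pipeline when HIGH findings appear, the severity lines can be tallied in a few lines of Python. This is a minimal sketch that assumes the plain-text layout shown above; `parse_summary` is a hypothetical helper and not part of Bearer CLI. If a structured report format is available to you, it is likely a more robust input than scraped text.

```python
import re

# Sample copied from the report excerpt above.
SAMPLE = """\
59 checks, 40 findings

CRITICAL: 0
HIGH: 16 (CWE-22, CWE-312, CWE-798, CWE-89)
MEDIUM: 24 (CWE-327, CWE-548, CWE-79)
LOW: 0
WARNING: 0
"""

# A severity name at the start of a line, followed by a numeric count.
# Finding headers such as "HIGH: Sensitive data ..." do not match because
# the text after the colon is not a number.
SEVERITY_LINE = re.compile(
    r"^(CRITICAL|HIGH|MEDIUM|LOW|WARNING): (\d+)", re.MULTILINE
)


def parse_summary(text):
    """Return a mapping of severity name to finding count."""
    return {name: int(count) for name, count in SEVERITY_LINE.findall(text)}


counts = parse_summary(SAMPLE)
print(counts)
```

A caller could then compare `counts["HIGH"]` or `counts["CRITICAL"]` against a threshold of its own choosing.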

In addition to the security report, you can also run a privacy report.

Ready for the next step? Additional options for using and configuring the scan command can be found in configuring the scan command.

For more guides and usage tips, view the docs.

❓ FAQs

What makes Bearer CLI different from any other SAST tools?

SAST tools are known to bury security teams and developers under hundreds of issues with little context and no sense of priority, often requiring security analysts to triage issues manually.

The most vulnerable asset today is sensitive data, so we start there and prioritize findings by assessing sensitive data flows to highlight what is more critical, and what is not. This unique ability allows us to provide you with a privacy scanner too.

We believe that by linking security issues with a clear business impact and risk of a data breach, or data leak, we can build better and more robust software, at no extra cost.

In addition, by being Free and Open, extendable by design, and built with a great developer UX in mind, we bet you will see the difference for yourself.

What is the privacy scanner?

In addition to detecting security flaws in your code, Bearer CLI allows you to automate the evidence gathering process needed to generate a privacy report for your compliance team.

When you run Bearer CLI on your codebase, it discovers and classifies data by identifying patterns in the source code. Specifically, it looks for data types and matches against them. Most importantly, it never views the actual values—it just can’t—but only the code itself. If you want to learn more, here is the longer explanation.

Bearer CLI is able to identify more than 120 data types from sensitive data categories such as Personal Data (PD), Sensitive PD, Personally identifiable information (PII), and Personal Health Information (PHI). You can view the full list in the supported data types documentation.

Finally, Bearer CLI also lets you detect components storing and processing sensitive data such as databases, internal APIs, and third-party APIs. See the recipe list for a complete list of components.

Supported Language

Bearer CLI currently supports:

GA	JavaScript/TypeScript, Ruby, PHP, Java, Go
Beta	-
Alpha	Python

Learn more about language support.

How long does it take to scan my code? Is it fast?

It depends on the size of your applications. It can take as little as 20 seconds, up to a few minutes for an extremely large code base.

As a rule of thumb, Bearer CLI should never take more time than running your test suite.

In the case of CI integration, we provide a diff scan solution to make it even faster. Learn more.

What about false positives?

If you’re familiar with SAST tools, you know that false positives are always a possibility.

By using the most modern static code analysis techniques and providing a native filtering and prioritizing solution on the most important issues, we believe we have dramatically improved the overall SAST experience.

We strive to provide the best possible experience for our users. Learn more about how we achieve this.

When and where to use Bearer CLI?

We recommend running Bearer CLI in your CI to check new PRs automatically for security issues, so your development team has a direct feedback loop to fix issues immediately.

You can also integrate Bearer CLI in your CD, though we recommend setting it to only fail on high criticality issues, as the impact for your organization might be significant.

In addition, running Bearer CLI as a scheduled job is a great way to keep track of your security posture and make sure new security issues are found even in projects with low activity.

Make sure to read our integration strategy guide for more information.

✋ Get in touch

Thanks for using Bearer CLI. Still have questions?

Start with the documentation.
Have a question or need some help? Find the Bearer team on Discord.
Got a feature request or found a bug? Open a new issue.
Found a security issue? Check out our Security Policy for reporting details.
Find out more at Bearer.com

🤝 Contributing

Interested in contributing? We're here for it! For details on how to contribute, setting up your development environment, and our processes, review the contribution guide.

🚨 Code of conduct

Everyone interacting with this project is expected to follow the guidelines of our code of conduct.

🛡️ Security

To report a vulnerability or suspected vulnerability, see our security policy. For any questions, concerns or other security matters, feel free to open an issue or join the Discord Community.

🎓 License

Bearer CLI code is licensed under the terms of the Elastic License 2.0 (ELv2), which means you can use it freely inside your organization to protect your applications without any commercial requirements.

You are not allowed to provide Bearer CLI to third parties as a hosted or managed service without the explicit approval of Bearer Inc.

bearer's Issues

CONTRIBUTING.md missing note about env vars

The development section of CONTRIBUTING.md is missing a small note about the need to set up some basic env vars so that tests run, and about the existence of .envrc.example.

Without this, running the tests as described in that file fails, which is confusing!

Docs: add config file overview

Add docs reference page on setting up a config file. May not be necessary given the mirroring of flags, but displaying the default file can provide value.

Spike architecture changes

Adding custom detectors currently requires work in the core more frequently than we'd like.

We've also been prioritising progress over refactoring/re-architecting, so there is also potential for improving the separation of concerns / readability of the codebase.

Integrate new detectors

Run custom detectors and convert output to legacy detections format.

Construct generic and ruby detectors
Create executor for ruby using those detectors
Call from detectors.ExtractWithDetectors ?
Translate detections to legacy format

Privacy Report

As the basic technical requirements of each law are similar (maintain an inventory of assets, know what data types are shared with third parties, ensure data is adequately protected), we can make a Privacy Report that covers a large range of laws while automating the laborious tasks that legal teams require engineering to perform.

This feature will leverage Curio's detection capabilities and put the results in a format that is easier for legal teams to use. Our goal is to reduce the friction between engineering and legal teams.

1. Subjects and Data-types Inventory

This report will compile all Subjects and Data Types detected through a codebase scan.

To detect subjects, we will use the "object" item when it is classified as a type of person. Our current list of Subjects is:

{TO-DO: retrieve the list of subjects we currently have embedded in the codebase}

Curio will also report the number of policies tested against the data found in the codebase. It will list how many are failing, their severity, and how many have passed.

Output format:

Subject	Data Types	Detection Count	Critical Policy Failure	High Policy Failure	Medium Policy Failure	Low Policy Failure	Passed Policy
Customer	Full Name	142	1	3	0	15	184
Customer	E-mail	84	0	0	4	1	184
Customer	Address	42	0	0	4	0	184
Customer	Password	21	1	0	0	0	184
Admin	Full Name	18	0	0	0	1	184
Admin	E-mail	12	0	0	0	1	184

Cardinality: one line per data-type per subject.

Must-Have output format is CSV. Other formats are nice to have.
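
The cardinality rule above (one line per data type per subject) can be sketched with Python's standard `csv` module. This is only an illustration: the `detections` list is made-up sample data, `inventory_csv` is a hypothetical helper, and the policy columns are left out to keep the example short.

```python
import csv
import io
from collections import Counter

# Hypothetical detections: one (subject, data type) pair per occurrence.
detections = [
    ("Customer", "Full Name"),
    ("Customer", "Full Name"),
    ("Customer", "E-mail"),
    ("Admin", "Full Name"),
]


def inventory_csv(pairs):
    """Render one CSV line per data type per subject with its detection count."""
    counts = Counter(pairs)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Subject", "Data Types", "Detection Count"])
    # Sort so the output is stable across runs.
    for (subject, data_type), count in sorted(counts.items()):
        writer.writerow([subject, data_type, count])
    return buffer.getvalue()


print(inventory_csv(detections))
```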

2. Third-Parties Data Sharing Inventory

Curio has the ability to detect Third Party usage through codebase inspection. Read more about this here.

For each detected Third-Party, Curio will list all Subjects & Data Types determined to be exchanged through the integration. If Curio isn't capable of determining what Data are exchanged with a Third party, it will report an unknown state.

Curio will also report the number of policies tested against the data found in the codebase. It will list how many are failing, their severity, and how many have passed.

Output file:

Third Party	Subject	Data Types	Critical Policy Failure	High Policy Failure	Low Policy Failure	Passed Policy
Sentry	None	None	0	0	0	5
Stripe	Customer	Full Name, Email, Home Address, Credit Card Number	0	0	1	5
Algolia	Customer	Full Name, Address	0	1	0	3
Algolia	Employee	Full Name, Title, Salary, Home Address	1	0	0	3
Facebook	Unknown	Unknown	0	0	0	0

Cardinality: one line per Third-Party per Subject

Must-Have output format is CSV. Other formats are nice to have.
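
Grouping to one line per Third-Party per Subject, with an "Unknown" fallback when the exchanged data cannot be determined, could look like the sketch below. The `exchanges` sample and the `third_party_rows` helper are hypothetical, and the policy columns are again omitted.

```python
# Hypothetical exchanges: (third party, subject, data type).
# None stands for "could not be determined".
exchanges = [
    ("Algolia", "Customer", "Full Name"),
    ("Algolia", "Customer", "Address"),
    ("Algolia", "Employee", "Salary"),
    ("Facebook", None, None),
]


def third_party_rows(items):
    """Group data types into one row per third party per subject."""
    grouped = {}
    for third_party, subject, data_type in items:
        # setdefault keeps the row even when no data type is known.
        types = grouped.setdefault((third_party, subject or "Unknown"), [])
        if data_type is not None:
            types.append(data_type)
    return [
        (third_party, subject, ", ".join(types) if types else "Unknown")
        for (third_party, subject), types in grouped.items()
    ]


for row in third_party_rows(exchanges):
    print(row)
```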

What Curio's privacy report won't provide

GDPR Processing Activities: to the best of our experience and knowledge, there isn't a one-size-fits-all approach to the Processing Activities legal requirement. Each organization will have its own legal implementation of processing activities. As there isn't any clue in the codebase from which to build that part of the GDPR legal requirements, we deliberately leave that concept out of the feature.

Add a policy about Secret leaking

Description

We have a secret detection feature, but we aren't showing its detections in the report and don't have a policy set up for them.

https://github.com/Bearer/curio/blob/main/pkg/detectors/gitleaks/gitleaks.go

We should have them show up in the dataflow report (in risk) and have them raise a critical policy.

False positives from Rails JWT custom detection

Description

The Rails JWT custom detection currently detects the following code.

JWT.encode(ENV.fetch("SOME_SECRET"))

Expected Behavior

The above code should not be detected.

Actual Behavior

The above code is detected.

Your Environment

Version used: v0.19.0
Target codebase stack (e.g. Ruby): Ruby
Operating System and version:
Link to your project or code sample:

Extend Ruby file custom detection to cover IO.open

Description

The Ruby file custom detection currently detects CSV.open and File.open; it should support IO.open as well.

Expected Behavior

IO.open should be detected.

Actual Behavior

IO.open is not detected.
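
As a rough illustration of the requested coverage, the pattern below matches all three receivers. It is a text-level sketch in Python, not how the detector itself is implemented (a real detector would more plausibly match on the syntax tree); `FILE_OPEN_CALL` and `find_open_calls` are hypothetical names.

```python
import re

# Receivers named in this issue: CSV and File today, IO as the addition.
FILE_OPEN_CALL = re.compile(r"\b(CSV|File|IO)\.open\b")


def find_open_calls(source):
    """Return every CSV.open / File.open / IO.open call found in the text."""
    return [match.group(0) for match in FILE_OPEN_CALL.finditer(source)]


ruby_source = 'IO.open(fd)\nFile.open("users.csv")\nFoo.open("x")\n'
print(find_open_calls(ruby_source))
```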

Your Environment

Version used: v0.19.0
Target codebase stack (e.g. Ruby): Ruby
Operating System and version:
Link to your project or code sample:

Curio doesn't run on vanilla CloudLinux, CentOS and Debian setups with the install script (missing libs)

Description

The installation script isn't sufficient to get curio working on some Linux distributions.

Expected Behavior

Actual Behavior

[centos@centos-brr-test ~]$ ./bin/curio scan bear-publishing/
./bin/curio: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by ./bin/curio)
./bin/curio: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ./bin/curio)
./bin/curio: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./bin/curio)
./bin/curio: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./bin/curio)

[cloudlinux@archlinux-brr-test ~]$ ./bin/curio scan bear-publishing/
./bin/curio: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./bin/curio)
./bin/curio: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./bin/curio)

debian@debion-brr-test:~$ ./bin/curio scan bear-publishing/
./bin/curio: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./bin/curio)
./bin/curio: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./bin/curio)
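
To see at a glance which library versions the binary needs, the loader errors above can be reduced to the highest missing version per symbol family. The sketch below is a small Python helper written for this issue's output format only; `highest_missing` is a hypothetical name.

```python
import re

# Loader errors copied from the CentOS run above.
ERRORS = """\
./bin/curio: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by ./bin/curio)
./bin/curio: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ./bin/curio)
./bin/curio: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./bin/curio)
./bin/curio: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./bin/curio)
"""

MISSING = re.compile(r"version `([A-Z]+)_([\d.]+)' not found")


def highest_missing(text):
    """Return the highest missing version per family, e.g. GLIBC -> (2, 34)."""
    highest = {}
    for family, version in MISSING.findall(text):
        # Compare versions numerically, component by component.
        parsed = tuple(int(part) for part in version.split("."))
        if parsed > highest.get(family, ()):
            highest[family] = parsed
    return highest


print(highest_missing(ERRORS))
```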

Possible Fix

Make the install script install the required libs ?

Steps to Reproduce

Try installing curio on vanilla Debian / ArchLinux / CentOS.

Context

QA Test

Your Environment

Version used: v0.19
Target codebase stack (e.g. Ruby): bear-publishing
Operating System and version: CentOS 7 , Debian 11, ArchLinux/CloudLinux 8

Empty scan results are not displayed properly

When a scan doesn't find any data, the results are not displayed well.

Description

Here is an example:

Expected Behavior

Possible options:

Remove the line when nothing is found.
Then, if nothing has been found at all, change the whole block.
Or replace the sentence with "No data type found".

Actual Behavior

Bad display

Your Environment

Version used: 0.19

Include pattern in risks

We are currently setting the risk content to a dummy value. It should be the pattern that matched.

Non-supported language policy output suggests non-working command

When running policy scan on a project that doesn't support policy scans (JS), curio presents an incorrect command for running a dataflow scan.

Description

The final line of the output provides incorrect instructions

curio scan .
Scanning target .
 └ 100% [===============] (984/984, 56 files/s) [17s]
Running Detectors
Generating dataflow

The policy report is not yet available for your stack. Learn more at https://curio.sh/explanations/reports/

Though this doesn’t mean the curious bear comes empty-handed, it found:

- 13 unique data type(s), representing 66 occurrences, including PII, Personal Data (Sensitive).
- 2 database(s) storing 13 data type(s) including 0 encrypted data type(s).
- 3 external service(s).

Run the data flow report if you want the full output using: curio scan --report dataflow

Expected Behavior

curio scan --report dataflow [PATH]

Actual Behavior

curio scan --report dataflow

Your Environment

Version used: 0.21.1
Target codebase stack (e.g. Ruby): javascript
Operating System and version:
Link to your project or code sample:

Allow `encrypted_` prefix to be marked as encrypted in SQL detections

detect_encrypted_ruby_class_properties is the verifier we use to mark a SQL detection as encrypted.

We could add this new processor here that would automatically flag the fields that are prefixed by encrypted_ as encrypted.

package bearer.db_encrypted

import future.keywords

default encrypted := false

# Mark the field as encrypted when its name starts with "encrypted_".
encrypted = true {
    startswith(lower(input.target.value.field_name), "encrypted_")
}

verified_by := [
    {
        "detector": "db_encrypted",
    }
]

Add missing support to Ruby flow analysis

The flow analysis is currently fairly basic and doesn't handle all the cases the legacy "variable reconciliation" logic does.

Add missing support to Ruby object detector

The object detector from the spike doesn't handle all the cases we support in the legacy data types implementation.

Classes
Calls
Element reference

Support installation via the most common Linux package managers

Description

We would like to support "direct" installation — and, therefore, upgrades — via the most common Linux package managers.

The most common Linux package managers are those based on dpkg and rpm: specifically, apt-get and yum, respectively.

Proposed solution

We can use nFPM to build .deb and .rpm packages, and this tool integrates directly with GoReleaser. The build packages will be pushed into a separate curio-repo repository, to which users can point their package managers.

Trivy uses a setup from which we can draw inspiration:

User-facing interface to released packages

A dedicated repository is used to store the built packages; this repository is configured as a GitHub Pages site
The README file of the trivy-repo repository provides installation instructions for the supported package managers

Internal build mechanism for packages

The "release" entry point is a GitHub Action
The nfpms section of the GoReleaser configuration describes the specifics of the package build
Each package format has a dedicated script to update the released package in the trivy-repo repository: .deb, .rpm
Releases (and docker manifests) are signed using cosign

Option --workers is not respected

Description

The --workers option appears not to be respected.

Expected Behavior

The --workers option should control the number of workers that are spawned.

Actual Behavior

The --workers option appears to have no effect on the number of workers that are spawned.

Possible Fix

Perhaps we should remove the --workers option entirely, hiding it from the end user?

Your Environment

Version used: v0.19.0
Target codebase stack (e.g. Ruby): Ruby
Operating System and version:
Link to your project or code sample:

Support filters for ruby_password_length

We need to support OR and min/max integer constant value

Rework config file

The current config file is becoming way too big. It contains the detectors and policies along with their content.
We need to keep this folder as slim as possible.

Let's add external folders for detectors and policies

current

policy:
    only-policy: []
    skip-policy: []

report:
    format: ""
    output: ""
    report: policies

scan:
    context: ""
    debug: false
    disable-domain-resolution: true
    domain-resolution-timeout: 3s
    force: false
    internal-domains: []
    custom_detector: # <--- to be removed
      # Long content
    policies: # <--- to be removed
      # Long content

after

policy:
    only-policy: []
    skip-policy: []

detector:
    only-detector: [] # <--- to add
    skip-detector: [] # <--- to add
  
report:
    format: ""
    output: ""
    report: policies

scan:
    context: ""
    debug: false
    disable-domain-resolution: true
    domain-resolution-timeout: 3s
    force: false
    internal-domains: []
    external-detector-dir: [] #  <--- to add
    external-policy-dir: [] # <--- to add

Add missing custom rule support

Implement custom rule settings/options that aren't yet supported

Git-ignored folders/files are scanned when they shouldn't be

Folders and files listed in .gitignore are scanned.

Description

Folders and files referenced in a .gitignore shouldn't be scanned.

Expected Behavior

Files and folders listed in a .gitignore should not be scanned.
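
The expected filtering can be sketched as follows. This is a deliberately simplified Python illustration that assumes plain names and shell-style wildcards only; real .gitignore semantics (negation with `!`, anchored patterns, `**`) are richer, and `is_ignored` is a hypothetical helper rather than the scanner's own code.

```python
import fnmatch
from pathlib import PurePosixPath


def is_ignored(path, patterns):
    """Return True if the path, or any of its components, matches a pattern."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        pattern = pattern.rstrip("/")  # "dir/" and "dir" behave the same here
        if fnmatch.fnmatchcase(path, pattern):
            return True
        if any(fnmatch.fnmatchcase(part, pattern) for part in parts):
            return True
    return False


patterns = ["folder_to_ignore"]
files = ["app/models/user.rb", "folder_to_ignore/test.txt"]
to_scan = [name for name in files if not is_ignored(name, patterns)]
print(to_scan)
```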

Steps to Reproduce

Add a file folder_to_ignore/test.txt to a sample app and add the folder_to_ignore folder to the app's .gitignore.
Scan the sample app in debug mode.
Notice that it scans the files present in the folder_to_ignore folder.

Context

It slows down the scan a lot.

Your Environment

Version used: 0.19.0
Target codebase stack (e.g. Ruby): Ruby
Operating System and version: Mac OS 13
Link to your project or code sample: n/a

Increase test coverage on new internals

As this work is starting from a spike, there are no tests currently.

Do not classify `ajax.googleapis.com` as Google API

Description

Google Cloud APIs are matching with a wildcard *.googleapis.com but ajax.googleapis.com shouldn't be considered.

Expected Behavior

ajax.googleapis.com should not be classified or appear in the component section

Actual Behavior

ajax.googleapis.com is classified as a Google API and is present in the component section
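
One way to express "wildcard match, minus known exceptions" is an explicit exclusion set that is checked before the wildcard. The Python sketch below is illustrative only; `EXCLUDED_HOSTS` and `is_google_api` are hypothetical names and do not reflect how the recipes are actually stored.

```python
import fnmatch

WILDCARD = "*.googleapis.com"
# Hypothetical exclusion list: hosts that match the wildcard but are not
# Google Cloud APIs.
EXCLUDED_HOSTS = {"ajax.googleapis.com"}


def is_google_api(host):
    """Return True for hosts matching the wildcard, except excluded ones."""
    host = host.lower()
    if host in EXCLUDED_HOSTS:
        return False
    return fnmatch.fnmatchcase(host, WILDCARD)


print(is_google_api("storage.googleapis.com"))
print(is_google_api("ajax.googleapis.com"))
```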

Build a docker image

Docker seems like an interesting option to help people get started with curio on any setup very easily.
In the end, it should be as simple as running docker pull bearer/curio:latest.

We should use a small image like alpine to run the binary

Links

Battle tests on Top 5k Ruby Projects and review policy results

We have the battle tests that we used to check the stats on the top 5k repositories per language.
It is being triggered by this workflow

Now that we have the policies in place, we would like to update the battle tests a bit to allow it to run the policies and to send the output of the policies on S3 (so that we can review the results).

NB: This means running it on the Ruby projects only.

Improve dataflow report command helper

Description

When the curio scan command is run on a project that does not support policies, it provides a summary of findings and an explainer on how to run the dataflow report, but the command provided misleads users by not including the "repo" name to scan.

Possible Fix

Dynamically adapt the command reference curio scan --report dataflow by appending the repo path used to run the original command. That way, the user can simply copy/paste the explainer line and it will work!
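
The idea can be sketched in a few lines of Python. `dataflow_hint` is a hypothetical helper; the only assumption is that the path the user originally passed is still available when the hint is printed. Quoting the path keeps the hint copy/paste-safe when it contains spaces.

```python
import shlex


def dataflow_hint(scan_path):
    """Build the suggested command, including the path that was scanned."""
    return f"curio scan --report dataflow {shlex.quote(scan_path)}"


print(dataflow_hint("."))
print(dataflow_hint("my project"))
```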

Steps to Reproduce

Run curio scan path-to-project on a project that does not support policies.
Read the last line of the output, which tells you to run the dataflow report command.

Context

We've seen quite a few people trust the command explainer and see it fail because it didn't include their repo name, leading to users getting lost.

Your Environment

Version used: 0.21
Target codebase stack (e.g. Ruby): anything other than Ruby

CR-025 .browserslistrc false positive

Description

.browserslistrc is a fairly common file used to indicate supported browsers in a webapp

Expected Behavior

Should not trigger CR-025

Actual Behavior

CRITICAL: Do not leak secrets. [CR-025]
https://curio.sh/reference/policies/#CR-025

Detected: Sensitive file name
File: .../.browserslistrc:1

The file had the following content:

defaults

Possible Fix

ignore this file

Steps to Reproduce

create a file called .browserslistrc
run curio scan

Your Environment

Version used: 0.22.0
Target codebase stack (e.g. Ruby): Ruby

Enhance policy output for variable policies

Currently, policy output displays a name, id, and section of code where the failure happened. For policies that may have multiple detection sub-types (like secret detection), it would be valuable to know specifically which sub-type caused the failure. Here are two possible proposals:

Option 1: Append details to the end of the name, without modifying the existing name.

CRITICAL: Do not leak secrets. AWS Access Token detected. [CR-025]
...

Option 2: Create new line in output block.

CRITICAL: Do not leak secrets. [CR-025]
https://curio.sh/reference/policies/#CR-025

Detected: AWS Access Token
File: integration/policies/testdata/leak/aws.js:1
...
...

Get rid of external dependencies in schema classification

We should decouple schema classification and its types.
Schema classification should have and use its own types instead of being tightly integrated with schema.Datatype.

Wrong SHA when publishing the release on Homebrew tap

Description

I had to update the SHA based on the checksums.txt from the release after the latest release https://github.com/Bearer/curio/actions/runs/3750808752/jobs/6371000717

Bearer/homebrew-tap@07689b9

I downloaded the checksums.txt and I updated them.

Expected Behavior

brew update && brew upgrade curio doesn't raise a SHA256 mismatch error

❯ brew upgrade curio
Warning: Treating curio as a formula. For the cask, use homebrew/cask/curio
==> Upgrading 1 outdated package:
bearer/curio/curio 0.19.0 -> 0.20.1
==> Fetching bearer/curio/curio
==> Downloading https://github.com/Bearer/curio/releases/download/v0.20.1/curio_0.20.1_darwin_arm64.tar.gz
Already downloaded: /Users/user/Library/Caches/Homebrew/downloads/adbf6450f6898eb4e4a9221bcca466404a0a3c2845ff13ba1740ad4ef6220782--curio_0.20.1_darwin_arm64.tar.gz
==> Upgrading bearer/curio/curio
  0.19.0 -> 0.20.1

🍺  /usr/local/homebrew/Cellar/curio/0.20.1: 5 files, 56.6MB, built in 2 seconds
==> Running `brew cleanup curio`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
Removing: /usr/local/homebrew/Cellar/curio/0.19.0... (5 files, 56.4MB)
Removing: /Users/user/Library/Caches/Homebrew/curio--0.19.0.tar.gz... (11.6MB)

Actual Behavior

❯ brew upgrade curio
Warning: Treating curio as a formula. For the cask, use homebrew/cask/curio
==> Upgrading 1 outdated package:
bearer/curio/curio 0.19.0 -> 0.20.1
==> Fetching bearer/curio/curio
==> Downloading https://github.com/Bearer/curio/releases/download/v0.20.1/curio_0.20.1_darwin_arm64.tar.gz
Already downloaded: /Users/user/Library/Caches/Homebrew/downloads/adbf6450f6898eb4e4a9221bcca466404a0a3c2845ff13ba1740ad4ef6220782--curio_0.20.1_darwin_arm64.tar.gz
Error: SHA256 mismatch
Expected: 5363835145d3dac64bc7dc6db6ee9c871fa0c3d343ed459f818adf856d4cdf9a
  Actual: 8098cf491f8f83843dfb2e6ae587aaadae099594f7727fa4c040669e74889db5
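
A release archive can be checked against checksums.txt before the formula is published. The Python sketch below assumes the file uses the common `<sha256>  <filename>` layout, one entry per line; that layout is an assumption here, and `sha256_of` and `expected_from_checksums` are hypothetical helpers.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_from_checksums(checksums_text, filename):
    """Look up the expected digest for a filename, or None if it is absent."""
    for line in checksums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == filename:
            return parts[0]
    return None
```

Comparing `sha256_of(archive)` with `expected_from_checksums(text, archive_name)` before pushing to the tap would surface a mismatch like the one above at release time rather than at install time.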

Context

This file is being used to generate and publish

https://github.com/Bearer/curio/blob/bf3b9056005881a2788e53fb40d5d742c8676470/goreleaser/publish_brew.yaml

Docs: add "what are recipes/how to add recipe" page

Add a page to the docs and/or contribution guide on how to add individual recipes.

Increase detections for third-party loggers

Proposal

We can improve our rules by adding rules for the following monitoring tools. I've also looked at the number of downloads for their gems to give a rough idea of popularity.

OpenTelemetry

limited ruby support (just tracing) #520
Tracing we might want to monitor https://opentelemetry.io/docs/instrumentation/ruby/manual/#get-the-current-span

Used By

Appdynamics

Others to keep an eye on

Splunk
- Gem Deprecated (2017) https://github.com/splunk/splunk-sdk-ruby
- Integration now through REST API - covered by other policies?

Update GoReleaser config so that it announces

GoReleaser can be customized and has a particularly interesting option, announce, that we should be using.

It supports

Discord
LinkedIn
Mastodon
Mattermost
Reddit
Slack
SMTP
Teams
Telegram
Twitter
Webhook

Implement string detector

Create the string detector for Ruby.

string_content has literal value

interpolation:

identifier is string detection of the identifier (or * when there was no detection)

 a = "abc" # detection "abc"
 "#{a}sdaffds" # detection "abcsdaffds"

 a = call() # no detection
 "#{a}sdaffds" # detection "*sdaffds"

anything else is *

concatenation (+ operator, string node)
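The interpolation rules above could be sketched roughly as follows. This is only an illustration: the Part type, the detectString function, and the known map are hypothetical stand-ins for the real syntax-tree nodes and detection storage, not part of curio.

```go
package main

import (
	"fmt"
	"strings"
)

// Part is one piece of a string node (hypothetical model of the syntax tree).
type Part struct {
	Literal    string // text of a string_content node
	Identifier string // name of an interpolated identifier
	Other      bool   // any other interpolated expression
}

// detectString builds the detected value of a string node. known maps an
// identifier to its earlier string detection, if there was one.
func detectString(parts []Part, known map[string]string) string {
	var b strings.Builder
	for _, p := range parts {
		switch {
		case p.Other:
			b.WriteString("*") // anything else is *
		case p.Identifier != "":
			if v, ok := known[p.Identifier]; ok {
				b.WriteString(v) // reuse the identifier's detection
			} else {
				b.WriteString("*") // no detection for the identifier
			}
		default:
			b.WriteString(p.Literal) // string_content has its literal value
		}
	}
	return b.String()
}

func main() {
	parts := []Part{{Identifier: "a"}, {Literal: "sdaffds"}}

	// a = "abc"
	fmt.Println(detectString(parts, map[string]string{"a": "abc"})) // abcsdaffds

	// a = call()
	fmt.Println(detectString(parts, map[string]string{})) // *sdaffds
}
```

Under this model, concatenation with the + operator would simply join the detected values of the left and right operands.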

Cached detections shouldn't be reused if the binary has been killed

Description

We cache the detections based on the SHA of the repo being scanned and the version of curio, so that results come back faster after the first scan.
However, if the scan did not complete successfully (killed, Ctrl-C, ...), we still reuse the cached version.

Expected Behavior

We should use the cached detections only if the previous scan has been successful.

Actual Behavior

We use the cached detections even when the previous scan has been killed.

Possible Fix

Use a temporary name and rename it once the process is complete.
- Check for the final name and not the temporary one
Add a line at the end of the report and only reuse this file if this line is included

`password` shouldn't show up in report for application missing encryption

Description

We should filter out the data category passwords from showing up in application missing encryption policy.

Expected Behavior

Considering the table below, you should not see an alert for password missing application encryption

CREATE TABLE public.users (
  -- ...
  first_name character varying,
  password character varying,
  -- ...
)

Actual Behavior

You see an alert

CI hangs when running the Unit Tests

Description

The CI seems to hang when running the integration tests.
PR #250 attempted to address this, but the behaviour still seems to occur.

See this Actions run: https://github.com/Bearer/curio/actions/runs/3708056420/jobs/6285192136

Expected Behavior

The CI is reliable

Actual Behavior

The CI hangs

Possible Fix

Separate integration tests and Unit Tests from the CI
Fix the integration tests on GitHub Actions

Steps to Reproduce

Issue a PR
Check the CI

Context

Only seems to happen on GitHub Actions.
I don't seem to be able to reproduce it locally for now.

No file for symlink causes curio to error

Description

Curio errors when processing a symlink whose target does not exist in the project.

Expected Behavior

The file is skipped

Actual Behavior

./curio scan  ~/development/mastodon/
Error: filesystem scan error: stat /Users/gotbadger/development/mastodon//public/500.html: no such file or directory
filesystem scan error: stat /Users/gotbadger/development/mastodon//public/500.html: no such file or directory

Possible Fix

Steps to Reproduce

clone https://github.com/mstdn/Mastodon
run curio scan

Context

Looks like https://github.com/mstdn/Mastodon/blob/main/public/500.html points to a file that does not exist initially in the project.

Your Environment

Version used: 0.21.1
Target codebase stack (e.g. Ruby): Ruby
Operating System and version: Darwin
Link to your project or code sample: https://github.com/mstdn/Mastodon

Unclear --timeout-file-second-per-bytes description

Description

It's unclear from the description what the --timeout-file-second-per-bytes flag does. Ideally, it would be clear without knowing how other flags or functionality work.

Support multiple node types for pattern variables

Multiple node types are needed for the detect_rails_cookies rule

Streamline exports

At the moment we export detections to the report from inside composition in an ugly way: it is very repetitive to do it for every type with type switches. It might be better for each detection to handle its own export, and for us to introduce an interface for exporting detections.

eg:

// Detection is a single match produced by a detector.
type Detection struct {
	MatchNode   tree.Node
	ContextNode tree.Node
	Data        Data
}

// Data is implemented by each detection type so that it can export itself.
type Data interface {
	Export() (source.Source, schema.Schema)
}

// composition's parse-file function
func parseFile() error {
	// ...
	for _, detectorType := range []string{"class", "datatype"} {
		detections, err := evaluator.ForTree(tree.RootNode(), detectorType)
		if err != nil {
			return err
		}
		for _, detection := range detections {
			source, schema := detection.Data.Export()
			report.addDetection(detectorType, source, schema)
		}
	}
	return nil
}

Encrypted cookies shouldn't be detected

Description

When a cookie is encrypted, we shouldn't raise an error.

Expected Behavior

cookies.encrypted[:user_email] = user.email

Is not detected and doesn't throw a policy breach

Actual Behavior

cookies.encrypted[:user_email] = user.email

Is detected and throws a policy breach

Add support for Homebrew

Installing curio requires running a script: https://github.com/Bearer/curio/blob/main/contrib/install.sh

This works, but macOS users are more used to Homebrew.

Provide GitHub Action

Provide a GitHub Action as a way to integrate to the CI.

Only performs a full scan for now.
Fails when any policy breach is detected

Add classification to datatype detector

Add classification logic to the datatype detector. The current implementation is a dummy one.

Implement insecure_url detector using string detector

Insecure URLs are strings which start with http:

This will be a generic detector (not language specific).
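Given a detected string value, the check described above is small. A minimal sketch, with isInsecureURL as a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// isInsecureURL reports whether a detected string starts with "http:".
func isInsecureURL(detected string) bool {
	return strings.HasPrefix(detected, "http:")
}

func main() {
	fmt.Println(isInsecureURL("http://example.com"))  // true
	fmt.Println(isInsecureURL("https://example.com")) // false
	fmt.Println(isInsecureURL("*sdaffds"))            // false
}
```

Because it only looks at the detected string and not at the syntax tree, the same check works for any language that has a string detector.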

Rails cookies and session custom detections report Unique Identifiers as policy breaches

Description

The Rails cookies and session custom detectors currently return unique identifiers (author.user_id, for example) as policy breaches.

Expected Behavior

Unique Identifiers should be regarded as safe, and should not be reported as policy breaches.

Actual Behavior

Unique Identifiers are reported as policy breaches.

Possible Fix

Should we be triggering the policies at all for Unique Identifiers? Or should these just be regarded as not being risks?

Your Environment

Version used: v0.19.0
Target codebase stack (e.g. Ruby): Ruby
Operating System and version:
Link to your project or code sample:

`--only-policy` option doesn't check policy existence

Description

Running --only-policy (or --skip-policy) doesn't check if the policy IDs exist, leading to bad UX and misleading output.

Expected Behavior

Throw an error if using a policy ID that doesn't exist

Actual Behavior

Fail silently

Possible Fix

Check if policy IDs exist first
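A minimal sketch of such a check, assuming the set of known policy IDs is available before the scan starts. validatePolicyIDs and the known map are hypothetical, and the error wording is only an example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// validatePolicyIDs returns an error naming every requested ID that is
// not in the set of known policies.
func validatePolicyIDs(requested []string, known map[string]bool) error {
	var unknown []string
	for _, id := range requested {
		if !known[id] {
			unknown = append(unknown, id)
		}
	}
	if len(unknown) > 0 {
		sort.Strings(unknown)
		return fmt.Errorf("unknown policy ID(s): %s", strings.Join(unknown, ", "))
	}
	return nil
}

func main() {
	known := map[string]bool{"CR-001": true} // assumed set of existing policies

	fmt.Println(validatePolicyIDs([]string{"CR-001"}, known)) // <nil>
	fmt.Println(validatePolicyIDs([]string{"CR-000"}, known)) // unknown policy ID(s): CR-000
}
```

The same function could be applied to the values of both --only-policy and --skip-policy before the scan begins.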

Steps to Reproduce

curio scan . --only-policy CR-000
curio scan . --skip-policy CR-000

Your Environment

Version used: 0.21.1
Target codebase stack (e.g. Ruby): Ruby
Operating System and version: Mac

Can't execute Curio on Ubuntu 22.10 without sudo permission

Description

Curio seems unable to create the JSON file at the beginning of the scan without sudo permissions on Ubuntu 22.10.

Expected Behavior

Curio should be able to scan without super user permissions.

Actual Behavior

Works only with super user permissions.

ubuntu@d2-8-sbg5:~$ ./bin/curio scan bear-publishing/
{"level":"error","time":"2022-12-19T07:20:10Z","message":"failed to create path /tmpe566c0f8c7ffb00cb50b8bc7fa842e58801f054d-b1bbbdcccc1cda1137e82d75f34a6e7d8b134707.jsonl, open /tmpe566c0f8c7ffb00cb50b8bc7fa842e58801f054d-b1bbbdcccc1cda1137e82d75f34a6e7d8b134707.jsonl: permission denied, (*os.File)(nil)"}
Scanning target bear-publishing/
Error: filesystem scan error: open /tmpe566c0f8c7ffb00cb50b8bc7fa842e58801f054d-b1bbbdcccc1cda1137e82d75f34a6e7d8b134707.jsonl: permission denied

ubuntu@d2-8-sbg5:~$ sudo ./bin/curio scan bear-publishing/
Scanning target bear-publishing/
 └  26% [==>            ] (47/177, 28 files/s) [1s:4s]^C

Possible Fix

Maybe a yml path issue?
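Another possible reading of the log, offered as an assumption rather than a confirmed diagnosis: the path /tmpe566c0f8... has no separator between the temporary directory and the file name, which would happen if the two were joined by plain string concatenation. The sketch below shows the difference on a Unix-like system; reportPath and the shortened file name are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// reportPath joins the temporary directory and file name with a separator,
// whether or not tmpDir already ends with one.
func reportPath(tmpDir, name string) string {
	return filepath.Join(tmpDir, name)
}

func main() {
	tmpDir := "/tmp"
	name := "e566c0f8-b1bbbdcc.jsonl" // shortened, hypothetical file name

	fmt.Println(tmpDir + name)             // /tmpe566c0f8-b1bbbdcc.jsonl
	fmt.Println(reportPath(tmpDir, name))  // /tmp/e566c0f8-b1bbbdcc.jsonl
	fmt.Println(reportPath("/tmp/", name)) // /tmp/e566c0f8-b1bbbdcc.jsonl
}
```

If that is the cause, the concatenated path would sit directly under /, which a regular user normally cannot write to, and that would match the fact that the scan only works with sudo.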

Steps to Reproduce

Install curio with the script on Ubuntu 22.10
try to scan a repo in current user home
see it fail
retry with sudo

Context

Your Environment

Version used: v0.19
Target codebase stack (e.g. Ruby): ruby
Operating System and version: Ubuntu 22.10
Link to your project or code sample: https://github.com/Bearer/bear-publishing

Revisit structure for detectors/policies

Description

Curio works with detectors and policies. Detectors enrich the dataflow report; policies look at the dataflow report and are applied based on its output.

For example:

a detector would be located in /pkg/commands/process/settings/custom_detectors/ (e.g. /pkg/commands/process/settings/custom_detectors/ruby_loggers.yml)
a policy would be located in pkg/commands/process/settings/policies/ (e.g. /pkg/commands/process/settings/policies/leakage.rego and an entry inside /pkg/commands/process/settings/policies.yml)

Whenever we implement a new detection, we either rely on an existing policy (e.g. leakage applies to both cookies and session) or enrich the collection of policies.

It might be a bit difficult for a newcomer to know where to start, especially as we introduce the remediation layer on top.

Proposal

We could have a structure like this

/policies/ruby/shared/shared.rego
/policies/ruby/CR-001/detectors/loggers.yml
/policies/ruby/CR-001/leakage.rego
/policies/ruby/CR-001/rule.yml
/policies/ruby/CR-001/README.md

We could end up repeating the policies (for example, leakage.rego would be similar for both session and cookies), but that would definitely help any newcomer understand where to start and how to contribute.