foxsec-pipeline


Apache Beam pipelines for analyzing log data.

Documentation

Javadoc documentation is currently updated manually; while it should generally be accurate, it may not reflect the latest state of master.

Introduction to Beam

To get familiar with developing pipelines in Beam, this repository also contains a small workshop that provides some guidance on building basic pipelines. The introduction document can be found here.

Tests

Tests can be executed locally using Docker.

Run all tests

docker build -f Dockerfile-base -t foxsec-pipeline-base:latest .
bin/m test

Run a specific test

docker build -f Dockerfile-base -t foxsec-pipeline-base:latest .
bin/m test -Dtest=ParserTest

CLI Usage

Pipeline runtime secrets can be generated locally using the main method in the RuntimeSecrets class.

bin/m compile exec:java -Dexec.mainClass=com.mozilla.secops.crypto.RuntimeSecrets -Dexec.args='-i testkey -k dataflow -p my-gcp-dataflow-project -r dataflow'

Run the class with no options to see usage information. Note that in this case, the key ring name and key name are being specified as dataflow. The existing RuntimeSecrets class requires the keys to be accessible using these identifiers when the pipeline is executing.

The output of the command can be prefixed with cloudkms:// in an option to enable runtime decryption of the secrets during pipeline execution.
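
For example, assuming a hypothetical pipeline option named --apiSecret, the encrypted output could be passed as:

--apiSecret=cloudkms://<output of the command above>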

Interacting with Minfraud

Reputation data can be fetched from Minfraud locally using the main method in the Minfraud class.

You must provide the accountid and licensekey plus the IP and/or email you want to get reputation data for. --accountid and --licensekey can be provided either directly or as RuntimeSecrets (cloudkms://...).

bin/m exec:java \
  -Dexec.mainClass="com.mozilla.secops.Minfraud" \
  -Dexec.args="-p my-gcp-dataflow-project --accountid 'cloudkms://...' --licensekey 'cloudkms://...' --ip '8.8.8.8' --email '[email protected]'"

Creating Watchlist entries

Watchlist entries can be created locally using the main method in the Watchlist class.

When submitting an entry, you must also prefix your command with WITHOUT_DAEMONS=true so that the entry isn't submitted to the Datastore emulator running within the container.

usage: Watchlist
 -c,--createdby <arg>
 -ne,--neverexpires     Watchlist entry never expires (compared to default
                        of 2 weeks)
 -o,--object <arg>      Object to watch. Can be an IP or email.
 -p,--project <arg>     GCP project name (required if submitting to
                        Datastore)
 -s,--severity <arg>    Severity of Watchlist entry. Can be 'info',
                        'warn', or 'crit'
 -su,--submit           Submit Watchlist entry to Datastore rather than
                        emit json
 -t,--type <arg>        Type of object to watch. Can be 'ip' or 'email'

Example of creating entry without submitting to Datastore

$ bin/m exec:java -Dexec.mainClass="com.mozilla.secops.Watchlist" -Dexec.args="--object '127.0.0.1' --type 'ip' --createdby '[email protected]' --severity 'info'"

{"type":"ip","severity":"info","expires_at":"2020-02-26T17:45:01.399Z","created_by":"[email protected]","object":"127.0.0.1"}

Example of submitting to Datastore

$ WITHOUT_DAEMONS=true bin/m exec:java -Dexec.mainClass="com.mozilla.secops.Watchlist" -Dexec.args="--object '127.0.0.1' --type 'ip' --createdby '[email protected]' --severity 'info' --project foxsec-pipeline-nonprod --submit"

Feb 12, 2020 5:41:44 PM com.mozilla.secops.state.State initialize
INFO: Initializing new state interface using com.mozilla.secops.state.DatastoreStateInterface
Feb 12, 2020 5:41:45 PM com.mozilla.secops.state.StateCursor set
INFO: Writing state for 127.0.0.1
Feb 12, 2020 5:41:45 PM com.mozilla.secops.state.State done
INFO: Closing state interface com.mozilla.secops.state.DatastoreStateInterface
Successfully submitted watchlist entry to foxsec-pipeline-nonprod
{"type":"ip","severity":"info","expires_at":"2020-02-26T17:41:43.919Z","created_by":"[email protected]","object":"127.0.0.1"}

Contributing

See the contributing guidelines.


foxsec-pipeline's Issues

Look into alerting subnet summarization

Where addresses within alerts can be reasonably correlated as belonging to the same subnet (for example the same /24), add support to the alerting output to potentially generate a secondary alert indicating a probable bad subnet.

This would essentially be something like a reduction operation that outputs a subnet given a set of input elements.
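
As an illustration of the reduction itself, a minimal plain-Java sketch (not the project's actual implementation) that collapses a set of IPv4 addresses into a common /24:

import java.util.Arrays;
import java.util.List;

public class SubnetReduce {
  // Returns the shared /24 in CIDR notation, or null if the addresses do not
  // all fall within the same /24.
  static String commonSlash24(List<String> addresses) {
    String prefix = null;
    for (String addr : addresses) {
      String p = addr.substring(0, addr.lastIndexOf('.')); // first three octets
      if (prefix == null) {
        prefix = p;
      } else if (!prefix.equals(p)) {
        return null;
      }
    }
    return prefix == null ? null : prefix + ".0/24";
  }

  public static void main(String[] args) {
    // Both addresses share 192.0.2.0/24, so the reduction emits that subnet.
    System.out.println(commonSlash24(Arrays.asList("192.0.2.10", "192.0.2.77")));
  }
}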

Write integration tests between the pipeline and cloud functions in contrib/

We should write tooling to support integration tests between the cloud functions in contrib/ and the pipeline code.

Some examples:

  • Test that Guardduty findings get processed through Gatekeeper, get sent to bugzilla-alert-manager, and create a new bug
  • Test that a new auth event for an unknown IP gets an alert from Authprofile and gets sent to slackbot-background

There is currently no emulator tool for Cloud Functions written in Go, but these functions are written as libraries, so wiring up our own runner for them should be easy enough (this is also how some of the unit tests for the cloud functions work).

Add EC2 instance name (app tag) whitelist support to Gatekeeper Pipeline

So far GuardDuty has been pretty good at classifying ALL network connection type findings (brute force attempts, port scans, etc.) as low severity.

However, it generates many high severity "DNS REQUEST" type findings for EC2 instances querying bad domains (crypto, phishing, malware, etc.).

These are NOT false positives, but in most cases we don't care about them because they are expected given the nature of the running host.

For example: we run a ton of web crawlers to assess the state of the web in general; these hit those bad domains all the time, and we don't care.

We should have a way of suppressing these types of alerts for these EC2 instances. That should not be hard given that the EC2 instance tags are contained in the finding object.

CC: @ameihm0912 @ajvb

Update nginx parser to support parsing raw nginx logs

The parser currently only supports processing nginx log data in the form of a Stackdriver jsonPayload entry. This should be expanded to also support raw nginx log lines (either in textPayload or on their own).

Include GCP Project Name and AWS Account Name in Gatekeeper alerts

It would be nice to include the AWS account name / the GCP project name in the Gatekeeper alerts for easier triaging.

For AWS, the IdentityManager can be used to translate the account id to account name:

// Resolve the recipient AWS account ID to its configured account name and
// attach it to the alert.
IdentityManager mgr = state.getParser().getIdentityManager();
if (mgr != null) {
  String resId = mgr.lookupAlias(getUser());
  if (resId != null) {
    n.setSubjectUserIdentity(resId);
  }
  Map<String, String> m = mgr.getAwsAccountMap();
  String accountName = m.get(event.getRecipientAccountId());
  if (accountName != null) {
    n.setObject(accountName);
  }
}

For GCP, projects.get (or similar) can be used during alert creation to grab the project's metadata.
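
A sketch of the GCP side, assuming the google-cloud-resourcemanager v3 client is available (error handling elided, project ID is a placeholder):

import com.google.cloud.resourcemanager.v3.Project;
import com.google.cloud.resourcemanager.v3.ProjectName;
import com.google.cloud.resourcemanager.v3.ProjectsClient;

// Resolve a project ID to its human-readable display name during alert
// creation.
try (ProjectsClient client = ProjectsClient.create()) {
  Project project = client.getProject(ProjectName.of("my-gcp-dataflow-project"));
  String displayName = project.getDisplayName();
}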

Support fan out push in IprepdIO

IprepdIO currently only supports submission to a single instance.

This should be modified so that multiple iprepd instances can be configured, and the transforms will publish to all configured instances.
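
A rough sketch of the proposed shape (the per-instance submission helper is hypothetical):

import java.util.Arrays;
import java.util.List;

// Publish each violation to every configured iprepd instance instead of one.
List<String> instances =
    Arrays.asList("https://iprepd-a.example.com", "https://iprepd-b.example.com");
for (String url : instances) {
  submit(url, violation); // hypothetical per-instance submission helper
}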

Include local time zone in AuthProfile alerts

In addition to the UTC timestamp, also include the localized timestamp based on the source address, both in the alert metadata and in the alert payload (e.g., render it in the template and Slack notifications).
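
A minimal sketch of the localization step using java.time; the zone ID would come from a GeoIP lookup of the source address (assumed here):

import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

Instant utc = Instant.parse("2020-02-12T17:41:44Z");
ZoneId sourceZone = ZoneId.of("America/Los_Angeles"); // assumption: from GeoIP
ZonedDateTime local = utc.atZone(sourceZone); // 2020-02-12T09:41:44-08:00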

CODE_OF_CONDUCT.md file missing

As of January 1, 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:

  1. Required Text - All text under the headings Community Participation Guidelines and How to Report is required and should not be altered.
  2. Optional Text - The Project Specific Etiquette heading provides a space to speak more specifically about ways people can work effectively and inclusively together. Some examples can be found in the Firefox Debugger project and Common Voice. (The optional part is commented out in the raw template file and will not be visible until you modify and uncomment it.)

If you have any questions about this file, or Code of Conduct policies and procedures, please reach out to [email protected].

(Message COC001)

Reduce volume of KMS decryption calls during pipeline scaling

When pipelines scale up, new worker nodes are created and additional threads call setup in certain DoFns. When the number of new workers is large, this can result in exceptions being generated in the pipeline due to quota limits being hit.

 java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: com.google.api.gax.rpc.ResourceExhaustedException: io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Quota exceeded for quota metric 'cloudkms.googleapis.com/crypto_requests' and limit 'CryptoRequestsPerMinutePerProject' of service 'cloudkms.googleapis.com' for consumer

Although the pipeline generally recovers eventually, this causes errors to be propagated and likely slows down scaling responsiveness.
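
One possible mitigation would be a per-JVM cache so that concurrent DoFn setup calls decrypt each secret at most once; a minimal sketch, with the actual KMS call left as a placeholder:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SecretCache {
  private static final Map<String, String> cache = new ConcurrentHashMap<>();

  // Multiple threads asking for the same ciphertext share one decryption.
  static String get(String cipherText) {
    return cache.computeIfAbsent(cipherText, SecretCache::decrypt);
  }

  private static String decrypt(String cipherText) {
    // Placeholder: delegate to the existing RuntimeSecrets KMS decryption.
    return "decrypted:" + cipherText;
  }
}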

Implement escalation metadata and alert suppression in Gatekeeper pipeline

#174 adds a pipeline for monitoring output from ETD and GD.

The pipeline currently generates alerts, but it does not add any escalation metadata to the alerts that would result in special handling in AlertIO.

Example from the authentication pipeline:

private void addEscalationMetadata(Alert a) {
  if (critNotifyEmail != null) {
    log.info(
        "{}: adding direct email notification metadata route for critical object alert to {}",
        a.getAlertId().toString(),
        critNotifyEmail);
    a.addMetadata("notify_email_direct", critNotifyEmail);
  }
}

We will want to add a similar notification pipeline option to the gatekeeper pipeline and include this metadata when we generate an alert if the option is set in the configuration; if the option is not set, the pipeline will behave as it currently does.

We will also want to potentially look at hooking AlertSuppressor up to the end of the analysis transforms and suppressing possible repeated alerts to avoid generating a large number of escalation notifications under certain circumstances.

Create mock service using foxsec-pipeline-contrib to test interacting with cloud functions

https://github.com/mozilla-services/foxsec-pipeline-contrib contains a few Cloud Functions that it would be nice to include within the integration tests in this repo.

Since Cloud Functions written in Go are just libraries, we could create a simple mock service that simulates both data ingestion (Duopull/Auth0pull) and interaction with the pipeline (SlackbotBackground).

This would be especially useful for testing parsing and data model changes.

Normalized event tags should be an array

Currently events can be normalized into a single category (e.g., AUTH), but in some cases a single event may represent more than one category type.

The ideal scenario would be for the normalized category field to be a bitmask or an array of values, so it can represent more than one type.
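
A sketch of what this could look like with an EnumSet, which is internally backed by a bit vector (category names beyond AUTH are illustrative):

import java.util.EnumSet;

public class NormalizedTags {
  enum Category { AUTH, AUTH_SESSION, HTTP_REQUEST } // names beyond AUTH are illustrative

  public static void main(String[] args) {
    // A single event can now carry several normalized categories at once.
    EnumSet<Category> normalized = EnumSet.of(Category.AUTH, Category.AUTH_SESSION);
    System.out.println(normalized.contains(Category.AUTH)); // true
  }
}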

Alert post-processing pipeline

New pipeline, intended to:

  • Use standard composite input (read input topic containing Alert JSON strings)
  • Use standard composite output (e.g., BQ)

Initially it would be ideal for this to consume a topic containing Alert JSON strings, parse them into Alert objects, and push these objects into an analysis step.

The analysis step will look for metadata fields that, for example, match a given regex; where a match occurs, a new alert will be generated containing a metadata field with the ID of the source alert it matched on.

This new pipeline will also be useful for future correlation between pipeline alerting output.
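
A sketch of the analysis step as a Beam DoFn; Alert construction and accessors beyond getAlertId() and addMetadata() (both shown elsewhere in this document) are assumptions:

import java.util.regex.Pattern;
import org.apache.beam.sdk.transforms.DoFn;

class MetadataRegexAnalysis extends DoFn<Alert, Alert> {
  private static final Pattern p = Pattern.compile("^admin@.*"); // example regex

  @ProcessElement
  public void processElement(ProcessContext c) {
    Alert source = c.element();
    String value = source.getMetadataValue("username"); // assumed accessor
    if (value != null && p.matcher(value).matches()) {
      Alert out = new Alert();
      // Link the new alert back to the source alert it matched on.
      out.addMetadata("source_alert_id", source.getAlertId().toString());
      c.output(out);
    }
  }
}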

Create new encapsulation class to contain Alert and other object types

The output path of the pipelines is currently limited to ingestion of Alert objects.

This makes it difficult to persist other types of data from the pipeline that are not necessarily alerts but that we may want to write to BigQuery. This is discussed a bit in #320.

The intent would be to modify the output path to consume this new container event, which could contain an alert or some other type of object (like a source summary, etc.). Alert objects would be handled the way they currently are once the encapsulation is stripped, but this would provide a means to produce other types of pipeline output.

Persist ALL events somewhere (e.g. BigQuery)

We aren't persisting anything outside of the built-in persistence options of the data sources we read from.

Fulfilling this issue involves storing all generic events somewhere.
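
As a starting point, Beam's BigQueryIO could append the raw events; a sketch assuming the events have already been converted to TableRow upstream (rawEventRows), with a placeholder table name:

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;

// Append every raw event to a BigQuery table that is assumed to exist.
rawEventRows.apply(
    BigQueryIO.writeTableRows()
        .to("my-project:events.raw_events") // placeholder table
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));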

Improve exception filtering in HTTPRequest

The exception filtering currently filters any request of a given method and path; extend this so a request is only filtered if its status is 200 or >= 500, and 4xx errors are never filtered.
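
Expressed as a plain predicate (a sketch of the rule, not the existing code):

// Keep filtering successful and server-error responses; 4xx always passes
// through to analysis.
static boolean shouldFilter(int status) {
  return status == 200 || status >= 500;
}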

Make AlertMeta more strictly defined

Make AlertMeta more strictly defined, rather than the current free-form key strings everywhere.

Currently there are free-form key names ("magic strings") everywhere for metadata. This is also present throughout the cloud functions (https://github.com/mozilla-services/foxsec-pipeline/blob/master/contrib/bugzilla-alert-manager/manager.go#L121).

We should make these keys more strictly defined and do this in a way that can be tested and checked for both the Java and Go code.

Related to #182
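
One possible shape (a sketch, not the project's actual design) is a single enum that owns every metadata key string, so both the Java and Go sides can be checked against one list; notify_email_direct appears elsewhere in this document, the other key name is illustrative:

public enum AlertMetaKey {
  NOTIFY_EMAIL_DIRECT("notify_email_direct"),
  SOURCE_ALERT_ID("source_alert_id"); // illustrative

  private final String key;

  AlertMetaKey(String key) {
    this.key = key;
  }

  // The canonical string written into alert metadata.
  public String key() {
    return key;
  }
}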

Remove legacy whitelisted IP's state code

There is code in IprepdIO to support the transition to a new Datastore format for whitelisted objects, i.e.:

/** Legacy Kind for whitelisted IP entry in Datastore */
public static final String legacyWhitelistedIpKind = "whitelisted_ip";
/** Legacy Namespace for whitelisted IP in Datastore */
public static final String legacyWhitelistedIpNamespace = "whitelisted_ip";

This can be removed now.

Integrate code coverage tooling

It would be nice to be able to view metrics on how our code coverage is doing across our unit tests. A tool that supports multiple languages would help cover contrib, but making sure we can see the percentage coverage of the Java code is the more important piece.

Use event timestamps everywhere

Within our parsing logic, we make use of the pubsub timestamp rather than the parsed event's timestamp (Parser.stripStackdriverEncapsulation). This creates problems when old messages get backfilled into pubsub for whatever reason, and causes a disconnect between our tests and production.

Instead, we should make use of the parsed out event timestamp.
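
A sketch of assigning the parsed timestamp in a Beam DoFn; Event.getTimestamp() returning a Joda Instant is an assumption:

import org.apache.beam.sdk.transforms.DoFn;

class AssignEventTimestamp extends DoFn<Event, Event> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    Event e = c.element();
    // Emit the element at the event's own time, not the pubsub delivery time.
    c.outputWithTimestamp(e, e.getTimestamp());
  }
}

Note that emitting with a timestamp earlier than the element's current one may also require overriding getAllowedTimestampSkew() on the DoFn.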

Persist Guardduty alerts to either Bigquery or GCS

As we discussed, it would be helpful to persist all Guardduty alerts in their raw JSON to either a separate BigQuery table or to GCS.

The goal is to be able to dig into alerts easily during an investigation.

Replace default trigger in Customs

#289 modifies the windowing strategy used in Customs, but makes use of a default trigger.

This should be further enhanced with early pane firings, which could potentially include suppressing the on-time pane if needed to prevent duplicates.
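
For reference, an early-firing configuration in Beam looks roughly like this; events is an assumed PCollection<Event>, and the window and delay sizes are arbitrary placeholders, not the values Customs should use:

import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

// Fire an early pane 30 seconds after the first element in a pane, then the
// on-time pane at the watermark.
events.apply(
    Window.<Event>into(FixedWindows.of(Duration.standardMinutes(10)))
        .triggering(
            AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(
                    AfterProcessingTime.pastFirstElementInPane()
                        .plusDelayOf(Duration.standardSeconds(30))))
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes());

Whether panes discard or accumulate, and how the on-time pane is suppressed, is the part that needs design attention here.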

Improve Gatekeeper's GuardDuty Finding metadata

Currently, we are only tagging gatekeeper alerts with the base metadata on guardduty findings. Some finding types contain a lot more relevant data.

For example, the ssh brute force and recon finding types contain an "Actor" object, which includes the bad actor's IP, location (latitude + longitude), ISP, etc.

This issue encompasses adding metadata to alerts on a per-finding-type basis.
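
A sketch of pulling actor details, assuming the AWS SDK GuardDuty model classes and a Finding named finding already in hand:

import com.amazonaws.services.guardduty.model.Finding;
import com.amazonaws.services.guardduty.model.RemoteIpDetails;

// For network-connection findings (e.g., ssh brute force, recon), the actor's
// details hang off the finding's network connection action.
RemoteIpDetails actor =
    finding.getService().getAction().getNetworkConnectionAction().getRemoteIpDetails();
String ip = actor.getIpAddressV4();
Double lat = actor.getGeoLocation().getLat();
Double lon = actor.getGeoLocation().getLon();
String isp = actor.getOrganization().getIsp();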

AuthProfile should use email templates in all circumstances

The current version of the AuthProfile pipeline only uses email templates for alerts about a new source, but does not use them for informational alerts.

To be consistent, it would be good to support this across the board in the pipeline, but it would require an additional template.

See also #23
