cisagov / admiral

Distributed certificate transparency log harvester

License: Creative Commons Zero v1.0 Universal

JavaScript 0.32% Python 81.83% Shell 17.85%
certificate-transparency celery cisa-directives

admiral's Introduction

admiral 👩‍✈️🚢⚓️

This project implements a distributed certificate transparency log harvester.

Requirements

This project requires a Docker installation.

Installation and Execution

  • Build the docker image:
    • docker-compose build
  • Change the credentials in the following configuration files:
    • secrets/admiral.yml
    • secrets/redis.conf
    • docker-compose.yml
  • Start the composition:
    • docker-compose up
    • Alternatively, start it in swarm mode: docker stack deploy admiral --compose-file docker-compose.yml
  • Monitor the system:
  • Optional: Run the code tests
    • docker-compose -f docker-compose-dev.yml run test

Development and Debugging

A separate docker-compose-dev.yml file is provided to support development and testing. Using this composition, a container can be started in a few different modes:

To start up an IPython session with a configured Celery app:

docker-compose -f docker-compose-dev.yml run celery-shell

To start up a development container with a bash shell:

docker-compose -f docker-compose-dev.yml run bash

To run all unit and system tests:

docker-compose -f docker-compose-dev.yml run test

Additional arguments can be passed to pytest when creating the container:

docker-compose -f docker-compose-dev.yml run test -vs tests/scan_test.py

To access a mongo shell:

docker-compose exec mongo mongo admin -u root -p

To get a shell in a stopped or crashed container:

docker run -it --rm --entrypoint=sh admiral

To protect against inadvertent commit of secrets to the repository:

git update-index --assume-unchanged secrets/*

Monitoring

The following web services are started for monitoring the underlying components:

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

License

This project is in the worldwide public domain.

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

admiral's People

Contributors

arcsector, dav3r, dependabot[bot], felddy, hillaryj, jasonodoom, jmorrowomni, jsf9k, king-alexander, mcdonnnj

admiral's Issues

Implement functional testing

🐛 Summary

Once #8 is merged, a large number of the tests defined in this repository will be disabled. They were written to run inside a running Docker orchestration rather than as stand-alone package tests, and given how our GitHub Actions testing is written, that setup is not readily implementable for this project. It is also desirable to have stand-alone testing for the Python package if at all possible.

Expected behavior

Running our GitHub Actions workflow, or running pytest locally, results in all defined tests passing.

Implementation notes

While trying to get the tests running as-is, I first added pyfakefs to insert an appropriate configuration file at the default location (/run/secrets/admiral.yml). Once that worked, I ran into the next problem: the tests need a running redis server.
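
For comparison, the configuration step can be sketched with the standard library alone. This swaps in unittest.mock.mock_open in place of pyfakefs, and load_config and the file content are hypothetical stand-ins, not admiral's actual loader or schema.

```python
from unittest import mock

DEFAULT_CONFIG_PATH = "/run/secrets/admiral.yml"  # default location named in this issue

# Placeholder content; the real admiral.yml schema is not shown here.
FAKE_CONFIG = "placeholder: value\n"


def load_config(path=DEFAULT_CONFIG_PATH):
    """Hypothetical loader: return the raw text of the configuration file."""
    with open(path, encoding="utf-8") as config_file:
        return config_file.read()


def read_config_with_fake_file():
    """Read the config while open() is patched, so no real file is needed."""
    fake_open = mock.mock_open(read_data=FAKE_CONFIG)
    with mock.patch("builtins.open", fake_open):
        text = load_config()
    # Confirm the code under test asked for the default path.
    fake_open.assert_called_once_with(DEFAULT_CONFIG_PATH, encoding="utf-8")
    return text
```

This only covers the configuration file. The redis dependency would still need its own stand-in.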

Set up the admiral on one of the production CyHy instances

Summary

CISA operates a distributed certificate transparency log harvester (Admiral) to provide service in alignment with ED 19-01. This ticket tracks the need to host this scanner on a production CyHy instance.

Motivation and context

During this project's standup, given timeline impacts, the Admiral was initially run on a host machine. Since the original design and solution, the number of customers signed up has increased dramatically. Setting up the Admiral on a production CyHy instance will align it with the rest of the system design and allow the scan to run to completion on the allocated CyHy instance.

certificates.csv not showing all recent leaf certs from crt.sh

🐛 Summary

Looks like crt.sh has a more recent leaf cert for precision.fda.gov than the HHS CyHy certificates.csv is reporting. Does the code perhaps need to be updated or is it broken somewhere? I would understand it not being reported if it were only a pre-certificate, but it is a newer leaf certificate.

To reproduce

  1. Search "precision" in Column B (Subjects) of the certificates.csv attached within HHS' latest CyHy report (1/10/21)
  2. Note that it only lists one row, an older precision.fda.gov cert logged 2018-12-11 and valid 2018-12-11 to 2021-01-04 (now expired)
  3. Browse to https://crt.sh/
  4. Type "precision.fda.gov" and hit enter
  5. Click the top crt.sh ID hyperlink (3773461515) to confirm it is a leaf certificate, logged over a week ago (2020-12-13), and is currently valid (2020-12-10 to 2021-12-10) for the hostname in question (precision.fda.gov)

Expected behavior

While the HHS certificates.csv may continue to list the older cert, it should have picked up the newer leaf certificate within 2 Mondays of being logged (since we only pull certs once per week, typically Friday night-Saturday)
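
The timing in this report can be checked with a little date arithmetic. The sketch below assumes "within 2 Mondays" means the second Monday strictly after the log date, which is one reading of the sentence above and not a documented rule. The dates come from the reproduction steps.

```python
from datetime import date, timedelta


def second_monday_after(logged: date) -> date:
    """Return the second Monday strictly after the given date."""
    # date.weekday() is 0 for Monday, so this is the gap to the next Monday.
    days_to_monday = (7 - logged.weekday()) % 7 or 7
    return logged + timedelta(days=days_to_monday + 7)


logged_on = date(2020, 12, 13)   # crt.sh log date from step 5
report_date = date(2021, 1, 10)  # HHS report date from step 1

deadline = second_monday_after(logged_on)
print(deadline)                 # 2020-12-21
print(report_date >= deadline)  # True
```

Under that reading, the report date falls well after the deadline, so the newer certificate should already have been listed.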

Upgrade to Celery 5.0

💡 Summary

Celery 4.x is no longer supported. We need to use Celery 5.0, the latest major release.

Motivation and context

Breaking changes to the CLI interrupted certificate transparency log monitoring. To avoid this in the future, we need to keep Celery up-to-date and alter our implementation as necessary.

Implementation notes

Steps to upgrade from Celery 4.x are located here.

Crashing on save

🐛 Summary

The admiral crashes when it attempts to save certificate records to the database.

To reproduce

Steps to reproduce the behavior:

  1. Run the load_certs script without the --dry-run option (to save records to the database)

Expected behavior

I expected certificate records to save successfully.

Censys Search API

💡 Summary

Explore the viability of using the Censys Search API for collecting certificate data.

Motivation and context

Emergency Directive 19-01 requires CISA to monitor Certificate Transparency logs and report new certificates issued to agency domains.

Implementation notes

There are two tasks involved:

  1. Fetch all certificate fingerprints associated with an agency domain.
  2. Fetch the certificate data from a certificate fingerprint.
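
The two tasks above could fit together roughly as follows. This is a sketch only: the CertificateSource interface and its method names are hypothetical and are not Censys API calls, and InMemorySource exists purely so the flow can be exercised without network access.

```python
from typing import Dict, Iterable, List, Protocol


class CertificateSource(Protocol):
    """Hypothetical interface; these are not Censys API method names."""

    def fingerprints_for_domain(self, domain: str) -> Iterable[str]: ...

    def certificate_for_fingerprint(self, fingerprint: str) -> dict: ...


def discover_new_certificates(source, domain, known_fingerprints):
    """Return certificate records for fingerprints we have not stored yet."""
    new_records = []
    seen = set(known_fingerprints)
    # Task 1: list every fingerprint associated with the domain.
    for fingerprint in source.fingerprints_for_domain(domain):
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Task 2: fetch the certificate data for each unseen fingerprint.
        new_records.append(source.certificate_for_fingerprint(fingerprint))
    return new_records


class InMemorySource:
    """Stand-in for a real client, for illustration and testing only."""

    def __init__(self, by_domain: Dict[str, List[str]]):
        self._by_domain = by_domain

    def fingerprints_for_domain(self, domain):
        return self._by_domain.get(domain, [])

    def certificate_for_fingerprint(self, fingerprint):
        return {"fingerprint": fingerprint}
```

Keeping the second task behind a fingerprint check means certificate data is fetched only for fingerprints that are not already stored.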

Acceptance criteria

  • Using the API, we can discover new certificates issued to agency domains.

Missing key min_cert_id in group_update_domain task results

💥 Regression Report

The min_cert_id is no longer present in the response from the group_update_domain task.

Last working version

Version didn't change

Worked up to date: 20200103 (last run)
Stopped working on date: 20200111 (today)

To Reproduce

Steps to reproduce the behavior:

  • Run the examples/load_certs.py as normal

Expected behavior

Certificates are loaded as usual.

Any helpful log output

Paste the results here:

Traceback (most recent call last):
  File "./load_certs.py", line 193, in <module>
    main()
  File "./load_certs.py", line 177, in main
    total_new_count += load_certs(domains, args["--skipto"], args["--verbose"])
  File "./load_certs.py", line 152, in load_certs
    new_count = group_update_domain(domain, EARLIEST_EXPIRED_DATE, verbose)
  File "./load_certs.py", line 105, in group_update_domain
    for log_id in get_new_log_ids(domain.domain, max_expired_date, verbose):
  File "./load_certs.py", line 67, in get_new_log_ids
    log_id = i["min_cert_id"]
KeyError: 'min_cert_id'
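
The traceback shows the script indexing each result with i["min_cert_id"]. One defensive option, sketched below, is to separate entries that lack the key so the run can continue and the odd entries can be inspected. The function name and the shape of the entries beyond min_cert_id are assumptions, and whether skipping is appropriate depends on why the key disappeared.

```python
def extract_log_ids(results):
    """Split result entries into usable log IDs and entries missing the key."""
    log_ids = []
    malformed = []
    for entry in results:
        # dict.get returns None instead of raising KeyError.
        log_id = entry.get("min_cert_id")
        if log_id is None:
            malformed.append(entry)
        else:
            log_ids.append(log_id)
    return log_ids, malformed
```

Logging the malformed entries would also reveal what the response now contains in place of min_cert_id.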

Update CT Search requests

💡 Summary

As of January 3rd, 2023, the CT Search API includes a new field to retrieve DER-encoded certificates. We need to update the API call to use this new field.

Motivation and context

The old method to retrieve DER-encoded certificates still works, for now, but it would be prudent to update our code before it begins to break.

Implementation notes

  • Full details of the changes made to the API are available in SSLMate's changelog

Acceptance criteria

  • CT Search requests retrieve all relevant data using only non-deprecated fields

HTTP 429: Too Many Requests

🐛 Summary

The load_certs.py script crashes after receiving multiple HTTP 429 errors.

To reproduce

Steps to reproduce the behavior:

  1. Execute ./load_certs.py. The script will crash after processing approximately 70 domains.

Any helpful log output or screenshots

Screen Shot 2022-06-16 at 3 12 41 PM
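
A common way to survive HTTP 429 responses is to retry with exponential backoff and honor the Retry-After header when the server sends one. The sketch below is not taken from load_certs.py: the request callable and its (status_code, headers, body) return shape are hypothetical, and only the seconds form of Retry-After is handled.

```python
import time


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() until it stops returning 429 or attempts run out.

    request is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = request()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping before giving up
        # Prefer the server's Retry-After hint (seconds form); otherwise
        # double the delay on every attempt.
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** attempt
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the function testable without real waiting.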

Restart CT scans

🐛 Summary

Certificate Transparency (CT) scans stopped running sometime last June. As a result, the section of VS reports that should contain CT logs is instead empty. This issue documents the need to investigate why scans stopped, and will be resolved by changes that get scans running again.

Expedite queries for large result sets

💡 Summary

Admiral struggles with domains that have large result sets (~1000+ certificates). Processing typically takes multiple hours, and will occasionally hang. We should refactor to improve performance.

Motivation and context

This improvement will make Admiral more robust, which in turn will make it easier for agencies to comply with ED 19-01.

Implementation notes

The crt.sh interface was not designed to handle large result sets. Yet we want to process these troublesome domains in roughly the same amount of time it takes to process the others. One path forward might be implementing CeRTSearcH as a task. But we should conduct more research on potential solutions first.

Upgrade MongoEngine

💡 Summary

Update tests to use the mongo_client_class connection method introduced in MongoEngine 0.27.0.

Motivation and context

The syntax change in MongoEngine 0.27.0 breaks tests in this repository. We pinned a lower version in #49. But ideally, we want to use the latest stable version.

Implementation notes

  • Syntax examples are available in the MongoEngine guide
  • A draft of this work is available in #47

Acceptance criteria

How do we know when this work is done?

  • This repository uses MongoEngine >= 0.27.0
  • All existing tests pass

Make task rate limit a configuration option

💡 Summary

The Certificate Search API offers 3 pricing tiers: small, medium, and large. The task rate limit should be configurable to our level of access.

Motivation and context

This would be useful because we want the ability to process our full list of domains in a week or less.

Acceptance criteria

  • I can configure admiral to work on each pricing tier with the appropriate rate limit in place
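
One possible shape for this option is a lookup from tier name to rate limit, with an explicit override. Everything in the sketch is an assumption: the config keys "tier" and "rate_limit" are hypothetical, and the numbers are placeholders, since the actual per-tier limits are not stated in this issue. The strings follow the "N/m" form that Celery accepts for a task's rate_limit option.

```python
# Placeholder values only: the real per-tier limits are not stated in this
# issue and must come from the provider's documentation.
TIER_RATE_LIMITS = {
    "small": "10/m",
    "medium": "50/m",
    "large": "100/m",
}


def rate_limit_for(config):
    """Pick the task rate limit from a config mapping.

    An explicit "rate_limit" entry wins; otherwise the "tier" entry is
    looked up in TIER_RATE_LIMITS. Both key names are hypothetical.
    """
    if "rate_limit" in config:
        return config["rate_limit"]
    tier = config.get("tier", "small")
    try:
        return TIER_RATE_LIMITS[tier]
    except KeyError:
        raise ValueError(f"unknown pricing tier: {tier!r}") from None
```

Failing loudly on an unknown tier avoids silently running at the wrong rate.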

Optimize certificate queries

💡 Summary

Explore opportunities to optimize our Certificate Searches. For instance, is there a way to leverage both full-domain queries and single-hostname queries?

Motivation and context

This would be useful because the less time an admiral run takes, the better.

Implementation notes

  • It should be possible to collect and de-duplicate DNS names, then query those as single-hostnames under a domain.
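
The collect-and-de-duplicate step might look like the sketch below. The function name is hypothetical, and folding wildcard entries into their base name is a design choice made for illustration, not something this issue specifies.

```python
def hostnames_under_domain(dns_names, domain):
    """Return unique hostnames from dns_names that fall under domain."""
    domain = domain.lower().rstrip(".")
    unique = set()
    for name in dns_names:
        # Normalize case and any trailing dot so duplicates collapse.
        name = name.strip().lower().rstrip(".")
        # Wildcard entries such as *.example.gov are not queryable hostnames.
        if name.startswith("*."):
            name = name[2:]
        if name == domain or name.endswith("." + domain):
            unique.add(name)
    return sorted(unique)
```

Each returned name could then be submitted as a single-hostname query in place of one large full-domain query.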

Acceptance criteria

  • Completing an admiral run takes a considerably smaller amount of time when compared to the current baseline

Separate Docker components

💡 Summary

We need to create an admiral-docker repository.

Motivation and context

We want to decouple the Python and Docker components of this project. This repository descends from skeleton-python, so it makes sense to move the Docker bits into a Docker repository.

Implementation notes

The new repository should descend from skeleton-docker.

Acceptance criteria

  • There exists an admiral-docker repository that descends from skeleton-docker.
  • The Docker components found in this repository are present in admiral-docker.
