
aau-network-security / gollector


Application for retrieving and storing domain names from various sources

License: GNU General Public License v3.0

Go 98.71% Dockerfile 1.29%
certificate-transparency dns zone-files

gollector's Issues

Cache add

Add items to the cache:

  • when they are found in the DB
  • after they are written to the DB (could be done in the posthook)
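The two insertion points can be sketched as a single get-or-insert path. This is a minimal illustration, not gollector's code. `Cache`, `Store`, `memStore` and `GetOrInsert` are hypothetical names, and the toy in-memory store stands in for the real DB layer.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical stand-in for the DB layer.
type Store interface {
	Lookup(name string) (id int, ok bool)
	Insert(name string) (id int)
}

// Cache is a hypothetical, mutex-protected name -> ID cache.
type Cache struct {
	mu    sync.Mutex
	items map[string]int
}

func NewCache() *Cache { return &Cache{items: make(map[string]int)} }

func (c *Cache) Get(name string) (int, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	id, ok := c.items[name]
	return id, ok
}

func (c *Cache) Add(name string, id int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[name] = id
}

// memStore is a toy Store used only for this example; it counts DB lookups.
type memStore struct {
	rows    map[string]int
	next    int
	lookups int
}

func (s *memStore) Lookup(name string) (int, bool) {
	s.lookups++
	id, ok := s.rows[name]
	return id, ok
}

func (s *memStore) Insert(name string) int {
	s.next++
	s.rows[name] = s.next
	return s.next
}

// GetOrInsert returns the ID for name. It adds the item to the cache both
// when it is found in the DB and right after it is written to the DB
// (the spot where a post-hook could do the same).
func GetOrInsert(c *Cache, s Store, name string) int {
	if id, ok := c.Get(name); ok {
		return id
	}
	if id, ok := s.Lookup(name); ok {
		c.Add(name, id) // case 1: found in the DB
		return id
	}
	id := s.Insert(name)
	c.Add(name, id) // case 2: just written to the DB
	return id
}

func main() {
	c := NewCache()
	s := &memStore{rows: map[string]int{"example.com": 7}, next: 7}
	fmt.Println(GetOrInsert(c, s, "example.com")) // 7, found in the DB
	fmt.Println(GetOrInsert(c, s, "example.org")) // 8, inserted
	fmt.Println(GetOrInsert(c, s, "example.com")) // 7, served from the cache
	fmt.Println(s.lookups)                        // 2
}
```

Because the item is cached in both branches, the third call never reaches the DB.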

Separate cache from other processes

With high-performance IPC between a process and a cache container (high throughput: 100,000 messages per second or more; low latency: well under a millisecond), it would be possible to run multiple measurements simultaneously.

Current situation

+------------------+           +------------------+
|                  |           |                  |
|     Process 1    |           |     Process 2    |
|                  |           |                  |
|  +------------+  |           |  +------------+  |
|  |            |  |           |  |            |  |
|  |   Cache    |  |           |  |    Cache   |  |
|  |            |  |           |  |            |  |
+--+------+-----+--+           +--+-----+------+--+
          |                             |
          |                             |
          |                             |
          |                             |
          |                             |
          +--------------+--------------+
                         |
                 +-------+--------+
                 |                |
                 |   Persistent   |
                 |                |
                 +----------------+

Suggested alternative

+------------------+           +------------------+
|                  |           |                  |
|     Process 1    |           |     Process 2    |
|                  |           |                  |
+-------+----------+           +---------+--------+
        |                                |
        +--------------+    +------------+
                       |    |    High-performance IPC
                       |    |
                 +-----+----+-----+
                 |                |
                 |     Cache      |
                 |                |
                 +-------+--------+
                         |
                         |  Asynchronous, but reliable
                 +-------+--------+
                 |                |
                 |   Persistent   |
                 |                |
                 +----------------+

Improve error reporting

Error messages are currently somewhat difficult to debug. Proposed improvements:

  • Wrap errors so that printed errors are clearer
  • Report errors to Sentry/Rollbar and send emails when errors occur

Mount volume in cache to persist issued certificates

When running the cache over TLS, the certmagic library automatically obtains certificates. The issued certificates are stored on disk, but because we currently do not mount a volume to persist them, they disappear whenever the container is closed, and a new certificate is issued on every restart. As a result, we may hit Let's Encrypt's rate limits, which would lock us out of running on TLS for a few days.

The docker-compose config should mount a volume at the location where certmagic stores the certificates.

DB query

Skip querying the DB if the cache size limit has not been reached.
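The idea behind this issue can be sketched under one interpretation. If every row in the DB has also passed through the cache, and the cache has never evicted anything, then a cache miss means the key is not in the DB, so the query can be skipped. `BoundedCache` and `NeedDBLookup` are hypothetical names. FIFO eviction is an arbitrary choice for the example. Tracking "has evicted" is slightly more precise than comparing the current size with the limit.

```go
package main

import "fmt"

// BoundedCache is a hypothetical cache with a fixed capacity that records
// whether it has ever evicted an entry.
type BoundedCache struct {
	capacity int
	items    map[string]int
	order    []string // insertion order, used for FIFO eviction
	evicted  bool
}

func NewBoundedCache(capacity int) *BoundedCache {
	if capacity < 1 {
		capacity = 1
	}
	return &BoundedCache{capacity: capacity, items: make(map[string]int)}
}

// Add inserts or updates a key, evicting the oldest entry when full.
func (c *BoundedCache) Add(key string, v int) {
	if _, ok := c.items[key]; ok {
		c.items[key] = v
		return
	}
	if len(c.items) >= c.capacity {
		oldest := c.order[0]
		c.order = c.order[1:]
		delete(c.items, oldest)
		c.evicted = true
	}
	c.items[key] = v
	c.order = append(c.order, key)
}

// NeedDBLookup reports whether a lookup must go to the DB.
// Assumption: every DB row has also been added to this cache, so while
// nothing has been evicted a miss means the key is not in the DB.
func (c *BoundedCache) NeedDBLookup(key string) bool {
	if _, ok := c.items[key]; ok {
		return false // cache hit: no DB query needed
	}
	return c.evicted
}

func main() {
	c := NewBoundedCache(2)
	c.Add("a.dk", 1)
	fmt.Println(c.NeedDBLookup("b.dk")) // false: nothing evicted yet
	c.Add("b.dk", 2)
	c.Add("c.dk", 3)                    // evicts a.dk
	fmt.Println(c.NeedDBLookup("a.dk")) // true: a.dk may still be in the DB
}
```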

Test FTP over SSH

I am not convinced the current implementation is correct. The only zone file accessible over FTP is the .com one.

Index Start Date

I think there is a problem with retrieving the last stored certificate from the DB. I suspect this function doesn't work correctly (https://github.com/aau-network-security/gollector/blob/master/app/ct/main.go#L135).

How ct should work: every time it runs, it should insert (n) certificates into the DB. The function linked above should get the last inserted certificate from the DB, so that the next (n) certificates are scanned starting from that one.

How ct works right now: the first time I run ct, it inserts the first 100 certificates. Running it again does not store the next 100 certificates in the DB.

The function linked above also returns an error in the tests:
https://github.com/aau-network-security/gollector/pull/39/checks#step:6:111
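The intended resume logic can be written as a small pure function, which is also easy to unit-test. This is a sketch, not the linked function. `nextRange` and its parameters are hypothetical. It assumes the DB stores the CT log index of each entry, and it follows RFC 6962 `get-entries`, where `start` and `end` are both 0-based and inclusive.

```go
package main

import "fmt"

// nextRange returns the inclusive [start, end] index range of the next batch
// of n CT log entries to fetch. lastIndex is the highest log index already
// stored in the DB; hasLast is false when the DB holds no entries yet.
// treeSize caps the range at the log's current size.
func nextRange(lastIndex int64, hasLast bool, n, treeSize int64) (start, end int64, ok bool) {
	start = 0
	if hasLast {
		start = lastIndex + 1 // resume after the last stored entry, not at it
	}
	if n <= 0 || start >= treeSize {
		return 0, 0, false // nothing new to fetch
	}
	end = start + n - 1
	if end > treeSize-1 {
		end = treeSize - 1
	}
	return start, end, true
}

func main() {
	// First run: empty DB, fetch entries 0..99.
	fmt.Println(nextRange(0, false, 100, 1000)) // 0 99 true
	// Second run: last stored index is 99, so continue with 100..199.
	fmt.Println(nextRange(99, true, 100, 1000)) // 100 199 true
}
```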

Add status API call for cache

It might be useful to retrieve the current state of the entries held by the cache process via a gRPC call. Going further, a monitoring tool could read this state and visualize the growth in the number of entries over time.

Collectors

Make sure the CT log implementation works for all the collector components we have.

DB Libraries

Use a single library to interact with the DB. pq is probably the best one.

Better handling of configuration

docker-compose requires all environment variables to be set, even when building the application, which is unnecessary. We should move the environment variable check into the application itself, rather than doing it in docker-compose.

Add the notion of a "dataset"

We must be able to distinguish between multiple measurements from different vantage points, knowing exactly which data point belongs to which data set/vantage point.
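One possible data model tags every data point with the ID of the measurement it came from, and each measurement records its vantage point. `Measurement`, `DataPoint`, `byVantagePoint` and the vantage point labels are hypothetical. In the DB this would correspond to a foreign key from data rows to a measurements table.

```go
package main

import (
	"fmt"
	"time"
)

// Measurement is a hypothetical record describing one data set: a single
// measurement run from one vantage point.
type Measurement struct {
	ID           int
	VantagePoint string
	Start        time.Time
}

// DataPoint is a hypothetical observation tagged with the measurement it
// belongs to, so every row can be traced back to its data set.
type DataPoint struct {
	MeasurementID int
	Domain        string
}

// byVantagePoint groups data points by the vantage point of their measurement.
func byVantagePoint(ms []Measurement, dps []DataPoint) map[string][]string {
	vp := make(map[int]string, len(ms))
	for _, m := range ms {
		vp[m.ID] = m.VantagePoint
	}
	out := make(map[string][]string)
	for _, dp := range dps {
		key := vp[dp.MeasurementID]
		out[key] = append(out[key], dp.Domain)
	}
	return out
}

func main() {
	now := time.Now()
	ms := []Measurement{{1, "vantage-A", now}, {2, "vantage-B", now}}
	dps := []DataPoint{{1, "example.com"}, {2, "example.com"}, {1, "example.org"}}
	fmt.Println(byVantagePoint(ms, dps))
}
```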

Pass tests on GitHub Actions

A couple of significant requirements:

  • Some tests depend on live data sources (such as a zone file accessed via HTTP); these sources must either (1) be mocked, or (2) the tests must be skipped on GA.
  • Some tests require a running Postgres database, which should be provided by GA.
