jorgelbg / dashflare

๐Ÿ•ต๐Ÿผโ€โ™€๏ธ Open Source and privacy-focused analytics solution. ๐Ÿ“Š Advanced monitoring for your website behind Cloudflare

Home Page: https://jorgelbg.me/dashflare

License: Apache License 2.0

TypeScript 87.54% JavaScript 9.51% Makefile 1.79% Nix 1.16%
analytics cloudflare edge-worker google-analytics grafana grafana-loki hacktoberfest metrics privacy web-analytics wrangler

dashflare's Introduction

Hi there 👋, I'm Jorge - aka jorgelbg

  • 🔭 I'm currently working on dashflare
  • ✍🏻 I occasionally write about everything on my blog
  • 📊 Passionate about observability/monitoring topics
  • 🚀 Odd mix between SRE/DevOps 🤣

Connect with me:

Personal blog Twitter LinkedIn



dashflare's People

Contributors

dependabot[bot], jorgelbg, mre

dashflare's Issues

Forward the CF-Connecting-IP header from the original request

When forwarding a request (a Worker Site is a case in point), Cloudflare overwrites the original CF-Connecting-IP header, which breaks geolocation. The same thing happens to the Cf-Ipcountry header.

Cf-Ipcountry is less important because it is only used as a backup in case the geolocation feature is disabled.
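One possible workaround, sketched here under assumptions, is for the forwarding worker to copy both values into custom headers before making the subrequest, and for the receiving worker to prefer those copies. The header names `x-original-ip` and `x-original-country` and both function names are hypothetical; they are not part of dashflare.

```typescript
// Hypothetical header names used to carry the visitor's original values.
const ORIGINAL_IP_HEADER = "x-original-ip";
const ORIGINAL_COUNTRY_HEADER = "x-original-country";

// Forwarding side: copy CF-Connecting-IP and Cf-Ipcountry into custom headers
// so the values survive even if Cloudflare rewrites the originals.
function preserveClientHeaders(incoming: Headers): Headers {
  const outgoing = new Headers(incoming);
  const ip = incoming.get("cf-connecting-ip");
  const country = incoming.get("cf-ipcountry");
  if (ip !== null) outgoing.set(ORIGINAL_IP_HEADER, ip);
  if (country !== null) outgoing.set(ORIGINAL_COUNTRY_HEADER, country);
  return outgoing;
}

// Receiving side: prefer the preserved value, fall back to Cloudflare's header.
function clientIp(headers: Headers): string | null {
  const preserved = headers.get(ORIGINAL_IP_HEADER);
  return preserved !== null ? preserved : headers.get("cf-connecting-ip");
}
```

The fallback keeps the receiving worker usable for requests that arrive directly, without a forwarding worker in front of it.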

Geolocation stopped working

The geolocation feature of the edge worker is no longer working (since 2021-06-09, according to my logs). Apparently, the ipgeolocationapi.com service has been decommissioned (apilayer/geolocationapi#15).

We have two possible routes:

Compatibility with Grafana Cloud

Now that Grafana Cloud has a very generous free tier, deploying your own Loki instance is no longer needed.

To make this easier, we need to verify that the integration works as expected and include instructions on how to set it up. Perhaps Grafana Cloud should be the default path for testing.
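As a starting point for that verification, here is a minimal sketch of a push request against a hosted Loki endpoint. It assumes the Loki JSON push API (`/loki/api/v1/push`) and HTTP basic auth with a Loki user ID and an API key, which is how Grafana Cloud authenticates log pushes. The `LokiTarget` shape, the function name, and the example URL are illustrative and not taken from dashflare.

```typescript
interface LokiTarget {
  url: string;    // full push URL, e.g. "https://<your-host>/loki/api/v1/push"
  user: string;   // Loki user ID from the Grafana Cloud portal
  apiKey: string; // API key with permission to push logs
}

interface PushRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) a push request for a single log line.
function buildPushRequest(
  target: LokiTarget,
  labels: Record<string, string>,
  line: string,
  timestampMs: number
): PushRequest {
  // Loki expects the timestamp as a string of nanoseconds since the epoch.
  const timestampNs = `${timestampMs}000000`;
  return {
    url: target.url,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Basic " + btoa(`${target.user}:${target.apiKey}`),
    },
    body: JSON.stringify({ streams: [{ stream: labels, values: [[timestampNs, line]] }] }),
  };
}
```

Inside a worker, the result could be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.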

Improve test coverage of the label set

With the last couple of updates to the edge worker, I've noticed that it is easier to make mistakes in the set of labels sent to Loki than in the log line. We should cover the label set in our tests as an additional check.

Currently, we assert using the content of the entire body without paying attention to which part matches (labels or line). Since both of these use different formats we should have dedicated tests.
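A dedicated label test could look roughly like the sketch below. It assumes the body follows the Loki JSON push format (`streams[].stream` for labels, `streams[].values` for lines); the payload dashflare actually sends may differ, and `splitPayload` and `sameLabels` are hypothetical helper names.

```typescript
interface PushBody {
  streams: { stream: Record<string, string>; values: [string, string][] }[];
}

// Split a push body into its label set and its log lines so that each part
// can be asserted on separately.
function splitPayload(body: string): { labels: Record<string, string>; lines: string[] } {
  const parsed = JSON.parse(body) as PushBody;
  const first = parsed.streams[0];
  return { labels: first.stream, lines: first.values.map((entry) => entry[1]) };
}

// Compare two label sets regardless of key order.
function sameLabels(actual: Record<string, string>, expected: Record<string, string>): boolean {
  const actualKeys = Object.keys(actual).sort();
  const expectedKeys = Object.keys(expected).sort();
  if (actualKeys.length !== expectedKeys.length) return false;
  return actualKeys.every((key, i) => key === expectedKeys[i] && actual[key] === expected[key]);
}
```

With the body split this way, a test that checks the labels fails only when the label set changes, and a test that checks the lines fails only when the log line changes.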

Fix logfmt format of the payload sent to Loki

When using a query like:

{domain="jorgelbg.me"} | logfmt

We see some labels extracted by the logfmt processor that should not be there (America, AppleWebKit_537_36, etc.).

This happens because of a badly formatted line in the payload sent to Loki. The duplication seen in browser and browser_extracted is expected since we send duplicated data both as labels and as part of the payload. This was done in preparation for the support of multiple processors in Loki (at query time).
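One common way to end up with stray keys like these is to emit values that contain spaces or `=` without quoting them. The issue does not say which field was malformed, so the following is only a sketch of a quoting encoder, with `logfmtValue` and `toLogfmt` as illustrative names rather than dashflare's own functions.

```typescript
// Quote a value when it is empty or contains whitespace, '=' or '"',
// escaping backslashes, double quotes and newlines inside the quotes.
function logfmtValue(value: string): string {
  if (value === "" || /[\s="]/.test(value)) {
    const escaped = value
      .replace(/\\/g, "\\\\")
      .replace(/"/g, '\\"')
      .replace(/\n/g, "\\n");
    return `"${escaped}"`;
  }
  return value;
}

// Encode a flat record as a single logfmt line, keeping insertion order.
function toLogfmt(fields: Record<string, string>): string {
  return Object.keys(fields)
    .map((key) => `${key}=${logfmtValue(fields[key])}`)
    .join(" ");
}
```

For example, a user agent string containing spaces is emitted as one quoted value instead of being split into several bare tokens.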

General ideas for data collection and/or dashboards

With the collected data (and a couple of small additions to the worker), we should be able to add the following info to the dashboards:

  • Recent requests
  • Bytes fetched/sent from/to the origin
  • Top User Agents
  • % of requests per status code (the interesting ones, like 5xx)
  • Google bot requests (this could be a dedicated dashboard)

This doesn't mean that we will add all of those. It is just a general issue to track what may be useful to have, either as data collected by the worker or directly visible in the dashboard.

Remove mentions of ipinfo.io

PR #4 included (among other changes) the removal of ipinfo.io as a geolocation provider. We had several reasons for this:

  • We are able to get the country data even without any additional provider thanks to Cloudflare.
  • ipinfo.io required a subscription which meant one additional step for potential users.

With this change, the README needs to be updated.

TODO:

  • Remove the IPINFO local environment variable (.envrc.example)
  • Rename the IPINFO_TOKEN environment variable (cloudflare/wrangler) to IPINFO
  • Setup a toggle for the new geolocation provider

Expose the DEBUG_HEADERS configuration

Now that label cardinality is no longer an issue, since the actual labels sent to Loki are limited, we can expose the DEBUG_HEADERS configuration option, which allows logging all request/response headers.

The code has been there since the beginning but storing everything as labels had a big performance impact when querying. Also, it was quite common to hit the default maximum number of allowed labels by Loki.

  • Expose a DEBUG configuration to the worker settings.
  • Add a note to the README about this option.
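To illustrate the idea, the sketch below adds headers to the log line fields only when a debug flag is set, so they never become labels. The key prefix, the key normalization, and both function names are assumptions made for this example; the existing implementation in the worker may look different.

```typescript
// Turn header pairs into flat log fields such as "req_user_agent".
function headerFields(headers: Array<[string, string]>, prefix: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const [name, value] of headers) {
    const key = `${prefix}_${name.toLowerCase().replace(/[^a-z0-9]+/g, "_")}`;
    fields[key] = value;
  }
  return fields;
}

// Merge the header fields into the log line fields only when debugging is on.
function withDebugHeaders(
  base: Record<string, string>,
  requestHeaders: Array<[string, string]>,
  debug: boolean
): Record<string, string> {
  if (!debug) return base;
  const merged: Record<string, string> = {};
  for (const key of Object.keys(base)) merged[key] = base[key];
  const extra = headerFields(requestHeaders, "req");
  for (const key of Object.keys(extra)) merged[key] = extra[key];
  return merged;
}
```

Keeping the headers in the log line rather than in the labels avoids the label limit mentioned above, at the cost of larger payloads while debugging is enabled.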

Worker Consumption

This is more of a question that doesn't appear to be answered: does this worker run within the limits of the free tier, or does anyone have an idea of what the worker costs?

Thanks.

Filter empty values before pushing to the storage

In the data extraction step, it is common for many keys to have no value, like type, network, referer_domain, etc. We should filter out these keys to make matching at query time easier. For instance, this is a subset of the log payload sent to Loki:

hash="" query="" os="Mac OS" os_version=11.2.0 device_type=desktop country=DE 
type="" network="" client="" referer_domain="" duration=105 level=warn session=8b31b331c9a2ed67
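A minimal sketch of such a filter, assuming the extracted fields are held in a flat record before they are serialized (`dropEmpty` is a hypothetical name):

```typescript
// Return a copy of the fields without keys whose value is empty or undefined.
function dropEmpty(fields: Record<string, string | undefined>): Record<string, string> {
  const result: Record<string, string> = {};
  for (const key of Object.keys(fields)) {
    const value = fields[key];
    if (value !== undefined && value !== "") {
      result[key] = value;
    }
  }
  return result;
}
```

Applied to the sample above, only os, os_version, device_type, country, duration, level and session would remain in the payload.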

Remove duplicated information from the labels and update queries

We started using Loki very early, before the json and logfmt processors were introduced, so we put information in the labels that shouldn't be there (due to cardinality concerns). This worked for small to medium websites but could have caused issues on a larger setup. As a consequence, when we use the logfmt processor we get duplicated labels (browser and browser_extracted, for instance).

For a proper cleanup we also need to:

  • Remove the duplicated information from the labels (keep only the useful and low cardinality ones)
  • Update the queries used in the dashboard

Integration with Worker Sites

Since a request can only go through a single Cloudflare worker, it is not possible to run dashflare transparently alongside another worker. This has become more important since the release of Worker Sites, which allow running a full website from the worker.

For this case, #7 introduced the possibility of forwarding the URL directly to the worker. This PR makes the worker stop behaving as a transparent proxy and just analyze the URL received via the x-original-url header.

Still pending:

  • Document how to set up dashflare with a Worker Site
  • Maybe publish a small npm module to do the forwarding
  • Allow forwarding a status code as well; it is currently hardcoded to 200
  • Forwarding of the origin fetch time

See https://github.com/mre/endler.dev/blob/master/workers-site/index.js#L66-L73, which is the only working example of this setup.
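In broad strokes, the forwarding side builds a request to the dashflare worker and passes the visited URL in the x-original-url header. The sketch below only constructs the fetch arguments. The endpoint URL is a placeholder, the HTTP method is a guess, and `buildForwardRequest` is a hypothetical helper, not the unpublished npm module mentioned above.

```typescript
// Placeholder for wherever the dashflare worker is deployed.
const DASHFLARE_ENDPOINT = "https://dashflare.example.workers.dev/";

interface ForwardRequest {
  url: string;
  init: { method: string; headers: Record<string, string> };
}

// Build the arguments for a fetch() call that reports the visited URL.
function buildForwardRequest(originalUrl: string): ForwardRequest {
  return {
    url: DASHFLARE_ENDPOINT,
    init: {
      method: "GET", // assumption: the method dashflare expects is not stated here
      headers: { "x-original-url": originalUrl },
    },
  };
}
```

In a Worker Site, the call could be wrapped in `event.waitUntil(fetch(req.url, req.init))` so that reporting does not delay the response to the visitor.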

Country name missing from the geolocation data

The country_name attribute has been missing since we removed the duplicated info from the labels (#17). This attribute is needed to power the Top Countries panel, which now only shows the country code (DE, US, etc.).
