This content used to be in the README, but I want to break it out into an actionable ticket...
ELK - Elasticsearch, Logstash, Kibana
The ELK stack is a long-standing solution for logs and metrics.
Logstash has a well-established history of being deployed as an ETL pipeline.
LGTM/P - Loki, Grafana, Tempo, Mimir / Prometheus
The full Grafana stack requires a lot of operational experience. It effectively requires learning three new "databases"
for data that is largely the same. Loki is effectively a database for logs, Tempo is a database for traces, and
Mimir / Prometheus is a database for metrics. Each of these systems has its own resource usage and scaling requirements.
In addition, this is a partial solution as it does not cover the business intelligence side of the world. An additional
database can be added to support your business analytics, but doing so will only add to the complexity.
XOG - ?, OpenTelemetry, Grafana
Because OpenTelemetry is so flexible, why not pick one of the many other databases?
For a starter or simplified deployment, this is a great option. Leveraging an existing database technology may reduce
complexity today, but it will pose some interesting technical challenges later on. Importing data from an existing database
technology into a solution like ClickHouse should be relatively easy.
It would be really good to enumerate a handful of example references that we want to illustrate in the project. For example, I figured having some code samples for emitting metrics from applications would be useful (we can probably point people at the opentelemetry.io docs for this). Another example was how to write data enrichment pipelines that transform and enrich data within the ecosystem...
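To make the enrichment idea a little more concrete, here is a minimal sketch in plain Python of what such a reference could look like. Everything in it is an assumption for illustration: the record shape, the `SERVICE_OWNERS` lookup table, and the `add_owner`, `normalize_level`, and `pipeline` names are hypothetical and not part of this project. A real pipeline would more likely run inside whatever collector or processing layer we settle on.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]
Step = Callable[[Record], Record]

# Hypothetical lookup table; in practice this might come from a service catalog.
SERVICE_OWNERS = {"checkout": "payments-team", "search": "discovery-team"}


def add_owner(record: Record) -> Record:
    """Attach an owning team based on the record's service name."""
    owner = SERVICE_OWNERS.get(str(record.get("service", "")), "unknown")
    return {**record, "owner": owner}


def normalize_level(record: Record) -> Record:
    """Upper-case the log level so downstream queries match it consistently."""
    return {**record, "level": str(record.get("level", "info")).upper()}


def pipeline(records: Iterable[Record], steps: Iterable[Step]) -> Iterator[Record]:
    """Apply every step, in order, to each record."""
    steps = list(steps)
    for record in records:
        for step in steps:
            record = step(record)
        yield record


if __name__ == "__main__":
    raw = [{"service": "checkout", "level": "warn", "message": "slow response"}]
    for enriched in pipeline(raw, [add_owner, normalize_level]):
        print(enriched)
```

Each step returns a new record instead of mutating its input, which keeps the steps easy to test and reorder independently.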
Scope of work
enumerate a list of useful code references that we want to be able to include in the repo
Once this repository is public, we should look at porting the standalone Helm chart over from Mya's charts repository (https://github.com/mjpitz/mjpitz). I initially put it there so I could leverage my existing signing infrastructure after hitting issues with chart-releaser running on private repositories. I'd also like to iron out issues with the Bitnami chart so our chart repository contains more than one deployment option.
Also, once the chart is under the scope of this repository, we will be able to add capabilities such as automatically including the dashboards and alerts generated from this project.
The monitoring-mixin project under the Kubernetes space is a great tool for those looking to monitor and alert on events happening within the cluster. While this project is great, the queries that it uses are tied to the kube-prometheus-stack deployment.
The other thing I'm not a huge fan of with this project is jsonnet. Don't get me wrong, I thought it was really interesting at first, but every time I sit down to use it, I need to relearn it. Very few people out there have worked with it, which makes it harder to find contributors.
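As a rough illustration of what a jsonnet alternative could look like, here is a minimal sketch that builds a dashboard in plain Python and serializes it to JSON. It is not a proposal for the final tool. The panel structure is a simplified subset of a dashboard JSON document and has not been validated against any particular Grafana version, the `selector`, `panel`, and `pods_dashboard` helpers are hypothetical, and the `job` label value is an assumption. The point is only that the label selector is passed in as a parameter, so the query is not tied to one deployment's labels.

```python
import json
from typing import Dict, List


def selector(labels: Dict[str, str]) -> str:
    """Render a PromQL label selector such as {job="kube-state-metrics"}."""
    pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return "{" + pairs + "}"


def panel(title: str, expr: str) -> dict:
    """Build a simplified time series panel (a subset of the dashboard JSON)."""
    return {"type": "timeseries", "title": title, "targets": [{"expr": expr}]}


def pods_dashboard(labels: Dict[str, str]) -> dict:
    """Assemble a one-panel dashboard whose query uses the given labels."""
    sel = selector(labels)
    panels: List[dict] = [
        panel("Pods by phase", f"sum by (phase) (kube_pod_status_phase{sel})"),
    ]
    return {"title": "Pods", "panels": panels}


if __name__ == "__main__":
    # The job label here is an assumed value; deployments may name it differently.
    print(json.dumps(pods_dashboard({"job": "kube-state-metrics"}), indent=2))
```

A general-purpose language like this trades jsonnet's purpose-built merging features for something more contributors already know, which is the trade-off the research task should weigh.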
Scope of work (each should probably be its own task...)
research jsonnet alternatives
prototype using jsonnet alternative
triage dashboards from monitoring-mixin project
TODO: enumerate dashboards
triage alerts from monitoring-mixin project
TODO: enumerate alerts
research playbooks as a possible inclusion into this project