Hello,
I'm reporting an issue that was faced while using this plugin with Bamboo in production.
The problem unfolds as follows: Bamboo internally creates a new planKey for every new git branch and deletes those keys after a certain number of days. This means the number of observed planKeys is potentially unbounded and keeps growing when the system is used as intended.
As currently implemented, the bamboo-prometheus-exporter generates time series for all planKeys it has ever observed, with the bamboo_build_duration_* histogram leading at 14 time series per planKey. The fundamental issue is that these time series persist in the exporter output after the planKeys themselves have already been deleted by Bamboo. Time series for Prometheus to scrape therefore accumulate at the exporter endpoint over time, and the growing HTTP response increases the Prometheus scrape duration with every new branch created in Bamboo, eventually beyond any conceivable timeout.
Since time series only accumulate in memory while the exporter is running, restarting it temporarily reduces both the number of time series and the scrape duration.
One mitigation for the ever-increasing scrape time would be an LRU mechanism for the exported time series. However, that would not solve the problem of an unbounded number of time series being stored in Prometheus after the scrape.
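For what such an LRU mechanism could look like, here is a sketch using a plain `LinkedHashMap` in access order (the class name and the capacity of 500 are hypothetical, not part of the plugin):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU cap on exported series: once more than MAX_SERIES
// planKeys have been observed, the least recently updated planKey is
// dropped from the scrape output instead of accumulating forever.
public class LruSeriesCache {
    private static final int MAX_SERIES = 500; // hypothetical cap

    // accessOrder = true makes iteration order least-recently-accessed
    // first, so removeEldestEntry evicts the stalest planKey.
    private final Map<String, Long> series =
        new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > MAX_SERIES;
            }
        };

    public void observe(String planKey) {
        series.merge(planKey, 1L, Long::sum);
    }

    public int seriesCount() {
        return series.size(); // never exceeds MAX_SERIES
    }

    public boolean contains(String planKey) {
        return series.containsKey(planKey);
    }
}
```

This bounds the exporter's memory and response size, but as noted above, every planKey that was exported at least once still ends up as a series in Prometheus's storage.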
Any ideas on resolving this are appreciated, because this is a real problem that impacts the stability of any Prometheus instance connected to this plugin.