
SLA / service outage (ocaml-ci issue, closed)

hannesm commented on July 2, 2024
SLA / service outage

from ocaml-ci.

Comments (5)

hannesm commented on July 2, 2024

NB: just when I opened this issue, the analysis job started to make progress. So please ignore the first paragraph, while the second still holds.

rikusilvola commented on July 2, 2024

Hello @hannesm!

A public status page doesn't currently exist, though for significant outages we do post on the infra blog.

hannesm commented on July 2, 2024

Thanks for your comment @rikusilvola. Since yesterday afternoon, there has again been first an outage, and now temporary failures.

I'm still wondering what Service Level you intend to deliver. What counts as a "significant outage" that gets posted to the "infra blog"?

rikusilvola commented on July 2, 2024

Indeed, OCaml-CI experienced several minor outages in the past few days. With increased load, the service became unresponsive but was recovered within a couple of hours each time. Initial investigations point to Lwt starvation leading to the web interface getting stuck.
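
For readers unfamiliar with the term, starvation in a cooperative scheduler can be sketched roughly as follows. This is an analogy in Python's asyncio, not the OCaml-CI code and not Lwt itself; the names `heartbeat`, `greedy_job`, `worst_gap` and `measure`, as well as the durations, are hypothetical. The assumption is only that, as in any cooperative scheduler, a task that never yields prevents every other task on the same loop from running.

```python
import asyncio
import time


async def heartbeat(gaps, beats=5, interval=0.01):
    """Stand-in for a web handler: records the gap between successive wake-ups."""
    last = time.monotonic()
    for _ in range(beats):
        await asyncio.sleep(interval)  # yields to the event loop
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def greedy_job(duration, cooperative):
    """Stand-in for a heavy job sharing the event loop with the handler."""
    if cooperative:
        await asyncio.sleep(duration)  # waits without holding the loop
    else:
        time.sleep(duration)  # blocks the loop: no other task can run


async def worst_gap(cooperative, duration=0.2):
    """Run both tasks together and return the longest heartbeat gap seen."""
    gaps = []
    await asyncio.gather(heartbeat(gaps), greedy_job(duration, cooperative))
    return max(gaps)


def measure(cooperative):
    """Hypothetical helper: worst heartbeat delay, in seconds, for one run."""
    return asyncio.run(worst_gap(cooperative))
```

Under these assumed values, `measure(False)` should report a worst gap of about the blocking duration, while `measure(True)` should stay close to the heartbeat interval. From the outside, the first case looks like a web interface that has stopped responding.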

The services are provided with best-effort support, meaning that once an issue is noticed, it is treated during business hours according to its relative criticality. Most of the time, what is perceived as an outage is a reduced quality of service due to a temporary spike in activity. These outages are commonly transient, and the service is restored without human intervention.

Here are some examples of posts for significant outages

I welcome you to report any outage you experience on ocaml/infrastructure.

hannesm commented on July 2, 2024

Thanks for your reply. What I understand (please correct me if I'm wrong) is that "during office hours [unclear where], the service is maintained as we see fit [with some priority]". There's no SLA, and human intervention is required for restarting / restoring the service when there is a spike in activity.

"Most of the time, what is perceived as an outage is a reduced quality of service"

You mean the 500 (internal server error) responses I get at the moment are "reduced quality of service"?

In any case, thanks for providing the free service. I'll close my issues and hope you'll eventually find the time and energy to set up monitoring and improve reliability.
