
healthchecks's Introduction

(Deprecated) Health checks for building services

This project was an experimental, out-of-band library for ASP.NET Core, created to explore concepts and gather feedback.

It has been superseded by the official Health Checks components in ASP.NET Core 2.2 (GA around the end of 2018): https://github.com/aspnet/Diagnostics/tree/release/2.2/src

Check the ASP.NET Core 2.2 roadmap: aspnet/Announcements#307

healthchecks's People

Contributors: bradwilson, cesardelatorre, eilon, juergengutsch

healthchecks's Issues

Add overloads allowing CacheDuration to the AddUrlCheck, AddSqlCheck, etc. methods.

By default, the cache duration is set in code to 5 minutes, in
Microsoft.Extensions.HealthChecks\HealthCheckBuilder.cs:
DefaultCacheDuration = TimeSpan.FromMinutes(5);

But we need a way for the client to change that cache duration, i.e. per health check being added.
The higher-level methods in HealthCheckBuilderExtensions, such as the
AddUrlCheck() and AddSqlCheck() overloads, have no CacheDuration parameter.

Those overloads need to be added.

The lower-level AddCheck method already accepts a cacheDuration, however, so we just need to propagate it to the higher-level methods:

public static HealthCheckBuilder AddCheck(this HealthCheckBuilder builder, string name, Func check, TimeSpan cacheDuration)
{
    Guard.ArgumentNotNull(nameof(builder), builder);
    return builder.AddCheck(name, HealthCheck.FromCheck(check), cacheDuration);
}
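To illustrate the proposal, here is a minimal, self-contained model of the pattern: each convenience method gets an overload that takes a cacheDuration and forwards it to a single core registration method, while the existing overloads keep the 5-minute default. CheckRegistry, AddUrlCheck and AddSqlCheck below are hypothetical stand-ins, not the library's actual HealthCheckBuilder API, and the probes are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for HealthCheckBuilder: stores each check with its own cache duration.
public sealed class CheckRegistry
{
    // Mirrors the hard-coded default described in the issue.
    public static readonly TimeSpan DefaultCacheDuration = TimeSpan.FromMinutes(5);

    public Dictionary<string, (Func<CancellationToken, Task<bool>> Check, TimeSpan CacheDuration)> Checks { get; } =
        new Dictionary<string, (Func<CancellationToken, Task<bool>>, TimeSpan)>();

    // Core method: the only place that actually registers a check.
    public CheckRegistry AddCheck(string name, Func<CancellationToken, Task<bool>> check, TimeSpan cacheDuration)
    {
        if (name == null) throw new ArgumentNullException(nameof(name));
        if (check == null) throw new ArgumentNullException(nameof(check));
        Checks[name] = (check, cacheDuration);
        return this;
    }
}

public static class CheckRegistryExtensions
{
    // Existing-style overload: keeps the default duration.
    public static CheckRegistry AddUrlCheck(this CheckRegistry registry, string url) =>
        registry.AddUrlCheck(url, CheckRegistry.DefaultCacheDuration);

    // Proposed overload: the caller picks the cache duration for this check only.
    public static CheckRegistry AddUrlCheck(this CheckRegistry registry, string url, TimeSpan cacheDuration) =>
        registry.AddCheck($"UrlCheck({url})", _ => Task.FromResult(true) /* placeholder: a real check would GET the URL */, cacheDuration);

    public static CheckRegistry AddSqlCheck(this CheckRegistry registry, string name, string connectionString) =>
        registry.AddSqlCheck(name, connectionString, CheckRegistry.DefaultCacheDuration);

    public static CheckRegistry AddSqlCheck(this CheckRegistry registry, string name, string connectionString, TimeSpan cacheDuration) =>
        registry.AddCheck($"SqlCheck({name})", _ => Task.FromResult(true) /* placeholder: a real check would run SELECT 1 */, cacheDuration);
}
```

Keeping the defaulting in the short overloads means the core method never has to guess, and every check records the duration it was registered with.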

Enhance grouping feature

The group feature allows aggregating different health checks into one.
I would like to use HealthChecks both for application monitoring and as a circuit breaker for some features of my applications.
For example, I have an e-commerce website with a payment feature. The payment feature depends on a payment service, a payment-related certificate, and Azure Storage. I would create a payment group containing these three components.
That way, I can disable payment on the website whenever any payment-related component is unavailable.

My ops teams are responsible for monitoring the low-level components.
One team is responsible for certificate management, another for web service availability, and so on.

I would like to be able to have more than one group per IHealthCheck, for example the "web service" group and the "payment" group.
IHealthCheckService.CheckHealthAsync() may need a new parameter specifying which group to check, or to check all the checks as it does today.

@bradwilson @JuergenGutsch
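A rough sketch of multi-group membership, under the assumption that a check is just a Func<CancellationToken, Task<bool>>: each check carries a set of group names, and the check method takes an optional group filter where null means "run every check" as today. GroupedHealthService, Register and CheckHealthAsync(string group) are hypothetical names, not the existing IHealthCheckService API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public sealed class GroupedHealthService
{
    private readonly List<(string Name, HashSet<string> Groups, Func<CancellationToken, Task<bool>> Check)> _checks =
        new List<(string, HashSet<string>, Func<CancellationToken, Task<bool>>)>();

    // A check can belong to any number of groups, e.g. "payment" and "web service".
    public void Register(string name, Func<CancellationToken, Task<bool>> check, params string[] groups) =>
        _checks.Add((name, new HashSet<string>(groups, StringComparer.OrdinalIgnoreCase), check));

    // group == null keeps today's behaviour and runs every check.
    // Returns true (healthy) only if every selected check passes.
    public async Task<bool> CheckHealthAsync(string group = null, CancellationToken ct = default)
    {
        var selected = _checks.Where(c => group == null || c.Groups.Contains(group));
        foreach (var c in selected)
        {
            if (!await c.Check(ct).ConfigureAwait(false))
                return false;
        }
        return true;
    }
}
```

Used as a circuit breaker, the payment page could call CheckHealthAsync("payment") and hide the payment option while it returns false, while the ops teams query their own groups.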

Add check based on counters threshold

Counter-based checks are useful for detecting recurring transient failures.
For example:

try
{
  // do some stuff that can fail
  // ...
  counter.Reset();
}
catch
{
  counter.Increment();
  throw;
}

When the counter reaches the threshold, the check is unhealthy.
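A minimal thread-safe counter matching the try/catch pattern above; FailureCounter, Threshold and IsHealthy are hypothetical names. Because Reset() runs after every success, the counter effectively tracks consecutive failures, so it only trips when transient errors keep recurring without a recovery in between.

```csharp
using System;
using System.Threading;

// Hypothetical failure counter shared between application code and a health check.
public sealed class FailureCounter
{
    private int _consecutiveFailures;

    public FailureCounter(int threshold)
    {
        if (threshold <= 0) throw new ArgumentOutOfRangeException(nameof(threshold));
        Threshold = threshold;
    }

    public int Threshold { get; }

    public int Count => Volatile.Read(ref _consecutiveFailures);

    // Called from the catch block; safe to call from many threads.
    public void Increment() => Interlocked.Increment(ref _consecutiveFailures);

    // Called after a successful operation.
    public void Reset() => Interlocked.Exchange(ref _consecutiveFailures, 0);

    // The health check itself: unhealthy once the threshold is reached.
    public bool IsHealthy => Count < Threshold;
}
```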

SQL Connection resiliency by using SQL Connections with retries and exponential backoff

The current SQL Server check is pretty basic. I know it was already marked as pending improvement.
It currently looks like this:
https://github.com/aspnet/HealthChecks/blob/dev/src/Microsoft.Extensions.HealthChecks.SqlServer/HealthCheckBuilderSqlServerExtensions.cs

//TODO: There is probably a much better way to do this.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var command = connection.CreateCommand())
    {
        command.CommandType = CommandType.Text;
        command.CommandText = "SELECT 1";
        var result = (int)await command.ExecuteScalarAsync().ConfigureAwait(false);
        if (result == 1)
        {
            return HealthCheckResult.Healthy($"SqlCheck({name}): Healthy");
        }

        return HealthCheckResult.Unhealthy($"SqlCheck({name}): Unhealthy");
    }
}

Since this library will mostly be used in microservice and cloud environments, for instance with Azure SQL Database, SQL connections will often hit transient failures that a retry strategy should absorb.
Entity Framework Core makes it easy to implement retries with exponential backoff, so we might evolve the code above to use the approach I described on my blog:
https://blogs.msdn.microsoft.com/cesardelatorre/2017/03/26/using-resilient-entity-framework-core-sql-connections-and-transactions-retries-with-exponential-backoff/

Basically, with something like this:

options.UseSqlServer(Configuration["ConnectionString"],
    sqlServerOptionsAction: sqlOptions =>
    {
        sqlOptions.EnableRetryOnFailure(maxRetryCount: 5,
            maxRetryDelay: TimeSpan.FromSeconds(30),
            errorNumbersToAdd: null);
    });
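EnableRetryOnFailure only applies when the check goes through a DbContext. For the raw SqlConnection check, the same idea can be sketched with a small generic helper. Retry.WithExponentialBackoffAsync is hypothetical and is not EF Core's execution strategy; which exceptions count as transient (for example, specific SQL error numbers) is left to the caller's isTransient predicate, and jitter is omitted.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Retry
{
    // Runs `operation`, retrying transient failures with exponential backoff:
    // delay = min(baseDelay * 2^attempt, maxDelay). Non-transient exceptions, and the
    // failure that follows the last allowed retry, propagate to the caller.
    public static async Task<T> WithExponentialBackoffAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxRetries,
        TimeSpan baseDelay,
        TimeSpan maxDelay,
        CancellationToken ct = default)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation(ct).ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxRetries && isTransient(ex))
            {
                var backoffMs = Math.Min(baseDelay.TotalMilliseconds * Math.Pow(2, attempt), maxDelay.TotalMilliseconds);
                await Task.Delay(TimeSpan.FromMilliseconds(backoffMs), ct).ConfigureAwait(false);
            }
        }
    }
}
```

In the SQL check, the operation would wrap opening the connection and running SELECT 1, so a single dropped connection no longer flips the check to unhealthy.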

Healthcheck middleware must be protected

The health check endpoint could be used for a DoS attack. It may be a hidden or obscure endpoint,
but it must also be protected from malicious attacks.
This could be done with an AuthorizationPolicyBuilder:

      app.UseHealthCheck(new HealthCheckOptions
      {
          Path = "/health", 
          AuthorizationPolicy = new AuthorizationPolicyBuilder()
                                  .RequireXxx()
                                  // More authorization requirements...
                                  .Build()
      });

And with an authorization service check in the middleware:

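            if (_options.AuthorizationPolicy != null)
            {
                // In ASP.NET Core 2.0+, AuthorizeAsync returns an AuthorizationResult rather than a bool.
                var authorization = await _authorizationService.AuthorizeAsync(context.User, context, _options.AuthorizationPolicy);
                if (!authorization.Succeeded)
                {
                    // Unauthorized callers fall through to the next middleware, so the endpoint stays hidden.
                    _logger.AuthorizationFailed();
                    await _next(context);
                    return;
                }
            }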

Health check for Table Storage fails when table name is not provided

There is a bug in AzureHealthCheckBuilderExtensions.AddAzureTableStorageCheck: the health check fails if no table name is provided, because it attempts to get a table with a null name.

Conversely, if a name is passed in, the health check never actually checks that the table exists.

Lines 73-78 currently read:

                    if (String.IsNullOrWhiteSpace(tableName))
                    {
                        var table = tableClient.GetTableReference(tableName);

                        result = await table.ExistsAsync();
                    }

Line 73 should be:
if (!String.IsNullOrWhiteSpace(tableName))

Parallelize checks

Checks could run in parallel: start them all inside the foreach loop and await them outside it:

  var tasks = new List<Task<HealthCheckResult>>(_checks.Count);
  foreach (var check in _checks)
  {
    // Start each check without awaiting it, so they run concurrently.
    tasks.Add(check.Value.CheckAsync());
  }

  // Await all checks together.
  var results = await Task.WhenAll(tasks);
  for (int i = 0; i < results.Length; i++)
  {
    // manage task completion & logs
  }

Be careful: this pseudocode does not take task scheduling into account.
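A runnable version of the idea, assuming each check is a Func<CancellationToken, Task<bool>> (the library's real check types differ). A SemaphoreSlim caps how many checks run at once, which is one way to handle the scheduling caveat; ParallelChecks.RunAllAsync and maxConcurrency are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelChecks
{
    // Starts every check (at most `maxConcurrency` running at a time), then awaits them together.
    // Results are gathered on one thread after Task.WhenAll, so a plain Dictionary is enough.
    public static async Task<Dictionary<string, bool>> RunAllAsync(
        IReadOnlyDictionary<string, Func<CancellationToken, Task<bool>>> checks,
        int maxConcurrency,
        CancellationToken ct = default)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        async Task<(string Name, bool Healthy)> RunOneAsync(string name, Func<CancellationToken, Task<bool>> check)
        {
            await gate.WaitAsync(ct).ConfigureAwait(false);
            try
            {
                return (name, await check(ct).ConfigureAwait(false));
            }
            catch (Exception) when (!ct.IsCancellationRequested)
            {
                return (name, false); // a throwing check counts as unhealthy
            }
            finally
            {
                gate.Release();
            }
        }

        // ToList() starts all the tasks before anything is awaited.
        var tasks = checks.Select(kv => RunOneAsync(kv.Key, kv.Value)).ToList();
        var results = await Task.WhenAll(tasks).ConfigureAwait(false);
        return results.ToDictionary(r => r.Name, r => r.Healthy);
    }
}
```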

HealthCheckTagHelper

I proposed a HealthCheckTagHelper in the old repository:
aspnet/AspLabs#17

I would like to recreate the PR for this repository too, but I'm not sure whether to place it in Microsoft.Extensions.HealthChecks or in Microsoft.AspNetCore.HealthChecks.
Or, even better, in a new project (Microsoft.AspNetCore.Mvc.HealthChecks), because a TagHelper depends on MVC and Razor, and a Web API project that uses only Microsoft.AspNetCore.Mvc.Core shouldn't take on those dependencies.
What do you think @bradwilson?

Add url to the data collection of UrlCheck when an exception is thrown

We have an issue because part of the data returned when UrlCheck fails differs from the data returned on a 200 OK. In particular, we need the URL.
On a simple status page we want to show something like the following, highlighting where the issues are.
One piece of data we'd like to show in every case is the URL, even when the check failed:

@model WebStatus.Viewmodels.HealthStatusViewModel
@{
    ViewData["Title"] = "System Status";
}

<div class="row">
    <div class="col-md-12">
        <h2 class="overall-status-title">Overall Status: @Model.OverallStatus</h2>
    </div>
</div>
<div class="list-group-status">
    @foreach (var result in Model.Results)
    {
        <div class="row list-group-status-item">
            <div class="col-md-10">
                <h4 class="list-group-status-item-title">@result.Data["url"]</h4>
                <p class="list-group-item-text">@result.Description</p>
            </div>
            <div class="col-md-2 list-group-status-item-label">
                @if (result.CheckStatus == Microsoft.Extensions.HealthChecks.CheckStatus.Healthy)
                {
                    <span class="label label-success">@result.CheckStatus</span>
                }
                else if (result.CheckStatus == Microsoft.Extensions.HealthChecks.CheckStatus.Unhealthy)
                {
                    <span class="label label-danger">@result.CheckStatus</span>
                }
                else if (result.CheckStatus == Microsoft.Extensions.HealthChecks.CheckStatus.Warning)
                {
                    <span class="label label-warning">@result.CheckStatus</span>
                }
                else
                {
                    <span class="label label-default">@result.CheckStatus</span>
                }
            </div>
        </div>
    }
</div>

But looking at how the library builds the health check result, I see the following:

public class UrlChecker
{
    ....
    private async Task CheckUrlAsync(HttpClient httpClient, string url, Action<string, IHealthCheckResult> adder)
    {
        var name = $"UrlCheck({url})";
        try
        {
            var response = await httpClient.GetAsync(url).ConfigureAwait(false);
            var result = await _checkFunc(response);
            adder(name, result);
        }
        catch (Exception ex)
        {
            adder(name, HealthCheckResult.Unhealthy($"Exception during check: {ex.GetType().FullName}"));
        }
    }
    ....
}

When the URL check gets a 200 OK, the result carries more data (around four fields, including the URL), but when HttpClient throws an exception, only the error is returned, without the URL. That is inconsistent with the structure returned on success. At the very least, I think the URL should be returned as part of the data. Don't you think so?

Would it be possible to return the same values, or at least the URL, which is always worth knowing when the check fails or is unhealthy?

With the current approach, reading the URL with @result.Data["url"]
throws an exception because, logically, there is no URL in the data.

Thoughts?
Thanks for your help,
Cesar.
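A sketch of the suggested behaviour, where the URL goes into Data on both the success path and the exception path. UrlCheckResult and CheckUrlAsync are hypothetical stand-ins for the library's IHealthCheckResult and UrlChecker, and the HTTP call is injected as a delegate so the logic runs without a network. On the page side, assuming Data is a dictionary, reading it with TryGetValue("url", out var url) instead of the indexer also avoids the exception.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical result shape; the real library uses IHealthCheckResult.
public sealed class UrlCheckResult
{
    public UrlCheckResult(string status, string description, IReadOnlyDictionary<string, object> data)
    {
        Status = status;
        Description = description;
        Data = data;
    }

    public string Status { get; }
    public string Description { get; }
    public IReadOnlyDictionary<string, object> Data { get; }
}

public static class UrlCheckSketch
{
    // `get` performs the HTTP GET (for example a shared HttpClient's GetAsync).
    public static async Task<UrlCheckResult> CheckUrlAsync(Func<string, Task<HttpResponseMessage>> get, string url)
    {
        try
        {
            using (var response = await get(url).ConfigureAwait(false))
            {
                var code = (int)response.StatusCode;
                return new UrlCheckResult(
                    response.IsSuccessStatusCode ? "Healthy" : "Unhealthy",
                    $"UrlCheck({url}): status code {code}",
                    new Dictionary<string, object> { ["url"] = url, ["code"] = code });
            }
        }
        catch (Exception ex)
        {
            // Same "url" key as the success path, plus the exception type.
            return new UrlCheckResult(
                "Unhealthy",
                $"Exception during check: {ex.GetType().FullName}",
                new Dictionary<string, object> { ["url"] = url, ["exception"] = ex.GetType().FullName });
        }
    }
}
```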

Healthcheck middleware should allow sub policy

The health check middleware should be able to report the health status of specific checks.
For example, suppose you have three HTTP endpoint checks (one for web payment, one for order management, and one for user management), two certificates (one for the payment HTTP service, one for DataProtection), one database for cart management, and so on.
It should be possible to request the health of the payment subsystem, the order management subsystem, the certificates, the HTTP endpoints, and so on.

GET /health/certificates HTTP/1.1

Reply

HTTP/1.1
[
  {
    "name": "payment certificate",
    "status": "healthy"
  },
  {
    "name": "data protection certificate",
    "status": "warning",
    "description": "Certificate is about to expire."
  }
]
GET /health/databases HTTP/1.1

Reply

HTTP/1.1
[
  {
    "name": "cart database",
    "status": "unhealthy",
    "description": "Unable to reach the database"
  }
]

Sub-policies could easily be identified by tags on the checks.
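A sketch of tag-based sub-policies: the segment after /health selects the tag, and only checks carrying that tag are reported, in the JSON shape shown above. TaggedResult and SubPolicy.ForPath are hypothetical, and the example serializes with System.Text.Json (JsonIgnoreCondition needs .NET 5 or later), which postdates this library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed class TaggedResult
{
    public string Name { get; set; } = "";
    public string Status { get; set; } = "";
    public string Description { get; set; }          // omitted from JSON when null

    [JsonIgnore]
    public string[] Tags { get; set; } = Array.Empty<string>();
}

public static class SubPolicy
{
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
    };

    // "/health/certificates" -> results tagged "certificates"; "/health" -> every result.
    public static string ForPath(string path, IEnumerable<TaggedResult> results)
    {
        const string prefix = "/health";
        if (!path.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Not a health path.", nameof(path));

        var tag = path.Substring(prefix.Length).Trim('/');
        var selected = tag.Length == 0
            ? results
            : results.Where(r => r.Tags.Contains(tag, StringComparer.OrdinalIgnoreCase));
        return JsonSerializer.Serialize(selected.ToList(), Options);
    }
}
```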

UrlChecker can exhaust sockets

The current UrlChecker implementation creates a new HttpClient for every check. This can cause problems.

The Remarks section of the HttpClient documentation on MSDN states:

HttpClient is intended to be instantiated once and re-used throughout the life of an application. Instantiating an HttpClient class for every request will exhaust the number of sockets available under heavy loads. This will result in SocketException errors.

More information about this issue can be found here: https://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/

I will try to create a minimal verifiable test project to demonstrate the problem soon, and perhaps a pull request to fix it. I am currently at work and don't have time to do that now.
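The usual fix is to create one HttpClient and reuse it for every check. A minimal sketch follows; SharedUrlChecker is a hypothetical name. A long-lived client can miss DNS changes, so the sketch bounds connection lifetime with SocketsHttpHandler.PooledConnectionLifetime, which requires .NET Core 2.1 or later; IHttpClientFactory (ASP.NET Core 2.1+) is another option.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class SharedUrlChecker
{
    // One client for the whole process, so pooled sockets are reused across checks.
    private static readonly HttpClient Client = new HttpClient(new SocketsHttpHandler
    {
        // Recycle pooled connections periodically so DNS changes are eventually picked up.
        PooledConnectionLifetime = TimeSpan.FromMinutes(5)
    })
    {
        Timeout = TimeSpan.FromSeconds(10)
    };

    public static HttpClient Instance => Client;

    public static async Task<bool> IsHealthyAsync(string url, CancellationToken ct = default)
    {
        using (var response = await Client.GetAsync(url, ct).ConfigureAwait(false))
        {
            return response.IsSuccessStatusCode;
        }
    }
}
```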

Add a way of creating extension packages

It would be nice to be able to create extensions for this library so that third-party extensions can be built and used, similar to how the logging providers work. For example, data extension packages could be created for MySQL or PostgreSQL, or OPC DA, XML, and UA extensions.
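The logging providers follow a simple convention: a package ships a static class of extension methods (such as AddConsole()) on a shared builder type. A sketch of the same shape for checks, where ChecksBuilder stands in for the core builder and AddFileExistsCheck plays the role of a third-party package; both names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// "Core" package: a minimal builder that third parties can extend.
public sealed class ChecksBuilder
{
    public IDictionary<string, Func<CancellationToken, Task<bool>>> Checks { get; } =
        new Dictionary<string, Func<CancellationToken, Task<bool>>>();

    public ChecksBuilder AddCheck(string name, Func<CancellationToken, Task<bool>> check)
    {
        Checks.Add(name, check ?? throw new ArgumentNullException(nameof(check)));
        return this;
    }
}

// "Third-party" package: depends only on the core builder, like a logging provider does.
public static class FileChecksBuilderExtensions
{
    public static ChecksBuilder AddFileExistsCheck(this ChecksBuilder builder, string path) =>
        builder.AddCheck($"FileExists({path})", _ => Task.FromResult(File.Exists(path)));
}
```

A MySQL or OPC UA package would follow the same pattern, adding its own client dependency without the core package having to reference it.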

Implement a way to get the Health check services from Repository

Hi
From this blog post https://scottsauber.com/2017/05/22/using-the-microsoft-aspnetcore-healthchecks-package/ I can see that the services to be health-checked must be registered at startup. Could that be changed so that I can load those services from a repository just before the health check runs?

This feature would let me add more services to watch dynamically. With the implementation described in that post, I need to restart my app after adding a new service to watch.

HealthChecks functions should be passed a ServiceScope

I am trying to implement a DbContext health check. The Sql health check doesn't work for me as I could be using any of a number of database providers such as PostgreSQL.

However, the check function needs a scoped instance of the DbContext to test, and none is available during the Startup.ConfigureServices method.

If the health check service passed in an IServiceScope, the check method could resolve any services it requires through IServiceScope.ServiceProvider.

Add default timeout on check

Checks can take a long time, for example an HTTP request that takes several seconds to respond.
Each check should be cancelled after a default timeout and considered unhealthy.
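A minimal sketch of a per-check timeout, assuming checks are Func<CancellationToken, Task<bool>> delegates; CheckTimeout.RunWithTimeoutAsync is a hypothetical helper. The check races a timer: if the timer wins, the check is reported unhealthy and asked to stop through its token. Cancellation is cooperative, so a check that ignores the token keeps running in the background, but it no longer delays the result.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CheckTimeout
{
    // Returns false (unhealthy) if the check has not finished within `timeout`.
    // Exceptions thrown by a check that finishes in time propagate to the caller.
    public static async Task<bool> RunWithTimeoutAsync(Func<CancellationToken, Task<bool>> check, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task<bool> checkTask = check(cts.Token);
            Task timeoutTask = Task.Delay(timeout, cts.Token);

            Task winner = await Task.WhenAny(checkTask, timeoutTask).ConfigureAwait(false);
            if (winner == timeoutTask)
            {
                cts.Cancel(); // ask the check to stop; it is reported unhealthy regardless
                return false;
            }

            cts.Cancel(); // the check finished first: stop the timer
            return await checkTask.ConfigureAwait(false);
        }
    }
}
```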

Is this dead?

There are multiple open PRs with great new useful features dating back months. Has this project been abandoned?

Healthcheck middleware may expose checks results

The health check middleware could expose the following information:

  • check name
  • check status
  • elapsed time
  • check status description
  • additional data (property bag, like dns lookup time, ssl handshake time, ...)

Exposing this information allows a monitoring tool to aggregate the application's health over time.
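A sketch of how one entry of such a report could be produced, timing each check with Stopwatch and carrying a property bag. CheckReport and ReportingRunner are hypothetical, and the check is assumed to return a (Healthy, Description, Data) tuple; the real library's result type differs.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical report entry carrying the fields listed above.
public sealed class CheckReport
{
    public string Name { get; set; } = "";
    public string Status { get; set; } = "";
    public TimeSpan Elapsed { get; set; }
    public string Description { get; set; } = "";
    public Dictionary<string, object> Data { get; set; } = new Dictionary<string, object>();
}

public static class ReportingRunner
{
    // Runs one check, measures its wall-clock duration, and records the outcome.
    public static async Task<CheckReport> RunAsync(
        string name,
        Func<CancellationToken, Task<(bool Healthy, string Description, Dictionary<string, object> Data)>> check,
        CancellationToken ct = default)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            var (healthy, description, data) = await check(ct).ConfigureAwait(false);
            return new CheckReport
            {
                Name = name,
                Status = healthy ? "Healthy" : "Unhealthy",
                Elapsed = sw.Elapsed,
                Description = description,
                Data = data ?? new Dictionary<string, object>()
            };
        }
        catch (Exception ex)
        {
            // A failing check still gets a timed entry instead of disappearing from the report.
            return new CheckReport { Name = name, Status = "Unhealthy", Elapsed = sw.Elapsed, Description = ex.Message };
        }
    }
}
```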

Avoid usage of ConcurrentDictionary

In CompositeHealthCheckResult, a ConcurrentDictionary has replaced a basic Dictionary.
This class does not need to be thread-safe if the health service does not add results from different threads.

Am I wrong?
@bradwilson

review items

Split interfaces into an abstractions package
Potentially remove IHealthCheckResult: an interface made only of properties is a smell we should understand and make sure of
Does IServiceProvider need to be exposed on the IHealthCheckService interface? Can we remove it?

Implements Canary endpoint

The canary endpoint provides a binary health status of the application: dead or alive. It is designed for HTTP load balancers and responds with either HTTP 200 or HTTP 503.

This endpoint may be controlled either by the IHealthCheckService or by an application switch.
For example:
The application must be upgraded. On the current node, the operator switches the application off.
Even while switched off, the application continues to respond to other requests.
The load balancer removes the current node from its list of healthy nodes and reroutes requests to other nodes.
The application then receives no requests and can be stopped, killed, or upgraded.

The same applies to the health check service: if the application is unhealthy, the canary endpoint responds with 503 and the load balancer removes the current node from its list of healthy nodes.

This is similar to the app_offline.htm file, but app_offline.htm has the drawback of replying 502 to clients.
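The decision the canary endpoint makes is small enough to sketch on its own: 200 only when the operator switch is on and the last health result was healthy, 503 otherwise. Canary, SwitchOff and ReportHealth are hypothetical names; wiring StatusCode into a middleware response is left out.

```csharp
using System.Threading;

// Hypothetical canary state: an operator switch combined with the latest aggregate health.
public sealed class Canary
{
    private int _online = 1;       // 1 = serving, 0 = switched off by an operator
    private int _lastHealthy = 1;  // updated by whatever runs the health checks

    public void SwitchOff() => Interlocked.Exchange(ref _online, 0);
    public void SwitchOn() => Interlocked.Exchange(ref _online, 1);
    public void ReportHealth(bool healthy) => Interlocked.Exchange(ref _lastHealthy, healthy ? 1 : 0);

    // 200 while switched on and healthy; 503 otherwise, so the load balancer drains
    // the node while the application's other endpoints keep responding.
    public int StatusCode =>
        Volatile.Read(ref _online) == 1 && Volatile.Read(ref _lastHealthy) == 1 ? 200 : 503;
}
```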

Better README

Currently it is terrible. One has to browse through the samples to get an idea of what this library does.

Azure Queue Storage check fails if queue name is not provided

There is a bug in AzureHealthCheckBuilderExtensions.AddAzureQueueStorageCheck: the health check fails if no queue name is provided, because it attempts to get a queue with a null name.

Conversely, if a name is passed in, the health check never actually checks that the queue exists.

Lines 153-158 currently read:

                    if (String.IsNullOrWhiteSpace(queueName))
                    {
                        var queue = queueClient.GetQueueReference(queueName);

                        result = await queue.ExistsAsync();
                    }

Line 153 should be:
if (!String.IsNullOrWhiteSpace(queueName))

NuGet package of HealthChecks

As soon as we have a first stable preview version, it would be good to publish a NuGet package so that samples and real usage rely on the package instead of the GitHub code.
Even if it is a 0.2 version, I think it is worth starting to publish a NuGet package soon.

Apply Service Unavailable globally if healthcheck fails

If I understand this line correctly, the server returns an HTTP 503 when the path/port matches what has been registered. If it doesn't match, however, it invokes the next middleware regardless of health. This seems odd: I would expect the server to return 503s when the health checks have failed. In my case, despite the "unhealthy" status, the pipeline continues and the controller inevitably fails for the same reasons as the health checks.

I think the middleware should return HTTP 503 in this case, or at least make this configurable. I'm happy to submit a PR if this sounds sensible.
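The routing decision proposed here can be sketched as a pure function: requests to the health path are served as today, and other requests get a 503 only when the checks are failing and a hypothetical opt-in setting, return503WhenUnhealthy, is enabled (off keeps the current behaviour). HealthGate.Decide and HealthAction are illustrative names, and port matching is omitted.

```csharp
using System;

public enum HealthAction { ServeHealthReport, Return503, CallNext }

public static class HealthGate
{
    // healthPath: the registered health check path, e.g. "/health".
    // return503WhenUnhealthy: the proposed opt-in setting.
    public static HealthAction Decide(string requestPath, string healthPath, bool isHealthy, bool return503WhenUnhealthy)
    {
        if (string.Equals(requestPath, healthPath, StringComparison.OrdinalIgnoreCase))
            return HealthAction.ServeHealthReport; // existing behaviour for the health endpoint
        if (!isHealthy && return503WhenUnhealthy)
            return HealthAction.Return503;         // proposed: short-circuit the rest of the pipeline
        return HealthAction.CallNext;              // existing behaviour: health ignored for other paths
    }
}
```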

Implement HealthCheck for Azure Service Bus dependency

Similar to the RabbitMQ health check, implement one for Azure Service Bus, at these levels of testing:

  • Azure SB topic (usually, each topic in your Azure SB account will be owned by a different business application)
  • Testing the list of Azure SB subscriptions that a particular service/application depends on
  • Testing a message Publish() into the Service Bus with a "Test" event/message type, so it is ignored by other subscribers
    In any case, the scope should be defined in more detail.

Url/Check association

Hello,

It seems there is no way to associate a URL with a specific check

/url1 => check1
/url2 => check2

Is this planned?

Thank you

Rename generic-named Extensions containers to more feature-specific names

Implement HealthCheck for RabbitMQ dependency

This could probably be done at two levels:

  1. Checking just the availability of the RabbitMQ service/server that your application or microservice depends on, based only on the address and port, like:
    rabbitmq_address:5672
    This might be too limited, though...

  2. Checking a collection of specific queues and RoutingKeys (Like testing subscriptions connectivity) based on parameters like:
    queue: _queueName
    exchange: _brokerName
    routingKey: eventName

Being able to check the list of queues that your app/service depends on is probably important for a RabbitMQ health check.
Checking only at the server/service level seems too high-level, initially...
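Level 1 amounts to a TCP connect check against the broker's address and port (5672 is the default AMQP port). A minimal sketch using only TcpClient; TcpPortCheck.IsReachableAsync is hypothetical, and it proves only that something accepts connections on that port, not that RabbitMQ itself is healthy, which is why level 2 matters.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class TcpPortCheck
{
    // Healthy if a TCP connection to host:port is established within `timeout`.
    public static async Task<bool> IsReachableAsync(string host, int port, TimeSpan timeout)
    {
        using (var client = new TcpClient())
        {
            var connect = client.ConnectAsync(host, port);
            var winner = await Task.WhenAny(connect, Task.Delay(timeout)).ConfigureAwait(false);
            if (winner != connect)
                return false; // timed out; disposing the client abandons the attempt

            try
            {
                await connect.ConfigureAwait(false);
                return client.Connected;
            }
            catch (SocketException)
            {
                return false; // refused, unreachable, or name resolution failed
            }
        }
    }
}
```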

HealthCheck class design

The HealthCheck class is not DI-friendly. The constructor takes value parameters such as the check name, plus the check behavior through the Func<CancellationToken, ValueTask<IHealthCheckResult>> check parameter (ouch...).

As the HealthCheck class is responsible for the caching behavior, it should be the principal way to implement IHealthCheck.
Related to #23.
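One DI-friendly shape, sketched under assumptions: checks are plain classes implementing a small interface, so their dependencies arrive through constructor injection, and caching moves into a decorator instead of living in the HealthCheck class. ICheck, CachedCheck, ProbeCheck and the injectable clock are hypothetical and are not the library's IHealthCheck contract.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface ICheck
{
    Task<bool> CheckAsync(CancellationToken ct = default);
}

// Decorator: reuses the inner check's last result until `cacheDuration` has elapsed.
public sealed class CachedCheck : ICheck
{
    private readonly ICheck _inner;
    private readonly TimeSpan _cacheDuration;
    private readonly Func<DateTimeOffset> _clock;
    private readonly SemaphoreSlim _lock = new SemaphoreSlim(1, 1);
    private DateTimeOffset _expiresAt = DateTimeOffset.MinValue;
    private bool _lastResult;

    public CachedCheck(ICheck inner, TimeSpan cacheDuration, Func<DateTimeOffset> clock = null)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _cacheDuration = cacheDuration;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    public async Task<bool> CheckAsync(CancellationToken ct = default)
    {
        await _lock.WaitAsync(ct).ConfigureAwait(false); // one refresh at a time
        try
        {
            var now = _clock();
            if (now < _expiresAt)
                return _lastResult;

            _lastResult = await _inner.CheckAsync(ct).ConfigureAwait(false);
            _expiresAt = now + _cacheDuration;
            return _lastResult;
        }
        finally
        {
            _lock.Release();
        }
    }
}

// Example plain check: its dependency (a probe delegate) arrives through the constructor.
public sealed class ProbeCheck : ICheck
{
    private readonly Func<bool> _probe;

    public ProbeCheck(Func<bool> probe) => _probe = probe;

    public int Calls { get; private set; }

    public Task<bool> CheckAsync(CancellationToken ct = default)
    {
        Calls++;
        return Task.FromResult(_probe());
    }
}
```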
