dotnet-architecture / healthchecks

Experimental Health Checks for building services, such as with ASP.NET Core

License: Other

C# 100.00%

healthchecks's Introduction

(Deprecated) Health checks for building services

This project was an experimental, out-of-band library for ASP.NET Core, intended to explore concepts and gather feedback.

It has been replaced by the official Health Checks components in ASP.NET Core 2.2 (GA around the end of 2018): https://github.com/aspnet/Diagnostics/tree/release/2.2/src

Check the ASP.NET Core 2.2 roadmap: aspnet/Announcements#307

healthchecks's People

Contributors

juergengutsch nicolastarzia lurumad eiximenis cesardelatorre ycrumeyrolle herecydev gilmond wangshifeng joaomajesus taomylife521 snys98 christiansparre jsklenka qumeta ian-zh-fang karlosrivera hungcuongsoftware protossyk codeinpeace sarfarajlastgithub codesee polys johnduhart seven1986 dotnetframework franklin89 ifbaltics rendlelabs liujiekm kolowoyeye azhe127 jej666 ruurd-jan bkaid onestar1 fredrikk carvrodrigo jhrendon xorecs githubhaikuang mthamil vhatuncev snw55 huaryliu stantoxt mdormann chunlei lanluusc cesarcastrocuba bymyslf dima1034 lulzzz togglebrain brianokanga elishagreenwald cupegui mdryden ruelala farfush srigurukkal thecoder87 fastec auroramo winntxp asampledone ancile mckanpolat jansen-consulting gkalp2000 somayarlagadda silverness migrap saqibhussain singhal-p aston22 caothetoan achehre eswarchennupati 216giorgiy newdygo irontoby reddogaw chewel611 ma1f ethol stevejgordon thiagolunardi mirsaeedi gurmeetatweb kengranderson shenglol samrat-ghose feluxmaj micheldosprazeres tabull zeroinfinite waynemunro iericzheng gssrini23

healthchecks's Issues

Healthcheck services should cache the results

Each check should be cacheable for a period of time.
For example, HTTP endpoint checks may be cached for 10 seconds, database checks for 2 seconds, ...
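
A minimal sketch of what per-check caching could look like. `CachedCheck` is a hypothetical wrapper written for illustration, not a type from this library, and it reduces a check to a `bool` result. The injectable clock is an assumption added so that the expiry logic can be exercised without waiting.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical wrapper (not part of the library): re-runs the inner check
// only once the previously cached result is older than cacheDuration.
// Not thread-safe; concurrent callers may run the inner check twice.
public sealed class CachedCheck
{
    private readonly Func<Task<bool>> _check;
    private readonly TimeSpan _cacheDuration;
    private readonly Func<DateTimeOffset> _clock;
    private DateTimeOffset _expiresAt = DateTimeOffset.MinValue;
    private bool _lastResult;

    public CachedCheck(Func<Task<bool>> check, TimeSpan cacheDuration, Func<DateTimeOffset> clock = null)
    {
        _check = check ?? throw new ArgumentNullException(nameof(check));
        _cacheDuration = cacheDuration;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    public async Task<bool> CheckAsync()
    {
        var now = _clock();
        if (now < _expiresAt)
        {
            return _lastResult; // still fresh: serve the cached result
        }

        _lastResult = await _check().ConfigureAwait(false);
        _expiresAt = now + _cacheDuration;
        return _lastResult;
    }
}
```

With this shape, an HTTP endpoint check could be wrapped with `TimeSpan.FromSeconds(10)` and a database check with `TimeSpan.FromSeconds(2)`.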

What is the Proposed Release Date

Will this get released as part of 2.0?

Add overloads allowing CacheDuration to the AddUrlCheck, AddSqlCheck, etc. methods.

By default, the cache duration is set in code to 5 minutes:
Microsoft.Extensions.HealthChecks\HealthCheckBuilder.cs
DefaultCacheDuration = TimeSpan.FromMinutes(5);

But we need a way to change that cache duration from the outside (client), i.e. for each health check being added.
There is no CacheDuration parameter on the more external methods, such as the AddUrlCheck() or AddSqlCheck() overloads in HealthCheckBuilderExtensions.

We need to add those overloads.

The internal AddCheck method already accepts a CacheDuration, however, so we just need to propagate it to the "external" methods:

public static HealthCheckBuilder AddCheck(this HealthCheckBuilder builder, string name, Func check, TimeSpan cacheDuration)
{
Guard.ArgumentNotNull(nameof(builder), builder);
return builder.AddCheck(name, HealthCheck.FromCheck(check), cacheDuration);
}

Enhance grouping feature

The group feature allows aggregating different health checks into one.
I would like to use HealthChecks both for application monitoring and as a circuit breaker for some features of my applications.
For example, I have an e-commerce web site with a payment feature. The payment feature is based on a payment service, a payment-related certificate and an Azure storage. I will create a group for the payment feature with the three components.
By doing so, I will be able to disable payment on the web site when any payment-related component
is unavailable.

My ops teams are responsible for monitoring the low-level components.
One team is responsible for certificate management, another for web service availability, ...

I would like to be able to have more than one group per IHealthCheck, for example the "web service" group and the "payment" group.
IHealthCheckService.CheckHealthAsync() may require a new parameter to select the group to check, or all the checks as it does currently.

@bradwilson @JuergenGutsch

Add check based on counters threshold

Checks based on counters are useful for detecting recurring transient failures.
For example:

try
{
  // do some stuff that can fail
  // ...
  counter.Reset();
}
catch
{
  counter.Increment();
  throw;
}

When the counter reaches the threshold, the check is unhealthy.
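
The `counter` object used above could be sketched as follows. `FailureCounter` and its members are hypothetical names chosen for illustration; the issue does not define the type.

```csharp
using System.Threading;

// Hypothetical counter (names are assumptions): counts consecutive failures
// and reports unhealthy once the count reaches the configured threshold.
public sealed class FailureCounter
{
    private readonly int _threshold;
    private int _count;

    public FailureCounter(int threshold) => _threshold = threshold;

    // Called from the catch block after a failed operation.
    public void Increment() => Interlocked.Increment(ref _count);

    // Called after a successful operation.
    public void Reset() => Interlocked.Exchange(ref _count, 0);

    // Healthy while the number of consecutive failures is below the threshold.
    public bool IsHealthy => Volatile.Read(ref _count) < _threshold;
}
```

A health check would then only need to read `IsHealthy`, which keeps the check itself cheap.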

SQL Connection resiliency by using SQL Connections with retries and exponential backoff

The current SQL Server check is pretty basic. I know it was marked as "pending to be improved".
It looks like the following:
https://github.com/aspnet/HealthChecks/blob/dev/src/Microsoft.Extensions.HealthChecks.SqlServer/HealthCheckBuilderSqlServerExtensions.cs

                    //TODO: There is probably a much better way to do this.
                    using (var connection = new SqlConnection(connectionString))
                    {
                        connection.Open();
                        using (var command = connection.CreateCommand())
                        {
                            command.CommandType = CommandType.Text;
                            command.CommandText = "SELECT 1";
                            var result = (int)await command.ExecuteScalarAsync().ConfigureAwait(false);
                            if (result == 1)
                            {
                                return HealthCheckResult.Healthy($"SqlCheck({name}): Healthy");
                            }

                            return HealthCheckResult.Unhealthy($"SqlCheck({name}): Unhealthy");
                        }
                    }

Since this library will mostly be used in microservice and cloud environments, for instance with Azure SQL DB, there are many cases where transient failures in the SQL connection should be handled with a retry strategy.
Entity Framework Core makes it pretty easy to implement retries with exponential backoff, so we might want to evolve the code above and use the approach that I explained on my blog:
https://blogs.msdn.microsoft.com/cesardelatorre/2017/03/26/using-resilient-entity-framework-core-sql-connections-and-transactions-retries-with-exponential-backoff/

Basically, with something like this:

          options.UseSqlServer(Configuration["ConnectionString"],
                               sqlServerOptionsAction: sqlOptions =>
                               {
                                   sqlOptions.EnableRetryOnFailure(
                                       maxRetryCount: 5,
                                       maxRetryDelay: TimeSpan.FromSeconds(30),
                                       errorNumbersToAdd: null);
                               });

Healthcheck middleware must be protected

The healthcheck endpoint could be used for a DoS attack. This endpoint may be a hidden or obscure endpoint.
The healthcheck endpoint must also be protected from malicious attacks.
This could be done by using an AuthorizationPolicyBuilder:

      app.UseHealthCheck(new HealthCheckOptions
      {
          Path = "/health", 
          AuthorizationPolicy = new AuthorizationPolicyBuilder()
                                  .RequireXxx()
                                  // More authorization requirements...
                                  .Build()
      });

And with an AuthorizationService in the middleware :

            if (_options.AuthorizationPolicy != null)
            {
                if (!await _authorizationService.AuthorizeAsync(principal, context, _options.AuthorizationPolicy))
                {
                    _logger.AuthorizationFailed();
                    await _next(context);
                    return;
                }
            }

Create MySql HealthCheck

Implement HealthCheck for REDIS dependency

This would be a nice to have health check in addition to the SQL Server health check. :)

Health check for Table Storage fails when table name is not provided

There is a bug in AzureHealthCheckBuilderExtensions.AddAzureTableStorageCheck where the health check fails if a table name is not provided because it attempts to get a table with a "null" name.

Likewise, if the name is passed in, the health check fails to actually check that the table exists.

Lines 73-78 currently read:

                    if (String.IsNullOrWhiteSpace(tableName))
                    {
                        var table = tableClient.GetTableReference(tableName);

                        result = await table.ExistsAsync();
                    }

Line 73 should be:
if (!String.IsNullOrWhiteSpace(tableName))

Parallelize checks

Checks may be parallelized. They can be awaited outside of the foreach loop:

  var tasks = new List<Task<HealthCheckResult>>(_checks.Count);
  foreach (var check in _checks)
  {
    tasks.Add(check.Value.CheckAsync());
  }

  var results = await Task.WhenAll(tasks.ToArray());
  for(int i=0; i<results.Length; i++)
  {
    // manage task completion & logs
  }

Be careful: this pseudocode does not take care of task scheduling...

HealthCheckTagHelper

I proposed a HealthCheckTagHelper in the old repository:
aspnet/AspLabs#17

I would recreate the PR for this repository too, but currently I'm not sure whether to place it in Microsoft.Extensions.HealthChecks or in Microsoft.AspNetCore.HealthChecks.
Or, even better, introduce a new project (Microsoft.AspNetCore.Mvc.HealthChecks), because a TagHelper needs dependencies on MVC and Razor, and a Web API project that uses Microsoft.AspNetCore.Mvc.Core shouldn't have those dependencies.
What do you think @bradwilson?

Add ASP.NET equivalent to ASP.NET Core middleware

Add url to the data collection of UrlCheck when an exception is thrown

Basically, we're having an issue because the data returned when UrlCheck fails differs from the data returned when it gets a 200 OK. In particular, the URL is what we need.
In a simple report page we want to show something like the following.
A piece of data we'd like to show in every case, even on failure, is the URL:

@model WebStatus.Viewmodels.HealthStatusViewModel
@{
    ViewData["Title"] = "System Status";
}

<div class="row">
    <div class="col-md-12">
        <h2 class="overall-status-title">Overall Status: @Model.OverallStatus</h2>
    </div>
</div>
<div class="list-group-status">
    @foreach (var result in Model.Results)
    {
        <div class="row list-group-status-item">
            <div class="col-md-10">
                <h4 class="list-group-status-item-title">@result.Data["url"]</h4>
                <p class="list-group-item-text">@result.Description</p>
            </div>
            <div class="col-md-2 list-group-status-item-label">
                @if (@result.CheckStatus == Microsoft.Extensions.HealthChecks.CheckStatus.Healthy)
                {
                    <span class="label label-success">@result.CheckStatus</span>
                }
                else if (@result.CheckStatus == Microsoft.Extensions.HealthChecks.CheckStatus.Unhealthy)
                {
                    <span class="label label-danger">@result.CheckStatus</span>
                }
                else if (@result.CheckStatus == Microsoft.Extensions.HealthChecks.CheckStatus.Warning)
                {
                    <span class="label label-warning">@result.CheckStatus</span>
                }
                else
                {
                    <span class="label label-default">@result.CheckStatus</span>
                }
            </div>
        </div>
    }
</div>

But when returning the info about the healthCheck from the Library, I see the following:

public class UrlChecker
{
....
private async Task CheckUrlAsync(HttpClient httpClient, string url, Action<string, IHealthCheckResult> adder)
{
   var name = $"UrlCheck({url})";
   try
   {
     var response = await httpClient.GetAsync(url).ConfigureAwait(false);
     var result = await _checkFunc(response);
     adder(name, result);
   }
   catch (Exception ex)
   {
     adder(name, HealthCheckResult.Unhealthy($"Exception during check: {ex.GetType().FullName}"));
   }
}
....
}

When the URL check returns 200 OK, the result has more data (the URL, etc., around 4 fields), but when HttpClient throws an exception, you return just the error without the URL, so the returned structure is not consistent with the successful case. At least, I think the URL should be returned as part of the data. Don't you think so?

Would it be possible to return the same values, or at least the URL, which is always useful to know when the check fails or is unhealthy?

With the current approach, when trying to get the URL with @result.Data["url"]
we get an exception because it is, logically, null.

Thoughts?
Thanks for your help,
Cesar.

Healthcheck middleware should allow sub policy

The healthcheck middleware should be able to provide the health status of specific checks.
For example, you have 3 HTTP endpoint checks (one for web payment, one for order management and one for user management), 2 certificates (one for the payment HTTP service, one for DataProtection), 1 database for cart management, ...
It should be possible to request the health of the payment sub-system, the order management sub-system, the certificates, the HTTP endpoints, ...

GET /health/certificates HTTP/1.1

HTTP/1.1
[
  {
    "name": "payment certificate",
    "status": "healthy"
  },
  {
    "name": "data protection certificate",
    "status": "warning",
    "description": "Certificate is about to expire."
  }
]

GET /health/databases HTTP/1.1

HTTP/1.1
[
  {
    "name": "cart database",
    "status": "unhealthy",
    "description": "Unable to reach the database"
  }
]

Sub policies could easily be identified by tags on the checks.
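
As a rough illustration of tag-based selection, the sketch below filters a set of checks by tag. `TaggedCheck` and `SubPolicy.WithTag` are hypothetical names; the library does not define tags, and how a request path such as /health/certificates maps to a tag is left open here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of a check carrying one or more tags.
public sealed class TaggedCheck
{
    public TaggedCheck(string name, params string[] tags)
    {
        Name = name;
        Tags = new HashSet<string>(tags, StringComparer.OrdinalIgnoreCase);
    }

    public string Name { get; }
    public ISet<string> Tags { get; }
}

public static class SubPolicy
{
    // Returns only the checks that carry the requested tag (case-insensitive).
    public static IReadOnlyList<TaggedCheck> WithTag(IEnumerable<TaggedCheck> checks, string tag) =>
        checks.Where(c => c.Tags.Contains(tag)).ToList();
}
```

Because a check can carry several tags, the payment certificate could belong to both a "certificates" and a "payment" sub policy.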

UrlChecker can exhaust sockets

In the current implementation of the UrlChecker for every check a new HttpClient is created. This can cause problems.

On MSDN it is stated under the remarks section:

HttpClient is intended to be instantiated once and re-used throughout the life of an application. Instantiating an HttpClient class for every request will exhaust the number of sockets available under heavy loads. This will result in SocketException errors.

Also more information about this issue can be found here: https://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/

I will try to create a minimal verifiable test project to demonstrate the problem soon, as well as perhaps a pull request to fix it. I am currently at work and don't have the time to do that now.
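
One possible shape for a fix, shown only as a sketch: keep a single process-wide HttpClient and reuse it for every check. `ReusableUrlChecker` is a hypothetical class, not the library's UrlChecker, and it reduces the result to a `bool`.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical checker (not the library's UrlChecker): all instances share
// one HttpClient unless the caller supplies its own.
public sealed class ReusableUrlChecker
{
    // Created once per process and reused for every request.
    private static readonly HttpClient SharedClient = new HttpClient();
    private readonly HttpClient _client;

    public ReusableUrlChecker(HttpClient client = null) => _client = client ?? SharedClient;

    public async Task<bool> CheckAsync(string url)
    {
        try
        {
            using (var response = await _client.GetAsync(url).ConfigureAwait(false))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException)
        {
            return false; // connection or protocol failure
        }
        catch (TaskCanceledException)
        {
            return false; // request timed out
        }
    }
}
```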

Add a way of creating extension packages

It would be nice to be able to create extensions for this library so that third-party extensions can be created and used, similar to how the loggers do this. For example, data extension packages for MySQL or PostgreSQL could be created, or OPC DA, XML, UA extensions.

Rename Microsoft.Extensions.HealthChecks.[Azure => AzureStorage]

Offer middleware option to use path instead of custom port

Implement a way to get the Health check services from Repository

Hi
From this blog https://scottsauber.com/2017/05/22/using-the-microsoft-aspnetcore-healthchecks-package/ I could see that the services to be watched by the health check need to be registered at startup. Can that be modified so that I can load those services from a repository just before the health check happens and then perform the health check?

This feature would help me to dynamically add more services to be watched. With the implementation described at the URL above, I need to restart my app after adding a new service to the health check watch.

When do you release the package?

Hi guys, do you have plans to release the nuget package soon?

Add ConfigureAwait to async API endpoints

Already implemented in an Open Source project

Looks like this comes from a Metrics.Net extension written for dotnet core.

My implementation, written a while back, looks very similar; I feel as though my efforts in open sourcing it will be wasted :(

Here's the Repo

HealthChecks functions should be passed a ServiceScope

I am trying to implement a DbContext health check. The Sql health check doesn't work for me as I could be using any of a number of database providers such as PostgreSQL.

However, the check function needs a scoped instance of the DbContext to test, and this is not available during the Startup.ConfigureServices method.

If the HealthCheckServices passed in the IServiceScope, then the check method could get any services it required using IServiceScope.ServiceProvider.

Health Checks should use transient fault handling

HealthChecks should use similar logic to transient fault handling to avoid false negative health reporting of transient faults

https://docs.microsoft.com/en-us/aspnet/aspnet/overview/developing-apps-with-windows-azure/building-real-world-cloud-apps-with-windows-azure/transient-fault-handling
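
A minimal sketch of the idea, assuming a check that returns a `bool`: retry a failing check a few times with exponential backoff before reporting it as unhealthy. `TransientRetry` is a hypothetical helper, and treating every exception as transient is a simplifying assumption; a real implementation would filter on known transient errors.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries a check with exponential backoff so that a
// single transient fault is not reported as an unhealthy result.
public static class TransientRetry
{
    public static async Task<bool> CheckWithRetryAsync(
        Func<Task<bool>> check, int maxAttempts, TimeSpan baseDelay)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                if (await check().ConfigureAwait(false))
                {
                    return true;
                }
            }
            catch (Exception)
            {
                // Assumption: every exception is treated as transient.
            }

            if (attempt < maxAttempts)
            {
                // Delay doubles on each attempt: baseDelay, 2x, 4x, ...
                var delay = TimeSpan.FromMilliseconds(
                    baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1));
                await Task.Delay(delay).ConfigureAwait(false);
            }
        }

        return false; // still failing after maxAttempts
    }
}
```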

Rename Microsoft.Extensions.HealthChecks.Data to Microsoft.Extensions.HealthChecks.SqlServer

Add default timeout on check

Checks can take very long, e.g. an HTTP request taking several seconds to respond.
Each check should be cancelled after a default timeout and considered unhealthy.
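
One way this could work, sketched with plain BCL types: race the check against a delay, cancel whichever loses, and report unhealthy on timeout. `CheckTimeout` is a hypothetical helper, and the `bool` result stands in for the library's result type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs a check with a timeout and treats both a
// timeout and a thrown exception as unhealthy (false).
public static class CheckTimeout
{
    public static async Task<bool> RunWithTimeoutAsync(
        Func<CancellationToken, Task<bool>> check, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            var checkTask = check(cts.Token);
            var delayTask = Task.Delay(timeout, cts.Token);

            var finished = await Task.WhenAny(checkTask, delayTask).ConfigureAwait(false);
            cts.Cancel(); // stops the timer, or asks a slow check to give up

            if (finished != checkTask)
            {
                return false; // timed out
            }

            try
            {
                return await checkTask.ConfigureAwait(false);
            }
            catch (Exception)
            {
                return false; // the check itself failed
            }
        }
    }
}
```

The check only stops early if it honors the token it is given; otherwise it keeps running in the background after the timeout is reported.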

Is this dead?

There are multiple open PRs with great new useful features dating back months. Has this project been abandoned?

Healthcheck middleware may expose checks results

The healthcheck middleware may expose the following information:

check name
check status
elapsed time
check status description
additional data (property bag, like dns lookup time, ssl handshake time, ...)

Exposing such information allows a monitoring tool to aggregate the health of the application over time.

Avoid usage of ConcurrentDictionary

In CompositeHealthCheckResult, a ConcurrentDictionary has replaced a basic Dictionary.
Thread safety is not necessary in this class if the health service does not add results from different threads.

Am I wrong?
@bradwilson

review items

Split interfaces into an abstractions package
Potentially remove IHealthCheckResult; an interface consisting only of properties is a smell we should understand and be sure about
Does IServiceProvider need to be exposed on the IHealthCheckService interface? Can we remove it?

Support of DI

Checks may support DI through the constructor.

Convert to csproj and VS 2017

@bradwilson you were going to start this and @Eilon was going to get someone to finish it, right?

Bug with exception handling introduced by parallelization

The parallelization code in HealthCheckService has a bug because Task.WhenAll will throw if any Task results in an exception. The try/catch for handling these exceptions is in the wrong place.
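
A sketch of one possible fix, not the repository's actual code: wrap each check in its own try/catch before handing the tasks to Task.WhenAll, so a throwing check becomes an unhealthy entry instead of faulting the whole batch. `ParallelChecks` and the string statuses are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelChecks
{
    // Runs every check concurrently and returns one status per check name.
    public static async Task<IReadOnlyDictionary<string, string>> RunAllAsync(
        IReadOnlyDictionary<string, Func<Task<bool>>> checks)
    {
        var tasks = checks.Select(pair => RunOneAsync(pair.Key, pair.Value));
        var results = await Task.WhenAll(tasks).ConfigureAwait(false);
        return results.ToDictionary(r => r.Name, r => r.Status);
    }

    // The try/catch lives around each individual check, so Task.WhenAll
    // never observes a faulted task.
    private static async Task<(string Name, string Status)> RunOneAsync(
        string name, Func<Task<bool>> check)
    {
        try
        {
            var healthy = await check().ConfigureAwait(false);
            return (name, healthy ? "healthy" : "unhealthy");
        }
        catch (Exception ex)
        {
            return (name, $"unhealthy: {ex.GetType().Name}");
        }
    }
}
```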

Implements Canary endpoint

The Canary endpoint provides a binary health status of the application: dead or alive. This endpoint is designed for HTTP load balancers, responding with an HTTP 200 or HTTP 503 status.

This endpoint may be controlled either by the IHealthCheckService or by an application switch.
For example:
The application must be upgraded. On the current node, the operator switches off the application.
Even when switched off, the application continues to respond to other requests.
The load balancer removes the current node from its list of healthy nodes and reroutes requests to other nodes.
The application no longer receives any requests and can be stopped/killed/upgraded.

The same can occur with the healthcheck service. If the application is unhealthy, the canary endpoint responds with 503 and the load balancer removes the current node from the list of healthy nodes.

This is similar to the app_offline.htm file, but app_offline.htm has the drawback of replying 502 to the clients.
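
The decision logic of such an endpoint is small enough to sketch. `CanarySwitch` is a hypothetical class; wiring it into middleware and obtaining the `healthy` flag from the health check service are left out.

```csharp
// Hypothetical canary state: the operator switch and the health status
// together decide between 200 and 503.
public sealed class CanarySwitch
{
    private volatile bool _online = true;

    // Operator takes the node out of rotation before an upgrade.
    public void SwitchOff() => _online = false;

    // Operator puts the node back into rotation.
    public void SwitchOn() => _online = true;

    // 200 only when the node is switched on and the checks are healthy.
    public int GetStatusCode(bool healthy) => _online && healthy ? 200 : 503;
}
```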

Better README

The current README is terrible. One has to browse through the sample to get an idea of what this lib does.

Implement HealthCheck for ServiceFabric cluster

Please consider implementing a ServiceFabric cluster HealthCheck

Target .NET Standard instead of netcoreapp v1

Consider adding Retry-After header on 503 response

According to HTTP 1.1, a 503 HTTP response MAY include a Retry-After header.

This header may contain the time remaining until the nearest cache expiration.
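
A small sketch of how that value could be computed, assuming the cache expiration times of the checks are available as a list. `RetryAfterHeader` is a hypothetical helper; it produces the delay-seconds form of Retry-After (a non-negative integer number of seconds).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class RetryAfterHeader
{
    // Returns the whole seconds (rounded up) until the earliest cache
    // expiration, or null when there is nothing cached to wait for.
    public static string FromCacheExpirations(
        IEnumerable<DateTimeOffset> expirations, DateTimeOffset now)
    {
        var list = expirations.ToList();
        if (list.Count == 0)
        {
            return null;
        }

        var seconds = (long)Math.Ceiling((list.Min() - now).TotalSeconds);
        return Math.Max(0L, seconds).ToString(CultureInfo.InvariantCulture);
    }
}
```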

Azure Queue Storage check fails if queue name is not provided

There is a bug in AzureHealthCheckBuilderExtensions.AddAzureQueueStorageCheck where the health check fails if a queue name is not provided because it attempts to get a queue with a "null" name.

Likewise, if the name is passed in, the health check fails to actually check that the queue exists.

Lines 153-158 currently read:

                    if (String.IsNullOrWhiteSpace(queueName))
                    {
                        var queue = queueClient.GetQueueReference(queueName);

                        result = await queue.ExistsAsync();
                    }

Line 153 should be:
if (!String.IsNullOrWhiteSpace(queueName))

Have builds of this repo go to a nightly feed

Add the build stuff to this repo for automated builds
Add stuff like NuGet Package Verifier
Code signing

README.md - "This project is part of ASP.NET Core."

The README.md says "This project is part of ASP.NET Core."
What does this mean?
Has this project moved?

NuGet package of HealthChecks

As soon as we have a first stable preview version, it would be good to have a NuGet package of it, so that samples and real usage rely on the package instead of on the GitHub code.
Even if it is a 0.2 version, I think it is worth having a NuGet package soon.

Apply Service Unavailable globally if healthcheck fails

If I understand this line correctly, the server will return a HTTP 503 if the path/port matches what has been registered. However if it doesn't then it will invoke the next middleware regardless of health. This seems odd, I would expect the server to return 503s if the healthchecks have failed. In my case, despite the "unhealthy" status, the pipeline continues and the controller inevitably fails for the same reasons as the health checks.

I think that the middleware should issue HTTP 503 in this case, or at least let this be configurable. More than happy to give a PR, if this feels sensible?

Implement HealthCheck for Azure Service Bus dependency

In a similar fashion to the RabbitMQ HealthCheck, it would be good to have one for Azure Service Bus, at these levels of testing:

Azure SB topic (usually, each topic in your Azure SB account will be owned by a different business application)
Testing a list of Azure SB subscriptions that are dependencies for a particular service/application
Testing a message Publish() into the Service Bus with a "Test" event/message type, so it is ignored by other subscribers
In any case, the scope should be defined in further detail.

Url/Check association

Hello,

It seems there is no way to associate a URL with a specific check

/url1 => check1
/url2 => check2

Is this something that is planned?

Thank you

Rename generic-named Extensions containers to more feature-specific names

This is to avoid having the same namespace+type name in multiple assemblies. (The specific name doesn't matter that much, but something like HealthChecksServiceCollectionExtensions would be fine.)

See:

Implement HealthCheck for RabbitMQ dependency

This probably could be done at two levels:

Checking just the RabbitMQ service/server availability which is a dependency for your application or microservice, like just based on the address and port, like:
rabbitmq_address:5672
This might be too limited, though...
Checking a collection of specific queues and RoutingKeys (Like testing subscriptions connectivity) based on parameters like:
queue: _queueName
exchange: _brokerName
routingKey: eventName

Probably, being able to check for a list of queues that your app/service depend on looks important for a RabbitMQ HealthCheck.
Doing that only at the server/service level seems too high level, initially...

HealthCheck class design

The HealthCheck class is not DI friendly. The constructor takes value parameters such as the check name, and the check behavior through the Func<CancellationToken, ValueTask<IHealthCheckResult>> check parameter (ouch....)

As the HealthCheck class is responsible for the caching behavior, it should be the principal way to implement IHealthCheck.
Related to #23.

Implement HealthCheck for Azure DocumentDB

Please consider implementing a HealthCheck for Azure DocumentDB