ks888 / lambstatus Goto Github PK


1.3K 31.0 121.0 6.43 MB

[Maintenance mode] Serverless Status Page System

Home Page: https://lambstatus.github.io

License: Apache License 2.0

JavaScript 96.49% HTML 0.12% Shell 0.27% CSS 3.12%

serverless aws-lambda statuspage react redux nodejs javascript cloudformation lambda

lambstatus's People

Contributors

Stargazers

Watchers

Forkers

knakayama pavelnikolov cinderellagarage neo4reo kindlyops jdrew1303 ezanltd rgaidot zer0id0l pilgrim2go aavileli izogain ajohnstone kinlane reallukemartin vgmartinez trybeapps jpike88 adamclark-dev liamleane artbikes gustavoesser kbariotis rmoreira neuroradiology joelferrier johnmcdowall bkrukowski volkancakil intfrr aswinsatish nodomain iskto ulan08 ha-king xyntrix wmnnd j-collier burtcorp looper25 hc747 mijdavis2 rcdexta joshuatonga rniedosmialek frodeaa ms4720 salekseev daniloalsilva tied ralucas ruie jvsantana kudobuzz dakotabenjamin saurabhdevops aarym felipes digital360 jtszalay pgrzesik hhy5277 satnami tundaware orez- gibbsie beck luckylinkteam panacea101 alexanderiakovlev schuringa heinicke ik-networking univrs jpbostic engineal mobilustechnologies giladno roastlog fireball1725 ik-monitoring jacobjohansen wangruhua rehanvdm vfondevilla pks-os michaelcosby devopsutils doc22940 rtulke sprohaska orquestracd bitpesa apollusehs-oss modulexcite anton-mesnyankin gothamtommy massimoselvi maniacs-oss radekl

lambstatus's Issues

Integrate with New Relic Metrics

So far, LambStatus can import metrics only from CloudWatch Metrics. New Relic is a widely used monitoring SaaS, so let's support it as well.

Ability to edit Incident timeline

As a Status Admin I want the ability to edit incident timelines so that I can make updates to the timeline notes (Like inaccurate spelling or updates on a url) without having to delete the entire incident.

[Metrics] Add an option to choose the statistics

From @stephencornelius' comment on gitter:

stephencornelius @stephencornelius Jul 14 19:36
Hi, ive just setup lambstatus and really liking it so far, very easy to get working. However I’ve got a question about metrics, im adding cloudwatch metrics and was wondering if theres anyway to specify the statistic? e.g. for ELB 2xx status code the only statistic that makes sense is sum, all others just report 1, im guessing average is chosen by default?

Kishin Yagami @ks888 22:53
@stephencornelius Thank you for using LambStatus. You're right. On collecting the datapoints from CloudWatch, 'Average' statistic is used. Unfortunately there is no option to choose the other statistics. I guess it is better to add an option to choose them.
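
A minimal sketch of how such an option might be threaded into the CloudWatch request. The helper name `buildGetMetricStatisticsParams` and the `statistic` field on the metric settings are hypothetical, not taken from the LambStatus code; the keys of the returned object are the parameter names of the CloudWatch `GetMetricStatistics` API.

```javascript
// Statistics accepted by the CloudWatch GetMetricStatistics API.
const VALID_STATISTICS = ['Average', 'Sum', 'Minimum', 'Maximum', 'SampleCount'];

// Hypothetical helper: builds the request parameters for one registered metric.
// Falls back to 'Average' (the behaviour described above) when no statistic is set.
function buildGetMetricStatisticsParams(metric, startTime, endTime, periodSeconds) {
  const statistic = metric.statistic || 'Average';
  if (!VALID_STATISTICS.includes(statistic)) {
    throw new Error(`Unsupported statistic: ${statistic}`);
  }
  return {
    Namespace: metric.namespace,
    MetricName: metric.metricName,
    Dimensions: metric.dimensions,
    StartTime: startTime,
    EndTime: endTime,
    Period: periodSeconds,
    Statistics: [statistic]
  };
}
```

For the ELB 2xx case above, the metric settings would carry `statistic: 'Sum'`, and the code that reads the response would then pick the `Sum` field of each datapoint instead of `Average`.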

Add 'Allowed Pattern' property to User Parameters of CloudFormation template

The UserEmail and UserName parameters of the CloudFormation template accept any value, which later causes errors like #6.

Add the AllowedPattern property to these parameters so that wrong values are caught as early as possible.
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/parameters-section-structure.html

Change the color of the header according to the service status

So far, the color of the header is always green:

A user will notice an incident immediately if this color changes according to the service status, like the red color for major outage. GitHub Status page actually changes the title logo when the incident happens.

Fetch cloud watch metrics from other aws accounts

Would be good to also specify different accounts to fetch metrics from too.

S3 website endpoints

The concatenation used to form the S3 website endpoints used by CloudFront is broken for regions that use the .region instead of the -region format (Ohio, Canada, Mumbai, Seoul, Frankfurt and London)

http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteEndpoints.html
The two general forms of an Amazon S3 website endpoint are as follows:
bucket-name.s3-website-region.amazonaws.com (dash region)
bucket-name.s3-website.region.amazonaws.com (dot region)

Listing of all region endpoints
http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_website_region_endpoints

Including a mapping lookup as a possible fix?
https://blog.doismellburning.co.uk/pointing-aws-cloudfront-at-an-s3-website-with-cloudformation/

Admin and Status page CloudFront distribution origins
https://github.com/ks888/LambStatus/blob/v0.3.0/cloudformation/lamb-status.yml#L2524
https://github.com/ks888/LambStatus/blob/v0.3.0/cloudformation/lamb-status.yml#L2573
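
The mapping-lookup idea can be sketched as follows. This is only an illustration in JavaScript: in the template itself the equivalent would be a CloudFormation mapping, and the function name `s3WebsiteEndpoint` is hypothetical. The set of dot-format regions is the one listed in this issue and would need to be extended as regions are added.

```javascript
// Regions whose S3 website endpoint uses the dot form, as listed in this issue:
// Ohio, Canada, Mumbai, Seoul, Frankfurt and London.
const DOT_FORMAT_REGIONS = new Set([
  'us-east-2',
  'ca-central-1',
  'ap-south-1',
  'ap-northeast-2',
  'eu-central-1',
  'eu-west-2'
]);

// Hypothetical helper: picks the separator per region instead of always using a dash.
function s3WebsiteEndpoint(bucketName, region) {
  const separator = DOT_FORMAT_REGIONS.has(region) ? '.' : '-';
  return `${bucketName}.s3-website${separator}${region}.amazonaws.com`;
}
```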

Hide empty incident cards

This looks ugly:

Couple of suggestions

Hello,

When adding a Service Name on the settings page, it combines whatever you enter there with the green Status text. I.E. My AWS MonitoringStatus . There needs to be a space between the service name entered and the word Status.

The other thing, is when adding scheduled Maintenance. The word currently shown on the Status page is Maintenances. It should say Maintenance.

Nothing huge, but it helps with the presentation :)

Link cloudwatch alarms to component status

From @maximede's comment on gitter:

Maxime Deravet @maximede 00:16
Great! Thanks! Also, is there any plan to link cloudwatch alarms to component status?

Kishin Yagami @ks888 09:53
Such an integration sounds interesting. I don't have a specific plan to support it, though.
Anyway, it would be good to create the issue at first.
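
One way such an integration might look, as a sketch only: a Lambda function subscribed to the SNS topic of the alarms translates each state change into a component status update. `AlarmName` and `NewStateValue` are fields of the CloudWatch alarm notification; the status labels, the alarm-to-component table, and the function name are assumptions for illustration.

```javascript
// Assumed mapping from CloudWatch alarm states to component status labels.
// INSUFFICIENT_DATA maps to null, meaning "leave the component unchanged".
const STATUS_BY_ALARM_STATE = {
  OK: 'Operational',
  ALARM: 'Major Outage',
  INSUFFICIENT_DATA: null
};

// Hypothetical helper: turns an SNS-triggered Lambda event into a list of
// { componentID, status } updates. componentIdByAlarmName is a user-supplied
// table linking alarm names to component IDs.
function statusUpdatesFromSnsEvent(event, componentIdByAlarmName) {
  const updates = [];
  for (const record of event.Records) {
    const alarm = JSON.parse(record.Sns.Message);
    const componentID = componentIdByAlarmName[alarm.AlarmName];
    const status = STATUS_BY_ALARM_STATE[alarm.NewStateValue];
    if (componentID && status) {
      updates.push({ componentID, status });
    }
  }
  return updates;
}
```

Applying the updates is left out on purpose, since it depends on how component status is stored and changed.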

[Metrics] Do not connect the points if there is the missing data in between

From antonivs's comment on reddit:

One other issue that I haven't investigated yet is that the site I'm testing with
happened to be down for a number hours on Friday, but for some reason I
don't see any gap on the LambStatus metrics chart, although there's a clear
gap in the charts on CloudWatch. If I find out more, I'll let you know the details.

The LambStatus metrics chart connects the points with straight lines and doesn't check whether any data is missing between them. CloudWatch Metrics seems to connect points only when there is no missing data in between, which is probably why the two charts look different. This issue fixes that.
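
A possible approach, sketched under the assumption that each datapoint has a numeric `timestamp` in milliseconds and that the datapoints are sorted: split the series into segments wherever two neighbours are further apart than the expected collection interval, then draw each segment as its own line. The function name and the datapoint shape are hypothetical.

```javascript
// Splits sorted datapoints into segments. A new segment starts whenever the
// gap between two neighbouring points exceeds maxGapMs.
function splitAtGaps(datapoints, maxGapMs) {
  const segments = [];
  let current = [];
  for (const point of datapoints) {
    const previous = current[current.length - 1];
    if (previous && point.timestamp - previous.timestamp > maxGapMs) {
      segments.push(current);
      current = [];
    }
    current.push(point);
  }
  if (current.length > 0) {
    segments.push(current);
  }
  return segments;
}
```

With datapoints collected every minute, a `maxGapMs` slightly above 60000 would keep normal jitter connected while leaving a visible gap for an outage of several hours.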

Have separate URL for latest commit on master

So that nobody has to wait for a release to take advantage of the latest commit, and since the template must be hosted in S3, it's probably worth having another 'Create Stack' button that lets people use a template built from the latest commit (like a 'beta' button or something). That way people are less likely to feel the need to fork or do something manual.

Introduce Wercker for testing

Introduce LambCI, the nice alternative to SaaS CI.

Also, it may reveal some advantages (and disadvantages) of this kind of tool.

Support scheduled maintenance

Sometimes scheduled maintenance is inevitable. In that case, it is important to announce the maintenance schedule in advance. This issue implements the features needed to support this.

Support custom CSS

Suggestion from @kaustubhmenon #14 (comment)

Suggestion:
Integrate a new CSS/SCSS file for manipulating the UI, without touching any core SCSS files. Could also be similar to Edit CSS in Wordpress (https://en.support.wordpress.com/custom-design/editing-css/) . While inspecting the structure I noticed that all class names are appended with a random string in the end which could be a possible blocker for targeting CSS classes?

My reply:
StatusPage.io also has a similar feature: https://help.statuspage.io/knowledge_base/topics/using-custom-css . Although the CSS class names contain a random string because CSS Modules are used, maybe we can give the principal UI components an id attribute so that the design can be customized like this:

#container {
    width: 90%;
    max-width: 850px;
}

Integrate with CloudWatch Metrics

To show the recent performance of the web service, it is good to have graphs of the response time, uptime, and so on. Although LambStatus will not monitor the service itself, it can integrate with other monitoring SaaS.

I'm going to start with CloudWatch Metrics since we don't need another account to try it.

TODOs:

List available metrics
Register the metrics whose graph will be shown on the StatusPage
Periodically collect the data of registered metrics
Show the collected data on the StatusPage

Created the show-cloudwatch-metrics branch for this purpose.

Launch Stack failed

I clicked the Launch Stack button, but the stack creation failed.
Here are some errors:

...
21:49:36 UTC+1100	DELETE_IN_PROGRESS	AWS::IAM::Role	CognitoSMSCallerRole	
21:49:36 UTC+1100	DELETE_IN_PROGRESS	AWS::S3::BucketPolicy	StatusPageS3BucketPolicy	
21:49:29 UTC+1100	ROLLBACK_IN_PROGRESS	AWS::CloudFormation::Stack	StatusPage	The following resource(s) failed to create: [AdminPageFrontend, StatusPageDistribution, AdminPageDistribution, LambdaRoleInstanceProfile, StatusPageFrontend]. . Rollback requested by user.
21:49:27 UTC+1100	CREATE_FAILED	AWS::CloudFront::Distribution	StatusPageDistribution	Resource creation cancelled
21:49:27 UTC+1100	CREATE_FAILED	AWS::IAM::InstanceProfile	LambdaRoleInstanceProfile	Resource creation cancelled
21:49:27 UTC+1100	CREATE_FAILED	AWS::CloudFront::Distribution	AdminPageDistribution	Resource creation cancelled
21:49:27 UTC+1100	CREATE_FAILED	Custom::S3SyncObjects	AdminPageFrontend	Data returned must be an object
21:49:26 UTC+1100	CREATE_COMPLETE	AWS::DynamoDB::Table	ServiceComponentTable	
21:49:26 UTC+1100	CREATE_FAILED	Custom::S3SyncObjects	StatusPageFrontend	Data returned must be an object
21:49:26 UTC+1100	CREATE_COMPLETE	AWS::DynamoDB::Table	IncidentTable	
21:49:26 UTC+1100	CREATE_COMPLETE	AWS::DynamoDB::Table	IncidentUpdateTable	
21:49:23 UTC+1100	CREATE_COMPLETE	AWS::Lambda::Permission	GetComponentsLambdaInvokePermission	
...

Integrate uptime testing

https://github.com/streaka/lambda-ping

The problem is there is no 'uptime' metric naturally occurring on cloudwatch. It would be nice to package with this an optional lambda function that pings on a fixed schedule, making that available in cloudwatch for metric selection (and it can be a 'recommended metric' or something in the UI).

I'm using the above script to add 'uptime' metrics and whatever else might be handy... i'll see if I can integrate this in.

Could you make me a contributor and I can publish on another branch instead of a fork?

Customize the header and footer html

Suggestion from @kaustubhmenon #14 (comment)

Suggestion:
Ability to add external links to the footer or header. Would love to add links for documentation, support, etc.
Being able to manipulate the structure of the Header, Main Container and Footer would be an added bonus if possible. Something similar to https://www.statuspage.io/features/customization.

My reply:
Maybe we can support these customizations by having the feature to customize the header html and footer html.

Auto update?

At the moment, to my knowledge, the only way to update is through a whole new AWS CF Stack. I was wondering if Lambda could be used somehow to auto update when a new update is pushed on github.

Fetch CloudWatch metrics from other regions

From @maximede's comment on gitter:

Maxime Deravet @maximede 08:29
Hey there !
I'm a bit confused by what the github repo states :
Choose the AWS region different from your service's region. If both your service and its status page rely on the same region, the region outage may stop both.
I'm not sure how I can achieve that as, when I try to add a metric, I don't have access to the metrics coming from a different region. Am I missing something?

Kishin Yagami @ks888 09:52
Oops, that’s a bug. Only the metrics from the LambStatus' region are fetched. The metrics from other regions should be fetched, too. I will fix this on the weekend and release the new version.

Feature suggestions

I have a lot of suggestions, so I thought it would be a good idea to combine them all here.

Component Grouping #10
Two factor authentication #13
AWS SNS integration #11
Custom colours/ styling changeable via a settings page #12
- This could be done by having the S3 page trigger a lambda event in order to modify the CSS in S3
SSO
- E.g. With google/ facebook

Component Groups

Allow grouping of components.

E.g:

The password email may be sent to the spam folder

From antonivs's comment on reddit:

Edit: Forgot to mention, I installed it with no trouble except that I was dutifully
waiting 20+ minutes for the password email, when I realized that it had gone
into my junk folder like 15 minutes before. I guess that's because the from
address is invalid.

Settings page

Add a settings page for things like custom CSS, colours and custom domains.

Update stack fails

The bucket doesn't exist...

Event model?

Hi!

I'm really interested in LambStatus, particularly because all of our applications are using the serverless framework.

Nonetheless, I feel like LambStatus could use an "Event" model. e.g. A client would ping the Event API at the end of a successful deployment.

Currently we are doing this with Cachet as a "incident". Though to my eyes, a deployment is not an incident.

The bigger question is are status pages meant to be a repository of events? Taking a quick look around at the public status pages, no one seems to be doing this.

Format CloudWatch bytes into kb, MB, GB, TB, etc.

CloudWatch data comes back as bytes. Being able to convert 4352360851042 into 3.958 TB would be useful and more human-friendly.
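
A minimal sketch of such a conversion, assuming binary units (1 KB = 1024 bytes), which is what makes the example above come out as 3.958 TB. The function name and the default of three fraction digits are illustrative choices.

```javascript
const BYTE_UNITS = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];

// Converts a byte count into a human-friendly string using 1024-based units.
// Plain bytes are shown without a fractional part.
function formatBytes(bytes, fractionDigits = 3) {
  let value = bytes;
  let unitIndex = 0;
  while (Math.abs(value) >= 1024 && unitIndex < BYTE_UNITS.length - 1) {
    value /= 1024;
    unitIndex += 1;
  }
  const digits = unitIndex === 0 ? 0 : fractionDigits;
  return `${value.toFixed(digits)} ${BYTE_UNITS[unitIndex]}`;
}

console.log(formatBytes(4352360851042)); // prints "3.958 TB"
```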

Delete stack fails

Status reason:
The following resource(s) failed to delete: [AdminPageS3, StatusPageS3].

Do we need doc on stack delete? Or can this be covered in stack delete? I'll try to just delete the buckets and run stack delete again..

Broken on latest commit

Settings page is breaking showing:

And status page is stuck on fetching data. It looks like the get-settings and get-public-settings functions are both not working properly

Settings table in DynamoDB just contains this (not sure if good or bad):

Logs say this for /aws/lambda/myapp-GetSettings in CloudWatch

Logs say this for /aws/lambda/myapp-GetPublicSettings in CloudWatch

Because the code is minified and web packed etc, no idea how to begin to fix this myself. You may need to provide some info on how to load up the lambda projects and debug them etc as I don't know how to set up a similar development workflow

Adjust the range of Y-axis as well when the metrics are averaged

From antonivs's comment on reddit:

Hourly is fine. However, the issue is that when weekly or monthly is selected,
the range of the Y axis is not adjusted to match the range of the data, which
I think is the reason that the lines then appear "squashed" almost flat. I saw
this with my own site data, as well as with your demo.
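
The fix could be sketched like this: compute the Y-axis range from the averaged values that are actually plotted, not from the raw datapoints. Both function names, the grouping by a fixed count, and the 10% padding are assumptions for illustration.

```javascript
// Averages consecutive groups of `groupSize` raw values.
function averageInGroups(values, groupSize) {
  const averaged = [];
  for (let i = 0; i < values.length; i += groupSize) {
    const group = values.slice(i, i + groupSize);
    averaged.push(group.reduce((sum, v) => sum + v, 0) / group.length);
  }
  return averaged;
}

// Computes the Y-axis range from the plotted values, with some padding so the
// line does not touch the chart border. Returns null for an empty series.
function yAxisRange(plottedValues, paddingRatio = 0.1) {
  if (plottedValues.length === 0) {
    return null;
  }
  const min = Math.min(...plottedValues);
  const max = Math.max(...plottedValues);
  // Fall back to the value itself (or 1) when the series is flat.
  const padding = (max - min || Math.abs(max) || 1) * paddingRatio;
  return { min: min - padding, max: max + padding };
}
```

Calling `yAxisRange(averageInGroups(rawValues, n))` whenever the time frame changes would keep the weekly and monthly views from looking squashed.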

Show the uptime percentage by component

Feature suggestion from @GustavoEsser:

Show the uptime percentage of components. Maybe the view like the screenshot below looks good.
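
As a rough sketch of the calculation behind such a view: given a component's status changes over time, sum the time spent in the operational state and divide by the length of the window. The record shape (`timestamp` in milliseconds, `status` string), the `'Operational'` label, and the function name are assumptions.

```javascript
// Computes the percentage of [windowStart, windowEnd) during which the
// component was 'Operational'. statusChanges must be sorted by timestamp;
// changes before windowStart only set the status at the start of the window.
function uptimePercentage(statusChanges, windowStart, windowEnd, initialStatus = 'Operational') {
  let status = initialStatus;
  let cursor = windowStart;
  let upMs = 0;
  for (const change of statusChanges) {
    if (change.timestamp >= windowEnd) {
      break;
    }
    if (change.timestamp > cursor) {
      if (status === 'Operational') {
        upMs += change.timestamp - cursor;
      }
      cursor = change.timestamp;
    }
    status = change.status;
  }
  if (status === 'Operational') {
    upMs += windowEnd - cursor;
  }
  return (upMs / (windowEnd - windowStart)) * 100;
}
```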

API to create and update an incident

From J4cku's comment on reddit:

[–]J4cku

Hi, one question, does it have API so that my internal monitoring
can push info when one of my components go down and remove
it when it's up again?

[–]kyagami[S]

Thank you for asking! So far, there is no API to post an incident
and change components' status. However, it seems some users
need such an API and I'm going to implement it within one month.

I think the API will (partially?) solve the issue #41 and so it will help some users.

Email notification to users

When incidents are created/updated, send the notification to users.

Set CloudFront in front of API gateway

So far, clients access the API Gateway directly. It works, but the API Gateway and the corresponding Lambda functions often repeat the same computations, which generates unnecessary costs. If an API is public and its responses can be safely shared among users, the responses should be cached.

API Gateway has its own cache mechanism, but it's costly. So I think putting CloudFront in front of the API Gateway and using CloudFront's cache is the most cost-effective option.

Let the external-metrics API support the pagination.

So far, the GET external-metrics API returns all the metrics at once. If someone has a large number of metrics, the request may time out.

This issue adds pagination support to the external-metrics API.
(The frontend's action also changes so that all the metrics are still fetched in the end.)
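
A sketch of what the two halves might look like, assuming a simple offset-based cursor. The response shape (`metrics`, `nextCursor`) and both function names are hypothetical; `fetchPage` stands in for whatever function the frontend uses to call the API.

```javascript
// Server side: returns one page of metrics plus the cursor for the next page,
// or null when there is nothing left.
function paginate(metrics, cursor, pageSize) {
  const start = cursor ? Number(cursor) : 0;
  const end = start + pageSize;
  return {
    metrics: metrics.slice(start, end),
    nextCursor: end < metrics.length ? String(end) : null
  };
}

// Frontend side: keeps requesting pages until the API stops returning a cursor,
// so the caller still ends up with the complete list.
async function fetchAllMetrics(fetchPage) {
  const all = [];
  let cursor = null;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.metrics);
    cursor = page.nextCursor;
  } while (cursor !== null);
  return all;
}
```

Each request then handles a bounded amount of data, which is the point of the change. The frontend could also render partial results as pages arrive.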

Publish rss feeds to tell the incidents

It is important to send incident notifications, and we already have some issues to handle this (#17, #18). As a first step, I'm going to support RSS feeds that announce incidents.

Git ignore .env

Folks will clone this repo and change the .env file. That change can't be committed, which makes it hard for them to keep their repo in sync with this one.

What if we gitignore .env and add a .env-example for reference?

Update of CF UserEmail param after stack create

I noticed that I had put the wrong domain in my email address when creating the stack :(
That's my bad, but I was wondering whether a stack update with the correct email address could send the email out to me, instead of my having to re-create the stack?

It's minor, but worth considering.

SSL Certificate reject if not in US-east region

The rounded value in the tooltip causes the problem

From @blw9u2012's comment on gitter:

Brandon Walton @blw9u2012 06:02
@ks888 i'm having issues with displaying metrics from our target groups. It displays the correct graph but the Y axis is only showing a range between 0 and 1. Am i goofing somewhere?
i want to display the Y axis in milliseconds
i've added a metric from the AWS/ApplicationELB namespace and added the TargetResponseTime metric name and dimension for one of our load balancers

Kishin Yagami @ks888 22:35
@blw9u2012 Thank you for asking. The data points are averaged over 5 minutes when the 'Day' time frame is selected. Perhaps the maximum value of your data points is higher than 1.0, but the maximum value of the AVERAGED values is lower than 1.0.
Or, the rounded value in the tooltip may be the problem. Even if the actual value is 0.01, the value in the tooltip is rounded and will be 0. Such behavior seems inconvenient in your case, so it should be fixed.

Brandon Walton @blw9u2012 23:09
@ks888 thanks for responding! I also looked into the metrics and the unit that the metric TargetResponseTime returns is in seconds vs the latency metric in the demo site being milliseconds. I don't think that you can specify another unit to be returned other than seconds. I thought that this possibly could be related but it sounds like the latter issue you described. In any event thanks for responding!
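
One way to avoid the problem described above is to round to a number of significant digits instead of to an integer, so that small values such as 0.01 survive. This is a sketch only: the function name and the choice of three significant digits are assumptions, not the fix that was actually shipped.

```javascript
// Formats a tooltip value with a fixed number of significant digits.
// Values large enough to need no fraction are rounded to an integer.
function formatTooltipValue(value, significantDigits = 3) {
  if (value === 0) {
    return '0';
  }
  if (Math.abs(value) >= 10 ** significantDigits) {
    return String(Math.round(value));
  }
  // toPrecision pads with zeros ("0.0100"); Number() strips them again.
  return String(Number(value.toPrecision(significantDigits)));
}

console.log(formatTooltipValue(0.01)); // prints "0.01"
```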

Change the background color of the status page

The suggestion from VTHokie2015 at reddit:

[–]VTHokie2015 2 points 10 hours ago 
I think you could make the background a light grey and make the cards have white background
to add more distinction

[–]kyagami 1 point 2 minutes ago 
Wow, I tried your idea and it surely improved the page design! I will change the background color
or maybe enable a user to change that color in the settings page! Thank you!

When the color is #FAFAFA, the status page looks like this:

My concern is the conflict with the background of the logo image, though LambStatus does not support the feature to change the logo image of the status page header for now. So maybe it's better to enable a user to specify the background color in the settings page.

Support user authentication

So far, the admin page is not protected by user authentication. Anyone who knows the URL of the admin page can change the service status. To stop this, support user authentication (maybe using Amazon Cognito User Pools).

At least these functions are necessary:

The admin user can invite a new user (OK to use AWS Console)
The invited user can do an initial setup.
The user can sign in/out.
A user who forgets the password can reset it.
Protect API Gateway so that only authenticated users can call its APIs.
Create Cognito User Pools using CloudFormation (note that CloudFormation does not have a Cognito resource)

LambStatus API to post the metric data

From @ccannell's comment on gitter:

Christopher Cannell @ccannell 05:33
I've deployed LambStatus and now I'm a bit lost on the next step. I'd like to check an HTTP server status regularly. What is the best way to approach that? Is there documentation on the LambStatus API? Should I run a periodic lambda to poll my HTTP server and then post to LambStatus API on changes?

Kishin Yagami @ks888 10:42
Thank you for asking. Since LambStatus has few features for monitoring the service, such as alarm, it's better to post your server's status to your monitoring service at first. Then, integrate the monitoring service with LambStatus. If your monitoring service is CloudWatch, the integration is easy. If not, it is harder because there is no LambStatus API to post the data.
LambStatus API to post the data sounds nice. Maybe it's better to support it.

Ability to add notes at a specific time rather than just the current time.

As a Status Admin I want the ability to inject notes at a specific time into the timeline so that when I get additional information about an event that occurred in the past I can add when that event occurred with an accurate timestamp.

Use case: I am tracking an incident for a web page outage. I find out that DNS was changed at 4:10 PM, but I was not notified until 4:30 PM. I want to add the event at the time it occurred (4:10 PM) rather than at the time I had a chance to post it (4:30 PM).

separate Lambda function that actually does the checking (I would make it modular as there will be a multitude of ways that users want their systems checked)
display/import stats from influxdb/grafana
embed a twitter feed of selected user

If I get some time I might throw together a PR or 2 with these implemented
