etcd-io / discovery.etcd.io

Kubernetes manifests powering discovery.etcd.io

License: Apache License 2.0

HCL 72.21% Shell 7.67% Mustache 20.13%

discovery.etcd.io's Issues

discovery.etcd.io not resolving

$ dig discovery.etcd.io

; <<>> DiG 9.10.3-P4-Debian <<>> discovery.etcd.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 39690
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;discovery.etcd.io.             IN      A

;; AUTHORITY SECTION:
etcd.io.                1748    IN      SOA     dns1.p06.nsone.net. hostmaster.nsone.net. 1559080105 43200 7200 1209600 3600

;; Query time: 41 msec
;; SERVER: 100.115.92.193#53(100.115.92.193)
;; WHEN: Tue May 28 15:45:42 PDT 2019
;; MSG SIZE  rcvd: 111

discovery.etcd.io expired cert 2024-05-10

Thanks for maintaining this service.

In case this is not already known, https://discovery.etcd.io/ currently has an expired certificate:

$ openssl s_client -servername discovery.etcd.io -connect 35.225.64.149:443 2>/dev/null | openssl x509 -noout -dates
notBefore=Feb 10 12:43:27 2024 GMT
notAfter=May 10 12:43:26 2024 GMT
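The `notAfter` check above can be turned into a pass/fail test. A minimal sketch, assuming GNU `date` (whose `-d` option parses OpenSSL's date format); the helper names `cert_seconds_left` and `cert_is_valid` are hypothetical:

```shell
#!/bin/sh
# Print the seconds remaining until a certificate's notAfter date.
# Accepts "notAfter=May 10 12:43:26 2024 GMT" (openssl x509 -enddate output)
# or just the date part. Optional second argument: "now" as epoch seconds.
# Assumes GNU date; its -d option parses OpenSSL's date format.
cert_seconds_left() {
  local not_after expiry now
  not_after=${1#notAfter=}
  expiry=$(date -u -d "$not_after" +%s) || return 2
  now=${2:-$(date -u +%s)}
  echo $((expiry - now))
}

# Exit status 0 while the certificate is still valid, 1 once it has
# expired, 2 if the date could not be parsed.
cert_is_valid() {
  local left
  left=$(cert_seconds_left "$@") || return 2
  [ "$left" -gt 0 ]
}
```

For a live check, the input could come from `openssl s_client -servername discovery.etcd.io -connect discovery.etcd.io:443 </dev/null 2>/dev/null | openssl x509 -noout -enddate`; `openssl x509 -checkend 0` gives a similar yes/no answer directly from the certificate.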

SSL certificate expired 2022-02-20

The discovery.etcd.io service can't be used to create new etcd clusters because the domain certificate has expired.

expire date: Feb 20 09:35:52 2022 GMT

curl, verbose output showing (and ignoring) cert errors.

$ curl -vk  https://discovery.etcd.io/new
*   Trying 35.225.64.149...
* TCP_NODELAY set
* Connected to discovery.etcd.io (35.225.64.149) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=discovery.etcd.io
*  start date: Nov 22 09:35:53 2021 GMT
*  expire date: Feb 20 09:35:52 2022 GMT
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify result: certificate has expired (10), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* Using Stream ID: 1 (easy handle 0x55ef5cc77620)
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
> GET /new HTTP/2
> Host: discovery.etcd.io
> User-Agent: curl/7.58.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
< HTTP/2 200 
< date: Sun, 20 Feb 2022 20:55:57 GMT
< content-type: text/plain; charset=utf-8
< content-length: 58
< strict-transport-security: max-age=15724800; includeSubDomains
< 
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* Connection #0 to host discovery.etcd.io left intact
https://discovery.etcd.io/ad9d3e1202fcef9afe39a62610d08509

OpenSSL output.

$ openssl s_client -connect discovery.etcd.io:443 -showcerts
CONNECTED(00000005)
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = discovery.etcd.io
verify error:num=10:certificate has expired
notAfter=Feb 20 09:35:52 2022 GMT
verify return:1
depth=0 CN = discovery.etcd.io
notAfter=Feb 20 09:35:52 2022 GMT
verify return:1
---
<removed>
---
SSL handshake has read 4603 bytes and written 399 bytes
Verification error: certificate has expired
---
<removed>

Default TTL of token?

@philips @idvoretskyi Could you please let me know the expiry time of a token? I'm trying to document it in OpenStack Magnum so that we can rely on this service with more confidence. Thank you.

Implement Cloud Build for CI/CD

Once etcd-io/etcd#10627 is merged and #9 is finished, we should implement Cloud Build to automatically build and deploy discoveryserver to the dev environment.

By invalidating all old etcd discovery tokens you have broken my workflow.

etcd 3.3.9

In response to coreos/discovery.etcd.io#64 (comment)

Previous workflow.

Create a discovery URL, then create a 5-node etcd cluster in an ASG.

When I want to roll over one of those nodes, I do the following:

Delete the old node (shutting down etcd first)
On a current cluster member, remove the old member
On a current cluster member, add the new member
On the replacement node, create a drop-in systemd unit without discovery and with the current members plus itself, daemon-reload, restart etcd
Confirm success, remove the drop-in to restore the original unit, daemon-reload
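The rollover steps above could be scripted roughly as follows. This is a sketch under assumptions: the etcd v3 `etcdctl` API, a base unit named `etcd-member.service` (the unit that runs `etcd-wrapper` on Container Linux) that takes the discovery token from `ETCD_DISCOVERY` rather than a `--discovery` flag, and hypothetical member names, IPs, helper name and drop-in path:

```shell
#!/bin/sh
# Rollover sketch (etcd v3 API). Names, IPs and the drop-in path below are
# hypothetical placeholders.
#
# 1. On a current member, swap the membership:
#      ETCDCTL_API=3 etcdctl member remove <OLD_MEMBER_ID>
#      ETCDCTL_API=3 etcdctl member add <NEW_NAME> --peer-urls=https://<NEW_IP>:2380
#    `member add` prints ETCD_NAME, ETCD_INITIAL_CLUSTER and
#    ETCD_INITIAL_CLUSTER_STATE for the new member.
#
# 2. On the replacement node, render a drop-in from those values so etcd
#    joins the existing cluster instead of contacting the discovery service.
render_rollover_dropin() {
  local name cluster
  name=$1     # ETCD_NAME printed by `member add`
  cluster=$2  # ETCD_INITIAL_CLUSTER printed by `member add`
  cat <<EOF
[Service]
# Empty value: etcd starts without a discovery URL (assumes the base unit
# passes the token via ETCD_DISCOVERY, not a --discovery flag).
Environment="ETCD_DISCOVERY="
Environment="ETCD_NAME=${name}"
Environment="ETCD_INITIAL_CLUSTER=${cluster}"
Environment="ETCD_INITIAL_CLUSTER_STATE=existing"
EOF
}

# Example (run as root on the replacement node; cluster list abbreviated,
# use the full list printed by `member add`):
#   mkdir -p /etc/systemd/system/etcd-member.service.d
#   render_rollover_dropin node-6 \
#     "node-2=https://10.0.0.2:2380,node-6=https://10.0.0.6:2380" \
#     > /etc/systemd/system/etcd-member.service.d/20-rollover.conf
#   systemctl daemon-reload && systemctl restart etcd-member
#
# 3. Once the member is healthy, delete 20-rollover.conf and daemon-reload.
```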

That no longer works on clusters with 'old' discovery tokens.

Instead, the new node, with the old discovery token, starts up and then crashes:

May 16 16:08:20 ip-172-27-187-218 etcd-wrapper[1170]: 2019-05-16 16:08:20.689784 E | etcdmain: failed to join discovery cluster (discovery: bad discovery endpoint)
May 16 16:08:20 ip-172-27-187-218 etcd-wrapper[1170]: 2019-05-16 16:08:20.689816 I | etcdmain: discovery token https://discovery.etcd.io/9a63c50f66e2803fb5ad005643cb7e60 was used, but failed to bootstrap the cluster.
May 16 16:08:20 ip-172-27-187-218 etcd-wrapper[1170]: 2019-05-16 16:08:20.689822 I | etcdmain: please generate a new discovery token and try to bootstrap again.

discovery.etcd.io is down (for maintenance)

The service has been unavailable since: https://grafana.prod.discovery.etcd.io/d/uiLwPyPWk/discoveryserver?orgId=1&from=1640079655667&to=1640092338476&var-instance=10.128.0.37:9100

$ curl https://discovery.etcd.io
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx/1.17.7</center>
</body>
</html>

Access for sig-k8s-infra-leads to `etcd-io-dev` and `etcd-io` gcp projects

Following the creation of sig-etcd, and in advance of upcoming pricing changes for Google Workspace, the etcd project is aiming to move the existing GCP projects we own under the Kubernetes shared GCP org to reduce costs and improve the oversight and management of these projects.

Refer:

The initial stage of this requires us to add the sig-k8s-infra-leads group to the GCP projects with the Owner role so they can assess resource usage and confirm that we can proceed with absorbing these projects under the Kubernetes org.
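The Owner grant described above could also be scripted. A minimal sketch that only prints the `gcloud` commands for review; `GROUP_EMAIL` is a hypothetical placeholder for the group's address, the helper name is hypothetical, and whether the binding can be applied this way depends on the projects' IAM and organization settings:

```shell
#!/bin/sh
# Print (do not run) one IAM binding command per project, granting the
# group the Owner role. GROUP_EMAIL is a hypothetical placeholder.
GROUP_EMAIL=${GROUP_EMAIL:-group@example.com}

print_owner_grants() {
  local project
  for project in "$@"; do
    printf 'gcloud projects add-iam-policy-binding %s --member=group:%s --role=roles/owner\n' \
      "$project" "$GROUP_EMAIL"
  done
}

# Usage: print_owner_grants etcd-io-dev etcd-io
```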

This has been completed for etcd-development. However, per the Terraform modules in this repository, we also have the projects etcd-io-dev and etcd-io, which we don't seem to have access to, so we are unable to grant that access ourselves.

@victortrac can you please confirm if these projects are still in use? If so can you please grant access as described above?

If you have any questions, please feel free to ping me on Kubernetes Slack 🙏

CNCF Handoff

Add CNCF on-call to stackdriver discovery.etcd.io/health alert
Ensure CNCF has access to the GKE cluster and stackdriver
Answer any questions about the architecture
Create an SLO on upgrades

Backport VPC and GKE to terraform

Currently, the GCP environment that runs discovery.etcd.io is built manually. It'd be nice to turn it into infrastructure-as-code so that it is reproducible and has a change audit log.

cncf: need s3 bucket and credentials

I need the CNCF to provide an S3 bucket and credentials for the etcd backups.
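For reference, a backup of this kind might look like the following sketch, assuming `etcdctl` (v3 API) and the AWS CLI are installed and configured; the bucket name, endpoint, key layout and helper names are hypothetical placeholders:

```shell
#!/bin/sh
# Build an object key such as etcd-backups/2019-05-16T16-08-20Z.db.
snapshot_key() {
  printf '%s/%s.db\n' "$1" "$2"
}

# Take a snapshot from one etcd endpoint and upload it to S3.
backup_to_s3() {
  local bucket endpoint ts file status
  bucket=$1
  endpoint=${2:-http://127.0.0.1:2379}
  # '-' instead of ':' keeps the key filesystem-friendly.
  ts=$(date -u +%Y-%m-%dT%H-%M-%SZ)
  file=$(mktemp) || return 1
  ETCDCTL_API=3 etcdctl --endpoints="$endpoint" snapshot save "$file" &&
    aws s3 cp "$file" "s3://${bucket}/$(snapshot_key etcd-backups "$ts")"
  status=$?
  rm -f "$file"
  return "$status"
}

# Usage (hypothetical bucket name): backup_to_s3 my-etcd-backups
```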

Certificate issue w/ discovery.etcd.io

Hi,

Since September 30th, we've been getting certificate errors when starting our Kubernetes infrastructure [1]. The curl command returns a valid certificate [2], but openssl s_client returns an invalid one [3].

[1]

2021-10-05 10:47:47.258 1756 ERROR magnum.drivers.heat.template_def [req-d7d8ccc7-4fb6-46bb-b19a-c5d0850456c5 - - - - -] [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618): SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618)
2021-10-05 10:47:47.304 1756 ERROR oslo_messaging.rpc.server [req-d7d8ccc7-4fb6-46bb-b19a-c5d0850456c5 - - - - -] Exception during message handling: GetDiscoveryUrlFailed: Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'.

[2]

$ curl -v 'https://discovery.etcd.io/new?size=1'
[...]
* Server certificate:
*  subject: CN=discovery.etcd.io
*  start date: Sep 23 10:35:28 2021 GMT
*  expire date: Dec 22 10:35:27 2021 GMT
*  subjectAltName: host "discovery.etcd.io" matched cert's "discovery.etcd.io"
*  issuer: C=US; O=Let's Encrypt; CN=R3
[...]

[3]

$ openssl s_client -showcerts -connect discovery.etcd.io:443 -servername discovery.etcd.io
CONNECTED(00000005)
depth=1 O = Digital Signature Trust Co., CN = DST Root CA X3
verify error:num=10:certificate has expired
notAfter=Sep 30 14:01:15 2021 GMT
verify return:0
depth=1 O = Digital Signature Trust Co., CN = DST Root CA X3
verify error:num=10:certificate has expired
notAfter=Sep 30 14:01:15 2021 GMT
verify return:0
depth=3 O = Digital Signature Trust Co., CN = DST Root CA X3
verify error:num=10:certificate has expired
notAfter=Sep 30 14:01:15 2021 GMT
verify return:0
---

production burn down

This is a list of things that need to happen to ensure long term production stability:

Rollout branch with token garbage collection
Hook up https://discovery.etcd.io/health to Pingdom, etc.
Configure and deploy S3 backups (xref #1)
Automate building of the container

Emit etcd and discovery service metrics to Prometheus & build dashboards with Grafana

It'd be nice to have some metrics on system usage and performance:

cpu/memory/network of etcd and discovery pods
etcd key statistics
response times

investigate: discovery service flapping alerts

Starting in the evening of 2019-04-26, the discovery.etcd.io/health endpoint was flapping according to Stackdriver. Investigate.
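One way to start the investigation, sketched under assumptions: sample `/health` from outside the cluster at a fixed interval and log the status codes, so flaps can be lined up against Stackdriver's timeline. The helper names and the `HEALTH_URL` variable are hypothetical:

```shell
#!/bin/sh
# Poll the health endpoint and log one line per probe: a UTC timestamp and
# the HTTP status code ("000" when curl gets no response at all).
HEALTH_URL=${HEALTH_URL:-https://discovery.etcd.io/health}

probe_once() {
  # -s: quiet, -o /dev/null: discard the body, -w: print only the status
  # code, --max-time: count a slow response as a failed probe.
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$HEALTH_URL"
}

probe_loop() {
  local count interval i
  count=$1
  interval=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(probe_once)"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, `probe_loop 720 5 >> health.log` would sample for roughly an hour at 5-second intervals; runs of non-200 codes in the log would show when and how often the endpoint flaps.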

Implement k8s cluster backups

It'd be nice to install and configure Velero on the k8s cluster to automatically back up workload data and cluster configuration to Google Cloud Storage.

Create a dev/staging environment

There's only a single environment for discovery.etcd.io, so there is no environment in which to test upgrades to GKE, etcd, or the discovery service itself. Once #8 is done, we should use that template to generate a pre-prod environment for testing changes to the service.

https://discovery.etcd.io/new?size=3 gives back http url

It appears that discovery.etcd.io recently started returning http URLs instead of https URLs when a new discovery token URL is requested. Using the http URL returns a 308 redirect to https, so I'm guessing this was not intentional.
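Until the scheme is fixed server-side, a client could normalize the returned URL itself. A minimal sketch; the helper names are hypothetical, and the 32-hex-digit token format is an assumption drawn only from the example tokens in these issues:

```shell
#!/bin/sh
# Rewrite an http:// token URL to https://; other URLs pass through as-is.
normalize_discovery_url() {
  local url
  url=$1
  case $url in
    http://*) url="https://${url#http://}" ;;
  esac
  printf '%s\n' "$url"
}

# Succeed only for https://discovery.etcd.io/<32 lowercase hex digits>
# (token format assumed from observed examples).
is_discovery_url() {
  printf '%s\n' "$1" | grep -Eqx 'https://discovery\.etcd\.io/[0-9a-f]{32}'
}

# Usage:
#   url=$(normalize_discovery_url "$(curl -fsS 'https://discovery.etcd.io/new?size=3')")
#   is_discovery_url "$url" || { echo "unexpected token URL: $url" >&2; exit 1; }
```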

Upgrade GKE to latest stable version

It'd be nice to upgrade the GKE cluster to the latest stable version of k8s to take advantage of new GKE features like VPC aliasing, workload identity, regional masters, private API endpoint, and to get the latest security updates.
