siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.

Home Page: https://www.talos.dev

License: Mozilla Public License 2.0

linux linux-distribution kubernetes kubernetes-distribution go musl grpc cloud-native containerd

talos's Introduction

Talos Linux

A modern OS for Kubernetes.



Talos is a modern OS for running Kubernetes: secure, immutable, and minimal. Talos is fully open source, production-ready, and supported by the people at Sidero Labs. All system management is done via an API; there is no shell or interactive console. Benefits include:

  • Security: Talos reduces your attack surface: It's minimal, hardened, and immutable. All API access is secured with mutual TLS (mTLS) authentication.
  • Predictability: Talos eliminates configuration drift, reduces unknown factors by employing immutable infrastructure ideology, and delivers atomic updates.
  • Evolvability: Talos simplifies your architecture, increases your agility, and always delivers current stable Kubernetes and Linux versions.

Documentation

For instructions on deploying and managing Talos, see the Documentation.

Community

If you're interested in this project and would like to help in engineering efforts or have general usage questions, we are happy to have you! We hold a weekly meeting that all audiences are welcome to attend.

We would appreciate your feedback so that we can make Talos even better! To do so, you can take our survey.

Office Hours

You can subscribe to this meeting by joining the community forum above.

Note: You can convert the meeting hours to your local time.

Contributing

Contributions are welcomed and appreciated! See Contributing for our guidelines.

License


Some software we distribute is under the General Public License family of licenses or other licenses that require that we provide you with the source code. If you would like a copy of the source code for this software, please contact us via email: info at SideroLabs.com.

talos's People

Contributors

aleksi, andrewrynhard, bradbeam, budimanjojo, dependabot[bot], dmitriymv, dsseng, eirikaskheim, flokli, frezbo, jonkerj, kvaps, nberlee, oscr, patatman, rgl, ro11net, rsmitty, salkin, sergelogvinov, smira, steverfrancis, tgerla, timjones, twelho, uhthomas, ulexus, unix4ever, utkuozdemir, yoctozepto


talos's Issues

feat: a CLI for interacting with the API

A CLI would be useful for things like:

  • Process logs
  • Process status
  • Retrieving the kubeconfig

and other features as this project grows. It should probably be gRPC based after #19 is implemented.
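As a minimal sketch, the CLI could start as a plain subcommand dispatcher; the handlers below are placeholders (the subcommand names and messages are assumptions), to be backed by gRPC calls once #19 is implemented:

```go
package main

import (
	"fmt"
	"os"
)

// run dispatches a hypothetical "osctl" subcommand. Each handler is a
// placeholder; the real versions would call the node's gRPC API.
func run(args []string) (string, error) {
	if len(args) < 1 {
		return "", fmt.Errorf("usage: osctl <logs|status|kubeconfig>")
	}
	switch args[0] {
	case "logs":
		return "fetching process logs", nil
	case "status":
		return "fetching process status", nil
	case "kubeconfig":
		return "retrieving kubeconfig", nil
	default:
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
}

func main() {
	out, err := run([]string{"kubeconfig"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(out)
}
```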

fix: install conntrack

The following warning is showing up in the kubelet logs. We should install conntrack.

W0322 04:14:51.391333    1339 hostport_manager.go:68] The binary conntrack is not installed, this can 
cause failures in network connection cleanup.

refactor: pull out the gRPC server into a dedicated service

The init logic needs to be robust and as simple as possible. Currently, the gRPC service and init are one and the same. We should pull out the gRPC server code into a dedicated service. This would ensure that a bug within the gRPC service (for example, one that could trigger a kernel panic) doesn't take out the whole node with it. I propose we call it osd to complement osctl.

feat: generate /etc/os-release

Instead of generating /etc/os-release at build time, generate it at runtime using the version variable we already set at build time.
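A sketch of the runtime approach, assuming the version is stamped via `-ldflags "-X main.Version=..."` at build time (the field values shown are illustrative, not the actual Talos os-release contents):

```go
package main

import "fmt"

// Version is stamped at build time, e.g.:
//   go build -ldflags "-X main.Version=v0.1.0"
var Version = "none"

// osRelease renders the /etc/os-release contents at runtime, so the
// same binary reports whatever version was stamped into it.
func osRelease() string {
	return fmt.Sprintf(
		"NAME=\"Talos\"\nID=talos\nVERSION_ID=%s\nPRETTY_NAME=\"Talos %s\"\n",
		Version, Version)
}

func main() {
	// At runtime, init would write this to /etc/os-release;
	// printed here for illustration.
	fmt.Print(osRelease())
}
```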

fix: CRI-O errors

Seeing the following in the CRI-O logs:

time="2018-03-22 04:26:51.229159744Z" level=warning msg="hooks path: "/usr/share/containers/oci/hooks.d" does not exist"
time="2018-03-22 04:26:51.229230416Z" level=warning msg="hooks path: "/etc/containers/oci/hooks.d" does not exist"
time="2018-03-22 04:26:51.229386000Z" level=error msg="error updating cni config: Missing CNI default network"
time="2018-03-22 04:26:51.229418768Z" level=error msg="Missing CNI default network"
ERROR: logging before flag.Parse: W0322 04:26:51.312486    1291 hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection cleanup.
time="2018-03-22 04:26:51.330119568Z" level=error msg="watcher.Add("/usr/share/containers/oci/hooks.d") failed: no such file or directory"

We should address all of these.

feat: detailed process information

We should provide info from /proc/<pid>/stat (and perhaps other places in procfs) that would give users detailed info on the running processes.
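A sketch of reading a few of those fields; note that the comm field in /proc/&lt;pid&gt;/stat is parenthesized and may contain spaces, so it has to be carved out before whitespace-splitting the rest (only the first handful of the 50+ fields are shown):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ProcStat holds a few fields from /proc/<pid>/stat; the full file
// has many more, this sketch only pulls the first handful.
type ProcStat struct {
	PID   string
	Comm  string
	State string
	PPID  string
}

// parseStat extracts fields from the format "pid (comm) state ppid ...".
// The comm field may contain spaces, so locate the last ')' first.
func parseStat(raw string) (ProcStat, error) {
	open := strings.IndexByte(raw, '(')
	closeIdx := strings.LastIndexByte(raw, ')')
	if open < 0 || closeIdx < open {
		return ProcStat{}, fmt.Errorf("malformed stat line")
	}
	rest := strings.Fields(raw[closeIdx+1:])
	if len(rest) < 2 {
		return ProcStat{}, fmt.Errorf("malformed stat line")
	}
	return ProcStat{
		PID:   strings.TrimSpace(raw[:open]),
		Comm:  raw[open+1 : closeIdx],
		State: rest[0],
		PPID:  rest[1],
	}, nil
}

func main() {
	raw, err := os.ReadFile("/proc/self/stat")
	if err != nil {
		// procfs unavailable (e.g. non-Linux); fall back to a sample line.
		raw = []byte("1 (init) S 0 1 1")
	}
	st, err := parseStat(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("%+v\n", st)
}
```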

fix: hierarchical accounting and reclaim

Since #59 was merged, the following shows up in the kubelet logs:

E0502 13:37:38.488869    1386 remote_runtime.go:209] StartContainer "c9d4b9a2cf132d74f3d3d5ca3b36763f88fab8e7dcaf8d37bfbb84ba6cc7371e" from runtime service failed: rpc error: code = Unknown desc = failed to start container "c9d4b9a2cf132d74f3d3d5ca3b36763f88fab8e7dcaf8d37bfbb84ba6cc7371e": Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:258: applying cgroup configuration for process caused \\\"failed to set memory.kmem.limit_in_bytes, because either tasks have already joined this cgroup or it has children\\\"\"\n"

This breaks kubeadm init.

See kubernetes/kubernetes#61937.

feat: user data should use raw kubeadm config

Currently we use text/template to render MasterConfiguration and NodeConfiguration YAML files that are then fed into kubeadm. This is suboptimal since we are duplicating work done in kubeadm. We should instead support specifying the files in their raw format:

kubernetes:
  kubeadm: |
    kind: NodeConfiguration
    apiVersion: kubeadm.k8s.io/v1alpha1
    token: abcd.1234567898765432

fix: port-forward socat getaddrinfo: Name or service not known

Terminal 1

kubectl port-forward etcd-master-0 2379:2379 -n kube-system
Forwarding from 127.0.0.1:2379 -> 2379

Terminal 2

curl localhost:2379/v2
curl: (52) Empty reply from server

Terminal 1

kubectl port-forward etcd-master-0 2379:2379 -n kube-system
Forwarding from 127.0.0.1:2379 -> 2379
Handling connection for 2379
E0402 21:26:31.026140   78292 portforward.go:331] an error occurred forwarding 2379 -> 2379: error forwarding port 2379 to pod 67354788607ba8b9caad4f69c7a4256581b46655281b6fd182ec0fd21ef0591c, uid : exit status 1: 2018/04/03 04:26:29 socat[11867] E getaddrinfo("localhost", "NULL", {1,2,1,6}, {}): Name or service not known

fix: missing kernel modules

I0705 03:19:00.863140    1170 kernel_validator.go:96] Validating kernel config
	[WARNING Service-Kubelet]: no supported init system detected, skipping checking for services
	[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

feat: HA cluster

The tasks outlined in the official HA guide can be automated. I think Option 2 is the best route to go, at least to start with. We can always change this later, or even add support for both. So that leaves us with having to:

  1. Copy over the PKI directory from master0
  2. Create the load balancer
  3. Run kubeadm init
  4. Configure kube-proxy

For 1, we can add an RPC call. We can use either a push or a pull model. In the pull model, master nodes, excluding master0, would request the PKI directory upon initialization. This would require us to verify that requesters of the PKI directory are authorized to do so. The alternative is to push the PKI directory to other master nodes when they come up. This would require that a well-known set of machines participate as master nodes. I think the push model is better, as we can tie it into the notion of a RoT we have established already.

For 2, we can deploy an ingress controller that runs only on the master nodes. This model offers a generic solution that can handle cloud and bare-metal deployments. The only issue then becomes the management of the DNS record pointing to the master nodes. How should we add/remove healthy/unhealthy nodes to/from the list?

For 3, this ties in to 2. Whatever we decide in 2, we would need to be sure to add the node to the load balancer.

For 4, we can use the admin.conf on master0 to update kube-proxy.

Leveled logging

Using either the standard library, or something like logrus, we should provide more robust logging.
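As a sketch of the standard-library route, a thin level gate over log.Logger could look like the following (the names are illustrative; logrus would provide this, plus structured fields and hooks, out of the box):

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// Level is a simple severity threshold; messages below it are dropped.
type Level int

const (
	Debug Level = iota
	Info
	Warn
	Error
)

// Logger wraps the standard library logger with a level gate.
type Logger struct {
	min Level
	out *log.Logger
}

func New(min Level) *Logger {
	return &Logger{min: min, out: log.New(os.Stderr, "", log.LstdFlags)}
}

func (l *Logger) enabled(lv Level) bool { return lv >= l.min }

func (l *Logger) logf(lv Level, tag, format string, args ...interface{}) {
	if !l.enabled(lv) {
		return
	}
	l.out.Printf("[%s] %s", tag, fmt.Sprintf(format, args...))
}

func (l *Logger) Debugf(format string, args ...interface{}) { l.logf(Debug, "DEBUG", format, args...) }
func (l *Logger) Infof(format string, args ...interface{})  { l.logf(Info, "INFO", format, args...) }
func (l *Logger) Warnf(format string, args ...interface{})  { l.logf(Warn, "WARN", format, args...) }
func (l *Logger) Errorf(format string, args ...interface{}) { l.logf(Error, "ERROR", format, args...) }

func main() {
	lg := New(Info)
	lg.Debugf("dropped: below the Info threshold")
	lg.Infof("service %s started", "osd")
}
```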

feat: self-managed user data

Creating images is relatively simple now due to #114, and opens up the idea of encouraging users to build images with sensitive files baked in. The user data file is one example. Right now, the cluster depends on an external source of truth depending on the platform that the cluster is running on.

For example, in AWS, the user data is sourced from http://169.254.169.254/latest/user-data. The problem with this is that any pod in the cluster can reach this endpoint, which is a huge security risk considering what is in it.

In bare metal, there is the added burden of maintaining an http server for the purpose of serving the user data. In this example, you could secure the endpoint, but you still have to get the credentials onto the nodes somehow.

This proposal leverages the existing RoT features we have by further extending the responsibility of such a node. Nodes that serve as roots of trust are trusted by their very definition, which makes them perfect candidates for hosting a user data service.

The workflow would be something like the following:

  1. A user builds master nodes with baked in configs for master and worker nodes, including the RoT credentials.
  2. A user builds worker nodes with baked in configs, including the RoT credentials.
  3. The master nodes start a service that uses the RoT credentials to securely serve the user data files stored on them.
  4. A worker node uses the baked in RoT credentials to make a request to a master node for the user data.

This method simplifies things operationally, and improves security in the process. It does move some of the complexity into the image building requirement, but the security and day-to-day operational value it brings outweighs the complexity it adds.

feat: join a node to an existing cluster

There are a couple of ways we can implement this:

  • Pass in a token via user data with a TTL set to never expire.
  • Implement a kubeadm token management interface that integrates with the various tools we anticipate users will use to manage a cluster.

Option one has security risks and does not align with our goals.

EDIT:
Option one was chosen, and option two will be implemented in time.

fix: processes are blocked by io.Pipe

The way in which we are handling stdout and stderr in the process manager is blocking the process itself. The problem is that io.Pipe blocks each write until something reads from the other end. This means that processes are blocked until a user accesses the process logs via the API.
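One possible fix, sketched below under that assumption: replace the pipe with a mutex-guarded in-memory buffer, so process writes always succeed immediately and API reads take a snapshot. The buffer here is unbounded; a real implementation would cap or rotate it.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// logBuffer is a non-blocking sink for process stdout/stderr.
// Unlike io.Pipe, whose Write blocks until a reader consumes the
// bytes, writes here complete immediately.
type logBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

func (b *logBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.Write(p)
}

// Snapshot returns a copy of everything written so far, so an API
// client can read logs without ever stalling the producing process.
func (b *logBuffer) Snapshot() []byte {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]byte, b.buf.Len())
	copy(out, b.buf.Bytes())
	return out
}

func main() {
	var lb logBuffer
	// With io.Pipe these writes would block until a reader appeared;
	// here they complete immediately.
	fmt.Fprintln(&lb, "process started")
	fmt.Fprintln(&lb, "listening on :50000")
	fmt.Printf("%s", lb.Snapshot())
}
```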

fix: setting up CRI-O network breaks Flannel

The network config causes pods to come up with an IP from the 10.88.0.0/16 CIDR block. This breaks DNS resolution in pods.

I haven't been able to find much documentation around it, but it seems we should not create a network configuration and leave it up to the network plugin to handle that.

fix: open /proc/self/fd: no such file or directory

The CRI-O logs have a lot of logs like the following:

time="2018-04-03 04:07:18.909401344Z" level=info msg="Received container exit code: -1, message: exec failed: container_linux.go:348: starting container process caused "open /proc/self/fd: no such file or directory"

feat: automated CSR workflow

Since #64 was merged, we now enforce security without the possibility of opting out. CSRs need to be automated for nodes as they come up. One way to achieve this would be to define a CustomResourceDefinition (or more than one) along with a controller that would automate the CSR workflow.

feat: use gRPC and protocol buffers

Instead of a RESTful API, I would prefer we use gRPC for a number of reasons. Using protocol buffers, we can generate the client and server API that can be used by init and a CLI.

Verify user data

The user data fields should be verified. Clear and early errors in the initramfs relating to the user data would provide a better development and user experience.
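A sketch of what fail-fast validation could look like; the UserData fields shown are hypothetical stand-ins for the real structure:

```go
package main

import (
	"fmt"
	"net"
)

// UserData is a hypothetical subset of the machine user data; the
// real structure has more fields.
type UserData struct {
	Version string
	IP      string
	Token   string
}

// Validate fails fast with an error naming the offending field, so
// misconfiguration surfaces early in the initramfs rather than as an
// obscure failure later in boot.
func (u *UserData) Validate() error {
	if u.Version == "" {
		return fmt.Errorf("userdata: version is required")
	}
	if net.ParseIP(u.IP) == nil {
		return fmt.Errorf("userdata: %q is not a valid IP address", u.IP)
	}
	if u.Token == "" {
		return fmt.Errorf("userdata: token is required")
	}
	return nil
}

func main() {
	ud := UserData{Version: "v1", IP: "not-an-ip", Token: "abcd.1234"}
	if err := ud.Validate(); err != nil {
		fmt.Println(err)
	}
}
```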
