playfab / thundernetes

Thundernetes makes it easy to run your game servers on Kubernetes

Home Page: https://playfab.github.io/thundernetes

License: Apache License 2.0

Dockerfile 0.51% Makefile 1.50% Go 96.96% Shell 0.82% PowerShell 0.21%
game servers multiplayer kubernetes kubernetes-controller gameservers game-development hacktoberfest

thundernetes's Introduction


Thundernetes

Thundernetes makes it easy to run your game servers on Kubernetes.

ℹ️ Description

Thundernetes is a project originating from the Azure PlayFab Multiplayer Servers team and other teams in Azure/XBOX that enables you to run both Windows and Linux game servers on your Kubernetes cluster. Thundernetes can be useful in the following scenarios:

  • host your game servers on a Kubernetes cluster, either on a public cloud provider or on-premises, and allow your users to connect from anywhere
  • pre-warm game servers so that they are ready to accept players within seconds when the game is about to start
  • use Thundernetes locally to test your game server code as part of your iterative development process


📚 Documentation

Check 🔥our website🔥 for more information.

📦 Video presentation

Check out our video presentation for GDC 2022!

What is Project Thundernetes? How Kubernetes Helps Games Scale

💬❓Feedback - Community

Thundernetes is in beta. If you find a bug or have a feature request, please file an issue here and we will try to get back to you as soon as possible. You can also reach us directly on the Game Dev server on Discord.

thundernetes's People

Contributors

abbasahmed, allenlsy, amiedd, andressaldanaaguilar, dependabot[bot], dgkanatsios, dsmith111, emmayspark, javier-op, kalonj-msft, khaines, nottagg, onlyralphie, pamir, rnjohn, shrayrastogi, snobu, vachillo


thundernetes's Issues

0.1.0

This issue tracks what is needed to be done for the 0.1.0 release:

  • troubleshooting guide #45
  • prerequisites for thundernetes #46
  • ConnectedPlayers tracking #35
  • add in FAQ that the deallocate/shutdown method is actually kubectl delete gs myGameServerName (this can be part of #45)
  • structured log output #36
  • complete #59 FAQ on PodAffinity
  • complete #42 Monitoring and Grafana updates
  • and, last but not least, upgrade the container images to 0.1.0

Add custom Labels and Annotations to the game server Pods

Currently there is no way for the user to add custom Labels or Annotations when they are creating the GameServer Pod in the GameServerBuild YAML document. This happens because the GameServerBuild definition has PodSpec instead of PodTemplateSpec. We should

  • replace PodSpec with PodTemplateSpec in both the GameServerBuild and the GameServer CRDs
  • modify the code so that when the game server Pod is created, existing Labels and Annotations from the PodTemplateSpec are copied to the new Pod
  • modify all our samples to have PodTemplateSpec instead of PodSpec

The last one would be a breaking change for all our sample files, so the recommendation is to do it just before the 0.2.0 release.
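A minimal sketch of the copy step described above, using illustrative stand-in types (the real CRD would embed corev1.PodTemplateSpec; names here are assumptions):

```go
package main

import "fmt"

// Minimal stand-ins for the Kubernetes types involved; in the real CRD these
// would be corev1.PodTemplateSpec and corev1.ObjectMeta.
type ObjectMeta struct {
	Labels      map[string]string
	Annotations map[string]string
}

type PodTemplateSpec struct {
	Metadata ObjectMeta
	// The corev1.PodSpec would live here.
}

// buildPodMeta sketches the copy step: Labels and Annotations from the
// GameServerBuild's template are merged into the new Pod's metadata.
func buildPodMeta(tmpl PodTemplateSpec) ObjectMeta {
	meta := ObjectMeta{
		Labels:      map[string]string{},
		Annotations: map[string]string{},
	}
	for k, v := range tmpl.Metadata.Labels {
		meta.Labels[k] = v
	}
	for k, v := range tmpl.Metadata.Annotations {
		meta.Annotations[k] = v
	}
	return meta
}

func main() {
	tmpl := PodTemplateSpec{Metadata: ObjectMeta{
		Labels: map[string]string{"team": "red"},
	}}
	fmt.Println(buildPodMeta(tmpl).Labels["team"]) // red
}
```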

Write unit tests for initcontainer

We should write unit tests for the initcontainer and have them run on the CI pipeline as well. We should be careful to clean up after the tests finish, since the initcontainer creates a GsdkConfig.json file. We should also add that file to a .gitignore in the folder.

Add CHANGELOG

We should automate the creation of a CHANGELOG for thundernetes.

Populate public IP address on initcontainer

Currently we're not populating the public IP address field on the initcontainer, so the GSDK (and thus, the game server) doesn't have access to this information.

PublicIpV4Address: "N/A",

The main reason is that at the time we create the env variables for the init container, we don't know which Node the Pod will land on, so we can't know the Node's External IP. It also doesn't look like there is a way to get the Node's External IP through the downward API.

We have provided a hacky workaround here but it would be nice if we found a way to populate the field.

validate GameServerBuild objects

Currently, the only validation we do on a GameServerBuild CRD instance is via its OpenAPI v3 schema. However, this provides only basic validation, and we should validate more things, such as:

  • standingBy number is <= max
  • when creating a new GameServerBuild, there is NOT an existing GameServerBuild with a different name but the same BuildID. We can check for existing GameServerBuilds with the same BuildID using the metadata-only client as described here
  • validate that hostPort is empty on all containers on the GameServerBuild for ports that are in the portsToExpose array
  • validate that name is included for ports that are on the portsToExpose array
  • validate that all ports on the portsToExpose array exist in the pod container spec

We could use validation webhooks.
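A rough sketch of what such a webhook's validation logic could look like, using simplified stand-in types (field names follow the bullets above but are otherwise assumptions; the BuildID uniqueness check needs API server access and is omitted):

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the real CRD types.
type GameServerBuildSpec struct {
	StandingBy    int
	Max           int
	PortsToExpose []string          // port names that must be exposed
	Ports         map[string]*int32 // container port name -> hostPort (nil if unset)
}

// validate sketches the checks a validating webhook could run on admission.
func validate(s GameServerBuildSpec) error {
	if s.StandingBy > s.Max {
		return fmt.Errorf("standingBy %d exceeds max %d", s.StandingBy, s.Max)
	}
	for _, name := range s.PortsToExpose {
		hostPort, ok := s.Ports[name]
		if !ok {
			// covers both "name must be set" and "port must exist in the pod spec"
			return fmt.Errorf("port %q in portsToExpose not found in container spec", name)
		}
		if hostPort != nil {
			return errors.New("hostPort must be empty for ports in portsToExpose")
		}
	}
	return nil
}

func main() {
	spec := GameServerBuildSpec{
		StandingBy:    2,
		Max:           4,
		PortsToExpose: []string{"game"},
		Ports:         map[string]*int32{"game": nil},
	}
	fmt.Println(validate(spec) == nil) // true
}
```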

Windows container support

Investigate what would be needed for thundernetes to work with Windows game servers (containers).

Stress test

We should run a stress test on thundernetes and document our findings.

Expose Game Servers via a cloud Load Balancer

In cloud environments, it might be unrealistic to have a Public IP per Node. For these scenarios, we should examine using a vendor-specific Load Balancer with a Public IP that would act as a reverse proxy-gateway for the game server ports. Rough design includes an additional controller created as a separate deployment that would:

  • set up a Kubernetes watch to all GameServer CR instances
  • every time there is a new one, it would create a mapping to the LB
  • every time one is deleted, it would delete the mapping

On Azure specifically, we'd need this controller to have an Azure identity (either a service principal or, preferably, a managed identity) and perform ARM operations against the LB.
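The mapping logic in the bullets above could be sketched roughly as follows; gsEvent and lbMapper are hypothetical names, and a real controller would receive these events from a client-go informer and call the cloud vendor's API instead of mutating a map:

```go
package main

import "fmt"

// Illustrative event type; a real controller would get these from a
// Kubernetes watch on GameServer CR instances.
type gsEvent struct {
	Name    string
	Port    int32
	Deleted bool
}

// lbMapper keeps the load balancer's port-mapping table in sync with the
// GameServer events it observes.
type lbMapper struct {
	mappings map[string]int32
}

func (m *lbMapper) handle(e gsEvent) {
	if e.Deleted {
		delete(m.mappings, e.Name) // remove the mapping on the real LB
		return
	}
	m.mappings[e.Name] = e.Port // create the mapping on the real LB
}

func main() {
	m := &lbMapper{mappings: map[string]int32{}}
	m.handle(gsEvent{Name: "gs-1", Port: 10001})
	m.handle(gsEvent{Name: "gs-1", Deleted: true})
	fmt.Println(len(m.mappings)) // 0
}
```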

Make sidecar update the GameServer CR before StandingBy

Currently, the sidecar updates the GameServer.Status when

  • GameServer health has changed
  • GameServer has transitioned to StandingBy

There are no updates when the GameServer state is set to Initializing. We should find a way to do that.

thundernetes allocator

Azure PlayFab Multiplayer Servers service has a sample which facilitates easy allocation of game servers for test/dev purposes. We could make a similar one for thundernetes.

Action when the GameServer is Unhealthy

A GameServer instance can turn Unhealthy in two cases:

  • the game server process can mark itself Unhealthy from inside GSDK
  • the NodeAgent can detect an absence of heartbeats

Currently we don't do anything when this happens. We should consider taking action (for example, logging in the GameServer controller and deleting the GameServer instance).

Use watches for the sidecars

Currently the sidecar is notified of a change in the allocation state by having its HTTP server accessible within the cluster. We should switch to a Kubernetes watch to avoid security issues. Proposed workflow:

  1. sidecar starts
  2. sidecar creates a Watch on the Kubernetes API server for the particular CRD instance
  3. when the sidecar/GameServer gets allocated (transitions to Active), we kill the Watch to decrease pressure on the Kubernetes API server.
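The proposed lifecycle can be sketched with a plain channel standing in for the Kubernetes watch (stateUpdate and the state strings are illustrative; a real sidecar would use client-go and stop the Watch explicitly):

```go
package main

import "fmt"

// stateUpdate stands in for a watch event on the GameServer CR delivered
// by the Kubernetes API server.
type stateUpdate struct{ State string }

// watchUntilActive consumes watch events and returns once the GameServer
// transitions to Active, at which point the caller would close the real
// Watch to reduce pressure on the API server.
func watchUntilActive(events <-chan stateUpdate) string {
	for e := range events {
		if e.State == "Active" {
			return e.State // allocation happened: stop watching
		}
	}
	return ""
}

func main() {
	ch := make(chan stateUpdate, 3)
	ch <- stateUpdate{"Initializing"}
	ch <- stateUpdate{"StandingBy"}
	ch <- stateUpdate{"Active"}
	close(ch)
	fmt.Println(watchUntilActive(ch)) // Active
}
```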

Documentation - Create Troubleshooting guide for thundernetes

Is your feature request related to a problem? Please describe.
A troubleshooting guide covering common scenarios, with information specific to thundernetes, would be helpful for people using it.

Describe the solution you'd like
Add a Troubleshooting guide section for thundernetes.

Describe alternatives you've considered
N/A

Additional context
N/A

Structured Log output

Currently, output to the console is primarily done with fmt.Printf calls or error/panic output. Switching to structured log entries would improve readability for humans and, especially, for machines when the data is fed into a log processor.

logrus (https://github.com/Sirupsen/logrus) is a very popular structured logging library for Go and can easily be brought in to replace the current output calls.
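As a rough, stdlib-only illustration of the target output shape (field names are assumptions), each event becomes one JSON line instead of a free-form Printf:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// entry mirrors the shape that logrus's JSONFormatter emits; building it by
// hand here only illustrates what a log processor would ingest.
type entry struct {
	Time   string            `json:"time"`
	Level  string            `json:"level"`
	Msg    string            `json:"msg"`
	Fields map[string]string `json:"fields,omitempty"`
}

func newEntry(msg string, fields map[string]string) entry {
	return entry{
		Time:   time.Now().UTC().Format(time.RFC3339),
		Level:  "info",
		Msg:    msg,
		Fields: fields,
	}
}

// logInfo replaces an fmt.Printf call with one structured JSON line.
func logInfo(msg string, fields map[string]string) {
	b, _ := json.Marshal(newEntry(msg, fields))
	fmt.Fprintln(os.Stdout, string(b))
}

func main() {
	// Instead of fmt.Printf("game server %s allocated\n", "gs-1"):
	logInfo("game server allocated", map[string]string{"gameServer": "gs-1"})
}
```

With logrus itself this would collapse to `logrus.WithFields(logrus.Fields{"gameServer": "gs-1"}).Info("game server allocated")`.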

Refactor sidecar

#31 introduced some changes to the sidecar, in the way that the sidecar finds out about the GameServer state change. As a result of that, we'll need to refactor the httphandler.go file.

  • Maybe split it into two structs with different responsibility?
  • rename the file/struct?

Large game server container images

Some game server container images can be pretty large, i.e. several GB. Such images are slow to download and extract, so Pods that reference them on AKS will take some time to start. Azure Container Registry has introduced project Teleport, which is coming soon to AKS in public preview and will greatly decrease the time needed to start a Pod from a large container image.
We should

  • document Teleport once it becomes available for AKS
  • document alternative ways, e.g.
    • having the game server container image contain only the executable binaries, with the rest of the files kept on an Azure File Share mounted on each Pod
    • having game server asset files copied from a central location to every Node and mounted by each GameServer Pod. We could do this with a DaemonSet, but we'd need a way to signal that the Node is unschedulable while it downloads and extracts the files (so we don't schedule GameServer Pods before the files are there).

DaemonSet - see if we can make it more secure

Currently the DaemonSet listens on HostPort 56001 on the Node. While a Network Security Group protects the cluster from outside access (only the 10000-12000 range is allowed), we should examine whether we can disable communication between the DaemonSet Pod on a Node and Pods on different Nodes.

Add 2 new prerequisite and troubleshooting docs to README.md

Is your feature request related to a problem? Please describe.

  • Add prerequisite document to README.md #46
  • Add troubleshooting document to README.md #45

Describe the solution you'd like
Update the main README.md document to include the new prerequisite and troubleshooting guides.

Describe alternatives you've considered
N/A

Additional context
N/A

Replace sidecar with DaemonSet

Currently thundernetes uses a sidecar to

  • handle GSDK heartbeats
  • update the GameServer/GameServerDetail CRs with the updated status of the game server process
  • create a Watch towards the K8s API server to get a notification when the game server has been allocated

The sidecar is dynamically created by thundernetes when a new game server Pod is created.

We will change this implementation to use a DaemonSet instead. The DaemonSet process will listen to heartbeats coming from the same Node (game server Pods scheduled on the same Node as the corresponding DaemonSet Pod) and will create a Watch only for the GameServer CRs which have Pods on that Node. The benefits to using a DaemonSet over a sidecar are:

  • more lightweight approach on the GameServer Pod, since we're going to have one less container (faster to start, smaller consumption of resources)
  • less pressure on the K8s API server since we will have fewer Watches (just one per Node)
  • ability to support hostNetwork by the GameServer Pods (#22)
  • easier way to grab prometheus metrics for connected players (#28)

Version 1.20.5 Kubernetes is not supported in WestUS2.

Describe the bug
Following the "Create an Azure Kubernetes Service cluster with a Public IP per Node" documentation example fails, because Kubernetes version 1.20.5 is not supported in the westus2 region.

To Reproduce
Steps to reproduce the behavior:

  1. Run az aks create --resource-group thundernetesrg --name thundernetes --ssh-key-value ~/.ssh/id_rsa.pub --kubernetes-version 1.20.5 --enable-node-public-ip
  2. See error
  3. Run az aks get-versions --location westus2 to view the supported AKS versions in the westus2 region

Expected behavior
An AKS cluster is created following the documentation, using Kubernetes version 1.20.5 in the westus2 region.


Desktop (please complete the following information):

  • OS: Windows Command Prompt and Azure CLI


Remove the testing branding

We should remove the "thundernetes is only for testing MPS" branding, but at the same time keep the message that thundernetes is not meant to be used in production. We should also mention that support is provided only by opening an issue on this repository.

Allocation re-architecture

Thundernetes allocation service currently works by finding a random StandingBy server and transitioning it to Active. We should consider the pros/cons of this approach and decide on alternative allocation methods, taking into account the environment that thundernetes is running (cloud provider/on-prem).

Moreover, using .Update to transition the game server to Active will return errors if the GameServer has been deleted or crashed and the cache has not been updated yet. We should retry (at least once) inside the allocation method.
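A sketch of the retry idea, with pick and update as hypothetical hooks around the real list/Update calls (errConflict stands in for the API server's conflict error):

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: stale cache")

// allocateWithRetry retries the Update that transitions a StandingBy
// GameServer to Active; a fresh server is picked on every attempt, since
// the previous pick may have been deleted or crashed.
func allocateWithRetry(pick func() (string, error), update func(name string) error, retries int) (string, error) {
	var lastErr error
	for i := 0; i <= retries; i++ {
		name, err := pick()
		if err != nil {
			return "", err
		}
		if lastErr = update(name); lastErr == nil {
			return name, nil
		}
	}
	return "", lastErr
}

func main() {
	attempts := 0
	update := func(name string) error {
		attempts++
		if attempts == 1 {
			return errConflict // first pick was stale in the cache
		}
		return nil
	}
	pick := func() (string, error) { return fmt.Sprintf("gs-%d", attempts), nil }
	name, err := allocateWithRetry(pick, update, 1)
	fmt.Println(name, err)
}
```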

Better track ConnectedPlayers

Currently, when a user allocates a GameServer, they pass a string array of player IDs. This is persisted in the InitialPlayers string slice on the GameServer.Status field and passed to the game server via the GSDK heartbeat response. However, the game server can call UpdateConnectedPlayers on the GSDK to update the connected players list. Currently, the sidecar is informed of the change but does nothing. It would be helpful if the sidecar tracked this change and persisted it somewhere, since in some games users jump in and out during the game session. Some options to solve this problem:

  • rename the InitialPlayers on the GameServerStatus to ConnectedPlayers and keep everything there. This list can be big sometimes though (we've seen games with 64 players). I don't think we'll have storage issues (I doubt that the CRD instance on etcd will grow more than 1.5 MB - link) but I'm not sure about perf issues on transferring CRD instance data from the K8s API server to the controller and vice versa.
  • keep the InitialPlayers and store subsequent players on a ConfigMap, unique per GameServer. GameServer has a finalizer so we could do the cleanup of the ConfigMap there, when the GameServer gets deleted.
  • remove the InitialPlayers on the CRD and just keep a ConnectedPlayers string slice on a ConfigMap. This will be a breaking change but we consider it acceptable at this stage of the project.

Thoughts @khaines?
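For the ConfigMap-based options, the stored shape could look roughly like this (the ConfigMap naming convention and data key are assumptions):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// connectedPlayersData builds what would be stored in a per-GameServer
// ConfigMap: one data key holding the JSON-encoded player list. The
// GameServer's finalizer would delete the ConfigMap on GameServer deletion.
func connectedPlayersData(gsName string, players []string) (name string, data map[string]string, err error) {
	b, err := json.Marshal(players)
	if err != nil {
		return "", nil, err
	}
	return gsName + "-connected-players", map[string]string{"players": string(b)}, nil
}

func main() {
	name, data, _ := connectedPlayersData("gs-1", []string{"player1", "player2"})
	fmt.Println(name, data["players"])
}
```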

GameServers visualization

It would be cool if we could visualize the GameServers and their state on the VMs in our cluster. Think of it as a web page that displays boxes (VMs) and each one of them has colored rectangles inside it, with each color representing standingBy or Active state.

Since we cannot use the Kubernetes APIs from a browser (the Kubernetes JS client works only with Node.js), we could use kubectl proxy and talk to the Kubernetes API through it. One potential implementation (thanks to Darin) could be a .NET Core web server that talks to the Kubernetes API and sends the results back to the browser via SignalR.

Pod scheduling

Thundernetes currently doesn't do anything special for Pod scheduling; it relies on the default Kubernetes scheduler to place the Pods on the Nodes. When hosting thundernetes on a cloud provider, one would want the Pods packed onto as few Nodes as possible, to save on costs. One way to do this is with Pod affinity.
We should improve the documentation here and potentially include a sample.
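As a starting point for such a sample, a preferred pod affinity block along these lines could be added to the GameServerBuild's pod spec (the label key/value and weight are assumptions):

```yaml
# Prefer Nodes that already run Pods from the same build, so game servers
# pack onto fewer Nodes. Illustrative sketch, not a tested sample.
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            BuildName: gameserverbuild-sample
        topologyKey: kubernetes.io/hostname
```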

Documentation - Create Prerequisite guide for thundernetes

Is your feature request related to a problem? Please describe.
N/A

Describe the solution you'd like
Create a Prerequisite documentation guide for knowledge, tools, and anything pertaining to getting thundernetes up and running.

Describe alternatives you've considered
N/A

Additional context
N/A

Consider Host Networking mode for pods

Currently the game server pods are using host port mapping to accept communication. This process incurs performance overhead from the NAT translations from the host network to the container network. Switching the game server pods to use the host network directly will remove this overhead.

However, given the current architecture, there are some complications to handle. For example, the sidecar container would also be on the host network, exposing it externally on the Node's IP. Since the controller currently communicates with the sidecar for state changes, this would need to be secured first.

Enhance documentation

A generic work item for various documentation enhancements we need to do

  • Add more GSDK related links. Describe GSDK lifecycle, Initializing->StandingBy->Active->Process Exit
  • Include thundernetes on official PlayFab docs
  • Add LocalMultiplayerAgent as a debugging tool
  • Mention that users should never modify the .spec of a GameServerBuild/GameServer once it's deployed. Builds are immutable (like on MPS).
  • Document how to publish a new version of your GameServer (canary releases etc.). Mention that Builds should be immutable
  • Describe the GameServer states and the GameServerBuild states as well as respective kubectl output
