Kubernetes Guide


Kubernetes, originally designed by Google, is an open-source system for automating the deployment, scaling, and management of containerized applications.
While Release dramatically simplifies operating Kubernetes for our users, it is important to have at least a basic understanding of some of its key concepts and terms as they relate to Release:
  • Pod - one or more containers that run with shared storage and networking
  • Node - the cloud computing instances that run your Pods (e.g. an AWS EC2 instance)
  • Cluster - a set of Nodes, together with the control plane that manages Pods and related resources across them
  • Ingress - routes HTTP(S) traffic from outside the cluster to services within it
Release manages the creation and administration of Kubernetes clusters within your cloud account(s). Cluster configuration, such as node instance types, node autoscaling, and Kubernetes version upgrades, as well as the deployment of resources into your cluster, is managed within the Release platform.
You can learn more about Kubernetes from the official Kubernetes documentation.

Node Sizing

To ensure the best performance and to keep your cloud costs in check, it’s important to size the nodes within your Kubernetes cluster appropriately for the applications they’ll be running. This involves selecting a cloud instance type that fits the needs of your workloads.
Some factors to consider include:
  • How much memory does your application need under your expected usage?
  • Does your application tend to be more memory-bound or CPU-bound?
  • Does your application need access to high-performance local storage?
  • How sensitive are the environments within this cluster to variance in performance?
  • What are the minimum and maximum number of environments that will be running at a time?
Overlaying these considerations onto your budget will help you choose an appropriate instance type for your nodes. This is also something that can and should be tuned as your needs or usage patterns change.
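As a rough illustration of how these considerations combine with budget, the sketch below estimates how many environments fit on one node and what a given number of environments would cost per hour. It is a back-of-the-envelope calculation, not a Release feature: the instance names, prices, the per-environment figures, and the 10% `reserved_fraction` set aside for system overhead are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str          # hypothetical label, not a real instance type
    vcpus: float
    memory_gib: float
    hourly_usd: float  # placeholder price, not a real quote


def environments_per_node(instance, env_vcpus, env_memory_gib, reserved_fraction=0.1):
    """Environments that fit on one node, limited by the scarcer resource.

    reserved_fraction is an assumed share of the node kept for system overhead.
    """
    usable_cpu = instance.vcpus * (1 - reserved_fraction)
    usable_mem = instance.memory_gib * (1 - reserved_fraction)
    return int(min(usable_cpu // env_vcpus, usable_mem // env_memory_gib))


def nodes_and_hourly_cost(instance, env_count, env_vcpus, env_memory_gib):
    """Return (node count, hourly cost) for env_count environments, or None if none fit."""
    per_node = environments_per_node(instance, env_vcpus, env_memory_gib)
    if per_node == 0:
        return None  # a single environment does not fit on this instance type
    nodes = math.ceil(env_count / per_node)
    return nodes, nodes * instance.hourly_usd


if __name__ == "__main__":
    candidates = [
        InstanceType("general-4x16", vcpus=4, memory_gib=16, hourly_usd=0.20),
        InstanceType("memory-2x16", vcpus=2, memory_gib=16, hourly_usd=0.15),
    ]
    # Assume each environment needs 1 vCPU and 2 GiB, and up to 10 run at once.
    for candidate in candidates:
        print(candidate.name, nodes_and_hourly_cost(candidate, 10, 1, 2))
```

Running the same calculation with your own minimum and maximum environment counts gives a quick sense of which resource (CPU or memory) is the limiting one for each candidate instance type.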
For more information on node sizing, see the AWS EKS documentation. This third-party EC2 instance comparison tool can also be helpful, particularly with regard to pricing.


Autoscaling

At a high level, there are two major categories of autoscaling to consider within a Kubernetes cluster:
  • Node Autoscaling
  • Pod Autoscaling

Node Autoscaling

Release manages node autoscaling on your behalf. If the Kubernetes cluster does not have sufficient resources available to run a workload, then new nodes will be automatically provisioned and added to the cluster up to some maximum. Likewise, nodes will be automatically terminated and removed (down to some minimum) from the cluster as environments are removed.
Typically, end users set comfortable minimum and maximum values for the number of nodes, along with a node type, and these settings change slowly over time. This is not something that end users find they spend a lot of time worrying about.
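The scale-up and scale-down behavior described above amounts to choosing the smallest node count that fits the current workload and then keeping it within the configured bounds. The sketch below shows that idea in its simplest form. It is a simplification for illustration only, not how Release or any particular autoscaler is implemented: the function name and the single abstract "resource unit" are assumptions, and a real autoscaler weighs several resources and unschedulable pods.

```python
import math


def desired_node_count(requested_units, units_per_node, min_nodes, max_nodes):
    """Smallest node count that fits the requested resources, clamped to [min_nodes, max_nodes]."""
    if units_per_node <= 0:
        raise ValueError("units_per_node must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    needed = math.ceil(requested_units / units_per_node)
    # Never drop below the minimum, never grow past the maximum.
    return max(min_nodes, min(needed, max_nodes))
```

With a minimum of 2 and a maximum of 10 nodes, an idle cluster stays at 2 nodes, and a workload that would need 25 nodes is capped at 10, in which case some workloads would wait until capacity frees up or the maximum is raised.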

Pod Autoscaling

Pod autoscaling is a more advanced topic because many application-specific factors need to be taken into account. To give a few examples:
  • How can we tell that the system is in a state where it needs more or fewer pods?
  • Which types of pods do we need more or fewer of to respond to the changing load?
  • How can we avoid overwhelming limited resources (e.g. DB connections)?
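To make these questions concrete, the sketch below applies the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler, which computes the desired replica count as ceil(currentReplicas × currentMetricValue / desiredMetricValue), and combines it with a ceiling derived from a limited resource. The function names, the choice of metric, and the database connection figures are assumptions made for the example; the real autoscaler also applies tolerances and stabilization behavior that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """ceil(current_replicas * current_metric / target_metric), clamped to the bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(raw, max_replicas))


def replica_cap_for_database(db_max_connections, connections_per_pod):
    """Largest pod count whose connection pools stay within the database limit."""
    if connections_per_pod <= 0:
        raise ValueError("connections_per_pod must be positive")
    return db_max_connections // connections_per_pod


if __name__ == "__main__":
    # Assumed figures: 4 pods at 90% average CPU with a 60% target,
    # a database that accepts 100 connections, and 20 connections per pod.
    cap = replica_cap_for_database(100, 20)
    print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=min(10, cap)))
```

Deriving the maximum replica count from the scarcest shared resource is one simple way to keep an autoscaler from scaling an application into a failure elsewhere in the system.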
Release can support most popular pod autoscaling solutions in the Kubernetes ecosystem. However, they are configured at the Kubernetes layer and require additional effort and expertise to integrate effectively into your application. If you’d like more information on pod autoscaling, contact us.
Typically, pod autoscaling is something that most end users will not worry about or even consider until they deploy to production environments. Some customers may also want to configure and test a production-like environment for performance testing and tuning. These are the most common use cases for pod autoscaling.

Resource Management

Kubernetes provides capabilities that ensure that when an application environment is created, its services run on nodes that have sufficient memory and CPU available. Kubernetes refers to this type of configuration as a resource request. It can also monitor ongoing resource usage and enforce a configured limit: a container that exceeds its memory limit is terminated and restarted, while a container that reaches its CPU limit is throttled. Requests are typically thought of as a minimum set of guaranteed resources, while limits are the maximum a container is allowed to use.
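The role of requests in placement can be sketched with a small model: a pod fits on a node only if the sum of the requests already placed there, plus its own, stays within what the node can allocate. The classes and field names below are hypothetical and use plain integers (millicores and MiB), whereas Kubernetes itself expresses these values as quantities such as `250m` and `256Mi`. This is a simplified model for illustration, not the actual scheduler logic.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    request_cpu_m: int    # CPU request in millicores
    request_mem_mib: int  # memory request in MiB
    limit_mem_mib: int    # memory limit in MiB


@dataclass
class Node:
    allocatable_cpu_m: int
    allocatable_mem_mib: int
    pods: list = field(default_factory=list)

    def fits(self, pod):
        """True if existing requests plus this pod's requests stay within allocatable."""
        cpu = sum(p.request_cpu_m for p in self.pods) + pod.request_cpu_m
        mem = sum(p.request_mem_mib for p in self.pods) + pod.request_mem_mib
        return cpu <= self.allocatable_cpu_m and mem <= self.allocatable_mem_mib


def exceeds_memory_limit(pod, used_mem_mib):
    """True when observed memory usage is above the pod's memory limit."""
    return used_mem_mib > pod.limit_mem_mib
```

Note that placement in this model depends only on requests, not on actual usage, which is why requests set too high waste capacity and requests set too low lead to overcrowded nodes.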
Tuning resource requests and limits is key to efficiently utilizing your cluster’s resources and to maintaining the stability and performance of your applications. Since resource configuration is heavily dependent on your application’s unique needs, this is something that should be carefully considered and adjusted over time. For more information on service resource tuning, refer to our documentation.