<aside> ℹ️
Kubernetes Metrics Server Installation Guide: https://kubernetes-sigs.github.io/metrics-server/
</aside>
Before creating the HPAs, deploy the Metrics Server in your cluster to collect the metrics used for scaling:
Download the manifest file:
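For example, the release manifest can be fetched directly from GitHub (the URL below is the one referenced in the installation guide linked above; adjust it if you pin a specific version):
# Download the latest Metrics Server manifest
curl -LO https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml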
Create the necessary private ECR repositories for the images referenced in the manifest file through the console or CLI, then reference them in Terraform if needed:
kubernetes-sigs/metrics-server (in components.yaml)

/* modules/ecr/main.tf */

## Metrics Server
data "aws_ecr_repository" "metrics_server" {
  name = "kubernetes-sigs/metrics-server"
}

/* modules/ecr/outputs.tf */

output "helper_urls" {
  value = {
    # ...
    metrics_server = data.aws_ecr_repository.metrics_server.repository_url
  }
}
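If you prefer the CLI over the console, the repository can be created like this; a minimal sketch, where the repository name must match the one the data source above expects:
# Create the private ECR repository for the Metrics Server image
aws ecr create-repository \
  --repository-name kubernetes-sigs/metrics-server \
  --region us-east-1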
terraform validate && terraform fmt
terraform plan -out tf.plan
terraform apply "tf.plan"
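Pushing to a private ECR registry requires the local Docker client to be authenticated first; assuming the us-east-1 region and account ID used throughout this guide:
# Authenticate Docker to the private ECR registry (replace <aws_account_id> with your own)
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com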
From the local machine, pull the image indicated in the manifest file, then push it to your own private repository:
docker pull --platform linux/arm64 registry.k8s.io/metrics-server/metrics-server:v0.8.0
docker tag registry.k8s.io/metrics-server/metrics-server:v0.8.0 <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/kubernetes-sigs/metrics-server:v0.8.0
docker push <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/kubernetes-sigs/metrics-server:v0.8.0
Modify the components.yaml manifest file:
# components.yaml
image: <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/kubernetes-sigs/metrics-server:v0.8.0
Then apply the Metrics Server to the cluster:
# From the bastion host
mkdir -p helpers/metrics-server
# From local machine, copy the file through SCP to the bastion host
scp -i <path_to_access_key> ./components.yaml ec2-user@<bastion_eks_instance_id>:/home/ec2-user/helpers/metrics-server
# From bastion host
kubectl apply -f ./helpers/metrics-server/components.yaml
# Verify the metrics server
kubectl get deployment metrics-server -n kube-system
kubectl top nodes
Create a manifest to scale each app's deployment. Note that Utilization targets are measured against the pods' resource requests, so each Deployment must define CPU and memory requests:
# hpa-frontend.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coffeeshop-frontend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coffeeshop-frontend-deploy
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
# From local machine
scp -i <path_to_access_key> ./hpa-*.yaml ec2-user@<bastion_eks_instance_id>:/home/ec2-user/manifests
# From bastion host (kubectl does not expand globs itself, so let the shell iterate)
for f in ./manifests/hpa-*.yaml; do kubectl apply -f "$f"; done
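Once applied, you can confirm that each HPA is tracking live metrics (the TARGETS column should show percentages instead of <unknown>):
# From bastion host
kubectl get hpa
kubectl describe hpa coffeeshop-frontend-hpa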
<aside> ℹ️
Cluster Autoscaler on AWS Installation Guide: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md
</aside>
Create the necessary IAM permissions for the CA to access AWS resources:
Create the role with an inline policy for the CA:
/* modules/eks/main.tf */

# Cluster Autoscaler
## IAM Role for CA
resource "aws_iam_role" "cluster_autoscaler_role" {
  name = "${var.project_name}-cluster-autoscaler-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRoleWithWebIdentity"
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.eks_cluster.arn
        }
        Condition = {
          StringEquals = {
            "${replace(aws_iam_openid_connect_provider.eks_cluster.url, "https://", "")}:sub" = "system:serviceaccount:kube-system:cluster-autoscaler"
            "${replace(aws_iam_openid_connect_provider.eks_cluster.url, "https://", "")}:aud" = "sts.amazonaws.com"
          }
        }
      }
    ]
  })
}

resource "aws_iam_role_policy" "cluster_autoscaler_policy" {
  name = "EKSClusterAutoscaler"
  role = aws_iam_role.cluster_autoscaler_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeLaunchConfigurations",
          "autoscaling:DescribeScalingActivities",
          "ec2:DescribeImages",
          "ec2:DescribeInstanceTypes",
          "ec2:DescribeLaunchTemplateVersions",
          "ec2:GetInstanceTypesFromInstanceRequirements",
          "eks:DescribeNodegroup"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup"
        ]
        Resource = "*"
      }
    ]
  })
}
Output the role's ARN so we can annotate the k8s ServiceAccount with it later (for IRSA):
/* modules/eks/outputs.tf */

output "cluster_autoscaler_role_arn" {
  value = aws_iam_role.cluster_autoscaler_role.arn
}

/* outputs.tf (root) */

output "cluster_autoscaler_role_arn" {
  value = module.eks.cluster_autoscaler_role_arn
}
terraform validate && terraform fmt
terraform plan -out tf.plan
terraform apply "tf.plan"
We will create the k8s ServiceAccount for the Cluster Autoscaler later (by modifying the existing manifest file from the official guide).
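When that time comes, the ARN to paste into the ServiceAccount annotation can be read back from the root outputs:
terraform output cluster_autoscaler_role_arn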
Create a VPC Endpoint for the autoscaling service so the private cluster can reach the EC2 Auto Scaling API:

/* modules/eks/main.tf */
resource "aws_vpc_endpoint" "autoscaling" {
vpc_id = var.vpc_id
service_name = "com.amazonaws.${var.region_primary}.autoscaling"
vpc_endpoint_type = "Interface"
subnet_ids = [
var.subnet_ids.eks1,
var.subnet_ids.eks2
]
security_group_ids = [aws_eks_cluster.main.vpc_config[0].cluster_security_group_id]
private_dns_enabled = true
tags = {
Name = "${var.project_name}-endpoint-autoscaling"
}
}
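Once applied, you can sanity-check the endpoint from the AWS CLI (region assumed to be us-east-1, as elsewhere in this guide):
# Confirm the Interface endpoint for the Auto Scaling API exists and is available
aws ec2 describe-vpc-endpoints \
  --filters Name=service-name,Values=com.amazonaws.us-east-1.autoscaling \
  --query 'VpcEndpoints[].State'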
Install the Cluster Autoscaler (Auto-Discovery Setup method):
Download the manifest file:
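For example (URL as referenced in the installation guide linked above; pin a tag instead of master if you want a fixed version):
curl -LO https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml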
Create the necessary private ECR repositories for the images referenced in the manifest file through the console or CLI, then reference them in Terraform if needed:
autoscaling/cluster-autoscaler (in cluster-autoscaler-autodiscover.yaml)

/* modules/ecr/main.tf */

## Cluster Autoscaler
data "aws_ecr_repository" "cluster_autoscaler" {
  name = "autoscaling/cluster-autoscaler"
}

/* modules/ecr/outputs.tf */

output "helper_urls" {
  value = {
    # ...
    cluster_autoscaler = data.aws_ecr_repository.cluster_autoscaler.repository_url
  }
}
terraform validate && terraform fmt
terraform plan -out tf.plan
terraform apply "tf.plan"
From the local machine, pull the image indicated in the manifest file, then push it to your own repository:
docker pull --platform linux/arm64 registry.k8s.io/autoscaling/cluster-autoscaler:v1.32.1
docker tag registry.k8s.io/autoscaling/cluster-autoscaler:v1.32.1 <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/autoscaling/cluster-autoscaler:v1.32.1
docker push <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/autoscaling/cluster-autoscaler:v1.32.1
Modify the cluster-autoscaler-autodiscover.yaml manifest file. Besides changing the image path, there are three IMPORTANT changes you need to make:
1. Change the cluster name in the --node-group-auto-discovery tags, so the CA discovers the correct Auto Scaling groups.
2. Add the eks.amazonaws.com/role-arn annotation (the role ARN output earlier) to the CA's ServiceAccount, so IRSA grants it the IAM permissions.
3. Add the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation to the CA's Deployment pod template, to make sure the CA does not accidentally evict the node that its own pod is running on.
# cluster-autoscaler-autodiscover.yaml
# Change the image to the private ECR image
image: <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/autoscaling/cluster-autoscaler:v1.32.1

# Change the cluster name in the Auto Scaling group's tag
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eks-demo-cluster

## Add annotation to the ServiceAccount
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: <cluster_autoscaler_role_arn>
---
## Add annotation to the Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  # ...
  template:
    metadata:
      # ...
      annotations:
        # ...
        cluster-autoscaler.kubernetes.io/safe-to-evict: 'false'
# ...
Then apply the CA to the cluster:
# From the bastion host
mkdir -p helpers/cluster-autoscaler
# From local machine, copy the file through SCP to the bastion host
scp -i <path_to_access_key> ./cluster-autoscaler-autodiscover.yaml ec2-user@<bastion_eks_instance_id>:/home/ec2-user/helpers/cluster-autoscaler
# From bastion host
kubectl apply -f ./helpers/cluster-autoscaler/cluster-autoscaler-autodiscover.yaml
# Verify the Cluster Autoscaler
kubectl get deployment -n kube-system cluster-autoscaler
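It is also worth confirming that the CA pod is running and that the IRSA annotation landed on the ServiceAccount:
# From bastion host
kubectl get pods -n kube-system -l app=cluster-autoscaler
kubectl describe serviceaccount cluster-autoscaler -n kube-system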
For extra measure, check that the current node group's Auto Scaling group (ASG) has these two tags, so the CA knows which ASGs to scale. If not, add them:
k8s.io/cluster-autoscaler/enabled: true
k8s.io/cluster-autoscaler/eks-demo-cluster: owned
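These can be checked or added from the AWS CLI as well; a sketch, assuming <asg_name> is the name of your node group's Auto Scaling group:
# Check the existing tags on the node group's ASG
aws autoscaling describe-tags \
  --filters Name=auto-scaling-group,Values=<asg_name>

# Add the discovery tags if they are missing
aws autoscaling create-or-update-tags --tags \
  ResourceId=<asg_name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true \
  ResourceId=<asg_name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/eks-demo-cluster,Value=owned,PropagateAtLaunch=true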
You can test scaling: temporarily remove the HPA, then manually scale up one of the apps' deployments:
kubectl delete hpa coffeeshop-frontend-hpa
kubectl scale --replicas=10 deployment/coffeeshop-frontend-deploy
Some pods should be stuck in Pending because, with only 2 nodes, there are no compute resources left for them.
→ The CA will scale up gradually by adding more nodes (instances), and stop when it reaches the maximum size of the node group (= 6 according to the cluster configuration back in step 3.2) or when there are no more pending pods in the cluster.
# You can check the CA logs to verify
kubectl get pods -n kube-system
kubectl logs -f <cluster_autoscaler_pod_name> -n kube-system
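After the test, you can scale the deployment back down and re-apply the HPA manifest from earlier so autoscaling takes over again:
kubectl scale --replicas=2 deployment/coffeeshop-frontend-deploy
kubectl apply -f ./manifests/hpa-frontend.yaml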


