Before creating the HPAs, deploy the Metrics Server in your cluster (https://kubernetes-sigs.github.io/metrics-server/) so the HPA controller has resource metrics to scale on:
Create a private ECR repository called kubernetes-sigs/metrics-server
From the local machine, pull the image referenced in the manifest file, then push it to your own repository → the private EKS cluster can then fetch it through the ECR VPC endpoint.
docker pull --platform linux/arm64 registry.k8s.io/metrics-server/metrics-server:v0.7.2
docker tag registry.k8s.io/metrics-server/metrics-server:v0.7.2 <aws_id>.dkr.ecr.us-east-1.amazonaws.com/kubernetes-sigs/metrics-server:v0.7.2
docker push <aws_id>.dkr.ecr.us-east-1.amazonaws.com/kubernetes-sigs/metrics-server:v0.7.2
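The pull/tag/push steps above can be wrapped in a small helper script; this is a sketch in which the ECR registry value is a placeholder and the `ecr_ref` function (a name introduced here, not from the original) only assembles the target image reference:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the private ECR reference for a mirrored image:
#   ecr_ref <ecr_registry> <repo_name> <source_image_ref>
ecr_ref() {
  local tag="${3##*:}"   # take the tag after the last ':'
  printf '%s/%s:%s\n' "$1" "$2" "$tag"
}

# Hypothetical usage -- the <aws_id> registry is a placeholder:
#   SRC=registry.k8s.io/metrics-server/metrics-server:v0.7.2
#   DST="$(ecr_ref <aws_id>.dkr.ecr.us-east-1.amazonaws.com kubernetes-sigs/metrics-server "$SRC")"
#   docker pull --platform linux/arm64 "$SRC"
#   docker tag "$SRC" "$DST"
#   docker push "$DST"
```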
Modify the components.yaml manifest file:
# components.yaml
# Change the deployment's image to the private ECR image
image: <aws_id>.dkr.ecr.us-east-1.amazonaws.com/kubernetes-sigs/metrics-server:v0.7.2
Then apply the Metrics Server to the cluster:
# From local machine, copy the file through SCP to the bastion host
scp -i <path_to_access_key> ./components.yaml ec2-user@<instance_id>:/home/ec2-user/helpers/metrics-components.yaml
# From bastion host
kubectl apply -f ./helpers/metrics-components.yaml
# Verify the metrics server
kubectl get deployment metrics-server -n kube-system
kubectl top nodes
Create an HPA manifest for each app's deployment:
# hpa-frontend.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coffeeshop-frontend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coffeeshop-frontend-deploy
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
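Utilization targets are computed as a percentage of the pods' resource requests, so each target deployment must declare them or the HPA will report unknown metrics. A minimal sketch of the relevant deployment excerpt (the container name and request values are assumptions, not from the original):

```yaml
# Excerpt from the frontend deployment (hypothetical values);
# the HPA's averageUtilization is measured against these requests.
spec:
  template:
    spec:
      containers:
        - name: frontend
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
```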
# From local machine
scp -i <path_to_access_key> ./hpa-*.yaml ec2-user@<instance_id>:/home/ec2-user/manifests
# From bastion host
for f in ./manifests/hpa-*.yaml; do kubectl apply -f "$f"; done
Create the necessary IAM permissions for CA to access AWS resources (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md):
Create an IAM policy called ClusterAutoscalerEKSDemoPolicy for the CA:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": ["*"]
    }
  ]
}
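Saved to a local JSON file, the policy can be created with the AWS CLI. This is a sketch: the file name ca-policy.json is an assumption, and the `policy_arn` helper (introduced here) only derives the ARN that the eksctl step below expects:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Derive the policy ARN from the account id and policy name
# (this is the value later passed to --attach-policy-arn).
policy_arn() {
  printf 'arn:aws:iam::%s:policy/%s\n' "$1" "$2"
}

# Hypothetical usage -- assumes the JSON above was saved as ca-policy.json:
#   aws iam create-policy \
#     --policy-name ClusterAutoscalerEKSDemoPolicy \
#     --policy-document file://ca-policy.json
#   policy_arn <AWS_ACCOUNT_ID> ClusterAutoscalerEKSDemoPolicy
```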
From the bastion host, create the service account for CA with eksctl:
eksctl create iamserviceaccount \
  --cluster eks-demo-cluster \
  --namespace kube-system \
  --name cluster-autoscaler \
  --attach-policy-arn arn:aws:iam::<AWS_ACCOUNT_ID>:policy/ClusterAutoscalerEKSDemoPolicy \
  --override-existing-serviceaccounts \
  --region us-east-1 \
  --approve
If the cluster is private-only, also create an EC2 Auto Scaling interface VPC endpoint so the CA can reach the Auto Scaling API from the private subnets:
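A sketch of creating an EC2 Auto Scaling interface endpoint with the AWS CLI: the VPC, subnet, and security-group IDs are placeholders, and the `autoscaling_service_name` helper (introduced here) only formats the regional service name:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Interface-endpoint service name for the Auto Scaling API in a region.
autoscaling_service_name() {
  printf 'com.amazonaws.%s.autoscaling\n' "$1"
}

# Hypothetical usage -- the IDs below are placeholders:
#   aws ec2 create-vpc-endpoint \
#     --vpc-id <vpc_id> \
#     --vpc-endpoint-type Interface \
#     --service-name "$(autoscaling_service_name us-east-1)" \
#     --subnet-ids <private_subnet_ids> \
#     --security-group-ids <endpoint_sg_id> \
#     --private-dns-enabled
```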
Install the Cluster Autoscaler:
Download the cluster-autoscaler-autodiscover.yaml example manifest from the kubernetes/autoscaler repository (under cluster-autoscaler/cloudprovider/aws/examples).
Create a private ECR repository called: autoscaling/cluster-autoscaler
From the local machine, pull the image referenced in the manifest file, then push it to your own repository:
docker pull --platform linux/arm64 registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2
docker tag registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2 <aws_id>.dkr.ecr.us-east-1.amazonaws.com/autoscaling/cluster-autoscaler:v1.26.2
docker push <aws_id>.dkr.ecr.us-east-1.amazonaws.com/autoscaling/cluster-autoscaler:v1.26.2
Modify the cluster-autoscaler-autodiscover.yaml manifest file:
# cluster-autoscaler-autodiscover.yaml
# Change the deployment's image to the private ECR image
image: <aws_id>.dkr.ecr.us-east-1.amazonaws.com/autoscaling/cluster-autoscaler:v1.26.2
# Change the cluster name in the container's command
--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eks-demo-cluster
Then apply the CA to the cluster:
# From local machine, copy the file through SCP to the bastion host
scp -i <path_to_access_key> ./cluster-autoscaler-autodiscover.yaml ec2-user@<instance_id>:/home/ec2-user/helpers/cluster-autoscaler-autodiscover.yaml
# From bastion host
kubectl apply -f ./helpers/cluster-autoscaler-autodiscover.yaml
# Verify the Cluster Autoscaler
kubectl get deployment -n kube-system cluster-autoscaler
Make sure to add this annotation, to prevent the CA from removing the node where its own pod is running:
kubectl -n kube-system \
  annotate deployment.apps/cluster-autoscaler \
  cluster-autoscaler.kubernetes.io/safe-to-evict="false"
Check whether the current node group's Auto Scaling group (ASG) has these two tags, so the CA knows which ASGs to scale. If not, add them:
k8s.io/cluster-autoscaler/enabled: true
k8s.io/cluster-autoscaler/eks-demo-cluster: owned
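If the tags are missing, they can be added with the AWS CLI. This is a sketch: the ASG name is a placeholder, and the `asg_tag` helper (introduced here) only formats the tag argument string:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Format one ASG tag for aws autoscaling create-or-update-tags:
#   asg_tag <asg_name> <key> <value>
asg_tag() {
  printf 'ResourceId=%s,ResourceType=auto-scaling-group,Key=%s,Value=%s,PropagateAtLaunch=true\n' \
    "$1" "$2" "$3"
}

# Hypothetical usage -- <asg_name> is a placeholder:
#   aws autoscaling create-or-update-tags --tags \
#     "$(asg_tag <asg_name> k8s.io/cluster-autoscaler/enabled true)" \
#     "$(asg_tag <asg_name> k8s.io/cluster-autoscaler/eks-demo-cluster owned)"
```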
You can test scaling:
Temporarily remove the HPA, then scale a deployment up:
kubectl delete hpa coffeeshop-frontend-hpa
kubectl scale --replicas=10 deployment/coffeeshop-frontend-deploy
The cluster configuration back in step 3.2 set the node group's max size to 6 → the system will scale out gradually and stop once the node group reaches 6 nodes, or when there are no more pending pods in the cluster. Follow the CA's decisions in its logs:
kubectl logs -f <cluster_autoscaler_pod_name> -n kube-system