Create a VPC endpoint for CloudWatch Logs (the logs service) so that the private cluster can reach CloudWatch:
/* modules/eks/main.tf */
resource "aws_vpc_endpoint" "logs" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region_primary}.logs"
  vpc_endpoint_type = "Interface"
  subnet_ids = [
    var.subnet_ids.eks1,
    var.subnet_ids.eks2
  ]
  security_group_ids  = [aws_eks_cluster.main.vpc_config[0].cluster_security_group_id]
  private_dns_enabled = true
  tags = {
    Name = "${var.project_name}-endpoint-logs"
  }
}
Install the CloudWatch Observability EKS add-on to enable CloudWatch Container Insights.
First, create an IAM role for the CloudWatch agent and attach the CloudWatchAgentServerPolicy managed policy to it:
# CloudWatch Observability Add-on
## IAM Role & Policy
resource "aws_iam_role" "cloudwatch_agent_role" {
  name = "${var.project_name}-cloudwatch-agent-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRoleWithWebIdentity"
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.eks_cluster.arn
        }
        Condition = {
          StringEquals = {
            "${replace(aws_iam_openid_connect_provider.eks_cluster.url, "https://", "")}:sub" = "system:serviceaccount:amazon-cloudwatch:cloudwatch-agent"
            "${replace(aws_iam_openid_connect_provider.eks_cluster.url, "https://", "")}:aud" = "sts.amazonaws.com"
          }
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "cloudwatch_agent_policy" {
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
  role       = aws_iam_role.cloudwatch_agent_role.name
}
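The trust policy above references aws_iam_openid_connect_provider.eks_cluster, and IRSA only works once IAM trusts the cluster's OIDC issuer. If your eks module does not already define that provider, a minimal sketch could look like the following. It assumes the hashicorp/tls provider is available and that the cluster resource is aws_eks_cluster.main, as in the rest of this module.

```hcl
/* modules/eks/main.tf (sketch: only needed if the OIDC provider is not defined yet) */

# Read the TLS certificate chain served by the cluster's OIDC issuer URL.
data "tls_certificate" "eks" {
  url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

# Register the cluster's OIDC issuer with IAM so that Kubernetes service
# accounts can assume IAM roles through sts:AssumeRoleWithWebIdentity.
resource "aws_iam_openid_connect_provider" "eks_cluster" {
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
}
```

With the provider in place, the :sub condition in the trust policy restricts the role to the cloudwatch-agent service account in the amazon-cloudwatch namespace, and the :aud condition matches the sts.amazonaws.com client ID registered here.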
Create the add-on to run the CloudWatch agent on every node, and make sure to attach the service account role (service_account_role_arn). The add-on is toggled by an enable_cloudwatch variable:
/* variables.tf (root) */
variable "enable_cloudwatch" {
  description = "Enable CloudWatch Observability Add-on"
  type        = bool
  default     = false
}

/* main.tf (root) */
module "eks" {
  # ...
  enable_cloudwatch = var.enable_cloudwatch
}

/* modules/eks/variables.tf */
variable "enable_cloudwatch" {
  description = "Enable CloudWatch Observability Add-on"
  type        = bool
  default     = false
}

/* modules/eks/main.tf */
resource "aws_eks_addon" "cloudwatch" {
  count                    = var.enable_cloudwatch ? 1 : 0
  cluster_name             = aws_eks_cluster.main.name
  addon_name               = "amazon-cloudwatch-observability"
  service_account_role_arn = aws_iam_role.cloudwatch_agent_role.arn
}
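The add-on runs the CloudWatch agent as pods on the worker nodes, and Terraform waits for the add-on to become active, so creating it before any node group exists may leave the apply waiting. The sketch below is a variant to use in place of the aws_eks_addon resource above (not alongside it, since both share the same name). It pins the add-on version to the newest one compatible with the cluster's Kubernetes version and waits for the nodes; aws_eks_node_group.main is a hypothetical name for your node group resource.

```hcl
/* modules/eks/main.tf (sketch: replaces the aws_eks_addon resource above) */

# Look up the newest add-on version compatible with the cluster's Kubernetes version.
data "aws_eks_addon_version" "cloudwatch" {
  addon_name         = "amazon-cloudwatch-observability"
  kubernetes_version = aws_eks_cluster.main.version
  most_recent        = true
}

resource "aws_eks_addon" "cloudwatch" {
  count                    = var.enable_cloudwatch ? 1 : 0
  cluster_name             = aws_eks_cluster.main.name
  addon_name               = "amazon-cloudwatch-observability"
  addon_version            = data.aws_eks_addon_version.cloudwatch.version
  service_account_role_arn = aws_iam_role.cloudwatch_agent_role.arn

  # Hypothetical node group resource name: the agent pods need nodes to run on.
  depends_on = [aws_eks_node_group.main]
}
```

With most_recent = true, the resolved version can still change on a later plan once AWS publishes a newer compatible release, so you may prefer to hard-code the version string after a working apply.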
Output the ARN of the role so that we can annotate the Kubernetes service account with it later (for IRSA):
/* modules/eks/outputs.tf */
output "cloudwatch_agent_role_arn" {
  value = aws_iam_role.cloudwatch_agent_role.arn
}

/* outputs.tf (root) */
output "cloudwatch_agent_role_arn" {
  value = module.eks.cloudwatch_agent_role_arn
}
/* terraform.tfvars (root) */
enable_cloudwatch = true
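Then format, validate, plan, and apply the changes (-recursive also formats the files under modules/):
terraform fmt -recursive && terraform validate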
terraform plan -out tf.plan
terraform apply "tf.plan"
Through the bastion host, create a service account for the CloudWatch agent in the cluster, either with a manifest file or by other means:
# sa-cloudwatch-agent.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
  annotations:
    eks.amazonaws.com/role-arn: <cloudwatch_agent_role_arn>
# From the local machine, copy the service account manifest to the bastion host
scp -i <path_to_access_key> ./sa-cloudwatch-agent.yaml <bastion_user>@<bastion_host>:~/manifests/
# From the bastion host
kubectl apply -f './manifests/sa-cloudwatch-agent.yaml'
After a few minutes, verify that the add-on's pods are running in the cluster:
kubectl get pods -n amazon-cloudwatch
Check Container Insights in the console:
You can see all the clusters that have been created in your account. In this example, there is only the cluster we created earlier.

You can find metrics for individual nodes or pods of the cluster in the performance monitoring dashboard.

Checking application metrics and logs:
The CloudWatch Observability add-on that we just added to the cluster provides two types of observability: granular, container-level monitoring with Container Insights, and application-level monitoring with Application Signals.

To check application metrics, select each service you want to monitor and enable Application Signals for it, then wait a few minutes while the agent restarts that service's containers in your cluster.


Then, for each service, you can check its service operations to see the health and latency of its API requests.

You can also check the application logs at a given point in time by clicking a trace in the graph; CloudWatch then runs the query for you.

