• Create a VPC Endpoint to connect the CloudWatch service to the private cluster:

    • Endpoint type: Interface (service: logs)
      • Subnets: EKS 1, EKS 2
      • Security Group: the EKS cluster’s computed security group
      • Private DNS: Enabled
    /* modules/eks/main.tf */
    
    resource "aws_vpc_endpoint" "logs" {
      vpc_id            = var.vpc_id
      service_name      = "com.amazonaws.${var.region_primary}.logs"
      vpc_endpoint_type = "Interface"
      subnet_ids = [
        var.subnet_ids.eks1,
        var.subnet_ids.eks2
      ]
      security_group_ids  = [aws_eks_cluster.main.vpc_config[0].cluster_security_group_id]
      private_dns_enabled = true
    
      tags = {
        Name = "${var.project_name}-endpoint-logs"
      }
    }
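
    To check the endpoint after applying, one option is to expose its DNS names as a module output. This is a minimal sketch; the output name `logs_endpoint_dns_names` is chosen here for illustration and is not part of the original module:

    ```hcl
    /* modules/eks/outputs.tf */

    # Hypothetical output (name is an assumption): lists the DNS names
    # that AWS assigned to the CloudWatch Logs interface endpoint.
    output "logs_endpoint_dns_names" {
      description = "DNS names of the CloudWatch Logs interface endpoint"
      value       = aws_vpc_endpoint.logs.dns_entry[*].dns_name
    }
    ```

    With private DNS enabled, workloads in the VPC keep using the regular CloudWatch Logs hostname, which then resolves to the endpoint's private IP addresses.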
    
  • Install the CloudWatch Observability EKS add-on to enable CloudWatch Container Insights:

    • Create an IAM role for the CloudWatch agent:

      • Attach the policy CloudWatchAgentServerPolicy to the role
      # CloudWatch Observability Add-on
      ## IAM Role & Policy
      resource "aws_iam_role" "cloudwatch_agent_role" {
        name = "${var.project_name}-cloudwatch-agent-role"
      
        assume_role_policy = jsonencode({
          Version = "2012-10-17"
          Statement = [
            {
              Action = "sts:AssumeRoleWithWebIdentity"
              Effect = "Allow"
              Principal = {
                Federated = aws_iam_openid_connect_provider.eks_cluster.arn
              }
              Condition = {
                StringEquals = {
                  "${replace(aws_iam_openid_connect_provider.eks_cluster.url, "https://", "")}:sub" = "system:serviceaccount:amazon-cloudwatch:cloudwatch-agent"
                  "${replace(aws_iam_openid_connect_provider.eks_cluster.url, "https://", "")}:aud" = "sts.amazonaws.com"
                }
              }
            }
          ]
        })
      }
      
      resource "aws_iam_role_policy_attachment" "cloudwatch_agent_policy" {
        policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
        role       = aws_iam_role.cloudwatch_agent_role.name
      }
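
      The OIDC issuer host is computed twice in the trust policy above. As a sketch, it could be factored into a local value; the name `oidc_issuer` is an assumption made here, and the condition keys are otherwise unchanged:

      ```hcl
      # Hypothetical local (name is an assumption): the OIDC issuer URL
      # without the "https://" scheme, as IAM expects in condition keys.
      locals {
        oidc_issuer = replace(aws_iam_openid_connect_provider.eks_cluster.url, "https://", "")
      }

      # Inside assume_role_policy, the Condition block would then read:
      #   StringEquals = {
      #     "${local.oidc_issuer}:sub" = "system:serviceaccount:amazon-cloudwatch:cloudwatch-agent"
      #     "${local.oidc_issuer}:aud" = "sts.amazonaws.com"
      #   }
      ```

      The `:sub` value has to match the namespace and name of the service account exactly (`amazon-cloudwatch` and `cloudwatch-agent`), otherwise the role cannot be assumed.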
      
    • Create the add-on, which runs the CloudWatch agent on every node. Make sure to attach the service account role:

      /* variables.tf (root) */
      
      variable "enable_cloudwatch" {
        description = "Enable CloudWatch Observability Add-on"
        type        = bool
        default     = false
      }
      
      /* main.tf (root) */
      
      module "eks" {
        # ...
        enable_cloudwatch = var.enable_cloudwatch
      }
      
      /* modules/eks/variables.tf */
      
      variable "enable_cloudwatch" {
        description = "Enable CloudWatch Observability Add-on"
        type        = bool
        default     = false
      }
      
      
      /* modules/eks/main.tf */
      
      resource "aws_eks_addon" "cloudwatch" {
        count = var.enable_cloudwatch ? 1 : 0
        
        cluster_name             = aws_eks_cluster.main.name
        addon_name               = "amazon-cloudwatch-observability"
        service_account_role_arn = aws_iam_role.cloudwatch_agent_role.arn
      }
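
      The add-on above installs whichever version EKS selects by default. If you prefer to pin it explicitly, a sketch using the `aws_eks_addon_version` data source follows; the data source label `cloudwatch` and the choice of `most_recent = true` are assumptions for illustration:

      ```hcl
      /* modules/eks/main.tf */

      # Looks up the latest add-on version compatible with the cluster's
      # Kubernetes version (assumption: you want the most recent one).
      data "aws_eks_addon_version" "cloudwatch" {
        addon_name         = "amazon-cloudwatch-observability"
        kubernetes_version = aws_eks_cluster.main.version
        most_recent        = true
      }

      # Then, inside resource "aws_eks_addon" "cloudwatch", add:
      #   addon_version = data.aws_eks_addon_version.cloudwatch.version
      ```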
      
    • Output the role's ARN so we can annotate the Kubernetes service account with it later (for IRSA):

      /* modules/eks/outputs.tf */

      output "cloudwatch_agent_role_arn" {
        value = aws_iam_role.cloudwatch_agent_role.arn
      }
      
      /* outputs.tf (root) */
      
      output "cloudwatch_agent_role_arn" {
        value = module.eks.cloudwatch_agent_role_arn
      }
      
    • Apply Terraform changes:

      /* terraform.tfvars (root) */
      enable_cloudwatch = true
      
      terraform validate && terraform fmt
      terraform plan -out tf.plan
      terraform apply "tf.plan"
      
    • From the bastion host, create the service account for the CloudWatch agent in the cluster, using a manifest file or any other method:

      # sa-cloudwatch-agent.yaml
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: cloudwatch-agent
        namespace: amazon-cloudwatch
        annotations:
          eks.amazonaws.com/role-arn: <cloudwatch_agent_role_arn>
      
      # From local machine, copy the service account manifest to the bastion host
      scp -i <path_to_access_key> ./sa-cloudwatch-agent.yaml <bastion_user>@<bastion_host>:manifests/
      
      # From bastion host
      kubectl apply -f './manifests/sa-cloudwatch-agent.yaml'
      
    • After a few minutes, verify the add-on in the cluster:

      kubectl get pods -n amazon-cloudwatch
      
  • Check Container Insights in the console:

    • You can see all the clusters that have been created in your account. In this example, I have only the cluster I created earlier

    • You can find the metrics for individual nodes or pods of the cluster in the performance monitoring dashboard

  • Check application metrics and logs:

    • The CloudWatch Observability add-on that was just added to the cluster provides two types of observability: granular container-level observability with Container Insights and application-level observability with Application Signals

    • To check application metrics, select each service you want to monitor and enable Application Signals for it, then wait a few minutes for the agent to restart that service's containers in your cluster

    • Then, for each service, you can check its service operations to see the health and latency of its API requests

  • You can also check the application logs for a specific point in time by clicking on the traces in the graph; CloudWatch will then run the query for you
