
SigNoz on EKS — Full Integration Guide


Phase 1: Prerequisites & EBS CSI Driver Setup

SigNoz relies on persistent volumes for ClickHouse and Zookeeper. On EKS 1.23 and later, EBS-backed volumes are provisioned through the EBS CSI driver, so the driver must be installed before SigNoz.

Step 1 — Add IAM Policy for EBS CSI

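# Get your cluster's OIDC issuer URL
aws eks describe-cluster --name <your-cluster-name> --region <region> \
  --query "cluster.identity.oidc.issuer" --output text

# Register the issuer as an IAM OIDC provider (required before creating IAM roles for service accounts)
eksctl utils associate-iam-oidc-provider \
  --cluster <your-cluster-name> \
  --region <region> \
  --approve

# Create the IAM role for the EBS CSI driver
# (--role-only creates just the role; the add-on creates the service account itself)
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster <your-cluster-name> \
  --region <region> \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve \
  --role-only \
  --role-name AmazonEKS_EBS_CSI_DriverRole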
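
The IAM role above only works if the cluster's OIDC issuer is registered as an IAM OIDC provider. A minimal sketch of checking that, assuming the provider ID is the last path segment of the issuer URL printed by the first command. The function names and the CLUSTER and REGION variables are illustrative placeholders, not part of any tool:

```shell
#!/bin/sh
# Extract the OIDC provider ID: the last path segment of the issuer URL.
oidc_id_from_issuer() {
  printf '%s\n' "${1##*/}"
}

# Return 0 if an IAM OIDC provider whose ARN contains that ID exists.
# Needs AWS credentials; CLUSTER and REGION are placeholders you must set.
oidc_provider_exists() {
  issuer=$(aws eks describe-cluster --name "$CLUSTER" --region "$REGION" \
    --query "cluster.identity.oidc.issuer" --output text) || return 2
  aws iam list-open-id-connect-providers --output text \
    | grep -q "$(oidc_id_from_issuer "$issuer")"
}
```

Usage: set CLUSTER and REGION, then run oidc_provider_exists. If it returns non-zero, associate the provider with eksctl utils associate-iam-oidc-provider before creating the role.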

Step 2 — Install EBS CSI Driver Add-on

# Get your AWS account ID
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
 
# Add the EBS CSI driver addon
aws eks create-addon \
  --cluster-name <your-cluster-name> \
  --region <region> \
  --addon-name aws-ebs-csi-driver \
  --service-account-role-arn arn:aws:iam::${AWS_ACCOUNT_ID}:role/AmazonEKS_EBS_CSI_DriverRole
 
# Verify the addon is active (prints ACTIVE once the driver is ready)
aws eks describe-addon \
  --cluster-name <your-cluster-name> \
  --region <region> \
  --addon-name aws-ebs-csi-driver \
  --query "addon.status" \
  --output text
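
The add-on usually takes a little while to leave the CREATING state, so a single status check can be misleading. The sketch below polls until the status is ACTIVE. The helper names wait_for_status and addon_status are hypothetical, and the attempt count and delay are arbitrary choices:

```shell
#!/bin/sh
# Poll a command until it prints the expected value or attempts run out.
# usage: wait_for_status <expected> <max_attempts> <sleep_seconds> <command...>
wait_for_status() {
  expected=$1; attempts=$2; delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$("$@" 2>/dev/null) || status=""
    [ "$status" = "$expected" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Print the add-on status as plain text (for example ACTIVE or CREATING).
# usage: addon_status <cluster-name> <region>
addon_status() {
  aws eks describe-addon --cluster-name "$1" --region "$2" \
    --addon-name aws-ebs-csi-driver --query "addon.status" --output text
}
```

Usage: wait_for_status ACTIVE 30 10 addon_status <your-cluster-name> <region> waits up to about five minutes. The AWS CLI also provides a built-in waiter, aws eks wait addon-active, which can replace the loop if you do not need custom timing.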

Phase 2: Create the ebs-sc StorageClass

cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # Important for multi-AZ clusters
reclaimPolicy: Retain                     # Keeps the EBS volume after its PVC is deleted (clean up manually)
parameters:
  type: gp3                               # gp3 is cheaper & faster than gp2
  encrypted: "true"                       # Enable EBS encryption (recommended)
allowVolumeExpansion: true
EOF
 
# Confirm it's created
kubectl get storageclass ebs-sc

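Why WaitForFirstConsumer? Volume creation is delayed until a pod that uses the PVC is scheduled, so the EBS volume is created in that pod's AZ. EBS volumes are bound to a single AZ, so this avoids pods that cannot be scheduled next to their data, which is critical for stateful apps like ClickHouse.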
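
Before installing SigNoz, it can be worth confirming that the StorageClass actually provisions a volume. The sketch below prints a throwaway PVC and pod. The resource names ebs-sc-test and ebs-sc-test-pod, the busybox image, and the 1Gi size are arbitrary choices for illustration:

```shell
#!/bin/sh
# Print a throwaway PVC and a pod that mounts it, using the given StorageClass.
# usage: render_storage_test <storage-class-name>
render_storage_test() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-sc-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: $1
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: ebs-sc-test-pod
spec:
  containers:
    - name: sleep
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ebs-sc-test
EOF
}
```

Usage: render_storage_test ebs-sc | kubectl apply -f - and then kubectl get pvc ebs-sc-test. The PVC stays Pending until the pod is scheduled, which is the expected WaitForFirstConsumer behavior. Remove the test with render_storage_test ebs-sc | kubectl delete -f -. Because the class uses reclaimPolicy: Retain, the released PV and its EBS volume remain and have to be deleted by hand.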


Phase 3: Install SigNoz via Helm

Step 1 — Add Helm Repo

helm repo add signoz https://charts.signoz.io
helm repo update

Step 2 — Create values.yaml

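Save the following as signoz-values.yaml. It is a starting-point configuration that uses ebs-sc for all persistent volumes; adjust sizes and resources for your workload: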

# signoz-values.yaml
 
global:
  storageClass: ebs-sc
 
clickhouse:
  installCustomStorageClass: true
 
  # ClickHouse storage sizing
  persistence:
    enabled: true
    storageClass: ebs-sc
    size: 50Gi               # Increase for production (docs recommend 80GB+)
 
  # ClickHouse resource limits
  resources:
    requests:
      memory: "4Gi"
      cpu: "2"
    limits:
      memory: "8Gi"
      cpu: "4"
 
zookeeper:
  persistence:
    enabled: true
    storageClass: ebs-sc
    size: 10Gi
 
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "500m"
 
# SigNoz frontend + backend
signoz:
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "2Gi"
      cpu: "1"
 
# OTel Collector
otelCollector:
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "1Gi"
      cpu: "500m"

Step 3 — Install SigNoz

helm install signoz signoz/signoz \
  --namespace monitoring \
  --create-namespace \
  --wait \
  --timeout 1h \
  -f signoz-values.yaml

Replace monitoring with your preferred namespace.


Phase 4: Verify the Installation

# 1. Check all pods are Running
kubectl get pods -n monitoring
 
# 2. Check PVCs are Bound to EBS volumes
kubectl get pvc -n monitoring
 
# 3. Check services
kubectl get svc -n monitoring

Expected pod output should show all pods in Running state:

signoz-clickhouse-0            1/1   Running
signoz-zookeeper-0             1/1   Running
signoz-otel-collector-xxx      1/1   Running
signoz-xxx (frontend)          1/1   Running
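
Scanning the pod list by eye is error-prone in a busy namespace. The following sketch filters the kubectl output down to pods that need attention. The function name not_running_pods is hypothetical, and it assumes the default column layout (NAME, READY, STATUS) of kubectl get pods:

```shell
#!/bin/sh
# Read `kubectl get pods --no-headers` output on stdin and print the names of
# pods that are not Running or whose containers are not all ready.
# Pods in the Completed state (finished jobs) are skipped.
not_running_pods() {
  awk '
    $3 == "Completed" { next }
    { split($2, ready, "/") }
    $3 != "Running" || ready[1] != ready[2] { print $1 }
  '
}
```

Usage: kubectl get pods -n monitoring --no-headers | not_running_pods. Empty output means every pod is Running with all containers ready.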

Step 4 — Health Check via Port-Forward

# In terminal 1 — port-forward
kubectl port-forward -n monitoring svc/signoz 8080:8080
 
# In terminal 2 — health check
curl -X GET http://localhost:8080/api/v1/health
# Expected: {"status":"ok"}
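
The two-terminal check can also be scripted in one shell. This is a sketch only: the function names are hypothetical, the three-second wait for the tunnel is a crude guess, and the response format is taken from the expected output shown above:

```shell
#!/bin/sh
# Return 0 if the response body reports status "ok" (tolerates whitespace).
is_healthy() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Port-forward in the background, probe the health endpoint, then clean up.
check_signoz_health() {
  kubectl port-forward -n monitoring svc/signoz 8080:8080 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 3   # crude wait for the tunnel to come up; adjust as needed
  body=$(curl -fsS http://localhost:8080/api/v1/health) || body=""
  kill "$pf_pid" 2>/dev/null
  is_healthy "$body"
}
```

Usage: check_signoz_health && echo healthy. The function returns non-zero if the tunnel, the request, or the status check fails.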

Phase 5: Expose SigNoz (Optional — AWS Load Balancer)

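Instead of port-forwarding in production, expose the service through an AWS load balancer. The example below provisions an NLB (an ALB would require an Ingress instead):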

# Add this to signoz-values.yaml, merging it into the existing signoz: block
signoz:
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
      service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"  # Use "internet-facing" for public

Apply via:

helm upgrade signoz signoz/signoz \
  --namespace monitoring \
  -f signoz-values.yaml
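
After the upgrade, AWS needs a few minutes to provision the load balancer. A sketch for looking up its address, assuming the service is named signoz and listens on port 8080 as in the port-forward step. The function names are illustrative:

```shell
#!/bin/sh
# Build the UI URL from a load balancer hostname (port 8080 is an assumption
# carried over from the port-forward command earlier in this guide).
signoz_url() {
  printf 'http://%s:8080\n' "$1"
}

# Print the hostname AWS assigned to the service (empty until provisioned).
signoz_lb_hostname() {
  kubectl get svc signoz -n monitoring \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}
```

Usage: signoz_url "$(signoz_lb_hostname)". With the internal scheme, the address resolves only from inside the VPC or from networks connected to it.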

Phase 6: Collect K8s Telemetry (K8s Infra Monitoring)

To monitor your EKS cluster itself (node metrics, pod logs, etc.):

# Install k8s-infra chart from SigNoz
helm install k8s-infra signoz/k8s-infra \
  --namespace monitoring \
  --set otelCollectorEndpoint=signoz-otel-collector.monitoring.svc.cluster.local:4317 \
  --set otelInsecure=true
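
If you installed SigNoz under a different release name or namespace, the collector endpoint changes with it. A small sketch for deriving the value, assuming the service follows the <release>-otel-collector naming seen in the command above. The function name otel_endpoint is hypothetical:

```shell
#!/bin/sh
# Build the in-cluster OTLP gRPC endpoint for a SigNoz release.
# usage: otel_endpoint [release] [namespace]   (defaults: signoz, monitoring)
otel_endpoint() {
  release=${1:-signoz}
  namespace=${2:-monitoring}
  printf '%s-otel-collector.%s.svc.cluster.local:4317\n' "$release" "$namespace"
}
```

Usage: pass --set otelCollectorEndpoint="$(otel_endpoint signoz <namespace>)" to the helm install command. Confirm the real service name with kubectl get svc -n <namespace> before relying on the derived value.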

Quick Troubleshooting Reference

Problem | Command | What to check
Pod stuck in Pending | kubectl describe pod <pod> -n monitoring | PVC/AZ errors
PVC not binding | kubectl describe pvc <pvc> -n monitoring | CSI driver
ClickHouse crash | kubectl logs signoz-clickhouse-0 -n monitoring | -
Check EBS volumes | aws ec2 describe-volumes --filters Name=tag:kubernetes.io/cluster/<cluster>,Values=owned | -

Summary of What Gets Installed

Component | Purpose | Storage
SigNoz UI + API | Dashboards, alerts, querying | -
SigNoz OTel Collector | Receives traces/metrics/logs | -
ClickHouse | Time-series data store | ebs-sc (gp3 EBS)
Zookeeper | ClickHouse coordination | ebs-sc (gp3 EBS)

Default retention: 7 days for logs & traces, 30 days for metrics. Adjust via SigNoz UI → Settings → General.