K8s by Example: Distributed Tracing

Distributed tracing follows a request as it flows through microservices. Each service adds spans to a shared trace, recording how long each operation took and which operation called which. Tools like Jaeger and Zipkin, which build on the ideas in Google’s Dapper paper, collect and visualize these traces. Use tracing for debugging latency, understanding service dependencies, and finding bottlenecks.
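To make the terms concrete, here is a minimal Python sketch of how spans relate within one trace. The `Span` class, its field names, and the start offsets are illustrative assumptions, not the data model of any particular tracing library; the operation names and durations mirror the sample Jaeger output at the end of this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed operation inside a trace (hypothetical, simplified model)."""
    trace_id: str             # shared by every span of the same request
    span_id: str              # unique per span
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_us: int             # start time in microseconds (assumed values)
    end_us: int               # end time in microseconds

    @property
    def duration_us(self) -> int:
        return self.end_us - self.start_us


def children_of(spans, parent_id):
    """Return the spans whose parent is `parent_id`, ordered by start time."""
    return sorted((s for s in spans if s.parent_id == parent_id),
                  key=lambda s: s.start_us)


def render(spans, parent_id=None, depth=0):
    """Return indented lines showing the parent/child structure of a trace."""
    lines = []
    for span in children_of(spans, parent_id):
        lines.append(f"{'  ' * depth}{span.name} ({span.duration_us / 1000:.0f} ms)")
        lines.extend(render(spans, span.span_id, depth + 1))
    return lines


trace = [
    Span("t1", "a", None, "POST /orders", 0, 245_000),
    Span("t1", "b", "a", "check-inventory", 0, 89_000),
    Span("t1", "c", "a", "process-payment", 89_000, 245_000),
]
print("\n".join(render(trace)))
```

The root span covers the whole request, and each child span accounts for part of that time, which is what a trace timeline in a tracing UI shows.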

terminal

Quick start: deploy Jaeger for trace collection and visualization. For production, use the Jaeger Operator or Helm chart with persistent storage. The all-in-one image is suitable for development.

$ kubectl create namespace tracing
namespace/tracing created

$ kubectl apply -n tracing -f https://github.com/jaegertracing/jaeger-kubernetes/raw/main/all-in-one/jaeger-all-in-one-template.yml
deployment.apps/jaeger created
service/jaeger-query created
service/jaeger-collector created
otel-sidecar.yaml

The OpenTelemetry Collector sidecar receives traces from the application and exports to backends (Jaeger, Zipkin, Tempo). Apps send traces to localhost; the sidecar handles batching, retry, and export.

apiVersion: v1
kind: Pod
metadata:
  name: traced-app
spec:
  containers:
    - name: app
      image: my-app:v1
      env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://localhost:4317"
        - name: OTEL_SERVICE_NAME
          value: "my-app"
    - name: otel-collector
      image: otel/opentelemetry-collector:0.91.0
      args: ["--config=/etc/otel/config.yaml"]
      ports:
        - containerPort: 4317
        - containerPort: 4318
      volumeMounts:
        - name: otel-config
          mountPath: /etc/otel
  volumes:
    - name: otel-config
      configMap:
        name: otel-collector-config
otel-collector-config.yaml

Collector config defines receivers (how traces come in), processors (batching, filtering), and exporters (where traces go). This config receives OTLP traces and exports to Jaeger.

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 1s
        send_batch_size: 1024
    exporters:
      otlp:
        endpoint: "jaeger-collector.tracing.svc:4317"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]
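The batch processor's two settings interact: a batch is sent when it reaches `send_batch_size` items or when `timeout` has passed, whichever comes first. The following Python sketch illustrates that behavior under simplifying assumptions (single-threaded, with an explicit `tick` call instead of a background timer). The `Batcher` class is hypothetical and is not the Collector's implementation.

```python
import time


class Batcher:
    """Buffers spans and flushes on size or age (simplified illustration)."""

    def __init__(self, export, send_batch_size=1024, timeout_s=1.0,
                 clock=time.monotonic):
        self.export = export              # callable that receives a list of spans
        self.send_batch_size = send_batch_size
        self.timeout_s = timeout_s
        self.clock = clock                # injectable so tests can control time
        self.buffer = []
        self.first_item_at = None

    def add(self, span):
        if not self.buffer:
            self.first_item_at = self.clock()
        self.buffer.append(span)
        if len(self.buffer) >= self.send_batch_size:
            self.flush()                  # size limit reached

    def tick(self):
        """Call periodically; sends a partial batch once it is old enough."""
        if self.buffer and self.clock() - self.first_item_at >= self.timeout_s:
            self.flush()

    def flush(self):
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.export(batch)


sent = []
batcher = Batcher(sent.append, send_batch_size=3, timeout_s=1.0)
for i in range(7):
    batcher.add(f"span-{i}")
batcher.flush()  # drain the final partial batch
print([len(batch) for batch in sent])  # [3, 3, 1]
```

Larger batches mean fewer export calls at the cost of slightly delayed traces; the timeout bounds that delay when traffic is low.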
trace-propagation.yaml

Trace context propagation passes trace IDs between services via HTTP headers. The W3C Trace Context standard uses the traceparent and tracestate headers. Configure each app to extract the context from incoming requests and inject it into outgoing requests, so that spans from different services join the same trace.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  # A Deployment requires a selector that matches the template labels.
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order
          image: order-service:v1
          env:
            # Declare POD_NAMESPACE and POD_NAME first: $(VAR) references
            # only expand variables defined earlier in this list.
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OTEL_SERVICE_NAME
              value: "order-service"
            - name: OTEL_PROPAGATORS
              value: "tracecontext,baggage"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector.tracing.svc:4317"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "k8s.namespace.name=$(POD_NAMESPACE),k8s.pod.name=$(POD_NAME)"
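For intuition about what the tracecontext propagator does, here is a hedged Python sketch of extracting and injecting the traceparent header by hand. In practice an OpenTelemetry SDK handles this; the function names `extract`, `inject`, and `handle_request` are hypothetical. The sketch handles only version 00 of the header and assumes lowercase header keys.

```python
import re
import secrets

# Version 00 layout: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def extract(headers):
    """Return (trace_id, parent_span_id, sampled) from incoming headers, or None."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if not match:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid in W3C Trace Context.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


def inject(headers, trace_id, span_id, sampled):
    """Write a traceparent header for an outgoing request."""
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
    return headers


def handle_request(incoming_headers):
    """Continue the caller's trace if present, otherwise start a new one."""
    context = extract(incoming_headers)
    if context:
        trace_id, _parent_span_id, sampled = context
    else:
        trace_id, sampled = secrets.token_hex(16), True
    span_id = secrets.token_hex(8)  # new span for this service's own work
    return inject({}, trace_id, span_id, sampled)


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
outgoing = handle_request(incoming)
print(outgoing["traceparent"])  # same trace ID, new span ID
```

The trace ID stays constant across every hop, while each service replaces the span ID with its own, which is how the backend reconstructs parent/child relationships.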
jaeger-deployment.yaml

Jaeger all-in-one deployment for development. Includes collector, query UI, and in-memory storage. For production, use separate components with persistent storage (Elasticsearch, Cassandra).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
  namespace: tracing
spec:
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:1.53
          ports:
            - containerPort: 16686
              name: ui
            - containerPort: 4317
              name: otlp-grpc
            - containerPort: 4318
              name: otlp-http
          env:
            - name: COLLECTOR_OTLP_ENABLED
              value: "true"
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-collector
  namespace: tracing
spec:
  selector:
    app: jaeger
  ports:
    - port: 4317
      name: otlp-grpc
    - port: 16686
      name: ui
sampling-config.yaml

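Sampling reduces trace volume in high-traffic systems. Head-based sampling decides when a trace starts, before anything is known about its outcome; tail-based sampling decides after the whole trace has been collected, so it can keep errors and slow requests. Configure a ratio (0.1 = 10%) or use adaptive sampling based on traffic. With the policies below, the Collector keeps only traces that contain an error or take longer than one second. The tail_sampling processor ships in the contrib distribution (otel/opentelemetry-collector-contrib), not the core image.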

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-sampling-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    processors:
      batch: {}
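      # Head-based alternative: keeps 10% of traces. Defined here for
      # reference; it is not part of the pipeline below.
      probabilistic_sampler:
        sampling_percentage: 10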
      tail_sampling:
        decision_wait: 10s
        policies:
          - name: errors
            type: status_code
            status_code: {status_codes: [ERROR]}
          - name: slow-traces
            type: latency
            latency: {threshold_ms: 1000}
    exporters:
      otlp:
        endpoint: "jaeger-collector.tracing.svc:4317"
        # The all-in-one Jaeger serves OTLP without TLS.
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [tail_sampling, batch]
          exporters: [otlp]
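The difference between the two sampling styles can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm used by the OpenTelemetry SDK or Collector: `head_sample` derives a deterministic decision from the trace ID so that every service agrees, and `tail_sample` mimics the two policies above on a hypothetical list of span dictionaries.

```python
def head_sample(trace_id: str, ratio: float) -> bool:
    """Head-based: decide from the trace ID alone, before any span finishes.

    Uses the last 8 hex digits as a number in [0, 2**32) and keeps the
    trace when it falls below ratio * 2**32 (an assumed scheme).
    """
    return int(trace_id[-8:], 16) < ratio * 2**32


def tail_sample(spans, threshold_ms=1000) -> bool:
    """Tail-based: decide after the trace completes; keep errors and slow traces."""
    has_error = any(span["status"] == "ERROR" for span in spans)
    start = min(span["start_ms"] for span in spans)
    end = max(span["end_ms"] for span in spans)
    return has_error or (end - start) > threshold_ms


fast_ok = [{"status": "OK", "start_ms": 0, "end_ms": 200}]
fast_error = [{"status": "OK", "start_ms": 0, "end_ms": 200},
              {"status": "ERROR", "start_ms": 50, "end_ms": 120}]
slow_ok = [{"status": "OK", "start_ms": 0, "end_ms": 1500}]

print(tail_sample(fast_ok), tail_sample(fast_error), tail_sample(slow_ok))
print(head_sample("4bf92f3577b34da6a3ce929d0e0e4736", 0.1))
```

Head-based sampling is cheap because unsampled traces are never recorded, but it cannot know which traces will turn out to be interesting. Tail-based sampling can, at the cost of buffering every span until the decision is made (the `decision_wait` setting above).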
terminal

Access Jaeger UI to search traces by service, operation, tags, or duration. View trace timelines showing span relationships, identify slow services, and analyze error paths.

$ kubectl port-forward svc/jaeger-collector 16686:16686 -n tracing &

$ curl localhost:16686/api/services | jq '.data'
["order-service", "payment-service", "inventory-service"]

$ curl "localhost:16686/api/traces?service=order-service&limit=10" | jq '.data[0]'
{
  "traceID": "abc123...",
  "spans": [
    {"operationName": "POST /orders", "duration": 245000},
    {"operationName": "check-inventory", "duration": 89000},
    {"operationName": "process-payment", "duration": 156000}
  ]
}
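Once trace JSON is available, a short script can rank spans by duration to find the slowest operations. The Python sketch below works on the sample response shown above rather than calling the API; the `slowest_spans` helper is hypothetical. Jaeger reports `duration` in microseconds, so the helper converts to milliseconds.

```python
import json

# Sample response copied from the output above (trace ID left elided).
response = json.loads("""
{
  "data": [
    {
      "traceID": "abc123...",
      "spans": [
        {"operationName": "POST /orders", "duration": 245000},
        {"operationName": "check-inventory", "duration": 89000},
        {"operationName": "process-payment", "duration": 156000}
      ]
    }
  ]
}
""")


def slowest_spans(trace, top=3):
    """Return (operation name, duration in ms) pairs, slowest first."""
    ranked = sorted(trace["spans"], key=lambda span: span["duration"], reverse=True)
    return [(span["operationName"], span["duration"] / 1000) for span in ranked[:top]]


for name, ms in slowest_spans(response["data"][0]):
    print(f"{name}: {ms:.0f} ms")
```

The root span is always the longest, so the child spans that follow it in the ranking are usually the ones worth investigating first.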
