2026·04·20
service · B2B contract · remote · GMT+3

Observability, OpenTelemetry & metrics pipelines

OTel collectors, exporters, histograms, and the bills they generate — designed so the data you ship is the data you actually need.

What I do

  • Design OTel collector topologies that split signals correctly between vendors and self-hosted backends.
  • Diagnose and fix histogram and aggregation-temporality (delta vs. cumulative) bugs that quietly destroy your dashboards.
  • Migrate exporters between Dynatrace, Datadog, Grafana Cloud, and self-hosted Prometheus/Mimir without losing data continuity.
  • Bound the cardinality of metric pipelines so a single high-cardinality label doesn't 100x your bill overnight.
  • Build per-service SLO/SLI definitions and the alerts that come from them — not the alerts the vendor template ships.
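
To make the cardinality point concrete, here is a minimal Python sketch of the kind of check I start with. It assumes samples are already available as (metric name, label dict) pairs; the function names and the `http_requests` data are hypothetical, invented for illustration, and not taken from any vendor API or client engagement.

```python
from collections import defaultdict

def series_count(samples):
    """Count distinct label sets (i.e. time series) per metric name."""
    seen = defaultdict(set)
    for name, labels in samples:
        seen[name].add(frozenset(labels.items()))
    return {name: len(sets) for name, sets in seen.items()}

def label_cardinality(samples, metric):
    """Count distinct values for each label key of one metric."""
    values = defaultdict(set)
    for name, labels in samples:
        if name == metric:
            for key, value in labels.items():
                values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}

def drop_labels(samples, metric, keys):
    """Return a copy of samples with the given label keys removed from one metric."""
    out = []
    for name, labels in samples:
        if name == metric:
            labels = {k: v for k, v in labels.items() if k not in keys}
        out.append((name, labels))
    return out

# Hypothetical data: one route, two status codes, and a per-user label.
samples = [
    ("http_requests", {"route": "/cart", "status": "200", "user_id": str(i)})
    for i in range(1000)
] + [
    ("http_requests", {"route": "/cart", "status": "500", "user_id": str(i)})
    for i in range(10)
]

print(series_count(samples))                        # {'http_requests': 1010}
print(label_cardinality(samples, "http_requests"))  # {'route': 1, 'status': 2, 'user_id': 1000}

trimmed = drop_labels(samples, "http_requests", {"user_id"})
print(series_count(trimmed))                        # {'http_requests': 2}
```

In this toy case a single per-user label accounts for almost all of the series. In a real pipeline the equivalent step would more likely be a label-dropping or aggregation rule in the collector than application code.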

When teams hire me

  • Histograms in OTel "don't add up" — the dashboards show one number, the SDK reports another.
  • The Dynatrace or Datadog bill is growing faster than the product, and engineering can't explain why.
  • A migration from a hosted vendor to self-hosted (or the other way) is stalled because nobody trusts the data parity.
  • OTel collectors are dropping data silently and the team only finds out when an SLA breaks.
  • An on-call rotation is buried in alerts that don't correspond to user pain.
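
One common reason counters and histogram bucket counts "don't add up" is a temporality mismatch: one side emits deltas while the other interprets the points as cumulative totals, or the reverse. The sketch below is a simplified illustration in plain Python, not the OTel SDK or collector implementation. The function names are hypothetical, and the reset handling is a simple "value went down" heuristic; real OTel data points also carry start timestamps that are the more reliable signal.

```python
def delta_to_cumulative(deltas):
    """Turn per-interval increments into a running total."""
    total = 0
    out = []
    for d in deltas:
        total += d
        out.append(total)
    return out

def cumulative_to_delta(cumulative):
    """Turn a running total into per-interval increments.

    Assumption: a decrease means the counter was reset, so the new
    value is taken as the increment since the reset.
    """
    deltas = []
    prev = 0
    for v in cumulative:
        deltas.append(v - prev if v >= prev else v)
        prev = v
    return deltas

deltas = [5, 3, 0, 7]                      # true increments per interval
cumulative = delta_to_cumulative(deltas)   # [5, 8, 8, 15]

# Correct reading: the last cumulative point is the total.
print(cumulative[-1])                      # 15

# Wrong reading: summing cumulative points as if they were deltas.
print(sum(cumulative))                     # 36

# Round trip, and a counter reset between the second and third points.
print(cumulative_to_delta(cumulative))     # [5, 3, 0, 7]
print(cumulative_to_delta([5, 8, 2, 4]))   # [5, 3, 2, 2]
```

The gap between 15 and 36 is the kind of discrepancy that shows up as a dashboard disagreeing with what the SDK reports.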

Engagement formats

Pipeline review — 1 week

Audit the OTel topology end-to-end. Deliverable: signal map, dropped-data audit, ranked fixes.

Migration / rebuild — 3-8 weeks

Re-architect the metrics tier, run dual pipelines to verify parity, and switch over without dashboard regressions.

Cost & cardinality tuning — fixed-fee

Reduce observability spend by 30-70% without losing signal. Paid against measured savings.

What I’ve written about this