Observability, OpenTelemetry & metrics pipelines

What I do

Design OTel collector topologies that split signals correctly between vendors and self-hosted backends.
Diagnose and fix histogram / delta-vs-cumulative / temporality bugs that quietly destroy your dashboards.
Migrate exporters between Dynatrace, Datadog, Grafana Cloud, and self-hosted Prometheus/Mimir without losing data continuity.
Put cardinality bounds on metric pipelines so a single high-cardinality label doesn't 100x your bill overnight.
Build per-service SLO/SLI definitions and the alerts that come from them — not the alerts the vendor template ships.
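To make the delta-vs-cumulative point concrete, here is a minimal sketch of converting a counter stream between the two temporalities. The function names are hypothetical and this is plain Python, not the OTel SDK or collector API; it assumes a single monotonic counter series and treats any decrease in a cumulative stream as a counter reset.

```python
from itertools import accumulate


def delta_to_cumulative(deltas):
    """Running total of per-interval increments (delta -> cumulative)."""
    return list(accumulate(deltas))


def cumulative_to_delta(cumulative):
    """Per-interval increments from running totals (cumulative -> delta).

    Assumption: a value lower than its predecessor means the counter was
    reset (e.g. process restart), so the whole value counts as the increment.
    """
    deltas = []
    prev = None
    for value in cumulative:
        if prev is None or value < prev:
            # First point, or a reset: no usable predecessor to subtract.
            deltas.append(value)
        else:
            deltas.append(value - prev)
        prev = value
    return deltas


if __name__ == "__main__":
    print(delta_to_cumulative([3, 0, 5, 2]))
    print(cumulative_to_delta([3, 8, 2, 4]))
```

A backend that sums cumulative points as if they were deltas (or the reverse) produces exactly the kind of silent dashboard drift described above, which is why the temporality each exporter emits and each backend expects has to be checked hop by hop.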

Histograms in OTel "don't add up" — the dashboards show one number, the SDK reports another.
The Dynatrace or Datadog bill is growing faster than the product, and engineering can't explain why.
A migration from a hosted vendor to self-hosted (or the other way) is stalled because nobody trusts the data parity.
OTel collectors are dropping data silently and the team only finds out when an SLA breaks.
An on-call rotation is buried in alerts that don't correspond to user pain.
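When the bill is the symptom, one first check is which labels carry the most distinct values. The sketch below is a hedged illustration: the function name and the sample label sets are made up, and it assumes you can export the active series as a list of label dictionaries from whatever backend you use.

```python
from collections import defaultdict


def label_cardinality(series):
    """Rank label names by number of distinct values, highest first.

    `series` is an iterable of dicts mapping label name -> label value,
    one dict per active time series (hypothetical export format).
    """
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    # Sort by distinct-value count descending, then by name for stable output.
    return sorted(
        ((name, len(vals)) for name, vals in values.items()),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    sample = [
        {"route": "/a", "user_id": "1"},
        {"route": "/a", "user_id": "2"},
        {"route": "/b", "user_id": "3"},
    ]
    for name, count in label_cardinality(sample):
        print(name, count)
```

Labels at the top of this ranking (a user or request identifier, in the made-up sample) are the usual candidates for dropping, hashing into buckets, or moving to logs or traces.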

Audit the OTel topology end-to-end. Deliverable: signal map, dropped-data audit, ranked fixes.

Re-architect the metrics tier, run dual-pipelines for parity, switch over without dashboard regressions.
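A dual-pipeline parity run can be reduced to comparing the same aggregates from both sides over the same window. The following is a minimal sketch under stated assumptions: `parity_report`, the metric keys, and the 1% tolerance are all hypothetical, and the inputs are assumed to be already-aggregated values keyed by metric name.

```python
def parity_report(old, new, rel_tol=0.01):
    """Return metrics whose values disagree between two pipelines.

    `old` and `new` map metric name -> aggregated value for the same window.
    A metric is reported if it is missing on one side or if the relative
    difference exceeds `rel_tol` (an assumed, tunable threshold).
    """
    mismatches = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if a is None or b is None:
            # Present in only one pipeline: always a parity failure.
            mismatches[key] = (a, b)
            continue
        denom = max(abs(a), abs(b))
        if denom and abs(a - b) / denom > rel_tol:
            mismatches[key] = (a, b)
    return mismatches


if __name__ == "__main__":
    old = {"requests": 100.0, "errors": 5.0, "latency_sum": 2.0}
    new = {"requests": 100.5, "errors": 7.0}
    print(parity_report(old, new))
```

An empty report over a representative window is one reasonable gate for the switchover; a non-empty one gives a short list of metrics to investigate before anyone has to trust the new dashboards.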

Reduce observability spend by 30-70% without losing signal. Paid against measured savings.