2026·04·20
service · B2B contract · remote · GMT+3

Apache Flink, Kafka & streaming data architecture

Real-time pipelines that survive backpressure, schema drift, and 3am restarts — built on Flink, Kafka, ClickHouse, and the AWS surface they sit on.

What I do

  • Design Apache Flink jobs (DataStream, Table API, SQL) for production workloads — windowing, watermarks, and state TTL chosen for the actual event shape.
  • Architect Kafka / MSK / Confluent / Redpanda topologies including partitioning, retention, and consumer-group strategy.
  • Stand up ClickHouse as the analytical layer behind streaming pipelines — schema design, materialized views, replication.
  • Diagnose backpressure, checkpointing failures, and state-store growth in existing Flink / Spark Streaming jobs.
  • Build CDC pipelines (Debezium → Kafka → Flink) without losing exactly-once semantics.
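
To make "watermarks chosen for the actual event shape" concrete, here is a minimal pure-Python sketch of event-time tumbling windows with a bounded out-of-orderness watermark. It is not Flink code and it simplifies Flink's semantics; the function name `tumbling_window_counts` and its parameters are hypothetical, chosen only for this illustration. The point is the trade-off: a small bound closes windows sooner but drops more late events, while a larger bound keeps them at the cost of latency and state.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_ms, max_out_of_order_ms):
    """Count events per key in event-time tumbling windows.

    events: iterable of (event_time_ms, key) in ARRIVAL order.
    Returns (results, late): results is a list of (window_start, {key: count})
    in the order windows were closed; late holds events whose window had
    already been closed by the watermark when they arrived.
    Simplified assumption: watermark = max event time seen - allowed lateness,
    and a window closes once its end is <= the watermark.
    """
    open_windows = defaultdict(lambda: defaultdict(int))  # start -> key -> count
    results, late = [], []
    max_ts = float("-inf")
    watermark = float("-inf")

    for ts, key in events:
        start = ts - ts % window_ms
        if start + window_ms <= watermark:
            # The window for this event was already emitted: too late.
            late.append((ts, key))
        else:
            open_windows[start][key] += 1

        # Advance the watermark; it never moves backwards.
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_out_of_order_ms

        # Emit every window whose end the watermark has passed.
        closable = sorted(s for s in open_windows if s + window_ms <= watermark)
        for s in closable:
            results.append((s, dict(open_windows.pop(s))))

    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        results.append((s, dict(open_windows.pop(s))))
    return results, late


# Hypothetical sample: two events arrive out of order (900 and 1200).
sample = [(1000, "a"), (1500, "a"), (900, "b"), (2600, "a"), (1200, "b"), (3700, "a")]
tight_results, tight_late = tumbling_window_counts(sample, 1000, 500)
loose_results, loose_late = tumbling_window_counts(sample, 1000, 1500)
```

With a 500 ms bound, both out-of-order events in the sample are dropped as late; with a 1500 ms bound, both are counted, but each window is held open longer before it is emitted. Measuring the real out-of-orderness of the event stream is what decides which side of that trade-off a job should sit on.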

When teams hire me

  • A real-time pipeline has stopped being real-time and nobody can pinpoint where the lag comes from.
  • Flink checkpoints are failing or growing without bound and the job keeps falling over.
  • An analytics team needs sub-second queries on event data and Postgres has hit the wall.
  • A CDC pipeline is dropping or duplicating events and the team doesn't trust the warehouse anymore.
  • A new product needs a streaming architecture from scratch and the team has only batch experience.

Engagement formats

Architecture spike — 1-2 weeks

Whiteboard + working prototype against a sample of your data. Deliverable: written architecture + IaC for the prototype.

Build engagement — 6-12 weeks

Ship the pipeline end-to-end with your team. Includes pairing, code review, on-call shadowing.

Rescue engagement — fixed-scope

Get a stuck Flink / Kafka system back to healthy and document why it broke. Scoped tightly.

What I’ve written about this