Grigoriy Dobryakov

Howto · breakdown

Breakdown 09 YouTube · 3,600 views

Sending to Kafka is a two-phase commit

The system becomes consistent only when the receiver has processed and acknowledged. Until then, two nodes are in a misaligned state — and this window needs to be designed for, not ignored.

CTO Architect Tech Lead Head of AI

Problem

Teams tend to assume that once a message is sent, the job is done. In practice the sender and the receiver stay out of sync until the receiver has processed the event and acknowledged it, and that window has to be designed for deliberately rather than ignored.

Methodology

  1. What the two phases actually are. Phase 1: the sender commits the change and publishes the event. Phase 2: the receiver processes the event and acknowledges it. Between the phases the system is inconsistent: the sender is done, the receiver is not yet. This is not a bug; it is the normal state of eventual consistency, and it needs to be designed for explicitly.
  2. The receiver is not a dumb relay. It has its own business logic and can reject the event (item out of stock, limit exceeded). If the sender does not handle the rejection, it stays in a misaligned state permanently.
  3. Two architectural choices, both requiring a business conversation. Fire-and-forget: the sender considers the task done immediately. Await confirmation: the sender waits for acknowledgment from the receiver. Neither is "right" by itself; the choice depends on what the business is willing to accept: a window of inconsistency or additional latency. That trade-off has to be made explicit before the architectural decision is taken.
  4. Infrastructure prerequisite: outbox. Without the outbox pattern, phase 1 is already unreliable: writing to the DB and publishing the event are two separate commits that can diverge.
# Phase 1: service A commits the change and sends
with db.transaction():
    order.status = "placed"
    order_repo.save(order)
    outbox.publish("orders.created", order)
# ← A is done; system is inconsistent until phase 2 completes

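# Phase 2: service B processes asynchronously
def handle_order_created(event):
    # B's state change and its reply event share one transaction,
    # so they cannot diverge (same outbox rule as in phase 1).
    with db.transaction():
        if not inventory.reserve(event.items):
            outbox.publish("orders.rejected", event.order_id)
            return
        fulfillment_repo.create(event)
        outbox.publish("orders.confirmed", event.order_id)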

# Phase 2 completes: A receives confirmation from B
def handle_order_confirmed(event):
    order_repo.update(event.order_id, status="confirmed")
# ← only now are both nodes consistent on the business fact

def handle_order_rejected(event):
    order_repo.update(event.order_id, status="rejected")
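
The `outbox.publish` calls above only work as intended if the event is stored in the same local transaction as the business change and delivered later by a separate relay. Below is a minimal sketch of that idea, assuming SQLite as the service database and a caller-supplied `send` function standing in for the Kafka producer. The table layout and the names `init_db`, `place_order`, and `relay_once` are illustrative assumptions, not taken from the original breakdown.

```python
import json
import sqlite3


def init_db(conn):
    # Hypothetical schema: one business table and one outbox table.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " sent INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn, order_id):
    # One local transaction: the order row and the outbox row
    # commit together or roll back together.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders.created", json.dumps({"order_id": order_id})),
        )


def relay_once(conn, send):
    """Deliver unsent outbox rows; mark a row as sent only after send() returns."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        send(topic, json.loads(payload))  # stand-in for a real producer call
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay stops after `send` returns but before the row is marked, the same event is delivered again on the next pass. This gives at-least-once delivery, so the receiver has to treat repeated events as harmless (idempotent handling).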

To track state across multiple services, use a Correlation ID: a single identifier that passes through all phases. To manage compensating actions on rejection, use the Saga pattern.
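
How the two ideas fit together can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the `Event` shape, the `OrderService` class, the `inventory_service` function, and the refund used as the compensating action are all hypothetical and stand in for real services talking over Kafka.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Event:
    topic: str
    order_id: int
    correlation_id: str


@dataclass
class OrderService:
    """Service A: tracks each order and compensates when B rejects it."""
    orders: dict = field(default_factory=dict)          # order_id -> status
    by_correlation: dict = field(default_factory=dict)  # correlation_id -> order_id
    refunds: list = field(default_factory=list)         # record of compensations

    def place(self, order_id):
        # The correlation ID is created once and travels with every event.
        correlation_id = str(uuid.uuid4())
        self.orders[order_id] = "placed"
        self.by_correlation[correlation_id] = order_id
        return Event("orders.created", order_id, correlation_id)

    def handle_reply(self, event):
        order_id = self.by_correlation.get(event.correlation_id)
        if order_id is None or self.orders[order_id] != "placed":
            return  # unknown or already settled: ignore duplicate replies
        if event.topic == "orders.confirmed":
            self.orders[order_id] = "confirmed"
        elif event.topic == "orders.rejected":
            self.refunds.append(order_id)  # compensating action (hypothetical refund)
            self.orders[order_id] = "rejected"

    def pending(self):
        # Orders still inside the inconsistency window.
        return [oid for oid, status in self.orders.items() if status == "placed"]


def inventory_service(event, in_stock):
    """Service B: replies with the same correlation_id it received."""
    topic = "orders.confirmed" if event.order_id in in_stock else "orders.rejected"
    return Event(topic, event.order_id, event.correlation_id)
```

A call such as `a.handle_reply(inventory_service(a.place(1), in_stock={1}))` moves order 1 from "placed" to "confirmed". The `pending()` list makes the inconsistency window visible, which is where a timeout or alert would hook in if the business chooses to await confirmation.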

Artifact

Video: Why sending a message to Kafka is a two-phase commit (YouTube @IT-Head, 3,600 views)

Series signature

Where it breaks

For whom and why

For teams building event-driven integrations who want to understand exactly where inconsistency arises — and how to design for it, not discover it in production.

