Sending to Kafka is a two-phase commit
The system becomes consistent only when the receiver has processed and acknowledged. Until then, two nodes are in a misaligned state — and this window needs to be designed for, not ignored.
Problem
Teams think: message sent, job done. But publishing the event only finishes the sender's half of the work. The receiver still has to process the event and acknowledge it, and until it does, the two services disagree about the same business fact. That window has to be designed for deliberately, not ignored.
Methodology
- 1. What the two phases actually are. Phase 1 — the sender commits the change and publishes the event. Phase 2 — the receiver processes and acknowledges. Between phases the system is inconsistent: the sender is done, the receiver isn't yet. This isn't a bug — it's the normal state of eventual consistency. It needs to be designed for explicitly.
- 2. The receiver is not a dumb relay. The receiver has its own business logic: it can reject the event (item out of stock, limit exceeded). If the sender doesn't handle rejection — it stays in a misaligned state permanently.
- 3. Two architectural choices — both require a business conversation. Fire-and-forget: the sender considers the task done immediately. Await confirmation: the sender waits for acknowledgment from the receiver. Neither is "right" by itself — the choice depends on what the business is willing to accept: a window of inconsistency or additional latency. This has to be made explicit before the architectural decision.
- 4. Infrastructure prerequisite: outbox. Without the outbox pattern, phase 1 is already unreliable: writing to the DB and publishing the event are two separate commits that can diverge.
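The trade-off in point 3 can be sketched in a few lines. This is a minimal illustration, not a Kafka client: `publish` is a hypothetical callable standing in for a producer, and `replies` is an in-memory queue standing in for a reply topic. A real implementation would also match replies to requests by identifier.

```python
import queue

def fire_and_forget(publish, order_id):
    """Sender reports success right away; the real outcome arrives later, if at all."""
    publish("orders.created", order_id)
    return "placed"

def await_confirmation(publish, replies, order_id, timeout_s):
    """Sender blocks until the receiver answers or the timeout expires."""
    publish("orders.created", order_id)
    try:
        topic = replies.get(timeout=timeout_s)
    except queue.Empty:
        # No answer yet: the inconsistency window is still open.
        return "pending"
    return "confirmed" if topic == "orders.confirmed" else "rejected"
```

The first function returns immediately and leaves the window of inconsistency for someone else to handle. The second one pays for a definite answer with latency, and still needs a policy for the "pending" case.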
# Phase 1: service A commits the change and sends
with db.transaction():
    order.status = "placed"
    order_repo.save(order)
    outbox.publish("orders.created", order)
# ← A is done; system is inconsistent until phase 2 completes

# Phase 2: service B processes asynchronously
def handle_order_created(event):
    if not inventory.reserve(event.items):
        outbox.publish("orders.rejected", event.order_id)
        return
    fulfillment_repo.create(event)
    outbox.publish("orders.confirmed", event.order_id)

# Phase 2 completes: A receives confirmation from B
def handle_order_confirmed(event):
    order_repo.update(event.order_id, status="confirmed")
    # ← only now are both nodes consistent on the business fact

# Rejection path: A records that B refused the order
def handle_order_rejected(event):
    order_repo.update(event.order_id, status="rejected")
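The `outbox.publish` call in the snippet above hides the mechanism that makes phase 1 reliable. A minimal sketch of one way to do it, using SQLite from the standard library: the table layout, function names, and the `send` callable are all assumptions for illustration, and `send` stands in for whatever producer call the system really uses.

```python
import json
import sqlite3

def init_db(conn):
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "topic TEXT, payload TEXT, published INTEGER DEFAULT 0)"
    )

def place_order(conn, order_id):
    # One local transaction: the state change and the event row
    # commit together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders.created", json.dumps({"order_id": order_id})),
        )

def relay_once(conn, send):
    """Publish pending outbox rows; mark each row only after send succeeds."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        send(topic, payload)  # may raise; the row stays pending for the next run
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    place_order(conn, "o-1")
    relay_once(conn, lambda topic, payload: print(topic, payload))
```

If the relay crashes after `send` but before the row is marked, the same event is sent again on the next run. The outbox therefore gives at-least-once delivery, which is why the consumer has to tolerate duplicates.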
Two supporting patterns complete the picture. A Correlation ID (a single identifier carried through every phase) tracks state across multiple services. The Saga pattern manages compensating actions when the receiver rejects.
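A small in-memory sketch of how the two patterns fit together on the sender side. Everything here is hypothetical: the `Event` shape, the `OrderSaga` class, and the `release_payment_hold` compensation are illustrative names, not part of any library or of the original system.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Event:
    topic: str
    order_id: str
    correlation_id: str

@dataclass
class OrderSaga:
    """Sender-side view of one order's progress through both phases."""
    orders: dict = field(default_factory=dict)          # order_id -> status
    compensations: list = field(default_factory=list)   # compensating actions run

    def place(self, order_id):
        self.orders[order_id] = "placed"
        # The correlation ID is minted once and copied onto every later event.
        return Event("orders.created", order_id, str(uuid.uuid4()))

    def on_reply(self, event):
        if event.topic == "orders.confirmed":
            self.orders[event.order_id] = "confirmed"
        elif event.topic == "orders.rejected":
            # Compensating action: undo what phase 1 already committed.
            self.compensations.append(
                ("release_payment_hold", event.order_id, event.correlation_id)
            )
            self.orders[event.order_id] = "rejected"

def receiver(event, in_stock):
    """Stand-in for service B: replies with the correlation ID it received."""
    topic = "orders.confirmed" if in_stock else "orders.rejected"
    return Event(topic, event.order_id, event.correlation_id)
```

Because the reply carries the same correlation ID as the original event, logs from both services can be joined on one value, and the compensation record shows which request it is undoing.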
Artifact
Video: Why sending a message to Kafka is a two-phase commit (YouTube @IT-Head, 3,600 views)
Where it breaks
- Dual write without outbox — the DB write committed, the publish failed (or vice versa): inconsistency that will surface under load, not in the demo.
- Consumer idempotency is mandatory: with at-least-once delivery and retries, the same event can arrive more than once by design, so every handler must be safe to run again.
- Business isn't ready for the inconsistency window — if the stakeholder expects immediate consistency but the system is eventual consistency, this is a requirements conflict that needs to be resolved before the architecture, not after an incident.
- Receiver rejected, sender doesn't know: without an explicit handle_rejected, the system hangs in an intermediate state permanently.
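The idempotency requirement in the list above is usually met by recording each processed event ID in the same transaction as the side effect. A minimal sketch with SQLite, assuming every event carries a unique `event_id` (the table and function names are illustrative):

```python
import sqlite3

def init_consumer_db(conn):
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE reservations (order_id TEXT, qty INTEGER)")

def handle_once(conn, event_id, order_id, qty):
    """Apply the event's effect at most once; return False for a duplicate."""
    try:
        with conn:
            # The dedup record and the side effect share one transaction,
            # so a crash cannot leave one without the other.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            conn.execute(
                "INSERT INTO reservations (order_id, qty) VALUES (?, ?)",
                (order_id, qty),
            )
    except sqlite3.IntegrityError:
        # event_id is already recorded: skip the work, but still acknowledge.
        return False
    return True
```

A redelivered event hits the primary-key constraint, the transaction rolls back, and the reservation is not created a second time.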
For whom and why
For teams building event-driven integrations who want to understand exactly where inconsistency arises — and how to design for it, not discover it in production.