A production deep-dive into the PostgreSQL to Kafka to Iceberg CDC path with Debezium: pgoutput logical decoding, replication slots, snapshot strategies, schema history, delivery guarantees, and Iceberg sink options.

Debezium in Production: PostgreSQL to Kafka to Iceberg CDC Patterns That Actually Work

Change Data Capture sounds simple until you run it. You point Debezium at a PostgreSQL primary, events land in Kafka, a sink writes them to Apache Iceberg, and you have a lakehouse mirror of your operational database. The demo works in an afternoon. Then a connector restarts during a long snapshot, a replication slot quietly eats 400 GB of disk over a weekend, and an ALTER TABLE breaks the sink because nobody planned for schema evolution.

This guide is about the parts that the quickstart tutorials skip. It covers the PostgreSQL to Debezium to Kafka to Iceberg path specifically, focused on what survives contact with production: how logical decoding actually feeds Debezium, which snapshot mode to pick, why replication slots are the most dangerous component in the whole pipeline, and what your options are for landing CDC events in Iceberg tables with correct deletes. For where log-based CDC fits among the other pipeline shapes, see our overview of ETL pipeline patterns in 2026.

How Debezium reads PostgreSQL: logical decoding and replication slots

Debezium is a log-based CDC platform, and one of several options we weigh against managed services in our data migration tools comparison. Instead of polling tables or comparing timestamps, it reads the database's transaction log and emits a change event for every row-level insert, update, and delete. On PostgreSQL that transaction log is the Write-Ahead Log (WAL), and the mechanism that turns raw WAL into a consumable stream is logical decoding.

Logical decoding is the PostgreSQL feature that extracts row-level change events from the WAL and streams them to an external consumer through a replication slot. It requires wal_level = logical and turns physical log records into a logical representation of inserts, updates, and deletes.

Three pieces have to line up before Debezium can stream a single event:

  • wal_level = logical in postgresql.conf. The default is replica, which carries enough information for physical standbys but not for logical decoding. Changing this requires a restart, so plan it into a maintenance window.
  • A logical decoding output plugin. This is the code that formats decoded WAL into a wire format. Current Debezium releases (2.x and 3.x) support decoderbufs and pgoutput; wal2json was deprecated and then removed in Debezium 2.0. The documented default for plugin.name is decoderbufs, so set plugin.name to pgoutput explicitly. It is the right choice for almost everyone: pgoutput ships inside PostgreSQL 10+ and is maintained by the Postgres community itself, so there is nothing to compile or install. decoderbufs is a Debezium-maintained Protobuf plugin that needs a native extension on the server; reach for it only if you have a specific reason.
  • A replication slot. The slot is a named, server-side cursor that tracks how far a consumer has read into the WAL. Debezium creates one (default name debezium) and uses it to resume from the exact LSN it last confirmed.

With pgoutput you also get a publication, the Postgres object that defines which tables are part of the logical replication stream. Debezium will create one named dbz_publication for all tables by default. On a busy primary you usually do not want that. Set publication.autocreate.mode to filtered so the publication only contains the tables the connector actually captures, which also keeps WAL volume down.

A minimal, production-shaped connector config looks like this. The database.password value assumes a Kafka Connect config provider (FileConfigProvider here) and a placeholder secrets path; substitute whatever secret mechanism your workers use:

{
  "name": "pg-orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "pg-primary.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "${file:/opt/connect/secrets/debezium.properties:password}",
    "database.dbname": "appdb",
    "topic.prefix": "appdb",
    "plugin.name": "pgoutput",
    "slot.name": "debezium_orders",
    "publication.name": "dbz_orders",
    "publication.autocreate.mode": "filtered",
    "table.include.list": "public.orders,public.order_items",
    "snapshot.mode": "initial",
    "heartbeat.interval.ms": "10000",
    "signal.data.collection": "public.debezium_signal"
  }
}

One non-obvious operational fact: a logical replication slot is bound to a single database and, by default, lives only on the primary. Before PostgreSQL 17 it is not replicated to physical standbys, so if you fail over to a standby the slot is gone and Debezium cannot resume from where it left off. Plan failover explicitly: either recreate the slot and re-snapshot, or use failover slots (slot synchronization is built into PostgreSQL 17 and later for slots created with failover enabled, and some Postgres distributions offered it earlier). We cover the broader high-availability picture in our guide to PostgreSQL multi-node high availability.

The replication slot disk-growth failure mode

If you remember one thing from this article, make it this. The single most common way a Debezium PostgreSQL pipeline takes down the source database is WAL accumulation behind a stalled replication slot.

A PostgreSQL replication slot prevents the server from recycling any WAL segment that the slot's consumer has not yet confirmed. If Debezium stops consuming, the WAL is retained on the primary's disk and grows without bound until the connector resumes or the disk fills.

The mechanism is simple and unforgiving. Postgres advances a slot's confirmed_flush_lsn, and with it the restart_lsn that governs WAL retention, only when the consumer acknowledges progress. No WAL segment at or after the oldest restart_lsn across all slots can be removed. So any time Debezium stops acknowledging (a crashed connector, a long network partition, a paused task, or a Kafka outage that blocks the producer), the primary keeps every WAL segment generated since. On a high-write database that can be tens of gigabytes per hour. When the data partition fills, PostgreSQL stops accepting writes, and now your CDC tool has caused a production outage.

There is a second, quieter version of this problem. Even a healthy connector can fall behind if it is watching a low-traffic set of tables on a high-traffic database. Logical decoding has to read through all WAL, including changes to tables you do not capture, but the slot's confirmed position only advances when Debezium sends a flush. If nothing it cares about changes for a while, the slot stays pinned at an old LSN even though the database is busy.

The fix is heartbeats. Set heartbeat.interval.ms so Debezium periodically emits a heartbeat event, which forces a flush and advances the slot. Pair it with heartbeat.action.query, a statement Debezium runs against the source on each heartbeat so there is always a fresh change to decode and confirm:

{
  "heartbeat.interval.ms": "10000",
  "heartbeat.action.query": "INSERT INTO public.debezium_heartbeat (id, ts) VALUES (1, now()) ON CONFLICT (id) DO UPDATE SET ts = now()"
}

Gunnar Morling's deep dive on mastering Postgres replication slots is the canonical reference here and worth reading in full. Operationally, you need three guardrails regardless of heartbeats:

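  1. Alert on pg_replication_slots. Monitor pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) per slot to see how much WAL the slot is pinning, track the same difference against confirmed_flush_lsn as consumer lag, and page someone before retained WAL approaches your free disk.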
  2. Set max_slot_wal_keep_size (PostgreSQL 13+). This caps how much WAL a slot may retain. The slot is invalidated past the cap, which loses CDC continuity but saves the database. That is the right trade when the alternative is a downed primary.
  3. Drop orphaned slots. If you delete a connector, drop its slot. An abandoned slot retains WAL forever.
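
To make the first guardrail concrete, here is a minimal sketch of the arithmetic behind pg_wal_lsn_diff, for cases where your monitoring agent only has the LSN strings from pg_replication_slots. The function names, the 50% paging fraction, and the choice to measure from restart_lsn are our assumptions for illustration, not Debezium or PostgreSQL APIs.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a pg_lsn string such as '3A/10000000' to an absolute byte position.

    A pg_lsn is two hexadecimal numbers: the high 32 bits and the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_wal_bytes(current_wal_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between the slot's restart_lsn and the current write position."""
    return lsn_to_bytes(current_wal_lsn) - lsn_to_bytes(restart_lsn)


def should_page(retained_bytes: int, free_disk_bytes: int, fraction: float = 0.5) -> bool:
    """Hypothetical alert rule: page once retained WAL reaches a fraction of free disk."""
    return retained_bytes >= fraction * free_disk_bytes


# Example: a slot pinned 512 MiB behind the current WAL position.
retained = retained_wal_bytes("3A/10000000", "39/F0000000")
page_now = should_page(retained, free_disk_bytes=800 * 1024 * 1024)
```

In practice you would run the diff in SQL and export it as a metric; the sketch only shows what the number means and how an alert threshold relates it to free disk.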

Snapshots: getting existing data without freezing the database

A fresh connector has to capture rows that existed before CDC started. That is the snapshot. Debezium reads the current table contents and emits them as read events (op: "r"), then switches to streaming WAL changes. How it does that is controlled by snapshot.mode, and the choice matters a lot on large tables.

Snapshot mode | What it does | When to use it
initial (default) | Snapshot all captured tables once, then stream | Most new pipelines; default and safest starting point
initial_only | Snapshot, then stop without streaming | One-time bulk load into Iceberg, no ongoing CDC
never (named no_data in newer Debezium releases) | Skip snapshot, stream from current WAL position | Tables already loaded by another process; you only want new changes
always | Snapshot on every connector start | Rare; testing or environments where prior state cannot be trusted
custom | Plug in your own snapshotter implementation | Specialized loading logic

The standard initial snapshot is consistent but blunt. Older Debezium versions took an exclusive lock; current versions avoid that on Postgres, but a snapshot of a multi-terabyte table is still a long, single-threaded read that holds the export transaction open and delays the start of streaming. If the connector dies at 80% through that snapshot, it starts over from zero.

Incremental snapshots solve this. Introduced in the DDD-3 design document and described in Debezium's incremental snapshots blog post, an incremental snapshot reads a table in chunks while the connector keeps streaming live WAL changes at the same time. Each chunk read is bracketed by a low and a high watermark, and a change streamed between those watermarks for a row in the chunk wins over the snapshot read of that row, which keeps the result consistent without a long-held lock. Crucially, incremental snapshots are resumable: a restart picks up at the last completed chunk instead of restarting the whole table.
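
The watermark rule is easier to see in code. The following is a simplified sketch of the reconciliation step for one chunk, not Debezium's implementation: the function name, the dict-shaped events, and the single-column key named id are assumptions made for illustration.

```python
def reconcile_chunk(chunk_rows, window_events, key="id"):
    """Reconcile one snapshot chunk against changes streamed while it was read.

    chunk_rows:    rows returned by the chunk query (read between the watermarks)
    window_events: change events seen in the log between the low and high
                   watermark, in log order
    Returns the events to emit: streamed changes first, then the snapshot
    reads that were not superseded by a streamed change.
    """
    buffer = {row[key]: row for row in chunk_rows}
    emitted = []
    for event in window_events:
        # Deletes carry the key in "before", everything else in "after".
        image = event["after"] if event["after"] is not None else event["before"]
        # The log is newer than the chunk read, so drop the stale snapshot row.
        buffer.pop(image[key], None)
        emitted.append(event)
    # Surviving rows are emitted as snapshot reads (op "r") at the high watermark.
    emitted.extend({"op": "r", "before": None, "after": row} for row in buffer.values())
    return emitted


chunk = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}, {"id": 3, "status": "new"}]
window = [
    {"op": "u", "before": None, "after": {"id": 2, "status": "paid"}},
    {"op": "d", "before": {"id": 3}, "after": None},
]
events = reconcile_chunk(chunk, window)
```

Here only row 1 is emitted as a snapshot read; rows 2 and 3 are represented by their streamed update and delete, so a consumer never sees a stale snapshot image arrive after a newer change.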

You trigger one with an ad-hoc signal. Debezium watches a signaling table named in signal.data.collection, and you start a snapshot by inserting an execute-snapshot row:

INSERT INTO public.debezium_signal (id, type, data)
  VALUES (
    'snap-orders-2026-07',
    'execute-snapshot',
    '{"data-collections": ["public.orders"], "type": "incremental"}'
  );
  

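This is also how you backfill a table you add to an existing pipeline without re-snapshotting everything: add the table to table.include.list (the configuration update restarts the connector task, and streaming resumes from the stored offset), then signal an incremental snapshot for just that table. If your Debezium user has read-only database permissions and cannot write to a signaling table, Debezium also supports sending signals through a Kafka signaling topic instead.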

The change event, schema history, and delivery guarantees

Every Debezium message is a structured envelope, not a raw row. The shape is consistent across databases: a before image, an after image, an op code, a source block with metadata like LSN and transaction ID, and a ts_ms timestamp.

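Debezium's change event envelope carries op codes of c (create/insert), u (update), d (delete), r (read, emitted during snapshots), and t (truncate). An insert has a null before and a populated after; a delete has a populated before and a null after; an update has both. On PostgreSQL, how much of before you actually get depends on the table's REPLICA IDENTITY: with the default setting, a delete's before typically carries only the primary-key columns and an update's before is usually empty unless the key changed, so set REPLICA IDENTITY FULL on tables where consumers need the complete prior row.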

That before/after/op structure is exactly what a downstream Iceberg sink needs to reconstruct the table state, so do not flatten it away unless you know your sink expects flat rows.
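
As an illustration of why the envelope matters, here is a minimal sketch that replays envelopes, in order, against an in-memory table. It assumes the event is the bare payload (no schema wrapper from the JSON converter), a single-column key named id, and that a None value is the tombstone Debezium emits after a delete; the function name is ours.

```python
def apply_event(table: dict, event, key: str = "id") -> dict:
    """Apply one Debezium-style envelope to a dict keyed by primary key."""
    if event is None:
        # Tombstone record that follows a delete; nothing to apply.
        return table
    op = event["op"]
    if op in ("c", "r", "u"):
        # Inserts, snapshot reads, and updates all upsert the "after" image.
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # Deletes identify the row through the "before" image.
        table.pop(event["before"][key], None)
    elif op == "t":
        # Truncate removes every row.
        table.clear()
    return table


state = {}
for ev in [
    {"op": "r", "before": None, "after": {"id": 1, "total": 10}},
    {"op": "c", "before": None, "after": {"id": 2, "total": 25}},
    {"op": "u", "before": {"id": 1, "total": 10}, "after": {"id": 1, "total": 12}},
    {"op": "d", "before": {"id": 2, "total": 25}, "after": None},
    None,
]:
    apply_event(state, ev)
```

A flattened record that keeps only the after image cannot drive the delete branch, which is the practical reason to keep the envelope intact until the sink has consumed it.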

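Schema history is the other piece of state to understand, and it is where PostgreSQL differs from most Debezium sources. Connectors such as MySQL, SQL Server, and Oracle record DDL to an internal Kafka topic (schema.history.internal.kafka.topic) so they can correctly interpret log events after a restart. The PostgreSQL connector does not keep one: logical decoding carries no DDL, and pgoutput instead sends a relation message describing a table's columns ahead of the changes that use it, so the connector rebuilds table schemas from the stream and the database catalog. If the same Connect cluster also runs a history-based connector, treat that topic as connector-internal state, not something for downstream consumers, and follow three rules: it must be a single partition, it must never be compacted or have a retention shorter than the connector's lifetime, and you should not let multiple connectors share one. Losing or truncating a schema history topic means the connector can no longer interpret its log stream and you are looking at a re-snapshot.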

On schema changes themselves: an additive change such as a new nullable column flows through cleanly when consumers and the sink honor schema evolution. Pair Debezium with a schema registry and Avro so that schema versions are tracked and compatibility is enforced. Breaking changes (dropping a column, narrowing a type) need coordination with downstream consumers regardless of tooling, the same discipline that any schema migration on a live system demands.

For delivery guarantees, the honest default is at-least-once. Debezium tracks offsets in Kafka Connect; on restart it can replay events it had read but not yet committed an offset for, so duplicates are possible. The producer-side guard is Kafka Connect's exactly-once source support, which builds on idempotent, transactional producers: exactly.once.source.support=enabled on distributed workers plus exactly.once.support=required on the connector, available from Kafka 3.3 and only with Debezium releases that support it. Every event also carries its source LSN, so a downstream consumer can deduplicate on (source.lsn, op) together with the record key, since snapshot reads can share one LSN. End-to-end exactly-once into Iceberg is a property of the sink, not of Debezium, which brings us to the last and trickiest part.

Guarantee | Where it comes from | Practical implication
At-least-once | Debezium default behavior | Duplicates possible on restart; dedupe downstream on LSN
Idempotent producer | Kafka Connect producer config | Removes duplicate writes within a producer session
Exactly-once sink | Iceberg sink connector (control topic / 2PC) | Required for accurate MERGE/delete semantics in Iceberg
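
For the downstream deduplication on LSN, a minimal sketch might look like the following. The class name, the bounded-window size, and the choice to include the record key alongside (source.lsn, op) are our assumptions; the key is there because events from a snapshot can share a single LSN.

```python
from collections import OrderedDict


class LsnDeduplicator:
    """Remember recently seen (lsn, op, key) triples and flag replays.

    The window is bounded so memory stays flat; replays after a connector
    restart arrive close to the originals, so a recent window is usually enough.
    """

    def __init__(self, max_entries: int = 100_000):
        self._seen = OrderedDict()
        self._max_entries = max_entries

    def is_duplicate(self, event: dict, key) -> bool:
        ident = (event["source"]["lsn"], event["op"], key)
        if ident in self._seen:
            return True
        self._seen[ident] = None
        if len(self._seen) > self._max_entries:
            # Evict the oldest entry to keep the window bounded.
            self._seen.popitem(last=False)
        return False


dedup = LsnDeduplicator(max_entries=2)
first = dedup.is_duplicate({"op": "u", "source": {"lsn": 1001}}, key=42)
replayed = dedup.is_duplicate({"op": "u", "source": {"lsn": 1001}}, key=42)
```

This only removes replays; it does not reorder events, so it belongs in a consumer that already reads each partition in order.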

Sinking CDC into Iceberg: deletes are the hard part

Appending inserts to Iceberg is easy. The challenge is representing updates and deletes, because a CDC stream of u and d events has to translate into row-level mutations on the table. Iceberg handles this in format version 2 with equality deletes: delete files that mark every row matching a set of identifier-column values as removed. Your sink has to emit those correctly, and not every sink does.
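
To show how equality deletes behave at read time, here is a toy model of a table scan. It is a sketch of the rule as we read the Iceberg table spec, namely that an equality delete applies only to data files with a strictly lower data sequence number, and not a use of any Iceberg library; the class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    sequence: int  # data sequence number assigned at commit
    rows: list     # row dicts stored in the file


@dataclass
class EqualityDelete:
    sequence: int  # data sequence number of the delete file
    keys: set      # identifier-field values marked as removed


def scan(data_files, deletes, key="id"):
    """Return live rows after applying equality deletes to older data files."""
    live = []
    for data_file in data_files:
        for row in data_file.rows:
            removed = any(
                d.sequence > data_file.sequence and row[key] in d.keys
                for d in deletes
            )
            if not removed:
                live.append(row)
    return live


# Commit 1 inserts two orders. Commit 2 upserts order 1 (delete + new row in
# the same commit). Commit 3 deletes order 2.
data_files = [
    DataFile(1, [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]),
    DataFile(2, [{"id": 1, "status": "shipped"}]),
]
deletes = [EqualityDelete(2, {1}), EqualityDelete(3, {2})]
rows = scan(data_files, deletes)
```

The strict comparison is what makes an upsert work: the delete written in commit 2 removes the old image of order 1 but not the new row written in the same commit. It also shows why readers slow down as delete files accumulate, since every data file has to be checked against every newer delete.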

There are two mainstream paths from Kafka to Iceberg, plus a Kafka-free option.

Apache Iceberg Kafka Connect sink. This is the connector that was donated to the Apache Iceberg project (formerly the Tabular sink) and is now maintained in the Iceberg repo. It provides exactly-once delivery through a sink-managed consumer group and control topic. For row-level changes, the Tabular lineage offers CDC and upsert modes: iceberg.tables.cdc-field names the field that carries the operation code, and iceberg.tables.upsert-mode-enabled writes an equality delete on the identifier fields ahead of every insert. Both modes require a V2 Iceberg table with identifier fields defined, because that is what equality deletes key on. Confirm that the connector build you deploy actually exposes these two properties, since the feature set of the Apache-maintained releases has not always matched the Tabular original. The connector also expects a specific record shape, and the Debezium SMT that adapts the envelope has had rough edges with delete and update handling, so test your op-code routing end to end before trusting it.

Debezium to Kafka to Flink to Iceberg. When you want full control over merge semantics, route the stream through Apache Flink. Flink reads the Debezium changelog, understands +I/-U/+U/-D changelog semantics natively, and writes equality deletes into Iceberg. This is the most flexible path and the one we reach for on demanding pipelines; the trade-off is operating a Flink cluster. We go deep on this combination in Flink and Iceberg: a powerful duo, and on the broader streaming stack in the Kafka, Flink, and ClickHouse architecture blueprint.

Debezium Server (no Kafka). Debezium Server Iceberg runs Debezium as a standalone process that writes straight to Iceberg, skipping Kafka and Kafka Connect entirely. It is a clean fit when the lakehouse is the only consumer and you do not need Kafka as a shared event backbone. You lose Kafka's buffering, replay, and fan-out, so weigh it against a Kafka-centered design rather than treating it as a strict upgrade.

A pattern that works well regardless of sink: land raw Debezium events in an append-only bronze Iceberg table first, then build a merged silver table from it. Bronze gives you a replayable, immutable audit log of every change; silver gives you the current-state table your analysts query. Whatever your sink choice, keep up with Iceberg table maintenance, because equality deletes and frequent small commits produce many delete files and small data files that degrade read performance until they are compacted.
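
The bronze-to-silver step can be sketched in a few lines. In a real pipeline this would be a MERGE in your query engine; the version below is an illustrative in-memory equivalent that assumes bare Debezium envelopes, a single-column key named id, and source.lsn as the ordering column. The function name is ours.

```python
def build_silver(bronze_events, key="id"):
    """Collapse an append-only list of change events into current-state rows.

    Bronze may be read in any order, so the latest event per key is chosen
    by LSN rather than by position in the list.
    """
    latest = {}
    for event in bronze_events:
        # Deletes carry the key in "before", all other ops in "after".
        image = event["before"] if event["op"] == "d" else event["after"]
        row_key = image[key]
        lsn = event["source"]["lsn"]
        if row_key not in latest or lsn >= latest[row_key]["source"]["lsn"]:
            latest[row_key] = event
    # Keys whose most recent event is a delete are absent from silver.
    return {k: e["after"] for k, e in latest.items() if e["op"] != "d"}


bronze = [
    {"op": "u", "source": {"lsn": 300}, "before": None, "after": {"id": 1, "total": 12}},
    {"op": "c", "source": {"lsn": 100}, "before": None, "after": {"id": 1, "total": 10}},
    {"op": "c", "source": {"lsn": 200}, "before": None, "after": {"id": 2, "total": 25}},
    {"op": "d", "source": {"lsn": 400}, "before": {"id": 2}, "after": None},
]
silver = build_silver(bronze)
```

Because bronze is immutable, this merge can be rerun from scratch whenever the silver logic changes, which is the main operational argument for keeping the two layers separate.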

Key takeaways

  • Use pgoutput. It ships with PostgreSQL 10+ and needs no server-side install; set plugin.name explicitly rather than relying on the connector default. Set wal_level = logical and scope your publication with publication.autocreate.mode = filtered.
  • Replication slot disk growth is the top production risk. A stalled or pinned slot retains WAL until the primary's disk fills. Enable heartbeats, alert on pg_replication_slots, set max_slot_wal_keep_size, and drop orphaned slots.
  • Prefer incremental snapshots for large tables. They stream concurrently, resume after a crash, and let you backfill newly added tables via an execute-snapshot signal without re-snapshotting everything else.
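  • Know where schema state lives. The PostgreSQL connector rebuilds schemas from pgoutput relation messages and keeps no schema history topic; for connectors that do (MySQL, SQL Server, Oracle), keep that topic single-partition, uncompacted, and never shared, because losing it forces a re-snapshot.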
  • Deletes drive your sink choice. Updates and deletes need Iceberg V2 equality deletes keyed on identifier fields. The Apache Iceberg Kafka Connect sink does this with exactly-once and CDC mode; Flink gives you maximum control over merge semantics; Debezium Server skips Kafka when Iceberg is the only consumer.
  • Adopt bronze/silver. Append raw events to immutable bronze, merge into queryable silver, and run regular compaction to keep delete files and small files under control.

Getting a Postgres to Iceberg CDC pipeline running is a weekend project. Getting one that survives failovers, schema changes, and a Kafka outage at 3 a.m. is a different exercise. If you are designing or stabilizing a CDC and lakehouse pipeline, BigData Boutique does this work for streaming and data platforms in production.