A head-to-head comparison of Apache Iceberg and Delta Lake for engineers - covering architecture, features, engine support, vendor neutrality, and real decision criteria based on production experience with both formats.
Apache Iceberg and Delta Lake are the two dominant open table formats for data lakehouses. Both layer ACID transactions, schema evolution, and time travel on top of Parquet data files stored in object storage. The feature gap between them has narrowed considerably since 2023, so the real decision now comes down to ecosystem fit, engine diversity, and how much vendor coupling you can tolerate.
We deploy both formats in production for clients across different stacks. This post covers the architectural differences that actually matter, a feature-by-feature comparison, engine support realities, and a practical framework for choosing between them.
What Is Apache Iceberg and What Is Delta Lake
Apache Iceberg is an open table format specification originally developed at Netflix and donated to the Apache Software Foundation. It defines how table metadata, partitioning, and file tracking work - independent of any compute engine. The specification is designed so that Spark, Flink, Trino, Snowflake, ClickHouse, and any other engine can read and write the same table concurrently. The latest stable release is Iceberg 1.10.1 (December 2025), which brought significant integration improvements including full MERGE support in PyIceberg, enhanced REST catalog capabilities, and broader engine compatibility.
Delta Lake was created by Databricks in 2019 and later contributed to the Linux Foundation. It was built with Apache Spark in mind, and its strongest performance and deepest feature integration remain within the Spark and Databricks ecosystem. Delta Lake 4.0, released in September 2025, added coordinated commits for multi-engine writes, a variant data type for semi-structured data, and catalog-managed tables. The more recent Delta Lake 4.1.0, released in March 2026, introduced Kernel and Spark improvements including support for Spark declarative pipelines and enhanced catalog-managed table capabilities.
Both formats store data as Parquet files and add a metadata layer on top. The difference is in how that metadata layer is structured and what assumptions it makes about the compute engine.
Architecture: Metadata and Transaction Model
The architectural split between Iceberg and Delta Lake starts at the metadata layer, and this difference has real operational consequences.
Iceberg's Hierarchical Metadata
Iceberg uses a multi-layered metadata tree: a catalog points to a metadata file, which references a manifest list, which references individual manifests, which track the actual Parquet data files. Each manifest carries column-level statistics (min/max values, null counts), enabling aggressive file pruning at query planning time - before any data is scanned. For tables with thousands of partitions or billions of files, this architecture avoids the full-scan planning bottleneck that plagues simpler approaches. Critically, Iceberg never lists files in object storage: all file tracking happens through the metadata tree, eliminating the slow directory-listing operations that become a major bottleneck at scale.
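The pruning idea can be sketched in plain Python. This is an illustrative, stdlib-only simulation of manifest-level min/max pruning - the data structures and function names are invented for the example, not PyIceberg's actual API:

```python
from dataclasses import dataclass

@dataclass
class ManifestEntry:
    """One data file as tracked by a manifest: path plus column stats."""
    path: str
    min_event_ts: str  # min/max stats for a hypothetical 'event_ts' column
    max_event_ts: str

def plan_scan(manifests, lower, upper):
    """Prune at planning time: keep only files whose stats overlap the predicate.
    No object-storage listing happens; everything comes from metadata."""
    return [
        entry.path
        for manifest in manifests
        for entry in manifest
        if entry.max_event_ts >= lower and entry.min_event_ts <= upper
    ]

# Two manifests tracking four Parquet files
manifests = [
    [ManifestEntry("s3://lake/t/a.parquet", "2026-01-01", "2026-01-31"),
     ManifestEntry("s3://lake/t/b.parquet", "2026-02-01", "2026-02-28")],
    [ManifestEntry("s3://lake/t/c.parquet", "2026-03-01", "2026-03-31"),
     ManifestEntry("s3://lake/t/d.parquet", "2026-04-01", "2026-04-30")],
]

# A query for February touches exactly one file
print(plan_scan(manifests, "2026-02-01", "2026-02-28"))  # → ['s3://lake/t/b.parquet']
```

The real implementation prunes first at the manifest-list level (each manifest carries partition-range summaries) and then at the manifest level, but the principle is the same: the planner decides which files to read using metadata alone.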
Iceberg's catalog layer is defined by an open REST API specification, meaning any engine in any language can discover and interact with tables without importing a specific runtime. Apache Polaris, which graduated to a top-level Apache project in February 2026, implements this spec and provides a vendor-neutral catalog with fine-grained access control. Other options include AWS Glue (39.3% adoption in the 2025 Iceberg ecosystem survey), Nessie, Lakekeeper, Unity Catalog, and the traditional Hive Metastore.
Delta Lake's Transaction Log
Delta Lake uses a flat transaction log - the _delta_log directory - consisting of JSON files for each commit and periodic Parquet checkpoint files. Every read must reconstruct the current table state by replaying the log from the last checkpoint. This is straightforward and works well in Spark, where the Delta library handles log replay natively. Outside Spark, engines must implement their own log replay logic, which has historically lagged behind in feature coverage.
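The replay step can be illustrated with a small stdlib-only simulation. The action shapes below mirror the real `_delta_log` format (newline-delimited JSON with `add` and `remove` actions), but this is a conceptual sketch, not the Delta Kernel or delta-rs implementation:

```python
import json

# Simplified commit entries in the style of _delta_log/00000000000000000N.json.
# Real commits carry more fields (stats, partitionValues, commitInfo, etc.).
commits = [
    '{"add": {"path": "part-000.parquet"}}',
    '{"add": {"path": "part-001.parquet"}}',
    # A later commit compacts part-000 and part-001 into part-002
    '{"remove": {"path": "part-000.parquet"}}',
    '{"remove": {"path": "part-001.parquet"}}',
    '{"add": {"path": "part-002.parquet"}}',
]

def replay(commit_lines):
    """Reconstruct the live file set by replaying add/remove actions in order."""
    live = set()
    for line in commit_lines:
        action = json.loads(line)
        if "add" in action:
            live.add(action["add"]["path"])
        elif "remove" in action:
            live.discard(action["remove"]["path"])
    return sorted(live)

print(replay(commits))  # → ['part-002.parquet']
```

Checkpoints exist precisely to bound this replay: every N commits the full state is written as Parquet, so a reader starts from the latest checkpoint and replays only the JSON commits after it.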
Delta Lake's catalog story centers on Unity Catalog, which Databricks open-sourced in mid-2024. Unity Catalog provides lineage, access control, and cross-workspace sharing. The open-source release has made it available outside the Databricks platform, though the most advanced governance features still benefit from Databricks integration. That said, Delta Lake does not require Unity Catalog - it has long been used with standard Hive Metastores and AWS Glue as catalog options. While Delta Lake is formally hosted under the Linux Foundation (specifically the LF AI & Data Foundation since 2024), Databricks remains the dominant contributor and effectively steers the project's direction.
| Aspect | Apache Iceberg | Delta Lake |
|---|---|---|
| Metadata structure | Hierarchical (catalog -> metadata -> manifest list -> manifests -> data files) | Flat transaction log (JSON commits + Parquet checkpoints) |
| Query planning | File-level pruning via manifest statistics; no file listing needed | File-level pruning via per-partition and per-column statistics in log files |
| Catalog API | Open REST spec (Polaris, Glue, Nessie, Unity Catalog, Hive) | Unity Catalog (open-sourced), Hive Metastore, AWS Glue |
| Governance | Apache Foundation | Linux Foundation (Databricks-steered) |
| Spec versioning | v1, v2, v3 (modular feature flags) | Protocol versions (reader/writer version pairs) |
Feature Comparison
The feature sets have converged significantly. Both formats now support the core lakehouse capabilities, but the implementation details differ in ways that affect production operations.
| Feature | Apache Iceberg | Delta Lake |
|---|---|---|
| Schema evolution | Full (add, drop, rename, reorder, type promotion) | Full (add, drop, rename, type widening in 4.0) |
| Partition evolution | Hidden partitioning with in-place evolution - metadata-only operation, no data rewrite | Traditional Hive-style partitioning, or liquid clustering which replaces partitioning entirely with automatic, adaptive data layout |
| Time travel | By snapshot ID or timestamp | By version number or timestamp |
| Row-level deletes | Copy-on-write and merge-on-read; deletion vectors in v3 spec (1.8.0+) | Copy-on-write and merge-on-read; deletion vectors since 2.3 |
| Branching/tagging | Native branch and tag support on snapshots | Not natively supported |
| MERGE INTO | Supported across engines | Best-in-class on Spark; limited elsewhere |
| Variant type | Supported in v3 spec (1.9.0+) | Supported in Delta Lake 4.0 |
| Change data feed | Supported via CDC readers | Native Change Data Feed (CDF) with row-level change tracking |
| Multi-table transactions | Not natively supported | Supported - atomic commits across multiple Delta tables |
| Multi-engine writes | Native via REST catalog and spec-level guarantees | Coordinated commits (preview in 4.0) |
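Time travel in both formats reduces to the same resolution rule: map a requested timestamp to the latest snapshot (Iceberg) or version (Delta Lake) committed at or before that time. A stdlib-only sketch of that rule, with invented history data for illustration:

```python
from bisect import bisect_right

# (commit_timestamp_epoch_seconds, snapshot_or_version_id), ordered by commit time
history = [
    (1_700_000_000, 101),
    (1_700_000_600, 102),
    (1_700_001_200, 103),
]

def snapshot_as_of(history, ts):
    """Return the latest snapshot committed at or before ts."""
    times = [t for t, _ in history]
    idx = bisect_right(times, ts)
    if idx == 0:
        raise ValueError("no snapshot exists at or before the requested time")
    return history[idx - 1][1]

print(snapshot_as_of(history, 1_700_000_900))  # → 102
```

The practical consequence is the same in both formats: time travel only reaches as far back as retained snapshots or log entries, so expiration and vacuum policies bound your history window.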
Hidden Partitioning Deserves Special Attention
Iceberg's hidden partitioning is one of the most impactful differences in day-to-day use. With Iceberg, you define partition transforms (year, month, day, hour, bucket, truncate) at the table level, and queries are automatically pruned without users needing to know the partition layout. When business requirements change - say you need to re-partition from monthly to daily granularity - Iceberg handles this as a metadata-only operation. Old data stays in the old partition layout. New data follows the new scheme. No rewrite needed.
Delta Lake traditionally used Hive-style partitioning, but liquid clustering is replacing the partition mechanism entirely. Liquid clustering automatically organizes data based on clustering keys you specify, adapting the physical layout over time without manual partition management. It eliminates the need to choose a fixed partition scheme upfront and handles layout evolution transparently. For teams starting new Delta Lake tables, liquid clustering is the recommended approach. However, migrating existing partitioned tables to liquid clustering can be complex due to historical data layouts, which is where Iceberg's metadata-only partition evolution still holds an advantage.
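Iceberg's metadata-only partition evolution can be sketched in plain Python: each partition spec keeps its own transform, old files stay under the old spec, new files land under the new one, and planning consults both. The transforms and file layout below are invented for illustration - this is the concept, not Iceberg's implementation:

```python
from datetime import date

def month_transform(ts: date) -> str:
    """Iceberg-style 'month' partition transform (the original spec)."""
    return f"{ts.year:04d}-{ts.month:02d}"

def day_transform(ts: date) -> str:
    """Iceberg-style 'day' transform, adopted after evolving the spec."""
    return ts.isoformat()

# Files written under the old monthly spec are never rewritten;
# files written after the spec change follow the daily layout.
old_files = {month_transform(ts): f"file-{i}.parquet"
             for i, ts in enumerate([date(2026, 1, 10), date(2026, 2, 5)])}
new_files = {day_transform(ts): f"file-{i + 2}.parquet"
             for i, ts in enumerate([date(2026, 3, 1), date(2026, 3, 2)])}

def plan(ts: date) -> list:
    """Planning applies each spec's transform; users filter on the raw column."""
    hits = []
    if month_transform(ts) in old_files:
        hits.append(old_files[month_transform(ts)])
    if day_transform(ts) in new_files:
        hits.append(new_files[day_transform(ts)])
    return hits

print(plan(date(2026, 2, 14)))  # → ['file-1.parquet'] (old monthly partition)
print(plan(date(2026, 3, 2)))   # → ['file-3.parquet'] (new daily partition)
```

The user's query predicate never mentions a partition column in either case - that is the "hidden" part, and it is what makes the spec change a metadata-only operation.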
Engine and Ecosystem Support
This is where the practical gap is widest. Iceberg was designed as an engine-agnostic specification; Delta Lake was designed for Spark.
| Engine | Iceberg Support | Delta Lake Support |
|---|---|---|
| Apache Spark | Native read/write | Native read/write (best-in-class) |
| Apache Flink | Native read/write | Read/write via connector (maturing) |
| Trino/Presto | Full-featured connector | Connector available (fewer features) |
| Snowflake | Native managed Iceberg tables (GA since 2024) | Read via external tables |
| ClickHouse | Production-grade read support; experimental write support | Read/write support |
| AWS Athena | Native support | Read support |
| DuckDB | Native read/write | Read and limited write support via extension |
| Databricks | Full support | Native (first-class citizen) |
| Dremio | Native read/write | Read via UniForm |
The 2025 State of the Iceberg Ecosystem survey found that while 96.4% of respondents use Spark with Iceberg, multi-engine adoption is strong: 60.7% use Trino, 32.1% use Flink, and 28.6% use DuckDB. That multi-engine story is Iceberg's defining advantage.
UniForm: Delta's Interoperability Bridge
Databricks addressed the engine gap with UniForm, which automatically generates Iceberg metadata alongside Delta commits. This lets Iceberg-compatible engines read Delta tables without data conversion. It is read-only - you still write through Delta/Spark. And there are constraints: tables must have column mapping enabled, deletion vectors cannot be used alongside UniForm, and Delta versions do not map cleanly to Iceberg snapshot IDs. UniForm is a pragmatic bridge, not a replacement for native Iceberg support.
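Enabling UniForm is a table-properties change rather than a migration. The property names below match Databricks' documented settings at the time of writing, but treat this as a sketch and check the current UniForm docs before applying it:

```sql
-- Generate Iceberg metadata alongside Delta commits on an existing table
ALTER TABLE sales SET TBLPROPERTIES (
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```

Note that this also implicitly requires column mapping, which is why the constraints listed above apply.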
For teams that need bidirectional conversion, Apache XTable (incubating) translates metadata between Iceberg, Delta Lake, and Hudi. It is still early-stage, with rough edges around incremental sync and engine compatibility, but it works for one-time migrations.
Language and SDK Support
Engine support gets the most attention, but programming language coverage matters just as much - especially for teams building data pipelines, CLI tools, or embedded analytics outside the JVM.
Apache Iceberg has official implementations across four languages, all governed as Apache sub-projects:
- Java - the reference implementation (apache/iceberg), latest v1.10.1. This is the most mature and feature-complete library, used by Spark, Flink, and Trino.
- Python - PyIceberg (apache/iceberg-python), latest v0.11.1. Supports reading with filter pushdowns, append/overwrite writes, MERGE operations, and integrates with Pandas, DuckDB, and Ray.
- Rust - iceberg-rust, latest v0.8.0. Supports V3 metadata format, native partitioned writes, and exposes Python bindings via pyiceberg-core.
- Go - iceberg-go, actively maintained with regular releases.
Delta Lake has two main implementations:
- JVM/Scala - the primary Delta Lake project (delta-io/delta), v4.0. This is the canonical implementation, tightly integrated with Spark.
- Rust/Python - delta-rs with Python bindings via the deltalake PyPI package. This is a community-driven project (not Databricks-maintained) that enables Delta Lake usage outside the JVM. It is the only viable path for Python-native or Rust-native Delta workloads.
| Language | Apache Iceberg | Delta Lake |
|---|---|---|
| Java/Scala | Reference implementation (Apache) | Primary implementation (Databricks-steered) |
| Python | PyIceberg (Apache, v0.11.1) | deltalake via delta-rs (community, v1.4.x) |
| Rust | iceberg-rust (Apache, v0.8.0) | delta-rs (community) |
| Go | iceberg-go (Apache) | No production implementation |
The pattern is clear: Iceberg's multi-language story is broader and officially governed under Apache, while Delta Lake's non-JVM support depends on the community-maintained delta-rs project. For teams building in Python or Rust without Spark, this is a meaningful consideration.
When to Choose Iceberg vs When to Choose Delta Lake
The technology choice matters less than the ecosystem you are building around.
Choose Apache Iceberg when:
- You run a multi-engine architecture - Spark for batch, Flink for streaming, Trino for ad-hoc queries, ClickHouse for analytics
- Vendor neutrality is a hard requirement - your data stays in open formats governed by the Apache Foundation
- You have existing tables with established partitioning layouts that are complex to migrate to liquid clustering due to historical data and dependencies
- You need language diversity beyond the JVM - PyIceberg, iceberg-rust, and iceberg-go are all official Apache projects with active development
Choose Delta Lake when:
- Your stack is Spark and Databricks end-to-end, and you want the tightest possible integration with Databricks' optimized runtime, Photon engine, and Unity Catalog
- You need Unity Catalog's governance features (lineage, fine-grained access, cross-workspace sharing) today
- Your team already has significant Delta Lake investment and migration cost is not justified
- You primarily do streaming upserts on Spark, where Delta's merge performance is heavily optimized
- You need native Change Data Feed capabilities for downstream consumers to efficiently track row-level changes, or require multi-table transaction support for mission-critical workloads
The bigger picture: for teams running exclusively on Databricks, Delta Lake is the right choice - it is deeply optimized for that environment, and UniForm handles interoperability edge cases. For multi-engine architectures, Iceberg's engine-agnostic design and broader ecosystem support remain a strong advantage. That said, the two formats are converging - features like deletion vectors, variant types, and interoperability layers are narrowing the gap from both sides. It is increasingly likely that Delta Lake and Iceberg will continue converging or eventually unify, making the choice less permanent than it appears today.
Key Takeaways
- Apache Iceberg and Delta Lake both provide ACID transactions, time travel, and schema evolution on top of Parquet. The feature gap has narrowed to the point where features alone do not determine the choice.
- Iceberg's hierarchical metadata and hidden partitioning give it architectural advantages for large-scale tables and evolving partition strategies.
- Delta Lake's transaction log is simpler and deeply optimized for Spark - teams running Databricks get the best performance and tightest integration. Delta Lake also offers unique strengths like native Change Data Feed, multi-table transactions, and liquid clustering.
- Engine and language support are decisive factors: Iceberg has native, production-grade support across Spark, Flink, Trino, Snowflake, ClickHouse, DuckDB, and more, plus official Apache-governed libraries in Java, Python, Rust, and Go. Delta Lake is strongest on Spark, with non-JVM support relying on the community-maintained delta-rs project.
- For multi-engine, vendor-neutral architectures, Iceberg has the edge. For Databricks-centric stacks, Delta Lake remains the practical default.
- The two formats are converging in features and interoperability. Migration paths exist - UniForm for read interoperability, Apache XTable for metadata translation - but switching formats mid-stream has real operational cost. The good news is that this choice is becoming less permanent as the ecosystem matures.