A head-to-head comparison of Apache Iceberg and Delta Lake for engineers - covering architecture, features, engine support, vendor neutrality, and real decision criteria based on production experience with both formats.
Apache Iceberg and Delta Lake are the two dominant open table formats for data lakehouses. Both layer ACID transactions, schema evolution, and time travel on top of Parquet data files stored in object storage. The feature gap between them has narrowed considerably since 2023, so the real decision now comes down to ecosystem fit, engine diversity, and how much vendor coupling you can tolerate.
We deploy both formats in production for clients across different stacks. This post covers the architectural differences that actually matter, a feature-by-feature comparison, engine support realities, and a practical framework for choosing between them.
What Is Apache Iceberg and What Is Delta Lake
Apache Iceberg is an open table format specification originally developed at Netflix and donated to the Apache Software Foundation. It defines how table metadata, partitioning, and file tracking work - independent of any compute engine. The specification is designed so that Spark, Flink, Trino, Snowflake, ClickHouse, and any other engine can read and write the same table concurrently. The latest stable release is Iceberg 1.10.1 (December 2025), which brought significant integration improvements including full MERGE support in PyIceberg, enhanced REST catalog capabilities, and broader engine compatibility.
Delta Lake was created by Databricks in 2019 and later contributed to the Linux Foundation. It was built with Apache Spark in mind, and its strongest performance and deepest feature integration remain within the Spark and Databricks ecosystem. Delta Lake 4.0, released in September 2025, added coordinated commits for multi-engine writes, a variant data type for semi-structured data, and catalog-managed tables. The more recent Delta Lake 4.1.0, released in March 2026, introduced Kernel and Spark improvements including support for Spark declarative pipelines and enhanced catalog-managed table capabilities.
Both formats store data as Parquet files and add a metadata layer on top. The difference is in how that metadata layer is structured and what assumptions it makes about the compute engine.
Architecture: Metadata and Transaction Model
The architectural split between Iceberg and Delta Lake starts at the metadata layer, and this difference has real operational consequences.
Iceberg's Hierarchical Metadata
Iceberg uses a multi-layered metadata tree: a catalog points to a metadata file, which references a manifest list, which references individual manifests, which track the actual Parquet data files. Each manifest carries column-level statistics (min/max values, null counts), enabling aggressive file pruning at query planning time - before any data is scanned. For tables with thousands of partitions or billions of files, this architecture avoids the full-scan planning bottleneck that plagues simpler approaches. Critically, Iceberg never lists files in object storage: all file tracking happens through the metadata tree, eliminating the slow directory-listing operations that become a major bottleneck at scale.
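The pruning idea can be sketched in plain Python. This is an illustrative, stdlib-only simulation of manifest-level min/max pruning - the data structures and function names are invented for the example, not PyIceberg's actual API:

```python
from dataclasses import dataclass

@dataclass
class ManifestEntry:
    """One data file as tracked by a manifest: path plus column stats."""
    path: str
    min_event_ts: str  # min/max stats for a hypothetical 'event_ts' column
    max_event_ts: str

def plan_scan(manifests, lower, upper):
    """Prune at planning time: keep only files whose stats overlap the predicate.
    No object-storage listing happens; everything comes from metadata."""
    return [
        entry.path
        for manifest in manifests
        for entry in manifest
        if entry.max_event_ts >= lower and entry.min_event_ts <= upper
    ]

# Two manifests tracking four Parquet files
manifests = [
    [ManifestEntry("s3://lake/t/a.parquet", "2026-01-01", "2026-01-31"),
     ManifestEntry("s3://lake/t/b.parquet", "2026-02-01", "2026-02-28")],
    [ManifestEntry("s3://lake/t/c.parquet", "2026-03-01", "2026-03-31"),
     ManifestEntry("s3://lake/t/d.parquet", "2026-04-01", "2026-04-30")],
]

# A query for February touches exactly one file
print(plan_scan(manifests, "2026-02-01", "2026-02-28"))  # → ['s3://lake/t/b.parquet']
```

The real implementation prunes first at the manifest-list level (each manifest carries partition-range summaries) and then at the manifest level, but the principle is the same: the planner decides which files to read using metadata alone.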
Iceberg's catalog layer is defined by an open REST API specification, meaning any engine in any language can discover and interact with tables without importing a specific runtime. Apache Polaris, which graduated to a top-level Apache project in February 2026, implements this spec and provides a vendor-neutral catalog with fine-grained access control. Other options include AWS Glue (39.3% adoption in the 2025 Iceberg ecosystem survey), Nessie, Lakekeeper, Unity Catalog, and the traditional Hive Metastore.
Delta Lake's Transaction Log
Delta Lake uses a flat transaction log - the _delta_log directory - consisting of JSON files for each commit and periodic Parquet checkpoint files. Every read must reconstruct the current table state by replaying the log from the last checkpoint. This is straightforward and works well in Spark, where the Delta library handles log replay natively. Outside Spark, engines must implement their own log replay logic, which has historically lagged behind in feature coverage.
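The replay step can be illustrated with a small stdlib-only simulation. The action shapes below mirror the real `_delta_log` format (newline-delimited JSON with `add` and `remove` actions), but this is a conceptual sketch, not the Delta Kernel or delta-rs implementation:

```python
import json

# Simplified commit entries in the style of _delta_log/00000000000000000N.json.
# Real commits carry more fields (stats, partitionValues, commitInfo, etc.).
commits = [
    '{"add": {"path": "part-000.parquet"}}',
    '{"add": {"path": "part-001.parquet"}}',
    # A later commit compacts part-000 and part-001 into part-002
    '{"remove": {"path": "part-000.parquet"}}',
    '{"remove": {"path": "part-001.parquet"}}',
    '{"add": {"path": "part-002.parquet"}}',
]

def replay(commit_lines):
    """Reconstruct the live file set by replaying add/remove actions in order."""
    live = set()
    for line in commit_lines:
        action = json.loads(line)
        if "add" in action:
            live.add(action["add"]["path"])
        elif "remove" in action:
            live.discard(action["remove"]["path"])
    return sorted(live)

print(replay(commits))  # → ['part-002.parquet']
```

Checkpoints exist precisely to bound this replay: every N commits the full state is written as Parquet, so a reader starts from the latest checkpoint and replays only the JSON commits after it.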
Delta Lake's catalog story centers on Unity Catalog, which Databricks open-sourced in mid-2024. Unity Catalog provides lineage, access control, and cross-workspace sharing. The open-source release has made it available outside the Databricks platform, though the most advanced governance features still benefit from Databricks integration. That said, Delta Lake does not require Unity Catalog - it has long been used with standard Hive Metastores and AWS Glue as catalog options. While Delta Lake is formally hosted under the Linux Foundation (specifically the LF AI & Data Foundation since 2024), Databricks remains the dominant contributor and effectively steers the project's direction.
| Aspect | Apache Iceberg | Delta Lake |
|---|---|---|
| Metadata structure | Hierarchical (catalog -> metadata -> manifest list -> manifests -> data files) | Flat transaction log (JSON commits + Parquet checkpoints) |
| Query planning | File-level pruning via manifest statistics; no file listing needed | File-level pruning via per-partition and per-column statistics in log files |
| Catalog API | Open REST spec (Polaris, Glue, Nessie, Unity Catalog, Hive) | Unity Catalog (open-sourced), Hive Metastore, AWS Glue |
| Governance | Apache Foundation | Linux Foundation (Databricks-steered) |
| Spec versioning | v1, v2, v3 (modular feature flags) | Protocol versions (reader/writer version pairs) |
Feature Comparison
The feature sets have converged significantly. Both formats now support the core lakehouse capabilities, but the implementation details differ in ways that affect production operations.
| Feature | Apache Iceberg | Delta Lake |
|---|---|---|
| Schema evolution | Full (add, drop, rename, reorder, type promotion) | Full (add, drop, rename, type widening in 4.0) |
| Partition evolution | Hidden partitioning with in-place evolution - metadata-only operation, no data rewrite | Traditional Hive-style partitioning, or liquid clustering which replaces partitioning entirely with automatic, adaptive data layout |
| Time travel | By snapshot ID or timestamp | By version number or timestamp |
| Row-level deletes | Copy-on-write and merge-on-read; deletion vectors in v3 spec (1.8.0+) | Copy-on-write and merge-on-read; deletion vectors since 2.3 |
| Branching/tagging | Native branch and tag support on snapshots | Not natively supported |
| MERGE INTO | Supported across engines | Best-in-class on Spark; limited elsewhere |
| Variant type | Supported in v3 spec (1.9.0+) | Supported in Delta Lake 4.0 |
| Change data feed | Supported via CDC readers | Native Change Data Feed (CDF) with row-level change tracking |
| Multi-table transactions | Not natively supported | Supported - atomic commits across multiple Delta tables |
| Multi-engine writes | Native via REST catalog and spec-level guarantees | Coordinated commits (preview in 4.0) |
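Time travel in both formats reduces to the same resolution rule: map a requested timestamp to the latest snapshot (Iceberg) or version (Delta Lake) committed at or before that time. A stdlib-only sketch of that rule, with invented history data for illustration:

```python
from bisect import bisect_right

# (commit_timestamp_epoch_seconds, snapshot_or_version_id), ordered by commit time
history = [
    (1_700_000_000, 101),
    (1_700_000_600, 102),
    (1_700_001_200, 103),
]

def snapshot_as_of(history, ts):
    """Return the latest snapshot committed at or before ts."""
    times = [t for t, _ in history]
    idx = bisect_right(times, ts)
    if idx == 0:
        raise ValueError("no snapshot exists at or before the requested time")
    return history[idx - 1][1]

print(snapshot_as_of(history, 1_700_000_900))  # → 102
```

The practical consequence is the same in both formats: time travel only reaches as far back as retained snapshots or log entries, so expiration and vacuum policies bound your history window.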
Hidden Partitioning Deserves Special Attention
Iceberg's hidden partitioning is one of the most impactful differences in day-to-day use. With Iceberg, you define partition transforms (year, month, day, hour, bucket, truncate) at the table level, and queries are automatically pruned without users needing to know the partition layout. When business requirements change - say you need to re-partition from monthly to daily granularity - Iceberg handles this as a metadata-only operation. Old data stays in the old partition layout. New data follows the new scheme. No rewrite needed.
Delta Lake traditionally used Hive-style partitioning, but liquid clustering is replacing the partition mechanism entirely. Liquid clustering automatically organizes data based on clustering keys you specify, adapting the physical layout over time without manual partition management. It eliminates the need to choose a fixed partition scheme upfront and handles layout evolution transparently. For teams starting new Delta Lake tables, liquid clustering is the recommended approach. However, migrating existing partitioned tables to liquid clustering can be complex due to historical data layouts, which is where Iceberg's metadata-only partition evolution still holds an advantage.
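Iceberg's metadata-only partition evolution can be sketched in plain Python: each partition spec keeps its own transform, old files stay under the old spec, new files land under the new one, and planning consults both. The transforms and file layout below are invented for illustration - this is the concept, not Iceberg's implementation:

```python
from datetime import date

def month_transform(ts: date) -> str:
    """Iceberg-style 'month' partition transform (the original spec)."""
    return f"{ts.year:04d}-{ts.month:02d}"

def day_transform(ts: date) -> str:
    """Iceberg-style 'day' transform, adopted after evolving the spec."""
    return ts.isoformat()

# Files written under the old monthly spec are never rewritten;
# files written after the spec change follow the daily layout.
old_files = {month_transform(ts): f"file-{i}.parquet"
             for i, ts in enumerate([date(2026, 1, 10), date(2026, 2, 5)])}
new_files = {day_transform(ts): f"file-{i + 2}.parquet"
             for i, ts in enumerate([date(2026, 3, 1), date(2026, 3, 2)])}

def plan(ts: date) -> list:
    """Planning applies each spec's transform; users filter on the raw column."""
    hits = []
    if month_transform(ts) in old_files:
        hits.append(old_files[month_transform(ts)])
    if day_transform(ts) in new_files:
        hits.append(new_files[day_transform(ts)])
    return hits

print(plan(date(2026, 2, 14)))  # → ['file-1.parquet'] (old monthly partition)
print(plan(date(2026, 3, 2)))   # → ['file-3.parquet'] (new daily partition)
```

The user's query predicate never mentions a partition column in either case - that is the "hidden" part, and it is what makes the spec change a metadata-only operation.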
Engine and Ecosystem Support
This is where the practical gap is widest. Iceberg was designed as an engine-agnostic specification; Delta Lake was designed for Spark.
| Engine | Iceberg Support | Delta Lake Support |
|---|---|---|
| Apache Spark | Native read/write | Native read/write (best-in-class) |
| Apache Flink | Native read/write | Read/write via connector (maturing) |
| Trino/Presto | Full-featured connector | Connector available (fewer features) |
| Snowflake | Native managed Iceberg tables (GA since 2024) | Read via external tables |
| ClickHouse | Production-grade read support; experimental write support | Read/write support |
| AWS Athena | Native support | Read support |
| DuckDB | Native read/write | Read and limited write support via extension |
| Databricks | Full support | Native (first-class citizen) |
| Dremio | Native read/write | Read via UniForm |
The 2025 State of the Iceberg Ecosystem survey found that while 96.4% of respondents use Spark with Iceberg, multi-engine adoption is strong: 60.7% use Trino, 32.1% use Flink, and 28.6% use DuckDB. That multi-engine story is Iceberg's defining advantage.
UniForm: Delta's Interoperability Bridge
Databricks addressed the engine gap with UniForm, which automatically generates Iceberg metadata alongside Delta commits. This lets Iceberg-compatible engines read Delta tables without data conversion. It is read-only - you still write through Delta/Spark. And there are constraints: tables must have column mapping enabled, deletion vectors cannot be used alongside UniForm, and Delta versions do not map cleanly to Iceberg snapshot IDs. UniForm is a pragmatic bridge, not a replacement for native Iceberg support.
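Enabling UniForm is a table-properties change rather than a migration. The property names below match Databricks' documented settings at the time of writing, but treat this as a sketch and check the current UniForm docs before applying it:

```sql
-- Generate Iceberg metadata alongside Delta commits on an existing table
ALTER TABLE sales SET TBLPROPERTIES (
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```

Note that this also implicitly requires column mapping, which is why the constraints listed above apply.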
For teams that need bidirectional conversion, Apache XTable (incubating) translates metadata between Iceberg, Delta Lake, and Hudi. It is still early-stage, with rough edges around incremental sync and engine compatibility, but it works for one-time migrations.
Language and SDK Support
Engine support gets the most attention, but programming language coverage matters just as much - especially for teams building data pipelines, CLI tools, or embedded analytics outside the JVM.
Apache Iceberg has official implementations across four languages, all governed as Apache sub-projects:
- Java - the reference implementation (apache/iceberg), latest v1.10.1. This is the most mature and feature-complete library, used by Spark, Flink, and Trino.
- Python - PyIceberg (apache/iceberg-python), latest v0.11.1. Supports reading with filter pushdowns, append/overwrite writes, MERGE operations, and integrates with Pandas, DuckDB, and Ray.
- Rust - iceberg-rust, latest v0.8.0. Supports V3 metadata format, native partitioned writes, and exposes Python bindings via pyiceberg-core.
- Go - iceberg-go, actively maintained with regular releases.
Delta Lake has two main implementations:
- JVM/Scala - the primary Delta Lake project (delta-io/delta), v4.0. This is the canonical implementation, tightly integrated with Spark.
- Rust/Python - delta-rs with Python bindings via the deltalake PyPI package. This is a community-driven project (not Databricks-maintained) that enables Delta Lake usage outside the JVM. It is the only viable path for Python-native or Rust-native Delta workloads.
| Language | Apache Iceberg | Delta Lake |
|---|---|---|
| Java/Scala | Reference implementation (Apache) | Primary implementation (Databricks-steered) |
| Python | PyIceberg (Apache, v0.11.1) | deltalake via delta-rs (community, v1.4.x) |
| Rust | iceberg-rust (Apache, v0.8.0) | delta-rs (community) |
| Go | iceberg-go (Apache) | No production implementation |
The pattern is clear: Iceberg's multi-language story is broader and officially governed under Apache, while Delta Lake's non-JVM support depends on the community-maintained delta-rs project. For teams building in Python or Rust without Spark, this is a meaningful consideration.
When to Choose Iceberg vs When to Choose Delta Lake
The technology choice matters less than the ecosystem you are building around.
Choose Apache Iceberg when:
- You run a multi-engine architecture - Spark for batch, Flink for streaming, Trino for ad-hoc queries, ClickHouse for analytics
- Vendor neutrality is a hard requirement - your data stays in open formats governed by the Apache Foundation
- You have existing tables with established partitioning layouts that are complex to migrate to liquid clustering due to historical data and dependencies
- You need language diversity beyond the JVM - PyIceberg, iceberg-rust, and iceberg-go are all official Apache projects with active development
Choose Delta Lake when:
- Your stack is Spark and Databricks end-to-end, and you want the tightest possible integration with Databricks' optimized runtime, Photon engine, and Unity Catalog
- You need Unity Catalog's governance features (lineage, fine-grained access, cross-workspace sharing) today
- Your team already has significant Delta Lake investment and migration cost is not justified
- You primarily do streaming upserts on Spark, where Delta's merge performance is heavily optimized
- You need native Change Data Feed capabilities for downstream consumers to efficiently track row-level changes, or require multi-table transaction support for mission-critical workloads
The bigger picture: for teams running exclusively on Databricks, Delta Lake is the right choice - it is deeply optimized for that environment, and UniForm handles interoperability edge cases. For multi-engine architectures, Iceberg's engine-agnostic design and broader ecosystem support remain a strong advantage. That said, the two formats are converging - features like deletion vectors, variant types, and interoperability layers are narrowing the gap from both sides. It is increasingly likely that Delta Lake and Iceberg will continue converging or eventually unify, making the choice less permanent than it appears today.
Key Takeaways
- Apache Iceberg and Delta Lake both provide ACID transactions, time travel, and schema evolution on top of Parquet. The feature gap has narrowed to the point where features alone do not determine the choice.
- Iceberg's hierarchical metadata and hidden partitioning give it architectural advantages for large-scale tables and evolving partition strategies.
- Delta Lake's transaction log is simpler and deeply optimized for Spark - teams running Databricks get the best performance and tightest integration. Delta Lake also offers unique strengths like native Change Data Feed, multi-table transactions, and liquid clustering.
- Engine and language support are decisive factors: Iceberg has native, production-grade support across Spark, Flink, Trino, Snowflake, ClickHouse, DuckDB, and more, plus official Apache-governed libraries in Java, Python, Rust, and Go. Delta Lake is strongest on Spark, with non-JVM support relying on the community-maintained delta-rs project.
- For multi-engine, vendor-neutral architectures, Iceberg has the edge. For Databricks-centric stacks, Delta Lake remains the practical default.
- The two formats are converging in features and interoperability. Migration paths exist - UniForm for read interoperability, Apache XTable for metadata translation - but switching formats mid-stream has real operational cost. The good news is that this choice is becoming less permanent as the ecosystem matures.