A head-to-head comparison of Apache Iceberg and Delta Lake for engineers - covering architecture, features, engine support, vendor neutrality, and real decision criteria based on production experience with both formats.
Apache Iceberg and Delta Lake are the two dominant open table formats for data lakehouses. Both layer ACID transactions, schema evolution, and time travel on top of Parquet data files stored in object storage. The feature gap between them has narrowed considerably since 2023, so the real decision now comes down to ecosystem fit, engine diversity, and how much vendor coupling you can tolerate.
We deploy both formats in production for clients across different stacks. This post covers the architectural differences that actually matter, a feature-by-feature comparison, engine support realities, and a practical framework for choosing between them.
## What Is Apache Iceberg and What Is Delta Lake
Apache Iceberg is an open table format specification originally developed at Netflix and donated to the Apache Software Foundation. It defines how table metadata, partitioning, and file tracking work - independent of any compute engine. The specification is designed so that Spark, Flink, Trino, Snowflake, ClickHouse, and any other engine can read and write the same table concurrently. Iceberg 1.8.0 (February 2025) introduced deletion vectors and other v3 spec features.
Delta Lake was created by Databricks in 2019 and later contributed to the Linux Foundation. It was built with Apache Spark in mind, and its strongest performance and deepest feature integration remain within the Spark and Databricks ecosystem. Delta Lake 4.0, released in September 2025, added coordinated commits for multi-engine writes, a variant data type for semi-structured data, and catalog-managed tables.
Both formats store data as Parquet files and add a metadata layer on top. The difference is in how that metadata layer is structured and what assumptions it makes about the compute engine.
## Architecture: Metadata and Transaction Model
The architectural split between Iceberg and Delta Lake starts at the metadata layer, and this difference has real operational consequences.
### Iceberg's Hierarchical Metadata
Iceberg uses a multi-layered metadata tree: a catalog points to a metadata file, which references a manifest list, which references individual manifests, which track the actual Parquet data files. Each manifest carries column-level statistics (min/max values, null counts), enabling aggressive file pruning at query planning time - before any data is scanned. For tables with thousands of partitions or billions of files, this architecture avoids the full-scan planning bottleneck that plagues simpler approaches.
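The planning-time pruning described above can be sketched in a few lines. This is an illustrative model only - real Iceberg manifests are Avro files with per-column lower/upper bounds, null counts, and much more - but it shows why a query can discard whole data files before any Parquet is read:

```python
# Toy model of manifest-level pruning: each data file carries (min, max)
# stats per column; files whose range cannot overlap the predicate are
# skipped at planning time. Field names are simplified stand-ins.

def prune_files(manifests, column, pred_min, pred_max):
    """Return only data files whose [min, max] range for `column`
    can overlap the predicate range [pred_min, pred_max]."""
    survivors = []
    for manifest in manifests:
        for data_file in manifest["files"]:
            lo, hi = data_file["stats"][column]
            if hi >= pred_min and lo <= pred_max:
                survivors.append(data_file["path"])
    return survivors

manifests = [
    {"files": [
        {"path": "s3://bucket/t/a.parquet", "stats": {"ts": (1, 100)}},
        {"path": "s3://bucket/t/b.parquet", "stats": {"ts": (101, 200)}},
    ]},
    {"files": [
        {"path": "s3://bucket/t/c.parquet", "stats": {"ts": (201, 300)}},
    ]},
]

# Query: WHERE ts BETWEEN 150 AND 250 -- only b and c can match.
print(prune_files(manifests, "ts", 150, 250))
```

Because the statistics live in metadata, planning cost scales with the number of manifests touched, not the number of data files in the table.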
Iceberg's catalog layer is defined by an open REST API specification, meaning any engine in any language can discover and interact with tables without importing a specific runtime. Apache Polaris, which graduated to a top-level Apache project in February 2026, implements this spec and provides a vendor-neutral catalog with fine-grained access control. Other options include AWS Glue (39.3% adoption in the 2025 Iceberg ecosystem survey), Nessie, and Lakekeeper.
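As a concrete example of that language independence, PyIceberg can target any REST-spec catalog with nothing but configuration. A sketch of a `.pyiceberg.yaml` - the endpoint and warehouse path are placeholders, not real services:

```yaml
# ~/.pyiceberg.yaml -- endpoint and warehouse are placeholders
catalog:
  default:
    type: rest
    uri: http://localhost:8181
    warehouse: s3://example-bucket/warehouse
```

The same REST endpoint can then serve Spark, Trino, or a plain Python script without any of them sharing a runtime.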
### Delta Lake's Transaction Log
Delta Lake uses a flat transaction log - the _delta_log directory - consisting of JSON files for each commit and periodic Parquet checkpoint files. Every read must reconstruct the current table state by replaying the log from the last checkpoint. This is straightforward and works well in Spark, where the Delta library handles log replay natively. Outside Spark, engines must implement their own log replay logic, which has historically lagged behind in feature coverage.
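A minimal model of that log replay, assuming simplified `add`/`remove` actions (real `_delta_log` commits also carry `metaData`, `protocol`, and transaction actions, and checkpoints let readers skip older commits):

```python
import json

# Each commit file is newline-delimited JSON actions: "add" registers
# a data file, "remove" tombstones one. Current table state is the
# result of replaying every commit in order.

commits = [
    '{"add": {"path": "part-000.parquet"}}',
    '{"add": {"path": "part-001.parquet"}}',
    '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-002.parquet"}}',
]

def replay(commits):
    """Reconstruct the live file set by applying commits in order."""
    live = set()
    for commit in commits:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)

print(replay(commits))  # live files after three commits
```

Every engine that reads Delta natively has to implement some version of this replay, which is why non-Spark support has historically trailed.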
Delta Lake's catalog story centers on Unity Catalog, Databricks' governance layer. Unity Catalog provides lineage, access control, and cross-workspace sharing, but it is tightly coupled to the Databricks platform. The open-source Unity Catalog release covers basic functionality; full governance features require Databricks.
| Aspect | Apache Iceberg | Delta Lake |
|---|---|---|
| Metadata structure | Hierarchical (catalog -> metadata -> manifest list -> manifests -> data files) | Flat transaction log (JSON commits + Parquet checkpoints) |
| Query planning | File-level pruning via manifest statistics | Log replay to reconstruct file list |
| Catalog API | Open REST spec (Polaris, Glue, Nessie) | Unity Catalog (Databricks-centric) |
| Governance | Apache Foundation | Linux Foundation (Databricks-steered) |
| Spec versioning | v1, v2, v3 (modular feature flags) | Protocol versions (reader/writer version pairs) |
## Feature Comparison
The feature sets have converged significantly. Both formats now support the core lakehouse capabilities, but the implementation details differ in ways that affect production operations.
| Feature | Apache Iceberg | Delta Lake |
|---|---|---|
| Schema evolution | Full (add, drop, rename, reorder, type promotion) | Full (add, drop, rename, type widening in 4.0) |
| Partition evolution | Hidden partitioning with in-place evolution - metadata-only operation, no data rewrite | Partition columns are explicit; changing requires rewriting data or using generated columns |
| Time travel | By snapshot ID or timestamp | By version number or timestamp |
| Row-level deletes | Copy-on-write and merge-on-read; deletion vectors in v3 spec (1.8.0+) | Copy-on-write and merge-on-read; deletion vectors since 2.3 |
| Branching/tagging | Native branch and tag support on snapshots | Not natively supported |
| MERGE INTO | Supported across engines | Best-in-class on Spark; limited elsewhere |
| Variant type | Supported in v3 spec (1.9.0+) | Supported in Delta Lake 4.0 |
| Multi-engine writes | Native via REST catalog and spec-level guarantees | Coordinated commits (preview in 4.0) |
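Time travel in both formats boils down to resolving a timestamp (or an explicit version/snapshot id) to one snapshot; in Spark SQL this surfaces as `TIMESTAMP AS OF` / `VERSION AS OF` syntax. A sketch of timestamp resolution, with made-up snapshot ids:

```python
from datetime import datetime, timezone

# Both formats resolve a timestamp to the last snapshot committed
# at or before it. Snapshot ids here are illustrative only.

snapshots = [  # (snapshot_id, commit time), oldest first
    (1001, datetime(2025, 1, 1, tzinfo=timezone.utc)),
    (1002, datetime(2025, 2, 1, tzinfo=timezone.utc)),
    (1003, datetime(2025, 3, 1, tzinfo=timezone.utc)),
]

def snapshot_as_of(snapshots, ts):
    """Return the snapshot id current as of `ts`, or None if the
    table did not exist yet at that time."""
    current = None
    for snap_id, committed_at in snapshots:
        if committed_at <= ts:
            current = snap_id
    return current

print(snapshot_as_of(snapshots, datetime(2025, 2, 15, tzinfo=timezone.utc)))
```

One practical consequence: snapshot retention and vacuum/expiration settings bound how far back either format can actually travel.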
### Hidden Partitioning Deserves Special Attention
Iceberg's hidden partitioning is one of the most impactful differences in day-to-day use. With Iceberg, you define partition transforms (year, month, day, hour, bucket, truncate) at the table level, and queries are automatically pruned without users needing to know the partition layout. When business requirements change - say you need to re-partition from monthly to daily granularity - Iceberg handles this as a metadata-only operation. Old data stays in the old partition layout. New data follows the new scheme. No rewrite needed.
Delta Lake uses traditional Hive-style partitioning where partition columns appear explicitly in the directory structure and in queries. You can approximate Iceberg-like behavior using generated columns, but partition evolution still requires either a data rewrite or careful handling of mixed layouts. In practice, teams that outgrow their initial partition scheme on Delta Lake face a more painful migration than those on Iceberg.
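The mechanics can be sketched as a toy model: the table owns a transform from a source column to a partition value, so evolving the spec changes only which transform new writes use. This is illustrative, not Iceberg's implementation - real transforms live in the partition spec metadata, and transforms like `bucket` use a murmur3 hash:

```python
from datetime import date

def month_transform(d: date) -> str:
    return f"{d.year:04d}-{d.month:02d}"

def day_transform(d: date) -> str:
    return d.isoformat()

class Table:
    """Toy table: partition values are derived by the table's current
    transform, never supplied by the writer."""
    def __init__(self, transform):
        self.transform = transform  # current partition spec
        self.files = {}             # partition value -> rows

    def write(self, rows):
        # Only NEW files use the current spec; swapping the transform
        # is a metadata change and never rewrites existing files.
        for d, payload in rows:
            self.files.setdefault(self.transform(d), []).append(payload)

t = Table(month_transform)
t.write([(date(2025, 1, 5), "a"), (date(2025, 1, 20), "b")])
t.transform = day_transform  # partition evolution: metadata only
t.write([(date(2025, 2, 3), "c")])
print(sorted(t.files))  # old monthly layout and new daily layout coexist
```

The key property is the last line: after evolution, monthly and daily partition layouts coexist in one table, and the query planner handles both.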
## Engine and Ecosystem Support
This is where the practical gap is widest. Iceberg was designed as an engine-agnostic specification; Delta Lake was designed for Spark.
| Engine | Iceberg Support | Delta Lake Support |
|---|---|---|
| Apache Spark | Native read/write | Native read/write (best-in-class) |
| Apache Flink | Native read/write | Read/write via connector (maturing) |
| Trino/Presto | Full-featured connector | Connector available (fewer features) |
| Snowflake | Native managed Iceberg tables (GA since 2024) | Read via UniForm |
| ClickHouse | Production-grade read support | Limited read support |
| AWS Athena | Native support | Read support |
| DuckDB | Native read/write | Read support via extension |
| Databricks | Full support | Native (first-class citizen) |
| Dremio | Native read/write | Read via UniForm |
The 2025 State of the Iceberg Ecosystem survey found that while 96.4% of respondents use Spark with Iceberg, multi-engine adoption is strong: 60.7% use Trino, 32.1% use Flink, and 28.6% use DuckDB. That multi-engine story is Iceberg's defining advantage.
### UniForm: Delta's Interoperability Bridge
Databricks addressed the engine gap with UniForm, which automatically generates Iceberg metadata alongside Delta commits. This lets Iceberg-compatible engines read Delta tables without data conversion. It is read-only - you still write through Delta/Spark. And there are constraints: tables must have column mapping enabled, deletion vectors cannot be used alongside UniForm, and Delta versions do not map cleanly to Iceberg snapshot IDs. UniForm is a pragmatic bridge, not a replacement for native Iceberg support.
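On Databricks, enabling UniForm is a matter of table properties. A sketch, assuming current property names from the Databricks documentation - check your Delta runtime version, as these have shifted across releases, and the table name here is hypothetical:

```sql
-- Hypothetical table; property names per recent Databricks docs.
CREATE TABLE main.analytics.events (
  event_id STRING,
  ts TIMESTAMP
)
TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',                 -- required by UniForm
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```

Once enabled, each Delta commit asynchronously produces Iceberg metadata that external engines can read through the catalog.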
For teams that need bidirectional conversion, Apache XTable (incubating) translates metadata between Iceberg, Delta Lake, and Hudi. It is still early-stage, with rough edges around incremental sync and engine compatibility, but it works for one-time migrations.
## When to Choose Iceberg vs When to Choose Delta Lake
The technology choice matters less than the ecosystem you are building around.
**Choose Apache Iceberg when:**
- You run a multi-engine architecture - Spark for batch, Flink for streaming, Trino for ad-hoc queries, ClickHouse for analytics
- Vendor neutrality is a hard requirement - your data stays in open formats governed by the Apache Foundation
- You use Snowflake, which has bet heavily on Iceberg as its open format
- You expect partition layouts to evolve as data volumes and query patterns change
- You are on AWS or GCP and want broad compatibility across managed services (Athena, Glue, BigQuery)
**Choose Delta Lake when:**
- Your stack is Spark and Databricks end-to-end, and you want the tightest possible integration
- You need Unity Catalog's governance features (lineage, fine-grained access, cross-workspace sharing) today
- Your team already has significant Delta Lake investment and migration cost is not justified
- You primarily do streaming upserts on Spark, where Delta's merge performance is heavily optimized
- You can tolerate UniForm for cross-engine reads and do not need multi-engine writes
The uncomfortable truth: for teams running exclusively on Databricks, Delta Lake is the right choice - it is deeply optimized for that environment, and UniForm handles the interoperability edge cases. For everyone else, Iceberg's engine-agnostic design and broader ecosystem support make it the safer long-term bet. The industry is converging on Iceberg as the standard open format, with even Databricks adding native Iceberg support alongside Delta.
## Key Takeaways
- Apache Iceberg and Delta Lake both provide ACID transactions, time travel, and schema evolution on top of Parquet. The feature gap has narrowed to the point where features alone do not determine the choice.
- Iceberg's hierarchical metadata and hidden partitioning give it architectural advantages for large-scale tables and evolving partition strategies.
- Delta Lake's transaction log is simpler and deeply optimized for Spark - teams running Databricks get the best performance and tightest integration.
- Engine support is the decisive factor: Iceberg has native, production-grade support across Spark, Flink, Trino, Snowflake, ClickHouse, DuckDB, and more. Delta Lake is strongest on Spark and uses UniForm as a read-only bridge to other engines.
- For multi-engine, vendor-neutral architectures, Iceberg is the clear choice. For Databricks-centric stacks, Delta Lake remains the practical default.
- Migration paths exist - UniForm for read interoperability, Apache XTable for metadata translation - but switching formats mid-stream has real operational cost. Pick deliberately.