A head-to-head comparison of Apache Iceberg and Delta Lake for engineers - covering architecture, features, engine support, vendor neutrality, and real decision criteria based on production experience with both formats.

Apache Iceberg and Delta Lake are the two dominant open table formats for data lakehouses. Both layer ACID transactions, schema evolution, and time travel on top of Parquet data files stored in object storage. The feature gap between them has narrowed considerably since 2023, so the real decision now comes down to ecosystem fit, engine diversity, and how much vendor coupling you can tolerate.

We deploy both formats in production for clients across different stacks. This post covers the architectural differences that actually matter, a feature-by-feature comparison, engine support realities, and a practical framework for choosing between them.

What Is Apache Iceberg and What Is Delta Lake

Apache Iceberg is an open table format specification originally developed at Netflix and donated to the Apache Software Foundation. It defines how table metadata, partitioning, and file tracking work - independent of any compute engine. The specification is designed so that Spark, Flink, Trino, Snowflake, ClickHouse, and any other engine can read and write the same table concurrently. Iceberg 1.8.0 (February 2025) introduced deletion vectors and the first v3 spec features; the Java reference implementation has since advanced to 1.10.x.

Delta Lake was created by Databricks in 2019 and later contributed to the Linux Foundation. It was built with Apache Spark in mind, and its strongest performance and deepest feature integration remain within the Spark and Databricks ecosystem. Delta Lake 4.0, released in September 2025, added coordinated commits for multi-engine writes, a variant data type for semi-structured data, and catalog-managed tables.

Both formats store data as Parquet files and add a metadata layer on top. The difference is in how that metadata layer is structured and what assumptions it makes about the compute engine.

Architecture: Metadata and Transaction Model

The architectural split between Iceberg and Delta Lake starts at the metadata layer, and this difference has real operational consequences.

Iceberg's Hierarchical Metadata

Iceberg uses a multi-layered metadata tree: a catalog points to a metadata file, which references a manifest list, which references individual manifests, which track the actual Parquet data files. Each manifest carries column-level statistics (min/max values, null counts), enabling aggressive file pruning at query planning time - before any data is scanned. For tables with thousands of partitions or billions of files, this architecture avoids the full-scan planning bottleneck that plagues simpler approaches.
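The pruning idea is simple to sketch. This is a simplified, illustrative model in plain Python - not the actual Iceberg library - with made-up file names and a single statistic per column:

```python
# Simplified sketch of Iceberg-style manifest pruning (illustrative only).
# Each manifest entry carries min/max stats per column; the planner drops
# whole files whose stats cannot possibly match the predicate.

from dataclasses import dataclass

@dataclass
class DataFile:
    path: str
    lower: dict  # column -> min value recorded in the manifest
    upper: dict  # column -> max value recorded in the manifest

def prune(files, column, value):
    """Keep only files whose [min, max] range for `column` can contain `value`."""
    return [f for f in files if f.lower[column] <= value <= f.upper[column]]

files = [
    DataFile("a.parquet", {"ts": 100}, {"ts": 199}),
    DataFile("b.parquet", {"ts": 200}, {"ts": 299}),
    DataFile("c.parquet", {"ts": 300}, {"ts": 399}),
]

# A point lookup for ts = 250 plans exactly one file; no data is scanned.
survivors = prune(files, "ts", 250)
print([f.path for f in survivors])  # ['b.parquet']
```

Because the statistics live in the metadata tree, this elimination happens during planning, which is what keeps planning cost bounded even on very large tables.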

Iceberg's catalog layer is defined by an open REST API specification, meaning any engine in any language can discover and interact with tables without importing a specific runtime. Apache Polaris, which graduated to a top-level Apache project in February 2026, implements this spec and provides a vendor-neutral catalog with fine-grained access control. Other options include AWS Glue (39.3% adoption in the 2025 Iceberg ecosystem survey), Nessie, and Lakekeeper.

Delta Lake's Transaction Log

Delta Lake uses a flat transaction log - the _delta_log directory - consisting of JSON files for each commit and periodic Parquet checkpoint files. Every read must reconstruct the current table state by replaying the log from the last checkpoint. This is straightforward and works well in Spark, where the Delta library handles log replay natively. Outside Spark, engines must implement their own log replay logic, which has historically lagged behind in feature coverage.
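The replay mechanism can be sketched in a few lines of plain Python. This is an illustrative model, not delta-rs or the Delta library, and it ignores checkpoints, but the shape matches the real log: each commit file is newline-delimited JSON of add/remove actions, and the reader folds them into the live file set:

```python
# Simplified sketch of Delta-style log replay (illustrative only).
# Real commit files live under _delta_log/ as 00000000000000000000.json, etc.

import json

commits = [  # ordered commit bodies, one JSON action per line
    '{"add": {"path": "part-0.parquet"}}\n{"add": {"path": "part-1.parquet"}}',
    '{"remove": {"path": "part-0.parquet"}}\n{"add": {"path": "part-2.parquet"}}',
]

def replay(commit_bodies):
    """Reconstruct the set of live data files by folding add/remove actions."""
    live = set()
    for body in commit_bodies:
        for line in body.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live

print(sorted(replay(commits)))  # ['part-1.parquet', 'part-2.parquet']
```

Every engine that reads Delta tables has to implement this fold (plus checkpoint handling, protocol checks, and deletion vectors), which is why non-Spark support has historically trailed.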

Delta Lake's catalog story centers on Unity Catalog, Databricks' governance layer. Unity Catalog provides lineage, access control, and cross-workspace sharing, but it is tightly coupled to the Databricks platform. The open-source Unity Catalog release covers basic functionality; full governance features require Databricks. While Delta Lake is formally hosted under the Linux Foundation (specifically the LF AI & Data Foundation since 2024), Databricks remains the dominant contributor and effectively steers the project's direction - the Technical Steering Committee has historically been composed of Databricks affiliates, and the project's roadmap closely tracks Databricks' commercial priorities.

| Aspect | Apache Iceberg | Delta Lake |
|---|---|---|
| Metadata structure | Hierarchical (catalog -> metadata file -> manifest list -> manifests -> data files) | Flat transaction log (JSON commits + Parquet checkpoints) |
| Query planning | File-level pruning via manifest statistics | Log replay to reconstruct file list |
| Catalog API | Open REST spec (Polaris, Glue, Nessie) | Unity Catalog (Databricks-centric) |
| Governance | Apache Foundation | Linux Foundation (Databricks-steered) |
| Spec versioning | v1, v2, v3 (modular feature flags) | Protocol versions (reader/writer version pairs) |

Feature Comparison

The feature sets have converged significantly. Both formats now support the core lakehouse capabilities, but the implementation details differ in ways that affect production operations.

| Feature | Apache Iceberg | Delta Lake |
|---|---|---|
| Schema evolution | Full (add, drop, rename, reorder, type promotion) | Full (add, drop, rename, type widening in 4.0) |
| Partition evolution | Hidden partitioning with in-place evolution - metadata-only, no data rewrite | Partition columns are explicit; changing requires rewriting data or using generated columns |
| Time travel | By snapshot ID or timestamp | By version number or timestamp |
| Row-level deletes | Copy-on-write and merge-on-read; deletion vectors in v3 spec (1.8.0+) | Copy-on-write and merge-on-read; deletion vectors since 2.3 |
| Branching/tagging | Native branch and tag support on snapshots | Not natively supported |
| MERGE INTO | Supported across engines | Best-in-class on Spark; limited elsewhere |
| Variant type | Supported in v3 spec (1.9.0+) | Supported in Delta Lake 4.0 |
| Multi-engine writes | Native via REST catalog and spec-level guarantees | Coordinated commits (preview in 4.0) |
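The two time-travel models are conceptually identical: both resolve "as of timestamp T" by picking the latest snapshot (Iceberg) or version (Delta) committed at or before T. A minimal sketch of that resolution, in plain Python with made-up snapshot IDs:

```python
# Illustrative sketch of timestamp-based time travel resolution.
# Snapshot IDs and timestamps here are invented for the example.

snapshots = [  # (snapshot_id, commit_timestamp_ms), oldest first
    (101, 1_000),
    (102, 2_000),
    (103, 3_000),
]

def snapshot_as_of(snaps, ts_ms):
    """Return the ID of the latest snapshot committed at or before ts_ms."""
    eligible = [s for s in snaps if s[1] <= ts_ms]
    if not eligible:
        raise ValueError("no snapshot at or before the requested timestamp")
    return max(eligible, key=lambda s: s[1])[0]

print(snapshot_as_of(snapshots, 2_500))  # 102
```

The practical difference is only in the handle you pass: Iceberg exposes snapshot IDs, Delta exposes monotonically increasing version numbers.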

Hidden Partitioning Deserves Special Attention

Iceberg's hidden partitioning is one of the most impactful differences in day-to-day use. With Iceberg, you define partition transforms (year, month, day, hour, bucket, truncate) at the table level, and queries are automatically pruned without users needing to know the partition layout. When business requirements change - say you need to re-partition from monthly to daily granularity - Iceberg handles this as a metadata-only operation. Old data stays in the old partition layout. New data follows the new scheme. No rewrite needed.

Delta Lake uses traditional Hive-style partitioning where partition columns appear explicitly in the directory structure and in queries. You can approximate Iceberg-like behavior using generated columns, but partition evolution still requires either a data rewrite or careful handling of mixed layouts. In practice, teams that outgrow their initial partition scheme on Delta Lake face a more painful migration than those on Iceberg.
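To make the distinction concrete, a partition transform can be modeled as a plain function over the source column. The sketch below is simplified and not the Iceberg library; the transform names only mirror Iceberg's month/day transforms:

```python
# Illustrative sketch of Iceberg-style partition transforms.
# A transform maps a source column value to a partition value; queries never
# reference the partition column directly, so the layout can change freely.

import datetime as dt

def month_transform(ts: dt.datetime) -> str:
    return ts.strftime("%Y-%m")

def day_transform(ts: dt.datetime) -> str:
    return ts.strftime("%Y-%m-%d")

# Partition evolution is metadata-only: files written under the old spec keep
# their monthly layout, new files use the daily one, and the planner prunes
# each file with whatever spec produced it.
old_partition = month_transform(dt.datetime(2025, 3, 14))  # '2025-03'
new_partition = day_transform(dt.datetime(2025, 6, 2))     # '2025-06-02'
print(old_partition, new_partition)
```

In Hive-style partitioning, by contrast, the transform output is baked into the directory path and the query text, which is why changing it forces a rewrite.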

Engine and Ecosystem Support

This is where the practical gap is widest. Iceberg was designed as an engine-agnostic specification; Delta Lake was designed for Spark.

| Engine | Iceberg Support | Delta Lake Support |
|---|---|---|
| Apache Spark | Native read/write | Native read/write (best-in-class) |
| Apache Flink | Native read/write | Read/write via connector (maturing) |
| Trino/Presto | Full-featured connector | Connector available (fewer features) |
| Snowflake | Native managed Iceberg tables (GA since 2024) | Read via UniForm |
| ClickHouse | Production-grade read support | Limited read support |
| AWS Athena | Native support | Read support |
| DuckDB | Native read/write | Read support via extension |
| Databricks | Full support | Native (first-class citizen) |
| Dremio | Native read/write | Read via UniForm |

The 2025 State of the Iceberg Ecosystem survey found that while 96.4% of respondents use Spark with Iceberg, multi-engine adoption is strong: 60.7% use Trino, 32.1% use Flink, and 28.6% use DuckDB. That multi-engine story is Iceberg's defining advantage.

UniForm: Delta's Interoperability Bridge

Databricks addressed the engine gap with UniForm, which automatically generates Iceberg metadata alongside Delta commits. This lets Iceberg-compatible engines read Delta tables without data conversion. It is read-only - you still write through Delta/Spark. And there are constraints: tables must have column mapping enabled, deletion vectors cannot be used alongside UniForm, and Delta versions do not map cleanly to Iceberg snapshot IDs. UniForm is a pragmatic bridge, not a replacement for native Iceberg support.

For teams that need bidirectional conversion, Apache XTable (incubating) translates metadata between Iceberg, Delta Lake, and Hudi. It is still early-stage, with rough edges around incremental sync and engine compatibility, but it works for one-time migrations.

Language and SDK Support

Engine support gets the most attention, but programming language coverage matters just as much - especially for teams building data pipelines, CLI tools, or embedded analytics outside the JVM.

Apache Iceberg has official implementations across four languages, all governed as Apache sub-projects:

  • Java - the reference implementation (apache/iceberg), latest v1.10.1. This is the most mature and feature-complete library, used by Spark, Flink, and Trino.
  • Python - PyIceberg (apache/iceberg-python), latest v0.11.1. Supports reads with filter pushdown and append/overwrite writes, and integrates with Pandas, DuckDB, and Ray. Does not yet support MERGE operations.
  • Rust - iceberg-rust, latest v0.8.0. Supports V3 metadata format, native partitioned writes, and exposes Python bindings via pyiceberg-core.
  • Go - iceberg-go, actively maintained with regular releases.

Delta Lake has two main implementations:

  • JVM/Scala - the primary Delta Lake project (delta-io/delta), v4.0. This is the canonical implementation, tightly integrated with Spark.
  • Rust/Python - delta-rs with Python bindings via the deltalake PyPI package. This is a community-driven project (not Databricks-maintained) that enables Delta Lake usage outside the JVM. It is the only viable path for Python-native or Rust-native Delta workloads.

| Language | Apache Iceberg | Delta Lake |
|---|---|---|
| Java/Scala | Reference implementation (Apache) | Primary implementation (Databricks-steered) |
| Python | PyIceberg (Apache, v0.11.1) | deltalake via delta-rs (community, v1.4.x) |
| Rust | iceberg-rust (Apache, v0.8.0) | delta-rs (community) |
| Go | iceberg-go (Apache) | No production implementation |

The pattern is clear: Iceberg's multi-language story is broader and officially governed under Apache, while Delta Lake's non-JVM support depends on the community-maintained delta-rs project. For teams building in Python or Rust without Spark, this is a meaningful consideration.

When to Choose Iceberg vs When to Choose Delta Lake

The technology choice matters less than the ecosystem you are building around.

Choose Apache Iceberg when:

  • You run a multi-engine architecture - Spark for batch, Flink for streaming, Trino for ad-hoc queries, ClickHouse for analytics
  • Vendor neutrality is a hard requirement - your data stays in open formats governed by the Apache Foundation
  • You use Snowflake, which has bet heavily on Iceberg as its open format
  • You expect partition layouts to evolve as data volumes and query patterns change
  • You are on AWS or GCP and want broad compatibility across managed services (Athena, Glue, BigQuery)
  • You need language diversity beyond the JVM - PyIceberg, iceberg-rust, and iceberg-go are all official Apache projects with active development

Choose Delta Lake when:

  • Your stack is Spark and Databricks end-to-end, and you want the tightest possible integration with Databricks' optimized runtime, Photon engine, and Unity Catalog
  • You need Unity Catalog's governance features (lineage, fine-grained access, cross-workspace sharing) today
  • Your team already has significant Delta Lake investment and migration cost is not justified
  • You primarily do streaming upserts on Spark, where Delta's merge performance is heavily optimized
  • You can tolerate UniForm for cross-engine reads and do not need multi-engine writes

The uncomfortable truth: for teams running exclusively on Databricks, Delta Lake is the right choice - it is deeply optimized for that environment, and UniForm handles the interoperability edge cases. For everyone else, Iceberg's engine-agnostic design and broader ecosystem support make it the safer long-term bet. The industry is converging on Iceberg as the standard open format, with even Databricks adding native Iceberg support alongside Delta.

Key Takeaways

  • Apache Iceberg and Delta Lake both provide ACID transactions, time travel, and schema evolution on top of Parquet. The feature gap has narrowed to the point where features alone do not determine the choice.
  • Iceberg's hierarchical metadata and hidden partitioning give it architectural advantages for large-scale tables and evolving partition strategies.
  • Delta Lake's transaction log is simpler and deeply optimized for Spark - teams running Databricks get the best performance and tightest integration.
  • Engine and language support are decisive factors: Iceberg has native, production-grade support across Spark, Flink, Trino, Snowflake, ClickHouse, DuckDB, and more, plus official Apache-governed libraries in Java, Python, Rust, and Go. Delta Lake is strongest on Spark, with non-JVM support relying on the community-maintained delta-rs project.
  • For multi-engine, vendor-neutral architectures, Iceberg is the clear choice. For Databricks-centric stacks, Delta Lake remains the practical default.
  • Migration paths exist - UniForm for read interoperability, Apache XTable for metadata translation - but switching formats mid-stream has real operational cost. Pick deliberately.