Data Migration Tools Compared: AWS DMS, Debezium, Fivetran, and pgloader

A workload-driven comparison of AWS DMS, Debezium, Fivetran, and pgloader. Choose a data migration tool by the job it does: one-time bulk loads, ongoing CDC replication, or managed ELT.

Most "best data migration tools" lists are written by vendors ranking their own product first. They compare tools on feature checkboxes, not on the job you actually need done. That framing leads teams to reach for a managed CDC platform when a single command-line invocation would have finished the job in an afternoon, or to script pg_dump loops when they really need continuous replication with sub-second lag.

The useful question is not "which tool is best." It is "which tool fits this workload." A one-time bulk copy from an old MySQL box into PostgreSQL has almost nothing in common with keeping a warehouse in sync with a production database that takes thousands of writes per second. This post compares four tools that each own a different corner of that space: AWS DMS, Debezium, Fivetran, and pgloader, with an honest note on where Airbyte fits. Pick by workload, and the choice usually makes itself.

The Four Migration Workloads That Actually Matter

Before comparing tools, name the workload. In our migration engagements, nearly every project collapses into one of four shapes, and the tool follows from the shape.

One-time bulk migration. You are moving a database once, then decommissioning the source. Cutover is a scheduled event, not a continuous process. What you care about: throughput, type fidelity, and how much of the schema (indexes, constraints, sequences) the tool recreates for you. Lag after cutover is irrelevant because there is no "after."

Ongoing CDC replication. The source stays live and you need every insert, update, and delete reflected downstream with low latency, indefinitely. This is the hardest workload. Change Data Capture (CDC) is a technique that reads a database's transaction log (the WAL in PostgreSQL, the binlog in MySQL, the redo log in Oracle) and emits each row-level change as an event, rather than repeatedly querying tables. Reading the log instead of polling is what keeps source-side overhead low and ordering correct, and it is one of the core pipeline shapes we lay out in ETL pipeline patterns in 2026.
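To make the log-versus-polling distinction concrete, here is a minimal sketch of a consumer applying row-level change events in log order. The event shape (`op`, `key`, `after`) is a simplified assumption, loosely modeled on the create/update/delete envelope that log-based CDC tools emit; real events carry more metadata such as log position and schema.

```python
from typing import Any, Dict, List

# Simplified, hypothetical change event. "op" is "c" (create), "u" (update),
# or "d" (delete); "after" is the row state after the change (None on delete).
ChangeEvent = Dict[str, Any]


def apply_changes(
    replica: Dict[int, Dict[str, Any]], events: List[ChangeEvent]
) -> Dict[int, Dict[str, Any]]:
    """Apply row-level change events to an in-memory replica, in log order."""
    for event in events:
        key = event["key"]
        if event["op"] in ("c", "u"):
            replica[key] = event["after"]
        elif event["op"] == "d":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown op: {event['op']!r}")
    return replica


events = [
    {"op": "c", "key": 1, "after": {"status": "new"}},
    {"op": "u", "key": 1, "after": {"status": "paid"}},
    {"op": "c", "key": 2, "after": {"status": "new"}},
    {"op": "d", "key": 2, "after": None},
]
replica = apply_changes({}, events)
print(replica)
```

A poller that snapshots the table before and after this sequence would never see row 2 at all and would miss the intermediate `new` state of row 1. The log preserves both, in order.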

Managed SaaS ELT. You want operational databases and SaaS apps (Salesforce, Stripe, Postgres) landing in Snowflake, BigQuery, or Redshift, and you do not want to run or babysit the pipeline. You are trading money for the elimination of connector maintenance and schema-drift handling. For the difference between transforming before versus after the load, see ETL vs ELT.

Single-command Postgres loads. A narrow but extremely common case: you are migrating into PostgreSQL specifically, often from MySQL, SQLite, or MS SQL Server, and you want type casting, encoding fixes, and schema creation handled in one shot without standing up infrastructure.

Match the workload to the tool and the rest of this article is just detail.

AWS DMS: The Default for Lift-and-Shift Into AWS

AWS Database Migration Service is a managed service for migrating and replicating databases, supporting both one-time full loads and ongoing CDC. It runs the migration on AWS-managed capacity, handles the source-to-target plumbing, and supports heterogeneous moves (for example MySQL to PostgreSQL) when paired with the AWS Schema Conversion Tool. Sources include Oracle, SQL Server, MySQL, and PostgreSQL; targets span Aurora, RDS, Redshift, S3, DynamoDB, and more, per the AWS DMS sources documentation.

DMS offers three replication modes: full load, CDC, and full load plus CDC. The combined mode is the workhorse for zero-downtime cutover: it bulk-copies existing rows, then tails the source transaction log to keep the target current until you flip traffic. AWS DMS Serverless removes the replication-instance sizing decision: it provisions and scales capacity automatically and bills per hour in DMS Capacity Units (DCUs), where one DCU equals 2GB of RAM. That pay-for-use model suits bursty migrations where a fixed instance would sit idle or get overwhelmed.
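The DCU arithmetic is easy to sketch. The example below assumes a hypothetical per-DCU-hour rate (the `ASSUMED_USD_PER_DCU_HOUR` constant is a placeholder, not an AWS price) and a made-up migration profile; only the 2 GB-per-DCU figure comes from the text above.

```python
GB_PER_DCU = 2  # from the text: one DCU equals 2 GB of RAM


def serverless_estimate(dcus: float, hours: float, usd_per_dcu_hour: float) -> dict:
    """Estimate memory footprint and cost for one phase of a DMS Serverless run."""
    dcu_hours = dcus * hours
    return {
        "ram_gb": dcus * GB_PER_DCU,
        "dcu_hours": dcu_hours,
        "cost_usd": round(dcu_hours * usd_per_dcu_hour, 2),
    }


# Hypothetical rate for illustration only; check current AWS pricing for your region.
ASSUMED_USD_PER_DCU_HOUR = 0.10

# A bursty migration: 16 DCUs for a 6-hour full load, then 2 DCUs for 72 hours of CDC.
full_load = serverless_estimate(16, 6, ASSUMED_USD_PER_DCU_HOUR)
cdc_tail = serverless_estimate(2, 72, ASSUMED_USD_PER_DCU_HOUR)
total_usd = round(full_load["cost_usd"] + cdc_tail["cost_usd"], 2)

# For contrast: holding peak capacity for the whole 78-hour window, in the same units.
peak_held = serverless_estimate(16, 78, ASSUMED_USD_PER_DCU_HOUR)

print(full_load, cdc_tail, total_usd, peak_held["cost_usd"])
```

Under these assumed numbers the scaled run consumes 240 DCU-hours against 1,248 for capacity pinned at the peak, which is the shape of the saving the pay-for-use model is meant to capture. Provisioned instances are priced differently, so treat the contrast as directional only.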

The honest caveats matter. DMS Serverless does not support setting custom CDC start points, per the Serverless limitations docs, so recovery scenarios that replay from a specific LSN still call for a provisioned instance. CDC also has hard source-side requirements: tables generally need primary keys for updates and deletes to replicate correctly, and the source must be configured for log retention and log-based capture (for example, row-based binary logging on MySQL or wal_level = logical on PostgreSQL). DMS is strongest as a lift-and-shift engine into AWS targets. It is weaker as a long-term streaming backbone where you want changes flowing into Kafka and consumed by many systems. For the heterogeneous MySQL-to-Postgres case specifically, weigh DMS against pgloader and Debezium in our MySQL to PostgreSQL migration guide.
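The primary-key requirement is worth checking before a CDC task is ever created. The sketch below is a preflight check over a hypothetical schema snapshot (a plain mapping of table name to primary-key columns); how you populate that mapping, for instance from `information_schema.table_constraints`, is left as an assumption.

```python
from typing import Dict, List


def tables_missing_primary_key(schema: Dict[str, List[str]]) -> List[str]:
    """Return names of tables with no primary-key columns, sorted for stable output."""
    return sorted(name for name, pk_columns in schema.items() if not pk_columns)


# Hypothetical schema snapshot: table name -> primary-key column names.
schema = {
    "orders": ["id"],
    "order_items": ["order_id", "line_no"],
    "audit_log": [],
    "staging_import": [],
}

at_risk = tables_missing_primary_key(schema)
print(at_risk)
```

Anything this reports is a candidate for adding a key, excluding from the CDC task, or handling with full-load-only replication.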

Debezium: CDC as a Streaming Primitive

Debezium is an open-source CDC platform, licensed under Apache License 2.0, that reads database transaction logs and emits row-level change events, typically onto Apache Kafka via Kafka Connect. It supports MySQL, PostgreSQL, MongoDB, SQL Server, Oracle, MariaDB, Vitess, and others, as listed in the Debezium connectors documentation. Active development continues into 2026, with the 3.4 release line shipping through early 2026.
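For a sense of what "via Kafka Connect" means in practice, here is a sketch that builds the JSON payload typically posted to a Kafka Connect worker's `/connectors` endpoint to register a Debezium PostgreSQL connector. The property names follow the Debezium 2.x and later naming as we understand it; every value (hosts, credentials, table names, the connector name) is hypothetical, and you should confirm the properties against the connector documentation for your version.

```python
import json
from typing import Dict, List


def postgres_connector_payload(
    name: str,
    host: str,
    dbname: str,
    user: str,
    password: str,
    topic_prefix: str,
    tables: List[str],
) -> Dict[str, object]:
    """Build a Kafka Connect registration payload for a Debezium PostgreSQL connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",  # PostgreSQL's built-in logical decoding plugin
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            "topic.prefix": topic_prefix,  # prefix for the Kafka topics the connector writes
            "table.include.list": ",".join(tables),
        },
    }


# All values below are placeholders for illustration.
payload = postgres_connector_payload(
    name="shop-cdc",
    host="pg.internal",
    dbname="shop",
    user="debezium",
    password="change-me",
    topic_prefix="shop",
    tables=["public.orders", "public.customers"],
)
print(json.dumps(payload, indent=2))
```

The sketch stops at building the payload on purpose: submitting it, securing the password, and choosing snapshot behavior are deployment decisions outside this comparison.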

The reason to choose Debezium over DMS for replication is architectural, not feature-by-feature. Debezium does not move data from point A to point B; it turns your database's change log into a durable, replayable event stream that any number of consumers can read independently. One change event can fan out to a warehouse, a search index, a cache invalidator, and an audit log at once, each consumer tracking its own offset. That fan-out is awkward to model in a point-to-point migration service. Because the stream lives in Kafka, you also get replay: a downstream system can be rebuilt from the retained log without touching the source database again.
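The fan-out and replay properties can be modeled in a few lines. This is an in-memory stand-in for a retained topic, not Kafka itself: the `ChangeLog` class and the consumer names are hypothetical, and it ignores partitions, retention limits, and persistence.

```python
from typing import Any, Dict, List


class ChangeLog:
    """Append-only in-memory log with one read offset per consumer."""

    def __init__(self) -> None:
        self._events: List[Any] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: Any) -> None:
        self._events.append(event)

    def poll(self, consumer: str) -> List[Any]:
        """Return events this consumer has not read yet and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:]
        self._offsets[consumer] = len(self._events)
        return batch

    def rewind(self, consumer: str) -> None:
        """Reset a consumer to the start of the log so it can rebuild its state."""
        self._offsets[consumer] = 0


log = ChangeLog()
log.append("order 1 created")
log.append("order 1 paid")

warehouse_first = log.poll("warehouse")    # reads both events
log.append("order 2 created")
search_first = log.poll("search-index")    # a later consumer still sees all three
warehouse_next = log.poll("warehouse")     # only the one new event

log.rewind("search-index")
search_rebuilt = log.poll("search-index")  # full replay, no source database involved
print(len(warehouse_first), len(search_first), len(warehouse_next), len(search_rebuilt))
```

Each consumer advances independently, and the rebuild reads only the log. A point-to-point copy has no equivalent of `rewind` short of re-reading the source.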

The tradeoff is operational weight. Debezium means running Kafka and Kafka Connect (or a managed equivalent), monitoring connector offsets, handling schema changes through a schema registry, and reasoning about exactly-once versus at-least-once delivery in your consumers. For a one-time copy this is wildly over-engineered. For a durable, multi-consumer replication backbone it is the right primitive, and the Apache 2.0 license means no per-row metering on the data itself. Debezium is broad enough to deserve its own treatment; this comparison covers only its role relative to the alternatives. If your decision is leaning toward log-based streaming, our Debezium production CDC patterns guide is where that depth lives.
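The at-least-once point deserves a small illustration, because it is where consumer bugs tend to hide. The sketch below assumes a consumer that maintains a running total from a single ordered stream, with each delivery tagged by a monotonically increasing position (a stand-in for a Kafka offset or source LSN). The function names and the delivery tuple shape are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Delivery = Tuple[int, str, int]  # (log position, account, balance delta)


def apply_naive(deliveries: Iterable[Delivery]) -> Dict[str, int]:
    """Apply every delivery as it arrives; redelivered events are double counted."""
    balances: Dict[str, int] = {}
    for _position, account, delta in deliveries:
        balances[account] = balances.get(account, 0) + delta
    return balances


def apply_deduplicated(deliveries: Iterable[Delivery]) -> Dict[str, int]:
    """Skip any delivery at or below the last applied position."""
    balances: Dict[str, int] = {}
    last_applied = -1
    for position, account, delta in deliveries:
        if position <= last_applied:
            continue  # redelivery after a retry or restart; already applied
        balances[account] = balances.get(account, 0) + delta
        last_applied = position
    return balances


# Position 1 is delivered twice, as can happen when a consumer restarts
# before its offset commit lands.
deliveries = [(0, "acct-1", 100), (1, "acct-1", -20), (1, "acct-1", -20), (2, "acct-1", 5)]

naive = apply_naive(deliveries)
deduplicated = apply_deduplicated(deliveries)
print(naive, deduplicated)
```

In a real consumer the last applied position has to be stored atomically with the state it protects; otherwise the guard itself can be lost on restart.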

Fivetran and Airbyte: Paying to Not Run the Pipeline

Fivetran is a proprietary, fully managed ELT platform that extracts from hundreds of sources and loads into cloud warehouses and lakes, handling schema drift and connector maintenance for you. Its defining characteristic is the pricing model: Fivetran bills on Monthly Active Rows (MAR), the count of distinct rows added, updated, or deleted in a given month, as described in its usage-based pricing docs. The strategic implication is direct. Because a row counts once per month no matter how many times it changes, MAR pricing is friendly to large datasets where only a small fraction of rows changes each month, and punishing to tables where most rows are touched every month, such as anything rewritten wholesale by a nightly batch job. Before committing, profile how many distinct rows change per month in each table; a table that is fully refreshed on a schedule can dominate the bill while contributing little analytical value.
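Taking the MAR definition above literally (distinct rows touched per month), the profiling step can be sketched as a distinct count over a change history. The `Change` tuple shape and the sample tables are hypothetical, and Fivetran's actual metering has additional rules, so treat this as a model for reasoning rather than a bill calculator.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Change = Tuple[str, str, int]  # (month as "YYYY-MM", table name, primary key)


def monthly_active_rows(changes: Iterable[Change]) -> Dict[str, int]:
    """Count distinct (table, primary key) pairs changed in each month."""
    active: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for month, table, primary_key in changes:
        active[month].add((table, primary_key))
    return {month: len(rows) for month, rows in active.items()}


# One row rewritten 1,000 times in a month.
hot_row = [("2026-01", "job_status", 1)] * 1000

# 1,000 different rows each touched once in the same month.
wide_sweep = [("2026-01", "accounts", pk) for pk in range(1000)]

hot_row_mar = monthly_active_rows(hot_row)
wide_sweep_mar = monthly_active_rows(wide_sweep)
print(hot_row_mar, wide_sweep_mar)
```

Both histories contain 1,000 change events, but under this model the first counts one active row and the second counts 1,000. The breadth of change across rows is the number to profile.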

What you buy with Fivetran is the elimination of pipeline operations. Connectors are maintained by the vendor, API schema changes are absorbed upstream, and you do not page anyone at 2am when a source adds a column. That is genuinely worth money for teams whose engineers should be building product, not patching connectors. The cost is reduced control, data egress to a third party, and a bill that scales with data change rather than with a fixed budget. See our What is Fivetran primer for the platform model in depth.

Airbyte is the honest alternative to mention here: a self-hostable ELT platform with mixed licensing (its platform under Elastic License 2.0, which is source-available rather than OSI-approved open source; connectors largely MIT) with 600+ connectors and both self-hosted and cloud options, per Airbyte's license docs. Self-hosting Airbyte trades a vendor bill for infrastructure and maintenance you own, which appeals to teams with capacity to run it and a need to avoid MAR-style metering or keep data in their own environment. It is free of license fees, not free of operational cost. The What is Airbyte page covers the deployment options.

pgloader: One Command, Postgres-Only, Done by Lunch

pgloader is an open-source data-loading tool, released under The PostgreSQL License, that migrates from MySQL, SQLite, MS SQL Server, and several file formats into PostgreSQL in a single command. It reads directly from the source database or file, creates the target schema, casts types, and loads data in one pass. Its scope is deliberately narrow: the target is always PostgreSQL. Within that scope it is the fastest path from "I have a MySQL database" to "it runs on Postgres," documented in the pgloader introduction.
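The single command takes a source connection string and a target connection string. The sketch below builds that invocation from Python and runs it only when explicitly enabled; the hosts, user, and database names are hypothetical, and the `RUN_MIGRATION` guard is our own safety switch, not a pgloader feature.

```python
import shutil
import subprocess
from typing import List


def pgloader_command(source_uri: str, target_uri: str) -> List[str]:
    """Build the argv for a single-command load: pgloader SOURCE TARGET."""
    return ["pgloader", source_uri, target_uri]


# Hypothetical connection strings; substitute your own hosts and credentials.
cmd = pgloader_command(
    "mysql://app@legacy-db.internal/shop",
    "postgresql://app@pg.internal/shop",
)
print(" ".join(cmd))

# Safety switch for this sketch: nothing runs unless you flip it deliberately
# and the pgloader binary is actually on PATH.
RUN_MIGRATION = False
if RUN_MIGRATION and shutil.which("pgloader"):
    subprocess.run(cmd, check=True)
```

For anything beyond defaults, pgloader also accepts a command file describing casts and load options, which is usually the better home for a migration you expect to rerun.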

The value is that pgloader does the tedious parts of a heterogeneous migration automatically. It maps source types to sensible PostgreSQL types, handles encoding conversion, recreates indexes and constraints after the bulk load (which is faster than loading with them live), and continues past individual row errors while logging them rather than aborting the whole run. A migration that would be a multi-step mysqldump, sed-the-DDL, manually-fix-types ordeal becomes a single invocation against a small config file. For the full walkthrough, see pgloader: migrating databases to PostgreSQL in a single command.
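The "continue past bad rows and log them" behavior is the part teams most often reimplement badly by hand, so here is a minimal model of it. This is not pgloader's implementation or its casting rules, which are richer and configurable (it can, for example, map MySQL zero dates to NULL); the `cast_row` function and sample rows are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

Row = Dict[str, str]


def cast_row(row: Row) -> Dict[str, Any]:
    """Cast string-typed source values to target types; raises ValueError on bad data."""
    return {
        "id": int(row["id"]),
        "active": row["active"] == "1",  # tinyint(1)-style flag to boolean
        "signup": date.fromisoformat(row["signup"]),
    }


def load(rows: List[Row]) -> Tuple[List[Dict[str, Any]], List[Tuple[Row, str]]]:
    """Load every row that casts cleanly; collect the rest with the reason, without aborting."""
    loaded: List[Dict[str, Any]] = []
    rejected: List[Tuple[Row, str]] = []
    for row in rows:
        try:
            loaded.append(cast_row(row))
        except (ValueError, KeyError) as exc:
            rejected.append((row, str(exc)))
    return loaded, rejected


rows = [
    {"id": "1", "active": "1", "signup": "2024-03-01"},
    {"id": "2", "active": "0", "signup": "0000-00-00"},  # MySQL zero date, not a valid date
    {"id": "x", "active": "1", "signup": "2024-03-05"},  # non-numeric id
]

loaded, rejected = load(rows)
print(len(loaded), "loaded;", len(rejected), "rejected")
```

The rejected list, with the original row and the reason, is what makes a partial load reviewable afterwards instead of an all-or-nothing rerun.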

Know its limits. pgloader is built for one-time and repeatable bulk loads, not continuous CDC; there is no transaction-log tailing keeping the target live after the load finishes. Very large databases may need batching and tuning rather than a naive single run. And because the target is Postgres only, it is irrelevant to any migration that does not end there. For Postgres-bound bulk work, though, nothing else on this list competes on time-to-done. (The same single-command philosophy does not extend to search engines; moving from Elasticsearch to OpenSearch is its own problem, covered in our OpenSearch data migration guide.)

The Comparison Table

Tool	Type	CDC support	Managed / Self-hosted	Sources / Targets	Pricing model	Best for
AWS DMS	Managed migration & replication service	Yes (full load, CDC, full+CDC)	Managed (provisioned or Serverless)	Oracle, SQL Server, MySQL, PostgreSQL sources; Aurora, RDS, Redshift, S3, DynamoDB targets	Hourly; Serverless billed per DCU-hour (1 DCU = 2GB RAM)	Lift-and-shift and zero-downtime cutover into AWS targets
Debezium	Open-source CDC platform on Kafka Connect	Yes (log-based, core purpose)	Self-hosted (or via managed Kafka providers)	MySQL, PostgreSQL, MongoDB, SQL Server, Oracle, MariaDB, Vitess and more to Kafka	Free (Apache License 2.0); you pay for Kafka infrastructure	Durable, multi-consumer streaming replication and fan-out
Fivetran	Proprietary managed ELT	Yes (for supported DB sources)	Managed SaaS	350+ sources to Snowflake, BigQuery, Redshift, lakes	Usage-based on Monthly Active Rows (MAR)	Hands-off SaaS-and-DB-to-warehouse ELT
Airbyte	Open-source / source-available ELT	Yes (for supported DB sources)	Self-hosted or Cloud	600+ connectors to warehouses, lakes, DBs	Open source (ELv2 platform / MIT connectors); Cloud is usage-based	Owning the ELT stack to avoid MAR metering or keep data in-house
pgloader	Open-source bulk loader	No	Self-hosted CLI	MySQL, SQLite, MS SQL, files to PostgreSQL	Free (The PostgreSQL License)	One-time / repeatable bulk loads into PostgreSQL

Key Takeaways

Name the workload first. One-time bulk, ongoing CDC, managed ELT, and single-command Postgres loads are different problems; the tool is downstream of the workload, not the other way around.
AWS DMS is the pragmatic default for lift-and-shift and zero-downtime cutover into AWS, with Serverless removing instance sizing. Watch CDC primary-key requirements and the no-custom-start-point limit in Serverless.
Debezium wins when replication is a long-term streaming backbone with many consumers and replay needs. It is over-engineered for a one-time copy and carries Kafka's operational weight, but the Apache 2.0 license means no per-row data metering.
Fivetran sells the elimination of pipeline operations, priced on Monthly Active Rows. Profile how many distinct rows change each month before committing, since tables where most rows are touched every month drive the bill. Airbyte is the self-hostable alternative if you would rather own infrastructure than pay MAR.
pgloader is unbeatable for one-shot loads into PostgreSQL and irrelevant for everything else. If the target is Postgres and the job is once, start here.
There is no single "best data migration tool." There is only the tool that matches your workload, your target, and your tolerance for operating infrastructure.

Most stalled migrations we are called into picked the tool before they defined the workload. Define the workload first and the shortlist usually narrows to one. If you are weighing these options for a live system, our data migration practice does exactly this workload-to-tooling matching, including the messy parts that surface only at cutover: sequences, blob columns, and custom types.