pgloader is an open-source tool that migrates MySQL, MS SQL, SQLite, and CSV data into PostgreSQL with automatic schema conversion, parallel loading, and resilient error handling. Learn how to use it, what it handles well, and where it falls short.

Database migrations rank among the most dreaded tasks in engineering. Moving from MySQL or SQL Server to PostgreSQL means dealing with incompatible data types, broken schemas, and the constant risk of silent data loss. pgloader is an open-source command-line tool that handles the entire migration - schema conversion, data loading, and type mapping - in a single command. It reads directly from a source database (or file), translates the schema, and streams data into PostgreSQL using the native COPY protocol.

What sets pgloader apart from a raw COPY or a Foreign Data Wrapper is its error resilience. PostgreSQL's default transactional behavior means a single bad row aborts the entire batch. pgloader instead logs rejected rows to a separate file and keeps loading the rest of your data. For large, messy production databases, that difference is the one that matters.
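By default the rejected rows land in reject files under pgloader's root directory (/tmp/pgloader unless overridden with --root-dir), one .dat/.log pair per target table. If you would rather fail fast, the behavior can be flipped in a load file; a minimal sketch with placeholder connection strings:

```
LOAD DATABASE
     FROM mysql://root@localhost/mydb
     INTO postgresql:///mydb

     -- abort on the first bad row instead of logging it and moving on
     WITH on error stop;
```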

Supported Sources and Core Capabilities

pgloader is a data loading tool for PostgreSQL, written in Common Lisp, with 6,300+ stars on GitHub. It supports migrating from MySQL, MS SQL Server, SQLite, CSV, fixed-width text, dBase3 (DBF), and IXF files into PostgreSQL. The current stable release is version 3.6.9, licensed under the PostgreSQL License.

During a database-to-database migration, pgloader connects to the source, reads catalog metadata, and constructs an equivalent PostgreSQL schema automatically. It handles:

  • Auto-increment columns converted to bigserial
  • MySQL identifiers downcased to follow PostgreSQL conventions (with reserved keyword escaping)
  • Invalid dates like 0000-00-00 transformed to NULL
  • Indexes, foreign keys, comments, and constraints carried over from the source schema
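As a concrete illustration of those rules (a sketch, not pgloader's verbatim output), consider a MySQL table and roughly what arrives on the PostgreSQL side:

```sql
-- MySQL source (sketch)
CREATE TABLE Users (
  ID        INT AUTO_INCREMENT PRIMARY KEY,
  LastLogin DATETIME DEFAULT '0000-00-00 00:00:00'
);

-- Approximately what pgloader creates in PostgreSQL:
CREATE TABLE users (
  id        bigserial PRIMARY KEY,
  lastlogin timestamptz   -- zero dates in the data arrive as NULL
);
```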

Parallel loading is built in. pgloader distributes work across multiple threads, which helps significantly when migrating databases with many tables or large row counts.
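The degree of parallelism is tunable. In a load file, the relevant WITH options look like this (the values are illustrative starting points, not recommendations):

```
LOAD DATABASE
     FROM mysql://root@localhost/mydb
     INTO postgresql:///mydb

     -- workers: threads cooperating on a single table;
     -- concurrency: how many tables are loaded at the same time
     WITH workers = 8, concurrency = 2;
```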

How It Compares to Alternatives

Choosing the right migration tool depends on your source database, your tolerance for manual configuration, and whether you need ongoing replication or a one-shot migration.

Tool         | Source DBs                     | Direction      | Replication   | Cost        | Best For
-------------|--------------------------------|----------------|---------------|-------------|------------------------------
pgloader     | MySQL, MSSQL, SQLite, CSV, DBF | One-way to PG  | No            | Free/OSS    | One-shot full migrations
AWS DMS      | 20+ sources                    | Bi-directional | CDC supported | Pay-per-use | Cloud-native, ongoing sync
ora2pg       | Oracle only                    | One-way to PG  | No            | Free/OSS    | Oracle-to-PostgreSQL
pg_chameleon | MySQL only                     | One-way to PG  | Yes (CDC)     | Free/OSS    | Low-downtime MySQL migrations
DBConvert    | Multiple                       | Bi-directional | Yes           | Commercial  | GUI-based, no CLI needed

pgloader wins on simplicity for one-time migrations from MySQL or SQL Server. If you need continuous replication or a managed service, AWS DMS or pg_chameleon fill that gap.

Running a Migration: From One-Liner to Load Files

The simplest pgloader invocation is a single command that takes a source and target connection string:

pgloader mysql://root@localhost/mydb postgresql:///mydb
  

That's it. pgloader connects to MySQL, discovers the schema, creates tables in PostgreSQL, and streams the data over. For remote servers with authentication:

pgloader mysql://user:pass@mysql-host:3306/sourcedb \
           postgresql://pguser:pass@pg-host:5432/targetdb
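One gotcha with these URIs: a password containing characters like `@`, `/`, or `:` breaks parsing unless it is percent-encoded. A small Python sketch (the helper name is mine, not part of pgloader):

```python
from urllib.parse import quote

def pgloader_uri(scheme, user, password, host, port, dbname):
    """Build a connection URI, percent-encoding the credentials so
    characters like '@' or '/' in a password don't break parsing."""
    return (f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{dbname}")

print(pgloader_uri("mysql", "app", "p@ss/w:rd", "mysql-host", 3306, "sourcedb"))
# mysql://app:p%40ss%2Fw%3Ard@mysql-host:3306/sourcedb
```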
  

Load Files for Complex Migrations

Real migrations rarely work with defaults alone. pgloader uses a declarative configuration format called a "load file" where you specify casting rules, schema renaming, and filtering. Here is a practical example:

LOAD DATABASE
    FROM mysql://root@localhost/legacy_app
    INTO postgresql://deploy@localhost/new_app
  
    ALTER SCHEMA 'legacy_app' RENAME TO 'public'
  
    CAST column users.id TO uuid DROP TYPEMOD,
         type int WITH EXTRA auto_increment TO bigserial,
         type tinyint TO boolean USING tinyint-to-boolean
  
    INCLUDING ONLY TABLE NAMES MATCHING 'users', 'orders', 'products'
  
    SET maintenance_work_mem TO '512MB',
        work_mem TO '48MB';
  

The CAST section is where most customization happens. You can remap individual columns, override type conversions globally, or apply Common Lisp transformation functions for complex reformatting.
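A few more CAST rules that come up often, using transformation functions that ship with pgloader (adapt the table and column names to your schema):

```
CAST type datetime to timestamptz
          drop default drop not null
          using zero-dates-to-null,

     type date
          drop default drop not null
          using zero-dates-to-null,

     column orders.note to text using empty-string-to-null
```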

Loading CSV Files

pgloader also loads flat files. From stdin:

pgloader --type csv \
    --field "id,name,email,created_at" \
    --with "skip header = 1" \
    --with "fields terminated by ','" \
    - postgresql:///mydb?users < users.csv
  

This is useful when migrating data exports that don't come from a live database connection.
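To try that invocation without a real export handy, here is a Python sketch that writes sample data in the expected shape: columns matching the --field list, with a header row that "skip header = 1" tells pgloader to ignore.

```python
import csv
import io

# Columns must line up with: --field "id,name,email,created_at"
FIELDS = ["id", "name", "email", "created_at"]

rows = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "created_at": "2024-01-01"},
    {"id": 2, "name": "Grace", "email": "grace@example.com", "created_at": "2024-02-01"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()   # the header line that "skip header = 1" skips
writer.writerows(rows)

print(buf.getvalue(), end="")
```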

Common Pitfalls and How to Avoid Them

pgloader handles a lot automatically, but production migrations surface edge cases that trip up even experienced engineers.

Memory exhaustion on large tables. pgloader buffers rows in memory during transfer. On tables with hundreds of millions of rows, this can trigger "heap exhausted" errors from the underlying Common Lisp runtime. The fix: lower the prefetch rows setting and reduce workers/concurrency (passed via --with on the command line or a WITH clause in your load file), grow the heap with --dynamic-space-size, or use rows per range together with multiple readers per thread to split large tables into ranges. Start with a prefetch of around 25000 rows and adjust based on your row width.
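A load-file sketch of that tuning (values are illustrative; connection strings are placeholders):

```
LOAD DATABASE
     FROM mysql://root@localhost/bigdb
     INTO postgresql:///bigdb

     -- a smaller prefetch bounds resident memory on wide rows
     WITH prefetch rows = 25000, workers = 4, concurrency = 1;
```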

Data type casting surprises. pgloader's default type mapping works for most cases, but some conversions catch people off guard. MySQL tinyint(1) maps to boolean by default - fine if you are using it as a boolean, problematic if you are storing small integers. The bigint type in MS SQL may not map cleanly without an explicit CAST rule. Validate the load file and both connections with --dry-run, then run a schema-only pass (WITH schema only) and inspect the resulting schema before loading any data.
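When the defaults are wrong for your data, explicit CAST rules pin the mapping down; a sketch of the two cases above:

```
CAST type tinyint to smallint drop typemod,  -- keep small integers as integers
     type bigint  to bigint   drop typemod   -- pin the mapping explicitly
```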

Encoding mismatches. Source databases with mixed or incorrectly declared character encodings produce garbled text in PostgreSQL. Specify the encoding explicitly in your connection string or load file. pgloader supports SET client_encoding TO 'UTF8' and similar directives.
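In a load file, that directive sits alongside the connection clauses (database names are placeholders):

```
LOAD DATABASE
     FROM mysql://root@localhost/legacy_app
     INTO postgresql:///new_app

     SET client_encoding TO 'utf8';
```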

Performance bottlenecks from shared disks. Running both the source and target database on the same machine - or the same disk - cripples throughput. The pgloader documentation recommends separate hardware for source and target with a high-bandwidth network between them. In benchmarks, migrations on dedicated infrastructure have achieved around 28 MB/s transfer rates for multi-gigabyte databases, completing 2.5 GB transfers in under 2 minutes.

Trigger and stored procedure gaps. pgloader migrates schema objects like tables, indexes, and constraints, but it does not convert stored procedures, triggers, or views written in MySQL-specific SQL. Those require manual porting; automated converters exist only for some sources (ora2pg fills this role for Oracle). Plan for this work separately.
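Before cutover, it helps to inventory what will need hand-porting. On a MySQL source, the standard information_schema views list the routines and triggers pgloader leaves behind (the schema name is a placeholder):

```sql
-- Stored procedures and functions
SELECT routine_type, routine_name
  FROM information_schema.routines
 WHERE routine_schema = 'legacy_app';

-- Triggers
SELECT trigger_name, event_object_table
  FROM information_schema.triggers
 WHERE trigger_schema = 'legacy_app';
```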

Key Takeaways

  • pgloader is a free, open-source CLI tool that migrates MySQL, MS SQL, SQLite, and file-based data into PostgreSQL with automatic schema conversion.
  • Its standout feature is resilient error handling: bad rows are logged and skipped rather than aborting the entire load.
  • For simple migrations, a single command with two connection strings is all you need. For complex cases, load files give you full control over type casting, schema renaming, and table filtering.
  • Validate with --dry-run and review the generated schema (for example via a schema-only pass) to catch type mapping issues before any data moves.
  • pgloader does not handle stored procedures, triggers, or complex views - budget time for manual conversion of those objects.
  • For ongoing replication or CDC requirements, pair pgloader with a tool like pg_chameleon or AWS DMS, since pgloader only does one-shot migrations.