What is PostgreSQL?

PostgreSQL is an open-source, object-relational database management system known for its standards compliance, reliability, and unusually wide feature set for an open-source database. It implements most of the SQL:2023 standard, supports ACID transactions with serializable isolation, handles JSON and semi-structured data as a first-class type, and -- through its extension architecture -- adapts cleanly to workloads that other databases need separate products to handle: geospatial (PostGIS), vector search (pgvector), time-series (TimescaleDB), and distributed SQL (Citus).

The project began as POSTGRES at UC Berkeley under Michael Stonebraker in 1986 -- as a successor to Ingres -- and was renamed PostgreSQL in 1996 when SQL support was added. Today it's developed by the PostgreSQL Global Development Group under the permissive PostgreSQL License, with a major release every year and minor releases every quarter. Stack Overflow's developer surveys have ranked it as the most-loved and most-used database for several consecutive years.

What Makes PostgreSQL Different

Most relational databases share the same surface (tables, SQL, transactions). The differences are underneath. A few PostgreSQL design choices show up everywhere in practice.

Object-relational, not just relational. Tables can inherit from other tables. Types are user-definable. Functions and operators can be overloaded. Arrays, composite types, ranges, and JSON are first-class column types -- not stringly-typed afterthoughts. This is what makes PostgreSQL feel "extensible" in a way that databases like MySQL don't.

Strong correctness and standards compliance. PostgreSQL implements more of the SQL standard than any other widely used open-source database -- window functions, common table expressions (including recursive), LATERAL joins, FILTER clauses, MERGE statements, full transactional DDL, deferred constraints. ACID guarantees go all the way through, including for index updates and DDL.

MVCC by design. Every transaction sees a consistent snapshot of the database. Readers don't block writers; writers don't block readers. The cost is that PostgreSQL accumulates dead row versions ("bloat") that VACUUM has to reclaim -- one of the operational characteristics every Postgres operator learns to monitor.

Extension architecture. Functions, types, operators, index methods, and even storage layers can be added as installable extensions without forking the database. PostGIS, pgvector, pg_partman, pg_stat_statements, pg_cron, TimescaleDB, Citus, and Apache AGE all extend PostgreSQL through this mechanism. This is the single biggest reason PostgreSQL has stayed relevant across decades of changing workload patterns.

Permissive license. The PostgreSQL License is similar to MIT/BSD. No vendor controls the core. AWS, Google, Microsoft, EnterpriseDB, Crunchy Data, Supabase, Neon, and many others build commercial offerings on top -- which is why the ecosystem is so deep.

Core Features

ACID transactions with multiple isolation levels. Read committed (default), repeatable read, and serializable -- with PostgreSQL's serializable level using Serializable Snapshot Isolation (SSI), which provides true serializability without the locking overhead of classic two-phase locking.

Rich SQL. Window functions, CTEs (including recursive), LATERAL joins, MERGE, FILTER, GROUPING SETS / CUBE / ROLLUP, full-text search, regular expressions, generated columns, partial and expression indexes.

Indexing. B-tree, hash, GiST, GIN, BRIN, SP-GiST, and (via pgvector) HNSW and IVFFlat. Index types matter -- the right choice can change query latency by orders of magnitude. GIN for full-text and JSONB. BRIN for huge, naturally ordered tables. HNSW for vector similarity.

Partitioning. Native declarative partitioning by range, list, or hash. Combined with extensions like pg_partman, partition management for time-series and high-volume tables becomes tractable.

Replication. Streaming physical replication for read replicas and high availability, plus logical replication (publish/subscribe) for selective table-level replication, zero-downtime major version upgrades, and CDC. Tools like pgBackRest and Barman handle PITR-capable backups.

Foreign Data Wrappers (FDW). Query external data sources -- other Postgres servers, MySQL, MongoDB, S3, Snowflake -- as if they were local tables. SQL standard SQL/MED.

JSON and JSONB. Full document storage and querying alongside relational data, with JSONB providing indexed, binary-encoded storage. PostgreSQL is genuinely usable as a document store, though it doesn't replace MongoDB for document-heavy workloads.

Procedural languages. Functions and procedures in PL/pgSQL, PL/Python, PL/Perl, PL/JavaScript (via plv8), and others. Stored procedures with full transaction control (introduced in PG 11).

The Extensions That Made PostgreSQL Universal

PostgreSQL's extension ecosystem is what lets it span workloads that would otherwise need separate databases:

pgvector. Adds a vector type and HNSW / IVFFlat indexes for similarity search. It has become one of the default vector stores for RAG applications -- particularly when teams want to keep their primary database as the source of truth for both relational and vector data. AWS Aurora PostgreSQL, Google Cloud SQL, and Azure all support pgvector natively.

PostGIS. The dominant open-source geospatial database. Adds geometry and geography types, spatial indexes, and hundreds of spatial functions. Powers GIS systems, mapping platforms, and location-based applications.

TimescaleDB. Time-series extension that adds hypertables, automatic partitioning by time, continuous aggregates, and compression. Lets PostgreSQL handle workloads that would otherwise need a dedicated time-series database.

Citus. Horizontal scaling extension that shards tables across multiple PostgreSQL nodes. Microsoft acquired Citus Data in 2019 and offers it as a managed service. Useful for multi-tenant SaaS and large analytical workloads that have outgrown single-node Postgres.

pg_stat_statements. Query-level statistics: which queries ran, how often, how long they took, how much I/O they did. Essential for performance tuning -- one of the first extensions to enable on any production cluster.

Apache AGE. Graph database extension on top of PostgreSQL, supporting openCypher queries.

Other production-relevant extensions: pg_partman (partition management), pg_cron (in-database scheduling), pglogical (advanced logical replication), pg_repack (online table reorganization), pgaudit (detailed audit logging), pg_bouncer (connection pooling -- runs as a separate process but is part of essentially every production deployment).

PostgreSQL vs MySQL

The two dominant open-source relational databases overlap in obvious ways and diverge in important ones.

	PostgreSQL	MySQL
Engine model	Single storage engine, MVCC throughout	Pluggable engines; InnoDB is the default
Standards compliance	High; closer to ANSI SQL	Looser; many MySQL-specific quirks
Transactional DDL	Yes -- DDL is fully transactional	No -- DDL commits implicitly
Concurrency	MVCC with SSI for serializable	MVCC, but serializable uses locking
JSON support	Native JSONB with indexes	JSON type, less mature
Replication	Streaming + logical, mature	Robust, widely deployed; built-in async + GTID
Extensions	First-class	Limited
Foreign keys	Always enforced	Historically loose; better in recent versions
Vector search	pgvector	No native equivalent
Typical use	Complex workloads, OLTP+OLAP mixes, GIS, vector	High-throughput simple OLTP, especially read-heavy web apps

Both are excellent. MySQL still has higher raw simple-read throughput in many benchmarks; PostgreSQL has the richer feature set and stronger correctness guarantees. For complex schemas, mixed workloads, GIS, time-series, or vector search, PostgreSQL is the more common choice today.

PostgreSQL in the Cloud

Most production PostgreSQL today runs on managed services:

Amazon RDS for PostgreSQL -- AWS's general managed PostgreSQL.
Amazon Aurora PostgreSQL -- AWS's cloud-native rewrite with distributed storage, faster failover, and storage that scales to 256 TiB.
Amazon Aurora DSQL -- AWS's serverless distributed PostgreSQL-compatible database (GA May 2025) for active-active multi-region deployments.
Google Cloud SQL for PostgreSQL and AlloyDB -- GCP's standard and high-performance managed options.
Azure Database for PostgreSQL -- Microsoft's managed offering, with the Cosmos DB for PostgreSQL variant providing Citus-based horizontal scaling.
Supabase -- developer-friendly platform built on PostgreSQL with auth, real-time, and storage layered on top.
Neon -- serverless PostgreSQL with branching, scale-to-zero, and bottomless storage on S3.
Crunchy Bridge, EnterpriseDB Cloud, Aiven, Timescale Cloud, Tembo -- specialized managed offerings with their own takes on operations, extensions, and pricing.

The PostgreSQL wire protocol has also become a target for non-Postgres systems -- CockroachDB, YugabyteDB, Aurora DSQL, and Greenplum all speak it, even though their storage engines are entirely different. "Postgres-compatible" is now a product category.

Common Use Cases

Transactional applications. The dominant use case. Web and mobile backends, SaaS platforms, e-commerce, fintech. ACID guarantees, foreign keys, and rich SQL make it well-suited for application code that depends on the database for correctness.

Multi-tenant SaaS. Schema-per-tenant or row-level security (RLS) patterns are well-supported. Citus or Aurora Limitless extends multi-tenancy across shards when single-node capacity runs out.

Geospatial applications. PostGIS is the open-source standard. Mapping, logistics, real estate, mobility.

Vector search and RAG backends. pgvector lets teams co-locate vectors with the relational data they describe -- often simpler than running a separate vector database.

Analytical workloads (with caveats). Postgres handles moderate analytical workloads well -- aggregations over tens of millions of rows are fine. For larger or more concurrent analytical workloads, columnar engines like ClickHouse or cloud warehouses like Snowflake are typically the right destination, fed from Postgres via CDC or ELT.

Time-series and IoT. With TimescaleDB or native partitioning, Postgres handles substantial time-series workloads without giving up SQL or the relational layer.

Source of truth for data pipelines. Logical replication and tools like Debezium make Postgres a clean CDC source. Changes flow into Kafka topics, then into warehouses, lakes, search engines, and analytical stores.

Challenges and Operational Realities

VACUUM and bloat. MVCC creates dead row versions that VACUUM has to clean up. Default autovacuum settings are conservative; high-write tables routinely need tuning. Long-running transactions and slow replication can block VACUUM and cause bloat to spiral. This is the single most common operational issue in production PostgreSQL.

Connection management. PostgreSQL uses a process-per-connection model, which makes connections expensive. PgBouncer or RDS Proxy in front of the database is standard for any high-concurrency application.

Replication lag and failover. Streaming replication is asynchronous by default. Failover with tools like Patroni, repmgr, or managed services is robust but requires planning -- particularly for applications that can't tolerate stale reads on replicas.

Major version upgrades. Major versions are mostly backward compatible but require pg_upgrade or logical replication to move between them. Managed services handle this with some convenience, but the underlying mechanics still apply.

Write scaling beyond a single node. Vanilla PostgreSQL is single-writer. Beyond a certain volume, options narrow to: vertical scaling, read replicas (for reads only), Citus / Aurora Limitless / Cosmos DB for PostgreSQL (sharded), CockroachDB / YugabyteDB / Aurora DSQL (Postgres-compatible distributed), or offloading specific workloads (analytics to a warehouse, search to OpenSearch, real-time to ClickHouse).

Index design. Indexes are the single biggest lever on query performance. The flip side: too many indexes slow writes and bloat the cache. EXPLAIN ANALYZE, pg_stat_statements, and pg_stat_user_indexes are the standard tools for understanding what's actually happening.

Where PostgreSQL Fits in the Broader Data Stack

PostgreSQL is the canonical operational database for most modern applications. It rarely lives alone in production -- it sits alongside specialized data systems that handle workloads PostgreSQL isn't optimized for:

Analytical queries scanning billions of rows live in ClickHouse, Snowflake, or a lakehouse.
Full-text search and log analytics run on Elasticsearch or OpenSearch.
Streaming and event integration go through Kafka topics and Flink.
Vector search either stays in Postgres (pgvector) or moves to a dedicated vector database when scale demands it.
Change data capture from Postgres via Debezium feeds data pipelines into all of the above.

The architecture question for most teams isn't "Postgres or something else?" It's "what stays in Postgres, what gets offloaded, and how does data flow between them?"

BigDataBoutique and PostgreSQL

We design and operate PostgreSQL deployments alongside the broader data stack -- replication topologies, performance tuning, extension selection, CDC integration with Kafka and Flink, and offloading to analytical and search engines. See our data engineering consulting services, or get in touch to discuss your architecture.