Cloud computing is the on-demand delivery of computing resources -- servers, storage, databases, networking, software, analytics, and AI services -- over the internet, billed by consumption rather than upfront purchase. Instead of buying and racking physical hardware, organizations provision virtual resources through an API in seconds, scale them up or down with demand, and pay only for what they actually use. It's the shift from capital expenditure to operating expenditure, and from infrastructure-as-a-thing-you-own to infrastructure-as-a-service.
The U.S. National Institute of Standards and Technology (NIST), in Special Publication 800-145 (2011), defined cloud computing through five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. That definition is now over a decade old and still holds. What has changed is scale: the major hyperscalers now operate millions of servers across hundreds of data centers worldwide, and serverless and managed services have pushed the abstraction far above raw compute.
How Cloud Computing Works
Underneath, cloud providers operate large data centers full of physical servers, storage arrays, and networking gear. On top of that physical infrastructure, virtualization (hypervisors, containers) and orchestration (control planes, schedulers) carve the hardware into logical units that customers can rent in fine-grained increments -- a CPU core, a gigabyte of memory, a millisecond of function execution.
Customers interact through APIs, CLIs, and consoles. An EC2 RunInstances call provisions a virtual machine in seconds. An S3 CreateBucket call creates a bucket that can hold an effectively unlimited amount of object data. A managed service like AWS RDS or BigQuery hides the underlying VMs entirely: the customer requests a database or a query, and the provider's automation handles provisioning, patching, scaling, and replication.
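To make the API-driven model concrete, here is a minimal Python sketch of the two calls above, written against clients shaped like boto3's EC2 and S3 clients (`run_instances`, `create_bucket`). The clients are passed in rather than created inside the functions, so the logic can be exercised without AWS credentials. The function names, the default instance type, and any AMI ID or bucket name you pass are illustrative assumptions.

```python
from typing import Any


def launch_vm(ec2: Any, image_id: str, instance_type: str = "t3.micro") -> str:
    """Provision one VM through EC2 RunInstances and return its instance ID.

    `ec2` is expected to behave like boto3.client("ec2").
    """
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,  # RunInstances requires both MinCount and MaxCount
        MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]


def create_bucket(s3: Any, name: str, region: str) -> None:
    """Create an S3 bucket through CreateBucket in the given region.

    `s3` is expected to behave like boto3.client("s3").
    """
    if region == "us-east-1":
        # us-east-1 is S3's default location, so no LocationConstraint is sent
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
```

With real credentials, a call would look like `launch_vm(boto3.client("ec2", region_name="eu-west-1"), "ami-...")`, where the AMI ID is a placeholder you would replace. Either way, a machine or a bucket is a single request, not a procurement cycle.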
The economic model is metered consumption: per-second compute billing, per-GB storage, per-request API calls, per-token model invocations. There's no per-VM contract negotiation and no upfront license. This is what makes cloud computing operationally different from buying servers, even when the total cost works out to be comparable.
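Metered billing is unit price times measured usage, summed across services. The sketch below uses made-up unit prices (not any provider's actual rates) to show how a monthly bill decomposes into compute, storage, request, and token line items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices in USD; real prices vary by provider, region, and SKU."""
    vm_per_second: float = 0.0000116       # roughly $0.042 per VM-hour
    storage_per_gb_month: float = 0.023
    per_million_requests: float = 0.40
    per_million_tokens: float = 3.00


def monthly_bill(vm_seconds: float, storage_gb: float, requests: int,
                 tokens: int, rates: Rates = Rates()) -> dict:
    """Return per-service charges plus a total for one month of metered usage."""
    lines = {
        "compute": vm_seconds * rates.vm_per_second,
        "storage": storage_gb * rates.storage_per_gb_month,
        "requests": requests / 1_000_000 * rates.per_million_requests,
        "tokens": tokens / 1_000_000 * rates.per_million_tokens,
    }
    lines["total"] = sum(lines.values())
    return lines
```

Consider a VM that runs 10 hours a day for 30 days (1,080,000 seconds), alongside 500 GB of storage, 20 million requests, and 50 million tokens. At these invented rates, `monthly_bill(1_080_000, 500, 20_000_000, 50_000_000)` comes to about $182.03. Stopping the VM for the other 14 hours a day simply keeps those seconds off the compute line. There is no idle hardware to write off.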
Service Models: IaaS, PaaS, SaaS
The NIST taxonomy distinguishes three service models, each at a different level of abstraction.
IaaS (Infrastructure as a Service). Virtual machines, block storage, virtual networks, load balancers, firewalls. The customer manages the OS, the middleware, the runtime, and the application. Examples: Amazon EC2, Azure Virtual Machines, Google Compute Engine. IaaS gives the most control and the most operational responsibility.
PaaS (Platform as a Service). A managed runtime for applications, databases, message queues, search engines, ML platforms. The customer manages the application; the provider manages the platform underneath. Examples: AWS RDS, Azure App Service, Google Cloud Run, Amazon OpenSearch Service, Amazon MSK, Databricks. Many production workloads now run on PaaS: for most teams, the operational savings outweigh the loss of low-level control.
SaaS (Software as a Service). End-user applications delivered over the internet. The customer just uses the software. Examples: Salesforce, Slack, Microsoft 365, Snowflake, Datadog. The line between PaaS and SaaS is blurry for developer tools and data platforms that fit both descriptions.
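The IaaS/PaaS/SaaS split comes down to which layers of the stack the provider runs and which the customer runs. Here is a simplified model of that split as a lookup table. The layer names are an illustrative assumption. Treating data as the customer's responsibility even under SaaS follows the shared responsibility model described under Security and compliance, not the three definitions above.

```python
# Simplified stack, bottom to top. Real responsibility splits are finer-grained.
LAYERS = ["hardware", "virtualization", "os", "middleware",
          "runtime", "application", "data"]

# Index of the lowest layer the customer manages in each model.
FIRST_CUSTOMER_LAYER = {
    "on-premises": 0,                     # the customer runs everything
    "iaas": LAYERS.index("os"),           # OS, middleware, runtime, app, data
    "paas": LAYERS.index("application"),  # application and data
    "saas": LAYERS.index("data"),         # only their data and how it is used
}


def who_manages(model: str, layer: str) -> str:
    """Return "customer" or "provider" for a layer under a service model."""
    if LAYERS.index(layer) >= FIRST_CUSTOMER_LAYER[model]:
        return "customer"
    return "provider"


if __name__ == "__main__":
    # Print a responsibility matrix, one row per layer from the top down.
    models = list(FIRST_CUSTOMER_LAYER)
    print(f"{'layer':<15}" + "".join(f"{m:<13}" for m in models))
    for layer in reversed(LAYERS):
        print(f"{layer:<15}" + "".join(f"{who_manages(m, layer):<13}" for m in models))
```

Each step from IaaS toward SaaS moves the customer's first layer up the stack. That is the trade the paragraphs above describe: less to operate, less to control.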
Two newer categories worth naming:
- FaaS (Function as a Service) / serverless. AWS Lambda, Azure Functions, Google Cloud Functions, Cloudflare Workers. Code runs in response to events; the provider handles all scaling and infrastructure. Pricing is typically per request plus execution time, metered in milliseconds and, on platforms such as AWS Lambda, scaled by the memory allocated to the function.
- MaaS (Model as a Service). Foundation model APIs like Amazon Bedrock, Azure OpenAI Service, Google Vertex AI, Anthropic API, and OpenAI API. Customers consume large language models through an API and pay per token. The category went mainstream in 2023, when Azure OpenAI Service and Amazon Bedrock became generally available; before that, OpenAI's API (launched in 2020) was one of the few such offerings.
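A function in the FaaS model is a handler the platform invokes once per event. The sketch below follows the `(event, context)` signature used by AWS Lambda's Python runtime. It assumes an API Gateway proxy-style event whose `body` is a JSON string; the `name` field and the greeting are invented for illustration. The cost helper applies the per-request plus memory-scaled-duration (GB-second) model. Prices are left as parameters because they vary by provider, region, and architecture.

```python
import json


def handler(event: dict, context) -> dict:
    """Entry point invoked once per event; the platform handles scaling.

    Assumes an API Gateway proxy-style event with a JSON string in "body".
    """
    payload = json.loads(event.get("body") or "{}")
    name = payload.get("name", "world")
    # Proxy integrations expect a statusCode and a string body in the response.
    return {"statusCode": 200, "body": json.dumps({"greeting": f"hello, {name}"})}


def faas_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
              price_per_gb_second: float, price_per_million_requests: float) -> float:
    """Estimate FaaS spend as request charges plus memory-scaled duration charges."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return (gb_seconds * price_per_gb_second
            + invocations / 1_000_000 * price_per_million_requests)
```

Called locally as `handler({"body": '{"name": "ada"}'}, None)`, the handler returns a 200 response whose body is `{"greeting": "hello, ada"}`. In the cost model, doubling `memory_mb` doubles the duration charge unless the extra memory (and, on Lambda, the proportionally larger CPU share) shortens execution enough to compensate.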
Deployment Models
Public cloud. Shared infrastructure operated by a provider, accessed over the public internet (or via private connectivity such as AWS Direct Connect). The dominant model. AWS, Azure, GCP, plus smaller players such as Oracle Cloud, IBM Cloud, Alibaba Cloud, OVHcloud, and DigitalOcean.
Private cloud. Cloud-style infrastructure operated exclusively for a single organization, either on-premises or in a dedicated facility. VMware Cloud Foundation, OpenStack, Nutanix. Often chosen for regulatory, data sovereignty, or latency reasons.
Hybrid cloud. A deliberate mix of public and private cloud (and often on-premises) infrastructure, with workloads placed where they fit best. AWS Outposts, Azure Arc, and Google Anthos (now part of GKE Enterprise) extend a consistent control plane across this mix.
Multi-cloud. Using more than one public cloud provider concurrently, for resilience, regulatory reasons, vendor leverage, or because different services are best-of-breed on different providers. Multi-cloud is harder than single-cloud, particularly for data-gravity workloads: egress costs and operational complexity add up fast.
Major Providers
The market has consolidated around three hyperscalers:
Amazon Web Services (AWS). Launched in 2006 with S3 and EC2, AWS pioneered modern cloud computing. The largest provider by revenue and breadth of service. 200+ services across compute, storage, databases, analytics, ML, IoT, and more.
Microsoft Azure. Strong in enterprise and Microsoft-shop environments, thanks to deep integration with Microsoft 365, Active Directory and Microsoft Entra ID (formerly Azure AD), and the broader Microsoft developer ecosystem. Significant share in regulated industries.
Google Cloud Platform (GCP). Strong in data analytics (BigQuery), AI/ML (Vertex AI), and Kubernetes (which Google originally open-sourced). Generally smaller in raw revenue than AWS and Azure but with deep technical strengths.
Beyond the big three: Oracle Cloud Infrastructure (OCI) for Oracle-centric workloads, IBM Cloud, Alibaba Cloud (dominant in China and growing in APAC), and a long tail of regional and specialized providers. There are also edge-focused providers (Cloudflare, Fastly) and GPU-focused AI clouds such as CoreWeave and Lambda (the GPU cloud provider, not AWS Lambda), which have grown sharply with GPU demand from training and inference workloads.
Why Organizations Use Cloud Computing
Elasticity. Match capacity to demand. Scale up during traffic spikes, scale down overnight. No need to buy for peak load and leave that capacity idle the rest of the time.
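Autoscalers turn elasticity into a feedback rule. The Kubernetes Horizontal Pod Autoscaler documents its core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). It skips scaling when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below implements only that rule plus min/max clamping. The real controller also applies stabilization windows and readiness handling, and the bounds here are arbitrary.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Target-tracking scale decision in the style of the Kubernetes HPA formula."""
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: avoid flapping
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

Take 4 replicas averaging 90% CPU against a 60% target: `desired_replicas(4, 90, 60)` returns 6. Overnight, at 30%, the same call returns 2.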
Speed. New environments in minutes instead of months. New services and new geographies are an API call away. Time to market shrinks dramatically.
Capex to opex. No multi-year hardware purchases. No data center leases. Pay as you go, scale as you grow.
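Whether pay-as-you-go is actually cheaper depends on utilization. One rough approach: amortize an owned server's purchase price over its service life and add monthly overhead (power, space, operations). Dividing that monthly cost by the cloud's hourly rate gives the hours per month at which owning and renting cost the same. All inputs below are hypothetical. The model ignores reserved-capacity discounts, detailed staffing costs, and resale value.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def breakeven_utilization(purchase_price: float, amortization_months: int,
                          monthly_overhead: float, cloud_hourly_rate: float) -> float:
    """Fraction of the month a workload must run for owning to cost the same as renting.

    Below this utilization, on-demand cloud is cheaper under these assumptions;
    a result above 1.0 means renting is cheaper even when running 24/7.
    """
    owned_monthly_cost = purchase_price / amortization_months + monthly_overhead
    breakeven_hours = owned_monthly_cost / cloud_hourly_rate
    return breakeven_hours / HOURS_PER_MONTH
```

Take a $10,000 server amortized over 36 months with $150/month overhead, compared against a $1.00/hour instance. `breakeven_utilization(10_000, 36, 150, 1.00)` comes out to about 0.586. In this example, a workload busy less than roughly 59% of the time is cheaper to rent.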
Managed services. Running databases, message brokers, search engines, ML platforms, and analytics services yourself is operationally heavy. Managed cloud versions (RDS, MSK, OpenSearch Service, BigQuery, Snowflake) trade some control for dramatic reductions in operational burden.
Global reach. Deploying to multiple regions worldwide is straightforward. Latency-sensitive applications can serve users from nearby data centers without building or leasing facilities.
Innovation surface. New capabilities -- foundation model APIs, vector search, real-time analytics, edge computing -- show up as cloud services first. Building on cloud means staying close to the frontier without owning the underlying R&D.
Challenges
Cost. Cloud bills are easy to grow and hard to predict. Unmonitored growth in compute, data egress, storage, and per-request services compounds. FinOps -- the discipline of managing cloud spend -- has emerged as a serious practice. Reserved capacity, savings plans, autoscaling, and architectural choices (right-sized instances, tiered storage, cross-region traffic management) all matter.
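A basic FinOps step is attributing spend to owners through resource tags and tracking how much spend has no owner at all. The sketch below works on simplified, hypothetical line-item records (a `cost` plus a `tags` dict). Real billing exports such as AWS Cost and Usage Reports carry far more columns, and the `team` tag key is an assumed convention.

```python
from collections import defaultdict


def cost_by_tag(line_items: list, tag_key: str = "team") -> dict:
    """Sum cost per value of one tag; items missing the tag land in "untagged"."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


def untagged_share(line_items: list, tag_key: str = "team") -> float:
    """Fraction of total spend that cannot be attributed to any owner."""
    totals = cost_by_tag(line_items, tag_key)
    grand_total = sum(totals.values())
    return totals.get("untagged", 0.0) / grand_total if grand_total else 0.0
```

Tracking the untagged share over time is one simple signal of whether tagging discipline is keeping up as the bill grows.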
Vendor lock-in. Provider-specific APIs, managed services, and proprietary formats create switching costs. Building portable architectures (open formats, Kubernetes, open-source databases) reduces lock-in at the cost of giving up some managed-service convenience.
Data egress and gravity. Storage is cheap; moving data out of a cloud is expensive. Network egress fees can dominate the bill on poorly designed architectures, and migrating multi-petabyte datasets between clouds is a major project.
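Egress is commonly priced in volume tiers, so estimating a migration means walking the tiers. The price list below is hypothetical and purely illustrative, so check your provider's current pricing. Cross-region and cross-zone transfer are often billed separately from internet egress.

```python
# Hypothetical internet-egress tiers: (tier size in GB, USD per GB).
# A size of None means "all remaining volume".
EGRESS_TIERS = [(10_240, 0.09), (40_960, 0.085), (102_400, 0.07), (None, 0.05)]


def egress_cost(gb: float, tiers=EGRESS_TIERS) -> float:
    """Cost of transferring `gb` out of the cloud, filling each price tier in order."""
    cost, remaining = 0.0, gb
    for size, price_per_gb in tiers:
        chunk = remaining if size is None else min(remaining, size)
        cost += chunk * price_per_gb
        remaining -= chunk
        if remaining <= 0:
            break
    return cost
```

Moving 1 PB (1,000,000 GB) out at these invented rates, `egress_cost(1_000_000)`, comes to about $53,900. That figure excludes the engineering time of the migration itself.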
Operational complexity. Cloud doesn't eliminate operations -- it changes what you operate. IAM policies, network architecture, security groups, observability, cost monitoring, and managing dozens of managed services all become the operational surface.
Security and compliance. The shared responsibility model: the provider secures the cloud, the customer secures what's in the cloud. Misconfigured S3 buckets, leaked credentials, and permissive IAM policies are responsible for a long history of high-profile breaches. Compliance (HIPAA, PCI DSS, SOC 2, GDPR, FedRAMP) is supported but requires deliberate architecture and audit.
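Many permissive-policy mistakes are visible in the policy document itself. The sketch below is a deliberately crude heuristic over the IAM JSON policy grammar. It flags `Allow` statements that grant a wildcard action (`*` or `service:*`) on `"Resource": "*"`. It ignores `NotAction`, `NotResource`, and conditions, and is no substitute for a policy analyzer such as IAM Access Analyzer.

```python
import json


def _as_list(value):
    """IAM allows a single value or a list in Statement, Action, and Resource."""
    return value if isinstance(value, list) else [value]


def overly_permissive_statements(policy_json: str) -> list:
    """Return the Sid of each Allow statement granting wildcard actions on all resources."""
    policy = json.loads(policy_json)
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action and "*" in resources:
            findings.append(stmt.get("Sid", "<no Sid>"))
    return findings
```

For a policy whose only statement is `{"Sid": "Admin", "Effect": "Allow", "Action": "*", "Resource": "*"}`, the check returns `["Admin"]`. Running a check like this in CI catches the most obvious cases before they reach an account.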
Reliability and the failure modes of managed services. A provider outage can affect every customer in a region at once. Architecting for resilience (multi-AZ, multi-region, graceful degradation) is the customer's job, not the provider's.
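The arithmetic behind multi-AZ designs is simple if failures are assumed to be independent. N redundant copies are all down only when every copy is down, while components on a single request path multiply. The availability figures used here are hypothetical. The independence assumption is exactly what a region-wide outage violates, which is why multi-region designs exist.

```python
def redundant(availability: float, copies: int) -> float:
    """Availability of `copies` interchangeable replicas, assuming independent failures."""
    return 1 - (1 - availability) ** copies


def in_series(*availabilities: float) -> float:
    """Availability of a request path where every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result
```

With a hypothetical 99.5% per zone, three zones give `redundant(0.995, 3)`, about 99.99999% on paper. Putting a single 99.9% dependency in front of them, `in_series(0.999, redundant(0.995, 3))`, caps the whole path just under 99.9%.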
Cloud Computing and the Data Stack
The cloud has reshaped data architectures more than almost any other workload class. The major shifts:
- Storage decoupled from compute. Object storage (S3, GCS, Azure Blob) holds the data; compute engines scale independently. This is the foundation under data lakes, lakehouses, and modern warehouses alike.
- Managed analytical engines. Snowflake, BigQuery, Redshift, Databricks SQL. Customers run analytics without operating clusters.
- Managed streaming. Amazon MSK, Confluent Cloud, Azure Event Hubs, Google Pub/Sub. Kafka topics and equivalents as a service.
- Managed search. Amazon OpenSearch Service, Elastic Cloud. Production search infrastructure as a service.
- Managed AI/ML. Amazon Bedrock, Vertex AI, Azure OpenAI Service. Foundation models behind an API, with managed RAG, agents, and guardrails.
- Serverless and event-driven architectures. Lambda, Cloud Run, Azure Functions tied together by event buses and message queues.
A modern data platform might combine S3 for storage, Apache Iceberg as the table format, Apache Flink on managed infrastructure for streaming, Snowflake or ClickHouse for analytics, OpenSearch for search, and Bedrock for GenAI, with nearly every component consumption-billed, elastic, and operated by a provider. The architecture decisions are no longer about whether to use cloud; they're about which services to combine and how to keep costs and complexity under control.
BigDataBoutique and Cloud Data Platforms
We design and operate cloud data platforms on AWS, GCP, and Azure -- from architecture and migration to ongoing optimization and cost management. As an AWS Advanced Tier Services Partner with Amazon OpenSearch Service Delivery and AWS AI Services Competency, we have deep experience with the AWS analytics, search, and AI stack, and work extensively with Snowflake, Databricks, ClickHouse, and the broader cloud data ecosystem. See our services page, or get in touch to discuss your architecture.