Hands-on Presto: Fast SQL on Anything Training Course

Next courses

January 26, 2022 — Virtual

Length:

1 day

Max students in class:

Delivery method:

Instructor-led, hands-on exercises

Language:

English / Hebrew

Laptop:

Bring your own (installation instructions will be sent prior to course start)

Lunch:

Included

Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources.

Presto is proven at scale in a variety of use cases at Airbnb, Comcast, GrubHub, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, Uber and many more. In the last few years it has experienced unprecedented growth in popularity in both on-premises and cloud deployments, running over object stores, HDFS, NoSQL and RDBMS data stores.

Join the industry giants who use Presto to give human analysts and automated processes alike the ability to query data at huge scale across many data sources (S3, SQL databases, NoSQL databases, and more).

Objectives

Through hands-on exercises that demonstrate Presto's key capabilities, participants will learn to:

Install, deploy and configure Presto.
Query data on S3 and HDFS using standard SQL.
Execute SQL queries across multiple data sources using Presto's query federation.
Understand what it takes to deploy and use Presto in real-world scenarios.

Prerequisites

No programming experience is required. Basic Linux skills and SQL knowledge are required.

Syllabus

Module 1 - Introduction to Presto

Big Data, Data Warehousing, Data Lakes and Clouds.
What Presto is and why it is needed.
Use cases.
Presto Architecture.
Catalogs, Schemas and Tables.
Installation and configuration.

Module 2 - The Presto Ecosystem

The Presto CLI and Web UI.
Data Sources and Connectors.
Lab: Using the Apache Hive connector to query data on HDFS and S3.
JDBC and ODBC connectivity.
Using Presto from BI Tools and IDEs.
Lab: Query Presto using Superset, Redash or Zeppelin.

Module 3 - Real-world Presto

Partitioning and Bucketing.
File formats: Avro, ORC, Parquet.
Lab: Analyzing real data at scale on S3.
Query planning and execution.
Cost-based optimizations.
Query performance monitoring and tuning.
Understanding joins and spill to disk.

Module 4 - Query Federation

More built-in connectors: MySQL, PostgreSQL.
Query relational data using Presto.
Lab: Executing cross-data-source queries with Presto.

Module 5 - Cluster and Data Source Administration

Deployment options and administrative tools.
Cluster best practices and high availability.
Resource groups.
Security overview: Authentication, Authorization and Encryption.

Hands-on Presto: Fast SQL on Anything

Everything you need to know about Presto SQL to get started querying and analyzing data on S3, HDFS and pretty much anywhere.