Hands-on Presto: Fast SQL on Anything

Everything you need to know about Presto SQL to get started querying and analyzing data on S3, HDFS and pretty much anywhere.

1 day
20
Instructor-led, hands-on exercises
English / Hebrew
Bring your own (installation instructions will be sent prior to course start)
Included

Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources.

Presto is proven at scale in a variety of use cases at Airbnb, Comcast, GrubHub, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, Uber and many more. Over the last few years its popularity has grown rapidly in both on-premises and cloud deployments, over object stores, HDFS, NoSQL and RDBMS data stores.

Join the industry giants who use Presto to give human analysts and automated processes alike the ability to query data at huge scale across many data sources (S3, SQL databases, NoSQL databases, and more).

Objectives

Participants will learn Presto through hands-on exercises that demonstrate its key capabilities:

  • Install, deploy and configure Presto.
  • Query data on S3 and HDFS using standard SQL.
  • Execute SQL queries across multiple data sources using Presto's query federation.
  • Understand what it takes to deploy and use Presto in real-world scenarios.

Prerequisites

No programming experience is required. Basic Linux skills and SQL knowledge are required.

Syllabus

  • Big Data, Data Warehousing, Data Lakes and Clouds.
  • What Presto is and why it is needed.
  • Use cases.
  • Presto Architecture.
  • Catalogs, Schemas and Tables.
  • Installation and configuration.
  • The Presto CLI and Web UI.
  • Data Sources and Connectors.
  • Lab: Using the Apache Hive connector to query data on HDFS and S3.
  • JDBC and ODBC connectivity.
  • Using Presto from BI Tools and IDEs.
  • Lab: Query Presto using Superset, Redash or Zeppelin.
  • Partitioning and Bucketing.
  • File formats: Avro, ORC, Parquet.
  • Lab: Analyzing real data at scale on S3.
  • Query planning and execution.
  • Cost-based optimizations.
  • Query performance monitoring and tuning.
  • Understanding joins and spill to disk.
  • More built-in connectors: MySQL, PostgreSQL.
  • Query relational data using Presto.
  • Lab: Executing queries across data sources with Presto.
  • Deployment options and administrative tools.
  • Cluster best practices and high availability.
  • Resource groups.
  • Security overview: Authentication, Authorization and Encryption.

