
Introduction to BigData and Cloud Technologies
BigData and Cloud explained with real-world examples in this intensive 1-day workshop
How can we store or process large volumes of data? How do we deal with a massive stream of events arriving at high velocity? What is BigData, really? What does a BigData-ready system look like? And how can the cloud help?
BigData and Cloud technologies are mentioned a lot, but it's hard to know where to start or which technologies to use.
Join our internationally renowned instructors for a full day packed with knowledge sharing. We will give you an overview of the vast BigData landscape, with short deep-dives along the way: everything you need to get started with the technologies that changed the world.
Objectives
This course aims to give a good overview to developers and decision-makers who are new to the field, and to offer insights and valuable pointers to developers who already have practical experience.
- Understanding the challenges with BigData, and approaches for solving them.
- Storage, Compute and Stream Processing - and an overview of commonly used technologies.
- Showcasing the characteristics and architecture of a BigData-ready system.
- Using Cloud technologies, and an overview of the notable ones.
Prerequisites
None.
Syllabus
- What is BigData? And, more importantly, what is not BigData?
- Dealing with the challenges of Volume, Velocity and Variety of data.
- The Hadoop ecosystem and its current state
- Why do we need new technologies?
- Distributed file systems (HDFS, consistent hashing, and more)
- File formats, and why they matter
- NoSQL and the CAP Theorem
- Properties of modern storage solutions (schemaless, redundancy, relaxed consistency)
- Polyglot persistence
- Overview of NoSQL technologies
- Batch processing and why we can't do ad-hoc querying.
- The Map/Reduce model and locality of data
- Hadoop's MapReduce and YARN
- Computation frameworks like Apache Spark and Flink
- Monitoring of batch jobs and Spark computations
- Machine learning
- Distributed systems, microservices, containers and 'serverless'
- Discovery, synchronization and configuration management
- Queue systems and commit logs
- Data workflows and pipelines
- Design guidelines (explicit consistency expectations, idempotence, and more)
- Monitoring and alerting (ELK, Graphite, Grafana, Graylog, Redash, Riemann and more)
- Data warehousing
- Micro batching
- Stream processing (Apache Storm, Heron and similar technologies)
- Lambda architecture
- Why cloud?
- Overview of Amazon Web Services
- Overview of Google Cloud Platform
- Overview of Microsoft Azure
- Comparison of the three, highlighting the notable strengths of each
- BigData on the cloud
Q&A panel with our experts - ask us anything.