Ask me anything

Monthly virtual office hours on BigData technologies and architecture

We know how challenging the BigData landscape can be. Now that we are all working remotely because of COVID-19, we believe it's the perfect time to move our office hours to a monthly virtual event.

View other AMA sessions

Monthly, every last Wednesday

08:00 PST 11:00 EST 16:00 GMT 17:00 CET
May 13th, 2020

Kafka Streams: a gentle comparison with other real-time frameworks

In this session we will introduce Kafka Streams, a client library for building real-time processing applications, where the input and output data are stored in Kafka clusters. We will compare it with other popular real-time frameworks such as Flink and Spark Structured Streaming and talk about when to use which one.



Our team of experts will be here to answer all your BigData questions, live. Once a month we will host a live event, beginning with a short presentation on a bleeding-edge topic, followed by a Q&A session that is open to all.

Previous sessions:

  • May 6th, 2020

    Alerting with Elasticsearch and the Elastic Stack

    The Elastic Stack is used almost everywhere today for application and system monitoring. In this session we will show you how to add alerting to any Elastic-based monitoring system, so you can get alerted via email, Slack, PagerDuty and more whenever one of the alerting rules you defined is triggered.

  • May 20th, 2020

    Big Data Architectures on Amazon Web Services (AWS)

    This session will showcase typical Big Data architectures on AWS and show you how to build them yourself. It covers building Data Warehouses and Data Lakes to make huge amounts of data queryable, orchestrating data pipelines and ETL processes, ingesting data at scale, and handling and computing on high-velocity data streams. These are huge tasks, but they are relatively easy to get done with AWS, and this session will show you where to begin.

  • May 27th, 2020

    Elasticsearch: Performance and Stability in Production

    There are many Elasticsearch clusters out there, and many of them suffer from performance and stability issues because of misconfiguration or incorrect capacity planning. In this session we will look at the common errors people make when deploying Elasticsearch clusters, and offer best practices and dos and don'ts so the same issues don't happen to you.

  • June 3rd, 2020

    Avro, Parquet or JSON? What to use and, more importantly, how to manage schemas

    In this session, we'll review the differences between the most important Big Data file formats for Event Streaming, their pros and cons, and how to choose the best fit for a specific use case. We'll also take a look at an architecture that provides greater control over data quality using Schema Management. Need to add a new column to a downstream database? You don’t need an involved change process and at least 4 meetings to coordinate 15 teams. Join us to learn how to reduce operational complexity in the application development cycle.

  • June 10th, 2020

    On the storage system in Apache Spark

    This AMA session will begin with a very short introduction to Spark's storage system and BlockManager. During the session we will show you when and how Spark saves data to disk using the storage system. It's going to be fairly low-level, but there will be enough high-level information that anybody should benefit. This session can get interactive, so expect questions to drive how low-level or high-level the discussion ends up.

  • June 17th, 2020

    How to expose Big Data efficiently

    For any Big Data architecture, the main goal is to make available to users data that would be very hard to use in traditional architectures because of size or latency requirements. In this AMA, we'll cover how to expose data efficiently in terms of performance and governance. We'll review some interesting patterns and technologies that make it easier for your users to consume the data previously processed in your pipelines.

  • July 1st, 2020

    The State of Cloud Machine Learning

    The Machine Learning ecosystem has been booming in recent years, and with new product and technology announcements coming every week it’s easy to get lost. We invite you to join us as Gad, Director of Machine Learning, explains what’s worth looking at and arms you with the knowledge to choose the right tool for the task.

  • July 22nd, 2020

    Usage patterns for Kafka

    Kafka is a key component in data architectures because it enables easy decoupling between systems and performance improvements such as adding backpressure management to existing components. In this Ask Me Anything session we'll review some messaging patterns, such as Pub-Sub and Observer, and how they relate to architectural patterns such as CQRS, Event Sourcing and Event Collaboration. We'll cover the advantages and disadvantages of each pattern and learn to identify the best opportunities to use them.

  • July 29th, 2020

    SQL Query Anything, Anywhere with Starburst Presto

    Build an Open Source Data Access Layer to federate Kafka, Data Lakes, and more. Starburst Presto provides a federated "Single Source of Access": a multi-node, elastically scaling cluster that pulls data from data warehouses, data lakes, relational data, NoSQL, and Kafka queues. Users run a single SQL query that joins data from all of them, merging the data on the fly into a single result set. Join this AMA to learn how to implement this in your environment.

  • September 23rd, 2020

    Introduction to Delta Lake SQL

    Let's talk about what's new in Delta Lake 0.7.0 and how much you can do with it using SQL only. We will begin with a short Delta Lake intro and then dive into all the goodies of the DDL and DML commands (like CREATE, ALTER, DROP, SELECT, UPDATE, DELETE, MERGE, EXPLAIN) that are supported by Delta Lake. We will review and demo them all live, and then see where it goes from there with your questions!

  • October 28th, 2020

    Modern full-text search with Elasticsearch

    The field of information retrieval and text search has come a long way since its inception several decades ago. Join us in this session, where we will discuss modern text search practice with Elasticsearch, the Lucene-based search engine server and today's de facto standard for full-text search applications. We will start from basic keyword search: analyzers, term normalization, stemming and morphological properties. We will, of course, discuss its common challenges, such as boosts, synonyms, ontologies and phrases, and how to deal with them. Continuing from there, we will review modern and future approaches to full-text search, from vector search to word embedding methods like BERT, and how those come into play. We will also discuss how to improve precision and recall by using judgment lists, click-streams and search logs.

Propose a topic for a future session

Contact Us