This week we dive into the practical side of AI. Build and think with LLMs, deploy OpenAI customer service agents, and master hybrid search with Reciprocal Rank Fusion.

The initial shockwave of generative AI is over, and a new phase is dawning: the era of the engineer. As the hype cycle matures, the conversation is shifting from dazzling, theoretical demos to the deliberate, disciplined work of building real-world applications. This week, we're seeing a clear trend across the industry - a move away from abstract potential and toward solving the tangible, often tricky, challenges of implementation. The focus is now on creating robust data pipelines, refining search results, and building AI agents that deliver measurable business value. It's an exciting time where applied science is officially taking center stage.

This week’s digest is your guide to navigating this practical frontier. We’ll kick things off with a powerful blueprint from OpenAI: a ready-to-deploy customer service agent demo that has already earned over 5,300 stars on GitHub and shows exactly how to solve concrete business problems. We'll then dive into a surprisingly simple technique called Reciprocal Rank Fusion that promises to end the "never-ending battle" of hybrid search boosting. Finally, we'll distill a year’s worth of hands-on lessons in building with LLMs and unpack Stanford's definitive guide to the model construction process, from pre-training to human-preference alignment.

We'll wrap up by discussing the Illusion of Thinking - an innovative whitepaper from Apple that draws the line between what LLMs do and what we call "thinking".

The Weekly Chunk, as always, isn't just a summary of news; it’s a toolkit for the modern AI builder. Let's start.


OpenAI has open-sourced another agent demo, this time a Customer Service Agent built on the Agents SDK. It routes airline customer requests between specialized agents - Triage Agent, Seat Booking Agent, Flight Status Agent, Cancellation Agent, and FAQ Agent - applies Relevance and Jailbreak guardrails, and ships with a Python backend plus a Next.js UI for visualizing agent orchestration and chatting with the system.
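
If you want a feel for the pattern before opening the repo, here's a minimal sketch of the triage-and-handoff idea using the `openai-agents` Python package. The agent names and instructions below are illustrative placeholders rather than the demo's actual code, and the guardrails are left out for brevity.

```python
# Minimal sketch of the triage/handoff pattern (illustrative, not the repo's code).
# Assumes the `openai-agents` package is installed: pip install openai-agents
from agents import Agent, Runner

faq_agent = Agent(
    name="FAQ Agent",
    instructions="Answer common questions about baggage, seats, and in-flight wifi.",
)

seat_booking_agent = Agent(
    name="Seat Booking Agent",
    instructions="Help the customer change or confirm their seat assignment.",
)

# The triage agent decides which specialist should handle the request
# and hands the conversation off to it.
triage_agent = Agent(
    name="Triage Agent",
    instructions="Route the customer's request to the most appropriate specialist agent.",
    handoffs=[faq_agent, seat_booking_agent],
)

result = Runner.run_sync(triage_agent, "Can I switch to a window seat on my flight?")
print(result.final_output)
```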

Go to GitHub

This talk by Philipp Krenn from Elastic explains Reciprocal Rank Fusion (RRF) as a solution for hybrid search challenges. It begins by highlighting the flaws of common approaches: normalizing scores is a 'never-ending battle' because scores are relative to the document set, and manual boosting is 'more of a guideline than an actual rule,' lacking precision.

RRF offers a simple, effective alternative. It combines multiple result lists by blending document *ranks*, not their scores. The RRF score for each item is calculated by summing `1 / (k + rank)` across all lists, where 'k' is a constant (default 60) that tunes the influence of lower-ranked items. The final results are sorted by this new score. A key side effect is that final results have a 'null' score, as RRF only determines the order.
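
To make the formula concrete, here's a tiny, self-contained Python sketch of RRF over two ranked lists. The function and document names are ours for illustration; this is not Elastic's implementation.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document IDs using RRF.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list it appears in (ranks start at 1); higher fused scores come first.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Example: fuse a lexical (BM25) ranking with a vector-search ranking.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```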

Watch Now

Building with LLMs is easier than ever, but shipping real, useful products remains deceptively hard. Over the past year, we've faced the challenges firsthand and compiled key lessons to help developers move beyond demos and build systems that truly work.

This whitepaper distills those insights into tactical advice, operational lessons, and strategic principles for anyone working with LLMs: from weekend hackers to startup founders. Dive in to avoid common pitfalls, iterate faster, and build smarter with LLMs.

Read More

This lecture details the practical components of building LLMs. The process begins with pre-training, where a model learns from vast internet data to predict the next token (autoregressive language modeling). This phase requires a robust data pipeline for cleaning, filtering, and deduplication, and its success is governed by scaling laws, which predictably link compute, data, and model size to performance.
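
To see the pre-training objective in code, here's a minimal PyTorch-style sketch of the shifted next-token cross-entropy loss. The function name and tensor shapes are our own illustration, not code from the lecture.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """Standard autoregressive language-modeling loss.

    logits:    (batch, seq_len, vocab_size) model outputs
    token_ids: (batch, seq_len) input token ids
    Each position t is trained to predict token t+1, hence the shift by one.
    """
    shift_logits = logits[:, :-1, :]      # predictions for positions 0..T-2
    shift_targets = token_ids[:, 1:]      # the tokens those positions should predict
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_targets.reshape(-1),
    )
```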

The second phase is post-training (or alignment), which transforms the base model into a useful assistant. It starts with Supervised Fine-Tuning (SFT) on a small, high-quality dataset to teach the model the desired conversational format. This is followed by preference optimization, using methods like Direct Preference Optimization (DPO) to align the model's outputs with human judgments, making it more helpful and safe.
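
And as a rough sketch of the preference-optimization step, here's the DPO loss on a batch of (chosen, rejected) pairs, written from the published formula. The argument names are ours and assume per-sequence log-probabilities from the policy being trained and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is a tensor of summed log-probabilities of the chosen /
    rejected response under the policy or the frozen reference model.
    The loss pushes the policy to prefer the chosen response more strongly
    than the reference model does, with beta controlling the strength.
    """
    chosen_rewards = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_rewards = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```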

Watch Now

Even when reasoning language models look smart - explaining their steps, planning flows, and calling tools - they aren’t truly "thinking". New research from Apple shows that, despite all this, today’s most advanced models can still collapse on genuinely complex questions. If we want to use LLMs, RAG, and reasoning-focused models safely and effectively, we have to understand what they’re good at - and where they’re still likely to struggle or fail.

Watch Now