<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title><![CDATA[BigData Boutique Blog]]></title>
    <description><![CDATA[Expert insights on Elasticsearch, OpenSearch, Flink, Spark and big data technologies]]></description>
    <link>https://bigdataboutique.com</link>
    <image>
      <url>https://bigdataboutique.com/images/og.png</url>
      <title><![CDATA[BigData Boutique Blog]]></title>
      <link>https://bigdataboutique.com</link>
    </image>
    <generator>BigData Boutique RSS Generator</generator>
    <lastBuildDate>Tue, 16 Jun 2026 15:27:00 GMT</lastBuildDate>
    <atom:link href="https://bigdataboutique.com/blog/rss.xml" rel="self" type="application/rss+xml" />
    <copyright><![CDATA[Copyright 2026 BigData Boutique]]></copyright>
    <language><![CDATA[en]]></language>
    <item>
      <title><![CDATA[Apache Iceberg on AWS: Glue Catalog, Athena, EMR, and S3 Tables]]></title>
      <description><![CDATA[A practical guide to running Apache Iceberg on AWS - choosing between the Glue Data Catalog and S3 Tables, querying with Athena and EMR, and wiring the pieces into an AWS-native lakehouse over the Iceberg REST protocol.]]></description>
      <link>https://bigdataboutique.com/blog/apache-iceberg-on-aws</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/apache-iceberg-on-aws</guid>
      <category><![CDATA[AWS]]></category>
      <category><![CDATA[Apache Iceberg]]></category>
      <pubDate>Fri, 12 Jun 2026 10:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/apache-iceberg-on-aws.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[AI Guardrails: Implementing Safety for Production LLM Apps]]></title>
      <description><![CDATA[A practitioner's guide to LLM guardrails as a layered defense architecture - input validation, output filtering, behavioral policy, and runtime observability - with open-source and cloud-native options compared, an OWASP-aligned threat model, and a four-layer reference architecture for production.]]></description>
      <link>https://bigdataboutique.com/blog/ai-guardrails-implementing-safety-production-llm-apps</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/ai-guardrails-implementing-safety-production-llm-apps</guid>
      <category><![CDATA[LLM]]></category>
      <category><![CDATA[AI Safety]]></category>
      <category><![CDATA[Guardrails]]></category>
      <category><![CDATA[Security]]></category>
      <category><![CDATA[GenAI]]></category>
      <dc:creator><![CDATA[Itamar Syn-Hershko]]></dc:creator>
      <pubDate>Tue, 09 Jun 2026 10:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/ai-guardrails-implementing-safety-production-llm-apps.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[Hybrid Search Explained: Combining Vector and Keyword Retrieval]]></title>
      <description><![CDATA[Hybrid search runs lexical (BM25) and dense vector retrieval side by side and fuses the two ranked lists into one. This guide covers the architecture, OpenSearch and Elasticsearch implementations, fusion choices, weight tuning, and how to measure the uplift.]]></description>
      <link>https://bigdataboutique.com/blog/hybrid-search-explained</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/hybrid-search-explained</guid>
      <category><![CDATA[OpenSearch]]></category>
      <category><![CDATA[Elasticsearch]]></category>
      <pubDate>Mon, 08 Jun 2026 14:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/hybrid-search-explained.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[The Medallion Architecture: Bronze, Silver, and Gold on Open Lakehouses]]></title>
      <description><![CDATA[How to implement the Bronze, Silver, and Gold medallion pattern on open table formats - Apache Iceberg, Delta Lake, and Apache Hudi - without Databricks lock-in, plus the trade-offs and when to skip Bronze entirely.]]></description>
      <link>https://bigdataboutique.com/blog/medallion-architecture-lakehouse</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/medallion-architecture-lakehouse</guid>
      <category><![CDATA[Databricks]]></category>
      <category><![CDATA[Apache Iceberg]]></category>
      <pubDate>Sun, 07 Jun 2026 10:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/medallion-architecture-lakehouse.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[How to Choose a Database for Your Next Project]]></title>
      <description><![CDATA[A practical, engineering-first framework for picking a database: start from access patterns and consistency needs, understand ACID vs BASE and CAP/PACELC, weigh operational cost, and know when to reach past Postgres - with a decision checklist.]]></description>
      <link>https://bigdataboutique.com/blog/how-to-choose-a-database-for-your-next-project</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/how-to-choose-a-database-for-your-next-project</guid>
      <category><![CDATA[PostgreSQL]]></category>
      <category><![CDATA[ClickHouse]]></category>
      <category><![CDATA[Elasticsearch]]></category>
      <category><![CDATA[OpenSearch]]></category>
      <pubDate>Tue, 02 Jun 2026 10:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/how-to-choose-a-database-for-your-next-project.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[Stemming in Elasticsearch and OpenSearch: Why Light Beats Aggressive in the Hybrid Search Era]]></title>
      <description><![CDATA[A practical guide to stemming in Elasticsearch and OpenSearch in 2026. Why the Porter stemmer's aggressive defaults hurt precision once you add a vector retrieval leg, when light stemming is the right choice for hybrid search, and how to handle multilingual corpora.]]></description>
      <link>https://bigdataboutique.com/blog/stemming-in-elasticsearch-and-opensearch-hybrid-search</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/stemming-in-elasticsearch-and-opensearch-hybrid-search</guid>
      <category><![CDATA[Elasticsearch]]></category>
      <category><![CDATA[OpenSearch]]></category>
      <dc:creator><![CDATA[Itamar Syn-Hershko]]></dc:creator>
      <pubDate>Mon, 01 Jun 2026 08:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/stemming-in-elasticsearch-and-opensearch-hybrid-search.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[HNSW vs IVFFlat: How to Choose the Right Vector Index]]></title>
      <description><![CDATA[A comparison of HNSW and IVFFlat vector indexes across recall, memory, build time, and write workloads, with decision rules, parameter tuning, and examples.]]></description>
      <link>https://bigdataboutique.com/blog/hnsw-vs-ivfflat-how-to-choose-the-right-vector-index</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/hnsw-vs-ivfflat-how-to-choose-the-right-vector-index</guid>
      <category><![CDATA[Vector Search]]></category>
      <category><![CDATA[GenAI]]></category>
      <dc:creator><![CDATA[Itamar Syn-Hershko]]></dc:creator>
      <pubDate>Fri, 29 May 2026 10:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/hnsw-vs-ivfflat-how-to-choose-the-right-vector-index.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[Amazon Bedrock Explained: Foundation Models, Agents, Knowledge Bases]]></title>
      <description><![CDATA[A practical guide to Amazon Bedrock for engineers - what it is, how its building blocks fit together, and how to decide when Bedrock is the right choice versus calling provider APIs directly or self-hosting.]]></description>
      <link>https://bigdataboutique.com/blog/amazon-bedrock-explained-foundation-models-agents-knowledge-bases</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/amazon-bedrock-explained-foundation-models-agents-knowledge-bases</guid>
      <category><![CDATA[AWS]]></category>
      <category><![CDATA[GenAI]]></category>
      <category><![CDATA[RAG]]></category>
      <pubDate>Mon, 25 May 2026 08:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/amazon-bedrock-explained-foundation-models-agents-knowledge-bases.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[ETL Process Optimization in 2026: A Practitioner's Field Guide]]></title>
      <description><![CDATA[A diagnostic-first guide to ETL process optimization for teams running Spark, Flink, and Airflow in production - how to find the real bottleneck, fix it, and prove the result in both runtime and cloud spend.]]></description>
      <link>https://bigdataboutique.com/blog/etl-process-optimization</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/etl-process-optimization</guid>
      <category><![CDATA[Data Engineering]]></category>
      <category><![CDATA[Apache Spark]]></category>
      <category><![CDATA[Apache Flink]]></category>
      <category><![CDATA[Apache Airflow]]></category>
      <category><![CDATA[Apache Iceberg]]></category>
      <pubDate>Thu, 21 May 2026 14:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/assets-blog/etl-processes.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[CloudWatch Logs to OpenSearch: A Practical Setup Guide with Terraform and AWS CLI]]></title>
      <description><![CDATA[Step-by-step guide to streaming AWS CloudWatch Logs into Amazon OpenSearch Service using subscription filters, Lambda, and Firehose - with complete Terraform and CLI examples.]]></description>
      <link>https://bigdataboutique.com/blog/cloudwatch-logs-to-opensearch</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/cloudwatch-logs-to-opensearch</guid>
      <category><![CDATA[OpenSearch]]></category>
      <category><![CDATA[Amazon OpenSearch Service]]></category>
      <category><![CDATA[AWS]]></category>
      <category><![CDATA[CloudWatch]]></category>
      <dc:creator><![CDATA[Kobi Lemberg]]></dc:creator>
      <pubDate>Tue, 19 May 2026 14:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/assets-blog/cloudwatch-to-opensearch.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[Reciprocal Rank Fusion (RRF): How It Works and When to Use It]]></title>
      <description><![CDATA[Reciprocal Rank Fusion (RRF) combines BM25 and vector search rankings into one list using a single formula. Learn how RRF works, why k=60 is the usual default, and when to use it.]]></description>
      <link>https://bigdataboutique.com/blog/reciprocal-rank-fusion-how-it-works-and-when-to-use-it</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/reciprocal-rank-fusion-how-it-works-and-when-to-use-it</guid>
      <category><![CDATA[Hybrid Search]]></category>
      <category><![CDATA[RRF]]></category>
      <category><![CDATA[Vector Search]]></category>
      <category><![CDATA[BM25]]></category>
      <category><![CDATA[RAG]]></category>
      <dc:creator><![CDATA[Itamar Syn-Hershko]]></dc:creator>
      <pubDate>Mon, 18 May 2026 14:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/assets-blog/reciprocal-rank-fusion-rrf.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[Digipal: Building a Cloud-Native IoT Asset Tracking Platform on AWS from Day One]]></title>
      <description><![CDATA[How Digipal partnered with BigData Boutique to go from IoT-enabled pallets to a production-ready, hardware-agnostic asset tracking platform on AWS - fast enough to move customer deals forward.]]></description>
      <link>https://bigdataboutique.com/blog/digipal-cloud-native-asset-tracking-platform-aws</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/digipal-cloud-native-asset-tracking-platform-aws</guid>
      <category><![CDATA[AWS]]></category>
      <category><![CDATA[Case Study]]></category>
      <category><![CDATA[ClickHouse]]></category>
      <pubDate>Thu, 14 May 2026 08:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/digipal-case-study.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[Multimodal RAG in 2026: Retrieval Over Images, PDFs, and Text]]></title>
      <description><![CDATA[Multimodal RAG retrieves answers grounded in figures, tables, and page layout - not just paragraphs. This guide compares caption-and-index, unified vision embeddings (Cohere Embed 4, voyage-multimodal-3), and page-as-image retrieval (ColPali, ColQwen2) with reference architectures on OpenSearch.]]></description>
      <link>https://bigdataboutique.com/blog/multimodal-rag-retrieval-over-images-pdfs-and-text</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/multimodal-rag-retrieval-over-images-pdfs-and-text</guid>
      <category><![CDATA[RAG]]></category>
      <category><![CDATA[GenAI]]></category>
      <category><![CDATA[OpenSearch]]></category>
      <dc:creator><![CDATA[Kobi Lemberg]]></dc:creator>
      <pubDate>Tue, 12 May 2026 08:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/assets-blog/multi-modal-rag.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[OpenSearch PPL Examples: 30+ Copy-Paste Queries for Logs, Metrics, and Traces]]></title>
      <description><![CDATA[Hands-on OpenSearch PPL examples covering search, stats, eval, span, join, timechart, and more - ready for logs, metrics, and traces.]]></description>
      <link>https://bigdataboutique.com/blog/opensearch-ppl-examples-copy-paste-queries</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/opensearch-ppl-examples-copy-paste-queries</guid>
      <category><![CDATA[OpenSearch]]></category>
      <category><![CDATA[Observability]]></category>
      <category><![CDATA[Log Analytics]]></category>
      <dc:creator><![CDATA[Itamar Syn-Hershko]]></dc:creator>
      <pubDate>Mon, 11 May 2026 10:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/assets-blog/opensearch-ppl-examples.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[Fine-Tuning LLMs in 2026: When RAG Isn't Enough (and When It Still Is)]]></title>
      <description><![CDATA[Most teams should not fine-tune. This guide explains when fine-tuning actually beats prompting and RAG, why LoRA and QLoRA are the only realistic paths in 2026, and how to decide between SFT, DPO, ORPO, and OpenAI's RFT.]]></description>
      <link>https://bigdataboutique.com/blog/fine-tuning-llms-when-rag-isnt-enough</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/fine-tuning-llms-when-rag-isnt-enough</guid>
      <category><![CDATA[LLM]]></category>
      <category><![CDATA[Fine-Tuning]]></category>
      <category><![CDATA[LoRA]]></category>
      <category><![CDATA[QLoRA]]></category>
      <category><![CDATA[GenAI]]></category>
      <category><![CDATA[RAG]]></category>
      <pubDate>Sun, 10 May 2026 08:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/assets-blog/fine-tuning-llms-2026.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[Datadog Alternatives: Open-Source and Commercial Options for Monitoring and Observability]]></title>
      <description><![CDATA[A practical comparison of Datadog alternatives for engineering teams. Covers open-source options like OpenSearch, the Grafana/Prometheus stack, and ELK, plus commercial platforms like New Relic and Dynatrace - with pricing models, trade-offs, and a decision framework.]]></description>
      <link>https://bigdataboutique.com/blog/datadog-alternatives-open-source-and-commercial-monitoring-options</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/datadog-alternatives-open-source-and-commercial-monitoring-options</guid>
      <category><![CDATA[OpenSearch]]></category>
      <category><![CDATA[Observability]]></category>
      <category><![CDATA[OpenTelemetry]]></category>
      <pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/assets-blog/datadog-alternatives.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[OpenSearch and Elasticsearch Pricing Guide: AWS, Elastic Cloud, and Self-Managed Costs Compared]]></title>
      <description><![CDATA[A detailed breakdown of Amazon OpenSearch Service pricing, OpenSearch Serverless OCU costs, Elastic Cloud subscription tiers, and self-managed Elasticsearch expenses - with concrete numbers and cost optimization strategies for 2026.]]></description>
      <link>https://bigdataboutique.com/blog/opensearch-and-elasticsearch-pricing-guide</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/opensearch-and-elasticsearch-pricing-guide</guid>
      <category><![CDATA[OpenSearch]]></category>
      <category><![CDATA[Amazon OpenSearch Service]]></category>
      <category><![CDATA[Elasticsearch]]></category>
      <category><![CDATA[Elastic Cloud]]></category>
      <category><![CDATA[AWS]]></category>
      <pubDate>Tue, 05 May 2026 10:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/assets-blog/opensearch-pricing-guide.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[EverC: Regaining Control of Data Platform Costs on AWS with EMR on EKS]]></title>
      <description><![CDATA[How EverC cut data platform costs on AWS by migrating Spark workloads to EMR on EKS - without disrupting analyst workflows or operational flexibility.]]></description>
      <link>https://bigdataboutique.com/blog/everc-regaining-control-of-data-platform-costs-with-emr-on-eks</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/everc-regaining-control-of-data-platform-costs-with-emr-on-eks</guid>
      <category><![CDATA[AWS]]></category>
      <category><![CDATA[EMR]]></category>
      <category><![CDATA[EKS]]></category>
      <category><![CDATA[Spark]]></category>
      <category><![CDATA[Cost Optimization]]></category>
      <category><![CDATA[Kubernetes]]></category>
      <dc:creator><![CDATA[Zevi Reinitz]]></dc:creator>
      <pubDate>Thu, 30 Apr 2026 14:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/everc-emr-on-eks.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[Zero-Downtime ECS Deployments with Automatic PostgreSQL Migrations]]></title>
      <description><![CDATA[How to safely sequence Alembic migrations ahead of ECS rolling deployments using ECR EventBridge events, Step Functions, and digest-pinned Fargate tasks - no CI runner required after the image push.]]></description>
      <link>https://bigdataboutique.com/blog/zero-downtime-ecs-deployments-with-postgresql-migrations</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/zero-downtime-ecs-deployments-with-postgresql-migrations</guid>
      <category><![CDATA[AWS]]></category>
      <category><![CDATA[ECS]]></category>
      <category><![CDATA[PostgreSQL]]></category>
      <category><![CDATA[DevOps]]></category>
      <category><![CDATA[Infrastructure]]></category>
      <dc:creator><![CDATA[Kobi Lemberg]]></dc:creator>
      <pubDate>Wed, 29 Apr 2026 14:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/ecs-postgresql-update.webp" length="0" type="image/webp" />
    </item>
    <item>
      <title><![CDATA[S3 Vectors with OpenSearch: Cost-Efficient Hybrid Vector Search]]></title>
      <description><![CDATA[Amazon OpenSearch Service can back vector indexes with S3 Vectors for low-cost hybrid search at scale. This guide covers setup, query examples, limits, and the cost-latency trade-off.]]></description>
      <link>https://bigdataboutique.com/blog/opensearch-with-s3-vectors-cost-efficient-hybrid-search</link>
      <guid isPermaLink="false">https://bigdataboutique.com/blog/opensearch-with-s3-vectors-cost-efficient-hybrid-search</guid>
      <category><![CDATA[OpenSearch]]></category>
      <category><![CDATA[Amazon OpenSearch Service]]></category>
      <category><![CDATA[Vector Search]]></category>
      <category><![CDATA[RAG]]></category>
      <category><![CDATA[AWS S3]]></category>
      <dc:creator><![CDATA[Lior Friedler]]></dc:creator>
      <pubDate>Tue, 28 Apr 2026 14:00:00 GMT</pubDate>
      <enclosure url="https://bigdataboutique.com/blog-images/opensearch-with-s3-vectors-cost-efficient-hybrid-search.webp" length="0" type="image/webp" />
    </item>
  </channel>
</rss>