Harbinger Explorer

Engineering
Real-Time Analytics Architecture: Lambda vs Kappa

9 min read · Tags: real-time analytics, lambda architecture, kappa architecture, ClickHouse, Apache Druid, Apache Pinot, OLAP, streaming

Your dashboards are showing yesterday's numbers. Your fraud team is reviewing alerts an hour after the transaction. Your ops team sees incidents in the monitoring tool before the analytics platform does. If that sounds familiar, you have a real-time analytics architecture problem — and the solution starts with choosing between two competing philosophies, then picking the right query engine to serve results fast.

TL;DR

Lambda architecture runs batch and streaming in parallel — accurate but operationally expensive. Kappa architecture unifies everything in a single streaming pipeline — simpler but demanding. For the OLAP serving layer, ClickHouse, Apache Druid, and Apache Pinot each dominate a different use case.


The Core Problem: Processing Latency

Traditional data warehouses are built for batch. Nightly loads, hourly refreshes, multi-hour transformation pipelines. That's fine for trend reporting, but it breaks down when your business needs:

  • Fraud detection at transaction time
  • Live dashboard updates during peak traffic events
  • Real-time inventory tracking across thousands of SKUs
  • Operational monitoring that catches anomalies in seconds

The gap between "event happens in the source system" and "analyst sees it in a dashboard" is processing latency. Cutting that latency means rethinking both how you move data and how you serve it.
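A concrete way to reason about processing latency is to measure it per event: the gap between the event's source timestamp and the moment it became queryable. A minimal sketch (timestamps and field names are illustrative):

```python
from datetime import datetime, timedelta

def processing_latency(event_time: datetime, visible_time: datetime) -> timedelta:
    """Gap between when the event happened and when an analyst could query it."""
    return visible_time - event_time

# Nightly batch load: an event at 09:00 only becomes queryable at 02:00 next day.
event = datetime(2026, 4, 1, 9, 0)
batch_visible = datetime(2026, 4, 2, 2, 0)
stream_visible = datetime(2026, 4, 1, 9, 0, 30)  # streaming path: ~30 s later

batch_lag = processing_latency(event, batch_visible)    # 17 hours
stream_lag = processing_latency(event, stream_visible)  # 30 seconds
```

The architectures below differ exactly in how they drive that number down, and at what operational cost.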

Lambda Architecture: Batch + Speed Layers

Lambda architecture, popularized by Nathan Marz around 2011, solves the latency problem by running two parallel pipelines simultaneously.

[Diagram: Lambda architecture — batch layer and speed layer running in parallel, merged by a serving layer]

① The batch layer reprocesses the full historical dataset on a schedule — accurate, handles late-arriving data, but slow. ② The speed layer processes events in near-real-time, covering the gap since the last batch run. ③ The serving layer merges both views at query time, giving analysts fresh data with eventual accuracy.

The core insight: the speed layer tolerates approximation because the batch layer overwrites it with accurate results periodically. You always have fresh data. You always have accurate data. Just not always at the same time.
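At query time, the serving layer stitches the two views together: batch counts are authoritative up to the last batch run, and the speed layer covers only events since then. A toy sketch of that merge, using dicts in place of real serving-layer stores:

```python
def merge_views(batch_view: dict, speed_view: dict) -> dict:
    """Lambda serving-layer merge: batch counts are ground truth up to the
    last batch run; speed-layer counts cover only events after it."""
    merged = dict(batch_view)
    for key, count in speed_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged

# Batch last ran at noon; the speed layer has counted events since then.
batch_view = {"page:/home": 1000, "page:/pricing": 250}
speed_view = {"page:/home": 42, "page:/signup": 7}

result = merge_views(batch_view, speed_view)
# {"page:/home": 1042, "page:/pricing": 250, "page:/signup": 7}
```

When the next batch run completes, its output replaces the merged counts wholesale, correcting any approximation the speed layer introduced.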

Lambda Trade-offs

| Dimension | Reality |
| --- | --- |
| Latency | Sub-minute (speed layer), hours (batch layer) |
| Accuracy | Batch is ground truth; speed layer may approximate |
| Operational complexity | High — two codebases, two deployment pipelines |
| Debugging | Painful — bugs must be fixed in two places |
| Reprocessing | Efficient via the batch layer |
| Team requirements | Both batch and streaming expertise |

When Lambda works: You already have a mature batch pipeline and are adding streaming on top. Your team has both skill sets. Your aggregations are complex enough to be painful in pure streaming.

When Lambda fails you: Your business logic changes frequently (now you update it twice). You're starting fresh. You don't have the operational capacity to run two systems.

Kappa Architecture: Streaming-Only

Kappa architecture, proposed by Jay Kreps (co-creator of Kafka) in 2014, eliminates the batch layer entirely. Everything is a stream, including reprocessing.

[Diagram: Kappa architecture — a durable log feeding a single stream processor into the serving layer]

① A durable message log (Kafka with extended retention, or an S3-backed log) is the system of record. ② The stream processor handles all transformations — real-time and historical. ③ Reprocessing works by replaying the log through a new version of your streaming job with the same code.

One codebase. One pipeline. Same logic for historical and real-time data.
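Reprocessing in Kappa is just replaying the log through the same job from offset zero. A pure-Python sketch of the idea, where a list stands in for a Kafka topic with long retention:

```python
def run_job(log: list, transform, from_offset: int = 0) -> dict:
    """Kappa-style job: consume the log from an offset, apply one transform,
    and build a fresh output table. Reprocessing = same code, offset 0."""
    state = {}
    for record in log[from_offset:]:
        key, value = transform(record)
        state[key] = state.get(key, 0) + value
    return state

log = [
    {"user": "a", "amount": 10},
    {"user": "b", "amount": 5},
    {"user": "a", "amount": 3},
]

v1 = lambda r: (r["user"], r["amount"])  # version 1: sum amounts per user
v2 = lambda r: (r["user"], 1)            # version 2: count events per user

totals = run_job(log, v1)  # {"a": 13, "b": 5}
counts = run_job(log, v2)  # replay the same log with new logic: {"a": 2, "b": 1}
```

Changing the business logic means deploying a new job version and replaying; there is no second batch codebase to keep in sync.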

Kappa Trade-offs

| Dimension | Reality |
| --- | --- |
| Latency | Sub-minute consistently |
| Accuracy | Depends entirely on stream processor correctness |
| Operational complexity | Lower than Lambda — one pipeline |
| Reprocessing | Possible via log replay, slower than batch at petabyte scale |
| Storage costs | Long Kafka retention adds up quickly |
| Team requirements | Streaming expertise — steeper learning curve |

When Kappa works: Greenfield systems. Teams with real streaming skills. Business logic that changes often. Consistent sub-minute latency SLAs across all query types.

When Kappa struggles: Petabyte-scale historical reprocessing — replaying that through Kafka is painful. Very complex aggregations (full outer joins over unbounded windows). Teams new to streaming.

Lambda vs. Kappa: Direct Comparison

| | Lambda | Kappa |
| --- | --- | --- |
| Processing model | Batch + streaming in parallel | Streaming only |
| Number of codebases | Two | One |
| Historical reprocessing | Fast (batch layer) | Log replay (slower at scale) |
| Operational overhead | High | Moderate |
| Latency profile | Mixed (sub-minute + hours) | Consistently sub-minute |
| Best for | Adding streaming to existing batch | Greenfield real-time systems |

The industry trend since 2020 has been toward Kappa-style architectures. Stream processors have matured. Object storage has made long-term log retention cheaper. And most teams discover that maintaining two parallel codebases is unsustainable. But Lambda remains valid if you have complex historical queries or a large existing batch investment that you can't abandon.

OLAP Engines: ClickHouse, Druid, and Pinot

Both Lambda and Kappa need a fast serving layer — a system that answers analytical queries at low latency against large datasets. The three dominant choices are ClickHouse, Apache Druid, and Apache Pinot. They look similar from the outside, but they're optimized for different things.

ClickHouse

ClickHouse is a column-oriented OLAP database originally built at Yandex, now open source and backed by ClickHouse Inc. It's optimized for scan-heavy analytical queries with a strong emphasis on raw query speed and SQL expressiveness.

Strengths:

  • Exceptional ad-hoc query performance — frequently wins benchmarks against much larger systems
  • Familiar SQL dialect — analysts can query it directly without specialized knowledge
  • Efficient compression and vectorized execution reduce both storage and compute costs
  • Streaming ingestion via the Kafka table engine
  • Managed option: ClickHouse Cloud with consumption-based pricing (last verified April 2026)

Weaknesses:

  • Joins are comparatively slow — works best with denormalized or pre-joined data
  • Streaming ingestion is available but not as low-latency as Druid or Pinot's native paths
  • At extreme scale, cluster management requires expertise

Best for: Ad-hoc analytics, log analytics, time-series dashboards, teams that need fast SQL without high operational complexity. The practical default for most new real-time analytics setups in 2026.
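ClickHouse's Kafka table engine is typically paired with a materialized view that continuously drains messages into a MergeTree table. A sketch of that wiring as DDL strings — the broker, topic, and column names are placeholders, but the Kafka-engine-plus-materialized-view pattern itself is standard ClickHouse:

```python
# Hypothetical topic and column names; adjust to your schema.
KAFKA_SOURCE = """
CREATE TABLE events_queue (ts DateTime, user_id String, action String)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-events',
         kafka_format = 'JSONEachRow'
"""

# Durable storage the dashboards actually query.
TARGET = """
CREATE TABLE events (ts DateTime, user_id String, action String)
ENGINE = MergeTree ORDER BY (ts, user_id)
"""

# The materialized view moves each consumed batch into the target table.
PIPE = """
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT ts, user_id, action FROM events_queue
"""
```

Note that this path consumes in small batches, which is why ClickHouse's ingestion latency trails Druid's and Pinot's truly per-event paths.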


Apache Druid

Apache Druid is a distributed data store built from the ground up for sub-second OLAP queries on real-time and historical event data. It ingests directly from Kafka with data visible in seconds.

Strengths:

  • Native Kafka ingestion — truly real-time, not micro-batch
  • Pre-aggregation (rollup) at ingestion time — stores aggregated metrics, not raw events, enabling extremely fast queries
  • Automatic data tiering: recent data in memory, older data in deep storage (S3/GCS)
  • Proven at massive scale — created at Metamarkets and used in production at companies like Netflix and Lyft
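Rollup means Druid stores one aggregated row per (truncated timestamp, dimension combination) instead of raw events. A pure-Python sketch of what ingestion-time rollup does to the data (field names are illustrative):

```python
from collections import defaultdict

def rollup(events: list, granularity_s: int = 60) -> dict:
    """Collapse raw events into one row per (time bucket, dimensions),
    keeping only aggregates — raw event granularity is gone afterwards."""
    rows = defaultdict(lambda: {"count": 0, "revenue": 0.0})
    for e in events:
        bucket = e["ts"] - e["ts"] % granularity_s  # truncate to the minute
        key = (bucket, e["country"])
        rows[key]["count"] += 1
        rows[key]["revenue"] += e["revenue"]
    return dict(rows)

events = [
    {"ts": 1000, "country": "US", "revenue": 9.0},
    {"ts": 1010, "country": "US", "revenue": 1.0},  # same minute, same dims
    {"ts": 1010, "country": "DE", "revenue": 5.0},
]
table = rollup(events)
# 3 raw events -> 2 stored rows: (960, 'US') and (960, 'DE')
```

This is why queries over rolled-up data are so fast, and also why you can never recover the individual events unless you disable rollup at ingestion.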

Weaknesses:

  • Operational complexity is high — six different node types (Broker, Coordinator, Historical, MiddleManager, Overlord, Router)
  • SQL support is improving but still less expressive than ClickHouse's
  • Rollup destroys raw event granularity unless explicitly disabled
  • Steep learning curve for both setup and query model

Best for: Large-scale event analytics, sub-second dashboards on streaming data, teams building internal analytics at significant scale. If you're not at Druid-warranted scale, the operational cost isn't worth it.


Apache Pinot

Apache Pinot was originally built at LinkedIn and later adopted at Uber. Its design focus is high-concurrency, low-latency queries for user-facing analytics products — think "who viewed your profile" at LinkedIn scale.

Strengths:

  • Excellent under high query concurrency (thousands of QPS)
  • Native Kafka ingestion, similar to Druid
  • Star-Tree index for pre-aggregated queries on high-cardinality dimensions
  • Good tenant isolation — useful for multi-tenant analytics products

Weaknesses:

  • Less mature SQL support compared to ClickHouse
  • Operational complexity comparable to Druid
  • Optimized for predefined query patterns — ad-hoc exploration is not its strength
  • Smaller community than ClickHouse

Best for: User-facing analytics products embedded in applications. If you're building a feature that shows users their own analytics at scale, Pinot is purpose-built for this. If you're building internal dashboards, ClickHouse is likely a better fit.


Engine Comparison

| | ClickHouse | Apache Druid | Apache Pinot |
| --- | --- | --- | --- |
| Primary strength | Ad-hoc SQL speed | Real-time event analytics | High-concurrency user-facing |
| Streaming ingestion | Via Kafka engine | Native (true real-time) | Native (true real-time) |
| Operational complexity | Low–Medium | High | High |
| SQL expressiveness | High | Medium | Medium |
| Pre-aggregation | Optional | Core to design | Optional (Star-Tree) |
| Ad-hoc exploration | Excellent | Limited | Limited |
| Community size | Large | Medium | Medium |
| Best use case | Dashboards, log analytics | Event analytics at scale | User-facing analytics products |

The Practical Decision Path

When designing a real-time analytics stack, work through these questions in order:

1. What's your latency SLA? Sub-second, sub-minute, or sub-hour? This determines whether you need streaming ingestion or whether micro-batch is acceptable.

2. What's already in production? If you have a mature Spark batch pipeline, Lambda (adding a speed layer) is lower risk than a full Kappa rewrite. If you're building fresh, start Kappa.

3. What are your query patterns? Ad-hoc exploration → ClickHouse. Time-series event analytics at scale → Druid. High-concurrency user-facing queries → Pinot.

4. What are your team's streaming skills? Be honest. Kappa with Flink in production requires real expertise. Operators who've never debugged a watermark issue will struggle.
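The four questions above can be collapsed into a rough decision helper. This is a sketch of the reasoning, not a formula — the inputs and cutoffs are illustrative:

```python
def recommend(latency_sla_s: int, has_mature_batch: bool,
              query_pattern: str, streaming_experts: bool) -> tuple:
    """Rough sketch of the decision path; labels and thresholds are illustrative."""
    # Q2 + Q4: existing batch investment or thin streaming skills favor Lambda.
    pipeline = ("lambda (add a speed layer)"
                if has_mature_batch or not streaming_experts
                else "kappa (single streaming pipeline)")
    # Q1: an SLA of an hour or more doesn't need streaming at all.
    if latency_sla_s >= 3600:
        pipeline = "plain batch is probably enough"
    # Q3: query patterns pick the serving engine.
    engine = {
        "ad_hoc": "ClickHouse",
        "event_analytics_at_scale": "Druid",
        "user_facing_high_concurrency": "Pinot",
    }.get(query_pattern, "ClickHouse")
    return pipeline, engine

# Greenfield team, sub-minute SLA, ad-hoc dashboards, real streaming skills:
plan = recommend(60, False, "ad_hoc", True)
# ("kappa (single streaming pipeline)", "ClickHouse")
```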

For most teams in 2026, the pragmatic default is: Kafka → Flink (or Spark Structured Streaming) → ClickHouse. It's a Kappa-style architecture with manageable operational overhead and excellent SQL tooling for analysts.
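The Flink step in that default stack is typically a keyed, windowed aggregation. A pure-Python sketch of a one-minute tumbling-window count — roughly what a `TUMBLE`-based `GROUP BY` expresses in Flink SQL:

```python
from collections import Counter

def tumbling_window_counts(events: list, window_s: int = 60) -> dict:
    """Count events per (window start, key) — the shape of a keyed
    tumbling-window aggregation in a stream processor."""
    counts = Counter()
    for ts, key in events:
        window_start = ts - ts % window_s  # assign each event to its window
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(5, "checkout"), (20, "checkout"), (61, "checkout"), (62, "login")]
out = tumbling_window_counts(events)
# {(0, 'checkout'): 2, (60, 'checkout'): 1, (60, 'login'): 1}
```

The real job additionally handles event-time watermarks and late data, which is precisely the expertise question four is probing for; the windowed results then land in ClickHouse for analysts to query in SQL.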

Exploring Real-Time Data Before the Infrastructure Is Ready

Not every team has a Druid cluster ready to query. While building out your real-time infrastructure, you often need to explore event data quickly — from API exports, CSV snapshots, or uploaded event samples. Harbinger Explorer lets you query that data directly in the browser using DuckDB WASM, with natural-language queries that generate SQL automatically. It won't replace a production OLAP engine, but it removes the friction from exploratory analysis while the real architecture is taking shape.

The Architecture That Actually Gets Built

Lambda vs. Kappa is a genuine engineering choice, not a marketing debate. Lambda is lower risk when you're extending an existing system. Kappa is cleaner for new builds. And your OLAP engine choice matters more than most teams realize — pick it based on query patterns, not benchmarks from a different company's workload.

Define your latency SLA. Audit your team's streaming skills honestly. Then choose the simplest architecture that meets the requirement — not the one that sounds most impressive in a design doc.



