Real-Time Feature Store Architecture for MLOps
Your model scores 95% in the notebook. In production, it quietly degrades to 78% — and nobody notices for three weeks. The culprit isn't the model. It's the features.
Real-time feature stores solve the hardest infrastructure problem in production ML: serving consistent, fresh features at low latency while keeping training and serving pipelines in sync. If you're building anything beyond batch-only models, this is the architecture layer you can't skip.
TL;DR for Busy Architects
- A feature store separates feature engineering from model training and model serving — creating a reusable, governed feature layer
- The core architectural pattern is a dual-store design: an offline store for training (columnar, high-throughput) and an online store for inference (key-value, low-latency)
- Training-serving skew — the #1 silent killer of ML systems — is eliminated by computing features once and serving them consistently
- After Databricks' acquisition of Tecton (~$900M, Aug 2025), the feature store landscape has consolidated around Databricks/Tecton, Feast (open-source), SageMaker Feature Store, and Vertex AI Feature Store
- Your choice depends on cloud commitment, latency requirements, and whether you need streaming transformations
Why Feature Stores Exist: The Problem They Solve
Most ML teams start the same way: a data scientist writes feature logic in a notebook, trains a model, hands it off. The engineering team rewrites that logic in a serving language. Now you have two implementations of the same feature — one in Python/Pandas, one in Java/Go/SQL.
They drift. Silently.
This is training-serving skew, and it's responsible for more production ML failures than bad models. A feature store eliminates this by becoming the single source of truth for feature computation, storage, and retrieval.
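The "single source of truth" idea can be sketched minimally: one feature function, imported by both the training pipeline and the serving layer, instead of two divergent implementations. The function and field names here are illustrative, not a specific feature store API.

```python
# One feature definition shared by training and serving; skew becomes
# impossible because there is nothing to drift apart.
from datetime import datetime, timezone

def days_since_signup(signup_ts: datetime, as_of: datetime) -> int:
    """Feature logic defined once; both pipelines call this exact function."""
    return (as_of - signup_ts).days

# Training pipeline: evaluated against historical timestamps.
train_value = days_since_signup(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 3, 1, tzinfo=timezone.utc),
)

# Serving pipeline: the same function, evaluated at request time.
serve_value = days_since_signup(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime.now(timezone.utc),
)
print(train_value)  # 60
```

In a real system this function would live in a shared package (or be registered as an on-demand transform in the feature store) rather than being copy-pasted into each pipeline.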
Beyond skew prevention, feature stores solve four other architectural problems:
| Problem | Without Feature Store | With Feature Store |
|---|---|---|
| Feature reuse | Each team rebuilds the same features | Compute once, share across models |
| Point-in-time correctness | Accidental data leakage in training | Time-travel queries prevent future data from leaking into past |
| Freshness guarantees | Batch features updated daily at best | Streaming pipelines deliver sub-second freshness |
| Feature discovery | "Does anyone have a user_churn_score feature?" (Slack message) | Searchable catalog with lineage and metadata |
Core Architecture: The Dual-Store Pattern
Every production feature store follows the same fundamental pattern — regardless of vendor. Understanding this pattern is more important than picking a tool.
[Architecture diagram: Data Sources → Feature Engineering (batch, streaming, on-demand) → Dual Store (offline + online) → Consumers]
① Data Sources feed raw events and records into the feature engineering layer — transactional databases, event streams (Kafka, Kinesis), and data lake tables.
② Feature Engineering computes features through three patterns: batch (scheduled Spark/SQL jobs), streaming (continuous Flink/Spark Streaming), and on-demand (computed at request time for features that can't be pre-materialized).
③ The Dual Store is the architectural core. The offline store (columnar formats like Delta Lake or Parquet) serves training with point-in-time correct historical data. The online store (key-value stores like Redis or DynamoDB) serves inference with the latest feature values at sub-10ms latency.
④ Consumers pull features consistently from both stores — training jobs read from offline, inference services read from online, monitoring reads from both.
Offline Store: Optimized for Training
The offline store holds the full feature history. It answers the question: "What were this user's features at 3:47 PM on March 12th?"
This point-in-time correctness is non-negotiable for valid model training. Without it, future information leaks into training data (label leakage), and your model performs unrealistically well in backtests but fails in production.
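A point-in-time lookup can be sketched in a few lines: for each training example, take the most recent feature value at or before the label's timestamp, never after it. The data layout and helper name are illustrative, not a vendor API.

```python
# Point-in-time correctness: given a feature's history, return the value
# that was live at a given timestamp. Picking any later value leaks the
# future into training data.
from bisect import bisect_right

# Feature history for one entity: (timestamp, value), sorted by timestamp.
history = [(100, 0.2), (200, 0.5), (300, 0.9)]

def as_of(history, ts):
    """Latest feature value whose timestamp is <= ts, or None if none exists."""
    idx = bisect_right([t for t, _ in history], ts)
    return history[idx - 1][1] if idx > 0 else None

print(as_of(history, 250))  # 0.5 (using the t=300 value would leak the future)
print(as_of(history, 50))   # None (no feature value existed yet)
```

Offline stores run this same logic at scale as a point-in-time join across millions of entities; Delta Lake and Iceberg expose time travel to make it efficient.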
Technology choices:
- Delta Lake / Iceberg — Best for Lakehouse architectures. Time-travel built in. Works with Spark natively.
- BigQuery — Strong choice for GCP-native teams. Partitioned tables with snapshot semantics.
- S3 + Parquet + Athena — Budget option for AWS. Works but requires more glue code for time-travel.
Online Store: Optimized for Serving
The online store holds only the latest feature values per entity (user, product, session). It answers: "What are this user's features right now?"
Latency targets: p99 < 10ms for most real-time ML use cases (fraud detection, recommendations, pricing).
Technology choices:
- Redis / Valkey — Lowest latency, highest cost per GB. Best for small-to-medium feature sets.
- DynamoDB — AWS-native, auto-scaling, predictable pricing at scale. Good for large feature sets.
- Bigtable — GCP-native, excellent for wide rows with many features per entity.
- Cassandra — Multi-cloud, open-source. Higher operational burden but no vendor lock-in.
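Whatever backend you pick, the online store's data model is the same: the latest feature values per entity under a composite key. A sketch, with an in-memory dict standing in for Redis or DynamoDB and an assumed `entity_type:entity_id` key scheme:

```python
# Online store layout: one row of current feature values per entity,
# addressed by a composite key (mimics a Redis hash per entity).
online_store = {}

def write_features(entity_type, entity_id, features: dict):
    online_store[f"{entity_type}:{entity_id}"] = dict(features)

def read_features(entity_type, entity_id, names):
    row = online_store.get(f"{entity_type}:{entity_id}", {})
    return {n: row.get(n) for n in names}

write_features("user", "u_42", {"txn_count_30d": 42, "account_age_days": 847})
print(read_features("user", "u_42", ["txn_count_30d"]))  # {'txn_count_30d': 42}
```

The read path is a single key lookup, which is why sub-10ms p99 is achievable: no joins, no scans, no computation.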
Three Feature Freshness Patterns
Not every feature needs sub-second freshness. Choosing the right pattern per feature is the most impactful architectural decision you'll make.
Pattern 1: Batch Features (Minutes to Hours)
Computed on a schedule (hourly, daily). Examples: 30-day purchase count, average session duration, credit score.
Architecture: Spark/SQL job → writes to offline store → materialization job copies latest values to online store.
When to use: Features that change slowly, where staleness of hours is acceptable. This covers 70-80% of features in most systems.
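The batch pattern boils down to two steps: a scheduled aggregation, then materialization of the latest values into the online store. A minimal sketch with plain Python standing in for the Spark/SQL job and a dict standing in for the online store:

```python
# Step 1: scheduled aggregation over a 30-day window.
from collections import Counter
from datetime import datetime, timedelta, timezone

now = datetime(2024, 6, 30, tzinfo=timezone.utc)
events = [
    {"user": "u_1", "ts": now - timedelta(days=5)},
    {"user": "u_1", "ts": now - timedelta(days=40)},  # outside the window
    {"user": "u_2", "ts": now - timedelta(days=1)},
]
window_start = now - timedelta(days=30)
counts = Counter(e["user"] for e in events if e["ts"] >= window_start)

# Step 2: materialize the latest value per entity into the online store.
online_store = {f"user:{u}": {"purchase_count_30d": c} for u, c in counts.items()}
print(online_store["user:u_1"])  # {'purchase_count_30d': 1}
```

The full history (including the expired event) stays in the offline store for training; only the current window result reaches the serving path.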
Pattern 2: Streaming Features (Seconds to Minutes)
Computed continuously from event streams. Examples: rolling 5-minute click count, real-time cart value, session event count.
Architecture: Kafka/Kinesis → Flink/Spark Streaming → writes to both offline and online stores simultaneously.
When to use: Features derived from user behavior in active sessions. Fraud detection, real-time personalization, dynamic pricing.
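Under the hood, a streaming feature like the rolling 5-minute click count is windowed state updated per event. A sketch with a deque standing in for the state that Flink or Spark Streaming would manage for you:

```python
# Rolling-window count: append each event, evict anything older than the
# window, and the current count is the feature value.
from collections import deque

WINDOW_SECONDS = 300  # 5-minute window

class RollingCount:
    def __init__(self):
        self.timestamps = deque()

    def update(self, ts: float) -> int:
        """Record an event at time ts and return the in-window count."""
        self.timestamps.append(ts)
        while self.timestamps and ts - self.timestamps[0] > WINDOW_SECONDS:
            self.timestamps.popleft()
        return len(self.timestamps)

feature = RollingCount()
feature.update(0)
feature.update(100)
print(feature.update(350))  # 2 (the event at t=0 has aged out of the window)
```

In production the stream processor writes each updated count to the online store, so inference never waits on the window computation itself.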
Pattern 3: On-Demand Features (Request-Time)
Computed at inference time, not pre-materialized. Examples: time since last event, geo-distance between user and merchant, request-specific embeddings.
Architecture: Feature logic runs inside the serving layer when a prediction request arrives. Results may be cached but aren't stored long-term.
When to use: Features that depend on request context (the incoming transaction itself), features that change too fast to pre-compute, or features with too many entity combinations to store.
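Geo-distance is the canonical on-demand feature: the merchant's coordinates arrive with the request, so the value cannot be pre-materialized. A sketch using the haversine formula, with illustrative coordinates:

```python
# On-demand feature: great-circle distance between two points, computed
# inside the serving layer when the prediction request arrives.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Computed per request, never stored long-term.
distance = haversine_km(52.52, 13.405, 48.8566, 2.3522)  # Berlin to Paris
print(round(distance))  # roughly 878 km
```

Note the latency trade-off from the table below: this costs a few milliseconds of compute per request rather than a key lookup, which is why on-demand is reserved for features that genuinely cannot be pre-computed.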
| Pattern | Freshness | Latency Added | Compute Cost | Complexity |
|---|---|---|---|---|
| Batch | Minutes–hours | ~0ms (pre-materialized) | Low (scheduled) | Low |
| Streaming | Seconds | ~0ms (pre-materialized) | Medium–high (always-on) | Medium |
| On-demand | Real-time | 5–50ms (computed) | Per-request | High |
The pragmatic approach: Start with batch for everything. Promote to streaming only when staleness measurably hurts model performance. Use on-demand only when pre-computation is impossible.
Feature Store Comparison: 2026 Landscape
The Databricks acquisition of Tecton (August 2025) reshaped this market significantly. Here's where things stand.
Databricks Feature Store (+ Tecton)
Since acquiring Tecton for approximately $900M, Databricks has integrated Tecton's real-time feature serving into the Lakehouse platform. Features are now first-class citizens in Unity Catalog, with lineage tracked alongside tables and models in MLflow.
Strengths:
- Deepest integration with Delta Lake, Unity Catalog, and MLflow
- Tecton's streaming transformation engine now built-in
- Single governance layer for data and features
- Batch + streaming + on-demand all supported natively
Trade-offs:
- Databricks lock-in — features live in the Lakehouse ecosystem
- Pricing can be steep for streaming features (always-on compute) [PRICING-CHECK]
- Smaller community than Feast for custom extensions
Best for: Teams already on Databricks who want a fully managed, integrated feature platform.
Feast (Open Source)
Feast remains the leading open-source feature store. It's provider-agnostic — you bring your own offline store (BigQuery, Snowflake, Redshift, file-based) and online store (Redis, DynamoDB, SQLite, Postgres).
Strengths:
- Cloud-agnostic — runs anywhere, no vendor lock-in
- Strong community, extensive provider ecosystem
- Simple deployment (pip install, no dedicated infrastructure)
- Good for teams that already have feature engineering pipelines
Trade-offs:
- No built-in streaming transformations — you manage Flink/Spark Streaming separately
- Feature computation is your responsibility; Feast handles storage and serving
- Monitoring and observability require additional tooling
- Enterprise support is limited (no backing company since Tecton acquisition)
Best for: Multi-cloud teams, organizations that want full control, teams with existing feature pipelines that need a serving layer.
SageMaker Feature Store (AWS)
Tightly integrated with the SageMaker ML platform. Offers both online and offline stores with automatic syncing.
Strengths:
- Native AWS integration (IAM, S3, Glue, Athena)
- Automatic offline-online sync
- Feature groups with schema enforcement
- SageMaker Pipelines integration for end-to-end MLOps
Trade-offs:
- AWS-only — no multi-cloud story
- Limited streaming support compared to Databricks/Tecton
- Online store latency higher than Redis-based alternatives (~10-20ms p99) [VERIFY]
- Less flexible than Feast for custom backends
Best for: All-in AWS teams using SageMaker for training and deployment.
Vertex AI Feature Store (GCP)
Google's managed feature store, integrated with Vertex AI and BigQuery.
Strengths:
- Native BigQuery integration as offline store
- Bigtable-backed online serving (low latency at scale)
- Feature monitoring with drift detection built-in
- Strong integration with Vertex AI Pipelines
Trade-offs:
- GCP-only
- Less mature streaming support than Databricks
- Pricing complexity — multiple meters (storage, serving, sync) [PRICING-CHECK]
- Smaller ecosystem than Feast or Databricks
Best for: GCP-native teams using Vertex AI for ML workflows.
Comparison Matrix
| Capability | Databricks + Tecton | Feast (OSS) | SageMaker FS | Vertex AI FS |
|---|---|---|---|---|
| Cloud | Multi (Databricks-hosted) | Any | AWS only | GCP only |
| Offline store | Delta Lake | Pluggable | S3 + Glue | BigQuery |
| Online store | Integrated (Tecton) | Pluggable | Proprietary | Bigtable |
| Streaming transforms | ✅ Native | ❌ BYO | ⚠️ Limited | ⚠️ Limited |
| On-demand transforms | ✅ | ⚠️ Alpha | ❌ | ❌ |
| Feature catalog | Unity Catalog | Basic registry | SageMaker | Vertex AI |
| Point-in-time joins | ✅ | ✅ | ✅ | ✅ |
| Drift detection | ✅ | ❌ BYO | ⚠️ Basic | ✅ |
| Open source | ❌ | ✅ | ❌ | ❌ |
| Pricing model | DBU-based | Free (infra costs) | Per GB + requests | Per GB + requests |
Decision Framework: When to Choose What
Skip the feature matrix — here's how to actually decide.
Do You Even Need a Feature Store?
No, if:
- You have fewer than 5 models in production
- All your models use batch features with daily freshness
- A single team owns all ML — no feature sharing needed
- You're pre-product-market-fit and iterating on models weekly
Yes, if:
- Multiple teams need the same features (user profiles, product attributes)
- You need real-time features (sub-minute freshness)
- Training-serving skew has caused production incidents
- You're scaling beyond 10+ production models
Which One?
```
START
│
├─ Already on Databricks? ──→ Databricks Feature Store (post-Tecton)
│
├─ All-in on AWS SageMaker? ──→ SageMaker Feature Store
│
├─ All-in on GCP Vertex AI? ──→ Vertex AI Feature Store
│
├─ Multi-cloud or want control? ──→ Feast
│
└─ Need streaming transforms + no Databricks? ──→ Feast + Flink (self-managed)
```
The honest take: If you're on Databricks, the Tecton integration makes it hard to justify anything else. If you're not, Feast gives you the most flexibility. The cloud-native options (SageMaker, Vertex AI) are fine if you're fully committed to one cloud and want minimal operational overhead.
Architecture Anti-Patterns to Avoid
❌ Recomputing features in the serving path
If your model endpoint runs a SQL query against your data warehouse for every prediction, you don't have a feature store — you have a latency problem. Pre-materialize into an online store.
❌ Separate feature logic for training and serving
The moment you have features_training.py and features_serving.java, you've created a skew factory. One definition, two stores.
❌ Storing raw data in the online store
The online store should hold computed features, not raw events. Raw events belong in your event stream or data lake. The online store is a serving cache, not a database.
❌ Over-engineering freshness
Not every feature needs streaming. If your "user_lifetime_value" feature updates daily and your model works fine with that, don't build a streaming pipeline for it. Reserve streaming compute budget for features where freshness actually moves the needle.
❌ Skipping point-in-time joins in training
If your training pipeline joins features by entity ID without respecting event timestamps, your model is seeing the future. This is the most common and most damaging mistake in feature engineering.
Real-World Architecture: Fraud Detection
To make this concrete, here's how a real-time fraud detection system typically uses a feature store:
| Feature | Pattern | Freshness | Example |
|---|---|---|---|
| Account age | Batch | Daily | 847 days |
| 30-day transaction count | Batch | Hourly | 42 |
| 5-min transaction velocity | Streaming | Seconds | 3 txns in last 5 min |
| Geo-distance to merchant | On-demand | Real-time | 2.3 km |
| Device fingerprint match | On-demand | Real-time | 0.92 similarity |
The model receives all five features in a single API call to the feature store's online serving endpoint. Total latency budget: < 15ms. The feature store handles the complexity of fetching from different storage backends and freshness tiers.
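Assembling that feature vector is a merge of two sources: the pre-materialized rows fetched from the online store, plus on-demand values computed from the request itself. A sketch with hypothetical feature names and a dict standing in for the online store:

```python
# Fraud-detection feature vector: batch and streaming features come
# pre-materialized from the online store; on-demand features are derived
# from the incoming transaction at request time.
online_store = {
    "user:u_42": {
        "account_age_days": 847,   # batch, daily
        "txn_count_30d": 42,       # batch, hourly
        "txn_velocity_5m": 3,      # streaming, seconds
    },
}

def get_feature_vector(user_id, request):
    row = online_store.get(f"user:{user_id}", {})
    return {
        **row,                          # pre-materialized tiers
        "txn_amount": request["amount"],  # on-demand: from the request itself
    }

vector = get_feature_vector("u_42", {"amount": 129.99})
print(vector["account_age_days"], vector["txn_amount"])  # 847 129.99
```

In a managed feature store the merge happens server-side behind one serving endpoint; the model never knows which freshness tier each feature came from.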
What's Next: Feature Stores and AI Agents
The Databricks-Tecton acquisition wasn't just about ML models — it was explicitly about powering AI agents with real-time data. As LLM-based agents become production workloads, they need the same feature infrastructure that ML models do: fresh context, low latency, and consistent data.
Expect feature stores to evolve from "ML feature serving" to "real-time context serving" — providing any AI system (model or agent) with the entity-level context it needs to make decisions.
If you're exploring how your existing data sources feed into ML features, tools like Harbinger Explorer can help you profile and query raw API data directly in the browser — useful for the discovery phase before formalizing features in a store.
Key Takeaways
The dual-store architecture (offline for training, online for serving) is the proven pattern. Start with batch features, add streaming only when freshness measurably improves model performance, and use on-demand transforms as a last resort for request-dependent features.
Your next step: audit your current ML serving pipeline. If feature logic exists in more than one place, you have a skew problem — and a feature store should be your next infrastructure investment.
Continue Reading
- Cloud-Native ETL Patterns: Glue vs Data Factory vs Dataflow
- Event Streaming Architecture: Kafka vs Kinesis vs Pub/Sub vs Event Hubs
- Data Platform Observability: Metrics, Logs, and Traces
Markers:
- [VERIFY] SageMaker Feature Store online latency p99 ~10-20ms — needs verification against current AWS docs
- [PRICING-CHECK] Databricks Feature Store streaming compute pricing post-Tecton integration
- [PRICING-CHECK] Vertex AI Feature Store multi-meter pricing (storage, serving, sync)