Real-Time Feature Store Architecture for MLOps
Your model scores 95% in the notebook. In production, it quietly degrades to 78% — and nobody notices for three weeks. The culprit isn't the model. It's the features.
Real-time feature stores solve the hardest infrastructure problem in production ML: serving consistent, fresh features at low latency while keeping training and serving pipelines in sync. If you're building anything beyond batch-only models, this is the architecture layer you can't skip.
TL;DR for Busy Architects
- A feature store separates feature engineering from model training and model serving — creating a reusable, governed feature layer
- The core architectural pattern is a dual-store design: an offline store for training (columnar, high-throughput) and an online store for inference (key-value, low-latency)
- Training-serving skew — the #1 silent killer of ML systems — is eliminated by computing features once and serving them consistently
- After Databricks' acquisition of Tecton (~$900M, Aug 2025), the feature store landscape has consolidated around Databricks/Tecton, Feast (open-source), SageMaker Feature Store, and Vertex AI Feature Store
- Your choice depends on cloud commitment, latency requirements, and whether you need streaming transformations
Why Feature Stores Exist: The Problem They Solve
Most ML teams start the same way: a data scientist writes feature logic in a notebook, trains a model, hands it off. The engineering team rewrites that logic in a serving language. Now you have two implementations of the same feature — one in Python/Pandas, one in Java/Go/SQL.
They drift. Silently.
This is training-serving skew, and it's responsible for more production ML failures than bad models. A feature store eliminates this by becoming the single source of truth for feature computation, storage, and retrieval.
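The "single source of truth" idea can be sketched minimally: one feature function, imported by both the training pipeline and the serving layer, instead of two divergent implementations. The function and field names here are illustrative, not a specific feature store API.

```python
# One feature definition shared by training and serving; skew becomes
# impossible because there is nothing to drift apart.
from datetime import datetime, timezone

def days_since_signup(signup_ts: datetime, as_of: datetime) -> int:
    """Feature logic defined once; both pipelines call this exact function."""
    return (as_of - signup_ts).days

# Training pipeline: evaluated against historical timestamps.
train_value = days_since_signup(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 3, 1, tzinfo=timezone.utc),
)

# Serving pipeline: the same function, evaluated at request time.
serve_value = days_since_signup(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime.now(timezone.utc),
)
print(train_value)  # 60
```

In a real system this function would live in a shared package (or be registered as an on-demand transform in the feature store) rather than being copy-pasted into each pipeline.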
Beyond skew prevention, feature stores solve four other architectural problems:
| Problem | Without Feature Store | With Feature Store |
|---|---|---|
| Feature reuse | Each team rebuilds the same features | Compute once, share across models |
| Point-in-time correctness | Accidental data leakage in training | Time-travel queries prevent future data from leaking into past |
| Freshness guarantees | Batch features updated daily at best | Streaming pipelines deliver sub-second freshness |
| Feature discovery | "Does anyone have a user_churn_score feature?" (Slack message) | Searchable catalog with lineage and metadata |
Core Architecture: The Dual-Store Pattern
Every production feature store follows the same fundamental pattern — regardless of vendor. Understanding this pattern is more important than picking a tool.
[Architecture diagram: Data Sources → Feature Engineering (batch, streaming, on-demand) → Dual Store (offline + online) → Consumers]
① Data Sources feed raw events and records into the feature engineering layer — transactional databases, event streams (Kafka, Kinesis), and data lake tables.
② Feature Engineering computes features through three patterns: batch (scheduled Spark/SQL jobs), streaming (continuous Flink/Spark Streaming), and on-demand (computed at request time for features that can't be pre-materialized).
③ The Dual Store is the architectural core. The offline store (columnar formats like Delta Lake or Parquet) serves training with point-in-time correct historical data. The online store (key-value stores like Redis or DynamoDB) serves inference with the latest feature values at sub-10ms latency.
④ Consumers pull features consistently from both stores — training jobs read from offline, inference services read from online, monitoring reads from both.
Offline Store: Optimized for Training
The offline store holds the full feature history. It answers the question: "What were this user's features at 3:47 PM on March 12th?"
This point-in-time correctness is non-negotiable for valid model training. Without it, future information leaks into training data (label leakage), and your model performs unrealistically well in backtests but fails in production.
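A point-in-time lookup can be sketched in a few lines: for each training example, take the most recent feature value at or before the label's timestamp, never after it. The data layout and helper name are illustrative, not a vendor API.

```python
# Point-in-time correctness: given a feature's history, return the value
# that was live at a given timestamp. Picking any later value leaks the
# future into training data.
from bisect import bisect_right

# Feature history for one entity: (timestamp, value), sorted by timestamp.
history = [(100, 0.2), (200, 0.5), (300, 0.9)]

def as_of(history, ts):
    """Latest feature value whose timestamp is <= ts, or None if none exists."""
    idx = bisect_right([t for t, _ in history], ts)
    return history[idx - 1][1] if idx > 0 else None

print(as_of(history, 250))  # 0.5 (using the t=300 value would leak the future)
print(as_of(history, 50))   # None (no feature value existed yet)
```

Offline stores run this same logic at scale as a point-in-time join across millions of entities; Delta Lake and Iceberg expose time travel to make it efficient.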
Technology choices:
- Delta Lake / Iceberg — Best for Lakehouse architectures. Time-travel built in. Works with Spark natively.
- BigQuery — Strong choice for GCP-native teams. Partitioned tables with snapshot semantics.
- S3 + Parquet + Athena — Budget option for AWS. Works but requires more glue code for time-travel.
Online Store: Optimized for Serving
The online store holds only the latest feature values per entity (user, product, session). It answers: "What are this user's features right now?"
Latency targets: p99 < 10ms for most real-time ML use cases (fraud detection, recommendations, pricing).
Technology choices:
- Redis / Valkey — Lowest latency, highest cost per GB. Best for small-to-medium feature sets.
- DynamoDB — AWS-native, auto-scaling, predictable pricing at scale. Good for large feature sets.
- Bigtable — GCP-native, excellent for wide rows with many features per entity.
- Cassandra — Multi-cloud, open-source. Higher operational burden but no vendor lock-in.
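Whatever backend you pick, the online store's data model is the same: the latest feature values per entity under a composite key. A sketch, with an in-memory dict standing in for Redis or DynamoDB and an assumed `entity_type:entity_id` key scheme:

```python
# Online store layout: one row of current feature values per entity,
# addressed by a composite key (mimics a Redis hash per entity).
online_store = {}

def write_features(entity_type, entity_id, features: dict):
    online_store[f"{entity_type}:{entity_id}"] = dict(features)

def read_features(entity_type, entity_id, names):
    row = online_store.get(f"{entity_type}:{entity_id}", {})
    return {n: row.get(n) for n in names}

write_features("user", "u_42", {"txn_count_30d": 42, "account_age_days": 847})
print(read_features("user", "u_42", ["txn_count_30d"]))  # {'txn_count_30d': 42}
```

The read path is a single key lookup, which is why sub-10ms p99 is achievable: no joins, no scans, no computation.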
Three Feature Freshness Patterns
Not every feature needs sub-second freshness. Choosing the right pattern per feature is the most impactful architectural decision you'll make.
Pattern 1: Batch Features (Minutes to Hours)
Computed on a schedule (hourly, daily). Examples: 30-day purchase count, average session duration, credit score.
Architecture: Spark/SQL job → writes to offline store → materialization job copies latest values to online store.
When to use: Features that change slowly, where staleness of hours is acceptable. This covers 70-80% of features in most systems.
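The batch pattern boils down to two steps: a scheduled aggregation, then materialization of the latest values into the online store. A minimal sketch with plain Python standing in for the Spark/SQL job and a dict standing in for the online store:

```python
# Step 1: scheduled aggregation over a 30-day window.
from collections import Counter
from datetime import datetime, timedelta, timezone

now = datetime(2024, 6, 30, tzinfo=timezone.utc)
events = [
    {"user": "u_1", "ts": now - timedelta(days=5)},
    {"user": "u_1", "ts": now - timedelta(days=40)},  # outside the window
    {"user": "u_2", "ts": now - timedelta(days=1)},
]
window_start = now - timedelta(days=30)
counts = Counter(e["user"] for e in events if e["ts"] >= window_start)

# Step 2: materialize the latest value per entity into the online store.
online_store = {f"user:{u}": {"purchase_count_30d": c} for u, c in counts.items()}
print(online_store["user:u_1"])  # {'purchase_count_30d': 1}
```

The full history (including the expired event) stays in the offline store for training; only the current window result reaches the serving path.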
Pattern 2: Streaming Features (Seconds to Minutes)
Computed continuously from event streams. Examples: rolling 5-minute click count, real-time cart value, session event count.
Architecture: Kafka/Kinesis → Flink/Spark Streaming → writes to both offline and online stores simultaneously.
When to use: Features derived from user behavior in active sessions. Fraud detection, real-time personalization, dynamic pricing.
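Under the hood, a streaming feature like the rolling 5-minute click count is windowed state updated per event. A sketch with a deque standing in for the state that Flink or Spark Streaming would manage for you:

```python
# Rolling-window count: append each event, evict anything older than the
# window, and the current count is the feature value.
from collections import deque

WINDOW_SECONDS = 300  # 5-minute window

class RollingCount:
    def __init__(self):
        self.timestamps = deque()

    def update(self, ts: float) -> int:
        """Record an event at time ts and return the in-window count."""
        self.timestamps.append(ts)
        while self.timestamps and ts - self.timestamps[0] > WINDOW_SECONDS:
            self.timestamps.popleft()
        return len(self.timestamps)

feature = RollingCount()
feature.update(0)
feature.update(100)
print(feature.update(350))  # 2 (the event at t=0 has aged out of the window)
```

In production the stream processor writes each updated count to the online store, so inference never waits on the window computation itself.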
Pattern 3: On-Demand Features (Request-Time)
Computed at inference time, not pre-materialized. Examples: time since last event, geo-distance between user and merchant, request-specific embeddings.
Architecture: Feature logic runs inside the serving layer when a prediction request arrives. Results may be cached but aren't stored long-term.
When to use: Features that depend on request context (the incoming transaction itself), features that change too fast to pre-compute, or features with too many entity combinations to store.
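Geo-distance is the canonical on-demand feature: the merchant's coordinates arrive with the request, so the value cannot be pre-materialized. A sketch using the haversine formula, with illustrative coordinates:

```python
# On-demand feature: great-circle distance between two points, computed
# inside the serving layer when the prediction request arrives.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Computed per request, never stored long-term.
distance = haversine_km(52.52, 13.405, 48.8566, 2.3522)  # Berlin to Paris
print(round(distance))  # roughly 878 km
```

Note the latency trade-off from the table below: this costs a few milliseconds of compute per request rather than a key lookup, which is why on-demand is reserved for features that genuinely cannot be pre-computed.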
| Pattern | Freshness | Latency Added | Compute Cost | Complexity |
|---|---|---|---|---|
| Batch | Minutes–hours | ~0ms (pre-materialized) | Low (scheduled) | Low |
| Streaming | Seconds | ~0ms (pre-materialized) | Medium–high (always-on) | Medium |
| On-demand | Real-time | 5–50ms (computed) | Per-request | High |
The pragmatic approach: Start with batch for everything. Promote to streaming only when staleness measurably hurts model performance. Use on-demand only when pre-computation is impossible.
Feature Store Comparison: 2026 Landscape
The Databricks acquisition of Tecton (August 2025) reshaped this market significantly. Here's where things stand.
Databricks Feature Store (+ Tecton)
Since acquiring Tecton for approximately $900M, Databricks has integrated Tecton's real-time feature serving into the Lakehouse platform. Features are now first-class citizens in Unity Catalog, with lineage tracked alongside tables and models in MLflow.
Strengths:
- Deepest integration with Delta Lake, Unity Catalog, and MLflow
- Tecton's streaming transformation engine now built-in
- Single governance layer for data and features
- Batch + streaming + on-demand all supported natively
Trade-offs:
- Databricks lock-in — features live in the Lakehouse ecosystem
- Pricing can be steep for streaming features (always-on compute) [PRICING-CHECK]
- Smaller community than Feast for custom extensions
Best for: Teams already on Databricks who want a fully managed, integrated feature platform.
Feast (Open Source)
Feast remains the leading open-source feature store. It's provider-agnostic — you bring your own offline store (BigQuery, Snowflake, Redshift, file-based) and online store (Redis, DynamoDB, SQLite, Postgres).
Strengths:
- Cloud-agnostic — runs anywhere, no vendor lock-in
- Strong community, extensive provider ecosystem
- Simple deployment (pip install, no dedicated infrastructure)
- Good for teams that already have feature engineering pipelines
Trade-offs:
- No built-in streaming transformations — you manage Flink/Spark Streaming separately
- Feature computation is your responsibility; Feast handles storage and serving
- Monitoring and observability require additional tooling
- Enterprise support is limited (no backing company since Tecton acquisition)
Best for: Multi-cloud teams, organizations that want full control, teams with existing feature pipelines that need a serving layer.
SageMaker Feature Store (AWS)
Tightly integrated with the SageMaker ML platform. Offers both online and offline stores with automatic syncing.
Strengths:
- Native AWS integration (IAM, S3, Glue, Athena)
- Automatic offline-online sync
- Feature groups with schema enforcement
- SageMaker Pipelines integration for end-to-end MLOps
Trade-offs:
- AWS-only — no multi-cloud story
- Limited streaming support compared to Databricks/Tecton
- Online store latency higher than Redis-based alternatives (~10-20ms p99) [VERIFY]
- Less flexible than Feast for custom backends
Best for: All-in AWS teams using SageMaker for training and deployment.
Vertex AI Feature Store (GCP)
Google's managed feature store, integrated with Vertex AI and BigQuery.
Strengths:
- Native BigQuery integration as offline store
- Bigtable-backed online serving (low latency at scale)
- Feature monitoring with drift detection built-in
- Strong integration with Vertex AI Pipelines
Trade-offs:
- GCP-only
- Less mature streaming support than Databricks
- Pricing complexity — multiple meters (storage, serving, sync) [PRICING-CHECK]
- Smaller ecosystem than Feast or Databricks
Best for: GCP-native teams using Vertex AI for ML workflows.
Comparison Matrix
| Capability | Databricks + Tecton | Feast (OSS) | SageMaker FS | Vertex AI FS |
|---|---|---|---|---|
| Cloud | Multi (Databricks-hosted) | Any | AWS only | GCP only |
| Offline store | Delta Lake | Pluggable | S3 + Glue | BigQuery |
| Online store | Integrated (Tecton) | Pluggable | Proprietary | Bigtable |
| Streaming transforms | ✅ Native | ❌ BYO | ⚠️ Limited | ⚠️ Limited |
| On-demand transforms | ✅ | ⚠️ Alpha | ❌ | ❌ |
| Feature catalog | Unity Catalog | Basic registry | SageMaker | Vertex AI |
| Point-in-time joins | ✅ | ✅ | ✅ | ✅ |
| Drift detection | ✅ | ❌ BYO | ⚠️ Basic | ✅ |
| Open source | ❌ | ✅ | ❌ | ❌ |
| Pricing model | DBU-based | Free (infra costs) | Per GB + requests | Per GB + requests |
Decision Framework: When to Choose What
Skip the feature matrix — here's how to actually decide.
Do You Even Need a Feature Store?
No, if:
- You have fewer than 5 models in production
- All your models use batch features with daily freshness
- A single team owns all ML — no feature sharing needed
- You're pre-product-market-fit and iterating on models weekly
Yes, if:
- Multiple teams need the same features (user profiles, product attributes)
- You need real-time features (sub-minute freshness)
- Training-serving skew has caused production incidents
- You're scaling beyond 10+ production models
Which One?
```
START
│
├─ Already on Databricks? ──→ Databricks Feature Store (post-Tecton)
│
├─ All-in on AWS SageMaker? ──→ SageMaker Feature Store
│
├─ All-in on GCP Vertex AI? ──→ Vertex AI Feature Store
│
├─ Multi-cloud or want control? ──→ Feast
│
└─ Need streaming transforms + no Databricks? ──→ Feast + Flink (self-managed)
```
The honest take: If you're on Databricks, the Tecton integration makes it hard to justify anything else. If you're not, Feast gives you the most flexibility. The cloud-native options (SageMaker, Vertex AI) are fine if you're fully committed to one cloud and want minimal operational overhead.
Architecture Anti-Patterns to Avoid
❌ Recomputing features in the serving path
If your model endpoint runs a SQL query against your data warehouse for every prediction, you don't have a feature store — you have a latency problem. Pre-materialize into an online store.
❌ Separate feature logic for training and serving
The moment you have features_training.py and features_serving.java, you've created a skew factory. One definition, two stores.
❌ Storing raw data in the online store
The online store should hold computed features, not raw events. Raw events belong in your event stream or data lake. The online store is a serving cache, not a database.
❌ Over-engineering freshness
Not every feature needs streaming. If your "user_lifetime_value" feature updates daily and your model works fine with that, don't build a streaming pipeline for it. Reserve streaming compute budget for features where freshness actually moves the needle.
❌ Skipping point-in-time joins in training
If your training pipeline joins features by entity ID without respecting event timestamps, your model is seeing the future. This is the most common and most damaging mistake in feature engineering.
Real-World Architecture: Fraud Detection
To make this concrete, here's how a real-time fraud detection system typically uses a feature store:
| Feature | Pattern | Freshness | Example |
|---|---|---|---|
| Account age | Batch | Daily | 847 days |
| 30-day transaction count | Batch | Hourly | 42 |
| 5-min transaction velocity | Streaming | Seconds | 3 txns in last 5 min |
| Geo-distance to merchant | On-demand | Real-time | 2.3 km |
| Device fingerprint match | On-demand | Real-time | 0.92 similarity |
The model receives all five features in a single API call to the feature store's online serving endpoint. Total latency budget: < 15ms. The feature store handles the complexity of fetching from different storage backends and freshness tiers.
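Assembling that feature vector is a merge of two sources: the pre-materialized rows fetched from the online store, plus on-demand values computed from the request itself. A sketch with hypothetical feature names and a dict standing in for the online store:

```python
# Fraud-detection feature vector: batch and streaming features come
# pre-materialized from the online store; on-demand features are derived
# from the incoming transaction at request time.
online_store = {
    "user:u_42": {
        "account_age_days": 847,   # batch, daily
        "txn_count_30d": 42,       # batch, hourly
        "txn_velocity_5m": 3,      # streaming, seconds
    },
}

def get_feature_vector(user_id, request):
    row = online_store.get(f"user:{user_id}", {})
    return {
        **row,                          # pre-materialized tiers
        "txn_amount": request["amount"],  # on-demand: from the request itself
    }

vector = get_feature_vector("u_42", {"amount": 129.99})
print(vector["account_age_days"], vector["txn_amount"])  # 847 129.99
```

In a managed feature store the merge happens server-side behind one serving endpoint; the model never knows which freshness tier each feature came from.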
What's Next: Feature Stores and AI Agents
The Databricks-Tecton acquisition wasn't just about ML models — it was explicitly about powering AI agents with real-time data. As LLM-based agents become production workloads, they need the same feature infrastructure that ML models do: fresh context, low latency, and consistent data.
Expect feature stores to evolve from "ML feature serving" to "real-time context serving" — providing any AI system (model or agent) with the entity-level context it needs to make decisions.
If you're exploring how your existing data sources feed into ML features, tools like Harbinger Explorer can help you profile and query raw API data directly in the browser — useful for the discovery phase before formalizing features in a store.
Key Takeaways
The dual-store architecture (offline for training, online for serving) is the proven pattern. Start with batch features, add streaming only when freshness measurably improves model performance, and use on-demand transforms as a last resort for request-dependent features.
Your next step: audit your current ML serving pipeline. If feature logic exists in more than one place, you have a skew problem — and a feature store should be your next infrastructure investment.
Continue Reading
- Cloud-Native ETL Patterns: Glue vs Data Factory vs Dataflow
- Event Streaming Architecture: Kafka vs Kinesis vs Pub/Sub vs Event Hubs
- Data Platform Observability: Metrics, Logs, and Traces
Markers:
- [VERIFY] SageMaker Feature Store online latency p99 ~10-20ms — needs verification against current AWS docs
- [PRICING-CHECK] Databricks Feature Store streaming compute pricing post-Tecton integration
- [PRICING-CHECK] Vertex AI Feature Store multi-meter pricing (storage, serving, sync)