API Gateway Architecture Patterns for Data Platforms

Tags: api-gateway, data-platform, data-mesh, rest-api, rate-limiting, platform-engineering


Data platforms have traditionally served data through direct warehouse connections, JDBC endpoints, and blob storage presigned URLs. As organizations mature, these ad-hoc access patterns create governance nightmares: who has access to what, how much are they querying, and is the API contract stable enough for downstream teams to depend on?

API gateways solve this by inserting a managed control plane between data consumers and data systems. This guide covers the patterns that work in production for data platform teams.


Why Data Platforms Need API Gateways

The case for gateways isn't just about security theatre. It's about operational capability:

| Problem Without Gateway | Gateway Solution |
| --- | --- |
| Direct warehouse connections hit quota limits | Rate limiting per consumer |
| No visibility into who's querying what | Centralized access logging |
| Consumers tightly coupled to warehouse internals | Schema abstraction layer |
| Auth handled ad-hoc per service | Centralized OAuth/API key auth |
| No way to deprecate old schemas safely | API versioning + sunset headers |
| Cross-team data access negotiated manually | Self-service data product APIs |

Gateway Architecture Patterns

Pattern 1: Passthrough Gateway (Simple)

The simplest pattern: gateway handles auth and rate limiting, passes requests directly to the warehouse or data service.


Good for: small teams, internal APIs, early-stage data products.

Limitation: tightly couples API schema to warehouse schema — any warehouse refactor breaks consumers.

Pattern 2: Transformation Gateway (Recommended)

The gateway applies a transformation layer: incoming REST requests are translated into warehouse queries, responses are shaped before returning.


This pattern enables stable API contracts that survive warehouse refactors. The transformation layer owns the mapping between API schema (what consumers see) and warehouse schema (what actually stores the data).
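As an illustration, the mapping the transformation layer owns can be as small as a dictionary plus two functions: one translating validated API filters into a parameterized warehouse query, and one renaming columns back on the way out. This is a minimal sketch; the `orders_fact` table and its column names are hypothetical.

```python
# Minimal sketch of a transformation layer. The API schema (what consumers
# see) maps to the warehouse schema (what stores the data); a warehouse
# refactor only touches this mapping. All names here are hypothetical.

# API field -> warehouse column: the contract the transformation layer owns
API_TO_WAREHOUSE = {
    "order_id": "order_key",
    "placed_at": "order_ts",
    "status": "status_code",
}

def build_query(filters: dict, limit: int = 100) -> tuple:
    """Translate validated API filters into a parameterized warehouse query."""
    clauses, params = [], []
    for api_field, value in filters.items():
        clauses.append(f"{API_TO_WAREHOUSE[api_field]} = %s")
        params.append(value)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    params.append(limit)
    return f"SELECT * FROM orders_fact{where} LIMIT %s", params

def shape_response(rows: list) -> list:
    """Rename warehouse columns back to the stable API schema on the way out."""
    reverse = {col: field for field, col in API_TO_WAREHOUSE.items()}
    return [{reverse.get(k, k): v for k, v in row.items()} for row in rows]
```

Because consumers only ever see the left-hand names, renaming `status_code` in the warehouse is a one-line change here rather than a breaking change for every downstream team.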

Pattern 3: Data Mesh Gateway (Advanced)

In a data mesh architecture, the gateway is the entry point to domain-owned data products. Each domain exposes its data through a standardized API contract; the gateway provides discovery, federation, and cross-domain lineage.


The central gateway in a data mesh context handles:

  • API discovery (which data products exist and what they expose)
  • Cross-domain auth (consumers auth once, gateway negotiates domain permissions)
  • Lineage tracking (which consumers depend on which data products)
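To make the discovery and lineage responsibilities concrete, here is a minimal in-memory sketch of the bookkeeping such a gateway maintains. A real deployment would persist this in a data catalog; the class and method names are illustrative.

```python
# Sketch of the gateway's discovery and lineage bookkeeping (in-memory for
# illustration; production would back this with a catalog).
from collections import defaultdict
from typing import Optional

class DataProductRegistry:
    def __init__(self):
        self.products = {}                 # product name -> metadata
        self.consumers = defaultdict(set)  # product name -> consumer ids

    def register(self, name: str, domain: str, base_path: str) -> None:
        """Domain teams register their data product API with the gateway."""
        self.products[name] = {"domain": domain, "base_path": base_path}

    def discover(self, domain: Optional[str] = None) -> list:
        """API discovery: which data products exist, optionally by domain."""
        return sorted(
            name for name, meta in self.products.items()
            if domain is None or meta["domain"] == domain
        )

    def record_access(self, product: str, consumer_id: str) -> None:
        """Called per request: grows the consumer -> product lineage graph."""
        self.consumers[product].add(consumer_id)

    def dependents(self, product: str) -> set:
        """Lineage query: which consumers break if this product changes?"""
        return set(self.consumers[product])
```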

Harbinger Explorer is well-suited to the discovery and lineage layer in this pattern — it maintains the cross-domain dependency graph that makes data mesh governance tractable.


Implementation: AWS API Gateway + Lambda Authorizer

Terraform Configuration

# API Gateway for Data Platform
resource "aws_api_gateway_rest_api" "data_platform" {
  name        = "data-platform-api"
  description = "Central API gateway for data platform products"

  endpoint_configuration {
    types = ["REGIONAL"]
  }

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = "execute-api:Invoke"
      Resource  = "arn:aws:execute-api:*:*:*"
      Condition = {
        IpAddress = {
          "aws:SourceIp" = var.allowed_cidr_ranges
        }
      }
    }]
  })
}

resource "aws_api_gateway_deployment" "data_platform" {
  rest_api_id = aws_api_gateway_rest_api.data_platform.id

  triggers = {
    redeployment = sha1(jsonencode(aws_api_gateway_rest_api.data_platform.body))
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_stage" "production" {
  deployment_id = aws_api_gateway_deployment.data_platform.id
  rest_api_id   = aws_api_gateway_rest_api.data_platform.id
  stage_name    = "v1"

  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.api_access_log.arn
    format = jsonencode({
      requestId      = "$context.requestId"
      sourceIp       = "$context.identity.sourceIp"
      requestTime    = "$context.requestTime"
      protocol       = "$context.protocol"
      httpMethod     = "$context.httpMethod"
      resourcePath   = "$context.resourcePath"
      routeKey       = "$context.routeKey"
      status         = "$context.status"
      responseLength = "$context.responseLength"
      integrationLatency = "$context.integrationLatency"
      userAgent      = "$context.identity.userAgent"
      # Custom: data platform tracking
      consumerId     = "$context.authorizer.consumerId"
      dataProduct    = "$context.authorizer.dataProduct"
    })
  }

}

# Stage-wide throttling for REST APIs is configured via method settings
# (default_route_settings belongs to HTTP APIs / apigatewayv2 and is not
# valid on aws_api_gateway_stage)
resource "aws_api_gateway_method_settings" "all" {
  rest_api_id = aws_api_gateway_rest_api.data_platform.id
  stage_name  = aws_api_gateway_stage.production.stage_name
  method_path = "*/*"

  settings {
    throttling_burst_limit = 100
    throttling_rate_limit  = 50
  }
}

# Usage plans for rate limiting per consumer tier
resource "aws_api_gateway_usage_plan" "free_tier" {
  name = "data-platform-free"

  api_stages {
    api_id = aws_api_gateway_rest_api.data_platform.id
    stage  = aws_api_gateway_stage.production.stage_name
  }

  throttle_settings {
    burst_limit = 10
    rate_limit  = 5
  }

  quota_settings {
    limit  = 10000
    period = "MONTH"
  }
}

resource "aws_api_gateway_usage_plan" "professional" {
  name = "data-platform-professional"

  api_stages {
    api_id = aws_api_gateway_rest_api.data_platform.id
    stage  = aws_api_gateway_stage.production.stage_name
  }

  throttle_settings {
    burst_limit = 500
    rate_limit  = 100
  }

  quota_settings {
    limit  = 1000000
    period = "MONTH"
  }
}
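A usage plan only throttles consumers that are associated with it. A minimal sketch of attaching an API key to the professional plan (the key name is illustrative):

```hcl
# Hypothetical consumer: associate an API key with the professional plan
resource "aws_api_gateway_api_key" "analytics_team" {
  name = "analytics-team"
}

resource "aws_api_gateway_usage_plan_key" "analytics_team" {
  key_id        = aws_api_gateway_api_key.analytics_team.id
  key_type      = "API_KEY"
  usage_plan_id = aws_api_gateway_usage_plan.professional.id
}
```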

Lambda Authorizer for JWT Validation

# lambda_authorizer.py
import os
from typing import Optional

import jwt
from jwt import PyJWKClient

JWKS_URI = os.environ["JWKS_URI"]  # e.g. https://auth.company.com/.well-known/jwks.json
AUDIENCE = os.environ["TOKEN_AUDIENCE"]

jwks_client = PyJWKClient(JWKS_URI, cache_keys=True)

def handler(event: dict, context) -> dict:
    '''Lambda authorizer: validate JWT and return IAM policy.'''
    token = extract_token(event)
    
    if not token:
        raise Exception("Unauthorized")
    
    try:
        signing_key = jwks_client.get_signing_key_from_jwt(token)
        payload = jwt.decode(
            token,
            signing_key.key,
            algorithms=["RS256"],
            audience=AUDIENCE
        )
    except jwt.ExpiredSignatureError:
        # API Gateway maps the literal message "Unauthorized" to a 401;
        # any other exception message from a TOKEN authorizer becomes a 500
        raise Exception("Unauthorized")
    except jwt.InvalidTokenError:
        raise Exception("Unauthorized")
    
    consumer_id = payload.get("sub")
    scopes = payload.get("scope", "").split()
    
    # Map scopes to API Gateway resource permissions
    policy = build_policy(consumer_id, scopes, event["methodArn"])
    policy["context"] = {
        "consumerId": consumer_id,
        "scopes": " ".join(scopes),
        "dataProduct": extract_data_product(event["methodArn"])
    }
    
    return policy


def extract_token(event: dict) -> Optional[str]:
    auth_header = event.get("authorizationToken", "")
    if auth_header.startswith("Bearer "):
        return auth_header[7:]
    # Fallback for token-in-query-string; the key can be present but null,
    # so guard with `or {}` rather than a .get() default
    return (event.get("queryStringParameters") or {}).get("token")


def build_policy(principal: str, scopes: list, method_arn: str) -> dict:
    # Parse ARN to determine which resources to allow
    arn_parts = method_arn.split(":")
    region = arn_parts[3]
    account = arn_parts[4]
    api_id = arn_parts[5].split("/")[0]
    stage = arn_parts[5].split("/")[1]
    
    allowed_resources = []
    
    # Map scopes to allowed resource paths
    scope_resource_map = {
        "data:orders:read": f"arn:aws:execute-api:{region}:{account}:{api_id}/{stage}/GET/v1/orders*",
        "data:customers:read": f"arn:aws:execute-api:{region}:{account}:{api_id}/{stage}/GET/v1/customers*",
        "data:inventory:read": f"arn:aws:execute-api:{region}:{account}:{api_id}/{stage}/GET/v1/inventory*",
        "data:admin": f"arn:aws:execute-api:{region}:{account}:{api_id}/{stage}/*/*",
    }
    
    for scope in scopes:
        if scope in scope_resource_map:
            allowed_resources.append(scope_resource_map[scope])
    
    return {
        "principalId": principal,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow" if allowed_resources else "Deny",
                "Resource": allowed_resources or [method_arn]
            }]
        }
    }


def extract_data_product(method_arn: str) -> str:
    # The ARN path after the stage looks like METHOD/v1/orders, so the
    # data product name is the segment after the version prefix
    parts = method_arn.split("/")
    if len(parts) >= 5:
        return parts[4]  # e.g. "orders" from .../GET/v1/orders
    return "unknown"

API Versioning Strategy

Data API versioning requires more careful thought than typical REST APIs because downstream consumers often run batch jobs that can't be updated instantaneously.

URL-Based Versioning (Recommended for Data APIs)

/v1/orders          → stable, supported
/v2/orders          → new schema, active development
/v1/orders [Sunset: 2025-09-01] → deprecated, add Sunset header

Always include Sunset and Deprecation headers for deprecated versions:

# FastAPI: data product endpoint with versioning headers
from fastapi import FastAPI, Response
from datetime import datetime

app = FastAPI()

@app.get("/v1/orders")
async def get_orders_v1(response: Response):
    # V1 is deprecated — add sunset headers
    response.headers["Deprecation"] = "true"
    response.headers["Sunset"] = "Mon, 01 Sep 2025 00:00:00 GMT"
    response.headers["Link"] = '</v2/orders>; rel="successor-version"'
    
    # Return V1 schema (legacy format)
    return {"orders": [], "total": 0, "page": 1}

@app.get("/v2/orders")
async def get_orders_v2(
    response: Response,
    date_from: str | None = None,
    date_to: str | None = None,
    status: str | None = None,
    limit: int = 100,
    cursor: str | None = None
):
    # V2: cursor-based pagination, ISO dates, richer filtering
    response.headers["X-Data-Version"] = "2.0"
    
    return {
        "data": [],
        "pagination": {
            "cursor": None,
            "has_more": False,
            "limit": limit
        },
        "meta": {
            "generated_at": datetime.utcnow().isoformat(),
            "data_freshness_seconds": 30
        }
    }
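On the consumer side, batch jobs can check for these headers on every run instead of discovering a removed version at failure time. A minimal sketch, assuming the Sunset value is an HTTP-date as in RFC 8594:

```python
# Sketch of a consumer-side deprecation check: batch jobs call this with the
# response headers and alert while there is still time to migrate.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def days_until_sunset(headers: dict, now: Optional[datetime] = None) -> Optional[int]:
    """Days until the endpoint's Sunset date, or None if not deprecated."""
    sunset = headers.get("Sunset")
    if not sunset:
        return None
    now = now or datetime.now(timezone.utc)
    return (parsedate_to_datetime(sunset) - now).days
```

A scheduler can then page the owning team when the remaining window drops below, say, 30 days.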

Rate Limiting Patterns

Kong Gateway Configuration

For teams using Kong as their gateway layer:

# Kong declarative config (deck format)
services:
  - name: orders-data-product
    url: http://orders-service.data-platform.svc.cluster.local:8080
    routes:
      - name: orders-api
        paths:
          - /v1/orders
          - /v2/orders
        methods:
          - GET
    plugins:
      - name: rate-limiting
        config:
          minute: 60
          hour: 1000
          policy: redis
          redis_host: redis.infra.svc.cluster.local
          redis_port: 6379
          redis_database: 1
          limit_by: consumer
          
      - name: jwt
        config:
          secret_is_base64: false
          claims_to_verify:
            - exp
            - nbf
            
      - name: request-transformer
        config:
          add:
            headers:
              - "X-Consumer-ID:$(consumer.id)"
              - "X-Data-Platform-Gateway:true"
              
      - name: response-transformer
        config:
          add:
            headers:
              - "X-Rate-Limit-Info:see X-RateLimit-* headers"
              
      - name: http-log
        config:
          http_endpoint: http://audit-log.data-platform.svc.cluster.local/api/access
          method: POST
          content_type: application/json

Observability for Data APIs

A data API gateway should emit signals that answer: who is consuming what data, how fast, and with what freshness?

Key metrics to track:

  • Request rate per consumer — identify heavy users before they hit quotas
  • Latency p95/p99 per endpoint — data queries have long tails; median is misleading
  • Cache hit rate — poor hit rates mean expensive warehouse queries on every request
  • Error rate by type — 429s (quota) vs 503s (upstream unavailable) need different responses
  • Data freshness of served responses — critical for consumers who need near-real-time data
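Several of these signals fall out of the access log the gateway already emits. A sketch of offline aggregation, assuming records shaped like the JSON log format in the Terraform stage config and using the simple nearest-rank percentile:

```python
# Sketch: aggregate gateway access-log records into per-endpoint p95 latency
# and per-consumer request counts. Record fields match the access_log_settings
# format above; the percentile method is a deliberate simplification.
import math
from collections import Counter, defaultdict

def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile; values need not be pre-sorted."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(records: list) -> dict:
    latency_by_path = defaultdict(list)
    requests_by_consumer = Counter()
    for rec in records:
        latency_by_path[rec["resourcePath"]].append(int(rec["integrationLatency"]))
        requests_by_consumer[rec["consumerId"]] += 1
    return {
        "p95_latency_ms": {p: percentile(v, 95) for p, v in latency_by_path.items()},
        "requests_by_consumer": dict(requests_by_consumer),
    }
```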

Combining gateway metrics with data platform observability (table freshness, pipeline health) in a unified view — as Harbinger Explorer provides — gives teams the full picture from raw ingestion through to API consumption.


Summary

API gateways for data platforms aren't just a security checkbox — they're the foundation of a governed, scalable data serving layer. The patterns that work in production:

  1. Transformation gateways decouple API contracts from warehouse internals — always worth the investment
  2. Scope-based authorization with JWT is more flexible than API keys for complex permission models
  3. URL versioning with Sunset headers gives downstream consumers a reliable deprecation signal
  4. Data mesh gateways federating domain APIs work best when backed by a data catalog for discovery
  5. Kong or AWS API Gateway for rate limiting — don't build this yourself

Try Harbinger Explorer free for 7 days — track API consumption patterns across your data products, get visibility into consumer dependencies, and manage data API governance at scale.

