GDPR Compliance for Cloud Data Platforms: A Technical Deep Dive
Building cloud data platforms that are both powerful and GDPR-compliant is one of the most nuanced engineering challenges of our era. The regulation isn't just a legal checkbox — it fundamentally shapes how you architect data pipelines, choose cloud services, and manage the lifecycle of personal data. This guide walks through the technical realities of achieving GDPR compliance in modern cloud data stacks, complete with infrastructure-as-code examples and reference architectures.
Why GDPR Is an Engineering Problem
Most teams treat GDPR as a legal problem and hand it off to compliance teams. That's a mistake. At its core, GDPR is about data architecture:
- Article 5 — Data minimisation, purpose limitation, storage limitation
- Article 17 — Right to erasure ("right to be forgotten")
- Article 20 — Data portability
- Article 25 — Data protection by design and by default
- Article 32 — Security of processing (encryption, pseudonymisation)
- Article 35 — Data Protection Impact Assessments (DPIA)
Each of these has direct implications for how you design your ingestion layers, storage, access control, and APIs. Engineers own this.
Reference Architecture: GDPR-Compliant Cloud Data Platform
A reference architecture for a GDPR-compliant data platform on AWS or GCP is built around the following tenets:
Key Architectural Tenets
- PII never enters the raw zone unclassified
- Pseudonymisation tokens are the only reference to PII in analytics
- The PII Vault is the single source of truth for personal data
- All access is logged immutably
- Erasure is automated and verifiable
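The token-vault tenets above can be sketched in a few lines of plain Python. This is a conceptual sketch, not a cloud SDK: `PiiVault` is an illustrative name, and an in-memory dict stands in for the real vault bucket.

```python
import secrets

class PiiVault:
    """In-memory stand-in for the PII Vault: token -> raw PII.

    Analytics systems only ever see the token; erasure means
    dropping the vault entry so the token resolves to nothing.
    """
    def __init__(self):
        self._store = {}

    def pseudonymise(self, raw_value: str) -> str:
        token = secrets.token_hex(16)   # opaque, random reference
        self._store[token] = raw_value  # PII lives only in the vault
        return token

    def resolve(self, token: str):
        return self._store.get(token)   # None after erasure

    def erase(self, token: str) -> None:
        self._store.pop(token, None)    # Art. 17: forget the subject

vault = PiiVault()
token = vault.pseudonymise("alice@example.com")
# The analytics zone stores only `token`; joining back requires vault access.
assert vault.resolve(token) == "alice@example.com"
vault.erase(token)
assert vault.resolve(token) is None
```

Because analytics tables hold only tokens, erasure never touches the analytics zone: deleting one vault entry severs every downstream reference at once.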
Terraform: Building the Compliance Infrastructure
Let's look at concrete Terraform for a GCP-based compliant data platform.
1. Encrypted Storage Buckets with Data Retention Policies
```hcl
resource "google_storage_bucket" "raw_zone" {
  name          = "${var.project_id}-raw-zone"
  location      = "EU"
  storage_class = "STANDARD"

  # Enforce encryption at rest with CMEK
  encryption {
    default_kms_key_name = google_kms_crypto_key.data_key.id
  }

  # Enforce retention — storage limitation (Art. 5)
  retention_policy {
    is_locked        = true
    retention_period = 7776000 # 90 days in seconds
  }

  # Prevent public access
  uniform_bucket_level_access = true

  # Versioning for audit trail
  versioning {
    enabled = true
  }

  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type = "Delete"
    }
  }
}

resource "google_kms_key_ring" "gdpr_ring" {
  name     = "gdpr-keyring"
  location = "europe-west3"
}

resource "google_kms_crypto_key" "data_key" {
  name            = "gdpr-data-key"
  key_ring        = google_kms_key_ring.gdpr_ring.id
  rotation_period = "7776000s" # 90-day rotation

  lifecycle {
    prevent_destroy = true
  }
}
```
2. IAM: Least-Privilege Access (Art. 25 — Privacy by Design)
```hcl
# Data Engineer role — can read pseudonymised data only
resource "google_project_iam_custom_role" "data_engineer" {
  role_id     = "dataEngineerGDPR"
  title       = "Data Engineer (GDPR Compliant)"
  description = "Access to pseudonymised zones only — no PII Vault"
  permissions = [
    "bigquery.tables.getData",
    "bigquery.tables.list",
    "bigquery.jobs.create",
    "storage.objects.get",
    "storage.objects.list",
  ]
}

# PII Vault access — restricted to compliance service accounts only
resource "google_storage_bucket_iam_binding" "pii_vault_access" {
  bucket = google_storage_bucket.pii_vault.name
  role   = "roles/storage.objectViewer"
  members = [
    "serviceAccount:${google_service_account.erasure_service.email}",
    "serviceAccount:${google_service_account.portability_service.email}",
  ]
}

# Deny PII Vault object reads for everyone except the compliance service
# accounts. IAM deny policies attach at the project level (Terraform has
# no bucket-level deny resource) and use service-qualified permission names.
resource "google_iam_deny_policy" "pii_vault_deny" {
  name   = "pii-vault-deny"
  parent = urlencode("cloudresourcemanager.googleapis.com/projects/${var.project_id}")

  rules {
    deny_rule {
      denied_principals = ["principalSet://goog/public:all"]
      exception_principals = [
        "principal://iam.googleapis.com/projects/-/serviceAccounts/${google_service_account.erasure_service.email}",
        "principal://iam.googleapis.com/projects/-/serviceAccounts/${google_service_account.portability_service.email}",
      ]
      denied_permissions = ["storage.googleapis.com/objects.get"]
    }
  }
}
```
3. VPC Service Controls — Data Exfiltration Prevention
```hcl
resource "google_access_context_manager_service_perimeter" "gdpr_perimeter" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/servicePerimeters/gdpr_perimeter"
  title  = "GDPR Data Perimeter"

  status {
    resources = [
      "projects/${var.project_number}",
    ]
    restricted_services = [
      "bigquery.googleapis.com",
      "storage.googleapis.com",
      "dataflow.googleapis.com",
    ]

    ingress_policies {
      ingress_from {
        # List identities explicitly; identity_type is reserved for broad
        # classes such as ANY_SERVICE_ACCOUNT and is omitted here.
        identities = ["serviceAccount:${var.pipeline_sa}"]
      }
      ingress_to {
        resources = ["*"]
        operations {
          service_name = "bigquery.googleapis.com"
          method_selectors {
            method = "BigQueryStorage.ReadRows"
          }
        }
      }
    }
  }
}
```
Kubernetes: Deploying the Pseudonymisation Service
The pseudonymisation service is the heart of your GDPR architecture. Here's the Kubernetes manifest:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pseudonymisation-service
  namespace: gdpr-compliance
  labels:
    app: pseudonymisation-service
    gdpr-component: "true"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pseudonymisation-service
  template:
    metadata:
      labels:
        app: pseudonymisation-service
      annotations:
        # Force pod restart on key rotation
        secret-hash: "${SHA256_OF_KEY}"
    spec:
      serviceAccountName: pseudonymisation-sa
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: pseudonymisation
          image: gcr.io/${PROJECT_ID}/pseudonymisation-service:1.4.2
          ports:
            - containerPort: 8080
          env:
            - name: KMS_KEY_NAME
              valueFrom:
                secretKeyRef:
                  name: gdpr-secrets
                  key: kms-key-name
            - name: VAULT_BUCKET
              value: ${PII_VAULT_BUCKET}
            - name: AUDIT_LOG_TOPIC
              value: projects/${PROJECT_ID}/topics/gdpr-audit
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pseudonymisation-netpol
  namespace: gdpr-compliance
spec:
  podSelector:
    matchLabels:
      app: pseudonymisation-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: data-pipeline
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to: [] # Only KMS and GCS via VPC SC
      ports:
        - protocol: TCP
          port: 443
```
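Internally, a service like this commonly derives deterministic tokens with a keyed hash, so the same identifier always maps to the same token without a lookup table. A minimal sketch of that core logic, assuming the key is fetched from Cloud KMS at startup (a local byte string stands in for it here):

```python
import hmac
import hashlib

# In production this key would be fetched from Cloud KMS and held only
# in memory; a hard-coded placeholder stands in for it in this sketch.
PSEUDONYMISATION_KEY = b"replace-with-kms-managed-key"

def pseudonymise(value: str, key: bytes = PSEUDONYMISATION_KEY) -> str:
    """Deterministic HMAC-SHA256 token: same input yields the same token,
    so joins across tables still work, but reversing the token is
    infeasible without the key (deleting the key crypto-shreds the link)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

token_a = pseudonymise("alice@example.com")
token_b = pseudonymise("alice@example.com")
assert token_a == token_b                      # deterministic: joins still work
assert token_a != pseudonymise("bob@example.com")
```

Determinism is the design choice here: random tokens (as in a pure vault) maximise unlinkability but require a lookup on every ingest, while HMAC tokens keep analytics joins cheap at the cost of tying erasure to key destruction.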
Data Catalog: Classifying PII at Ingestion
A crucial part of GDPR compliance is knowing what data you have. Use a YAML-based data catalog that feeds your classification engine:
```yaml
# data-catalog/schemas/user_events.yaml
schema:
  name: user_events
  version: "2.1"
  gdpr_classification: "personal_data"
  dpia_required: true
  retention_days: 90
  legal_basis: "legitimate_interest"
  fields:
    - name: event_id
      type: UUID
      pii: false
    - name: user_id
      type: string
      pii: true
      pii_category: "indirect_identifier"
      pseudonymisation: "token_replace"
      vault_key: "user_tokens"
    - name: email
      type: string
      pii: true
      pii_category: "contact_data"
      pseudonymisation: "hash_hmac_sha256"
      erasable: true
    - name: ip_address
      type: string
      pii: true
      pii_category: "online_identifier"
      pseudonymisation: "ip_masking"
      masking_strategy: "last_octet"
    - name: event_type
      type: string
      pii: false
    - name: timestamp
      type: timestamp
      pii: false
      retention_trigger: true
```
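A classification engine can consume this schema directly at ingestion. The sketch below shows the dispatch pattern under simplified assumptions: field rules are inlined as a dict rather than parsed from YAML, the strategy names mirror the catalog above, and unknown PII fields fail closed (dropped rather than leaked).

```python
import hmac
import hashlib

# Field rules as they would be parsed from the catalog YAML above.
FIELD_RULES = {
    "email":      {"pii": True, "strategy": "hash_hmac_sha256"},
    "ip_address": {"pii": True, "strategy": "ip_masking"},
    "event_type": {"pii": False},
}

KEY = b"kms-managed-key-placeholder"  # stand-in for a KMS-held key

def hash_hmac_sha256(value: str) -> str:
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

def ip_masking(value: str) -> str:
    """last_octet strategy: zero the final octet of an IPv4 address."""
    octets = value.split(".")
    return ".".join(octets[:3] + ["0"])

STRATEGIES = {"hash_hmac_sha256": hash_hmac_sha256, "ip_masking": ip_masking}

def classify_and_mask(record: dict) -> dict:
    out = {}
    for field, value in record.items():
        rule = FIELD_RULES.get(field, {"pii": True, "strategy": None})
        if not rule["pii"]:
            out[field] = value                          # non-PII passes through
        elif rule.get("strategy") in STRATEGIES:
            out[field] = STRATEGIES[rule["strategy"]](value)
        else:
            out[field] = None                           # unknown PII: drop, don't leak
    return out

masked = classify_and_mask({
    "email": "alice@example.com",
    "ip_address": "203.0.113.42",
    "event_type": "page_view",
})
assert masked["ip_address"] == "203.0.113.0"
assert masked["event_type"] == "page_view"
assert masked["email"] != "alice@example.com"
```

The fail-closed default matters: a field missing from the catalog is treated as PII with no approved strategy, so it never reaches the analytics zone unmasked.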
Comparison: Cloud Provider GDPR Tooling
| Feature | AWS | GCP | Azure |
|---|---|---|---|
| Data Residency | Region-specific S3, RDS | Regional GCS, BigQuery | Geo-restricted Azure Storage |
| CMEK Support | AWS KMS + SSE-KMS | Cloud KMS + CMEK | Azure Key Vault + CMK |
| Data Classification | Amazon Macie | Cloud DLP API | Azure Purview |
| Audit Logging | CloudTrail | Cloud Audit Logs | Azure Monitor + Activity Log |
| Data Erasure | Manual + Lambda | Cloud DLP deidentify | Azure Data Subject Requests |
| VPC Isolation | VPC + PrivateLink | VPC SC + Private Service Connect | VNet + Private Endpoints |
| PII Detection | Macie (S3 only) | DLP (text, images, structured) | Purview (broad but slower) |
| Compliance Reports | AWS Artifact | Compliance Reports Manager | Microsoft Service Trust Portal |
| SCCs / Org Policies | Service Control Policies | Organization Policies | Azure Policy |
| EU Data Boundary | ✅ AWS EU Boundary | ✅ GCP EU Boundary | ✅ Azure EU Boundary |
Verdict: GCP's Cloud DLP API has the most mature automated PII detection. AWS Macie is S3-only but deeply integrated. Azure Purview is catching up but remains complex to configure.
Implementing the Right to Erasure (Art. 17)
The right to be forgotten is technically the hardest GDPR requirement in data platforms. Here's a practical approach:
Erasure Workflow
An erasure request typically flows through a queue into an orchestrator that resolves the data subject's tokens, applies one of the strategies below across every storage zone, and writes an immutable audit record confirming completion.
Key Erasure Strategies
Strategy 1 — Token Invalidation (recommended for analytics): don't delete records in BigQuery; instead, invalidate the pseudonymisation token. All analytics referencing that user_id then resolve to NULL, with no table scans needed.
Strategy 2 — Crypto Shredding: encrypt each subject's data with a user-specific key stored separately; deleting the key makes all of that data unreadable. Works well for object storage.
Strategy 3 — Tombstoning: mark records as deleted in a deletion log table and filter every query through that log. Simple, but adds query overhead.
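Crypto shredding can be sketched in a few lines. This is an in-memory illustration only: the XOR keystream is a toy cipher standing in for real encryption, and production code would use AES-GCM with per-user keys held in a KMS.

```python
import hashlib
import secrets

class CryptoShredStore:
    """Crypto-shredding sketch: each user's data is encrypted under a
    per-user key held separately; deleting the key renders the data
    permanently unreadable. The SHA-256/XOR keystream below is a toy
    construction for illustration, not production cryptography."""

    def __init__(self):
        self._keys = {}   # user_id -> key (the only copy anywhere)
        self._data = {}   # user_id -> ciphertext

    def _keystream(self, key: bytes, n: int) -> bytes:
        out = b""
        counter = 0
        while len(out) < n:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return out[:n]

    def put(self, user_id: str, plaintext: bytes) -> None:
        key = self._keys.setdefault(user_id, secrets.token_bytes(32))
        ks = self._keystream(key, len(plaintext))
        self._data[user_id] = bytes(a ^ b for a, b in zip(plaintext, ks))

    def get(self, user_id: str):
        key = self._keys.get(user_id)
        if key is None:                        # key shredded: data is gone
            return None
        ct = self._data[user_id]
        ks = self._keystream(key, len(ct))
        return bytes(a ^ b for a, b in zip(ct, ks))

    def erase(self, user_id: str) -> None:
        self._keys.pop(user_id, None)          # Art. 17 via key deletion only

store = CryptoShredStore()
store.put("u123", b"alice@example.com")
assert store.get("u123") == b"alice@example.com"
store.erase("u123")
assert store.get("u123") is None               # ciphertext remains, but unreadable
```

Note that `erase` never touches the ciphertext: the stored bytes persist in object storage, backups, and replicas, yet become unrecoverable the moment the key is destroyed. That is exactly what makes this strategy cheap.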
Data Processing Agreements and Cross-Border Transfers
Standard Contractual Clauses (SCCs) for Cloud Processors
When data leaves the EU — even to a US-based cloud service — you need SCCs. Map your data flows:
| Data Flow | Transfer Mechanism | Risk Level |
|---|---|---|
| EU → AWS EU (Ireland/Frankfurt) | Within EEA — no SCC needed | 🟢 Low |
| EU → AWS US | SCC Module 2 (Controller → Processor) | 🟡 Medium |
| EU → Subprocessors (e.g., Datadog) | SCC in DPA + Article 28 clauses | 🟡 Medium |
| EU → China/Russia | No adequacy decision: SCCs plus a transfer impact assessment required; often impractical | 🔴 High |
| EU → Canada | Adequacy decision in place | 🟢 Low |
Monitoring and Alerting for GDPR Incidents
Under GDPR Article 33, you have 72 hours to notify the supervisory authority of a personal data breach. Your monitoring must be fast:
```yaml
# alerting/gdpr-breach-detection.yaml
alerts:
  - name: Unauthorized PII Access
    condition: >
      SELECT COUNT(*) FROM audit_logs
      WHERE resource = 'pii_vault'
      AND principal NOT IN (SELECT sa FROM allowed_principals)
      AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE)
    threshold: 1
    severity: CRITICAL
    notification:
      - channel: pagerduty
        policy: gdpr-incident
      - channel: email
        recipients:
          - dpo@company.com
          - cto@company.com
    sla_hours: 72 # GDPR breach notification window

  - name: Bulk Data Export Anomaly
    condition: >
      bytes_exported > 10GB AND timeframe = '1h'
      AND NOT in_approved_jobs
    severity: HIGH

  - name: Retention Policy Violation
    condition: >
      data_age_days > retention_policy_days
      AND data_classification IN ('personal_data', 'sensitive_data')
    severity: MEDIUM
    auto_remediate: true
    remediation: trigger_erasure_workflow
```
GDPR Compliance Checklist for Cloud Data Engineers
| Control | Implementation | Status Check |
|---|---|---|
| Data Inventory | Automated catalog with PII tagging | Scan new tables on ingestion |
| Consent Management | Consent flags in user profiles | Block processing if no consent |
| Data Minimisation | Schema-level field necessity review | Quarterly schema audits |
| Pseudonymisation | Token vault with HMAC-SHA256 | Pen test token reversibility |
| Encryption at Rest | CMEK with 90-day rotation | Key rotation alerts |
| Encryption in Transit | TLS 1.3 enforced at load balancer | TLS scan weekly |
| Access Control | RBAC with principle of least privilege | Quarterly access reviews |
| Audit Logging | Immutable logs, 3-year retention | Log integrity checks daily |
| Right to Erasure | Automated within 30 days | Monthly erasure SLA report |
| Data Portability | Machine-readable export API | Quarterly API testing |
| DPIA Documentation | For high-risk processing activities | Before new data types |
| Breach Detection | < 24h detection, < 72h notification | Incident drill biannual |
| DPA with Processors | Signed SCCs with all vendors | Annual DPA audit |
Conclusion
GDPR compliance in cloud data platforms is achievable — but only if you treat it as an engineering discipline, not a legal afterthought. The key principles are:
- Build PII isolation into your architecture from day one — retrofitting is 10x harder
- Automate everything — manual compliance processes fail under scale
- Pseudonymise, don't anonymise — true anonymisation is nearly impossible; pseudonymisation is tractable
- Make erasure cheap — crypto shredding and token invalidation are your friends
- Log everything immutably — when the regulator asks, you need receipts
The architectures and code in this guide are battle-tested patterns for teams building on AWS, GCP, or Azure. Start with the data catalog and PII classifier — everything else follows from knowing what data you have.