Zero Trust Architecture for Data Platforms

"Never trust, always verify": the Zero Trust principle was coined for network security, but it is increasingly the right mental model for access control on data platforms. The perimeter-based model assumes that everything inside your VPC is safe. Modern data platforms span cloud accounts, regions, third parties, and a workforce that connects from coffee shops. The perimeter is gone.

This guide covers Zero Trust principles specifically for data platforms: identity-first access, attribute-based controls, encryption at every layer, and continuous verification.

TL;DR

Identity-first: short-lived credentials instead of service account keys
ABAC on data classification instead of RBAC
Column-level masking for PII
Network micro-segmentation
Continuous verification with anomaly detection on audit logs

Why Data Platforms Are High-Value Targets

Data platforms aggregate an organization's most sensitive information:

PII at scale (millions of customer records in a single query)
Financial data in analytical models
Intellectual property in ML training sets
Operational data that reveals business strategy

A compromised warehouse is not just a GDPR violation; it potentially exposes every trade secret of the organization, queryable via SQL.

The classic answer (VPC isolation plus IP allowlisting) fails because:

Most data lives in managed cloud services that are not "inside" your VPC
Analytical access needs broad read permissions that are hard to scope
Service accounts accumulate unnecessary permissions over time

Zero Trust Principles Applied to Data

Three core principles:

Verify explicitly: every query is authenticated
Least-privilege access: access is scoped to data classification
Assume breach: sessions are re-verified, and anomalies trigger alerts

Layer 1: Identity-First Data Access

Eliminate Service Account Key Files

Long-lived key files are one of the most common vectors for data platform compromises. Replace them with a short-lived credential exchange:

Workflow: app → IdP (Okta/Entra) issues a JWT (15 min) → STS issues temporary credentials (1 h) → API call → credentials expire, and the next call re-authenticates.
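On the client side, this workflow usually ends up as a small credential cache that re-runs the exchange shortly before expiry. The following is a minimal sketch under stated assumptions: `ShortLivedCredentialProvider`, `TemporaryCredentials`, and `fake_exchange` are hypothetical names, and the fake exchange stands in for the real IdP and STS calls, which are not shown.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TemporaryCredentials:
    access_key: str
    expires_at: float  # Unix timestamp


class ShortLivedCredentialProvider:
    """Caches temporary credentials and re-fetches them shortly before expiry."""

    def __init__(self, fetch: Callable[[], TemporaryCredentials],
                 clock: Callable[[], float] = time.time,
                 refresh_margin_s: float = 60.0):
        self._fetch = fetch            # performs the IdP -> STS exchange
        self._clock = clock            # injectable so the logic is testable
        self._margin = refresh_margin_s
        self._current: Optional[TemporaryCredentials] = None

    def get(self) -> TemporaryCredentials:
        now = self._clock()
        if self._current is None or now >= self._current.expires_at - self._margin:
            self._current = self._fetch()
        return self._current


# Demo with a fake clock and a fake exchange (both hypothetical)
now = [0.0]
calls = []


def fake_exchange() -> TemporaryCredentials:
    calls.append(now[0])
    return TemporaryCredentials(access_key=f"key-{len(calls)}",
                                expires_at=now[0] + 3600)  # 1 h lifetime


provider = ShortLivedCredentialProvider(fake_exchange, clock=lambda: now[0])
first = provider.get()
now[0] = 1800.0   # 30 minutes later: still valid, served from the cache
second = provider.get()
now[0] = 3590.0   # inside the 60 s refresh margin: exchanged again
third = provider.get()
```

Nothing is ever written to disk in this design, so there is no key file to leak; a stolen credential is useful for at most one lifetime.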

Terraform: OIDC trust policy for GitHub Actions

data "aws_iam_policy_document" "github_actions_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    # Only workflows running on the main branch of this repository
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:myorg/data-platform:ref:refs/heads/main"]
    }

    # The token must have been issued for AWS STS
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "pipeline_execution" {
  name                 = "data-pipeline-cicd"
  assume_role_policy   = data.aws_iam_policy_document.github_actions_trust.json
  max_session_duration = 3600 # 1 hour max
}
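To see why the `sub` condition matters, it helps to evaluate a few token claims against it by hand. The sketch below is an approximation, not IAM's evaluator: `trust_policy_allows` is a hypothetical helper, and `fnmatchcase` stands in for `StringLike` (both support `*` and `?`, but `fnmatchcase` additionally treats `[...]` as a character class).

```python
from fnmatch import fnmatchcase

ALLOWED_SUB = "repo:myorg/data-platform:ref:refs/heads/main"
ALLOWED_AUD = "sts.amazonaws.com"


def trust_policy_allows(claims: dict, sub_pattern: str = ALLOWED_SUB) -> bool:
    """Mimics the two conditions: StringLike on sub, StringEquals on aud."""
    sub_ok = fnmatchcase(claims.get("sub", ""), sub_pattern)
    aud_ok = claims.get("aud") == ALLOWED_AUD
    return sub_ok and aud_ok


main_branch = {"sub": "repo:myorg/data-platform:ref:refs/heads/main",
               "aud": "sts.amazonaws.com"}
feature_branch = {"sub": "repo:myorg/data-platform:ref:refs/heads/feature-x",
                  "aud": "sts.amazonaws.com"}
pull_request = {"sub": "repo:myorg/data-platform:pull_request",
                "aud": "sts.amazonaws.com"}

results = {
    "main": trust_policy_allows(main_branch),
    "feature": trust_policy_allows(feature_branch),
    "pull_request": trust_policy_allows(pull_request),
    # A wildcard pattern would let every branch and pull request assume the role
    "feature_with_wildcard": trust_policy_allows(
        feature_branch, sub_pattern="repo:myorg/data-platform:*"),
}
```

The last entry shows the usual mistake: a pattern ending in `:*` turns a branch-scoped role into a repository-wide one.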

Databricks Unity Catalog: Identity Federation

# Databricks Unity Catalog with SCIM provisioning
# var.data_engineer_emails is a map of display name => email address
resource "databricks_user" "data_engineer" {
  for_each     = var.data_engineer_emails
  user_name    = each.value
  display_name = each.key
  # SCIM handles provisioning/deprovisioning from the IdP
  # No local password, SSO only
  force_delete_repos    = true
  force_delete_home_dir = true
}

resource "databricks_group_member" "de_team" {
  for_each  = var.data_engineer_emails
  group_id  = databricks_group.data_engineers.id
  member_id = databricks_user.data_engineer[each.key].id
}

# Grant table access to groups, not to individuals
resource "databricks_grants" "silver_layer" {
  table = "main.silver.customer_events"

  grant {
    principal  = "data-engineers"
    privileges = ["SELECT", "MODIFY"]
  }

  grant {
    principal  = "analysts"
    privileges = ["SELECT"]
  }
}

Layer 2: Attribute-Based Access Control (ABAC)

Role-based access control (RBAC) alone scales poorly for data platforms. With 500 tables, 50 teams, and 3 environments, the RBAC matrix explodes: that is up to 500 × 50 × 3 = 75,000 combinations to decide on and keep current. ABAC uses data attributes (classification, domain, sensitivity) and user attributes (team, clearance, location) to compute access dynamically.
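The difference is easiest to see in code: an ABAC decision is a function of attributes rather than a lookup in a grant matrix. The following is a minimal sketch; the `User` and `Asset` types, the linear `SENSITIVITY_RANK` ordering, and the rule itself are assumptions made for illustration, not a description of any specific product.

```python
from dataclasses import dataclass

# Size of the explicit RBAC matrix from the example figures
RBAC_CELLS = 500 * 50 * 3  # tables x teams x environments

# Hypothetical ordering from least to most sensitive
SENSITIVITY_RANK = {"public": 0, "internal": 1, "confidential": 2,
                    "PII": 3, "restricted": 4}


@dataclass(frozen=True)
class User:
    team: str
    clearance: str        # highest classification the user may read
    domains: frozenset    # data domains the user's team works in


@dataclass(frozen=True)
class Asset:
    name: str
    classification: str
    domain: str


def is_allowed(user: User, asset: Asset) -> bool:
    """One rule replaces per-table grants: clearance and domain must both fit."""
    cleared = SENSITIVITY_RANK[user.clearance] >= SENSITIVITY_RANK[asset.classification]
    in_domain = asset.domain in user.domains
    return cleared and in_domain


engineer = User(team="data-eng", clearance="confidential",
                domains=frozenset({"customer", "orders"}))
events = Asset("customer_events", "internal", "customer")
profiles = Asset("customer_profiles", "PII", "customer")
ledger = Asset("ledger", "internal", "finance")

decisions = {a.name: is_allowed(engineer, a) for a in (events, profiles, ledger)}
```

A new table needs no new grant in this model; tagging it correctly is enough for the existing rule to apply.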

Data Classification Tags

# Tag every data asset at creation time
resource "aws_glue_catalog_table" "customer_pii" {
  name          = "customer_profiles"
  database_name = aws_glue_catalog_database.silver.name

  parameters = {
    "data_classification" = "PII"
    "data_domain"         = "customer"
    "sensitivity"         = "high"
    "gdpr_relevant"       = "true"
    "retention_days"      = "730"
    "owner_team"          = "customer-platform"
  }

  # ... schema
}
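Tagging at creation only helps if untagged assets are caught early, for example in CI. The check below is a sketch under assumptions: `tag_violations`, the `REQUIRED_TAGS` set, and the allowed classification values are hypothetical choices based on the tags shown above, and the input is simply the table's `parameters` map.

```python
REQUIRED_TAGS = {"data_classification", "data_domain", "sensitivity", "owner_team"}
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "PII", "restricted"}


def tag_violations(parameters: dict) -> list:
    """Returns human-readable problems; an empty list means the asset passes."""
    missing = REQUIRED_TAGS - set(parameters)
    problems = [f"missing tag: {key}" for key in sorted(missing)]
    classification = parameters.get("data_classification")
    if classification is not None and classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown data_classification: {classification}")
    return problems


good = {
    "data_classification": "PII",
    "data_domain": "customer",
    "sensitivity": "high",
    "gdpr_relevant": "true",
    "retention_days": "730",
    "owner_team": "customer-platform",
}
bad = {"data_classification": "secret-ish", "data_domain": "customer"}

good_result = tag_violations(good)
bad_result = tag_violations(bad)
```

Failing the pipeline on a non-empty result keeps unclassified tables from reaching the catalog in the first place.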

Lake Formation: ABAC Tag Policy

# Access based on classification tags.
# Note: Glue table parameters are plain metadata. LF-tags are attached to
# tables separately, for example with aws_lakeformation_resource_lf_tags.
resource "aws_lakeformation_lf_tag" "classification" {
  key    = "data_classification"
  values = ["public", "internal", "confidential", "PII", "restricted"]
}

# Data engineers: public, internal and confidential, but not PII
# (123456789012 is a placeholder account ID)
resource "aws_lakeformation_permissions" "engineer_access" {
  principal   = "arn:aws:iam::123456789012:role/data-engineers"
  permissions = ["SELECT", "DESCRIBE"]

  lf_tag_policy {
    resource_type = "TABLE"
    expression {
      key    = aws_lakeformation_lf_tag.classification.key
      values = ["public", "internal", "confidential"]
    }
  }
}

# PII access requires explicit DPO approval
resource "aws_lakeformation_permissions" "pii_approved_access" {
  principal                     = "arn:aws:iam::123456789012:role/pii-approved-analysts"
  permissions                   = ["SELECT"]
  permissions_with_grant_option = []

  lf_tag_policy {
    resource_type = "TABLE"
    expression {
      key    = aws_lakeformation_lf_tag.classification.key
      values = ["PII"]
    }
  }
}

Layer 3: Column-Level Security and Data Masking

Even users with table access should not always see every column. Column-level security with dynamic masking implements this without duplicating data.

BigQuery Column-Level Security

-- Create the policy tag taxonomy first (via the Data Catalog API or Terraform)

-- Attach the policy tag to the sensitive column
CREATE OR REPLACE TABLE analytics.customer_orders (
  order_id        STRING,
  customer_id     STRING,
  email           STRING OPTIONS (
    description='PII, protected by policy tag',
    policy_tags='"projects/my-project/locations/us/taxonomies/12345/policyTags/67890"'
  ),
  amount_usd      NUMERIC,
  created_at      TIMESTAMP
);

-- What analysts without fine-grained read access on the policy tag see
-- depends on the setup:
-- * Column-level access control only: a query that selects email
--   (including SELECT *) fails with an access-denied error.
-- * With a data masking rule on the policy tag: email is returned masked
--   (for example NULL or a hash) and no error is raised.

Snowflake Dynamic Data Masking

-- Create the masking policy
CREATE OR REPLACE MASKING POLICY pii_email_mask AS (val STRING)
RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_APPROVED_ANALYST', 'DPO_TEAM') THEN val
    WHEN CURRENT_ROLE() = 'ANALYST' THEN
      -- Partial mask; backslashes are doubled because this is a single-quoted string
      REGEXP_REPLACE(val, '(.{2}).*(@.*)', '\\1***\\2')
    ELSE '***REDACTED***'
  END;

-- Apply it to the column
ALTER TABLE customer_orders
  MODIFY COLUMN email
  SET MASKING POLICY pii_email_mask;

-- Test with the analyst role:
USE ROLE ANALYST;
SELECT email FROM customer_orders LIMIT 5;
-- Returns: jo***@example.com, ma***@company.org, ...
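Masking rules are worth unit-testing outside the warehouse, because regex edge cases are easy to miss. The sketch below mirrors the CASE expression of the masking policy above in Python; `mask_email` is a hypothetical helper, and Python's `re` engine is assumed to behave like Snowflake's for this particular pattern.

```python
import re

FULL_ACCESS_ROLES = {"PII_APPROVED_ANALYST", "DPO_TEAM"}


def mask_email(value: str, role: str) -> str:
    """Same three branches as the masking policy: clear, partial, redacted."""
    if role in FULL_ACCESS_ROLES:
        return value
    if role == "ANALYST":
        # Keep the first two characters and the domain, hide the rest
        return re.sub(r"(.{2}).*(@.*)", r"\1***\2", value)
    return "***REDACTED***"


samples = {
    "normal": mask_email("john.doe@example.com", "ANALYST"),
    "two_char_local_part": mask_email("ab@x.com", "ANALYST"),
    "one_char_local_part": mask_email("a@b.com", "ANALYST"),
    "approved": mask_email("john.doe@example.com", "DPO_TEAM"),
    "other_role": mask_email("john.doe@example.com", "MARKETING"),
}
```

Two of the samples are edge cases: a two-character local part is shown in full, and a one-character local part does not match the pattern at all, so the address comes back unmasked. A production policy would likely need an explicit fallback for values the pattern does not match.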

Layer 4: Network Micro-Segmentation

Private Endpoints for All Data Services

# S3 gateway endpoint (no additional charge)
resource "aws_vpc_endpoint" "s3" {
  vpc_id          = aws_vpc.data_platform.id
  service_name    = "com.amazonaws.${var.region}.s3"
  route_table_ids = aws_route_table.private[*].id

  # Only the lakehouse bucket is reachable through this endpoint
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
      Resource = [
        aws_s3_bucket.lakehouse.arn,
        "${aws_s3_bucket.lakehouse.arn}/*"
      ]
    }]
  })
}

# Restrict the S3 bucket to the VPC endpoint
resource "aws_s3_bucket_policy" "lakehouse_vpc_only" {
  bucket = aws_s3_bucket.lakehouse.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource = [
        aws_s3_bucket.lakehouse.arn,
        "${aws_s3_bucket.lakehouse.arn}/*"
      ]
      # Deny every request that does not arrive through the endpoint
      Condition = {
        StringNotEquals = {
          "aws:sourceVpce" = aws_vpc_endpoint.s3.id
        }
      }
    }]
  })
}
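The Deny statement has a consequence that is easy to overlook: `StringNotEquals` is a negated operator, so it also matches when the `aws:sourceVpce` key is absent from the request, which is the case for console or internet traffic. The sketch below models only that one condition; `denied_by_bucket_policy` and the endpoint ID are hypothetical, and real IAM evaluation involves more than this single check.

```python
from typing import Optional

ALLOWED_VPCE = "vpce-0abc1234"  # hypothetical endpoint ID


def denied_by_bucket_policy(source_vpce: Optional[str]) -> bool:
    """True when the Deny statement applies.

    None models a request without the aws:sourceVpce key, such as one
    that did not pass through any VPC endpoint.
    """
    return source_vpce != ALLOWED_VPCE


checks = {
    "through_endpoint": denied_by_bucket_policy("vpce-0abc1234"),
    "other_endpoint": denied_by_bucket_policy("vpce-0ffff9999"),
    "no_endpoint": denied_by_bucket_policy(None),
}
```

In practice this means any tooling that manages the bucket from outside the VPC, including a CI runner applying this Terraform, is locked out as well, so plan where that tooling runs before applying the policy.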

Layer 5: Encryption at Every Layer

Use separate KMS keys per data classification, automatic 90-day rotation, and a key policy that delegates nothing to the account root and contains only explicit grants.

# Separate KMS keys per data classification
resource "aws_kms_key" "pii_data" {
  description             = "PII data encryption (data platform)"
  deletion_window_in_days = 30
  enable_key_rotation     = true
  rotation_period_in_days = 90 # yearly when omitted; needs a recent AWS provider

  # No statement delegates to the account root; access is granted explicitly.
  # KMS rejects a policy that would lock out the principal applying it, so
  # that principal has to be covered by the management statement.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "Enable DPO team management"
        Effect    = "Allow"
        Principal = { AWS = var.dpo_team_role_arn }
        Action    = ["kms:*"]
        Resource  = "*"
      },
      {
        Sid    = "Allow approved roles to use key"
        Effect = "Allow"
        Principal = { AWS = [
          var.pii_pipeline_role_arn,
          var.pii_analyst_role_arn
        ] }
        Action   = ["kms:GenerateDataKey", "kms:Decrypt"]
        Resource = "*"
      },
      {
        Sid       = "Deny all others"
        Effect    = "Deny"
        Principal = { AWS = "*" }
        Action    = ["kms:GenerateDataKey", "kms:Decrypt"]
        Resource  = "*"
        Condition = {
          StringNotLike = {
            "aws:PrincipalArn" = [
              var.dpo_team_role_arn,
              var.pii_pipeline_role_arn,
              var.pii_analyst_role_arn
            ]
          }
        }
      }
    ]
  })
}

Layer 6: Continuous Verification and Anomaly Detection

Zero Trust is not "verify once, then trust." It is continuous.

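Query Anomaly Detection

# Query audit log analysis, intended to run as a scheduled Spark job
# over CloudTrail / audit logs. Table and column names are illustrative.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
audit_logs = spark.table("security.data_access_audit")


def alert_security_team(row):
    # Placeholder: wire this up to PagerDuty / Slack
    print(f"ALERT {row['principal_id']} on {row['table_name']}: "
          f"{row['bytes_today']} bytes today, "
          f"daily average {row['avg_daily_bytes']:.0f}")


# Today's volume per principal and table
today = (
    audit_logs
    .where(F.col("event_date") == F.current_date())
    .groupBy("principal_id", "table_name")
    .agg(
        F.sum("bytes_scanned").alias("bytes_today"),
        F.count("*").alias("query_count"),
    )
)

# 30-day baseline, excluding today so a spike does not inflate its own baseline
baseline = (
    audit_logs
    .where(
        (F.col("event_date") >= F.date_sub(F.current_date(), 30))
        & (F.col("event_date") < F.current_date())
    )
    .groupBy("principal_id", "table_name")
    .agg((F.sum("bytes_scanned") / 30).alias("avg_daily_bytes"))
)

# Detect unusual data volume: more than 10x the daily average
anomalies = (
    today
    .join(baseline, on=["principal_id", "table_name"], how="left")
    .where(F.col("bytes_today") > F.col("avg_daily_bytes") * 10)
)

# Alert from the driver; foreach() would run the callback on the executors
for row in anomalies.collect():
    alert_security_team(row)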
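One detail of the Spark filter is worth spelling out: a principal/table pair with no history has a NULL baseline, the comparison evaluates to NULL, and the row is silently dropped. A first-ever read of a sensitive table is arguably exactly what should be looked at. The sketch below restates the threshold rule in plain Python with that case made explicit; `classify` and its three labels are hypothetical.

```python
from typing import Optional

SPIKE_FACTOR = 10  # same 10x threshold as the Spark job


def classify(bytes_today: int, avg_daily_bytes: Optional[float]) -> str:
    """Labels one principal/table pair for today's activity."""
    if avg_daily_bytes is None or avg_daily_bytes == 0:
        return "no_baseline"   # new access pattern: review instead of dropping
    if bytes_today > avg_daily_bytes * SPIKE_FACTOR:
        return "spike"
    return "normal"


labels = [
    classify(5_000, 4_000.0),       # ordinary day
    classify(50_000, 4_000.0),      # 12.5x the average
    classify(40_000, 4_000.0),      # exactly 10x: not above the threshold
    classify(1_000_000, None),      # never seen before
]
```

Keeping the rule in a small pure function also makes the threshold easy to test before it is translated back into the scheduled job.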

Harbinger Explorer for API Access Auditing

If your data platform exposes APIs (and they all do, from Athena federation endpoints to custom REST APIs), you need visibility into which endpoints are called, with which parameters, and whether responses conform to the schema. Harbinger Explorer provides this testing and monitoring layer to catch unexpected access patterns or schema deviations before they become security incidents.

Zero Trust Maturity Model

Level	Description	Key controls
0: Implicit trust	VPC = trusted; anyone inside can query anything	None
1: Identity-aware	Authentication required; coarse RBAC	SSO, basic roles
2: Data-aware	ABAC on classification; column masking	Policy tags, masking policies
3: Context-aware	Access varies by time, location, device	Conditional access, MFA step-up
4: Continuous	Every query re-evaluated; anomaly detection; immutable audit trail	SIEM integration, ML-based anomaly detection

Most mature data platforms operate at Level 2 to 3. Level 4 is appropriate for organizations that handle financial, healthcare, or government data.

FAQ

GDPR and Zero Trust? Directly connected: GDPR Article 32 requires "appropriate technical and organisational measures", and Zero Trust controls such as encryption, access control, and auditing are exactly that kind of measure.

How does a small team get started? Begin with Layer 1 (SSO plus short-lived credentials) and Layer 2 (data classification). That is where the impact-to-effort ratio is highest.

Which tools suit mid-sized companies in the DACH region? Microsoft Entra ID + Microsoft Purview, AWS IAM Identity Center + Lake Formation, or Google Cloud IAM. All are available in EU regions.

How do you measure Zero Trust success? Number of long-lived credentials (target: 0), MTTR after a compromise, percentage of data classified, and anomalies caught before they became incidents.
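Two of these metrics can be computed directly from an inventory export. The following is a sketch under assumptions: the inventory shape, the field names `lifetime_hours` and `data_classification`, and the 12-hour cut-off for "long-lived" are all hypothetical choices, not a standard.

```python
def zero_trust_metrics(credentials: list, tables: list,
                       max_lifetime_hours: float = 12.0) -> dict:
    """Counts long-lived credentials and the share of classified tables."""
    long_lived = [
        c for c in credentials
        # None models a credential that never expires, such as a key file
        if c["lifetime_hours"] is None or c["lifetime_hours"] > max_lifetime_hours
    ]
    classified = [t for t in tables if t.get("data_classification")]
    pct = 100.0 * len(classified) / len(tables) if tables else 0.0
    return {
        "long_lived_credentials": len(long_lived),
        "pct_classified": pct,
    }


inventory_credentials = [
    {"name": "ci-oidc-role", "lifetime_hours": 1},
    {"name": "legacy-sa-key", "lifetime_hours": None},
    {"name": "bi-tool-token", "lifetime_hours": 720},
]
inventory_tables = [
    {"name": "customer_profiles", "data_classification": "PII"},
    {"name": "customer_events", "data_classification": "internal"},
    {"name": "orders", "data_classification": "confidential"},
    {"name": "tmp_export"},
]

metrics = zero_trust_metrics(inventory_credentials, inventory_tables)
```

Tracking these numbers over time is usually more informative than any single snapshot.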

What does it cost? The tools themselves are mostly included in your existing cloud bundle. Effort: roughly 1 to 2 engineer-months for Layers 1 and 2 on a mid-sized platform.

Summary

Zero Trust for data platforms is a layered discipline: identity-first authentication eliminates the key file problem; ABAC scales access control beyond RBAC; column-level masking protects sensitive fields without duplicating data; network micro-segmentation limits lateral movement; and continuous verification catches anomalies before they turn into breaches.

Start with Layer 1 (remove key files, enforce SSO) and Layer 2 (classify data, apply ABAC). The impact-to-effort ratio is highest there, and both layers lay the foundation for deeper controls.

Harbinger Explorer, free for 7 days: validate your data API security posture, test that access controls respond correctly, and monitor for unexpected patterns. harbingerexplorer.com

As of: May 14, 2026.

Written by

Harbinger Team

Cloud, data, and AI engineer in the DACH region. Has been writing since 2018 about infrastructure-critical technology decisions: no marketing slides, just real trade-offs from production workloads.

More about Marc: hello@harbingerexplorer.com
