Databricks Cluster Policies for Cost Control: A Practical Guide
Databricks is powerful. It's also remarkably easy to accidentally spin up a 32-node Standard_E64s_v3 cluster and forget to terminate it. Anyone who's managed a Databricks workspace for more than a month has a story like this.
Cluster policies are Databricks' mechanism for preventing these expensive mistakes while still giving your team the flexibility they need. Done right, they're invisible guardrails — your engineers can work freely within a safe boundary.
What Are Cluster Policies?
A cluster policy is a JSON document that constrains the values users can set when creating or editing clusters. Policies can:
- Fix a value (e.g., always enable auto-termination)
- Limit a value to a range or list (e.g., max 8 workers)
- Set defaults (e.g., default to spot instances)
- Hide fields from the UI (simplify the creation experience)
- Require specific values (e.g., must tag clusters with a cost center)
Policies are assigned to users, groups, or service principals. A user can only create clusters using policies they have access to (unless they're a workspace admin).
Why Cluster Policies Matter (With Numbers)
Consider a team of 10 data engineers. Without policies:
| Scenario | Config | Approx. Cost |
|---|---|---|
| Overpowered dev cluster | 8x Standard_E16s_v3 (on-demand) | ~$18/hr |
| Forgotten overnight cluster | 4x Standard_DS3_v2 (on-demand) | ~$30 per night |
| Production job over-provisioned | 16x Standard_E32s_v3 | ~$60/hr |
A single forgotten cluster running for a weekend = ~$150 in waste. Multiply by 10 engineers over a year, and you're looking at thousands in avoidable spend.
With cluster policies enforcing auto-termination and spot instances:
- Auto-terminates after 30 min = ~$1.50 wasted instead of $30
- Uses spot instances = ~60% cheaper baseline
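The arithmetic behind those bullets is simple enough to sketch. The rates below are illustrative assumptions, not quoted Azure prices:

```python
# Back-of-envelope waste estimate for one forgotten cluster.
# HOURLY_RATE is an assumed blended rate (VMs + DBUs), not a quoted price.
HOURLY_RATE = 3.00        # ~4x Standard_DS3_v2 + driver, assumed
OVERNIGHT_HOURS = 10      # forgotten at 8pm, noticed at 6am
AUTOTERM_MINUTES = 30     # a typical policy default

waste_without_policy = HOURLY_RATE * OVERNIGHT_HOURS       # runs all night
waste_with_policy = HOURLY_RATE * (AUTOTERM_MINUTES / 60)  # idles 30 min, then stops

print(f"without policy: ${waste_without_policy:.2f}")  # without policy: $30.00
print(f"with policy:    ${waste_with_policy:.2f}")     # with policy:    $1.50
```

The per-incident number is small; the savings come from the fact that the policy applies to every cluster, every night, with no human in the loop.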
Policy Definition Language
Policies are JSON documents. Each attribute maps to a constraint type:
{
  "attribute_name": {
    "type": "fixed | range | allowlist | blocklist | regex | unlimited | forbidden",
    "value": "...",
    "minValue": 0,
    "maxValue": 10,
    "values": [],
    "pattern": "...",
    "defaultValue": "...",
    "hidden": true
  }
}
Only the fields relevant to the chosen type apply: value for fixed, minValue/maxValue for range, values for allowlist and blocklist, and pattern for regex.
Policy Examples
1. Basic Cost Control Policy (for all data engineers)
{
  "autotermination_minutes": {
    "type": "range",
    "minValue": 10,
    "maxValue": 120,
    "defaultValue": 30
  },
  "num_workers": {
    "type": "range",
    "minValue": 1,
    "maxValue": 8
  },
  "node_type_id": {
    "type": "allowlist",
    "values": [
      "Standard_DS3_v2",
      "Standard_DS4_v2",
      "Standard_DS5_v2",
      "Standard_E4s_v3",
      "Standard_E8s_v3"
    ],
    "defaultValue": "Standard_DS3_v2"
  },
  "azure_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK_AZURE",
    "hidden": true
  },
  "spark_version": {
    "type": "regex",
    "pattern": "^(14|15)\\.[0-9]+\\.x-scala2\\.12$",
    "defaultValue": "15.4.x-scala2.12"
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "data-engineering"
  },
  "custom_tags.cost_center": {
    "type": "allowlist",
    "values": ["harbinger", "research", "infra"]
  }
}
This policy:
- Forces auto-termination between 10-120 minutes (default 30)
- Limits cluster size to 8 workers max
- Restricts to cost-effective instance types
- Forces spot instances (transparent to user)
- Pins Spark to recent 14.x/15.x runtimes (defaulting to an LTS release)
- Requires cost center tagging for chargeback
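If you push a definition like this to the Cluster Policies REST API (POST /api/2.0/policies/clusters/create), note that the API expects the definition as a stringified JSON document, not a nested object. A minimal payload-builder sketch (the helper name is ours; the HTTP call is left as a comment):

```python
import json

def build_policy_payload(name: str, definition: dict) -> dict:
    """Build the request body for POST /api/2.0/policies/clusters/create.

    The API takes `definition` as a JSON *string*; passing a nested
    object instead is a common source of request errors.
    """
    return {"name": name, "definition": json.dumps(definition)}

payload = build_policy_payload(
    "Data Engineers - Cost Controlled",
    {"autotermination_minutes": {"type": "range", "minValue": 10,
                                 "maxValue": 120, "defaultValue": 30}},
)
# Send with any HTTP client, e.g.:
# requests.post(f"{host}/api/2.0/policies/clusters/create",
#               headers={"Authorization": f"Bearer {token}"}, json=payload)
```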
2. Single-Node Policy (for interactive development)
For lightweight exploration and testing, a single-node policy keeps costs minimal:
{
  "num_workers": {
    "type": "fixed",
    "value": 0,
    "hidden": true
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "spark_conf.spark.master": {
    "type": "fixed",
    "value": "local[*]",
    "hidden": true
  },
  "custom_tags.ResourceClass": {
    "type": "fixed",
    "value": "SingleNode",
    "hidden": true
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 60,
    "hidden": true
  },
  "node_type_id": {
    "type": "allowlist",
    "values": ["Standard_DS3_v2", "Standard_DS4_v2"],
    "defaultValue": "Standard_DS3_v2"
  }
}
3. Production Job Policy (for automated workflows)
Production jobs prioritize reliability over cost savings. This policy allows larger clusters while still using spot instances with on-demand fallback:
{
  "num_workers": {
    "type": "range",
    "minValue": 2,
    "maxValue": 32
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 0,
    "hidden": true
  },
  "azure_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK_AZURE",
    "hidden": true
  },
  "azure_attributes.spot_bid_max_price": {
    "type": "fixed",
    "value": -1,
    "hidden": true
  },
  "custom_tags.environment": {
    "type": "fixed",
    "value": "production"
  }
}
4. Unrestricted Policy (for workspace admins only)
Give your platform team full flexibility while still tracking usage:
{
  "custom_tags.team": {
    "type": "fixed",
    "value": "platform"
  }
}
Creating Policies via CLI
# Create a policy (new Databricks CLI). The API expects "definition"
# as a stringified JSON document, so the request file looks like:
#   {"name": "Data Engineers - Cost Controlled",
#    "definition": "{\"autotermination_minutes\": {\"type\": \"range\", ...}}"}
databricks cluster-policies create --json @policies/create_engineer_policy.json

# Update an existing policy (include its policy_id in the request JSON)
databricks cluster-policies edit --json @policies/edit_engineer_policy.json

# List all policies
databricks cluster-policies list
Creating Policies via Terraform
If you manage Databricks infrastructure with Terraform:
resource "databricks_cluster_policy" "engineer_policy" {
  name = "Data Engineers - Cost Controlled"
  definition = jsonencode({
    "autotermination_minutes" = {
      type         = "range"
      minValue     = 10
      maxValue     = 120
      defaultValue = 30
    }
    "num_workers" = {
      type     = "range"
      minValue = 1
      maxValue = 8
    }
    "azure_attributes.availability" = {
      type   = "fixed"
      value  = "SPOT_WITH_FALLBACK_AZURE"
      hidden = true
    }
    "custom_tags.cost_center" = {
      type   = "allowlist"
      values = ["harbinger", "research", "infra"]
    }
  })
}

resource "databricks_permissions" "engineer_policy_access" {
  cluster_policy_id = databricks_cluster_policy.engineer_policy.id

  access_control {
    group_name       = "data-engineers"
    permission_level = "CAN_USE"
  }
}
Enforcing Policies at Scale
Remove Unrestricted Cluster Creation
By default, users with the "Unrestricted cluster creation" entitlement can create clusters without any policy, so custom policies alone don't close the gap. Remove the allow-cluster-create entitlement from the workspace users group in the admin console, or patch the group via the SCIM Groups API:
# Remove unrestricted cluster creation from a group (SCIM Groups API)
curl -X PATCH "https://<workspace-url>/api/2.0/preview/scim/v2/Groups/<group-id>" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/scim+json" \
  -d '{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
      {"op": "remove", "path": "entitlements[value eq \"allow-cluster-create\"]"}
    ]
  }'
Monitor Policy Compliance
Use Databricks system tables to track cluster creation and policy adherence:
-- Find clusters created without a policy (ungoverned spend)
-- Note: system.compute.clusters keeps one row per cluster config change.
SELECT
  cluster_id,
  cluster_name,
  owned_by,
  create_time,
  worker_node_type,
  worker_count,
  auto_termination_minutes
FROM system.compute.clusters
WHERE policy_id IS NULL
  AND create_time >= CURRENT_TIMESTAMP - INTERVAL 30 DAYS
ORDER BY create_time DESC;
-- Total DBU consumption by policy (usage_quantity is in DBUs for compute SKUs)
-- Policy names aren't exposed in system tables; map policy_id to a name
-- via the Cluster Policies API if needed.
WITH latest_clusters AS (
  SELECT cluster_id, policy_id
  FROM system.compute.clusters
  QUALIFY ROW_NUMBER() OVER (PARTITION BY cluster_id ORDER BY change_time DESC) = 1
)
SELECT
  c.policy_id,
  SUM(u.usage_quantity) AS total_dbus
FROM system.billing.usage u
JOIN latest_clusters c
  ON u.usage_metadata.cluster_id = c.cluster_id
WHERE u.usage_date >= CURRENT_DATE - INTERVAL 30 DAYS
GROUP BY 1
ORDER BY total_dbus DESC;
Implementing a FinOps Review Process
Cluster policies are a technical control. They work best paired with a lightweight process:
- Weekly cost review — query system.billing.usage and share with team leads
- Tag enforcement — require cost_center and owner tags; use these for chargeback reports
- Policy review cadence — review policy limits quarterly; adjust as team needs change
- Alert on spend anomalies — set Databricks SQL alerts on system.billing.usage for unexpected spikes
-- Alert query: daily spend over $50 (adjust threshold for your team)
-- Simplified list-price join; add cloud/currency predicates for production use.
SELECT
  u.usage_date,
  SUM(u.usage_quantity * lp.pricing.default) AS daily_cost_usd
FROM system.billing.usage u
JOIN system.billing.list_prices lp
  ON u.sku_name = lp.sku_name
  AND u.usage_start_time >= lp.price_start_time
  AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
WHERE u.usage_date = CURRENT_DATE - INTERVAL 1 DAY
GROUP BY u.usage_date
HAVING SUM(u.usage_quantity * lp.pricing.default) > 50;
Policy Hierarchy: Job vs Interactive Clusters
One common confusion: cluster policies apply differently to interactive clusters (created manually) vs job clusters (created by Databricks Workflows).
- Interactive clusters — fully governed by policies; users must select a policy
- Job clusters — the job definition includes a cluster spec; policy is optional but recommended
- Shared job clusters — reuse a running cluster; no policy enforcement at spin-up
For job clusters, enforce policies in your Job definitions and CI/CD pipeline:
{
  "job_clusters": [
    {
      "job_cluster_key": "default",
      "new_cluster": {
        "policy_id": "ABCD1234",
        "spark_version": "15.4.x-scala2.12",
        "num_workers": 4
      }
    }
  ]
}
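One lightweight CI gate is a check that fails the build when any job cluster spec omits a policy_id. A sketch, assuming the Jobs API job_clusters JSON shape shown above (the helper name is ours):

```python
def missing_policy_ids(job_spec: dict) -> list:
    """Return the job_cluster_keys whose new_cluster has no policy_id.

    Minimal CI-gate sketch: run against every job JSON in the repo
    and fail the pipeline if the returned list is non-empty.
    """
    offenders = []
    for jc in job_spec.get("job_clusters", []):
        if "policy_id" not in jc.get("new_cluster", {}):
            offenders.append(jc.get("job_cluster_key", "<unnamed>"))
    return offenders

spec = {
    "job_clusters": [
        {"job_cluster_key": "default",
         "new_cluster": {"spark_version": "15.4.x-scala2.12", "num_workers": 4}}
    ]
}
print(missing_policy_ids(spec))  # ['default']
```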
Measuring the Impact
After implementing policies for a mid-sized team (15 engineers), here's what typically changes:
| Metric | Before Policies | After Policies |
|---|---|---|
| Avg idle time before termination | 4.2 hours | 28 minutes |
| Clusters using spot instances | 22% | 94% |
| Monthly compute spend | Baseline | 35-50% reduction |
| Ungoverned clusters per month | Uncounted | Less than 5 (admins only) |
Wrapping Up
Cluster policies are one of the highest-ROI governance investments you can make in a Databricks workspace. A few hours of setup translates to continuous cost savings, better standardization, and fewer 2am alerts about unexpected cloud bills.
Start with the basics: mandatory auto-termination, spot instances, and size limits. Then layer in tagging requirements and monitoring once the foundation is solid.
At Harbinger Explorer, our Databricks workspace runs fully policy-governed. Every cluster our team spins up is within guardrails — keeping infrastructure costs lean so we can focus on building intelligence, not managing bills.
Try Harbinger Explorer free for 7 days — we practice what we preach. Start your free trial at harbingerexplorer.com.