Delta Sharing Explained: Cross-Organization Data Sharing Without Data Copies
Data sharing between organizations has historically been painful: FTP dumps, emailed CSVs, custom API integrations, or expensive data marketplaces. Delta Sharing changes this with an open, REST-based protocol that lets you share live Delta Lake tables with any recipient — regardless of their platform, cloud provider, or tech stack.
What Is Delta Sharing?
Delta Sharing is an open-source protocol (not a Databricks-proprietary format) for sharing data securely across organizational boundaries. Key properties:
| Property | Detail |
|---|---|
| Open protocol | Not vendor-locked; implemented by Databricks, pandas, Apache Spark, Power BI, etc. |
| No data duplication | Recipients query data in your storage directly via signed URLs |
| Live data | Recipients always see the latest version (or a specific version you share) |
| Fine-grained control | Share specific tables, partitions, or even row/column subsets |
| Audit logging | Full access history for compliance |
Delta Sharing Architecture
Provider (You)                        Recipient (External Org)
──────────────                        ────────────────────────
Unity Catalog                         Any compatible client
      │                                         │
      ▼                                         │
Delta Sharing Server ◄────── REST API ─────────►│
      │                                         │
      ▼                                         ▼
Cloud Storage ──────── Signed URLs ──────► Direct Read
(S3/ADLS/GCS)                              (no copy made)
The recipient never gets your data — they get time-limited, signed URLs that let them read the Parquet files directly. Your data never leaves your storage bucket.
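Under the hood, the open protocol is a small set of REST endpoints. The sketch below builds the table-query URL a client POSTs to; the endpoint paths follow the open delta-sharing protocol spec, while the profile values (host, token) are hypothetical placeholders:

```python
# Sketch: constructing Delta Sharing REST endpoint URLs from a recipient profile.
# Paths follow the open delta-sharing protocol spec; profile values are placeholders.
from urllib.parse import quote

profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing",
    "bearerToken": "<token-from-activation-link>",
}

def table_query_url(endpoint: str, share: str, schema: str, table: str) -> str:
    """POSTing here returns table metadata plus short-lived pre-signed
    URLs pointing at the underlying Parquet files."""
    return (
        f"{endpoint}/shares/{quote(share)}"
        f"/schemas/{quote(schema)}/tables/{quote(table)}/query"
    )

url = table_query_url(profile["endpoint"],
                      "market_data_share", "prod_gold", "daily_prices")
print(url)
```

The client authenticates each request with the bearer token; because the returned file URLs expire, clients re-query rather than cache them, which is what keeps the data live.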
Setting Up Delta Sharing with Unity Catalog
Unity Catalog is the recommended way to manage Delta Sharing on Databricks (available on all cloud providers).
Step 1: Enable Delta Sharing in Your Metastore
Delta Sharing is enabled at the metastore level by a metastore admin in the Databricks account console (metastore settings), not through a SQL command. From SQL you can confirm which metastore your workspace is attached to:
-- Confirm the current metastore (sharing is enabled on it by an admin
-- in the account console)
SELECT CURRENT_METASTORE();
Step 2: Create a Share
A share is a named collection of data assets you want to distribute:
-- Create a share
CREATE SHARE market_data_share
COMMENT 'Live market data for external partners';
-- Add tables to the share
ALTER SHARE market_data_share
ADD TABLE prod.gold.daily_prices;
-- Add a specific partition (e.g., only public data)
ALTER SHARE market_data_share
ADD TABLE prod.gold.company_filings
PARTITION (data_classification = 'public');
-- Add a table with an alias for the recipient
ALTER SHARE market_data_share
ADD TABLE prod.silver.events
AS prod_events_public
COMMENT 'Anonymized event data';
Step 3: Create a Recipient
A recipient represents an external entity (another company, team, or application):
-- Recipient with Unity Catalog (another Databricks workspace).
-- The sharing identifier has the form <cloud>:<region>:<metastore-uuid>;
-- the recipient's admin can look it up with SELECT CURRENT_METASTORE().
CREATE RECIPIENT acme_corp_recipient
USING ID 'aws:us-west-2:<metastore-uuid>';
-- Recipient without Unity Catalog (open sharing, uses tokens)
CREATE RECIPIENT external_partner_recipient
COMMENT 'External analytics partner without Databricks';
-- Get the activation link for non-UC recipients
DESCRIBE RECIPIENT external_partner_recipient;
-- Returns an activation_link — send this to your recipient securely
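The activation link lets the recipient download a small JSON profile file (often named config.share). A sketch of its shape, following the delta-sharing profile format; all values here are placeholders:

```python
# Sketch: the recipient profile file downloaded from the activation link.
# Field names follow the delta-sharing profile format; values are placeholders.
import json

profile_text = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing",
  "bearerToken": "<redacted>",
  "expirationTime": "2025-12-31T23:59:59.000Z"
}
"""

profile = json.loads(profile_text)
# Every request is authenticated with this bearer token, so treat the file
# like a credential: store it securely and rotate it before expirationTime.
print(profile["endpoint"])
```

Because this file is the credential, recipients should keep it out of version control and you should set an expiration when creating the token.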
Step 4: Grant Access
-- Grant the share to the recipient
GRANT SELECT ON SHARE market_data_share TO RECIPIENT acme_corp_recipient;
GRANT SELECT ON SHARE market_data_share TO RECIPIENT external_partner_recipient;
-- Verify grants
SHOW GRANTS ON SHARE market_data_share;
Consuming Shared Data: The Recipient Side
Option A: Databricks Unity Catalog (No Token Needed)
If the recipient also uses Databricks with Unity Catalog, they add the provider and access data seamlessly:
-- Recipient: register the provider. In Databricks-to-Databricks sharing this
-- object is usually created automatically once access is granted; the ID is
-- the provider's sharing identifier: <cloud>:<region>:<metastore-uuid>
CREATE PROVIDER acme_data_provider
USING ID 'aws:us-west-2:<provider-metastore-uuid>';
-- List available shares from the provider
SHOW SHARES IN PROVIDER acme_data_provider;
-- Create a catalog from the shared data
CREATE CATALOG shared_market_data
USING SHARE acme_data_provider.market_data_share;
-- Query shared tables like any other table
SELECT * FROM shared_market_data.prod_gold.daily_prices
WHERE trade_date >= '2024-01-01'
LIMIT 100;
Option B: Python (pandas / PyArrow)
Recipients without Databricks can use the delta-sharing Python library:
# pip install delta-sharing
import delta_sharing
# Load the profile file (downloaded from activation link)
profile_file = "path/to/config.share"
# List available tables
client = delta_sharing.SharingClient(profile_file)
shares = client.list_shares()
schemas = client.list_schemas(shares[0])
tables = client.list_tables(schemas[0])
print(f"Available tables: {[t.name for t in tables]}")
# Load as pandas DataFrame
df = delta_sharing.load_as_pandas(
f"{profile_file}#market_data_share.prod_gold.daily_prices"
)
print(df.head())
# Load as Spark DataFrame (for large datasets)
spark_df = delta_sharing.load_as_spark(
f"{profile_file}#market_data_share.prod_gold.daily_prices"
)
spark_df.filter("trade_date >= '2024-01-01'").show()
Option C: Power BI
Power BI has native Delta Sharing support. Recipients connect directly in Power BI Desktop:
- Get Data → Delta Sharing
- Enter the Delta Sharing server URL and bearer token from the profile file
- Browse and load the shared tables
Option D: Apache Spark (Non-Databricks)
# Works with any Spark 3.x cluster — launch it with the Delta Sharing
# connector on the classpath, e.g.:
#   pyspark --packages io.delta:delta-sharing-spark_2.12:<version>
df = spark.read.format("deltaSharing") \
.load("path/to/config.share#market_data_share.prod_gold.daily_prices")
df.filter("trade_date >= '2024-01-01'").show()
Advanced: Time Travel and Version Sharing
For point-in-time consistency, share a table together with its version history; recipients can then time travel to a specific version or timestamp on their side:
-- Share the table with its version history
ALTER SHARE market_data_share
ADD TABLE prod.gold.daily_prices
WITH HISTORY;
-- Recipient (Databricks-to-Databricks): query a historical version or timestamp
SELECT * FROM shared_market_data.prod_gold.daily_prices
VERSION AS OF 100;
SELECT * FROM shared_market_data.prod_gold.daily_prices
TIMESTAMP AS OF '2024-01-31T23:59:59Z';
Recipients can also use Change Data Feed (CDF) to pull incremental changes; this requires CDF enabled on the source table and the table shared with its history:
# Recipient: read only new changes since last sync
df = delta_sharing.load_table_changes_as_pandas(
f"{profile_file}#market_data_share.prod_gold.daily_prices",
starting_version=150,
ending_version=200
)
# Returns added/deleted rows with _change_type column
print(df[df["_change_type"] == "insert"])
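To apply those changes to a local copy, a recipient can replay them by _change_type. A minimal sketch with pandas — the ticker/price columns and the key are illustrative, not part of the shared schema:

```python
# Sketch: replaying Change Data Feed rows onto a local snapshot with pandas.
# CDF emits insert, delete, update_preimage and update_postimage rows;
# the ticker/price columns here are illustrative, keyed on "ticker".
import pandas as pd

snapshot = pd.DataFrame({"ticker": ["AAPL", "MSFT"], "price": [190.0, 410.0]})

changes = pd.DataFrame({
    "ticker": ["MSFT", "MSFT", "GOOG", "AAPL"],
    "price":  [410.0, 415.0, 175.0, 190.0],
    "_change_type": ["update_preimage", "update_postimage", "insert", "delete"],
})

def apply_changes(snapshot: pd.DataFrame, changes: pd.DataFrame, key: str) -> pd.DataFrame:
    # Drop rows that were deleted or replaced (preimages carry the old values).
    removed = changes[changes["_change_type"].isin(["delete", "update_preimage"])][key]
    kept = snapshot[~snapshot[key].isin(removed)]
    # Add new rows plus the post-update versions of changed rows.
    added = changes[changes["_change_type"].isin(["insert", "update_postimage"])]
    return pd.concat([kept, added.drop(columns="_change_type")], ignore_index=True)

result = apply_changes(snapshot, changes, key="ticker")
print(result.sort_values("ticker"))
```

Replaying only the change rows each sync avoids re-downloading the whole table, which matters for both latency and the provider's egress bill.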
Security and Governance
Auditing Access
Delta Sharing events land in the system.access.audit system table under the unityCatalog service, with deltaSharing-prefixed action names:
SELECT
event_time,
action_name,
source_ip_address,
user_identity.email AS user_email,
request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
AND action_name LIKE 'deltaSharing%'
AND event_date >= current_date() - INTERVAL 30 DAYS
ORDER BY event_time DESC;
Revoking Access
-- Revoke a recipient's access to a share
REVOKE SELECT ON SHARE market_data_share FROM RECIPIENT external_partner_recipient;
-- Or drop the recipient entirely
DROP RECIPIENT external_partner_recipient;
Partition-Based Filtering (Advanced)
For GDPR or contractual compliance, expose only the partitions that meet specific criteria; for true row- or column-level control, share a view that applies the filter instead of the base table:
ALTER SHARE market_data_share
ADD TABLE prod.gold.company_profiles
PARTITION (is_public = 'true');
Delta Sharing vs Alternatives
| Approach | Data Copy? | Real-time? | Multi-platform? | Governance |
|---|---|---|---|---|
| Delta Sharing | No | Yes | Yes | Built-in |
| S3/ADLS file dump | Yes | No | Yes | Manual |
| Custom REST API | No | Yes | Yes | Manual |
| Snowflake Data Share | No | Yes | Snowflake only | Built-in |
| BigQuery Analytics Hub | No | Yes | GCP only | Built-in |
Delta Sharing's key differentiator is true openness — both provider and recipient can use any compatible platform.
Production Considerations
- Bandwidth costs: Recipients reading large tables generate egress charges for you (the provider). Monitor with audit logs.
- Token rotation: For open-sharing recipients, rotate tokens periodically.
- Partition strategy: Share only necessary partitions to minimize accidental full scans.
- SLA alignment: Recipients querying your data compete with your own workloads — consider dedicated storage tiers.
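For the bandwidth point above, a back-of-the-envelope egress estimator. The $0.09/GB rate is an assumption — substitute your cloud provider's actual egress pricing for your region:

```python
# Sketch: estimating monthly egress cost for a shared table.
# PRICE_PER_GB is an assumed placeholder; real egress rates vary by
# cloud provider, region, and destination network.
PRICE_PER_GB = 0.09  # USD per GB, assumed

def monthly_egress_cost(table_size_gb: float, full_scans_per_day: float,
                        days: int = 30, price_per_gb: float = PRICE_PER_GB) -> float:
    """Recipients read Parquet straight from your bucket, so each full
    scan bills you (the provider) for roughly the table's size in egress."""
    return table_size_gb * full_scans_per_day * days * price_per_gb

# A 50 GB table fully scanned twice a day for a month:
cost = monthly_egress_cost(50, 2)
print(f"${cost:,.2f}")  # → $270.00
```

Numbers like this are why the partition-strategy advice matters: sharing only the partitions recipients need caps the worst-case scan size.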
Tools like Harbinger Explorer can help you track which shares are being accessed most frequently and correlate access patterns with your storage costs.
Conclusion
Delta Sharing makes data collaboration as simple as sharing a Google Doc — except for massive, live datasets. The open protocol means your partners aren't locked into Databricks, and you retain full control with Unity Catalog's governance layer.
As data mesh architectures proliferate, Delta Sharing is becoming the backbone of inter-domain data contracts. Start with a pilot: share one Gold table with an external stakeholder and see how dramatically it simplifies the workflow compared to file dumps.
Try Harbinger Explorer free for 7 days — monitor your Delta Sharing access patterns, track recipient usage, and get alerted on unusual access in your shared datasets. Start your free trial at harbingerexplorer.com