Unlocking Secure and Open Data Collaboration with Delta Sharing
“A deep dive into Delta Sharing — the open protocol enabling secure, real-time data collaboration across platforms, powered by Delta Lake."
Introduction
In the era of modern data ecosystems, seamless and secure data sharing across organizational and technological boundaries is a key enabler for innovation, collaboration, and efficiency. However, traditional approaches to data sharing often involve cumbersome exports, data duplication, and security concerns. Enter Delta Sharing — an open protocol developed under the Delta Lake project that simplifies and modernizes data sharing for the cloud-native world.
Overview: When, How, Where, and Why Delta Sharing Was Introduced
Delta Sharing was officially announced by Databricks in May 2021 as part of the broader Delta Lake open-source project, governed by the Linux Foundation. It was introduced in response to the growing need for a unified, open, and cloud-agnostic way to share data across different organizations, tools, and platforms without the need for complex ETL pipelines or risky data duplication.
- When: Introduced in May 2021.
- Why: To solve key challenges in modern data collaboration — data silos, security risks, and vendor lock-in.
- Where: Designed to work across major cloud platforms (AWS, Azure, GCP), and within on-premise or hybrid architectures.
- How: Built as the first open protocol for secure data sharing, enabling real-time access to Delta Lake tables without needing to copy or move the data.
Delta Sharing extends the value of the Delta Lake format by making it not just a high-performance storage layer, but also a secure and flexible data distribution mechanism.
What Are Delta Lakes and Delta Tables?
A Delta Lake is an open-source storage layer that brings ACID transactions, schema enforcement, and time travel (data versioning) to data lakes. It enables data lakes to behave more like traditional databases in terms of reliability and consistency.
A Delta Table is a table stored in the Delta Lake format. These tables:
- Store data in optimized Parquet files - Maintain a transaction log (“_delta_log”) to track changes
- Support operations like
MERGE
,UPDATE
,DELETE
, andTIME TRAVEL
Delta Tables are the foundation for both Delta Live Tables (DLT) and Delta Sharing workflows, making them integral to reliable, scalable data architectures.
What is Delta Sharing?
Delta Sharing is the first open-source protocol for secure data sharing, designed to enable organizations to share live, ready-to-query data from their data lakes with external parties in real time, without copying or moving the data. It supports a wide range of tools and platforms and can be used across cloud providers.
Why Delta Sharing Matters
Modern organizations increasingly operate in multi-cloud environments and collaborate with partners, vendors, and regulators who use diverse tools and platforms. Delta Sharing solves the fundamental pain point of interoperability by providing:
- Open and vendor-neutral access to Delta Lake tables
- Secure and scalable sharing without duplicating data
- Live data access for always up-to-date insights
- Compatibility with diverse clients like Pandas, Spark, BI tools, and more
How Delta Sharing Works
Delta Sharing introduces a simple architecture:
- Data Providers expose selected Delta tables through a Delta Sharing server (either open-source or managed by Databricks).
- Recipients receive a secure sharing link or token to access the data.
- Clients such as Python, Spark, or BI tools can connect using this link and read the data live without needing Databricks accounts.
Delta Sharing vs. Other Sharing Methods
Traditional data sharing often relies on:
- Exporting files to S3, Azure Blob, or FTP
- Sharing static snapshots in CSV or Parquet
- Replicating databases across environments
Delta Sharing eliminates the need for these steps by allowing direct, governed access to the source Delta tables.
Delta Sharing vs. Federated Architecture
Both Delta Sharing and federated data architectures aim to enable cross-system data access without centralizing data, but they take different approaches:
- Federated SQL Access allows querying external databases from within a central analytics platform without moving the data. It’s about consuming data from other systems.
- Delta Sharing, on the other hand, is about exposing and sharing your Delta Lake data with others securely and in real time.
Key Differences:
- Direction: Federated = inward (pull data in); Delta Sharing = outward (share data out)
- Technology: Federated access often involves connectors, live queries, and joins across systems; Delta Sharing is optimized for sharing Delta Lake tables as immutable views.
- Use Cases: Federated for unified analytics across systems; Delta Sharing for open, governed data distribution.
What Powers Federated SQL Access?
Federated SQL access is made possible through:
- SQL Connectors and JDBC/ODBC Drivers: These provide the ability to connect to remote data sources like MySQL, PostgreSQL, Snowflake, BigQuery, etc.
- Query Pushdown: Only the necessary part of a query is executed remotely, and the results are returned, minimizing data movement.
- Databricks Connectors: Databricks uses its own native connectors to support federated access to external data sources.
- Data Governance and Access Control: Integrated with tools like Unity Catalog to ensure secure and compliant data access.
This architecture allows data engineers and analysts to query data across systems as if it were local, making data integration and analysis faster and more efficient.
In practice, many organizations use both together — ingest data via federated access, transform it, then distribute it using Delta Sharing.
Real-World Use Cases
- A retail company shares sales and inventory data with suppliers to optimize stock.
- A healthcare provider securely shares patient cohort data with research partners.
- A financial institution shares curated data views with regulators or auditors in real-time.
Delta Sharing in the Databricks Ecosystem
Databricks offers a Managed Delta Sharing service integrated with Unity Catalog, providing fine-grained access control, audit logging, and automated sharing workflows. However, since Delta Sharing is open-source, it can also be self-hosted and integrated into any cloud or platform that supports Delta Lake.
Conclusion
Delta Sharing represents a significant step forward in the evolution of data collaboration. By embracing openness, security, and interoperability, it empowers organizations to share data across boundaries while maintaining control and governance. Whether you’re using Databricks or a self-hosted platform, Delta Sharing makes it possible to build a truly modern, open data ecosystem.
Further Reading - Delta Sharing Documentation - Delta Lake GitHub Repository