Managed vs External Tables in Spark SQL Lake Database | Azure Synapse Analytics Tutorial

Managed vs External Tables in Spark SQL Lake Database | Azure Synapse Analytics

Managed vs External Tables in Spark SQL Lake Database | Azure Synapse Analytics Explained

📘 Overview

In Azure Synapse Analytics, you can create both Managed and External Tables within a Lake Database using Spark SQL. Understanding the difference is essential for managing data, storage, and performance effectively.

📂 What Are Managed Tables?

Managed tables are those where both the metadata and the data are controlled by Synapse Spark. When you drop the table, the data and metadata are both deleted.

✅ Create a Managed Table

%%spark
CREATE TABLE lakehouse.managed_employee (
    id INT,
    name STRING,
    salary FLOAT
)

✔️ No LOCATION clause is specified — Spark decides where to store the data internally.

📁 What Are External Tables?

External tables store only the metadata in the Lake Database while the data resides externally in user-defined paths like ADLS Gen2.

✅ Create an External Table

%%spark
CREATE TABLE lakehouse.external_sales (
    id INT,
    product STRING,
    amount DOUBLE
)
USING DELTA
LOCATION 'abfss://data@yourstorageaccount.dfs.core.windows.net/sales/external/'

✔️ Data lives outside Spark's internal catalog. Deleting the table will NOT remove the data.

📊 Key Differences

Feature Managed Table External Table
Data Location Managed by Synapse Custom ADLS path
Drop Table Deletes Data Yes No
Usage Quick prototyping Production pipelines

📌 Best Practices

  • Use external tables for production where you need to retain data
  • Use managed tables for temporary analysis or rapid testing
  • Prefer Delta format for ACID support and versioning

📂 Example: Read External Table

%%sql
SELECT * FROM lakehouse.external_sales;

📺 Watch the Full Video Explanation

📚 Credit: Content created with the help of ChatGPT and Gemini.