How to Read Data from Dedicated SQL Pool and Create CSV in ADLS Gen2 Using Azure Synapse
📘 Overview
In this tutorial, you'll learn how to extract data from a Dedicated SQL Pool using PySpark and write it to a single CSV file in Azure Data Lake Storage Gen2 (ADLS Gen2). This is a common task for exporting curated data for sharing, archiving, or downstream analytics.
🧱 Prerequisites
- Azure Synapse workspace with Spark and Dedicated SQL Pool
- Configured Linked Services and credentials
- ADLS Gen2 container path
🛠️ Step-by-Step Guide
✅ Step 1: Read from Dedicated SQL Pool Table
```python
%%pyspark
# Preferred inside a Synapse workspace: the built-in Dedicated SQL Pool
# connector, which authenticates with the workspace identity:
# df = spark.read.synapsesql("yourDB.dbo.sales_data")

# Alternatively, a plain JDBC read. (Note: "com.databricks.spark.sqldw" is a
# Databricks-only connector and is not available on Synapse Spark pools.)
df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://yourserver.database.windows.net:1433;database=yourDB")
      .option("dbtable", "dbo.sales_data")
      .option("user", "youruser")
      .option("password", "yourpassword")  # prefer a Key Vault-backed secret in production
      .load())
```
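The JDBC URL above follows a fixed `jdbc:sqlserver://...` pattern. As a minimal sketch, a small helper (hypothetical, not part of any Azure SDK) can assemble it from the server and database names, which are placeholders here:

```python
# Hypothetical helper that assembles the SQL Server JDBC URL used in Step 1.
# The server and database names are placeholders from the tutorial.
def sqlserver_jdbc_url(server: str, database: str, port: int = 1433) -> str:
    return f"jdbc:sqlserver://{server}:{port};database={database}"

print(sqlserver_jdbc_url("yourserver.database.windows.net", "yourDB"))
# → jdbc:sqlserver://yourserver.database.windows.net:1433;database=yourDB
```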
✅ Step 2: Repartition to Single File
```python
# coalesce(1) also works and avoids a full shuffle; either call collapses the
# DataFrame to one partition so Spark writes a single output file.
df_single = df.repartition(1)
```
✅ Step 3: Write to ADLS Gen2 as CSV
```python
(df_single.write
    .mode("overwrite")
    .option("header", "true")
    .csv("abfss://export@yourstorageaccount.dfs.core.windows.net/reports/sales_output"))
```
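The `abfss://` target path above always has the shape `abfss://<container>@<account>.dfs.core.windows.net/<path>`. A minimal sketch of a helper that builds it (the helper name and arguments are illustrative, not an Azure API):

```python
# Hypothetical helper to build an abfss:// URI for an ADLS Gen2 path like the
# one used in Step 3; container, account, and path values are placeholders.
def abfss_uri(container: str, account: str, path: str) -> str:
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

print(abfss_uri("export", "yourstorageaccount", "reports/sales_output"))
# → abfss://export@yourstorageaccount.dfs.core.windows.net/reports/sales_output
```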
📌 Notes
- Use `repartition(1)` to generate a single CSV file
- Spark names the output file like `part-00000-<id>.csv` inside the target folder; you can rename it with Azure Storage Explorer
- Ensure the Spark pool has permission to access the ADLS path
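Instead of renaming by hand, the rename can also be scripted. The sketch below shows the idea against the local filesystem with the standard library so it is runnable anywhere; inside a Synapse notebook you would perform the equivalent move against the ADLS path (for example with `mssparkutils.fs`). The function name and demo layout are illustrative:

```python
# Local-filesystem sketch of the rename step: find the single part-*.csv that
# Spark produced inside the output folder and give it a friendly name.
import glob
import os
import shutil
import tempfile

def rename_spark_part_file(output_dir: str, final_name: str) -> str:
    """Move the lone part-*.csv in output_dir to output_dir/final_name."""
    parts = glob.glob(os.path.join(output_dir, "part-*.csv"))
    if len(parts) != 1:
        raise RuntimeError(f"expected exactly one part file, found {len(parts)}")
    target = os.path.join(output_dir, final_name)
    shutil.move(parts[0], target)
    return target

# Demo against a temp directory that mimics Spark's output layout.
with tempfile.TemporaryDirectory() as out:
    with open(os.path.join(out, "part-00000-abc123.csv"), "w") as f:
        f.write("id,amount\n1,9.5\n")
    open(os.path.join(out, "_SUCCESS"), "w").close()  # Spark's marker file
    renamed = rename_spark_part_file(out, "sales_output.csv")
    print(os.path.basename(renamed))  # → sales_output.csv
```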
🎯 Use Cases
- Exporting business reports from SQL warehouse
- Sharing datasets with external users
- Backup/archive structured data in CSV format
📊 Output Example
The output folder at the specified ADLS path will contain a single CSV file (plus Spark's `_SUCCESS` marker file) holding all rows from the selected Dedicated SQL Pool table, with a header row.
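After downloading the export, a quick sanity check of the header and row count can be done with the standard library `csv` module. The sample content and column names below are stand-ins, not data from the tutorial; in practice you would open the downloaded file instead:

```python
# Sanity-check an exported CSV: confirm the header row and count data rows.
import csv
import io

# Stand-in for the downloaded export; replace with open("sales_output.csv").
sample = "order_id,region,amount\n1,EU,10.0\n2,US,12.5\n"

reader = csv.reader(io.StringIO(sample))
header = next(reader)   # first line, because the write used header=true
rows = list(reader)
print(header)      # → ['order_id', 'region', 'amount']
print(len(rows))   # → 2
```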
📚 Credit: Content created with the help of ChatGPT and Gemini.