Read CSV File & Load to Dedicated SQL Pool Table Using Apache Spark Pool Notebook
📘 Overview
In this tutorial, you'll learn how to use Apache Spark Pools in Azure Synapse Analytics to read a CSV file from ADLS Gen2 and write that data into a Dedicated SQL Pool table. This is useful for ETL and data ingestion pipelines within your Synapse workspace.
🛠️ Step-by-Step Workflow
✅ Step 1: Read CSV File from ADLS Gen2
```python
%%pyspark
csv_df = spark.read.option("header", "true") \
    .csv("abfss://data@yourstorageaccount.dfs.core.windows.net/input/employees.csv")
csv_df.show()
```
✅ Step 2: Write Data to Dedicated SQL Pool
```python
csv_df.write \
    .mode("overwrite") \
    .synapsesql("yourdedicatedpool.dbo.Employee")
```
📝 Note: The `.synapsesql()` method comes from the Azure Synapse Dedicated SQL Pool Connector, which is available by default on Synapse Spark pools. It performs the write itself, so `.mode()` is set beforehand and no separate `.save()` call is needed.
✅ Step 3: Validate Inserted Data
Note that the `%%sql` magic in a Synapse notebook runs Spark SQL, not T-SQL against the dedicated pool. To confirm the rows landed, run the query from a SQL script or client connected to `yourdedicatedpool`:

```sql
SELECT * FROM dbo.Employee;
```
📌 Tips
- Ensure the destination table exists before writing
- Use `overwrite` or `append` mode depending on your use case
- Review schema compatibility between the DataFrame and the SQL table
🎯 Use Cases
- Bulk load historical data into data warehouse
- ETL transformation using PySpark before loading
- Automated data ingestion from CSV sources
📚 Credit: Content created with the help of ChatGPT and Gemini.