Read CSV File & Load to Dedicated SQL Pool Table Using Apache Spark Pool Notebook | Azure Synapse Analytics Tutorial


📘 Overview

In this tutorial, you'll learn how to use Apache Spark Pools in Azure Synapse Analytics to read a CSV file from ADLS Gen2 and write that data into a Dedicated SQL Pool table. This is useful for ETL and data ingestion pipelines within your Synapse workspace.

🛠️ Step-by-Step Workflow

✅ Step 1: Read CSV File from ADLS Gen2

%%pyspark
# Read the CSV from ADLS Gen2, treating the first row as column headers.
csv_df = spark.read.option("header", "true") \
    .csv("abfss://data@yourstorageaccount.dfs.core.windows.net/input/employees.csv")
csv_df.show()
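
By default every CSV column is read as a string. If you want Spark to infer numeric and date types instead, a minimal variation of the same read (same file path as above):

%%pyspark
# Optional: let Spark infer column types instead of reading everything as strings.
csv_df = spark.read \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .csv("abfss://data@yourstorageaccount.dfs.core.windows.net/input/employees.csv")
csv_df.printSchema()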

✅ Step 2: Write Data to Dedicated SQL Pool

%%pyspark
# Write the DataFrame to the dedicated SQL pool table.
# synapsesql() performs the write itself, so set .mode() first; no .save() is needed.
csv_df.write \
    .mode("overwrite") \
    .synapsesql("yourdedicatedpool.dbo.Employee")

📝 Note: The `synapsesql` method comes from the Azure Synapse Dedicated SQL Pool Connector, which is pre-installed on Synapse Spark pools (Spark 3.x for PySpark), so no extra package is required. Because `synapsesql` executes the write, call `.mode()` before it and skip `.save()`.
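
If the target dedicated pool is in another workspace, or you want to be explicit about the endpoint, the connector also accepts a server option. A minimal sketch; the endpoint name below is an illustrative placeholder:

%%pyspark
from com.microsoft.spark.sqlanalytics.Constants import Constants

# Hypothetical endpoint; replace with your workspace's dedicated SQL endpoint.
csv_df.write \
    .option(Constants.SERVER, "yourworkspace.sql.azuresynapse.net") \
    .mode("append") \
    .synapsesql("yourdedicatedpool.dbo.Employee")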

✅ Step 3: Validate Inserted Data

The `%%sql` magic in a Synapse notebook runs Spark SQL against the Spark pool, so it cannot resolve the dedicated pool's three-part table name. Run the query against the dedicated SQL pool instead, for example from a SQL script in Synapse Studio connected to yourdedicatedpool:

SELECT * FROM dbo.Employee;
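
You can also validate without leaving the notebook by reading the table back through the same connector. A minimal sketch:

%%pyspark
# Read the target table back to confirm the load.
check_df = spark.read.synapsesql("yourdedicatedpool.dbo.Employee")
print(check_df.count())  # should match the source CSV's row count
check_df.show(5)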

📌 Tips

  • Ensure the destination table exists before writing
  • Use overwrite or append mode depending on your use case
  • Check schema compatibility between the DataFrame and the SQL table (see the sketch after this list)
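
For the schema-compatibility tip, one option is to cast columns explicitly before writing. A hedged sketch; the column names and types are illustrative assumptions, not taken from this tutorial:

%%pyspark
from pyspark.sql.functions import col

# Cast columns so the DataFrame matches the target table's types.
# EmployeeId, Name, and Salary are hypothetical column names.
typed_df = csv_df.select(
    col("EmployeeId").cast("int"),
    col("Name").cast("string"),
    col("Salary").cast("decimal(10,2)"),
)
typed_df.printSchema()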

🎯 Use Cases

  • Bulk-loading historical data into a data warehouse
  • ETL transformations in PySpark before loading
  • Automated data ingestion from CSV sources

📚 Credit: Content created with the help of ChatGPT and Gemini.