How to Read Multiple CSV Files from Blob Storage and Write to Azure SQL Table with Filenames | PySpark Tutorial

In this PySpark tutorial, you’ll learn how to load multiple CSV files from Azure Blob Storage, tag each record with the name of the file it came from, and write the combined data to an Azure SQL table for tracking and downstream processing.

1️⃣ Step 1: Set Up the Spark Session

from pyspark.sql import SparkSession

# Create (or reuse) the SparkSession, the entry point for all DataFrame work
spark = SparkSession.builder \
    .appName("ReadCSV_WriteSQL_WithFilenames") \
    .getOrCreate()
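
Note: on Databricks the Azure connectors come preinstalled, but on a plain Spark cluster the wasbs:// scheme used below needs the hadoop-azure package on the classpath. A minimal sketch of pulling it in at session creation (the 3.3.4 version is an assumption; match it to your Hadoop build):

# spark.jars.packages must be set before the session is first created;
# hadoop-azure 3.3.4 is an assumed version, match it to your Hadoop build
spark = SparkSession.builder \
    .appName("ReadCSV_WriteSQL_WithFilenames") \
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.4") \
    .getOrCreate()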

2️⃣ Step 2: Set Azure Blob Storage Configuration

# Authenticate to the Blob container with a SAS token
spark.conf.set(
    "fs.azure.sas.<container>.<storage_account>.blob.core.windows.net",
    "<your_sas_token>"
)

# The *.csv wildcard picks up every CSV file in the folder in a single read
path = "wasbs://<container>@<storage_account>.blob.core.windows.net/input-folder/*.csv"

3️⃣ Step 3: Read All CSV Files and Add Filename Column

from pyspark.sql.functions import input_file_name, regexp_extract

# Read all matching CSV files into one DataFrame, treating the first row as a header
df = spark.read.option("header", True).csv(path)

# input_file_name() returns the full URI of each row's source file;
# the regex keeps only what follows the last "/", i.e. the bare filename
df = df.withColumn("filename", regexp_extract(input_file_name(), r"([^/]+$)", 1))
df.show()
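
On Spark 3.3 and later, file-based sources also expose a hidden _metadata column, an alternative to input_file_name() that yields the bare filename with no regex needed. A minimal sketch under that version assumption:

from pyspark.sql.functions import col

# _metadata.file_name already holds the filename alone, path stripped
df_alt = spark.read.option("header", True).csv(path) \
    .withColumn("filename", col("_metadata.file_name"))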

4️⃣ Step 4: JDBC Configuration for Azure SQL

# JDBC URL for Azure SQL Database (encryption is required by Azure)
jdbcUrl = "jdbc:sqlserver://<your_server>.database.windows.net:1433;database=<your_db>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30"

connectionProperties = {
    "user": "<your_username>",
    "password": "<your_password>",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
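
Hardcoding credentials in a notebook is risky. A common alternative is to read them from environment variables or a secret manager; the sketch below uses hypothetical SQL_USER and SQL_PASSWORD variables:

import os

# SQL_USER and SQL_PASSWORD are hypothetical names; use whatever your
# deployment provides (or a secret store such as Azure Key Vault)
connectionProperties = {
    "user": os.environ["SQL_USER"],
    "password": os.environ["SQL_PASSWORD"],
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}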

5️⃣ Step 5: Write DataFrame to Azure SQL Table

# Append so reruns add rows instead of replacing the table;
# switch to mode("overwrite") if you want a clean reload each time
df.write \
    .mode("append") \
    .jdbc(url=jdbcUrl, table="your_table_name", properties=connectionProperties)

print("✅ All CSV files uploaded to Azure SQL with filename column.")
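
To confirm the load, read the table back over the same JDBC connection and count rows per source file (assuming the your_table_name placeholder from above):

# Read the table back and verify that every file contributed rows
df_check = spark.read.jdbc(url=jdbcUrl, table="your_table_name", properties=connectionProperties)
df_check.groupBy("filename").count().show()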
