How to Write DataFrame to Azure SQL Table Using PySpark
In this tutorial, you'll learn how to save a PySpark DataFrame to an Azure SQL Database table over the JDBC connector, using the DataFrameWriter returned by `df.write`. We'll cover all the necessary steps, including building the JDBC URL, supplying credentials, and preparing the data.
1️⃣ Step 1: Create Spark Session
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("WriteToAzureSQL").getOrCreate()
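Note: the JDBC write in Step 4 requires the Microsoft SQL Server JDBC driver on Spark's classpath. Some platforms (such as Databricks) bundle it already; if yours does not, one option is to pull it from Maven when building the session. The coordinate and version below are an assumption — check for the current release before using:

```python
from pyspark.sql import SparkSession

# Assumed Maven coordinate/version for the MSSQL JDBC driver (verify the
# latest release). spark.jars.packages downloads the jar at session startup.
spark = (
    SparkSession.builder
    .appName("WriteToAzureSQL")
    .config("spark.jars.packages", "com.microsoft.sqlserver:mssql-jdbc:12.6.1.jre11")
    .getOrCreate()
)
```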
2️⃣ Step 2: Create Sample DataFrame
data = [
    (1, "Aamir Shahzad", "aamir@example.com", "USA"),
    (2, "Ali Raza", "ali@example.com", "Canada"),
    (3, "Bob", "bob@example.com", "UK"),
    (4, "Lisa", "lisa@example.com", "Germany")
]
columns = ["customer_id", "name", "email", "country"]
df = spark.createDataFrame(data, columns)
df.show()
3️⃣ Step 3: JDBC Configuration
jdbcHostname = "yourserver.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "yourdatabase"
jdbcUsername = "sqladmin"
jdbcPassword = "YourPassword123!"
jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};database={jdbcDatabase};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30"
connectionProperties = {
    "user": jdbcUsername,
    "password": jdbcPassword,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
Note that hard-coding credentials as above is only for demonstration; in production, load them from a secrets manager or environment variables instead.
4️⃣ Step 4: Write DataFrame to Azure SQL Table
The `mode("append")` setting adds the DataFrame's rows to the existing `customer` table; use `"overwrite"` instead if you want to replace the table's contents.
df.write \
    .mode("append") \
    .jdbc(url=jdbcUrl, table="customer", properties=connectionProperties)
print("✅ Data successfully written to Azure SQL Database 'customer' table.")