How to Write DataFrame to JSON File in Azure Blob Storage Using PySpark | PySpark Tutorial

This tutorial demonstrates how to use PySpark's write.json() method to export a DataFrame as JSON to Azure Blob Storage. You'll also see how to use coalesce(1) to produce a single output file and how to secure the connection with a SAS token.

1️⃣ Step 1: Initialize Spark Session

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WriteToJSONBlob").getOrCreate()
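If you run this outside an environment that already bundles the Azure connector (for example, Databricks), the wasbs:// scheme needs the hadoop-azure libraries on the classpath. A minimal sketch of pulling them in at session build time (the versions shown are illustrative; match them to your Spark/Hadoop build):

spark = (
    SparkSession.builder
    .appName("WriteToJSONBlob")
    # Illustrative coordinates: pick the hadoop-azure version that matches your cluster
    .config(
        "spark.jars.packages",
        "org.apache.hadoop:hadoop-azure:3.3.4,com.microsoft.azure:azure-storage:8.6.6",
    )
    .getOrCreate()
)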

2️⃣ Step 2: Create a Sample DataFrame

data = [
  ("Aamir Shahzad", "Engineer", 35),
  ("Ali Raza", "Data Analyst", 28),
  ("Bob", "Manager", 40),
  ("Lisa", "Developer", 25)
]
columns = ["name", "designation", "age"]

df = spark.createDataFrame(data, schema=columns)
df.show()
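For reference, df.show() should print something like this (row order may vary):

+-------------+------------+---+
|         name| designation|age|
+-------------+------------+---+
|Aamir Shahzad|    Engineer| 35|
|     Ali Raza|Data Analyst| 28|
|          Bob|     Manager| 40|
|         Lisa|   Developer| 25|
+-------------+------------+---+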

3️⃣ Step 3: Configure Azure Blob Storage Access

# Replace with your actual values
spark.conf.set("fs.azure.sas.<container>.<storage_account>.blob.core.windows.net", "<sas_token>")
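Hardcoding a SAS token in a script makes it easy to leak into version control. As a minimal sketch, assuming the token is stored in an environment variable (the name AZURE_SAS_TOKEN here is illustrative), you can load it at runtime instead:

import os

# Illustrative variable name; set it in your shell or cluster environment,
# e.g. export AZURE_SAS_TOKEN="sv=2022-11-02&ss=b&..."
sas_token = os.environ["AZURE_SAS_TOKEN"]

spark.conf.set(
    "fs.azure.sas.<container>.<storage_account>.blob.core.windows.net",
    sas_token,
)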

4️⃣ Step 4: Write DataFrame to JSON

output_path = "wasbs://<container>@<storage_account>.blob.core.windows.net/output-json"

df.coalesce(1).write \
  .mode("overwrite") \
  .json(output_path)

print("✅ Data written to Azure Blob Storage in JSON format.")

5️⃣ Step 5: Read Data Back from JSON

df_read = spark.read.json(output_path)
df_read.show()

print("✅ Data read back from Azure Blob Storage successfully.")
