How to Use the write Function to Create a Single CSV File in Blob Storage from a DataFrame | PySpark Tutorial

How to Write a Single CSV File to Azure Blob Storage from PySpark DataFrame

By default, PySpark writes a DataFrame to CSV as multiple part files, one per partition. In this tutorial, you'll learn how to use coalesce(1) to save a DataFrame as a single CSV file in Azure Blob Storage, including how to set the header and overwrite options.

1️⃣ Step 1: Create Spark Session

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WriteSingleCSV").getOrCreate()
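If you're running outside Databricks (for example, locally or via spark-submit), the wasbs:// connector may not be on the classpath. Here is a minimal sketch, assuming you want Spark to fetch the Azure connector at startup; the version shown is illustrative, so match it to your Hadoop version:

# Sketch for non-Databricks environments: pull the hadoop-azure connector
# (which provides the wasbs:// filesystem) when the session is created.
spark = (
    SparkSession.builder
    .appName("WriteSingleCSV")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.4")
    .getOrCreate()
)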

2️⃣ Step 2: Create Sample DataFrame

data = [
  ("Aamir Shahzad", "Lahore", "Pakistan"),
  ("Ali Raza", "Karachi", "Pakistan"),
  ("Bob", "New York", "USA"),
  ("Lisa", "Toronto", "Canada")
]
columns = ["full_name", "city", "country"]
df = spark.createDataFrame(data, schema=columns)

df.show()
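For reference, df.show() should print:

+-------------+--------+--------+
|    full_name|    city| country|
+-------------+--------+--------+
|Aamir Shahzad|  Lahore|Pakistan|
|     Ali Raza| Karachi|Pakistan|
|          Bob|New York|     USA|
|         Lisa| Toronto|  Canada|
+-------------+--------+--------+

You can also check the partition count; without coalesce(1), the write in Step 4 would produce one part file per partition:

# Each partition becomes a separate part file on write
print(df.rdd.getNumPartitions())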

3️⃣ Step 3: Configure Azure Blob Storage Access

spark.conf.set("fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net", "<your-access-key>")
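This sets the account key at runtime. Alternatively, you can supply it when building the session; note that on open-source Spark, Hadoop filesystem options are typically passed with the spark.hadoop. prefix. A sketch, using the same placeholders:

# Sketch: pass the storage key as a Hadoop option at session build time.
spark = (
    SparkSession.builder
    .appName("WriteSingleCSV")
    .config("spark.hadoop.fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net", "<your-access-key>")
    .getOrCreate()
)

Avoid hardcoding keys in notebooks; on Databricks, prefer reading them from a secret scope with dbutils.secrets.get.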

4️⃣ Step 4: Write to Single CSV File

output_path = "wasbs://<your-container>@<your-storage-account>.blob.core.windows.net/people_data"

df.coalesce(1).write \
  .option("header", "true") \
  .mode("overwrite") \
  .csv(output_path)
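Keep in mind that even with coalesce(1), Spark writes a directory (people_data) containing a single part-00000-*.csv file plus commit markers, not a bare file. If you need the CSV under a friendly name, you can copy it out afterwards. A sketch, assuming Databricks dbutils; the target name people_data.csv is our own choice:

# Locate the single part file that coalesce(1) produced.
part_file = [f.path for f in dbutils.fs.ls(output_path) if f.name.startswith("part-")][0]

# Copy it next to the output directory under a friendlier name,
# then remove the original directory.
dbutils.fs.cp(part_file, output_path + ".csv")
dbutils.fs.rm(output_path, True)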

5️⃣ Step 5: Confirm the CSV File in Blob Storage

# You can list files using dbutils if on Databricks:
files = dbutils.fs.ls(output_path)
for f in files:
    print(f.name)
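
Outside Databricks, dbutils is not available. As a sketch, you can list the output through Hadoop's FileSystem API via Spark's JVM gateway instead (this relies on Spark internals such as _jvm, so treat it as illustrative):

# List the output directory via Hadoop's FileSystem API (no dbutils needed).
hadoop_conf = spark._jsc.hadoopConfiguration()
Path = spark._jvm.org.apache.hadoop.fs.Path
fs = Path(output_path).getFileSystem(hadoop_conf)
for status in fs.listStatus(Path(output_path)):
    print(status.getPath().getName())

Typically you'll see a single part-00000-*.csv file alongside a _SUCCESS marker.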

📺 Watch the Full Tutorial