How to Read JSON File into DataFrame from Azure Blob Storage | PySpark Tutorial

In this PySpark tutorial, you'll learn how to read a JSON file stored in Azure Blob Storage and load it into a PySpark DataFrame.

Step 1: Prerequisites

  • Azure Storage Account
  • Container with a JSON file
  • Access key or SAS token
  • PySpark environment (Databricks or local setup)

Step 2: Configure Spark to Access Azure Blob Storage

Configure the Spark session to access your Azure Blob Storage account using the access key or SAS token.

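# Set the Spark configuration for Azure Blob Storage (account access key)
spark.conf.set("fs.azure.account.key.<storage_account_name>.blob.core.windows.net", "<your_access_key>")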
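Step 2 mentions both an access key and a SAS token. The two options could be wired up roughly as in the sketch below, which assumes the configuration key names used by the WASB (`wasbs://`) driver. The helper names `account_key_conf`, `sas_token_conf`, and `configure_blob_access` are hypothetical and not part of PySpark.

```python
def account_key_conf(storage_account: str) -> str:
    """Config key under which the WASB driver looks up an account access key."""
    return f"fs.azure.account.key.{storage_account}.blob.core.windows.net"


def sas_token_conf(container: str, storage_account: str) -> str:
    """Config key under which the WASB driver looks up a SAS token for a container."""
    return f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net"


def configure_blob_access(spark, storage_account, container=None,
                          access_key=None, sas_token=None):
    """Hypothetical helper: set exactly one credential on an existing SparkSession."""
    if (access_key is None) == (sas_token is None):
        raise ValueError("Provide exactly one of access_key or sas_token")
    if access_key is not None:
        spark.conf.set(account_key_conf(storage_account), access_key)
    else:
        # The SAS config key includes the container name, so it is required here.
        if container is None:
            raise ValueError("container is required when using a SAS token")
        spark.conf.set(sas_token_conf(container, storage_account), sas_token)
```

In a Databricks notebook, where `spark` already exists, a call might look like `configure_blob_access(spark, "<storage_account_name>", access_key="<your_access_key>")`.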

Step 3: Read JSON File into DataFrame

Now read the JSON file using the spark.read.json() function and load it into a PySpark DataFrame.

# Define the file path
file_path = "wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/<file_name.json>"

# Read JSON file into DataFrame
df = spark.read.json(file_path)

# Display the DataFrame
df.show()

# Print the schema of the DataFrame
df.printSchema()
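
The file path in Step 3 follows the pattern `wasbs://<container>@<account>.blob.core.windows.net/<path>`. The sketch below shows one way to assemble that path from its parts and to read a file whose JSON spans several lines. `build_wasbs_path` and `read_json_from_blob` are hypothetical helper names. `multiLine` is the Spark JSON reader option for files that hold a single pretty-printed document or array rather than one object per line.

```python
def build_wasbs_path(container: str, storage_account: str, blob_path: str) -> str:
    """Assemble a wasbs:// URL for a blob inside a container."""
    blob_path = blob_path.lstrip("/")  # avoid a double slash after the host
    return f"wasbs://{container}@{storage_account}.blob.core.windows.net/{blob_path}"


def read_json_from_blob(spark, path: str, multiline: bool = False):
    """Read JSON at `path`; set multiline=True for JSON that spans multiple lines."""
    return spark.read.option("multiLine", multiline).json(path)
```

For a file with one JSON object per line, the default `multiline=False` matches the plain `spark.read.json(file_path)` call.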

Step 4: Example Output

Output of df.show():

+-------------+-----------+---+
|         Name|Nationality|Age|
+-------------+-----------+---+
|Aamir Shahzad|   Pakistan| 25|
|     Ali Raza|        USA| 30|
|          Bob|         UK| 45|
|         Lisa|     Canada| 35|
+-------------+-----------+---+
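
For reference, an input file that yields these rows could look like the output of the sketch below. It assumes the default layout that spark.read.json() expects, one JSON object per line (JSON Lines). The rows are copied from the output above. The names `rows` and `to_json_lines` are hypothetical.

```python
import json

# Rows copied from the example output; Age is stored as a JSON number.
rows = [
    {"Name": "Aamir Shahzad", "Nationality": "Pakistan", "Age": 25},
    {"Name": "Ali Raza", "Nationality": "USA", "Age": 30},
    {"Name": "Bob", "Nationality": "UK", "Age": 45},
    {"Name": "Lisa", "Nationality": "Canada", "Age": 35},
]


def to_json_lines(records) -> str:
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)
```

Writing `to_json_lines(rows)` to a local file and uploading it to the container would give a small test file for the steps in this tutorial.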

Output of df.printSchema():

root
 |-- Name: string (nullable = true)
 |-- Nationality: string (nullable = true)
 |-- Age: long (nullable = true)
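
The schema above is inferred, which means Spark has to scan the JSON data to work out the column types. The sketch below supplies the schema up front instead, as a DDL string that mirrors the printed schema (`BIGINT` corresponds to `long`). `PEOPLE_SCHEMA_DDL` and `read_people` are hypothetical names.

```python
# DDL form of the schema shown by df.printSchema()
PEOPLE_SCHEMA_DDL = "Name STRING, Nationality STRING, Age BIGINT"


def read_people(spark, path: str):
    """Read the JSON file with a fixed schema instead of inferring it."""
    return spark.read.schema(PEOPLE_SCHEMA_DDL).json(path)
```

An explicit schema also pins the column order and types, so later files with missing or differently typed fields do not silently change the DataFrame's shape.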


  • Set Spark configuration to access Azure Blob Storage using your account key or SAS token.
  • Use spark.read.json() to load JSON files into a PySpark DataFrame.
  • Verify your data using df.show() and df.printSchema().

Author: Aamir Shahzad