How to Perform Unpivot in PySpark | Convert Columns to Rows Easily | PySpark Tutorial

Author: Aamir Shahzad

Published: March 2025

📘 Introduction

In PySpark, unpivoting means converting columns into rows — the opposite of a pivot operation. Before Spark 3.4, PySpark had no built-in unpivot() function, but you can easily achieve the same result in any version using selectExpr() and stack(). (Spark 3.4+ also ships a native DataFrame.unpivot() method, covered at the end of this post.)

🧾 Sample Dataset

We’ll work with a DataFrame that contains sales data across three years:

Name           Sales_2023   Sales_2024   Sales_2025
Aamir Shahzad     800           900          1000
Ali               500           600          1001
Raza              450           550          102
Bob               700           750          103
Lisa              620           None         109

🔧 Create DataFrame in PySpark

from pyspark.sql import SparkSession

# Create Spark session
spark = SparkSession.builder.appName("UnpivotExample").getOrCreate()

# Sample data
data = [
    ("Aamir Shahzad", 800, 900, 1000),
    ("Ali", 500, 600, 1001),
    ("Raza", 450, 550, 102),
    ("Bob", 700, 750, 103),
    ("Lisa", 620, None, 109)
]

columns = ["Name", "Sales_2023", "Sales_2024", "Sales_2025"]

df = spark.createDataFrame(data, columns)
df.show()

✅ Expected Output

+-------------+----------+----------+----------+
|         Name|Sales_2023|Sales_2024|Sales_2025|
+-------------+----------+----------+----------+
|Aamir Shahzad|       800|       900|      1000|
|          Ali|       500|       600|      1001|
|         Raza|       450|       550|       102|
|          Bob|       700|       750|       103|
|         Lisa|       620|      null|       109|
+-------------+----------+----------+----------+

🔁 Unpivot Using stack() and selectExpr()

Convert sales columns into rows using stack() for each year:

unpivotDF = df.selectExpr(
    "Name",
    "stack(3, '2023', Sales_2023, '2024', Sales_2024, '2025', Sales_2025) as (Year, Sales)"
)

unpivotDF.show()

✅ Expected Output

+-------------+----+-----+
|         Name|Year|Sales|
+-------------+----+-----+
|Aamir Shahzad|2023|  800|
|Aamir Shahzad|2024|  900|
|Aamir Shahzad|2025| 1000|
|          Ali|2023|  500|
|          Ali|2024|  600|
|          Ali|2025| 1001|
|         Raza|2023|  450|
|         Raza|2024|  550|
|         Raza|2025|  102|
|          Bob|2023|  700|
|          Bob|2024|  750|
|          Bob|2025|  103|
|         Lisa|2023|  620|
|         Lisa|2024| null|
|         Lisa|2025|  109|
+-------------+----+-----+

📌 Explanation

  • stack(3, ...) tells PySpark to generate 3 output rows per input row.
  • Each pair of arguments (e.g., '2023', Sales_2023) supplies the Year label and the Sales value for one of those rows.
  • The result is a long, normalized format that's easier to filter, aggregate, and analyze.

🎥 Watch the Video Tutorial

Watch on YouTube

© 2025 Aamir Shahzad. All rights reserved.

Visit TechBrothersIT for more tutorials.