How to Perform Unpivot in PySpark | Convert Columns to Rows Easily | PySpark Tutorial

Author: Aamir Shahzad

Published: March 2025

📘 Introduction

In PySpark, unpivoting means converting columns into rows — the opposite of a pivot operation. PySpark versions before 3.4 have no built-in unpivot function (Spark 3.4 added DataFrame.unpivot()), but you can achieve the same result in any version using selectExpr() and stack().

🧾 Sample Dataset

We’ll work with a DataFrame that contains sales data across three years:

Name           Sales_2023   Sales_2024   Sales_2025
Aamir Shahzad     800           900          1000
Ali               500           600          1001
Raza              450           550          102
Bob               700           750          103
Lisa              620           None         109

🔧 Create DataFrame in PySpark

from pyspark.sql import SparkSession

# Create Spark session
spark = SparkSession.builder.appName("UnpivotExample").getOrCreate()

# Sample data
data = [
    ("Aamir Shahzad", 800, 900, 1000),
    ("Ali", 500, 600, 1001),
    ("Raza", 450, 550, 102),
    ("Bob", 700, 750, 103),
    ("Lisa", 620, None, 109)
]

columns = ["Name", "Sales_2023", "Sales_2024", "Sales_2025"]

df = spark.createDataFrame(data, columns)
df.show()

✅ Expected Output

+-------------+-----------+-----------+-----------+
|         Name| Sales_2023| Sales_2024| Sales_2025|
+-------------+-----------+-----------+-----------+
|Aamir Shahzad|        800|        900|       1000|
|          Ali|        500|        600|       1001|
|         Raza|        450|        550|        102|
|          Bob|        700|        750|        103|
|         Lisa|        620|       null|        109|
+-------------+-----------+-----------+-----------+

๐Ÿ” Unpivot Using stack() and selectExpr()

Convert the three sales columns into rows with stack(), passing one (year, value) pair per column:

unpivotDF = df.selectExpr(
    "Name",
    "stack(3, '2023', Sales_2023, '2024', Sales_2024, '2025', Sales_2025) as (Year, Sales)"
)

unpivotDF.show()

✅ Expected Output

+-------------+----+-----+
|         Name|Year|Sales|
+-------------+----+-----+
|Aamir Shahzad|2023|  800|
|Aamir Shahzad|2024|  900|
|Aamir Shahzad|2025| 1000|
|          Ali|2023|  500|
|          Ali|2024|  600|
|          Ali|2025| 1001|
|         Raza|2023|  450|
|         Raza|2024|  550|
|         Raza|2025|  102|
|          Bob|2023|  700|
|          Bob|2024|  750|
|          Bob|2025|  103|
|         Lisa|2023|  620|
|         Lisa|2024| null|
|         Lisa|2025|  109|
+-------------+----+-----+

📌 Explanation

  • stack(3, ...) tells PySpark to generate 3 rows per input row.
  • Each pair of values (e.g., '2023', Sales_2023) is used to populate new rows.
  • The result is a normalized table format that's easier to process and analyze.

🎥 Watch the Video Tutorial

Watch on YouTube

© 2025 Aamir Shahzad. All rights reserved.

Visit TechBrothersIT for more tutorials.