PySpark Melt Function Explained | Reshape & Unpivot DataFrames Step by Step | PySpark Tutorial

Learn how to reshape and unpivot PySpark DataFrames using a custom melt function. Ideal for data engineering and analytics workflows.

📘 Introduction

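Before Spark 3.4, PySpark had no built-in equivalent of the pandas melt() function. Spark 3.4 added DataFrame.unpivot() and its alias DataFrame.melt(); on earlier versions, or when you want full control over the logic, you can get the same behavior by combining explode(), struct(), and array(). This transformation converts a wide table (one column per measurement) into a long format (one row per measurement, with fewer columns).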
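To make the wide-to-long idea concrete before touching Spark, here is a minimal plain-Python sketch. The helper name melt_rows and the dictionary-based rows are assumptions made for illustration only; they are not part of PySpark.

```python
def melt_rows(rows, id_vars, value_vars, var_name="Subject", value_name="Score"):
    """Turn each wide row (a dict) into one long row per value column."""
    out = []
    for row in rows:
        # Columns that identify the record are copied onto every output row
        ids = {k: row[k] for k in id_vars}
        for c in value_vars:
            # The column name becomes a value, and the cell becomes the score
            out.append({**ids, var_name: c, value_name: row[c]})
    return out


wide_rows = [
    {"Name": "Bob", "Math": 92, "Science": 95, "History": 91},
    {"Name": "Lisa", "Math": 80, "Science": 87, "History": 85},
]

long_rows = melt_rows(wide_rows, ["Name"], ["Math", "Science", "History"])
for r in long_rows:
    print(r)
```

Two wide rows with three value columns each become six long rows, which is the same shape change the PySpark version performs on a DataFrame.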

🔧 PySpark Code Example

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Create SparkSession
spark = SparkSession.builder.appName("PySpark Melt Function Example").getOrCreate()

# Sample data
data = [
    ("Aamir Shahzad", 85, 90, 88),
    ("Ali Raza", 78, 83, 89),
    ("Bob", 92, 95, 91),
    ("Lisa", 80, 87, 85)
]

columns = ["Name", "Math", "Science", "History"]

# Create DataFrame
df = spark.createDataFrame(data, columns)
df.show()

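# Custom melt function
def melt(df, id_vars, value_vars, var_name="Subject", value_name="Score"):
    from pyspark.sql.functions import array, col, explode, lit, struct
    # Build one struct per value column: (column name, column value).
    # The value columns should share a compatible type, because array() needs one element type.
    _vars_and_vals = array(*[
        struct(lit(c).alias(var_name), col(c).alias(value_name))
        for c in value_vars
    ])
    # explode() emits one output row per struct in the array
    _tmp = df.withColumn("_vars_and_vals", explode(_vars_and_vals))
    # Keep the id columns and lift the two struct fields into top-level columns
    cols = [col(c) for c in id_vars] + [
        col("_vars_and_vals")[var_name].alias(var_name),
        col("_vars_and_vals")[value_name].alias(value_name)
    ]
    return _tmp.select(*cols)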

# Apply melt
melted_df = melt(df, ["Name"], ["Math", "Science", "History"])
melted_df.show()

📊 Original DataFrame Output

+-------------+----+-------+-------+
|         Name|Math|Science|History|
+-------------+----+-------+-------+
|Aamir Shahzad|  85|     90|     88|
|     Ali Raza|  78|     83|     89|
|          Bob|  92|     95|     91|
|         Lisa|  80|     87|     85|
+-------------+----+-------+-------+

📎 Melted (Unpivoted) DataFrame Output

+-------------+-------+-----+
|         Name|Subject|Score|
+-------------+-------+-----+
|Aamir Shahzad|   Math|   85|
|Aamir Shahzad|Science|   90|
|Aamir Shahzad|History|   88|
|     Ali Raza|   Math|   78|
|     Ali Raza|Science|   83|
|     Ali Raza|History|   89|
|          Bob|   Math|   92|
|          Bob|Science|   95|
|          Bob|History|   91|
|         Lisa|   Math|   80|
|         Lisa|Science|   87|
|         Lisa|History|   85|
+-------------+-------+-----+

💡 Explanation

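  • The melt function uses array() and struct() to build, for each row, an array with one struct per value column. Each struct holds the column name (added with lit()) and that column's value.
  • explode() then turns each array element into its own row, so every input row yields one output row per value column.
  • The result has the same long format that pandas melt() produces. Because array() needs a single element type, the value columns should share a compatible data type; cast them first if they do not. The long format is convenient for reshaping DataFrames before writing to analytics layers.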

Author: Aamir Shahzad

© 2025 PySpark Tutorials. All rights reserved.