How to Use melt() in PySpark | Reshape & Unpivot DataFrames | Step-by-Step Guide
Learn how to reshape and unpivot PySpark DataFrames using a custom melt function. Ideal for data engineering and analytics workflows.
📘 Introduction
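Before Spark 3.4, PySpark DataFrames had no native melt() function like the one in pandas. However, we can mimic this behavior by combining explode(), struct(), and array(). This approach also works on newer versions, where DataFrame.unpivot() (with melt() as an alias) is available. The transformation converts a wide table (one column per measurement) into a long format (one row per measurement, with fewer columns).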
🔧 PySpark Code Example
from pyspark.sql import SparkSession
from pyspark.sql.functions import array, col, explode, lit, struct
# Create SparkSession
spark = SparkSession.builder.appName("PySpark Melt Function Example").getOrCreate()
# Sample data
data = [
    ("Aamir Shahzad", 85, 90, 88),
    ("Ali Raza", 78, 83, 89),
    ("Bob", 92, 95, 91),
    ("Lisa", 80, 87, 85),
]
columns = ["Name", "Math", "Science", "History"]
# Create DataFrame
df = spark.createDataFrame(data, columns)
df.show()
# Custom melt function: turns each column in value_vars into its own row,
# keeping the id_vars columns unchanged on every output row.
def melt(df, id_vars, value_vars, var_name="Subject", value_name="Score"):
    # One struct per value column: (column name as a literal, column value)
    _vars_and_vals = array(*[
        struct(lit(c).alias(var_name), col(c).alias(value_name))
        for c in value_vars
    ])
    # explode() emits one row per struct in the array
    _tmp = df.withColumn("_vars_and_vals", explode(_vars_and_vals))
    # Keep the id columns and unpack the struct fields into top-level columns
    cols = [col(c) for c in id_vars] + [
        col("_vars_and_vals")[var_name].alias(var_name),
        col("_vars_and_vals")[value_name].alias(value_name),
    ]
    return _tmp.select(*cols)
# Apply melt
melted_df = melt(df, ["Name"], ["Math", "Science", "History"])
melted_df.show()
📊 Original DataFrame Output
+-------------+----+-------+-------+
|         Name|Math|Science|History|
+-------------+----+-------+-------+
|Aamir Shahzad|  85|     90|     88|
|     Ali Raza|  78|     83|     89|
|          Bob|  92|     95|     91|
|         Lisa|  80|     87|     85|
+-------------+----+-------+-------+
📎 Melted (Unpivoted) DataFrame Output
+-------------+-------+-----+
|         Name|Subject|Score|
+-------------+-------+-----+
|Aamir Shahzad|   Math|   85|
|Aamir Shahzad|Science|   90|
|Aamir Shahzad|History|   88|
|     Ali Raza|   Math|   78|
|     Ali Raza|Science|   83|
|     Ali Raza|History|   89|
|          Bob|   Math|   92|
|          Bob|Science|   95|
|          Bob|History|   91|
|         Lisa|   Math|   80|
|         Lisa|Science|   87|
|         Lisa|History|   85|
+-------------+-------+-----+
💡 Explanation
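- The melt function uses array() and struct() to pack each value column's name and value into an array of structs. explode() then flattens that array so that each struct becomes its own row.
- This technique is roughly equivalent to pandas' melt(). It is useful for reshaping wide DataFrames into a long format before writing them to analytics layers.
- All value columns should share the same data type, because array() needs a single element type. Cast them first if they differ.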