How to Use withColumnRenamed() in PySpark
In this PySpark tutorial, you'll learn how to use the withColumnRenamed()
function to rename columns in a DataFrame. It’s useful when you want to clean or standardize column names.
📌 Why Use withColumnRenamed()?
- Renames an existing column to a new name
- Makes DataFrames cleaner and easier to understand
- Supports renaming multiple columns by chaining calls
🔧 Step 1: Create Spark Session
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("PySpark Rename Column Example") \
    .getOrCreate()
🔧 Step 2: Create a Sample DataFrame
data = [
    ("Aamir Shahzad", "Pakistan", 25),
    ("Ali Raza", "USA", 30),
    ("Bob", "UK", 45),
    ("Lisa", "Canada", 35)
]
df = spark.createDataFrame(data, ["FullName", "Country", "Age"])
print("Original DataFrame:")
df.show()
+--------------+--------+---+
| FullName | Country|Age|
+--------------+--------+---+
| Aamir Shahzad|Pakistan| 25|
| Ali Raza | USA | 30|
| Bob | UK | 45|
| Lisa | Canada | 35|
+--------------+--------+---+
🔧 Step 3: Rename Columns Using withColumnRenamed()
# Rename 'FullName' to 'Name' and 'Country' to 'Nationality'
renamed_df = df.withColumnRenamed("FullName", "Name") \
               .withColumnRenamed("Country", "Nationality")
print("DataFrame with Renamed Columns:")
renamed_df.show()
+--------------+------------+---+
| Name | Nationality|Age|
+--------------+------------+---+
| Aamir Shahzad| Pakistan | 25|
| Ali Raza | USA | 30|
| Bob | UK | 45|
| Lisa | Canada | 35|
+--------------+------------+---+