How to Use withColumnRenamed() to Rename DataFrame Columns | PySpark Tutorial

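In this PySpark tutorial, you'll learn how to use the withColumnRenamed() method to rename columns in a DataFrame. It takes the existing column name and the new name, and returns a new DataFrame; the original DataFrame is not modified. If the existing name is not in the schema, the call is a no-op. It's useful when you want to clean or standardize column names.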

📌 Why Use withColumnRenamed()?

  • Renames an existing column to a new name
  • Makes DataFrames cleaner and easier to understand
  • Can be chained to rename multiple columns
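
The chaining mentioned in the last bullet can also be driven by a dictionary of old-to-new names. The following is a minimal sketch, not part of the original tutorial: rename_columns is a hypothetical helper name, and it assumes df is any PySpark DataFrame.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Apply withColumnRenamed() once per (old, new) pair in `mapping`.

    rename_columns is a hypothetical helper, not part of PySpark.
    `mapping` is a plain dict such as {"old_name": "new_name"}.
    """
    return reduce(
        # Each step renames one column on the DataFrame produced so far.
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,  # start from the original DataFrame
    )
```

For example, rename_columns(df, {"FullName": "Name", "Country": "Nationality"}) should give the same result as two chained withColumnRenamed() calls. On Spark 3.4 and later, the built-in DataFrame.withColumnsRenamed() accepts such a dictionary directly.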

🔧 Step 1: Create Spark Session

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("PySpark Rename Column Example") \
    .getOrCreate()

🔧 Step 2: Create a Sample DataFrame

data = [
    ("Aamir Shahzad", "Pakistan", 25),
    ("Ali Raza", "USA", 30),
    ("Bob", "UK", 45),
    ("Lisa", "Canada", 35)
]

df = spark.createDataFrame(data, ["FullName", "Country", "Age"])
print("Original DataFrame:")
df.show()
+-------------+--------+---+
|     FullName| Country|Age|
+-------------+--------+---+
|Aamir Shahzad|Pakistan| 25|
|     Ali Raza|     USA| 30|
|          Bob|      UK| 45|
|         Lisa|  Canada| 35|
+-------------+--------+---+

🔧 Step 3: Rename Columns Using withColumnRenamed()

# Rename 'FullName' to 'Name' and 'Country' to 'Nationality'
renamed_df = df.withColumnRenamed("FullName", "Name") \
               .withColumnRenamed("Country", "Nationality")

print("DataFrame with Renamed Columns:")
renamed_df.show()
+-------------+-----------+---+
|         Name|Nationality|Age|
+-------------+-----------+---+
|Aamir Shahzad|   Pakistan| 25|
|     Ali Raza|        USA| 30|
|          Bob|         UK| 45|
|         Lisa|     Canada| 35|
+-------------+-----------+---+
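
Because withColumnRenamed() does nothing when the existing column name is not found, a typo in the old name can go unnoticed. Below is a minimal sketch of a guard, assuming a hypothetical helper named rename_strict (it is not a PySpark function):

```python
def rename_strict(df, existing, new):
    """Rename `existing` to `new`, raising if `existing` is not a column.

    rename_strict is a hypothetical helper, not part of PySpark.
    Note: this membership check is case-sensitive, which is stricter
    than Spark's default (case-insensitive) column name resolution.
    """
    if existing not in df.columns:
        raise ValueError(
            f"Column {existing!r} not found; available columns: {df.columns}"
        )
    return df.withColumnRenamed(existing, new)
```

With the sample data above, rename_strict(df, "FullName", "Name") renames the column as before, while rename_strict(df, "Full_Name", "Name") raises a ValueError instead of returning the DataFrame unchanged.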

Author: Aamir Shahzad

© 2024 PySpark Tutorials. All rights reserved.