How to Get Column Names in PySpark DataFrames Using the columns Attribute

How to Use the columns Attribute in PySpark | Step-by-Step Guide

Author: Aamir Shahzad

Published: March 2025

📘 Introduction

The columns attribute in PySpark is a quick way to retrieve the list of column names from a DataFrame. It is useful when a big data pipeline has to work with column names dynamically, for example to inspect, select, or rename columns in code rather than by hand.

📌 What is columns in PySpark?

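The columns attribute returns a Python list of column names from the DataFrame, in the DataFrame's column order. It is a property rather than a method, so it is accessed without parentheses: df.columns. You can use this list to inspect, iterate over, or programmatically manipulate columns in your PySpark applications.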
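
Because the result is an ordinary Python list, the usual list operations apply to it. The following is a minimal sketch, assuming the three-column layout used in this guide. The list literal stands in for df.columns so the snippet runs without a Spark session, and the helper name missing_columns is hypothetical.

```python
# Stand-in for df.columns; with a real DataFrame, use column_names = df.columns
column_names = ["Name", "Department", "Salary"]

def missing_columns(actual, required):
    """Return the required column names that are absent from `actual`, in order."""
    present = set(actual)
    return [name for name in required if name not in present]

print(len(column_names))                 # number of columns: 3
print("Salary" in column_names)          # membership test: True
print(column_names.index("Department"))  # zero-based position: 1
print(missing_columns(column_names, ["Name", "Salary", "Bonus"]))  # ['Bonus']
```

Note that list membership is an exact, case-sensitive string comparison, so "salary" in column_names is False here.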

🧾 Sample Dataset

Name           Department    Salary
Aamir Shahzad   IT            5000
Ali Raza        HR            4000
Bob             Finance       4500
Lisa            HR            4000

🔧 Create DataFrame in PySpark

from pyspark.sql import SparkSession

# Create Spark session
spark = SparkSession.builder.appName("columnsFunctionExample").getOrCreate()

# Sample data
data = [
    ("Aamir Shahzad", "IT", 5000),
    ("Ali Raza", "HR", 4000),
    ("Bob", "Finance", 4500),
    ("Lisa", "HR", 4000)
]

# Create DataFrame
columns = ["Name", "Department", "Salary"]
df = spark.createDataFrame(data, columns)

# Show DataFrame
df.show()

📊 Using columns Attribute

# Get list of column names
print("List of columns in the DataFrame:")
print(df.columns)

# Loop through column names
print("\nLooping through columns:")
for col_name in df.columns:
    print(col_name)

✅ Expected Output

+-------------+----------+------+
|         Name|Department|Salary|
+-------------+----------+------+
|Aamir Shahzad|        IT|  5000|
|     Ali Raza|        HR|  4000|
|          Bob|   Finance|  4500|
|         Lisa|        HR|  4000|
+-------------+----------+------+

List of columns in the DataFrame:
['Name', 'Department', 'Salary']

Looping through columns:
Name
Department
Salary

📌 Explanation

  • df.columns returns a list of all column names in the DataFrame.
  • Useful for dynamic column operations like renaming, filtering, or applying transformations.
  • You can loop over df.columns for custom logic on each column.
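
The dynamic operations listed above can be sketched on the name list itself. This is a minimal illustration under stated assumptions, not a definitive implementation: the list literal stands in for df.columns, the helper names normalized_names and names_excluding are hypothetical, and the commented lines show how the results might be passed to the DataFrame methods toDF and select.

```python
# Stand-in for df.columns; with a real DataFrame, use column_names = df.columns
column_names = ["Name", "Department", "Salary"]

def normalized_names(names):
    """Lowercase each name and replace spaces with underscores."""
    return [name.strip().lower().replace(" ", "_") for name in names]

def names_excluding(names, excluded):
    """Keep the original order, dropping any name listed in `excluded`."""
    blocked = set(excluded)
    return [name for name in names if name not in blocked]

new_names = normalized_names(column_names)
kept = names_excluding(column_names, ["Salary"])
print(new_names)  # ['name', 'department', 'salary']
print(kept)       # ['Name', 'Department']

# With a live DataFrame `df`, these lists could be applied as:
# renamed_df = df.toDF(*new_names)   # rename every column positionally
# subset_df = df.select(*kept)       # keep only the filtered columns
```

Building the new names as a plain list first keeps the logic easy to check before any Spark job runs.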

© 2025 Aamir Shahzad. All rights reserved.
