How to Get a Column in PySpark using Dot or Square Brackets
In PySpark, you can access DataFrame columns using either dot notation or square brackets. This tutorial shows both methods with examples and outputs.
1. Create Spark Session
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("Get Columns in PySpark") \
    .getOrCreate()
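The examples below assume this session is available as spark. If you are running locally (for example in a notebook), you can also pin the master explicitly; this is an optional sketch, assuming a local environment, and is not required for anything that follows.

# Optional: run in local mode using all available cores (assumes a local setup)
spark = SparkSession.builder \
    .appName("Get Columns in PySpark") \
    .master("local[*]") \
    .getOrCreate()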
2. Create Sample DataFrame
data = [
    (1, "Aamir Shahzad", 35),
    (2, "Ali Raza", 30),
    (3, "Bob", 25),
    (4, "Lisa", 28)
]
columns = ["id", "name", "age"]
df = spark.createDataFrame(data, columns)
df.show()
+---+--------------+---+
| id|          name|age|
+---+--------------+---+
|  1| Aamir Shahzad| 35|
|  2|      Ali Raza| 30|
|  3|           Bob| 25|
|  4|          Lisa| 28|
+---+--------------+---+
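createDataFrame infers the column types from the data here. If you prefer to declare them explicitly, a minimal sketch using a StructType schema (the df_typed name is just for illustration) looks like this:

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Declare the schema instead of letting Spark infer it from the data
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

df_typed = spark.createDataFrame(data, schema)
df_typed.printSchema()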
3. Get a Column in PySpark
Method 1: Using Dot Notation
df.select(df.name).show()
+--------------+
|          name|
+--------------+
| Aamir Shahzad|
|      Ali Raza|
|           Bob|
|          Lisa|
+--------------+
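Dot notation is not limited to a single column; you can pass several references to select at once, for example:

# Select two columns using dot notation
df.select(df.name, df.age).show()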
Method 2: Using Square Brackets
df.select(df["name"]).show()
+--------------+
|          name|
+--------------+
| Aamir Shahzad|
|      Ali Raza|
|           Bob|
|          Lisa|
+--------------+
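The difference between the two methods matters most when a column name is not a valid Python identifier. The DataFrame below, with a space in the column name, is a hypothetical example just to show where square brackets are required:

# "full name" cannot be accessed with dot notation (df2.full name is a syntax error),
# but square brackets handle it without trouble
df2 = spark.createDataFrame([(1, "Aamir Shahzad")], ["id", "full name"])
df2.select(df2["full name"]).show()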
4. Filter with Column Reference
Using Dot Notation
df.filter(df.age > 28).show()
+---+--------------+---+
| id|          name|age|
+---+--------------+---+
|  1| Aamir Shahzad| 35|
|  2|      Ali Raza| 30|
+---+--------------+---+
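Column references can also be combined into a single filter using & (and) or | (or); each condition needs its own parentheses, as sketched below:

# Rows where age is greater than 25 and less than 35
df.filter((df.age > 25) & (df.age < 35)).show()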
Using Square Brackets
df.filter(df["age"] == 30).show()
+---+---------+---+
| id|     name|age|
+---+---------+---+
|  2| Ali Raza| 30|
+---+---------+---+
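A third way to reference a column, not covered above, is the col() function from pyspark.sql.functions, which refers to a column by name without mentioning the DataFrame variable:

from pyspark.sql.functions import col

# col("age") is resolved against the DataFrame the expression is applied to
df.filter(col("age") == 30).show()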
5. View All Column Names
df.columns
['id', 'name', 'age']
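Because df.columns is a plain Python list, it can also drive selections programmatically; for example, selecting every column except id:

# Keep all columns except "id"
df.select([c for c in df.columns if c != "id"]).show()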
6. Summary
df.name is easier and more concise, great for simple column access.
df["name"] is more flexible and safer when column names include spaces or special characters.
df.columns returns a list of all column names.