PySpark Sorting Explained: ASC vs DESC | Handling NULLs with asc_nulls_first & desc_nulls_last
In this tutorial, you’ll learn how to sort DataFrames in PySpark in ascending and descending order, and how to control where NULL values appear using asc_nulls_first(), asc_nulls_last(), desc_nulls_first(), and desc_nulls_last(). Real examples are provided for clarity.
1️⃣ Create Spark Session
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
spark = SparkSession.builder.appName("SortingExamples").getOrCreate()
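This assumes an environment where Spark can pick a master on its own (for example a notebook). For a plain local Python script, a minimal sketch with an explicit local master looks like this:
# Assumption: local run; on a cluster or managed notebook the master is usually preconfigured.
spark = SparkSession.builder \
    .appName("SortingExamples") \
    .master("local[*]") \
    .getOrCreate()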
2️⃣ Create Sample Data
data = [
(1, "Aamir", 50000),
(2, "Ali", None),
(3, "Bob", 45000),
(4, "Lisa", 60000),
(5, "Zara", None)
]
columns = ["id", "name", "salary"]
df = spark.createDataFrame(data, columns)
df.show()
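Schema inference handles the None salaries here, but if you prefer to be explicit about column types and nullability, a sketch using StructType produces an equivalent DataFrame:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
# Explicit schema: salary is nullable, so the None values are allowed.
schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("name", StringType(), True),
    StructField("salary", IntegerType(), True),
])
df = spark.createDataFrame(data, schema)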
3️⃣ Sort by ASC (default)
df.orderBy(col("salary").asc()).show()
4️⃣ Sort by ASC with NULLs First
df.orderBy(col("salary").asc_nulls_first()).show()
5️⃣ Sort by ASC with NULLs Last
df.orderBy(col("salary").asc_nulls_last()).show()
6️⃣ Sort by DESC
df.orderBy(col("salary").desc()).show()
7️⃣ DESC with NULLs First
df.orderBy(col("salary").desc_nulls_first()).show()
8️⃣ DESC with NULLs Last
df.orderBy(col("salary").desc_nulls_last()).show()