How to Use orderBy() Function in PySpark
The orderBy() function in PySpark sorts a DataFrame by one or more columns and returns a new, sorted DataFrame. By default it sorts in ascending order, but you can request descending order per column.
Syntax
DataFrame.orderBy(*cols, ascending=True)
Arguments
- cols: Column name(s) as strings, or Column expressions, to sort by.
- ascending: Boolean or list of booleans. True for ascending (default), False for descending; a list supplies one boolean per column and must match the number of columns.
Example 1: Order by Single Column (Ascending - Default)
df.orderBy("salary").show()
Example 2: Order by Single Column (Descending)
from pyspark.sql.functions import col
df.orderBy(col("salary").desc()).show()
Example 3: Order by Multiple Columns (Ascending)
df.orderBy("department", "salary").show()
Example 4: Order by Multiple Columns with Mixed Order
df.orderBy(col("department").asc(), col("salary").desc()).show()
Example 5: Using orderBy() with Explicit Boolean for Ascending/Descending
df.orderBy(col("salary"), ascending=True).show()
df.orderBy(col("salary"), ascending=False).show()
Conclusion
The orderBy() function in PySpark sorts a DataFrame by one or more columns, with per-column control over ascending or descending order. Because it produces a globally sorted result, it triggers a shuffle across partitions, so apply it deliberately on large datasets.