How to Use orderBy() Function in PySpark

The orderBy() function in PySpark returns a new DataFrame sorted by one or more columns. By default it sorts in ascending order, but you can request descending order as well, on a per-column basis.

Syntax

DataFrame.orderBy(*cols, ascending=True)

Arguments

  • cols: One or more column names or Column expressions to sort by.
  • ascending: Boolean or list of booleans. True for ascending (default), False for descending; a list applies one flag per column.
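
The examples below assume a SparkSession and a small sample DataFrame with name, department, and salary columns; the data here is purely illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orderby_demo").getOrCreate()

# Illustrative sample data used in the examples that follow
data = [
    ("Alice", "HR", 4500),
    ("Bob", "IT", 6000),
    ("Carol", "IT", 5200),
    ("Dave", "HR", 3900),
]
df = spark.createDataFrame(data, ["name", "department", "salary"])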

Example 1: Order by Single Column (Ascending - Default)

df.orderBy("salary").show()

Example 2: Order by Single Column (Descending)

from pyspark.sql.functions import col

df.orderBy(col("salary").desc()).show()

Example 3: Order by Multiple Columns (Ascending)

df.orderBy("department", "salary").show()

Example 4: Order by Multiple Columns with Mixed Order

df.orderBy(col("department").asc(), col("salary").desc()).show()

Example 5: Using orderBy() with Explicit Boolean for Ascending/Descending

df.orderBy(col("salary"), ascending=True).show()

df.orderBy(col("salary"), ascending=False).show()

Conclusion

The orderBy() function in PySpark is a simple but flexible way to sort DataFrames. It handles single or multiple columns and lets you choose ascending or descending order per column.
