🧩 PySpark DataFrame.to() Function
Schema Reconciliation and Column Reordering Made Easy
Learn how to use the DataFrame.to() function, introduced in PySpark 3.4.0, to reorder columns and reconcile a DataFrame with a target schema in a single call.
📘 Introduction
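PySpark’s DataFrame.to() function reconciles a DataFrame with a target schema in one step. It matches columns by name, reorders them to follow the schema, drops columns the schema does not list, and casts each column to the schema's type when that cast is allowed. This is especially useful when you're writing to tables, preparing data for joins, or ensuring schema compliance.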
🔧 PySpark Code Example
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, BooleanType
# Create SparkSession
spark = SparkSession.builder.appName("DataFrame.to Example").getOrCreate()
# Sample data
data = [
    ("Aamir Shahzad", "Math", 85, True),
    ("Ali Raza", "Science", 78, False),
    ("Bob", "History", 92, True),
    ("Lisa", "Math", 80, False)
]
columns = ["Name", "Subject", "Score", "Passed"]
df = spark.createDataFrame(data, columns)
# Print original schema and data
df.show(truncate=False)
df.printSchema()
# Define new schema with reordered and altered types
schema = StructType([
    StructField("Passed", BooleanType(), True),
    StructField("Score", StringType(), True),  # Cast from long (inferred from Python int) to string
    StructField("Name", StringType(), True),
    StructField("Subject", StringType(), True)
])
# Apply .to() transformation
df2 = df.to(schema)
# Show transformed DataFrame
df2.show(truncate=False)
df2.printSchema()
📊 Original DataFrame Output
+-------------+-------+-----+------+
|Name         |Subject|Score|Passed|
+-------------+-------+-----+------+
|Aamir Shahzad|Math   |85   |true  |
|Ali Raza     |Science|78   |false |
|Bob          |History|92   |true  |
|Lisa         |Math   |80   |false |
+-------------+-------+-----+------+
root
 |-- Name: string (nullable = true)
 |-- Subject: string (nullable = true)
 |-- Score: long (nullable = true)
 |-- Passed: boolean (nullable = true)
✅ New DataFrame Output (After .to())
+------+-----+-------------+-------+
|Passed|Score|Name         |Subject|
+------+-----+-------------+-------+
|true  |85   |Aamir Shahzad|Math   |
|false |78   |Ali Raza     |Science|
|true  |92   |Bob          |History|
|false |80   |Lisa         |Math   |
+------+-----+-------------+-------+
root
 |-- Passed: boolean (nullable = true)
 |-- Score: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- Subject: string (nullable = true)
💡 Key Takeaways
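- DataFrame.to() reorders columns by name and converts them to the types in the target schema.
- Only compatible casts are allowed. Numeric to numeric works, and so does long to string as in the example above, but string to int fails. Columns that the schema requires but the DataFrame lacks also raise an error, while columns the schema does not list are dropped.
- It is especially useful when writing to pre-defined tables or aligning multiple datasets.
- Available in PySpark 3.4.0 and above.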