PySpark Tutorial: How to Use dtypes in PySpark

Get DataFrame Column Names and Types | Step-by-Step Guide

Learn how to use dtypes in PySpark to quickly inspect column names and their corresponding data types — a must-know feature for any data engineer or analyst.

📘 Introduction

Understanding the structure of your DataFrame is essential for building reliable data pipelines. The dtypes attribute returns a list of (column name, data type) tuples, giving you a quick, readable snapshot of the schema.

🔧 PySpark Code Example

from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.appName("PySpark dtypes Example").getOrCreate()

# Sample data
data = [
    ("Aamir Shahzad", 85, 90.5, True),
    ("Ali Raza", 78, 83.0, False),
    ("Bob", 92, 95.2, True),
    ("Lisa", 80, 87.8, False)
]

# Define columns
columns = ["Name", "Math_Score", "Science_Score", "Passed"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Show the DataFrame
df.show()

# Get column names and data types
print("Column Names and Data Types (dtypes):")
print(df.dtypes)

# Formatted output
print("\nFormatted Output of Column Data Types:")
for col_name, data_type in df.dtypes:
    print(f"Column: {col_name}, Type: {data_type}")

📊 Original DataFrame Output

+-------------+----------+-------------+-------+
| Name        |Math_Score|Science_Score|Passed |
+-------------+----------+-------------+-------+
|Aamir Shahzad|        85|         90.5|   true|
|Ali Raza     |        78|         83.0|  false|
|Bob          |        92|         95.2|   true|
|Lisa         |        80|         87.8|  false|
+-------------+----------+-------------+-------+

📥 dtypes Attribute Output

[('Name', 'string'), 
 ('Math_Score', 'bigint'), 
 ('Science_Score', 'double'), 
 ('Passed', 'boolean')]

✅ Formatted Column Type Output

Column: Name, Type: string
Column: Math_Score, Type: bigint
Column: Science_Score, Type: double
Column: Passed, Type: boolean
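Because dtypes is a plain Python list of tuples, ordinary list and dict operations apply directly to it. As a small illustration using the output shown above (the dtypes list is hard-coded here so the snippet stands alone):

```python
# df.dtypes returns a plain Python list of (column, type) tuples.
# Hard-coded here to match the tutorial's output above.
dtypes = [("Name", "string"), ("Math_Score", "bigint"),
          ("Science_Score", "double"), ("Passed", "boolean")]

# Build a name -> type mapping for quick lookups
type_map = dict(dtypes)
print(type_map["Math_Score"])   # bigint

# Filter columns by type, e.g. collect all numeric columns
numeric_cols = [name for name, dtype in dtypes
                if dtype in ("bigint", "double")]
print(numeric_cols)             # ['Math_Score', 'Science_Score']
```

This pattern is handy when a transformation should only touch columns of a certain type.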

💡 Why Use dtypes?

  • Quickly inspect DataFrame schema during exploration and debugging.
  • Useful in dynamic transformations where data types matter (e.g., casting).
  • A programmatic alternative to printSchema() when you need names and types as Python objects rather than printed output.

🎥 Watch the Video Tutorial

Watch on YouTube

Author: Aamir Shahzad

© 2025 PySpark Tutorials. All rights reserved.