How to Use the show() Function in PySpark | Step-by-Step Guide
The show() function in PySpark allows you to display DataFrame contents in a readable tabular format. It’s ideal for quickly checking your data or debugging your transformations.
What is show() in PySpark?
The show() function is a simple way to print rows from a DataFrame to the console. By default, it displays up to 20 rows and truncates any string longer than 20 characters.
Common Use Cases
1. Show Default Rows (First 20 Rows)
df.show()
Displays the first 20 rows and truncates any string longer than 20 characters.
2. Show a Specific Number of Rows
df.show(5)
Displays only the first 5 rows of the DataFrame.
3. Show Full Column Content (No Truncation)
df.show(truncate=False)
Displays full content in each column, without cutting off long strings.
4. Truncate Column Content After N Characters
df.show(truncate=10)
Limits each cell to 10 characters; longer strings are cut so that the last three of those 10 characters are "...". This is useful for large text fields.
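To make the truncation rule concrete, here is a plain-Python sketch that approximates how a single cell is shortened. The helper truncate_cell is hypothetical and is not a PySpark function; it only mirrors the behavior described above so you can predict what a cell will look like.

```python
def truncate_cell(value: str, truncate: int = 20) -> str:
    """Approximate how show() shortens one cell (hypothetical helper, not a PySpark API)."""
    if truncate <= 0 or len(value) <= truncate:
        # Truncation disabled (truncate=False behaves like 0) or value already fits.
        return value
    if truncate < 4:
        # Too narrow for an ellipsis: keep only the first characters.
        return value[:truncate]
    # Keep room for the three-character "..." marker.
    return value[: truncate - 3] + "..."


print(truncate_cell("PySpark DataFrame", 10))  # PySpark...
```

Note that the "..." counts toward the limit, so truncate=10 keeps only the first 7 characters of a long string.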
5. Show Rows in Vertical Format
df.show(vertical=True)
Prints each row as its own block, with one column name and value per line. This is helpful for wide DataFrames that do not fit across the screen, or when debugging individual records.
Summary of Options
- df.show(): Shows 20 rows with default truncation.
- df.show(n): Shows the first n rows.
- df.show(truncate=False): Shows full column content.
- df.show(truncate=n): Shortens strings longer than n characters to n characters.
- df.show(vertical=True): Displays data vertically.
🎥 Watch the Video Tutorial
Prefer watching a step-by-step guide? Watch my video tutorial explaining show() in PySpark: