How to Use createTempView()
in PySpark | Step-by-Step Guide
Author: Aamir Shahzad
Published: March 2025
📘 Introduction
The createTempView()
function registers a PySpark DataFrame as a temporary SQL view. This allows you to execute SQL queries using Spark SQL, combining the power of SQL with the flexibility of DataFrames.
🧾 Sample Dataset
Name Department Salary
Aamir Shahzad Engineering 5000
Ali Sales 4000
Raza Marketing 3500
Bob Sales 4200
Lisa Engineering 6000
🔧 Create DataFrame in PySpark
from pyspark.sql import SparkSession
# Create Spark session
spark = SparkSession.builder.appName("createTempViewExample").getOrCreate()
# Sample data
data = [
("Aamir Shahzad", "Engineering", 5000),
("Ali", "Sales", 4000),
("Raza", "Marketing", 3500),
("Bob", "Sales", 4200),
("Lisa", "Engineering", 6000)
]
columns = ["Name", "Department", "Salary"]
df = spark.createDataFrame(data, columns)
df.show()
✅ Expected Output
+-------------+-----------+------+
| Name| Department|Salary|
+-------------+-----------+------+
|Aamir Shahzad|Engineering| 5000|
| Ali| Sales| 4000|
| Raza| Marketing| 3500|
| Bob| Sales| 4200|
| Lisa|Engineering| 6000|
+-------------+-----------+------+
📌 Register DataFrame as a Temp View
# Register as temporary SQL view
df.createOrReplaceTempView("employee_view")
📊 Run SQL Queries on the Temp View
Example 1: Select All Records
result1 = spark.sql("SELECT * FROM employee_view")
result1.show()
✅ Output
+-------------+-----------+------+
| Name| Department|Salary|
+-------------+-----------+------+
|Aamir Shahzad|Engineering| 5000|
| Ali| Sales| 4000|
| Raza| Marketing| 3500|
| Bob| Sales| 4200|
| Lisa|Engineering| 6000|
+-------------+-----------+------+
Example 2: Filter by Department
result2 = spark.sql("""
SELECT Name, Salary
FROM employee_view
WHERE Department = 'Sales'
""")
result2.show()
✅ Output
+-----+------+
| Name|Salary|
+-----+------+
| Ali| 4000|
| Bob| 4200|
+-----+------+
Example 3: Average Salary by Department
result3 = spark.sql("""
SELECT Department, AVG(Salary) AS Avg_Salary
FROM employee_view
GROUP BY Department
""")
result3.show()
✅ Output
+-----------+----------+
| Department|Avg_Salary|
+-----------+----------+
| Marketing| 3500.0|
|Engineering| 5500.0|
| Sales| 4100.0|
+-----------+----------+
💡 Key Points
createTempView()
is used to register a DataFrame as a temporary view.- You can use
spark.sql()
to run standard SQL queries on that view. - The view is session-scoped and will be removed when the Spark session ends.