How to Use createTempView()
in PySpark | Step-by-Step Guide
Author: Aamir Shahzad
Published: March 2025
๐ Introduction
The createTempView()
function registers a PySpark DataFrame as a temporary SQL view. This allows you to execute SQL queries using Spark SQL, combining the power of SQL with the flexibility of DataFrames.
๐งพ Sample Dataset
Name Department Salary
Aamir Shahzad Engineering 5000
Ali Sales 4000
Raza Marketing 3500
Bob Sales 4200
Lisa Engineering 6000
๐ง Create DataFrame in PySpark
from pyspark.sql import SparkSession
# Create Spark session
spark = SparkSession.builder.appName("createTempViewExample").getOrCreate()
# Sample data
data = [
("Aamir Shahzad", "Engineering", 5000),
("Ali", "Sales", 4000),
("Raza", "Marketing", 3500),
("Bob", "Sales", 4200),
("Lisa", "Engineering", 6000)
]
columns = ["Name", "Department", "Salary"]
df = spark.createDataFrame(data, columns)
df.show()
✅ Expected Output
+-------------+-----------+------+
| Name| Department|Salary|
+-------------+-----------+------+
|Aamir Shahzad|Engineering| 5000|
| Ali| Sales| 4000|
| Raza| Marketing| 3500|
| Bob| Sales| 4200|
| Lisa|Engineering| 6000|
+-------------+-----------+------+
๐ Register DataFrame as a Temp View
# Register as temporary SQL view
df.createOrReplaceTempView("employee_view")
๐ Run SQL Queries on the Temp View
Example 1: Select All Records
result1 = spark.sql("SELECT * FROM employee_view")
result1.show()
✅ Output
+-------------+-----------+------+
| Name| Department|Salary|
+-------------+-----------+------+
|Aamir Shahzad|Engineering| 5000|
| Ali| Sales| 4000|
| Raza| Marketing| 3500|
| Bob| Sales| 4200|
| Lisa|Engineering| 6000|
+-------------+-----------+------+
Example 2: Filter by Department
result2 = spark.sql("""
SELECT Name, Salary
FROM employee_view
WHERE Department = 'Sales'
""")
result2.show()
✅ Output
+-----+------+
| Name|Salary|
+-----+------+
| Ali| 4000|
| Bob| 4200|
+-----+------+
Example 3: Average Salary by Department
result3 = spark.sql("""
SELECT Department, AVG(Salary) AS Avg_Salary
FROM employee_view
GROUP BY Department
""")
result3.show()
✅ Output
+-----------+----------+
| Department|Avg_Salary|
+-----------+----------+
| Marketing| 3500.0|
|Engineering| 5500.0|
| Sales| 4100.0|
+-----------+----------+
๐ก Key Points
createTempView()
is used to register a DataFrame as a temporary view.- You can use
spark.sql()
to run standard SQL queries on that view. - The view is session-scoped and will be removed when the Spark session ends.