How to Use createTempView in PySpark | Run SQL Queries on DataFrames | PySpark Tutorial

How to Use createTempView() in PySpark | Step-by-Step Guide

How to Use createTempView() in PySpark | Step-by-Step Guide

Author: Aamir Shahzad

Published: March 2025

๐Ÿ“˜ Introduction

The createTempView() function registers a PySpark DataFrame as a temporary SQL view. This allows you to execute SQL queries using Spark SQL, combining the power of SQL with the flexibility of DataFrames.

๐Ÿงพ Sample Dataset

Name           Department     Salary
Aamir Shahzad   Engineering     5000
Ali             Sales           4000
Raza            Marketing       3500
Bob             Sales           4200
Lisa            Engineering     6000

๐Ÿ”ง Create DataFrame in PySpark

from pyspark.sql import SparkSession

# Create Spark session
spark = SparkSession.builder.appName("createTempViewExample").getOrCreate()

# Sample data
data = [
    ("Aamir Shahzad", "Engineering", 5000),
    ("Ali", "Sales", 4000),
    ("Raza", "Marketing", 3500),
    ("Bob", "Sales", 4200),
    ("Lisa", "Engineering", 6000)
]

columns = ["Name", "Department", "Salary"]
df = spark.createDataFrame(data, columns)
df.show()

✅ Expected Output

+-------------+-----------+------+
|         Name| Department|Salary|
+-------------+-----------+------+
|Aamir Shahzad|Engineering|  5000|
|          Ali|      Sales|  4000|
|         Raza|  Marketing|  3500|
|          Bob|      Sales|  4200|
|         Lisa|Engineering|  6000|
+-------------+-----------+------+

๐Ÿ“Œ Register DataFrame as a Temp View

# Register as temporary SQL view
df.createOrReplaceTempView("employee_view")

๐Ÿ“Š Run SQL Queries on the Temp View

Example 1: Select All Records

result1 = spark.sql("SELECT * FROM employee_view")
result1.show()

✅ Output

+-------------+-----------+------+
|         Name| Department|Salary|
+-------------+-----------+------+
|Aamir Shahzad|Engineering|  5000|
|          Ali|      Sales|  4000|
|         Raza|  Marketing|  3500|
|          Bob|      Sales|  4200|
|         Lisa|Engineering|  6000|
+-------------+-----------+------+

Example 2: Filter by Department

result2 = spark.sql("""
  SELECT Name, Salary
  FROM employee_view
  WHERE Department = 'Sales'
""")
result2.show()

✅ Output

+-----+------+
| Name|Salary|
+-----+------+
|  Ali|  4000|
|  Bob|  4200|
+-----+------+

Example 3: Average Salary by Department

result3 = spark.sql("""
  SELECT Department, AVG(Salary) AS Avg_Salary
  FROM employee_view
  GROUP BY Department
""")
result3.show()

✅ Output

+-----------+----------+
| Department|Avg_Salary|
+-----------+----------+
|  Marketing|    3500.0|
|Engineering|    5500.0|
|      Sales|    4100.0|
+-----------+----------+

๐Ÿ’ก Key Points

  • createTempView() is used to register a DataFrame as a temporary view.
  • You can use spark.sql() to run standard SQL queries on that view.
  • The view is session-scoped and will be removed when the Spark session ends.

๐ŸŽฅ Watch the Video Tutorial

Watch on YouTube

© 2025 Aamir Shahzad. All rights reserved.

Visit TechBrothersIT for more tutorials.