Explore Databases and Tables in PySpark with spark.catalog | A Beginner’s Guide to Metadata Management | PySpark Tutorial

How to Explore Tables and Databases in PySpark using spark.catalog

Learn how to interact with databases, tables, views, cached data, and registered functions using PySpark's powerful spark.catalog interface. Ideal for metadata exploration and data pipeline control.
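
The snippets below assume a running SparkSession and a few demo objects to inspect. Here is a minimal setup sketch; the app name and column definitions are illustrative, while the names demo_db, demo_table, and demo_view match the ones used in the steps that follow.

from pyspark.sql import SparkSession

# Build or reuse a SparkSession (the entry point that exposes spark.catalog)
spark = SparkSession.builder.appName("CatalogDemo").getOrCreate()

# A demo database, a managed table in the current database, and a temp view --
# the objects inspected in the steps below
spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")
spark.sql("CREATE TABLE IF NOT EXISTS demo_table (id INT, name STRING) USING parquet")
spark.range(5).createOrReplaceTempView("demo_view")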

Step 1: Inspect Current Catalog State

print("Current Database:", spark.catalog.currentDatabase())
print("Current Catalog:", spark.catalog.currentCatalog())

Step 2: List Databases and Tables

print("List of Databases:")
for db in spark.catalog.listDatabases():
    print(db.name)

print("List of Tables:")
for tbl in spark.catalog.listTables():
    print(tbl.name, tbl.tableType)
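
listTables can also be pointed at a specific database, and the registered functions mentioned in the introduction can be listed with listFunctions. A short sketch using the demo_db database from the setup:

# Tables in a specific database (temporary views are included as well)
for tbl in spark.catalog.listTables("demo_db"):
    print(tbl.name, tbl.tableType, tbl.isTemporary)

# First few registered functions (built-in and user-defined)
for fn in spark.catalog.listFunctions()[:5]:
    print(fn.name)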

Step 3: Check Existence of Tables & Databases

print("Table Exists:", spark.catalog.tableExists("demo_table"))
print("Database Exists:", spark.catalog.databaseExists("demo_db"))

Step 4: Explore Table Columns

for col in spark.catalog.listColumns("demo_table"):
    print(col.name, col.dataType)
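
Each entry returned by listColumns also carries nullability and partition/bucket flags, which helps when documenting a table's layout:

# Full column metadata for the demo table
for col in spark.catalog.listColumns("demo_table"):
    print(col.name, col.dataType, col.nullable, col.isPartition, col.isBucket)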

Step 5: Drop Temporary Views

spark.catalog.dropTempView("demo_view")
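
dropTempView returns True only when a view was actually removed, so it is safe to call even if the view is already gone. Global temporary views have a separate counterpart; demo_global_view below is a hypothetical name used for illustration.

# Returns False here because demo_view was already dropped above
print("Dropped again:", spark.catalog.dropTempView("demo_view"))

# Global temp views (created with createGlobalTempView) are dropped separately;
# demo_global_view is a placeholder name
spark.catalog.dropGlobalTempView("demo_global_view")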

Step 6: Manage the Cache

print("Is Cached:", spark.catalog.isCached("demo_view"))
spark.catalog.uncacheTable("demo_view")
spark.catalog.clearCache()
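
Two related maintenance calls round this out: refreshTable invalidates Spark's cached metadata and data for a table after its files change outside of Spark, and isCached can confirm that clearCache worked. A short sketch with the demo objects from the setup:

# Invalidate cached metadata/data for demo_table after out-of-band file changes
spark.catalog.refreshTable("demo_table")

# After clearCache(), nothing should report as cached
print("Still cached:", spark.catalog.isCached("demo_view"))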
