How to Explore Tables and Databases in PySpark Using spark.catalog
Learn how to interact with databases, tables, views, cached data, and registered functions using PySpark's powerful spark.catalog interface. It is ideal for metadata exploration and data pipeline control.
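To follow along, you need a SparkSession and a few objects to inspect. Here is a minimal setup sketch; the names demo_db, demo_table, and demo_view are placeholders matching the snippets below, not required names.

from pyspark.sql import SparkSession

# Local session; the default built-in catalog is enough for these examples.
spark = SparkSession.builder.appName("catalog-demo").getOrCreate()

# Placeholder database and managed table used throughout this post.
spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")
spark.sql("CREATE TABLE IF NOT EXISTS demo_db.demo_table (id INT, name STRING) USING parquet")

# Placeholder temporary view for the view and cache steps.
spark.range(5).createOrReplaceTempView("demo_view")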
Step 1: Inspect the Current Catalog State
print("Current Database:", spark.catalog.currentDatabase())
print("Current Catalog:", spark.catalog.currentCatalog())
Step 2: List Databases and Tables
print("List of Databases:")
for db in spark.catalog.listDatabases():
    print(db.name)

print("List of Tables:")
for tbl in spark.catalog.listTables():
    print(tbl.name, tbl.tableType)
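listTables() also accepts a database name when you only want one database's tables, and each entry carries more metadata than the name and type. The registered functions mentioned in the intro are listed the same way. A sketch, assuming the demo_db from the setup:

# Restrict the listing to a single database.
for tbl in spark.catalog.listTables("demo_db"):
    print(tbl.name, tbl.tableType, tbl.isTemporary)

# Registered functions (built-ins included); show just the first few.
for fn in spark.catalog.listFunctions()[:5]:
    print(fn.name, fn.isTemporary)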
Step 3: Check the Existence of Tables and Databases
print("Table Exists:", spark.catalog.tableExists("demo_table"))
print("Database Exists:", spark.catalog.databaseExists("demo_db"))
Step 4: Explore Table Columns
for col in spark.catalog.listColumns("demo_table"):
    print(col.name, col.dataType)
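Column objects carry more than the name and type; nullable, isPartition, and isBucket are useful when auditing a schema. A sketch against the assumed demo_db.demo_table:

# Full column metadata, including nullability and partitioning flags.
for c in spark.catalog.listColumns("demo_table", "demo_db"):
    print(f"{c.name}: {c.dataType}, nullable={c.nullable}, partition={c.isPartition}")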
Step 5: Drop Temporary Views
spark.catalog.dropTempView("demo_view")
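dropTempView() returns True when the view existed and was dropped, so there is no need to check first; global temporary views have their own dropGlobalTempView() counterpart. A short sketch, reusing the demo_view placeholder:

# Recreate the view, then drop it; the return value reports whether it existed.
spark.range(3).createOrReplaceTempView("demo_view")
print("Dropped:", spark.catalog.dropTempView("demo_view"))  # True

# Global temp views live under the global_temp database and drop separately.
spark.range(3).createOrReplaceGlobalTempView("demo_global_view")
spark.catalog.dropGlobalTempView("demo_global_view")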
Step 6: Manage the Cache
print("Is Cached:", spark.catalog.isCached("demo_view"))
spark.catalog.uncacheTable("demo_view")
spark.catalog.clearCache()
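Note that isCached() and uncacheTable() raise an error if the table or view no longer exists, and nothing above has actually cached demo_view yet. A complete round trip, as a sketch:

# Recreate the view (Step 5 dropped it), cache it, verify, then clean up.
spark.range(10).createOrReplaceTempView("demo_view")
spark.catalog.cacheTable("demo_view")
print("Is Cached:", spark.catalog.isCached("demo_view"))  # True

spark.catalog.uncacheTable("demo_view")
print("Is Cached:", spark.catalog.isCached("demo_view"))  # False

# Evict every cached table and view in one call.
spark.catalog.clearCache()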