How to Read Lake Database Tables & Views in Azure Synapse Using PySpark

📘 Overview

In Azure Synapse Analytics, a Lake Database stores its data in your Data Lake and shares its metadata with the workspace's Spark pools, so you can read its tables and views directly with PySpark. This lets you apply Apache Spark's distributed processing to curated datasets without copying them.

✅ Why Read Lake Database in PySpark?

  • Supports reading from Delta and Parquet formats (a path-based sketch follows this list)
  • Leverages Spark for distributed processing
  • Easy integration with Synapse Notebooks
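
When the table's files live in your Data Lake as Parquet or Delta, you can also load them by path instead of by name. A minimal sketch, assuming hypothetical ADLS Gen2 paths (the storage account, container, and folders below are placeholders) and the notebook's built-in spark session:

# Placeholder paths -- replace with your storage account, container, and folders
parquet_path = "abfss://curated@mystorageaccount.dfs.core.windows.net/sales/parquet/"
delta_path = "abfss://curated@mystorageaccount.dfs.core.windows.net/sales/delta/"

# Parquet files read directly from storage
parquet_df = spark.read.format("parquet").load(parquet_path)

# Delta tables read the same way with the delta format
delta_df = spark.read.format("delta").load(delta_path)

parquet_df.printSchema()
delta_df.show(5)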

🛠️ Step-by-Step Guide

Step 1: Read a Table from Lake Database

# Read a Lake Database table through the shared metastore
df = spark.read.table("LakeDBName.TableName")
df.show()  # print the first 20 rows

Replace LakeDBName and TableName with your actual Lake Database and table names.
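
If you want to confirm the read worked before going further, a couple of quick checks help. A minimal sketch, reusing the df from Step 1:

# Show column names and types from the Lake Database definition
df.printSchema()

# Quick sanity check on table size; can be expensive on very large tables
print(df.count())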

Step 2: Read a View from Lake Database

# Views are read with the same API as tables
view_df = spark.read.table("LakeDBName.ViewName")
display(view_df)  # Synapse notebook helper for rich, tabular output

Step 3: Register as Temp View for SQL Queries

# Expose the DataFrame to Spark SQL under a session-scoped name
df.createOrReplaceTempView("temp_table")
spark.sql("SELECT * FROM temp_table WHERE column = 'value'").show()

🔎 Sample Use Case

  • Read curated sales data from Lake Database
  • Register as temp view
  • Join with another dataset using Spark SQL (a combined sketch follows this list)
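
Putting the three steps together, here is what that use case might look like. Every name below (SalesLakeDB, curated_sales, ref_products, and the join columns) is a hypothetical placeholder, and the notebook's built-in spark session is assumed:

# 1. Read curated sales data from the Lake Database (placeholder names)
sales_df = spark.read.table("SalesLakeDB.curated_sales")

# 2. Register it as a temp view for SQL access
sales_df.createOrReplaceTempView("sales")

# 3. Join with another dataset -- here a hypothetical product reference table
products_df = spark.read.table("SalesLakeDB.ref_products")
products_df.createOrReplaceTempView("products")

joined_df = spark.sql("""
    SELECT s.OrderId, s.Amount, p.ProductName
    FROM sales s
    JOIN products p ON s.ProductId = p.ProductId
""")
joined_df.show(10)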

📌 Notes

  • Ensure the Spark pool has access to the Lake Database metadata (a quick catalog check follows this list)
  • Use display() in notebooks to visualize the output
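
A quick way to check that metadata access is in place is to list what the Spark pool can actually see. A minimal sketch using the standard spark.catalog API (LakeDBName is a placeholder):

# Databases visible to this Spark pool, including Lake Databases
for db in spark.catalog.listDatabases():
    print(db.name)

# Tables and views inside a specific Lake Database (placeholder name)
for tbl in spark.catalog.listTables("LakeDBName"):
    print(tbl.name, tbl.tableType)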

📚 Credit: Content created with the help of ChatGPT and Gemini.