How to Create Notebook to Join Two CSV Files and Write to Lake Database in Synapse Pipelines | Azure Synapse Analytics Tutorial

Join Two CSV Files and Write to Lake Database Using Synapse Notebook

📘 Overview

In this tutorial, we’ll show you how to create a PySpark notebook in Azure Synapse Analytics that reads two CSV files, performs a join operation, and writes the result to a Lake Database table. This notebook can then be integrated into a Synapse pipeline for automated data workflows.

🧱 Prerequisites

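  • An Azure Synapse workspace with an Apache Spark pool
  • A Lake Database (or Spark database) already created; the examples below use the name LakeDB
  • Two CSV files (customers.csv and orders.csv) stored in ADLS Gen2, each with a customer_id column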

🛠️ Step-by-Step Instructions

✅ Step 1: Read CSV Files

%%pyspark
df_customers = spark.read.option("header", True).csv("abfss://data@yourstorage.dfs.core.windows.net/customers.csv")
df_orders = spark.read.option("header", True).csv("abfss://data@yourstorage.dfs.core.windows.net/orders.csv")
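
The paths above follow the abfss://<container>@<account>.dfs.core.windows.net/<path> pattern. As a minimal sketch, a small helper can build them in one place so the container and storage account are not repeated per file. The function name abfss_path is hypothetical and not part of any Synapse or Spark API; "data" and "yourstorage" are the placeholder values from the step above.

```python
def abfss_path(container: str, account: str, relative_path: str) -> str:
    """Build an ADLS Gen2 URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    # Drop any leading slash so the URI never contains a double slash after the host.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"


# Placeholder container and account names taken from the tutorial.
customers_path = abfss_path("data", "yourstorage", "customers.csv")
orders_path = abfss_path("data", "yourstorage", "/orders.csv")
print(customers_path)
print(orders_path)
```

The resulting strings can be passed straight to spark.read.option("header", True).csv(...) in place of the hard-coded paths.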

✅ Step 2: Join the Two DataFrames

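# Join on the shared key by name so the result keeps a single customer_id column;
# joining on an equality expression would keep both copies and the Delta write would fail.
df_joined = df_customers.join(df_orders, on="customer_id", how="inner")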
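
If the two files share column names other than the join key, the joined result would contain duplicate names, which a Delta table does not accept. The sketch below works out which orders columns would clash and proposes prefixed names for them. The helper collision_renames, the "order_" prefix, and the sample column lists (including created_at) are assumptions for illustration only.

```python
def collision_renames(left_cols, right_cols, key, prefix="order_"):
    """Map right-side columns that clash with left-side names to prefixed names."""
    left = set(left_cols)
    # The join key is excluded because it is merged into one column by the join.
    return {c: f"{prefix}{c}" for c in right_cols if c != key and c in left}


# Hypothetical column lists; in the notebook, pass df_customers.columns and df_orders.columns.
renames = collision_renames(
    ["customer_id", "name", "created_at"],
    ["order_id", "customer_id", "amount", "created_at"],
    key="customer_id",
)
print(renames)  # {'created_at': 'order_created_at'}

# In the notebook, apply the mapping before joining:
# for old, new in renames.items():
#     df_orders = df_orders.withColumnRenamed(old, new)
```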

✅ Step 3: Write the Result to a Lake Database Table

df_joined.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("LakeDB.joined_customer_orders")
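
The write above assumes the LakeDB database already exists. As a hedged sketch, the same write can be wrapped in a function that first issues CREATE DATABASE IF NOT EXISTS and restricts the mode to the two values this tutorial uses. The function name write_delta_table is hypothetical; it expects the notebook's spark session and a DataFrame to be passed in.

```python
def write_delta_table(spark, df, database: str, table: str, mode: str = "overwrite") -> str:
    """Write df as a Delta table named <database>.<table> and return the full name."""
    if mode not in ("overwrite", "append"):
        raise ValueError(f"Unsupported write mode: {mode}")
    # Harmless if the database was already created in Synapse Studio.
    spark.sql(f"CREATE DATABASE IF NOT EXISTS {database}")
    full_name = f"{database}.{table}"
    df.write.format("delta").mode(mode).saveAsTable(full_name)
    return full_name


# In the notebook:
# write_delta_table(spark, df_joined, "LakeDB", "joined_customer_orders")
```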

📂 Result

The joined data is saved as the Delta table joined_customer_orders in the LakeDB database, where it can be queried from Spark and from serverless SQL pools.

🔁 Integrate Notebook in Synapse Pipeline

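  1. In Synapse Studio, open the Integrate hub and create a new pipeline
  2. Add a “Notebook” activity to the pipeline canvas
  3. In the activity settings, select the notebook you just created
  4. Choose the Spark pool to run on and set any parameters the notebook expects
  5. Publish the pipeline, then trigger it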
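
For the parameters mentioned in step 4, one possible layout is a dedicated cell of default values at the top of the notebook, marked as the parameters cell in Synapse Studio (the "Toggle parameter cell" option in the cell menu). Values supplied by the Notebook activity then override these defaults at run time. The variable names below are assumptions for illustration; only the paths and table name come from this tutorial.

```python
# Parameters cell: defaults used for interactive runs.
# A pipeline Notebook activity can override any of these by passing a parameter with the same name.
customers_path = "abfss://data@yourstorage.dfs.core.windows.net/customers.csv"
orders_path = "abfss://data@yourstorage.dfs.core.windows.net/orders.csv"
target_table = "LakeDB.joined_customer_orders"
write_mode = "overwrite"  # switch to "append" for incremental runs
```

The later cells would then read from customers_path and orders_path and write to target_table with write_mode instead of using hard-coded literals.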

📌 Tips

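  • Validate both file schemas before joining (for example with printSchema()); CSV columns are read as strings unless you set inferSchema or supply a schema
  • Use display() to preview data inside the notebook
  • Use overwrite mode while testing; switch to append for incremental writes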
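
As a minimal sketch of the first tip, the check below fails fast when the join key is missing from either file, rather than letting the join raise a less readable error later. The function name check_join_key is hypothetical, and the column lists are sample values; in the notebook they would come from df_customers.columns and df_orders.columns.

```python
def check_join_key(left_cols, right_cols, key):
    """Raise ValueError naming each input that lacks the join key."""
    missing = [
        side
        for side, cols in (("customers", left_cols), ("orders", right_cols))
        if key not in cols
    ]
    if missing:
        raise ValueError(f"Join key '{key}' missing from: {', '.join(missing)}")


# Passes silently when both inputs contain the key.
check_join_key(["customer_id", "name"], ["order_id", "customer_id"], "customer_id")

# In the notebook:
# check_join_key(df_customers.columns, df_orders.columns, "customer_id")
```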

🎯 Use Cases

  • Merge transactional and customer data
  • Create curated data layers in Lakehouse
  • Automate data ingestion with Synapse Pipelines

📚 Credit: Content created with the help of ChatGPT and Gemini.