How to Extract Date and Time from Blob File Names and Load Sequentially into SQL Table - ADF Tutorial

Introduction

In modern data pipelines, organizing and processing data efficiently is essential for maintaining data integrity and ensuring optimal performance. Azure Data Factory (ADF) offers powerful capabilities for managing data workflows, including sequential data loading based on file metadata. This tutorial explains how to extract date and time from blob file names stored in Azure Blob Storage and load them sequentially into an Azure SQL Table using Azure Data Factory.

Understanding the Scenario

The objective is to process and load files sequentially from Azure Blob Storage based on the date and time embedded in their file names. This approach ensures the data is loaded either from the oldest to the newest files or vice versa, depending on your business needs.

For example, consider files named in the following format:

  • customer_20250310_110000_US.csv
  • customer_20250311_120000_US.csv
  • customer_20250313_170000_US.csv

These files embed a date (yyyyMMdd) and a time (HHmmss) in their names, which serve as the basis for sequencing them during data loading operations.

Step-by-Step Process

1. Explore the Blob Storage

  • Access the Azure Blob Storage container holding your data files.
  • Verify that the file names include date and time details that follow a consistent naming convention.

2. Create Required SQL Tables

FileList Table: Stores metadata about each file, including the file name and load status.

CREATE TABLE FileList (
    ID INT IDENTITY(1,1),
    FileName NVARCHAR(255),
    LoadStatus NVARCHAR(10)
);
    

Customer Table: Stores the actual data loaded from each file.

CREATE TABLE Customer (
    CustomerID INT,
    FirstName NVARCHAR(100),
    LastName NVARCHAR(100),
    Salary FLOAT,
    FileName NVARCHAR(255),
    LoadDateTime DATETIME
);
    

3. Set Up Azure Data Factory Pipeline

a. Get Metadata Activity

  • Use this activity to fetch the list of files from the Blob Storage container.
  • Configure the dataset and linked service pointing to the Blob Storage, and include Child Items in the activity's field list so it returns the list of file names.

b. ForEach Loop Activity (Sequential Execution)

  • Loop through the files with the ForEach activity's Sequential option enabled so they are processed one at a time, in the correct order.
  • Use a Script activity inside the loop to insert each file name into the FileList table, as sketched below.
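
For illustration, the Script activity's statement could look like the sketch below, assuming the ForEach iterates over the Get Metadata childItems output and that a 'Pending' status marks files not yet loaded (the status value is an assumption):

INSERT INTO FileList (FileName, LoadStatus)
VALUES ('@{item().name}', 'Pending');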

c. Lookup Activity

  • Retrieve file names from the FileList table ordered by date and time.
  • Use a SQL query that extracts and orders files based on the date and time information parsed from the file names.
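
One way to write that query, assuming the fixed customer_yyyyMMdd_HHmmss_US.csv convention shown earlier (the character positions would shift for a different prefix), is:

SELECT FileName
FROM FileList
ORDER BY CAST(
    SUBSTRING(FileName, 10, 8) + ' ' +                              -- yyyyMMdd
    STUFF(STUFF(SUBSTRING(FileName, 19, 6), 5, 0, ':'), 3, 0, ':')  -- HHmmss -> HH:mm:ss
    AS DATETIME) ASC;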

d. ForEach Loop for Data Loading

  • Sequentially process each file retrieved by the Lookup activity.
  • Use a Script activity to update the load status of each file after processing (see the sketch after this list).
  • Use a Copy activity to transfer data from each file in Blob Storage into the Customer table in Azure SQL Database.
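
A minimal sketch of the status-update statement, assuming the Lookup returns a FileName column and 'Loaded' is the chosen completion status (both names are assumptions):

UPDATE FileList
SET LoadStatus = 'Loaded'
WHERE FileName = '@{item().FileName}';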

4. Parameterization and Dynamic Content

  • Use parameters to dynamically handle file names and paths within datasets.
  • Create additional columns during the Copy activity to include the file name and load timestamp for traceability.
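
As a rough sketch, the dynamic content could look like the following, where pFileName is a hypothetical dataset parameter and the additional columns are populated from ADF expressions:

pFileName (dataset parameter value):  @item().FileName
Additional column FileName:           @item().FileName
Additional column LoadDateTime:       @utcnow()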

Handling Different Loading Scenarios

Loading Oldest to Newest Files

  • Order the files in ascending order of date and time.
  • This ensures data consistency when older data must be processed first.

Loading Newest to Oldest Files

  • Change the order clause in your SQL query to descending.
  • This is useful when prioritizing the most recent data.
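
For example, the ORDER BY clause from the earlier Lookup sketch would simply end in DESC instead of ASC:

ORDER BY CAST(
    SUBSTRING(FileName, 10, 8) + ' ' +
    STUFF(STUFF(SUBSTRING(FileName, 19, 6), 5, 0, ':'), 3, 0, ':')
    AS DATETIME) DESC;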

Validation and Testing

  • Verify records in the Customer table to ensure they are loaded in the correct sequence.
  • Check the FileList table for accurate load status updates.
  • Use load timestamps to confirm data processing order.
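
Two quick checks along these lines (column and status values match the sketches above) can confirm the outcome:

SELECT FileName, LoadDateTime FROM Customer ORDER BY LoadDateTime;
SELECT FileName, LoadStatus FROM FileList;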

Conclusion

This methodical approach using Azure Data Factory allows you to automate the sequential loading of data files based on embedded metadata like date and time. It enhances data pipeline reliability and ensures the correct sequencing of data ingestion processes.

By following this tutorial, data engineers and analysts can establish a robust data processing workflow in Azure that scales with their data growth and organizational needs.
