What Are Statistics in Fabric Data Warehouse & How to Manage Statistics | Microsoft Fabric Tutorial

In this Microsoft Fabric tutorial, we explain what statistics are in a Fabric Data Warehouse, why they are critical for performance, and how to manage them manually or automatically using T-SQL.

📊 What Are Statistics?

  • Statistics describe the distribution of values in a column
  • The query optimizer uses them to estimate row counts and selectivity
  • Accurate estimates are critical to choosing an efficient query execution plan
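
To make "selectivity" concrete, the sketch below computes the actual fraction of rows that match a filter. This is the figure the optimizer tries to estimate from statistics without scanning the table. It assumes the dbo.DimCustomer table and CustomerKey column used elsewhere in this post; the threshold 12000 is an arbitrary illustrative value.

```sql
-- Actual selectivity of the predicate CustomerKey < 12000.
-- The value 12000 is an arbitrary example, not taken from real data.
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN CustomerKey < 12000 THEN 1 ELSE 0 END) AS matching_rows,
    1.0 * SUM(CASE WHEN CustomerKey < 12000 THEN 1 ELSE 0 END)
        / NULLIF(COUNT(*), 0) AS selectivity   -- NULLIF avoids divide-by-zero on an empty table
FROM dbo.DimCustomer;
```

A selectivity close to 0 means the filter keeps very few rows, and a value close to 1 means it keeps almost all of them. The optimizer may pick a different plan in each case.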

🔧 Manual Statistics Management

Create, update, and inspect statistics on frequently filtered or joined columns:

-- Create statistics manually
CREATE STATISTICS DimCustomer_CustomerKey_FullScan
ON dbo.DimCustomer (CustomerKey) WITH FULLSCAN;

-- Update statistics after data changes
UPDATE STATISTICS dbo.DimCustomer (DimCustomer_CustomerKey_FullScan) WITH FULLSCAN;

-- View histogram of statistics
DBCC SHOW_STATISTICS ('dbo.DimCustomer', 'DimCustomer_CustomerKey_FullScan') WITH HISTOGRAM;
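
FULLSCAN reads every row, which can take a while on large tables. As a hedged sketch, the statements below build a sampled statistics object instead and then inspect its header rather than its histogram. The name DimCustomer_CustomerKey_Sample20 and the 20 percent sample rate are illustrative assumptions. It is worth confirming in the current Fabric documentation that the SAMPLE and STAT_HEADER options are available in your warehouse.

```sql
-- Hypothetical statistics name; 20 percent is an arbitrary sample rate
CREATE STATISTICS DimCustomer_CustomerKey_Sample20
ON dbo.DimCustomer (CustomerKey) WITH SAMPLE 20 PERCENT;

-- Show header information (rows, rows sampled, last update) instead of the histogram
DBCC SHOW_STATISTICS ('dbo.DimCustomer', 'DimCustomer_CustomerKey_Sample20') WITH STAT_HEADER;
```

Sampling trades some accuracy for a faster build, so it tends to suit large tables where the values are spread fairly evenly.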

⚙️ Automatic Statistics

  • Fabric creates statistics automatically at query time when a query involves JOIN, GROUP BY, WHERE, or ORDER BY
  • System-generated statistics get names like _WA_Sys_...
  • They are refreshed automatically when the underlying data changes significantly

-- Query that triggers auto-stat creation
SELECT CustomerKey FROM dbo.DimCustomer GROUP BY CustomerKey;

-- List statistics on the table; auto_created = 1 marks system-generated ones
SELECT
    OBJECT_NAME(s.object_id) AS [object_name],
    c.name AS column_name,
    s.name AS stats_name,
    STATS_DATE(s.object_id, s.stats_id) AS stats_update_date,
    s.auto_created
FROM sys.stats AS s
JOIN sys.stats_columns AS sc ON s.stats_id = sc.stats_id AND s.object_id = sc.object_id
JOIN sys.columns AS c ON sc.object_id = c.object_id AND sc.column_id = c.column_id
WHERE OBJECT_NAME(s.object_id) = 'DimCustomer';
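
To show only the system-generated statistics, the listing can be narrowed with the auto_created flag. This is a sketch that reuses the same catalog views; the schema-qualified name dbo.DimCustomer passed to OBJECT_ID is an assumption about where the table lives.

```sql
-- Only statistics the engine created on its own (auto_created = 1)
SELECT
    s.name AS stats_name,
    c.name AS column_name,
    STATS_DATE(s.object_id, s.stats_id) AS stats_update_date
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
JOIN sys.columns AS c
    ON sc.object_id = c.object_id AND sc.column_id = c.column_id
WHERE s.object_id = OBJECT_ID('dbo.DimCustomer')   -- assumes the table is in the dbo schema
  AND s.auto_created = 1
ORDER BY stats_update_date;
```

Filtering on the object ID rather than the bare table name avoids matching a table with the same name in another schema.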

💡 Best Practices

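  • Create statistics manually for columns heavily used in WHERE and JOIN clauses
  • Use WITH FULLSCAN when accuracy matters more than build time, since it reads every row instead of a sample
  • Check STATS_DATE to verify that statistics are fresh
  • Let automatic statistics handle tables whose data changes infrequently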
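
The freshness check can be sketched as a query that flags statistics not updated recently. The 7-day threshold is an arbitrary assumption, not a Fabric recommendation. Age alone does not prove that statistics are stale, so treat the result as a list of candidates to review.

```sql
-- Statistics on user tables that were never updated or are older than 7 days.
-- The 7-day cutoff is an illustrative assumption; tune it to your load schedule.
SELECT
    OBJECT_NAME(s.object_id) AS table_name,
    s.name AS stats_name,
    STATS_DATE(s.object_id, s.stats_id) AS stats_update_date,
    DATEDIFF(DAY, STATS_DATE(s.object_id, s.stats_id), GETDATE()) AS days_since_update
FROM sys.stats AS s
JOIN sys.objects AS o ON o.object_id = s.object_id
WHERE o.type = 'U'   -- user tables only
  AND (STATS_DATE(s.object_id, s.stats_id) IS NULL
       OR DATEDIFF(DAY, STATS_DATE(s.object_id, s.stats_id), GETDATE()) > 7)
ORDER BY days_since_update DESC;
```

Statistics that show up here on frequently loaded tables are reasonable candidates for UPDATE STATISTICS ... WITH FULLSCAN.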

Blog post written with the help of ChatGPT.