What Are Statistics in Fabric Data Warehouse & How to Manage Statistics
In this Microsoft Fabric tutorial, we explain what statistics are in a Fabric Data Warehouse, why they are critical for performance, and how to manage them manually or automatically using T-SQL.
📊 What Are Statistics?
- Statistics describe the distribution of values in a column
- Used by the query optimizer to estimate row counts and selectivity
- Critical to choosing an efficient query execution plan
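To make "distribution of values" concrete, the sketch below computes by hand two of the figures a statistics object summarizes for a column: the row count and the density (1 divided by the number of distinct values). It assumes the dbo.DimCustomer table and CustomerKey column used throughout this tutorial, and it only illustrates the idea; it is not how the engine builds statistics.

```sql
-- Hand-computed approximation of what column statistics summarize.
-- Assumes dbo.DimCustomer (CustomerKey) exists, as in the rest of this tutorial.
SELECT
    COUNT(*)                                     AS total_rows,
    COUNT(DISTINCT CustomerKey)                  AS distinct_values,
    -- Density = 1 / distinct values; NULLIF avoids divide-by-zero on an empty table
    1.0 / NULLIF(COUNT(DISTINCT CustomerKey), 0) AS density
FROM dbo.DimCustomer;
```

Multiplying total_rows by density gives the average number of rows per distinct value, which is the kind of estimate the optimizer relies on for an equality filter. For a unique key column that product is 1.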
🔧 Manual Statistics Management
Create, update, and inspect statistics on frequently filtered or joined columns:
-- Create statistics manually
CREATE STATISTICS DimCustomer_CustomerKey_FullScan
ON dbo.DimCustomer (CustomerKey) WITH FULLSCAN;
-- Update statistics after data changes
UPDATE STATISTICS dbo.DimCustomer (DimCustomer_CustomerKey_FullScan) WITH FULLSCAN;
-- View histogram of statistics
DBCC SHOW_STATISTICS ('dbo.DimCustomer', 'DimCustomer_CustomerKey_FullScan') WITH HISTOGRAM;
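On very large tables a full scan can be slow, so a sampled statistic is a common alternative. The following is a minimal sketch, assuming your warehouse accepts the SAMPLE, STAT_HEADER, and DROP STATISTICS options shown. The GeographyKey column and the statistics name are hypothetical and used only for illustration.

```sql
-- Hypothetical column and statistics name, for illustration only
CREATE STATISTICS DimCustomer_GeographyKey_Sample
ON dbo.DimCustomer (GeographyKey) WITH SAMPLE 20 PERCENT;

-- View the header (last update time, row counts) instead of the histogram
DBCC SHOW_STATISTICS ('dbo.DimCustomer', 'DimCustomer_GeographyKey_Sample') WITH STAT_HEADER;

-- Drop the statistics object when it is no longer needed
DROP STATISTICS dbo.DimCustomer.DimCustomer_GeographyKey_Sample;
```

Sampling trades some accuracy for a faster build, so it fits best where the column's values are fairly evenly distributed.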
⚙️ Automatic Statistics
- Automatically created when queries involve JOIN, GROUP BY, WHERE, or ORDER BY
- Generated under names like _WA_Sys_...
- Automatically refreshed when underlying data changes significantly
-- Query that triggers auto-stat creation
SELECT CustomerKey FROM dbo.DimCustomer GROUP BY CustomerKey;
-- View system-generated statistics
SELECT
    object_name(s.object_id) AS object_name,
    c.name AS column_name,
    s.name AS stats_name,
    STATS_DATE(s.object_id, s.stats_id) AS stats_update_date,
    s.auto_created
FROM sys.stats s
JOIN sys.stats_columns sc ON s.stats_id = sc.stats_id AND s.object_id = sc.object_id
JOIN sys.columns c ON sc.object_id = c.object_id AND sc.column_id = c.column_id
-- Filter by object ID so only dbo.DimCustomer matches, not same-named tables in other schemas
WHERE s.object_id = OBJECT_ID('dbo.DimCustomer');
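To list only the system-generated histogram statistics, one option is to filter on the auto_created flag together with the name prefix. This is a sketch under the assumption that auto-created statistics follow the _WA_Sys_ naming described above. Underscores are wrapped in brackets because a bare underscore is a single-character wildcard in LIKE.

```sql
-- List only auto-created statistics on dbo.DimCustomer
SELECT
    s.name AS stats_name,
    STATS_DATE(s.object_id, s.stats_id) AS stats_update_date
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.DimCustomer')
  AND s.auto_created = 1
  -- [_] matches a literal underscore
  AND s.name LIKE '[_]WA[_]Sys[_]%';
```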
💡 Best Practices
- Create stats manually for columns heavily used in WHERE and JOIN clauses
- Use WITH FULLSCAN for higher accuracy
- Inspect STATS_DATE to verify freshness
- Allow auto stats to handle low-frequency changes
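The freshness check above can be sketched as a query that reports how many days have passed since each statistics object on a table was last updated, followed by a manual refresh of one that looks stale. What counts as stale is workload-dependent; no particular threshold is implied here.

```sql
-- Report the age of every statistics object on dbo.DimCustomer, oldest first
SELECT
    s.name AS stats_name,
    STATS_DATE(s.object_id, s.stats_id) AS last_updated,
    DATEDIFF(day, STATS_DATE(s.object_id, s.stats_id), GETDATE()) AS days_since_update
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.DimCustomer')
ORDER BY last_updated;

-- Refresh a statistics object you judge to be stale
UPDATE STATISTICS dbo.DimCustomer (DimCustomer_CustomerKey_FullScan) WITH FULLSCAN;
```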