When the Phone Rings
A production database is slow. Users are complaining. Every minute costs money. Here is a structured approach.
Immediate Triage (First 5 Minutes)
1. Check if It Is Actually SQL Server
Before diving into query plans, rule out:
- Network latency between app server and database
- Application server CPU/memory exhaustion
- Connection pool exhaustion
- DNS resolution issues
2. Check Running Queries
SELECT
r.session_id,
r.blocking_session_id,
r.cpu_time,
r.total_elapsed_time,
r.reads,
r.writes,
r.logical_reads,
t.text AS query_text,
r.status
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.session_id > 50;
Look for:
- Queries with high elapsed time but low CPU (waiting on locks or I/O)
- Blocking chains (blocking_session_id > 0)
- Suspended queries (status = 'suspended')
3. Check Waits
SELECT TOP 10
wait_type,
wait_time_ms / 1000 AS wait_time_sec,
waiting_tasks_count,
max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE '%SLEEP%'
AND wait_type NOT IN ('CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT')
ORDER BY wait_time_ms DESC;
The most common performance-related wait types:
PAGEIOLATCH_*: Disk I/O bottleneck — check storage performanceLCK_M_*: Locking — review blocking chainsWRITELOG: Transaction log bottleneck — check log disk performanceCXPACKET: Parallelism — review MAXDOP configurationASYNC_NETWORK_IO: Application is consuming results slowly
4. Check Current Locks
SELECT
request_session_id,
resource_type,
resource_description,
request_mode,
request_status
FROM sys.dm_tran_locks
WHERE request_session_id > 50
AND resource_database_id = DB_ID();
5. Check TempDB Contention
TempDB is a shared resource. Contention here affects every query.
SELECT
session_id,
wait_type,
wait_resource
FROM sys.dm_exec_requests
WHERE wait_resource LIKE '2:%' -- TempDB ID is 2
AND wait_type = 'PAGELATCH_%';
Second-Tier Investigation (Next 15 Minutes)
6. Identify Expensive Queries by I/O
SELECT TOP 10
total_logical_reads / execution_count AS avg_logical_reads,
total_physical_reads / execution_count AS avg_physical_reads,
execution_count,
total_worker_time / execution_count / 1000 AS avg_cpu_ms,
SUBSTRING(text, (statement_start_offset/2)+1,
((CASE statement_end_offset WHEN -1 THEN DATALENGTH(text)
ELSE statement_end_offset END - statement_start_offset)/2) + 1) AS query_text
FROM sys.dm_exec_query_stats
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
ORDER BY total_logical_reads DESC;
7. Check Missing Indexes
SELECT
migs.avg_user_impact,
migs.user_seeks + migs.user_scans AS total_accesses,
mid.statement AS table_name,
mid.equality_columns,
mid.inequality_columns,
mid.included_columns
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE mid.database_id = DB_ID()
ORDER BY migs.avg_user_impact DESC;
Be cautious. Missing index recommendations are useful hints, not commands. Adding indexes can hurt write performance.
8. Review SQL Server Error Log
EXEC sp_readerrorlog 0, 1, 'error';
EXEC sp_readerrorlog 0, 1, 'deadlock';
EXEC sp_readerrorlog 0, 1, 'I/O';
Root Cause Checklist (After the Fire)
- Was there a deployment? Check deployment timestamps.
- Was there a data change? Sudden data volume changes can invalidate execution plans.
- Was there a statistics update? Auto-update statistics can change plans unexpectedly.
- Was there a configuration change? MAXDOP, cost threshold for parallelism, memory settings.
- Did a maintenance job run? Index rebuilds, DBCC CHECKDB, backups.
Prevention
- Set up Query Store and baseline normal performance
- Configure alerts for blocking > 30 seconds
- Monitor disk latency (avg. disk sec/read and avg. disk sec/write)
- Keep statistics updated manually during maintenance windows
- Document the normal state so you can recognize abnormal faster
One Final Tip
When you cannot find the problem, check the application code. Many "database performance" issues are actually N+1 query problems, missing connection pooling, or transactions held open by application logic.