Performance diagnostics without a methodology is archaeology — you dig until you find something interesting, not necessarily something important. Oracle’s AWR and ASH provide an extraordinarily detailed record of database activity, but interpreting them requires a structured approach. This article presents a top-down methodology that moves from macro to micro: DB Time → Wait Profile → Top SQL → Root Cause.
The Foundation: DB Time
DB Time is the total time all sessions spent working in the database (on CPU or waiting for database resources) during a snapshot period. It is the single most important number in the AWR report.
-- DB Time from AWR snapshots
SELECT s.snap_id,
s.begin_interval_time,
s.end_interval_time,
ROUND(m.value / 1000000 / 60, 2) AS db_time_minutes
FROM dba_hist_sys_time_model m
JOIN dba_hist_snapshot s ON m.snap_id = s.snap_id AND m.dbid = s.dbid
WHERE m.stat_name = 'DB time'
AND s.begin_interval_time >= SYSDATE - 1
ORDER BY s.snap_id DESC;
DB Time as a Capacity Metric
DB Time / elapsed wall-clock time = average active sessions (AAS).
AAS = DB Time (seconds) / Elapsed Time (seconds)
If AAS > number of CPUs, the database is overloaded (CPU is the bottleneck or sessions are queued). If AAS ≪ number of CPUs, the database has headroom. This single ratio puts all performance numbers in context.
Step 1: Wait Event Profile
Decompose DB Time into CPU time and wait time categories. This tells you whether time is being spent usefully (on CPU) or blocking (waiting).
-- Wait event profile from AWR (one snapshot interval)
SELECT event,
waits,
ROUND(time_waited_micro / 1000000, 2) AS total_wait_secs,
ROUND(time_waited_micro / 1000000 / waits * 1000, 2) AS avg_wait_ms,
ROUND(time_waited_micro / (SELECT SUM(time_waited_micro)
FROM dba_hist_system_event e2
WHERE e2.snap_id = e.snap_id
AND e2.dbid = e.dbid) * 100, 1) AS pct_total
FROM dba_hist_system_event e
WHERE snap_id = (SELECT MAX(snap_id) - 1 FROM dba_hist_snapshot)
AND wait_class NOT IN ('Idle')
ORDER BY time_waited_micro DESC
FETCH FIRST 10 ROWS ONLY;
Interpreting Wait Class Distribution
| Wait Class | Common Events | What It Signals |
|---|---|---|
| User I/O | db file sequential read, db file scattered read |
Index scans, full table scans |
| Commit | log file sync |
High commit rate; redo bottleneck |
| Concurrency | buffer busy waits, enq: TX - row lock contention |
Hot blocks; lock contention |
| System I/O | log file parallel write, control file I/O |
Redo write performance |
| Network | SQL*Net more data to client |
Result set transfer overhead |
| Other | gc cr blocks received, gc current blocks received |
RAC Cache Fusion (only in RAC) |
| Scheduler | resmgr: cpu quantum |
CPU contention; resource manager queuing |
If On CPU (not a wait event but reported in the Top 5 Timed Events) is dominant, focus on reducing logical I/Os (unnecessary full scans, missing indexes, suboptimal SQL).
Step 2: ASH for In-flight Diagnostics
AWR snapshots are 30 minutes apart by default. For problems that occur within a snapshot interval (or are happening right now), use Active Session History (ASH) — a rolling in-memory buffer sampled every second.
Real-time ASH: What Is the Database Doing Now?
-- What are active sessions doing right now?
SELECT event,
COUNT(*) AS active_sessions,
ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM v$active_session_history
WHERE sample_time >= SYSDATE - 1/1440 -- last 1 minute
AND session_state = 'WAITING'
GROUP BY event
ORDER BY active_sessions DESC;
-- Which SQL IDs are consuming the most DB Time right now?
SELECT sql_id,
COUNT(*) AS ash_samples,
ROUND(COUNT(*) * 10, 0) AS approx_db_time_secs -- ~1 sample/sec
FROM v$active_session_history
WHERE sample_time >= SYSDATE - 5/1440 -- last 5 minutes
GROUP BY sql_id
ORDER BY ash_samples DESC
FETCH FIRST 10 ROWS ONLY;
Historical ASH from DBA_HIST_ACTIVE_SESS_HISTORY
-- Top wait events for a specific time window (historical)
SELECT ash.event,
COUNT(*) AS samples,
ROUND(COUNT(*) / SUM(COUNT(*)) OVER () * 100, 1) AS pct_db_time
FROM dba_hist_active_sess_history ash
WHERE ash.sample_time BETWEEN TO_TIMESTAMP('2026-03-01 09:00', 'YYYY-MM-DD HH24:MI')
AND TO_TIMESTAMP('2026-03-01 09:30', 'YYYY-MM-DD HH24:MI')
AND ash.session_state = 'WAITING'
AND ash.wait_class != 'Idle'
GROUP BY ash.event
ORDER BY samples DESC
FETCH FIRST 15 ROWS ONLY;
ASH to Identify the “Offending” Time Window
-- Activity timeline in 5-minute buckets (find when the spike occurred)
SELECT TRUNC(sample_time, 'MI') - MOD(EXTRACT(MINUTE FROM sample_time), 5) * INTERVAL '1' MINUTE AS bucket,
COUNT(*) AS active_sessions
FROM dba_hist_active_sess_history
WHERE sample_time >= SYSDATE - 2 -- last 2 days
GROUP BY TRUNC(sample_time, 'MI') - MOD(EXTRACT(MINUTE FROM sample_time), 5) * INTERVAL '1' MINUTE
ORDER BY bucket;
Step 3: Top SQL Identification
Once you know the dominant wait event, find which SQL statements are responsible.
-- Top SQL by DB Time from AWR (for a specific snapshot range)
SELECT sql.sql_id,
ROUND(SUM(elapsed_time_delta) / 1000000, 2) AS elapsed_secs,
SUM(executions_delta) AS executions,
ROUND(SUM(elapsed_time_delta) / NULLIF(SUM(executions_delta), 0) / 1000, 2) AS avg_ms,
ROUND(SUM(buffer_gets_delta) / NULLIF(SUM(executions_delta), 0)) AS avg_lios,
ROUND(SUM(disk_reads_delta) / NULLIF(SUM(executions_delta), 0)) AS avg_pio
FROM dba_hist_sqlstat sql
JOIN dba_hist_snapshot s ON sql.snap_id = s.snap_id AND sql.dbid = s.dbid
WHERE s.begin_interval_time >= SYSDATE - 1
GROUP BY sql.sql_id
ORDER BY elapsed_secs DESC
FETCH FIRST 20 ROWS ONLY;
Retrieve the SQL Text
SELECT sql_id, sql_text
FROM dba_hist_sqltext
WHERE sql_id = '&sql_id';
Step 4: Execution Plan Analysis
For identified top SQL, retrieve the execution plan and runtime statistics:
-- Current plan with runtime stats (requires the query to be in shared pool)
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'ALLSTATS LAST +PEEKED_BINDS'));
-- Historical plan from AWR
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('&sql_id', NULL, NULL, 'ALL'));
Reading the Plan: Key Indicators of Problems
- High E-rows vs A-rows discrepancy: The optimizer’s row estimate is far from actual. Points to stale statistics or bind variable peeking issues.
- Full table scan on large table without Smart Scan: If not on Exadata, indicates a missing index or poor index usage.
- Nested loop on large row sources: Driving a large table into a nested loop with many probe iterations.
- Sort/Hash Join with large temp space: Workarea spill to disk (
TEMPtablespace writes).
ADDM: Automated Diagnostic Output
ADDM analyses AWR snapshots automatically and surfaces findings with impact estimates:
-- View latest ADDM task findings
SELECT f.type,
f.impact_type,
ROUND(f.impact, 0) AS impact_secs,
f.message
FROM dba_advisor_findings f
JOIN dba_advisor_tasks t ON f.task_id = t.task_id
WHERE t.advisor_name = 'ADDM'
AND t.created >= SYSDATE - 1
ORDER BY f.impact DESC
FETCH FIRST 10 ROWS ONLY;
ADDM findings are a useful starting point but not a substitute for manual analysis. ADDM tends to surface SQL-level issues well, but misses application-tier problems, connection management issues, and complex multi-component bottlenecks.
Practical Diagnostic Workflow Summary
- Establish baseline: What is the expected AAS for this workload at this time of day?
- Identify anomalous period: Use ASH timeline query to find when performance degraded.
- Decompose by wait class: What wait class dominates during the anomalous period?
- Find top SQL: Which SQL IDs account for the most samples in that wait class/event?
- Analyse the plan: Is the execution plan for those SQL IDs correct? Compare against known-good plans if available.
- Act: Fix statistics, add indexes, pin SQL baselines, fix application, or tune resource management — depending on what step 5 reveals.
The methodology is the same whether the problem happened an hour ago (ASH) or two weeks ago (AWR). The data is there — the key is following a consistent path from macro to micro.