Chapter 7 · Performance Optimization

Performance work is a loop, not a finish line. Each tuning pass starts by observing the workload, forming a hypothesis, testing a change, and measuring again. This chapter mirrors the style of Chapters 4 and 5: we anchor the discussion in a shared dataset, we run multiple variations of every tuning technique, and we surface the before/after impact so you can see what actually changed deep inside the engine.

7.0. Lab Dataset and Baseline Metrics

Unless a scenario explicitly changes data or configuration, assume the system resets to the state below.

Customers

CustomerID	Region	Segment	CreatedAt
1001	NA-West	Premium	2021-01-04
1002	NA-East	Standard	2021-02-18
1003	EMEA	Premium	2021-03-22
1004	APAC	Startup	2021-04-09
1005	NA-West	Standard	2021-05-15

Orders (≈ 5 million rows in the simulated workload)

OrderID	CustomerID	OrderedAt	Status	TotalAmount
9000001	1001	2024-06-01 10:14:03	SHIPPED	185.40
9000002	1002	2024-06-01 10:15:11	PENDING	42.50
9000003	1005	2024-06-01 10:15:36	SHIPPED	79.99
9000004	1003	2024-06-01 10:16:02	CANCELLED	349.00
…	…	…	…	…

OrderLineItems (≈ 20 million rows)

LineID	OrderID	ProductID	Qty	UnitPrice
700001	9000001	P-4432	2	92.70
700002	9000001	P-7788	1	64.00
…	…	…	…	…

SlowQueryLog (captured over 24 hours)

QueryHash	SampleText	ExecCount	AvgTimeMs	MaxTimeMs
H1	`SELECT * FROM Orders WHERE CustomerID = ?`	48,512	83.40	612.52
H2	`SELECT SUM(TotalAmount) FROM Orders WHERE Status=?`	6,143	65.18	190.04
H3	`SELECT … FROM Orders JOIN OrderLineItems …`	2,901	214.72	841.88

Baseline metrics (no tuning applied yet)

Metric	Value
P95 latency for `Orders` lookup	612 ms
Total logical reads per minute	1.8 million
Buffer cache hit ratio	92%
Write journal stall rate	0.8%

We will gradually improve these numbers and document the internal effects (access path, logical reads, CPU time) for every change.

7.1. Index Tuning in Practice

Indexes shape the access path. Every variation below shows the plan summary before and after a change so you can see how the optimizer responds.

Variation A – Missing index vs selective single-column index

Query:

SELECT OrderID, OrderedAt, TotalAmount
FROM Orders
WHERE CustomerID = 1003;

State	Access Path	Rows Scanned	AvgTimeMs	Notes
Before	Seq Scan on `Orders`	5,000,000	83.4	Touches entire table, high I/O
After `CREATE INDEX idx_orders_customer ON Orders(CustomerID);`	Index Seek + Key Lookup	8	3.1	Optimizer jumps directly to matching customer rows

Inside the buffer cache after indexing

Metric	Before	After
Logical reads per execution	18,450	32
Physical reads per execution	220	1
CPU time per execution (ms)	42	2

The huge drop comes from eliminating the full table scan. Notice the remaining key lookup—each matching row still hops back to the clustered data pages. We remove that next.

Variation B – Turning the index into a covering index

Extended query:

SELECT OrderID, OrderedAt, TotalAmount
FROM Orders
WHERE CustomerID = 1003;

CREATE INDEX idx_orders_customer_cover
  ON Orders(CustomerID)
  INCLUDE (OrderedAt, TotalAmount);
DROP INDEX idx_orders_customer;

State	Access Path	Key Lookups	AvgTimeMs	Notes
With non-covering index	Index Seek + Lookup	3	3.1	Each lookup fetches data page
With covering index	Index-only Seek	0	1.7	Plan never touches base table

Inside the storage engine: the seek reads only leaf pages of the nonclustered index. Because the requested columns live in the INCLUDE list, no extra lookups are needed.

Variation C – Composite index order matters (left-prefix rule)

Typical workload: filter by Status, then order by OrderedAt DESC.

-- Option 1 (incorrect order)
CREATE INDEX idx_orders_status_amount ON Orders(Status, TotalAmount);

-- Option 2 (matches workload)
CREATE INDEX idx_orders_status_orderedat ON Orders(Status, OrderedAt DESC);

Index	Filter + ORDER BY support	Plan Shape	AvgTimeMs	Observation
`(Status, TotalAmount)`	Poor (OrderedAt missing)	Index Scan + Sort	74.2	Sort spills to tempdb
`(Status, OrderedAt)`	Excellent	Index Seek, no Sort	8.6	Plan streams rows already sorted

Because the leading column in an index is the entry point, reordering columns to match filter → sort order removes the expensive sort step.

Variation D – Partial/filtered index

Scenario: 90% of historical orders are closed; only recent Status = 'PENDING' orders need fast access.

CREATE INDEX idx_orders_pending_recent
  ON Orders(OrderedAt DESC)
  WHERE Status = 'PENDING' AND OrderedAt >= CURRENT_DATE - INTERVAL '30 days';

Query	Access Path	Pages Touched	AvgTimeMs
Fetch recent pending orders	Bitmap Heap Scan (before)	120,000	96.0
	Partial Index Seek (after)	1,240	4.2

Filtered indexes dramatically reduce index size and maintenance overhead while still accelerating the hot subset of data.

Variation E – When an index hurts

Adding an index is not always a win. On a heavy-write table, every insert must maintain the index.

Operation	No Extra Index	With Extra Index	Delta
Batch insert 10k orders	1.2 s	2.8 s	+133% latency
Transaction log growth	140 MB	280 MB	+100%

Rule of thumb: Keep only the indexes that demonstrably reduce read cost by more than they increase write cost.

7.2. Query Rewrites and Execution Plans

The optimizer chooses a plan based on the SQL you hand it. Small syntactic tweaks can unlock better plans. We’ll inspect EXPLAIN summaries (simplified) and key metrics.

**Variation A – `SELECT *` vs targeted projection**

-- Original
SELECT * FROM Orders WHERE CustomerID = 1003;

-- Rewrite
SELECT OrderID, OrderedAt, TotalAmount
FROM Orders
WHERE CustomerID = 1003;

Metric	`SELECT *`	Targeted columns	Explanation
Projected columns	12	3	Narrow projection lowers memory pressure
Key lookups	3	0 (with covering index)	Fewer columns mean covering index is feasible
Avg network payload	9.5 KB	2.1 KB	Less data crossing the wire

The rewrite plus covering index shrinks result size and eliminates extra lookups, aligning perfectly with Variation B above.

Variation B – Making predicates SARGable

-- Original (non-SARGable)
SELECT OrderID
FROM Orders
WHERE DATE(OrderedAt) = CURRENT_DATE;

-- Rewrite
SELECT OrderID
FROM Orders
WHERE OrderedAt >= CURRENT_DATE
  AND OrderedAt < CURRENT_DATE + INTERVAL '1 day';

Plan Metric	Original	Rewritten
Access path	Seq Scan	Index Seek on `OrderedAt`
Rows scanned	5,000,000	18,240
AvgTimeMs	148.6	5.4

Moving the function off the column allows the optimizer to use the time-based index. If the database supports it, a functional index on DATE(OrderedAt) would also work, but the predicate rewrite keeps the SQL portable.

Variation C – Replacing correlated subquery with `JOIN`

-- Original
SELECT o.OrderID
FROM Orders o
WHERE TotalAmount > (
    SELECT AVG(TotalAmount)
    FROM Orders
    WHERE CustomerID = o.CustomerID
);

-- Rewrite
WITH CustomerAverages AS (
    SELECT CustomerID, AVG(TotalAmount) AS AvgAmount
    FROM Orders
    GROUP BY CustomerID
)
SELECT o.OrderID
FROM Orders AS o
JOIN CustomerAverages AS ca
  ON ca.CustomerID = o.CustomerID
WHERE o.TotalAmount > ca.AvgAmount;

Metric	Correlated Subquery	CTE + Join
Executions of inner query	5,000,000	1
Logical reads	92 million	1.5 million
AvgTimeMs	842	63

Variation D – Splitting troublesome `OR` predicates

-- Original
SELECT OrderID
FROM Orders
WHERE Status = 'PENDING' OR Region = 'EMEA';

-- Rewrite
SELECT OrderID
FROM Orders
WHERE Status = 'PENDING'
UNION ALL
SELECT OrderID
FROM Orders
WHERE Region = 'EMEA' AND Status <> 'PENDING';

Plan Metric	Original	Rewritten
Access path	Seq Scan	Two index seeks
AvgTimeMs	116	18
Duplicate handling	Implicit	Explicit (second query filters duplicates)

Splitting the predicate lets the optimizer exploit the Status and Region indexes independently.

Variation E – Window function versus self-join

-- Ancient pattern using self-join
SELECT a.OrderID
FROM Orders a
LEFT JOIN Orders b
  ON a.CustomerID = b.CustomerID
 AND a.OrderedAt < b.OrderedAt
WHERE b.OrderID IS NULL;

-- Modern window function
SELECT OrderID
FROM (
    SELECT OrderID,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderedAt DESC) AS rn
    FROM Orders
) AS ranked
WHERE rn = 1;

Metric	Self-join	Window function
Intermediate rows	25 million	5 million
Temp space used	4.2 GB	480 MB
AvgTimeMs	410	62

7.3. Statistics, Plans, and Maintenance Windows

Indexes are only as good as the statistics that feed the optimizer. When those stats drift, good indexes look invisible.

Variation A – Updating statistics

Action	P95 latency (ms)	Estimated rows accuracy
Before `ANALYZE Orders;`	118	Estimates off by 8×
After `ANALYZE Orders;`	42	Estimates within 1.2×

Fresh statistics reduce cardinality guesswork, preventing plan regressions.

Variation B – Rebuild vs Reorganize

Index	Fragmentation	Operation	Duration	Log Growth	When to choose
`idx_orders_customer_cover`	38%	`ALTER INDEX REBUILD`	3m 12s	1.4 GB	Use when fragmentation > 30%
`idx_orders_status_orderedat`	12%	`ALTER INDEX REORGANIZE`	42s	180 MB	Use when 5–30%

Rebuild rewrites the index from scratch (heavy but thorough). Reorganize defragments in-place (lighter but slower to converge).

Variation C – Plan cache inspection

QueryHash	Cached Plans	Avg CPU ms	Recompile Reason
H1	1	18	Stable
H2	4	85	Parameter sniffing
H3	2	202	Statistics changed

If a query has wildly different parameter values, consider parameter-sensitive plan techniques: OPTIMIZE FOR hints, plan guides, or rewriting to use recompile.

7.4. Workload Governance and Observability

Connection pooling vs ad-hoc bursts

Scenario	Active sessions	CPU saturation	Wait type
Without pooling	120	88%	`THREADPOOL`
With 32-pool limit	34	55%	`PAGEIOLATCH`

Fewer, longer-lived connections reduce context switching and improve plan reuse.

Baseline monitoring checklist

Time Range	Key Metric	Threshold	Action
Every 5 min	Top wait events	`LCK_` > 5%	Inspect blocking tree
Hourly	Buffer cache hit ratio	< 90%	Investigate index-only scans or cache size
Nightly	Autovacuum / autostats lag	> 2 hrs	Trigger manual maintenance
Weekly	SlowQueryLog deltas	> 10% growth	Re-run tuning playbook

7.5. Putting It Together

Tuning Move	Primary Benefit	Secondary Cost	When to Roll Back
Add covering index	Slashes logical reads	Higher write amplification	Write-heavy bursts slow noticeably
Rewrite correlated subquery	Reduces CPU and temp I/O	More complex SQL	Query plan still scans millions of rows
Update statistics	Fixes cardinality estimates	Temporary CPU spike	Production window too tight
Limit pool size	Stabilizes resource usage	Slightly higher queue wait	Queue wait > SLA threshold

Remember: Performance tuning is applied science. Measure first, change one thing at a time, and keep a rollback plan for each experiment.

7.0. Lab Dataset and Baseline Metrics​

7.1. Index Tuning in Practice​

Variation A – Missing index vs selective single-column index​

Variation B – Turning the index into a covering index​

Variation C – Composite index order matters (left-prefix rule)​

Variation D – Partial/filtered index​

Variation E – When an index hurts​

7.2. Query Rewrites and Execution Plans​

Variation A – SELECT * vs targeted projection​

Variation B – Making predicates SARGable​

Variation C – Replacing correlated subquery with JOIN​

Variation D – Splitting troublesome OR predicates​

Variation E – Window function versus self-join​

7.3. Statistics, Plans, and Maintenance Windows​

Variation A – Updating statistics​

Variation B – Rebuild vs Reorganize​

Variation C – Plan cache inspection​

7.4. Workload Governance and Observability​

Connection pooling vs ad-hoc bursts​

Baseline monitoring checklist​

7.5. Putting It Together​