8.7 — Performance Testing

Opening scenario

A user reports that scrolling the news feed “feels janky on my iPhone 12.” Your team tests on iPhone 15 Pro devices that hold 120 fps, so nobody noticed. Three weeks later your App Store rating drops half a star because the iPhone 12 is the median device, and 30% of your users are on it. A performance regression test would have caught this at commit time.

Performance tests in XCTest measure execution time, CPU, memory, disk writes, and custom signpost intervals, and fail the test (and therefore the CI build) when a metric drifts past its baseline.

Context taxonomy

Concept	Context	Why it matters	Common confusion
`measure {}`	Wraps code to be timed	Runs the block 10× by default; `measure(metrics:options:)` defaults to 5 measured iterations	Assuming it runs once; a single measurement gives meaningless statistics
`XCTMetric`	What to measure	Time, memory, CPU, storage	Defaulting to time only
`XCTClockMetric`	Wall-clock time	Total elapsed time	Differs from `XCTCPUMetric` (CPU work)
`XCTMemoryMetric`	Peak/persistent memory	Catches memory growth and bloat	Confused with Instruments leak detection
Baseline	The expected metric value	Committed to repo, enforced in CI	Re-baselining every regression — kills the signal
Instruments	Apple’s profiler	Deeper investigation tool	Confused with XCTest performance (they’re complementary)

Concept → Why → How → Code

Concept: XCTest performance tests run a code block multiple times, record metrics, and compare against a stored baseline. If the new run exceeds the baseline by your tolerance (default 10%), the test fails.

Why: performance regressions are silent until users complain. A regression gate in CI keeps your hot paths fast: you can’t accidentally drop frame rate, allocate 10× more memory, or double the running time of a critical function without a test failing.

How: write a test that exercises the hot path, wrap it in measure(metrics:options:), record a baseline on a known-good build, commit the baseline, configure CI to fail on regression.

Code — a complete performance test:

import XCTest
@testable import App

final class FeedPerformanceTests: XCTestCase {
    // Built once per test instance, outside the measured block.
    let largeFeed = (0..<10_000).map { Post.stub(id: $0) }

    func test_renderFeed_performance() {
        let options = XCTMeasureOptions()
        options.iterationCount = 5  // measured iterations per run

        // One run collects wall-clock time, memory, and CPU together.
        measure(
            metrics: [XCTClockMetric(), XCTMemoryMetric(), XCTCPUMetric()],
            options: options
        ) {
            let processed = FeedProcessor.prepare(largeFeed)
            XCTAssertEqual(processed.count, 10_000)
        }
    }
}

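After the first run, click the gray diamond in the gutter and choose “Set Baseline.” Commit the baseline file that Xcode writes under the project’s xcshareddata/xcbaselines directory. Subsequent runs compare against it.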
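
When each iteration needs fresh input, the setup cost can be kept out of the numbers by marking the timed region manually. The following is a minimal sketch, not the chapter project's code: `Post` and `FeedProcessor` here are hypothetical stand-ins defined inline so the snippet compiles on its own, and the test name is illustrative. The `invocationOptions`, `startMeasuring()`, and `stopMeasuring()` calls are XCTest API.

```swift
import XCTest

// Hypothetical stand-ins so the sketch compiles alone; in the chapter's
// project these roles are played by Post.stub and FeedProcessor.prepare.
struct Post {
    let id: Int
    let title: String
}

enum FeedProcessor {
    static func prepare(_ posts: [Post]) -> [Post] {
        posts.sorted { $0.title < $1.title }
    }
}

final class FeedSetupPerformanceTests: XCTestCase {
    func test_prepare_excludingSetup() {
        let options = XCTMeasureOptions()
        // The block decides where each iteration's measurement starts and stops.
        options.invocationOptions = [.manuallyStart, .manuallyStop]

        measure(metrics: [XCTClockMetric()], options: options) {
            // Not measured: build fresh input for this iteration.
            let feed = (0..<10_000).map { Post(id: $0, title: "Post \(10_000 - $0)") }

            startMeasuring()
            let processed = FeedProcessor.prepare(feed)
            stopMeasuring()

            // Not measured: sanity-check the result.
            XCTAssertEqual(processed.count, feed.count)
        }
    }
}
```

The trade-off is that every iteration must call both `startMeasuring()` and `stopMeasuring()`; forgetting one is reported as a test failure rather than silently measuring the wrong span.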

The metric catalog

Metric	Measures
`XCTClockMetric`	Wall-clock time (the default metric for `measure {}`)
`XCTCPUMetric`	CPU time, cycles, and instructions retired
`XCTMemoryMetric`	Physical memory and peak physical memory
`XCTStorageMetric`	Bytes written to disk
`XCTApplicationLaunchMetric`	Cold launch time (UI test target only)
`XCTOSSignpostMetric`	Custom signpost spans

Pass multiple metrics in the array — one run, multiple gauges.

Custom signposts — measure the right span

Use os_signpost to demarcate the work you actually care about:

import os

let log = OSLog(subsystem: "com.example.app", category: "feed")

func processBatch() {
    let id = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: "processBatch", signpostID: id)
    // ... work ...
    os_signpost(.end, log: log, name: "processBatch", signpostID: id)
}

// In test:
func test_processBatch_performance() {
    measure(metrics: [XCTOSSignpostMetric(subsystem: "com.example.app",
                                          category: "feed",
                                          name: "processBatch")]) {
        processBatch()
    }
}
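
One risk with hand-written begin/end pairs is an early return or a thrown error that skips the `.end` call and leaves the interval open. A small wrapper with `defer` avoids that. This is a sketch under assumptions: `withSignpost`, `feedLog`, and the summing body of `processBatch` are hypothetical names chosen for illustration; only `OSLog`, `OSSignpostID`, and `os_signpost` come from the `os` framework.

```swift
import os

// Same subsystem and category as the chapter's example.
let feedLog = OSLog(subsystem: "com.example.app", category: "feed")

/// Hypothetical helper: runs `work` inside a begin/end signpost pair.
/// `defer` guarantees the `.end` signpost is emitted even if `work` throws.
func withSignpost<T>(
    _ name: StaticString,
    log: OSLog,
    _ work: () throws -> T
) rethrows -> T {
    let id = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: name, signpostID: id)
    defer { os_signpost(.end, log: log, name: name, signpostID: id) }
    return try work()
}

/// Illustrative workload: the real batch processing would go in the closure.
func processBatch(_ values: [Int]) -> Int {
    withSignpost("processBatch", log: feedLog) {
        values.reduce(0, +)
    }
}
```

Because the signpost name stays "processBatch", an `XCTOSSignpostMetric` configured with that subsystem, category, and name should pick up the same interval.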

Launch time (UI test target)

final class LaunchTests: XCTestCase {
    func test_launch_performance() {
        measure(metrics: [XCTApplicationLaunchMetric()]) {
            XCUIApplication().launch()
        }
    }
}

Launch time is one of the metrics Apple highlights in Xcode Organizer, and it shapes every user’s first impression of the app.

Baselines and CI

Baselines are stored per-device per-OS. iPhone 12 simulator and iPhone 15 simulator have separate baselines.
Don’t set baselines on the slowest device in your dev team; set them on what CI runs.
Default tolerance: 10% above baseline = failure. Tune in the gutter UI.
In CI: run xcodebuild test ... with -resultBundlePath <path>, then parse the resulting .xcresult bundle for performance regressions.
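
The pass/fail rule behind a baseline can be sketched in a few lines. This illustrates the idea only and is not Xcode's internal implementation: `RunSummary`, `summarize`, and `isRegression` are hypothetical names, and the 10% default mirrors the tolerance described above.

```swift
import Foundation

/// Summary of one performance run (hypothetical type for illustration).
struct RunSummary {
    let average: Double
    let relativeStdDev: Double  // standard deviation divided by the average
}

/// Reduces raw per-iteration samples to an average and a noise estimate.
func summarize(_ samples: [Double]) -> RunSummary {
    precondition(!samples.isEmpty, "need at least one sample")
    let mean = samples.reduce(0, +) / Double(samples.count)
    let variance = samples
        .map { ($0 - mean) * ($0 - mean) }
        .reduce(0, +) / Double(samples.count)
    let stdDev = variance.squareRoot()
    return RunSummary(average: mean,
                      relativeStdDev: mean == 0 ? 0 : stdDev / mean)
}

/// True when the average is worse than the baseline by more than `tolerance`.
func isRegression(average: Double,
                  baseline: Double,
                  tolerance: Double = 0.10) -> Bool {
    average > baseline * (1 + tolerance)
}

// Usage: a 0.100 s baseline tolerates 0.105 s but not 0.120 s.
let run = summarize([0.118, 0.120, 0.122])
print(isRegression(average: run.average, baseline: 0.100))
print(isRegression(average: 0.105, baseline: 0.100))
```

A high relative standard deviation means the run itself was noisy, which is a reason to distrust the comparison rather than to widen the tolerance.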

Instruments vs XCTest performance

Use Instruments when…	Use XCTest performance when…
Debugging a specific slow path	Preventing future regressions
Investigating allocations + leaks	Asserting “this stays under X ms”
Profiling a real device session	Running in CI on every PR
Building a flame graph	Failing the build on drift

They’re complementary. Instruments tells you why; XCTest tells you whether.

In the wild

Xcode Organizer → Metrics: surfaces real-user performance (launch, hang, disk usage, energy) aggregated from users who opted in to sharing analytics. It is not XCTest performance testing, but it serves the same goal.
MetricKit: opt-in framework that delivers MXMetricPayload reports daily; for production telemetry, not CI tests.
Square’s Pollux: internal perf regression dashboard built on XCTest metrics + custom signposts.
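
For the MetricKit entry above, subscribing takes only a few lines. A minimal sketch, assuming a deployment target where MetricKit is available: the `MetricsReceiver` class, its `receivedCount` counter, and the `print` logging are hypothetical; `MXMetricManager`, `MXMetricManagerSubscriber`, and `jsonRepresentation()` are MetricKit API.

```swift
import Foundation
import MetricKit

/// Hypothetical receiver; the class name and logging are illustrative only.
final class MetricsReceiver: NSObject, MXMetricManagerSubscriber {
    private(set) var receivedCount = 0

    /// Call once, early in the app's life, to start receiving payloads.
    func start() {
        MXMetricManager.shared.add(self)
    }

    /// Call when payloads are no longer wanted.
    func stop() {
        MXMetricManager.shared.remove(self)
    }

    // MetricKit invokes this with aggregated payloads from earlier usage.
    func didReceive(_ payloads: [MXMetricPayload]) {
        receivedCount += payloads.count
        for payload in payloads {
            // jsonRepresentation() returns the payload as JSON-encoded Data,
            // ready to forward to whatever telemetry backend the app uses.
            let json = payload.jsonRepresentation()
            print("MetricKit payload: \(json.count) bytes")
        }
    }
}
```

In a real app the JSON would be uploaded rather than printed; the point is that this data arrives from production devices, so it complements the CI gate instead of replacing it.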

Common misconceptions

“Performance tests need real devices.” Simulator is fine for regression detection — you’re measuring deltas, not absolutes. Real devices are for absolute measurements before launch.
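“measure {} runs the code once.” It runs the block multiple times (10 by default for measure {}, 5 measured iterations by default for measure(metrics:options:)) and reports the average and relative standard deviation.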
“Re-baseline whenever the test fails.” That destroys the regression signal. Investigate first; only re-baseline when the change is intentional.
“XCTMemoryMetric catches leaks.” It catches peak memory deltas, not leaks specifically. Use Instruments → Leaks for that.
“Performance tests should run on every PR.” They should — but only on a consistent runner. Putting them on a varied pool gives noisy baselines.

Seasoned engineer’s take

The two performance tests that matter most for nearly every app: cold launch and the largest list-rendering path. Get those two locked down with baselines and a CI gate, and you’ve caught 80% of the regressions users will notice. Everything beyond that is nice-to-have. Don’t build a massive performance test suite up front — let production telemetry (MetricKit, App Store metrics) tell you what’s actually slow before you over-instrument.

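[!TIP] Run performance tests with the Release configuration, not Debug. Debug builds keep assertions and skip inlining and whole-module optimization, so measurements there say little about what users experience. Note that @testable import requires Enable Testability to be on for the configuration under test.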

[!WARNING] Performance tests on macOS runners with thermal throttling (CI fleet under heavy load) produce flaky results. Pin tests to a dedicated runner, or widen the tolerance (15–20%) to absorb noisy-neighbor variance.

Interview corner

Junior — “How do you measure performance in XCTest?” Use measure {} inside a test method. It runs the block multiple times and records metrics. Pass an array of XCTMetric to capture time, memory, CPU. Set a baseline through Xcode’s gutter UI; future runs compare against it.

Mid — “Your CI started flaking on a perf test. What’s the diagnosis?” Check whether CI is running on a shared/variable runner — thermal throttling and noisy neighbors inflate measurements unpredictably. Check the baseline was set on the same runner type. Check the test isn’t doing real I/O (network, disk) inside measure. If all three are clean, investigate whether a recent change actually regressed the code path being measured.

Senior — “Design a perf regression strategy for a list-heavy app.” Three layers. One: XCTest perf tests on the hot paths — cell configuration, image decoding, scroll-triggered prefetch — with baselines per CI device, gated on every PR. Two: custom OS signposts wrapping each subsystem, so Instruments traces in development map cleanly to the same boundaries the tests measure. Three: MetricKit + App Store metrics for real-user telemetry, with a weekly dashboard for cold launch time, hang rate, and scroll responsiveness. The XCTest layer catches the regressions you wrote; the telemetry layer catches the ones the device fleet exposes that you didn’t predict. I’d also consider running the perf tests under Instruments’ “Time Profiler” template in nightly to capture flame graphs alongside the pass/fail signal.

Red flag — “I just look at Xcode’s runtime info when I’m coding.” That’s not a regression strategy; that’s spot-checking.

Lab preview

Lab 8.3 includes a perf test section: write a measure test for an image-heavy collection view, set a baseline, intentionally regress the code, and watch the test fail.

The Swift iOS & macOS Engineer