7.7 — ARKit basics

Opening scenario

A retail client wants a “place this couch in your living room” feature. You open Xcode, see ARKit, RealityKit, SceneKit, Reality Composer Pro, USDZ, and QuickLook, and wonder which one you’re supposed to use. The answer: in 2026, RealityKit on top of ARKit for new code, with QuickLook + USDZ for any “view this product in AR” feature that doesn’t need custom interaction. SceneKit still ships but is in maintenance; SpriteKit is unrelated (2D). This chapter is the just-enough-to-be-dangerous tour.

Context | What it usually means
Reads “ARView / RealityView” | Has placed objects in AR
Reads “plane detection” | Knows the world-tracking basics
Reads “anchors / entities” | Has structured a scene
Reads “occlusion / people occlusion” | Has worked on realism
Reads “image / object tracking” | Has built marker-based AR

Concept → Why → How → Code

Concept

  • ARKit: the sensor + tracking layer. Owns the camera feed, world tracking, plane/image/object/face/body detection, LiDAR depth, ARCoachingOverlayView.
  • RealityKit: the rendering + simulation layer. PBR materials, physics, audio, ECS-style Entity model.
  • RealityView (SwiftUI, iOS 18+): modern SwiftUI host. Older code uses ARView (UIKit).
  • Reality Composer Pro: visual scene editor; outputs .reality and USDZ bundles.
  • QuickLook + USDZ: hand a USDZ file to QLPreviewController (wrapped for SwiftUI as ARQuickLookView later in this chapter) and Apple handles AR placement, scale, rotate, share. No custom AR code.

Why

ARKit is the only way to use the device’s full sensor fusion (camera, IMU with accelerometer and gyroscope, LiDAR) for AR. RealityKit gives you a modern, Swift-native scene graph with PBR rendering that visually matches Apple’s reference apps without writing your own Metal renderer.

How — capabilities & Info.plist

NSCameraUsageDescription = "AR features require the camera to overlay 3D content on your view."

ARKit availability check:

import ARKit

guard ARWorldTrackingConfiguration.isSupported else {
    // Older device — fall back to 2D or QuickLook USDZ
    return
}
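The single isSupported guard can grow into a small capability-to-tier decision. The sketch below is illustrative only: DeviceCapabilities, ARTier, and chooseTier are hypothetical names, and the flags are assumed to be filled on device from the ARKit static checks named in the comments.

```swift
/// Hypothetical snapshot of what the device reports. On device these flags
/// would be filled from ARKit's static checks, for example:
///   worldTracking:       ARWorldTrackingConfiguration.isSupported
///   peopleOcclusion:     ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth)
///   sceneReconstruction: ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh)
struct DeviceCapabilities {
    var worldTracking: Bool
    var peopleOcclusion: Bool
    var sceneReconstruction: Bool
}

/// Hypothetical feature tiers, from least to most capable.
enum ARTier: Equatable {
    case flatFallback      // no world tracking: 2D or a non-AR 3D viewer
    case basicPlanes       // world tracking with plane detection only
    case peopleOcclusion   // adds person segmentation with depth
    case lidarMesh         // adds scene reconstruction (LiDAR devices)
}

/// Picks the richest tier the device supports.
func chooseTier(for caps: DeviceCapabilities) -> ARTier {
    guard caps.worldTracking else { return .flatFallback }
    if caps.sceneReconstruction { return .lidarMesh }
    if caps.peopleOcclusion { return .peopleOcclusion }
    return .basicPlanes
}
```

Keeping the decision free of ARKit types makes it easy to unit-test on a Mac or in CI; only the code that fills DeviceCapabilities needs a device.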

Simplest AR — QuickLook with a USDZ

import QuickLook
import SwiftUI

struct ARQuickLookView: UIViewControllerRepresentable {
    let url: URL // path to a .usdz file

    func makeUIViewController(context: Context) -> QLPreviewController {
        let controller = QLPreviewController()
        controller.dataSource = context.coordinator
        return controller
    }
    func updateUIViewController(_ controller: QLPreviewController, context: Context) {}
    func makeCoordinator() -> Coordinator { Coordinator(url: url) }

    final class Coordinator: NSObject, QLPreviewControllerDataSource {
        let url: URL
        init(url: URL) { self.url = url }
        func numberOfPreviewItems(in c: QLPreviewController) -> Int { 1 }
        func previewController(_ c: QLPreviewController, previewItemAt index: Int) -> QLPreviewItem {
            url as QLPreviewItem
        }
    }
}

Once the user taps the AR button, they get plane detection, scale, rotate, occlusion, and sharing for free. 90% of e-commerce “view in your room” features should stop here.

Custom AR with RealityView (iOS 18+)

import RealityKit
import ARKit
import SwiftUI

struct PlaceCubeView: View {
    var body: some View {
        RealityView { content in
            // One-time setup. RealityView runs the AR session itself on iOS,
            // so an ARWorldTrackingConfiguration created here would never be used.
            content.camera = .spatialTracking

            // Anchors to the first horizontal plane classified as floor (at least 20 cm x 20 cm)
            let anchor = AnchorEntity(plane: .horizontal, classification: .floor, minimumBounds: [0.2, 0.2])
            let mesh = MeshResource.generateBox(size: 0.1)
            let material = SimpleMaterial(color: .systemTeal, isMetallic: false)
            let model = ModelEntity(mesh: mesh, materials: [material])
            anchor.addChild(model)
            content.add(anchor)
        }
        .ignoresSafeArea()
    }
}

Older path — ARView (UIKit, still common in 2026 codebases)

import ARKit
import RealityKit

final class ARSceneController: UIViewController {
    private let arView = ARView(frame: .zero)

    override func viewDidLoad() {
        super.viewDidLoad()
        view.addSubview(arView)
        arView.frame = view.bounds
        arView.autoresizingMask = [.flexibleWidth, .flexibleHeight]

        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
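        // People occlusion needs A12+; setting an unsupported semantic throws an exception
        if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
            config.frameSemantics.insert(.personSegmentationWithDepth)
        }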
        arView.session.run(config)

        let coaching = ARCoachingOverlayView()
        coaching.session = arView.session
        coaching.goal = .horizontalPlane
        coaching.frame = view.bounds
        view.addSubview(coaching)

        let tap = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))
        arView.addGestureRecognizer(tap)
    }

    @objc func handleTap(_ tap: UITapGestureRecognizer) {
        let point = tap.location(in: arView)
        guard let result = arView.raycast(from: point, allowing: .estimatedPlane, alignment: .horizontal).first else { return }
        let anchor = AnchorEntity(world: result.worldTransform)
        let box = ModelEntity(mesh: .generateBox(size: 0.1),
                              materials: [SimpleMaterial(color: .systemPink, isMetallic: false)])
        box.generateCollisionShapes(recursive: true)
        anchor.addChild(box)
        arView.scene.addAnchor(anchor)
        arView.installGestures([.translation, .rotation, .scale], for: box)
    }
}

Image tracking — marker-based AR

let config = ARImageTrackingConfiguration()
guard let images = ARReferenceImage.referenceImages(inGroupNamed: "ARImages", bundle: nil) else { return }
config.trackingImages = images
config.maximumNumberOfTrackedImages = 4
arView.session.run(config)

Reference images live in Assets.xcassets → AR Resource Group, tagged with their physical size (the tracker needs the real-world width).
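Once the configuration is running, detected images arrive as ARImageAnchor instances through the session delegate. The following is a minimal sketch, not a full implementation: ImageTrackingDelegate is a hypothetical name, and it assumes an instance is assigned to arView.session.delegate and kept alive by its owner.

```swift
import ARKit

/// Sketch of a session delegate that reacts to detected reference images.
/// `ImageTrackingDelegate` is a hypothetical name for illustration.
final class ImageTrackingDelegate: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let imageAnchor as ARImageAnchor in anchors {
            let image = imageAnchor.referenceImage
            // physicalSize is the real-world size in meters set in the asset catalog.
            print("Detected \(image.name ?? "unnamed"), width \(image.physicalSize.width) m")
        }
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let imageAnchor as ARImageAnchor in anchors where !imageAnchor.isTracked {
            // The image is no longer being tracked; hide or freeze attached content here.
            print("Lost \(imageAnchor.referenceImage.name ?? "unnamed")")
        }
    }
}
```

If the content is pure RealityKit, an AnchorEntity with an image target, for example AnchorEntity(.image(group: "ARImages", name: "poster")) where "poster" is a made-up asset name, may remove the need for a delegate.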

Object tracking & body / face tracking

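  • ARObjectScanningConfiguration + ARReferenceObject: scan a 3D object once to produce an ARReferenceObject, then detect it in later sessions by adding it to ARWorldTrackingConfiguration.detectionObjects.
  • ARFaceTrackingConfiguration: front camera; requires a TrueDepth camera or, from iOS 14, any device with an A12 Bionic or later. Powers Memoji, FaceTime filters.
  • ARBodyTrackingConfiguration: full skeleton tracking, A12+.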

LiDAR niceties (Pro iPhones, iPad Pro)

if ARWorldTrackingConfiguration.supportsSceneReconstruction(.meshWithClassification) {
    config.sceneReconstruction = .meshWithClassification
    config.frameSemantics.insert(.sceneDepth)
}

Gives you a real-time mesh of the room with surface classification (wall/floor/ceiling/window). Use for occlusion of virtual objects behind real geometry and for physics interactions with real walls.
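With ARView, the occlusion and physics uses described above are switched on through the view's scene-understanding options. This is a minimal sketch: enableLiDARRealism is a hypothetical helper name, and arView is assumed to be an ARView like the one in the UIKit example.

```swift
import ARKit
import RealityKit

/// Sketch: turn on mesh-based occlusion and physics for an existing ARView.
/// Does nothing on devices without scene reconstruction support.
func enableLiDARRealism(on arView: ARView) {
    guard ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) else { return }

    // The session is configured by hand below, so stop ARView from doing it.
    arView.automaticallyConfigureSession = false

    let config = ARWorldTrackingConfiguration()
    config.planeDetection = [.horizontal, .vertical]
    config.sceneReconstruction = .mesh

    // Virtual content is hidden behind, and collides with, the reconstructed mesh.
    arView.environment.sceneUnderstanding.options.insert(.occlusion)
    arView.environment.sceneUnderstanding.options.insert(.physics)

    arView.session.run(config)
}
```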

visionOS — same APIs, different scale

On visionOS, RealityKit is the primary 3D framework (alongside SwiftUI), not a special-case AR layer. Most of the scene-graph code transfers; the input model (gaze + pinch) and the spatial UX patterns are different. Covered in Chapter 12.

In the wild

  • IKEA Place, Wayfair View in Room — QuickLook USDZ + custom Reality Composer scenes.
  • Apple Maps Look Around — not strictly AR but uses related camera+geo fusion.
  • Snapchat / Instagram AR filters — they ship their own engine on top of ARFaceTrackingConfiguration.
  • Measure (Apple) — pure ARKit + raycast + simple geometry.
  • Pokémon Go — runs both ARCore on Android and ARKit on iOS, falls back to gyro-only.

Common misconceptions

  1. “AR drains the battery so much I shouldn’t ship it.” Modern devices handle 15-30 minutes of AR fine. Be mindful of idle AR — don’t keep ARSession running on a screen the user isn’t interacting with.
  2. “SceneKit is the modern choice.” SceneKit is in maintenance. RealityKit is where Apple invests; new features (Object Capture, hover effects, visionOS) land in RealityKit.
  3. “I need a Mac with USDZ tools to make a model.” Reality Composer Pro (free with Xcode) and even SwiftUI’s Model3D view handle USDZ. For complex models, Blender → USDZ via the official exporter works.
  4. “Plane detection works in any lighting.” It needs visual features. Plain white walls, glossy floors, and low-light all degrade tracking. Use ARCoachingOverlayView to guide the user.
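  5. “ARKit can save the user’s AR session to share with another user.” It can, with caveats. An ARWorldMap can be archived and sent to any other ARKit device, the same user’s or someone else’s, which then relocalizes in the same physical space; it is an ARKit format, so it does not carry over to Android. For live multi-user sessions, enable isCollaborationEnabled and exchange ARSession.CollaborationData over a transport such as MultipeerConnectivity.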
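The ARWorldMap route from the last item can be sketched as two helpers: one that archives the current map and one that builds a configuration from received data. The function names and the WorldMapError type are hypothetical, and the transport that moves the Data between devices is left out.

```swift
import ARKit

/// Hypothetical error for the case where ARKit returns neither a map nor an error.
enum WorldMapError: Error {
    case unavailable
}

/// Sketch: capture the current world map and archive it to Data.
func archiveWorldMap(from session: ARSession,
                     completion: @escaping (Result<Data, Error>) -> Void) {
    session.getCurrentWorldMap { worldMap, error in
        guard let worldMap else {
            completion(.failure(error ?? WorldMapError.unavailable))
            return
        }
        do {
            let data = try NSKeyedArchiver.archivedData(withRootObject: worldMap,
                                                        requiringSecureCoding: true)
            completion(.success(data))
        } catch {
            completion(.failure(error))
        }
    }
}

/// Sketch: build a configuration that relocalizes against a received map.
func configuration(restoring data: Data) throws -> ARWorldTrackingConfiguration {
    let config = ARWorldTrackingConfiguration()
    config.planeDetection = [.horizontal]
    config.initialWorldMap = try NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self,
                                                                    from: data)
    return config
}
```

The receiving device would pass the returned configuration to session.run(_:) and wait for tracking to return to normal before showing shared content.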

Seasoned engineer’s take

The hardest part of shipping AR is the UX before the AR session starts. Users don’t know they need to wave the phone around in good light; they tap the AR button, see a black screen for 3 seconds, and bounce. Standard mitigations:

  1. ARCoachingOverlayView — Apple’s official “move your phone” UI. Always present it.
  2. Pre-AR onboarding — a single still-frame explainer: “Point at the floor and move your phone slowly.”
  3. Fallback to 3D-only mode when tracking fails (Reality Composer scene viewed without the camera background).
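The fallback in item 3 needs a rule for when to give up on tracking. One possible rule, sketched below under stated assumptions, is a patience timer: TrackingQuality is a simplified stand-in for ARCamera.TrackingState, and TrackingFallbackPolicy is a hypothetical type, not an ARKit API.

```swift
import Foundation

/// Simplified mirror of ARCamera.TrackingState (an assumption for this sketch;
/// real code would map from the ARKit enum inside
/// session(_:cameraDidChangeTrackingState:)).
enum TrackingQuality {
    case normal
    case limited
    case notAvailable
}

/// Hypothetical policy: switch to the 3D-only viewer once tracking has been
/// degraded for at least `patience` seconds.
struct TrackingFallbackPolicy {
    let patience: TimeInterval
    private var degradedSince: TimeInterval?

    init(patience: TimeInterval) {
        self.patience = patience
    }

    /// Call on every tracking-state change and also periodically (for example
    /// once per frame) so the elapsed time is re-evaluated.
    /// Returns true when the UI should fall back to the non-AR viewer.
    mutating func update(_ quality: TrackingQuality, at time: TimeInterval) -> Bool {
        switch quality {
        case .normal:
            degradedSince = nil
            return false
        case .limited, .notAvailable:
            let start = degradedSince ?? time
            degradedSince = start
            return time - start >= patience
        }
    }
}
```

The patience value is a product decision; a few seconds gives the coaching overlay a chance to work before the camera view is dropped.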

Engineering: keep the ARKit code in a dedicated controller; don’t sprinkle anchors across view models. ARSession is a long-lived stateful object; restarting it costs ~500ms of re-tracking, which feels like a freeze.

For most apps that “want AR,” the answer is QuickLook + USDZ. Custom RealityKit work is justified when you need interactive behavior (drag-to-resize beyond QuickLook’s, multi-object scenes, physics) or specialized tracking (images, objects, body).

TIP: Cache USDZ files in Application Support, not in the bundle, when products are downloaded dynamically. Bundle USDZs inflate IPA size dramatically and trigger over-the-air install warnings at 200MB.

WARNING: Never store raw ARFrame capturedImage data to disk without a strong privacy reason and user consent. The camera frame contains the user’s living room.

Interview corner

Junior: “What’s the simplest way to let a user place a 3D product in their room?”

Ship a USDZ file and present it with QLPreviewController. The system handles AR-place, scale, rotate, occlusion, and the AR button. No custom AR code needed.

Mid: “How would you let the user tap to place a model and then drag, rotate, scale it?”

Run an ARWorldTrackingConfiguration with horizontal-plane detection. On tap, arView.raycast(from:point, allowing: .estimatedPlane, alignment: .horizontal) returns a world transform; create an AnchorEntity(world:) and a ModelEntity. Call generateCollisionShapes on the entity, then arView.installGestures([.translation, .rotation, .scale], for: entity). Use ARCoachingOverlayView to guide the user through plane detection.

Senior: “Design an AR retail app that ships hundreds of products, with reliable performance on an older device such as the iPhone XS (2018, A12).”

Two-tier model. (1) Lightweight catalog: thumbnails + USDZ files hosted on a CDN, downloaded on demand into Application Support, capped to ~50MB cache with LRU eviction. (2) AR mode: QuickLook for default “place in room”; custom RealityView only for SKUs that need configurator behavior (color swap, modular assembly). Tracking config: horizontal-plane only on older devices; horizontal+vertical+person-occlusion on A12+. Reserve LiDAR-only features for Pro devices. Optimize USDZ assets: <5MB per model, materials baked, normal maps preferred over high-poly geometry. Show ARCoachingOverlayView always; on tracking failure, fall back to a 360° spin viewer (no camera background) so the user still sees the product. Telemetry: log “AR placed”, “AR dismissed without place”, “tracking lost” to identify catalog SKUs with broken assets.
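The capped cache with LRU eviction from the answer above can be sketched as plain bookkeeping. ModelCacheIndex is a hypothetical type: it only tracks sizes and recency, and it assumes the caller deletes the evicted files from Application Support.

```swift
import Foundation

/// Hypothetical index for downloaded USDZ files. Tracks sizes and recency only;
/// removing the evicted files from disk is left to the caller.
struct ModelCacheIndex {
    let capacityBytes: Int
    private var sizes: [String: Int] = [:]   // SKU -> file size in bytes
    private var recency: [String] = []       // least recently used first

    init(capacityBytes: Int) {
        self.capacityBytes = capacityBytes
    }

    var totalBytes: Int { sizes.values.reduce(0, +) }

    /// Marks a SKU as just used (call when its model is opened).
    mutating func touch(_ sku: String) {
        guard sizes[sku] != nil else { return }
        recency.removeAll { $0 == sku }
        recency.append(sku)
    }

    /// Records a finished download and returns the SKUs to evict,
    /// least recently used first, until the cache fits the cap again.
    mutating func insert(_ sku: String, bytes: Int) -> [String] {
        sizes[sku] = bytes
        recency.removeAll { $0 == sku }
        recency.append(sku)

        var evicted: [String] = []
        // Never evict the entry that was just inserted.
        while totalBytes > capacityBytes, let oldest = recency.first, oldest != sku {
            recency.removeFirst()
            sizes[oldest] = nil
            evicted.append(oldest)
        }
        return evicted
    }
}
```

For the ~50MB cap in the answer, the index would be created as ModelCacheIndex(capacityBytes: 50 * 1024 * 1024) and rebuilt from the cache directory at launch.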

Red flag: “We use SceneKit because it’s been around longer and is more stable.”

In 2026, Apple is investing in RealityKit, not SceneKit. New AR features (people occlusion, scene reconstruction, hover effects, visionOS) land in RealityKit first. SceneKit still works but is a dead-end choice for greenfield code.

Lab preview

ARKit doesn’t have a dedicated lab in Phase 7 (the 4 labs prioritize the more common patterns), but the Lab 7.1 — Weather + Map app stretch goal includes “place a 3D weather icon in AR at the location’s coordinate” — a 50-line addition that exercises this chapter end-to-end.


Next: 7.8 — CoreML & Create ML