7.8 — CoreML & Create ML

Opening scenario

The product team wants three things: “classify this photo as cat/dog/other,” “tell me the sentiment of this review,” and “detect when the user finishes a yoga pose on Vision Pro.” All three are on-device ML problems. You don’t need a PhD, a GPU cluster, or a cloud bill. You need Create ML (Apple’s no-code/low-code trainer, ships with Xcode) and CoreML (the on-device runtime). The full pipeline — gather data → train → drop the .mlmodel into Xcode → call from Swift — takes a couple of afternoons.

Context	What it usually means
Reads “MLModel / VNCoreMLRequest”	Has done basic image inference
Reads “Create ML”	Has trained a custom model
Reads “model quantization / Neural Engine”	Has optimized for size and speed
Reads “MLModelConfiguration”	Has tuned compute units
Reads “MLModelCollection / on-device personalization”	Has shipped updatable models

Concept → Why → How → Code

Concept

CoreML — the runtime. Loads .mlmodelc (compiled .mlmodel), runs inference on CPU/GPU/Neural Engine. Auto-generated Swift class per model.
Vision — the vision framework. Wraps image-related CoreML use into pre-built request types (object detection, classification, face detection, text recognition).
Natural Language — sentiment, language ID, tokenization, named entity recognition. Many built-in models; you can swap your own.
Create ML — Apple’s training app (standalone Mac app + Xcode integration). UI for image classifier, object detector, text classifier, tabular regressor/classifier, sound classifier, action classifier, hand pose classifier.
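CoreML Tools (coremltools Python package) — converts TensorFlow, PyTorch, and scikit-learn models (among others) to CoreML format. Direct ONNX conversion was removed in coremltools 6, so convert from the original training framework instead.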

Why

On-device — no network, no inference cost, no privacy leak.
Neural Engine — Apple’s dedicated ML accelerator. Typical image classifiers run in roughly 5-20 ms on recent chips.
No backend — your inference is free at scale.
Offline — works on planes and in tunnels.

How — image classification with a Create ML model

Open Create ML.app (Xcode → Open Developer Tool → Create ML).
New Document → Image Classification.
Drop training folder structured as cat/, dog/, other/ with ~50+ images each.
Drop validation folder with the same structure (~20% of training count).
Click Train. Wait minutes.
Export the .mlmodel.
Drag into Xcode. Xcode generates a Swift class (e.g. PetClassifier).
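
The same training steps can also be scripted with the CreateML framework on macOS, which is handy once you want repeatable runs. The following is a minimal sketch, not the app's exact behavior: the three paths are hypothetical placeholders, and default training parameters are assumed.

```swift
// macOS only: run as a Swift script, playground, or command-line tool.
import CreateML
import Foundation

// Hypothetical paths. Each data folder contains cat/, dog/, other/ subfolders.
let trainingDir = URL(fileURLWithPath: "/Users/me/PetData/Training")
let testingDir = URL(fileURLWithPath: "/Users/me/PetData/Testing")
let outputURL = URL(fileURLWithPath: "/Users/me/PetData/PetClassifier.mlmodel")

do {
    // Subfolder names become the class labels.
    let classifier = try MLImageClassifier(
        trainingData: .labeledDirectories(at: trainingDir)
    )

    // classificationError is a fraction between 0 and 1.
    print("Training error: \(classifier.trainingMetrics.classificationError)")
    print("Validation error: \(classifier.validationMetrics.classificationError)")

    // Evaluate on images the model has never seen.
    let evaluation = classifier.evaluation(on: .labeledDirectories(at: testingDir))
    print("Test error: \(evaluation.classificationError)")

    // Writes the same kind of .mlmodel file the app exports.
    try classifier.write(to: outputURL)
} catch {
    print("Training failed: \(error)")
}
```

Either route produces an .mlmodel you can drag into Xcode. Calling the generated class from the app looks like this: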

import CoreML
import Vision
import UIKit

actor PetClassifierService {
    private let model: VNCoreMLModel

    init() throws {
        let config = MLModelConfiguration()
        config.computeUnits = .all // CPU + GPU + Neural Engine; iOS picks best
        let core = try PetClassifier(configuration: config).model
        self.model = try VNCoreMLModel(for: core)
    }

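    func classify(_ image: UIImage) async throws -> [VNClassificationObservation] {
        guard let cgImage = image.cgImage else { return [] }
        let request = VNCoreMLRequest(model: model)
        request.imageCropAndScaleOption = .centerCrop
        // Pass the UIImage's real orientation; camera photos are rarely .up
        let orientation = CGImagePropertyOrientation(image.imageOrientation)
        let handler = VNImageRequestHandler(cgImage: cgImage, orientation: orientation)
        try handler.perform([request])
        return (request.results as? [VNClassificationObservation]) ?? []
    }
}

// UIKit and Core Graphics number orientations differently, so map them case by case
extension CGImagePropertyOrientation {
    init(_ orientation: UIImage.Orientation) {
        switch orientation {
        case .up: self = .up
        case .down: self = .down
        case .left: self = .left
        case .right: self = .right
        case .upMirrored: self = .upMirrored
        case .downMirrored: self = .downMirrored
        case .leftMirrored: self = .leftMirrored
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}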

// Usage (uiImage is any UIImage you already have)
let service = try PetClassifierService()
let results = try await service.classify(uiImage)
if let top = results.first {
    print("\(top.identifier) — \(String(format: "%.2f", top.confidence))")
}

Vision built-in: text recognition, face detection, body pose

You don’t always need a custom model. Vision ships with:

VNRecognizeTextRequest — OCR. Multiple languages, fast, accurate.
VNDetectFaceRectanglesRequest / VNDetectFaceLandmarksRequest — faces and landmarks.
VNDetectHumanBodyPoseRequest — joint positions for body pose.
VNDetectAnimalBodyPoseRequest — dogs and cats.
VNDetectBarcodesRequest — QR + many barcode formats.
VNGenerateOpticalFlowRequest — frame-to-frame motion.

// Assumes `import Vision` and a CGImage named cgImage (for example, uiImage.cgImage)
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate
request.recognitionLanguages = ["en-US", "ja-JP"]
let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])
// Keep the best candidate string for each recognized line
let lines = (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }

NaturalLanguage — sentiment & language ID

import NaturalLanguage

let text = "I love this app, but the latest update broke my widget."
let tagger = NLTagger(tagSchemes: [.sentimentScore])
tagger.string = text
let (sentiment, _) = tagger.tag(at: text.startIndex,
                                 unit: .paragraph, scheme: .sentimentScore)
// sentiment is an optional NLTag; sentiment?.rawValue is a score string from "-1.0" to "1.0"
// (a value such as "-0.4" would read as slightly negative)
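
Because the sentiment tag arrives as a string, most apps convert it to a number and bucket it before showing anything to the user. The helper below is a minimal sketch of that step: the SentimentBucket type and the ±0.25 cutoff are illustrative assumptions, not part of NaturalLanguage.

```swift
import Foundation

/// Coarse sentiment buckets (hypothetical type for this example).
enum SentimentBucket: String {
    case negative, neutral, positive, unknown
}

/// Converts the raw string of a sentiment tag (for example "-0.4") into a bucket.
/// The default cutoff of 0.25 is an assumption; tune it on your own data.
func sentimentBucket(fromRawScore raw: String?, cutoff: Double = 0.25) -> SentimentBucket {
    // Missing tags, non-numeric strings, and out-of-range values all map to .unknown.
    guard let raw = raw,
          let score = Double(raw),
          (-1.0...1.0).contains(score) else {
        return .unknown
    }
    if score <= -cutoff { return .negative }
    if score >= cutoff { return .positive }
    return .neutral
}
```

With the tagger code above, the call would be sentimentBucket(fromRawScore: sentiment?.rawValue).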

For language ID:

let recognizer = NLLanguageRecognizer()
recognizer.processString("これは日本語のテキストです。")
let lang = recognizer.dominantLanguage // .japanese

MLModelConfiguration — tuning

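// Assumes `import CoreML` and `import Metal`
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine // iOS 16+; skips the GPU to save battery on intensive inference
// The next two settings only matter when computeUnits includes the GPU (.all or .cpuAndGPU)
config.allowLowPrecisionAccumulationOnGPU = true // faster, occasionally less accurate
config.preferredMetalDevice = MTLCreateSystemDefaultDevice()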

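.all (default) — let CoreML pick. Usually right.
.cpuAndNeuralEngine — keeps work off the GPU; layers the Neural Engine can’t run fall back to the CPU. Often the lowest-power option.
.cpuAndGPU — skips the Neural Engine; useful for comparing results or latency against the ANE path.
.cpuOnly — for debugging, or for ANE-incompatible ops.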

Quantization & model size

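Weight precision drives model size. Older NeuralNetwork-format models store FP32 weights by default, while ML Programs converted with coremltools default to FP16. Many models can drop to FP16 or even INT8 with little accuracy loss, for a download roughly 2× (FP16) or 4× (INT8) smaller than FP32. Quantization is typically done with coremltools; re-check accuracy on your validation set after each step.

A 50MB FP32 image classifier shrinks to about 25MB at FP16 and about 12.5MB at INT8, small enough to bundle without bloating the IPA.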
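
The size arithmetic is simple enough to sketch. The function below is an illustrative estimate only: it counts weight storage, ignores metadata and per-layer overhead, and uses a hypothetical classifier with 12.5 million weights.

```swift
import Foundation

/// Rough weight-storage estimate in decimal megabytes (1 MB = 1,000,000 bytes).
/// Ignores metadata and per-layer overhead, so treat the result as a lower bound.
func estimatedWeightSizeMB(parameterCount: Int, bitsPerWeight: Int) -> Double {
    let bytes = Double(parameterCount) * Double(bitsPerWeight) / 8.0
    return bytes / 1_000_000.0
}

// Hypothetical classifier with 12.5 million weights.
let parameterCount = 12_500_000
for bits in [32, 16, 8] {
    let size = estimatedWeightSizeMB(parameterCount: parameterCount, bitsPerWeight: bits)
    print("\(bits)-bit weights: \(size) MB")
}
// Prints:
// 32-bit weights: 50.0 MB
// 16-bit weights: 25.0 MB
// 8-bit weights: 12.5 MB
```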

Updatable models (on-device personalization)

CoreML supports models marked updatable: you call MLUpdateTask on-device to fine-tune with the user’s own data, and that data never leaves the device. Apple’s own apps do comparable on-device learning (Photos learning faces, Mail’s spam heuristics), although Apple doesn’t document whether they go through this public API.

Setup is involved (must be designed updatable in the original model spec). Not the first feature to ship, but worth knowing exists.
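
For orientation, here is roughly what the on-device update call looks like. This is a sketch under stated assumptions: the model at compiledModelURL was exported as updatable, and the feature names "features" and "score" are hypothetical stand-ins for whatever training inputs your model declares.

```swift
import CoreML
import Foundation

/// Fine-tunes an updatable compiled model (.mlmodelc) and writes the result to outputURL.
/// "features" and "score" are hypothetical names; they must match the model's training inputs.
func personalize(modelAt compiledModelURL: URL,
                 samples: [(features: MLMultiArray, score: Double)],
                 savingTo outputURL: URL,
                 completion: @escaping (Error?) -> Void) {
    do {
        // Wrap each sample in a feature provider, then batch them.
        var providers: [MLFeatureProvider] = []
        for sample in samples {
            let provider = try MLDictionaryFeatureProvider(dictionary: [
                "features": sample.features,
                "score": sample.score
            ])
            providers.append(provider)
        }
        let batch = MLArrayBatchProvider(array: providers)

        let task = try MLUpdateTask(forModelAt: compiledModelURL,
                                    trainingData: batch,
                                    configuration: nil) { context in
            // Surface a training failure before trying to save anything.
            if let error = context.task.error {
                completion(error)
                return
            }
            do {
                // Persist the fine-tuned model so the next launch can load it.
                try context.model.write(to: outputURL)
                completion(nil)
            } catch {
                completion(error)
            }
        }
        task.resume() // Training runs asynchronously after resume().
    } catch {
        completion(error)
    }
}
```

Writing to a separate outputURL, rather than over the base model, keeps a known-good copy around if an update goes wrong.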

Converting from PyTorch / TensorFlow

# coremltools (run on a Mac with Python)
import coremltools as ct
import torch

# torch_model: your trained torch.nn.Module, loaded elsewhere
torch_model.eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(torch_model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.ImageType(shape=example.shape, scale=1/255.0)],
    classifier_config=ct.ClassifierConfig(labels=["cat", "dog", "other"]),
    convert_to="mlprogram",  # modern format
    minimum_deployment_target=ct.target.iOS17,
)
mlmodel.save("PetClassifier.mlpackage")  # ML Programs must be saved as .mlpackage, not .mlmodel

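mlprogram (ML Program) is the modern CoreML model type, replacing the older NeuralNetwork spec. It is saved as an .mlpackage rather than a single .mlmodel file, defaults to FP16 weights when converted with coremltools, and generally gets better ANE coverage.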

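Foundation Models (iOS 26+)

Apple Intelligence, announced in 2024 and shipping from iOS 18.1, runs on an on-device language model of roughly 3 billion parameters. On iOS 18, apps reach it only through system features such as Writing Tools and Image Playground; prompting the model directly from your own code arrived with the Foundation Models framework in iOS 26. Not strictly CoreML: it is a higher-level framework that wraps the models. Covered briefly in Chapter 13.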

In the wild

Apple Photos — face recognition, object classification, scene detection, OCR — all on-device CoreML.
Apple Mail — sender categorization, “important” flagging, spam scoring.
Visual Look Up — image-feature matching against a curated database, similar in spirit to VNGenerateImageFeaturePrintRequest (Apple doesn’t publish the implementation).
Pixelmator Pro, Photoshop on iPad — denoise, upscale, object selection use CoreML.
ShazamKit — on-device audio fingerprinting (Apple doesn’t document whether the signature generator is a CoreML model).
Be My Eyes — accessibility app that pairs OCR + a vision LLM for blind users.

Common misconceptions

“CoreML is slow because it’s a phone.” A typical image classifier runs in 5-20ms on the Neural Engine of an A14 or newer. Often faster than a server round-trip.
“I need TensorFlow expertise to use Create ML.” Image, sound, action, tabular, and text classifiers can all be trained in Create ML with no code — drop folders, click train.
“My model is huge so I’ll just download it.” Apple’s Background Assets framework lets you ship a small bundle and download large ML payloads on first launch. Use it for models >50MB.
“CoreML can run any PyTorch model.” Most ops convert; some (custom CUDA kernels, certain dynamic shapes) don’t. Run ct.convert early to validate.
“All CoreML models run on the Neural Engine.” Only certain op patterns are ANE-compatible. Use Xcode’s Instruments → Core ML template to verify which units handle your model.

Seasoned engineer’s take

The team writing the model and the team shipping the app must talk constantly. Three lessons from production:

Define the inference contract before training. Input shape, normalization, output labels, expected latency budget. Models that drop in with no docs become “magic box that returns numbers.”
Build a fallback path. Models occasionally output garbage, and low confidence is the cheapest signal you get. Always check that the top observation clears a threshold (results.first?.confidence for a classifier) and degrade gracefully: “We couldn’t identify this image.”
Profile on the oldest supported device. A model that runs in 8ms on an iPhone 16 Pro might run in 40ms on an SE 3 — fine, but if it runs in 400ms on an XS, you have a problem.
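
The fallback path from the second lesson can be sketched without any framework code. Prediction below is a hypothetical stand-in for VNClassificationObservation, and the 0.6 threshold is an illustrative assumption; pick yours from validation data.

```swift
import Foundation

/// Stand-in for VNClassificationObservation so the logic runs without Vision.
struct Prediction {
    let identifier: String
    let confidence: Float
}

enum ClassificationOutcome: Equatable {
    case recognized(label: String, confidence: Float)
    case uncertain
}

/// Accepts the highest-confidence prediction only if it clears the threshold.
/// The 0.6 default is an assumption for illustration.
func resolve(_ predictions: [Prediction], threshold: Float = 0.6) -> ClassificationOutcome {
    guard let top = predictions.max(by: { $0.confidence < $1.confidence }),
          top.confidence >= threshold else {
        return .uncertain
    }
    return .recognized(label: top.identifier, confidence: top.confidence)
}

/// Maps an outcome to user-facing copy, so the UI never shows a shaky guess.
func userMessage(for outcome: ClassificationOutcome) -> String {
    switch outcome {
    case .recognized(let label, _):
        return "Looks like: \(label)"
    case .uncertain:
        return "We couldn't identify this image."
    }
}
```

Returning an explicit .uncertain case, rather than an optional label, forces every call site to decide what the degraded experience looks like.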

For the Apple-flavored ML workflow, here’s the order of operations that always pays off:

Can a Vision built-in request do it? (OCR, face, body pose, animal pose, barcode.) If yes, use that.
Can NaturalLanguage do it? (sentiment, language ID, NER.) If yes, use that.
Can a Create ML built-in template do it? (image / text / sound / action classifier.) If yes, train in Create ML.
Only then drop to custom PyTorch/TF + coremltools.

TIP: Xcode’s model viewer has a Preview tab for .mlmodel files (available since Xcode 12, so it isn’t specific to Xcode 16). Drag test images in, see live predictions. Catches bad labels and broken preprocessing before you write Swift.

WARNING: Don’t ship a model trained on a dataset you don’t have rights to. App Review doesn’t check, but lawsuits do. Document model provenance in your repo.

Interview corner

Junior: “How do you classify a photo as cat/dog/other?”

Train an image classifier in Create ML by dropping labeled folders. Export the .mlmodel, drop into Xcode. Xcode generates a class. Wrap it in VNCoreMLModel, run VNCoreMLRequest via VNImageRequestHandler, read the top VNClassificationObservation.

Mid: “How would you ship a 200MB ML model without bloating the IPA?”

Use Apple’s Background Assets framework. Ship a stub model or no model at all; on first launch, fetch the full model from your CDN (or an App Store-hosted asset pack), compile it with MLModel.compileModel(at:) if it arrives as an uncompiled .mlmodel or .mlpackage, persist the compiled .mlmodelc to Application Support, and load it with MLModel(contentsOf:). Quantize first: a 200MB FP32 model drops to roughly 100MB at FP16 and roughly 50MB at INT8, at a small accuracy cost. Separately, if battery matters, consider setting computeUnits to .cpuAndNeuralEngine to keep inference off the GPU.
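
The compile, persist, and load steps from that answer might look like the sketch below. It assumes the file has already been downloaded to downloadedURL, uses the async compileModel(at:) variant (iOS 16 and later), and the file name "PetClassifier.mlmodelc" is a hypothetical placeholder.

```swift
import CoreML
import Foundation

/// Compiles a downloaded .mlmodel or .mlpackage, stores the compiled model in
/// Application Support, and loads it. The file name is a hypothetical placeholder.
func installAndLoadModel(downloadedAt downloadedURL: URL) async throws -> MLModel {
    let fileManager = FileManager.default
    let supportDir = try fileManager.url(for: .applicationSupportDirectory,
                                         in: .userDomainMask,
                                         appropriateFor: nil,
                                         create: true)
    let permanentURL = supportDir.appendingPathComponent("PetClassifier.mlmodelc")

    // compileModel(at:) writes to a temporary location, so move the result somewhere durable.
    let compiledURL = try await MLModel.compileModel(at: downloadedURL)
    if fileManager.fileExists(atPath: permanentURL.path) {
        try fileManager.removeItem(at: permanentURL)
    }
    try fileManager.moveItem(at: compiledURL, to: permanentURL)

    let config = MLModelConfiguration()
    config.computeUnits = .all
    return try MLModel(contentsOf: permanentURL, configuration: config)
}
```

On later launches you can skip compilation and load straight from the stored .mlmodelc.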

Senior: “Design an on-device personalization system: a recipe app learns the user’s cuisine preferences without sending data to a server.”

Ship an updatable CoreML model with isUpdatable = true flags on the final layers. When the user rates or saves a recipe, build training samples (recipe feature vector + user score) and accumulate to disk. Periodically (battery-permitting, charging-and-idle-only), run MLUpdateTask to fine-tune the model with the accumulated samples. Persist the updated model to Application Support. Cap stored training data (e.g., last 500 events) to bound disk and training time. Recommendations come from running inference on candidate recipes; the per-user model gives personalized scores. Privacy: never POST samples to a server. For backups, optionally allow opt-in iCloud sync of just the user’s model file (encrypted). For new-user cold start, ship a generic base model and graceful “We’re learning your preferences” UI for the first dozen interactions.
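
The "cap stored training data" part of that design is easy to get wrong, so here is a minimal sketch. RatingEvent and TrainingBuffer are hypothetical types, and JSON on disk is an assumed storage choice; the point is only that the oldest events fall off once the cap is reached.

```swift
import Foundation

/// One user interaction; field names are hypothetical.
struct RatingEvent: Codable, Equatable {
    let recipeID: String
    let score: Double
}

/// Keeps only the most recent `capacity` events so disk use and training time stay bounded.
struct TrainingBuffer: Codable {
    private(set) var events: [RatingEvent] = []
    let capacity: Int

    init(capacity: Int = 500) {
        self.capacity = max(1, capacity)
    }

    /// Appends an event and drops the oldest ones beyond the cap.
    mutating func append(_ event: RatingEvent) {
        events.append(event)
        if events.count > capacity {
            events.removeFirst(events.count - capacity)
        }
    }

    /// Writes the buffer atomically so a crash mid-save can't corrupt it.
    func save(to url: URL) throws {
        let data = try JSONEncoder().encode(self)
        try data.write(to: url, options: .atomic)
    }

    static func load(from url: URL) throws -> TrainingBuffer {
        let data = try Data(contentsOf: url)
        return try JSONDecoder().decode(TrainingBuffer.self, from: data)
    }
}
```

When the update job runs, it converts buffer.events into training samples and leaves the buffer in place, so a failed update can simply be retried later.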

Red flag: “We just hit OpenAI’s API on every photo upload to classify it.”

Five wins for on-device CoreML: zero per-inference cost, no network latency or its variance, works offline, user data never leaves the device, no rate limits. A trained Create ML classifier handles this in a weekend with no ongoing cost. Reaching for an LLM API for a 3-class image classifier is overengineering and budget waste.

Lab preview

CoreML doesn’t have a dedicated Phase 7 lab. Instead, the stretch goal of Lab 7.2 — Widget extension includes “classify the most recent photo with a Create ML model and show its label on the widget,” a tight end-to-end exercise covering model load, a Vision request, and App Group caching.

Next: 7.9 — AppIntents & Shortcuts

The Swift iOS & macOS Engineer