Experimental Pattern

This is an emerging architecture pattern. Example apps coming soon as we explore this approach.

Edge-Native Architecture

AI on the device—instant, private, offline

What is it?

AI models run directly on user devices (phones, laptops, IoT hardware) using on-device runtimes such as Core ML, ONNX Runtime, and TensorFlow Lite, or browser APIs like WebGPU and WebNN. No network calls, no API keys: everything runs locally on the device.

This pattern enables zero-latency interactions with absolute privacy. The AI never leaves your device, making it ideal for sensitive applications, offline scenarios, and environments where connectivity is unreliable.

💡 Key Insight

"The fastest network request is the one you never make. Edge-native AI trades model size for zero latency and absolute privacy."

Tradeoffs

Advantages

  • Zero latency - no network round-trip
  • Absolute privacy - data never leaves device
  • Works offline completely
  • No API costs after model download
  • Better for regulated industries (HIPAA, GDPR)

Limitations

  • Model size constraints (storage, memory)
  • Limited to smaller models (typically 1-7B parameters)
  • Device capability variance (older devices struggle)
  • Initial download size can be large
  • Model updates require new downloads
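The size and memory constraints above can be made concrete with a little arithmetic: a quantized model's weights occupy roughly (parameters × bits per weight) / 8 bytes, plus runtime overhead for the KV cache and activations. The sketch below estimates that footprint; the 20% overhead factor and the interface names are illustrative assumptions, not measured figures.

```typescript
// Rough on-device footprint estimate for a quantized LLM.
// The 20% overhead factor is an illustrative assumption.
interface ModelSpec {
  paramsBillion: number; // e.g. 3 for a 3B-parameter model
  bitsPerWeight: 4 | 8;  // quantization level
}

function estimateFootprintGB(spec: ModelSpec): number {
  // Weights: parameters x bits per weight, converted to gigabytes.
  const weightsGB = (spec.paramsBillion * 1e9 * spec.bitsPerWeight) / 8 / 1e9;
  // Assume ~20% extra for KV cache, activations, and runtime buffers.
  return +(weightsGB * 1.2).toFixed(2);
}

function fitsOnDevice(spec: ModelSpec, freeMemoryGB: number): boolean {
  return estimateFootprintGB(spec) <= freeMemoryGB;
}
```

By this estimate, a 3B model at 4-bit quantization needs about 1.8 GB, comfortable on a recent phone, while a 7B model at 8-bit needs roughly 8.4 GB and rules out most mobile devices.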

Technical Deep Dive

Architecture

Edge-native AI leverages device-specific inference engines to run quantized models locally. Modern browsers and mobile platforms now support hardware-accelerated AI inference.

  • Models: Quantized/compressed LLMs (1-7B parameters, 4-bit or 8-bit)
  • Inference: WebGPU, WebNN, TensorFlow Lite, Core ML, ONNX Runtime
  • State: Local IndexedDB, SQLite on device
  • Sync: Optional background sync when online (for updates only)
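Because the inference layer varies by platform, an edge-native app typically feature-detects at startup and falls back gracefully. A minimal sketch of that selection logic, assuming the app has already populated capability flags (e.g. from `navigator.gpu`); the flag names and preference order here are assumptions, not a standard API:

```typescript
// Pick an inference backend from detected capabilities, in rough
// preference order. The capability flags are illustrative; the app
// would fill them in via feature detection at startup.
type Backend = "webgpu" | "webnn" | "wasm";

interface Capabilities {
  webgpu: boolean; // e.g. navigator.gpu is available
  webnn: boolean;  // e.g. the WebNN API is available
}

function pickBackend(caps: Capabilities): Backend {
  if (caps.webgpu) return "webgpu"; // hardware-accelerated, widest LLM support
  if (caps.webnn) return "webnn";   // NPU/GPU access via the browser ML API
  return "wasm";                    // CPU fallback: universal but slowest
}
```

Keeping this decision in one pure function makes the fallback chain easy to test without a browser.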

When to Use This Pattern

  • Privacy is non-negotiable (healthcare, finance, legal)
  • Offline capability is critical (travel, remote work)
  • Latency must be under 100ms
  • User base willing to download larger apps (100MB+)
  • Compliance with data residency requirements

When NOT to Use This Pattern

  • Need latest/largest models (GPT-4, Claude-3 class)
  • Frequent model updates required
  • Users on low-end devices or limited storage
  • Need server-side tool use or API integrations
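In practice the two lists above often become a per-request routing policy in a hybrid app: keep inference local by default, and send to a server only when a request needs capabilities the device lacks. The policy below is a hypothetical sketch; the field names and thresholds are assumptions for illustration.

```typescript
// Hypothetical local-vs-cloud routing policy: anything needing
// server-side tools, frontier-model quality, or more context than
// the local model supports goes to the cloud; the rest stays on device.
interface InferenceRequest {
  needsTools: boolean;         // server-side tool use / API integrations
  needsFrontierModel: boolean; // GPT-4 / Claude-class quality required
  promptTokens: number;
}

interface DeviceState {
  modelLoaded: boolean;    // local model downloaded and ready
  maxContextTokens: number; // context window of the on-device model
}

type Route = "local" | "cloud";

function routeInference(req: InferenceRequest, dev: DeviceState): Route {
  if (req.needsTools || req.needsFrontierModel) return "cloud";
  if (!dev.modelLoaded) return "cloud";
  if (req.promptTokens > dev.maxContextTokens) return "cloud";
  return "local";
}
```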

Example App Concepts

NotesWithOtto

Coming Soon

Local note-taking with AI suggestions (offline-first)

TranslateWithOtto

Coming Soon

Real-time translation using on-device models

HealthWithOtto

Coming Soon

Medical symptom checker with complete privacy
