Running AI models locally on your phone isn’t just a technical choice — it’s a privacy statement.

When we started building Celest Oracle, we had a decision to make: send user data to the cloud for AI processing, or figure out how to run everything on-device. We chose the harder path, and we’d do it again in a heartbeat.

The Privacy Argument

Every time you send a query to a cloud AI service, you’re trusting that company with your data. For a tarot app — where people ask deeply personal questions about love, career, and life decisions — that felt wrong.

With on-device AI, your questions never leave your phone. Your readings are yours alone. There’s no server logging your most vulnerable moments.

The Technical Reality

Running Gemma 3 (270M parameters) on a mobile device is no joke. We learned a lot about:

  • Model optimization: Quantization, pruning, and careful memory management
  • MediaPipe LLM Inference API: Google’s framework for on-device inference
  • Temperature tuning: 1.0 works best for Gemma 3 — lower values cause repetition loops
  • Output cleanup: A 7-step pipeline to clean up small model artifacts
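To make the first bullet concrete: post-training quantization stores weights in fewer bits and scales them back at inference time. The post doesn't describe the exact scheme used, so this is a minimal sketch of symmetric int8 quantization with hypothetical helper names:

```python
# Illustrative post-training quantization: map float weights to int8.
# quantize_int8 / dequantize are our names, not from any toolchain.
def quantize_int8(weights):
    """Symmetric linear quantization of a list of floats to int8 range."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.01]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# restored values sit within one quantization step of the originals,
# at a quarter of the float32 storage cost
```

Real deployments quantize per-channel and calibrate on sample data, but the core trade (precision for memory and bandwidth) is the same.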
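The temperature point is easy to see with a toy softmax. Dividing logits by a low temperature sharpens the distribution until the top token dominates, which is exactly how a small model can lock into repeating itself; at 1.0 the probability mass stays spread out. A self-contained illustration (the logit values are made up):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by sampling temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 1.0]
balanced = softmax_with_temperature(logits, 1.0)  # mass stays spread out
peaked = softmax_with_temperature(logits, 0.2)    # top token dominates
```

With these logits, `peaked[0]` exceeds 0.9 while `balanced[0]` stays near 0.5, so at low temperature the sampler picks the same token almost every step.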
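The post doesn't list the seven cleanup steps, so here is one way such a pipeline might be structured: an ordered list of string passes applied in sequence. Every individual step below is an illustrative guess, not the app's actual implementation:

```python
import re

# Hypothetical cleanup pipeline for small-model output.
# The seven steps are illustrative, chosen to show the pattern.
CLEANUP_STEPS = [
    lambda t: t.strip(),                               # 1. trim whitespace
    lambda t: re.sub(r"<\|?[a-z_]+\|?>", "", t),       # 2. drop stray control tokens
    lambda t: re.sub(r"[ \t]+", " ", t),               # 3. collapse runs of spaces
    lambda t: re.sub(r"(\b\w+\b)( \1\b)+", r"\1", t),  # 4. dedupe repeated words
    lambda t: re.sub(r"\n{3,}", "\n\n", t),            # 5. collapse blank lines
    lambda t: t if t.endswith((".", "!", "?"))         # 6. cut a dangling final sentence
              else t.rsplit(".", 1)[0] + "." if "." in t else t,
    lambda t: t[0].upper() + t[1:] if t else t,        # 7. capitalize the opening
]

def clean(text):
    """Run each cleanup pass in order over the raw model output."""
    for step in CLEANUP_STEPS:
        text = step(text)
    return text
```

For example, `clean("<|eot|>the  reading reading suggests patience. trust your")` yields `"The reading suggests patience."`. Keeping the steps as a list makes the pipeline easy to reorder or extend as new model artifacts show up.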

Was It Worth It?

Absolutely. Our users get instant, private AI readings with zero internet dependency. The model completes a reading in under 30 seconds on most modern phones. And we sleep better knowing we’re not sitting on a database of people’s deepest questions.

On-device AI isn’t just the future — for privacy-sensitive applications, it’s the only ethical choice.