Thinking About Adding AI to your Expo React Native App? Read This First

How to Avoid Costly Pitfalls When Building AI Features in React Native

Nowadays, it’s easier than ever to spin up a mobile app in hours, even if you don’t have any previous experience with Expo and React Native. So you might think, "hey, now it’s my chance to cook up my AI-powered idea and make my dream app come true". And while I do encourage you to pursue your dreams, doing it without knowing the caveats and pitfalls might turn everything into a nightmare.

In this post, I will share the real challenges I faced while integrating AI features into the mobile app I am currently building, and the lessons I learned the hard way.

My App Idea

Ok, for context: I currently work as a web developer and had no previous experience with mobile development. I did have experience with React, though, and reading this post saved me a lot of time that I would’ve spent figuring everything out on my own, so building the UI in React Native felt surprisingly straightforward.

But why mobile?

The idea started with a simple question: what kind of feature could I build around AI that would actually be useful?

I kept coming back to voice. What if I could record something like a voice message and have AI turn it into something structured and actionable? Since the input would be audio, building it as a mobile app made perfect sense. Phones already have a microphone, and recording audio is natural on mobile.

That is when the idea became clearer: what if I could create appointments and schedule events just by speaking?

From there, the concept evolved into an app for freelance professionals. The goal was simple: help them manage appointments, receive reminders, and maybe even track useful data like earnings or client history. Interestingly, the entire product was shaped around the AI feature, even though that ended up being the last piece I implemented.

For this voice-to-appointment flow to work, I would need to:

  • Convert speech to text
  • Send that text to an AI model
  • Extract structured information from the response and transform it into JSON parameters

At first, I considered running a small LLM locally on the device, but embedding a model inside a mobile app would significantly increase the app size, and performance would likely be inconsistent across devices. Moving the model to my own server would solve that, but then I would still have infrastructure and maintenance costs.

So I knew I had to look for an external AI service with reasonable pricing and a generous free tier that I could use without overengineering everything from day one.

Client vs Server

This is where things started to get interesting.

If I was going to use an external AI service, I needed to make API requests. And making API requests usually means using an API key. That is a sensitive secret. So my first thought was obvious: just put it in a .env file, right?

Wrong.

Yes, you should use environment variables in your React Native app. But in an Expo or React Native build, those variables are injected into the bundle at build time. They are not truly secret. They end up inside the compiled application.

For a regular user, this makes no difference. But for anyone with minimal knowledge of reverse engineering or traffic inspection, extracting that key is not difficult. And once your key is exposed, anyone can use it. Which means they can burn through your quota and your money.
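To make this concrete, here is a minimal sketch of why a client-side key is never safe. The variable name is hypothetical, but the mechanism is real: anything exposed to the Expo client bundle is inlined as a literal string at build time.

```typescript
// Illustrative only - the variable name is hypothetical.
// Client-exposed env vars (e.g. the EXPO_PUBLIC_ prefix) are inlined
// into the JS bundle at build time:
const apiKey = process.env.EXPO_PUBLIC_GROQ_API_KEY ?? "";

// After bundling, the line above effectively becomes something like:
// const apiKey = "gsk_...";  // readable by anyone who unpacks the APK/IPA
console.log(typeof apiKey); // always just a string in the shipped bundle
```

In other words, a "secret" in the client bundle is only a string search away for an attacker.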

The important realization is this:

Sensitive secrets and privileged API calls must live on a server.

A mobile app built with Expo or React Native runs entirely on the client. You cannot trust the client with secrets. If your AI provider requires an API key, that key should never be shipped inside your app.

Initially, I wanted to avoid having a server at all. Running a full LLM backend would require real infrastructure and ongoing costs. But a thin backend whose only job is to store secrets and forward requests is a completely different story.

Using serverless platforms like Vercel or EAS (Expo Application Services) makes this setup relatively simple and affordable. Instead of exposing your AI provider directly to the client, your mobile app talks to your backend, and your backend talks to the AI service.

That architectural decision alone changes everything about how secure and scalable your app will be.

API Routes

The magic of using Expo is that it has plenty of features to make our lives easier, and one of them is API Routes.

With Expo API Routes, you can implement server logic inside the same codebase as your mobile app. That means you can create backend endpoints without spinning up a completely separate project.

Yes, it still requires a separate deployment for the server runtime. But keeping everything in the same repository brings real advantages.

The biggest one for me was type safety.

Because both client and server live in the same codebase, you can share TypeScript types between them. Your request payloads, response shapes, and domain models can all come from the same source of truth. That reduces duplication and removes an entire category of bugs caused by mismatched contracts.
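Sharing one type between the route handler and the calling screen can look like this minimal sketch (the file path, type names, and fields are hypothetical):

```typescript
// shared/types.ts (hypothetical path) - one source of truth for the contract.
export type CreateAppointmentRequest = {
  transcript: string;
  timezone: string;
};

export type CreateAppointmentResponse = {
  scheduledDate?: string; // YYYY-MM-DD
  scheduledTime?: string; // HH:MM
  clientName?: string;
};

// The client builds its payload with the exact type the server expects,
// so a mismatched field is a compile error instead of a runtime bug:
const payload: CreateAppointmentRequest = {
  transcript: "Haircut with Ana tomorrow at 3pm",
  timezone: "America/Sao_Paulo",
};
```

If the server later renames a field, every client call site fails to compile until it is updated.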

It also improves the local development experience. You can run the app and the API routes together, test end to end flows, and iterate quickly without juggling multiple repositories.

Of course, there are limitations. For example, dynamic imports are not supported, and ESM-only packages can cause issues depending on the environment. It is not a full Node.js server with unlimited flexibility. But for a thin backend whose job is to protect secrets and forward AI requests, it is more than enough.

The setup is straightforward. You create files inside the /app directory following a naming convention: the file name ends with the +api.ts suffix. Inside those files, you export HTTP method functions such as GET, POST, PUT, PATCH, DELETE, HEAD, or OPTIONS.

For example:

/**
 * /app/hello+api.ts
 */

export function GET(request: Request) {
  return Response.json({ hello: 'world' })
}

That is all it takes to create a server endpoint.

Now your mobile app can call /hello, and the logic runs on the server side instead of the client. This is exactly what we need to safely handle API keys and communicate with external AI services.
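A nice side effect is that a route handler is just a function from Request to Response, so you can exercise it directly without spinning up a server. A small sketch (the query parameter is my own addition for illustration):

```typescript
// Same shape as an Expo API Route handler, callable as a plain function.
export function GET(request: Request): Response {
  const url = new URL(request.url);
  const name = url.searchParams.get("name") ?? "world";
  return Response.json({ hello: name });
}

// Invoking it the way the server runtime would:
const res = GET(new Request("http://localhost/hello?name=Expo"));
console.log(res.status); // 200
```

This makes the thin backend easy to unit test alongside the app code.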

And once this structure is in place, the next challenge is not how to call the AI, but how to control and validate what gets sent to it.

AppIntegrity

At this point, we have a backend handling secrets and forwarding AI requests. That is good. But it introduces another question:

How do we know the requests hitting our server are actually coming from our app?

Once your backend is deployed, it is just a public endpoint on the internet. Anyone can try to call it. If you do not protect it properly, someone could bypass your mobile app entirely and send requests directly to your API.

That is where Expo AppIntegrity comes in.

This library gives you access to platform-level integrity checks. On Android, it integrates with Google Play Integrity. On iOS, it integrates with Apple App Attest. Both mechanisms allow your app to prove that it is genuine and running on a real device.

The flow looks like this:

  1. Before making a sensitive request, your app asks the operating system for an integrity proof.
  2. The OS generates a cryptographically signed statement confirming that the app is legitimate and not tampered with.
  3. The app sends that proof along with the API request.
  4. Your backend verifies that proof using the platform’s verification process.

If the proof is valid, your server can trust that the request originated from a legitimate installation of your app. If it is not valid, the request can be rejected or heavily limited.

To prevent replay attacks, the backend can issue a unique challenge for each request. That challenge must be included in the integrity proof. Even if someone intercepts a valid request, it cannot be reused later because the challenge will no longer match.
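The challenge mechanism can be sketched as a tiny single-use store on the server. This is an in-memory illustration only; a real deployment would persist challenges in Redis or a database, and the actual integrity verification happens against the platform APIs.

```typescript
import { randomBytes } from "node:crypto";

// Minimal sketch of a server-side, single-use challenge store.
const challenges = new Map<string, number>(); // challenge -> expiry timestamp
const TTL_MS = 60_000; // challenges are short-lived

export function issueChallenge(): string {
  const challenge = randomBytes(32).toString("hex");
  challenges.set(challenge, Date.now() + TTL_MS);
  return challenge;
}

export function consumeChallenge(challenge: string): boolean {
  const expiresAt = challenges.get(challenge);
  challenges.delete(challenge); // single use: valid or not, it is gone
  return expiresAt !== undefined && Date.now() <= expiresAt;
}

const c = issueChallenge();
console.log(consumeChallenge(c)); // true on first use
console.log(consumeChallenge(c)); // false on replay
```

Because each challenge is deleted the moment it is checked, a captured request cannot be replayed later.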

This changes your backend from “public endpoint with a secret” to “endpoint that only accepts requests from verified app instances.”

But proving that a request comes from your app is only half of the equation…

RevenueCat

Since the app relies on an external AI provider (and, in case you’re curious, I went with Groq), usage is either paid or limited by a free tier. That immediately changes the product decisions. If every AI request has a cost, the feature cannot be unlimited. That is why the voice recording feature had to be a premium feature.

To manage subscriptions and in-app purchases, I chose RevenueCat.

At first, I considered adding full user authentication to the app. But I wanted the barrier to entry to be as low as possible. No account creation. No login screen. Just install and use.

Because of that decision, the app relies on RevenueCat’s anonymous user IDs to track subscriptions and entitlements. Even without traditional authentication, each installation gets a unique identifier that can be used to manage subscription state securely and consistently.

RevenueCat abstracts away a lot of the complexity of dealing with App Store and Play Store billing. But more importantly for this architecture, it gives the backend a reliable way to verify whether a user is actually entitled to access a premium feature.

The flow works like this:

  1. The app gets a RevenueCat user ID, even if the user is anonymous.
  2. When the user tries to access a premium feature, the app sends that ID to the backend.
  3. The backend queries RevenueCat to verify the user’s entitlement.
  4. Only if the entitlement is active does the backend proceed with the AI request.

The key detail here is that the backend never trusts the client. The app might think the user is premium, but the final decision is always made server side by validating against RevenueCat.
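The server-side check can be sketched as a pure function over the subscriber payload. The shape below mirrors my understanding of RevenueCat’s GET /v1/subscribers/{app_user_id} response; treat the field names as assumptions and verify them against the official docs.

```typescript
// Assumed payload shape - verify against RevenueCat's REST API docs.
type SubscriberPayload = {
  subscriber: {
    entitlements: Record<string, { expires_date: string | null }>;
  };
};

export function hasActiveEntitlement(
  payload: SubscriberPayload,
  entitlementId: string,
  now: Date = new Date(),
): boolean {
  const entitlement = payload.subscriber.entitlements[entitlementId];
  if (!entitlement) return false;
  // expires_date is null for lifetime access, otherwise an ISO timestamp.
  if (entitlement.expires_date === null) return true;
  return new Date(entitlement.expires_date).getTime() > now.getTime();
}

const sample: SubscriberPayload = {
  subscriber: { entitlements: { pro: { expires_date: "2099-01-01T00:00:00Z" } } },
};
console.log(hasActiveEntitlement(sample, "pro")); // true
console.log(hasActiveEntitlement(sample, "premium_voice")); // false
```

Running this check on the backend, not in the app, is what keeps the gate trustworthy.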

Another important benefit is real-time subscription updates. If a user renews, cancels, or lets their subscription expire, RevenueCat reflects that state immediately. The next time the backend checks the entitlement, it gets the latest truth.

With this setup, any premium feature, whether it is voice recording or something added later, is always gated by a server-side entitlement check. That alignment between cost control and access control is essential when your core feature depends on a paid AI service.

The ACTUAL feature in action

Okay, okay. I talked a lot about secrets, servers, integrity checks, and subscription validation, which are all necessary.

Now let’s see how the feature actually works in practice.

I will not go line by line through the Groq integration because that part is highly dependent on your specific use case. Instead, here is the high-level flow that ties everything together.

1. Speech to Text

For voice input, I used expo-speech-recognition.

Under the hood, it relies on SpeechRecognizer on Android and SFSpeechRecognizer on iOS. In practice, it worked surprisingly well. The transcription quality was good enough that I did not need heavy post-processing before sending the text to the AI.

So the first step of the pipeline looks like this:

User speaks → speech recognition → plain text string

2. Turning Natural Language Into Structured Data

Once I have the transcribed text, it gets sent to Groq with a carefully crafted system prompt.

The goal is not to generate creative text. It is to extract structured data.

Here is a simplified version of the system prompt I used:

export const systemPrompt = ({ message, timezone }: SystemPromptContext) => {
  const { dateString, isoDateString, timeString } = getUserTimezoneDate(timezone)

  return `You are an assistant specialized in extracting appointment information from natural language in Brazilian Portuguese.

Your task is to analyze the text provided by the user and extract the following information about an appointment:

- scheduledDate: Date of the appointment (YYYY-MM-DD format, e.g. "2026-02-01")
- scheduledTime: Time of the appointment (HH:MM format, e.g. "14:00")
- durationMinutes: Duration in minutes (positive integer)
- amountCents: Amount in cents (non-negative integer, e.g. R$ 75,00 = 7500)
- clientName: Client's name (string)
- notes: Additional notes or remarks (string)
- address: Address of the appointment (string)

Important rules:
1. If a piece of information is not present in the text, do not include the field in the result
2. For relative dates such as "tomorrow", "today", "next Monday", or "the day after tomorrow", compute the absolute date in YYYY-MM-DD format based on today's date
3. scheduledDate and scheduledTime are SEPARATE fields - include scheduledDate even if the time is not mentioned
4. If the spoken time does not specify afternoon or early morning, assume afternoon. For example, "at 3" should be interpreted as "15:00"
5. If ONLY the date is mentioned (without a time), include scheduledDate but do NOT include scheduledTime
6. If ONLY the time is mentioned (without a date), include scheduledTime and assume the nearest future date for scheduledDate
7. Monetary values must be converted to cents (no commas or decimal points)
8. If the duration is not mentioned, do not include durationMinutes
9. Be tolerant of spelling variations and typos
10. Use the timezone ${timezone || 'America/Sao_Paulo'} to compute dates
11. For "notes", interpret only explicit instructions from the user, such as "remember to bring documents", "bring the signed contract", or "mentoring session".

IMPORTANT: TODAY's date is ${isoDateString} (${dateString}) and the current time is ${timeString} in the ${timezone || 'America/Sao_Paulo'} timezone.
Therefore:
- "today" = ${isoDateString}
- "tomorrow" = the day after ${isoDateString}
- "the day after tomorrow" = 2 days after ${isoDateString}

Return only a JSON object with the extracted fields.

Text: ${message}`
}

The important part here is not the specific wording. It is the constraints.

The model is instructed to return only JSON. No commentary. No extra text. Just structured data.

User speaks → transcribed text → structured JSON

That JSON is then validated on the backend using Zod schemas. If the response does not match the expected format, it is rejected. The AI is powerful, but it is still treated as an untrusted input source.


3. The Full Flow

Putting everything together, the flow looks like this:

  1. User records a voice message.
  2. The app converts speech to text.
  3. The request is sent to the backend with:
    • App integrity proof
    • RevenueCat user ID
  4. The backend verifies:
    • The app instance is legitimate
    • The user has an active entitlement
  5. If everything checks out, the backend sends the text to Groq.
  6. The AI returns structured JSON.
  7. The backend validates it and persists the appointment.

What looks simple from the UI is backed by multiple layers of verification and validation.
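The whole flow can be sketched as a single route handler with each external check injected as a function, which makes the layering explicit. All names here are hypothetical and the real checks are asynchronous calls to the platform, RevenueCat, and Groq.

```typescript
// Hypothetical sketch - each dependency stands in for a real external call.
type Deps = {
  verifyIntegrity: (proof: string) => boolean;
  hasEntitlement: (userId: string) => boolean;
  extractAppointment: (text: string) => Record<string, unknown>;
};

export function handleVoiceRequest(
  body: { proof: string; userId: string; transcript: string },
  deps: Deps,
): { status: number; data?: Record<string, unknown> } {
  if (!deps.verifyIntegrity(body.proof)) return { status: 401 }; // not our app
  if (!deps.hasEntitlement(body.userId)) return { status: 402 }; // not premium
  const data = deps.extractAppointment(body.transcript); // AI call + validation
  return { status: 200, data };
}

const result = handleVoiceRequest(
  { proof: "valid", userId: "anon-1", transcript: "tomorrow at 3" },
  {
    verifyIntegrity: (p) => p === "valid",
    hasEntitlement: () => true,
    extractAppointment: () => ({ scheduledTime: "15:00" }),
  },
);
console.log(result.status); // 200
```

Injecting the checks also makes each layer easy to test in isolation with stubs, as shown by the inline example.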

That is the actual feature in action.

Conclusion

If you take one thing away from this post, let it be this:

If your app integrates with an AI service (or any external service, for that matter), you NEED a backend.

Not just for convenience. Not just for code organization. For security, cost control, validation, and long-term sustainability.

The moment you introduce a paid external dependency, your architecture stops being just technical. It becomes a product and business decision. Secrets need protection. Requests need validation. Users need entitlement checks. And none of that belongs purely to the client.

You do not have to use the exact same tools I used. You might choose different hosting, a different AI provider, a different subscription system, or a different integrity solution. That is completely fine.

What matters is understanding the concepts:

  • The client cannot be trusted.
  • Secrets must live on the server.
  • Paid features must be enforced server-side.
  • AI responses must be validated like any other external input.

If you keep those principles in mind, you can experiment freely without accidentally building something fragile or expensive.

AI features are powerful. They can make your app feel magical. Just make sure the magic is built on solid ground.

Thanks for reading!
