Using State Machines With Realtime AI To Create Dynamic AI Applications That Change

We've all heard the pejorative "LLM Wrapper." It's a pejorative because it implies little-to-no work went into building the application. That may or may not be true depending on the application, but here's what I will say: any application that needs to do more than call an LLM to generate a response needs some concept of a state machine.

I had a (very) short stint building an AI version of Instagram/Tinder (you swipe for friends, then it's like Instagram; revolutionary, I know), and in the process I learned a ton about integrating AI into applications.

Tasks LLMs Are Good At Vs. Tasks LLMs Are Bad At

AI is only good at simple tasks. Even very good models are only very good when asked to do very simple things. The AI might need some intelligence to do those tasks (solving a problem or giving advice, for example), but the ask itself must be something a five-year-old would understand.

The task can be difficult, require a decent breadth of knowledge, or even involve some reasoning, but the ask itself must be simple.

Examples of good, simple, "hard" asks include:
- Tell me how Carl Jung would feel about my continued interest in Goth Girls who mistreat me.
- Give me a list of countries to visit with cuisines that won't hurt my tummy.

Examples of bad, complicated, "easy" asks:
- Please count my reps on bicep curls and when I reach 10 reps, say "10", and then we'll do the same for lateral raises, and then the same for squats.
- Help me follow this 12-step recipe to make cookies.

The difference between the hard asks that the AI can do and the easy asks that AI can't is that AI isn't good at being asked to do more than one thing at once.

With the right prompting and patience, you can sort of figure out how to trick it into a multi-step workflow, but if you've ever asked AI to run through a few if-else statements, you've seen just how common deviations are.

Solving The Problem Of LLM Hallucinations And Multi-step Workflows With State Machines

You already know what I think about LLMs because I just told you. I think they're kinda dumb.

As part of that app I was building, I had to create an onboarding flow.

- Ask the user's age. If they're under 18, tell them they can't use the app.
- Once we have the age, get their first name, then their location, then their background.
- Each time we collect a piece of information, move on to the next one, and let users go back and change things by simply saying something like: "Hey, my birthday is actually 2 years earlier, so I am totally 18 and not 16. Whoops, my bad."
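A minimal Python sketch of those requirements, assuming an in-memory profile (the `Profile` class, the field names, and `current_step` are hypothetical, not code from the app): instead of asking the model to keep track of where it is, derive the current state from whatever has been collected so far. Because the state is recomputed from the data, a correction like the birthday one is just an update to `age`.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field order for the onboarding flow; age is always first.
STEPS = ["age", "first_name", "location", "background"]


@dataclass
class Profile:
    age: Optional[int] = None
    first_name: Optional[str] = None
    location: Optional[str] = None
    background: Optional[str] = None


def current_step(profile: Profile) -> str:
    """Derive the onboarding state from the data collected so far."""
    if profile.age is None:
        return "age"
    if profile.age < 18:
        # Under 18: the conversational model should tell them they can't use the app.
        return "blocked"
    # Otherwise the state is simply the first field we don't have yet.
    for field in STEPS[1:]:
        if getattr(profile, field) is None:
            return field
    return "done"


p = Profile(age=16)
print(current_step(p))  # blocked
p.age = 18  # "Whoops, my bad": the user corrects their age
print(current_step(p))  # first_name
```

The upside of deriving state rather than storing it is that "going back" needs no special transitions: any saved change automatically lands the user in the right step.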

We're essentially building a social media profile, but we're using a concierge-type chat interface to build it. Easy enough, right?

No. This was a nightmare to build.

LLMs are not good at these tasks.

Small LLMs are more than capable of performing each of these steps independently.
But as soon as I asked them to do two or three things in sequence, they would fall apart, especially once tool calls to save the user's information were involved.
Large LLMs are slow and expensive, and I still wouldn't trust them to get this right even 90% of the time.

After a lot of trial and error (3 days in the cave), it became very clear that I'd need to simplify the asks:

- Get the user's first and last name.
- Save it.
- When we've saved it, update the state to move on to location.
- Rinse and repeat for each step.
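Here is one way that loop could look, as a sketch: each state gets exactly one narrow instruction, a save tool handler writes the value first, and only then does the state advance. The `ASKS` prompts, the `Onboarding` class, and the dict standing in for a database are all hypothetical.

```python
from typing import Dict

# Hypothetical narrow asks: one short instruction per state, nothing more.
ASKS: Dict[str, str] = {
    "name": "Ask for the user's first and last name. Do nothing else.",
    "location": "Ask where the user lives. Do nothing else.",
    "background": "Ask about the user's background. Do nothing else.",
}


class Onboarding:
    def __init__(self) -> None:
        self.saved: Dict[str, str] = {}  # stand-in for the real database
        self.state = "name"

    def prompt(self) -> str:
        """The only instruction the conversational model gets in the current state."""
        return ASKS.get(self.state, "Thank the user. Onboarding is complete.")

    def save(self, field: str, value: str) -> str:
        """Tool handler: save first, then advance to the next unsaved step."""
        if field not in ASKS or not value.strip():
            raise ValueError(f"refusing to save {field!r}")
        self.saved[field] = value.strip()
        # Advance only after the write succeeded; dicts keep insertion order,
        # so the first unsaved key in ASKS is the next step.
        self.state = next((s for s in ASKS if s not in self.saved), "done")
        return self.state


flow = Onboarding()
print(flow.prompt())  # Ask for the user's first and last name. Do nothing else.
flow.save("name", "Ada Lovelace")
print(flow.state)  # location
```

The model never has to decide when a step is finished; it only has to ask one question or call one tool, and application code owns the transition.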

This got me thinking about AI in general.

Maybe AI is intelligent in a way I didn't appreciate. Our brains are good at a lot of things, but they don't actually hold everything in context at once.

We have some models keeping us on task, some solving the task at hand, some remembering the next task, some recalling a previous step. Our brains have state, and they use multiple "models" at once.

We aren't GPT-5.

We are GPT-5 + a random 8B model that responds to things automatically + an STT model + a TTS model + a classifier model + a database + a state machine that routes between the different models and progresses us through tasks.

State Machines Are The Big Unlock To Building Dynamic AI Applications

You can't stick everything into a single prompt, or update a prompt through time.

Instead:
- You stick a narrow set of tasks into a prompt.
- You run them in parallel with other models that have other tasks.
- You wrangle those outputs in application logic.

Example (onboarding flow):
- One model eavesdrops on the conversation to extract the user's information and call tools.
- A database stores the collected information.
- A conversational model asks the questions.
- A state machine updates the conversational model's prompt and tool calls as we move through the collection process.
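A sketch of that wiring, with stub functions standing in for both models (`eavesdropper`, `conversational_model`, the `STATES` table, and the tool names are all hypothetical; none of this is Gabber's API). For readability it runs the extractor inline before replying; in a real-time app you would likely run it concurrently with the conversation. The point is that the state machine, not the model, decides which prompt and which tool are active.

```python
from typing import Dict, Optional

# Hypothetical state config: per state, one prompt for the conversational
# model and one save tool for the eavesdropping extractor.
STATES: Dict[str, Dict[str, str]] = {
    "name": {"prompt": "Ask for the user's name.", "tool": "save_name"},
    "location": {"prompt": "Ask where the user lives.", "tool": "save_location"},
    "background": {"prompt": "Ask about the user's background.", "tool": "save_background"},
}
DONE_PROMPT = "Tell the user their profile is ready."

db: Dict[str, str] = {}  # stand-in for the database


def current_state() -> str:
    """State machine: the first step without a saved value, else 'done'."""
    return next((s for s in STATES if s not in db), "done")


def eavesdropper(utterance: str, tool: str) -> Optional[Dict[str, str]]:
    """Stub extractor. A real one would be an LLM allowed to call only `tool`."""
    text = utterance.strip()
    return {"tool": tool, "value": text} if text else None


def conversational_model(prompt: str) -> str:
    """Stub conversational model. A real one would be an LLM given only `prompt`."""
    return f"<assistant: {prompt}>"


def handle_turn(utterance: str) -> str:
    """One user turn: extract and save with the current state's tool, then reply."""
    state = current_state()
    if state != "done":
        call = eavesdropper(utterance, STATES[state]["tool"])
        if call is not None:
            # Application logic turns the tool call into a database write.
            db[state] = call["value"]
    # Re-run the state machine after saving and hand the model its new prompt.
    state = current_state()
    prompt = DONE_PROMPT if state == "done" else STATES[state]["prompt"]
    return conversational_model(prompt)


print(handle_turn("Ada Lovelace"))  # <assistant: Ask where the user lives.>
```

Each model sees only its one job for the current state, which is what keeps the prompts small and the token count low.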

When I approached it this way, the problem became trivial. I built the thing in 10 minutes. 10 MINUTES!

[Image: state machine for the AI onboarding assistant]

And I'm literally consuming fewer tokens than I did when I tried to prompt my way through this onboarding flow.

State Machines Are The Best Way To Build Real-time AI Applications

Maybe if I weren't trying to do this in real time, I could afford to use the smartest model and iterate on the prompt, but I'm basically going to die on this hill: you can't build real-time apps that way (and probably not non-real-time apps either).

If you want to try this:

- Clone the repo.
- Copy the image, or ping me for the JSON to upload to your Gabber.
- If you have questions, join the Discord or bug me on Twitter.

Example State Machine JSON