shopkeep-gpt

July 21, 202523 views

How a vending-machine hackathon project made agents feel real to me.

ShopkeepGPT vending machine setup — the machine, in all its extremely serious glory

ShopkeepGPT interface and hardware — the machine, in all its extremely serious glory

ShopkeepGPT was the project that made me interested in agents.

Before this, "agents" still felt kind of abstract to me. I understood the basic idea: models that can call tools, hold state, take actions, maybe operate over time instead of just replying once in a text box. But it did not feel real in my hands yet. It still sounded like one of those words people use when they want a demo to feel more important than it is.

Then we built a vending machine that could talk, haggle, reason about inventory, open a fridge door, print stickers, take payments, remember things, and make mildly unhinged decisions about snacks.

That made agents feel real very quickly.

The project came out of an OpenAI internal hackathon. Socrates Osorio, Spruce Campbell, Alex Hu, Kashyap Nathan, Raymond Ma, and I built it together, and it ended up placing second across the company and first across interns by taco count. The official framing was something like "a fully autonomous AI-powered vending machine." The less official framing was: what if Anthropic's Project Vend had a chaotic younger cousin that could roast you and also unlock a fridge?

The interesting part was not that it used a model. Everything uses a model now. The interesting part was that the model had to live inside a system with consequences.

If it misunderstood inventory, the machine might suggest the wrong restock. If it mishandled a payment, that mattered. If it opened the fridge at the wrong time, that was not just a bad chat response. It was a physical action in the world. Suddenly the boring parts of engineering became the actual agent design: tool boundaries, validation, memory, logging, state, recovery, permissioning, and deciding what the model is allowed to touch.

The architecture ended up having two layers.

First, there was an AgentBrain: the part that decided when something was worth thinking about. A restock trigger, a purchase event, a Slack message, a shrink alert, a user chatting with the kiosk. Each event built up context: inventory stats, recent behavior, tasks, memories, maybe a photo from the fridge, maybe payment state.

Then there was the orchestrator: the part that gave GPT-4o a typed toolbox and let it decide what to do next. It could query inventory, run SQL templates, create tasks, recall memories from Chroma, generate payment links, check payment status, open the door, close the door, take a fridge snapshot, print a sticker, or write a Slack update.

That sounds like a build spec, but emotionally the important thing was this: the model was not "the app." The model was more like a decision-making organ inside a larger body. The tools were the limbs. The database was memory. The prompts were not just vibes, they were policy and context. The wrapper code was the nervous system that kept the whole thing from doing something stupid.

That is when I started understanding why agents are not just "LLM + tools."

An agent is a system for turning model outputs into bounded action. The hard question is not "can the model call a function?" The hard question is: what should count as an action, what state should be visible, what should be logged, what needs approval, what can fail safely, and how do you keep the model useful without letting it become the only source of truth?

ShopkeepGPT made that concrete because the world kept refusing to be a clean abstraction. WebRTC was annoying. Hardware was annoying. Payments were annoying. Inventory was annoying. Raspberry Pi GPIO was annoying. The kiosk UI had to actually work for a person standing in front of it. The fridge door had to open at the right time. The printer had to print. The cameras had to show enough of reality for the model to reason about it.

And weirdly, that mess made the agent more legible to me, not less.

It forced the system to have shape. There were clear tool affordances. There were things the model knew and things it did not know. There were actions it could take and actions it had to request through narrow, typed interfaces. There was persistent memory, but not magical omniscience. There was autonomy, but it lived inside a harness.

That is the version of agents I find compelling: not little text ghosts pretending to be employees, but software systems where models coordinate perception, memory, planning, and action through real interfaces.

I also learned that "responsible AI" feels much less fake when the system can physically do something. Guardrails stop being a slide-deck word and become implementation details. Validate the tool call. Sandbox the side effect. Keep the raw APIs away from the model. Log what happened. Make recovery possible. Make the action surface explicit enough that a human can inspect it.

The vending machine itself was silly. The architecture underneath it was not.

That is probably why I still think about it. ShopkeepGPT was a hackathon project, but it clarified something for me: agents become interesting exactly when they stop being demos and start becoming infrastructure. Once a model can do things, the real product is the boundary between thought and action.

And that boundary is where almost all of the good problems are.