Encoding the Constitution: Hardcoding Accountability into the Stack

How backend discipline prevents LLM hallucinations from costing real money

May 18, 2026

When you run customer-facing LLMs, especially a multi-model architecture like ours that uses Claude for streaming chat, Gemini for batch jobs, and ElevenLabs for voice ordering, your attack surface explodes. Tool use is fragile. Prompt injections are trivial. Hallucinations can cost real money. At BrewHub, we enforce one non-negotiable invariant: the backend is the single source of truth.

1. Bare-Metal Discipline: Trust Nothing from the Edge

This isn’t an abstract design preference; it’s an operational defense mechanism. We recently ran a full code health audit on our primary branch. To build this platform solo, I’ve deployed over 150 Netlify serverless handlers and 14 platform modules. If you move that fast without bare-metal discipline, you end up with an unmaintainable, duct-taped disaster. Instead, our audit showed near-zero TODO debt, rock-solid type safety, and tight module boundaries.

Take our ordering pipeline. When a customer uses our chat interface, the request goes through the Next.js edge route /api/chat. If the LLM tool-use layer triggers a place_order request, our serverless backend (_pricing.js) never charges whatever total the LLM passes in. It re-fetches the prices directly from Supabase (merch_products.price_cents and modifiers.price_delta_cents) and recalculates the subtotal from scratch. If the LLM-supplied total differs from the server calculation by even a single cent, the transaction fails immediately.
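To make the recalculation concrete, here is a minimal Python sketch of the check. It is an illustration only, not the production _pricing.js (which is JavaScript): the in-memory price tables stand in for the Supabase columns named above, and LineItem, PriceMismatch, server_subtotal_cents, and validate_order are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for the Supabase columns
# merch_products.price_cents and modifiers.price_delta_cents.
PRODUCT_PRICE_CENTS = {"latte": 450, "drip": 300}
MODIFIER_DELTA_CENTS = {"oat_milk": 75, "extra_shot": 100}


class PriceMismatch(Exception):
    """Raised when the LLM-supplied total disagrees with the server's."""


@dataclass(frozen=True)
class LineItem:
    product_id: str
    quantity: int
    modifier_ids: tuple = ()


def server_subtotal_cents(items):
    """Recompute the subtotal from stored prices only, in integer cents."""
    total = 0
    for item in items:
        if item.quantity <= 0:
            raise ValueError(f"invalid quantity for {item.product_id}")
        # Unknown products or modifiers raise KeyError: nothing is priced
        # from data the client or the LLM supplied.
        unit = PRODUCT_PRICE_CENTS[item.product_id]
        unit += sum(MODIFIER_DELTA_CENTS[m] for m in item.modifier_ids)
        total += unit * item.quantity
    return total


def validate_order(items, llm_total_cents):
    """Return the authoritative subtotal, or fail if the LLM's total differs."""
    expected = server_subtotal_cents(items)
    if llm_total_cents != expected:
        raise PriceMismatch(
            f"LLM said {llm_total_cents}, server computed {expected}"
        )
    return expected
```

Working in integer cents avoids floating-point rounding, so "a single cent" is an exact comparison rather than a tolerance.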

The same rule applies to customer identity. We never read or trust a customer_id from a client-side payload or an LLM tool argument. Identity is resolved strictly through server-side JWT verification. The AI cannot be tricked into executing a tool action to snoop on a neighbor’s package status or transaction history, because the backend refuses to treat tool arguments as an identity source.
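A rough Python sketch of that identity rule follows. It assumes an HS256-signed token with sub and exp claims; the post does not say which algorithm or claims are actually used, and a production system would normally rely on a maintained JWT library rather than hand-rolled verification. verify_hs256_jwt and scoped_tool_args are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256_jwt(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Verify an HS256 JWT (assumed algorithm) and return its claims."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    exp = claims.get("exp") if isinstance(claims, dict) else None
    if not isinstance(exp, (int, float)) or exp <= current:
        raise ValueError("token expired or missing exp")
    if not isinstance(claims.get("sub"), str) or not claims["sub"]:
        raise ValueError("missing subject")
    return claims


def scoped_tool_args(token: str, secret: bytes, tool_args: dict) -> dict:
    """Drop any caller-supplied customer_id and pin it to the verified subject."""
    claims = verify_hs256_jwt(token, secret)
    safe = {k: v for k, v in tool_args.items() if k != "customer_id"}
    safe["customer_id"] = claims["sub"]
    return safe
```

Even if a prompt injection persuades the model to pass a neighbor's customer_id, the value is discarded before any query runs.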

2. Software is Clean; the Physical World is Messy

BrewHub isn’t a digital-only playground; it’s going to be a brick-and-mortar storefront in Point Breeze, Philadelphia, dealing with real package volumes, fast-moving café queues, and local neighbors. Our code has to orchestrate three distinct physical hardware surfaces without friction:

The Parcel Floor: We run a dedicated /parcel-pos interface hooked into scanner-driven intake pipelines. It processes tracking-number OCR and logs every transaction to a strict chain-of-custody audit table (parcel_pickup_audit), requiring a recipient name match and the last 4 digits of the tracking code before a package can be released. Outbound drop-offs instantly calculate and print Shippo shipping labels right through a physical Square Terminal in a single flow.
Lobby Visibility: Inside the lobby, the software is exposed to the public on purpose, to build community visibility rather than hide the operation away. Our /cafe-board displays order progress via an “AOL Buddy Queue” using retro online icons, while the /parcel-board runs a digital Solari split-flap simulation tracking inbound shipments. To protect neighbor privacy on open screens, we use database SECURITY DEFINER views that mask recipient data using temporal jitter.
The Financial Foundation: This entire physical operation is backed by structural reality. TJC Realty, LLC owns our Point Breeze building outright, debt-free. Because we have eliminated rent and commercial mortgage pressures from our operational cost structure, our software is free to do its job: scaling clean neighborhood utility rather than optimizing for aggressive financial extraction.
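The release rule from the Parcel Floor item above (recipient name match plus the last 4 of the tracking code) can be sketched in Python as follows. The name normalization, the use of the last four characters rather than strictly digits, and the choice to log denied attempts as well as successful ones are all assumptions; the list stands in for the parcel_pickup_audit table, and Parcel, may_release, and release_with_audit are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parcel:
    tracking_number: str
    recipient_name: str


def _normalize_name(name: str) -> str:
    # Case-insensitive, whitespace-collapsed comparison (an assumed rule).
    return " ".join(name.split()).casefold()


def may_release(parcel: Parcel, claimed_name: str, claimed_last4: str) -> bool:
    """Both checks must pass: recipient name and last 4 of the tracking code."""
    name_ok = _normalize_name(claimed_name) == _normalize_name(parcel.recipient_name)
    last4_ok = (
        len(claimed_last4) == 4
        and parcel.tracking_number.endswith(claimed_last4)
    )
    return name_ok and last4_ok


def release_with_audit(parcel, claimed_name, claimed_last4, audit_log) -> bool:
    """Record every attempt in the audit log, then return the decision."""
    allowed = may_release(parcel, claimed_name, claimed_last4)
    audit_log.append(
        {
            "tracking_number": parcel.tracking_number,
            "claimed_name": claimed_name,
            "released": allowed,
        }
    )
    return allowed
```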

3. Mechanics of the Zero-Trust Boundary

We deploy seven specialized Python ADK agents on Google Cloud Run, hosting workflow components and internal server environments. These microservices manage everything from low-inventory triage for managers to narrating coffee roasting lots for origin storytelling.

To prevent buggy prompts or edge-case hallucinations from causing real-world financial errors, we hardcode a strict write posture directly into the infrastructure:

No agent moves money.
No agent releases physical packages or mutates CMRA/mailbox status.
No agent directly sends unverified customer communications.

We enforce these boundaries cryptographically. Every request moving from our Next.js edge to the Python microservices must carry an HMAC-SHA256 signature, generated by internal-hmac.ts and validated by hmac_auth.py on the container side. If a container receives a request without a valid signature, it rejects the request immediately.
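For readers who have not wired this up before, a minimal Python sketch of HMAC-SHA256 request signing and verification looks like this. The canonical string layout (timestamp, method, path, body hash) and the five-minute replay window are assumptions, not the actual format used by internal-hmac.ts and hmac_auth.py; sign_request and verify_request are hypothetical names.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # assumed replay window; not stated in the post


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 over a canonical description of the request."""
    body_hash = hashlib.sha256(body).hexdigest()
    message = f"{timestamp}\n{method.upper()}\n{path}\n{body_hash}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret, method, path, body, timestamp, signature, now=None) -> bool:
    """Reject stale timestamps, then compare signatures in constant time."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign_request(secret, method, path, body, timestamp)
    # Compare as bytes so a non-ASCII signature fails cleanly instead of raising.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Binding the timestamp and body hash into the signed message means a captured request cannot be replayed later or resent with a modified payload.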

When an agent needs to handle an operational task, like our Service Recovery layer resolving a customer grievance, the architecture forces a clear separation of concerns:

[Agent Action Initiated] ──► [Generates Draft Request] ──► [Written to `manager_alerts`] ──► [Human Authorization Required]

The model cannot unilaterally issue store credit or modify database state. It can only request a hardened server action by writing a structured draft to the database. A human operator must explicitly review and authorize the change within the /manager/* dashboard.
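The draft-then-authorize flow can be sketched as a small state machine. This Python example is a simplified illustration under stated assumptions: the in-memory class stands in for the manager_alerts table, and the status values, field names, and the execute_if_approved helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DraftRequest:
    action: str
    payload: dict
    status: Status = Status.PENDING
    reviewed_by: Optional[str] = None


class ManagerAlerts:
    """In-memory stand-in for the manager_alerts table."""

    def __init__(self):
        self._rows = []

    def agent_submit(self, action: str, payload: dict) -> int:
        """The only write an agent performs: insert a pending draft."""
        self._rows.append(DraftRequest(action, dict(payload)))
        return len(self._rows) - 1

    def get(self, draft_id: int) -> DraftRequest:
        return self._rows[draft_id]

    def human_review(self, draft_id: int, manager_id: str, approve: bool) -> DraftRequest:
        """A named human moves a draft out of PENDING exactly once."""
        if not manager_id:
            raise PermissionError("a named human reviewer is required")
        draft = self._rows[draft_id]
        if draft.status is not Status.PENDING:
            raise ValueError("draft already reviewed")
        draft.status = Status.APPROVED if approve else Status.REJECTED
        draft.reviewed_by = manager_id
        return draft


def execute_if_approved(draft: DraftRequest, ledger: list) -> bool:
    """The server action runs only for drafts a human has approved."""
    if draft.status is not Status.APPROVED:
        return False
    ledger.append((draft.action, draft.payload))
    return True
```

The point of the shape is that the agent's code path has no method that changes status; only the review path does.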

4. The Three-Layer Allergen Kill Switch

Safety boundaries aren’t soft guardrails in a system prompt. They’re hardcoded, out-of-band, and completely independent of the LLM’s reasoning capability. When neighbors walk into a physical hub, automated ordering tools cannot be a health liability. Prompts can be circumvented, and system instructions can be bypassed. That’s why consumer safety cannot be left to an AI’s “discretion.”

If a user brings up allergies, dietary restrictions, or medical questions, the platform issues an absolute, hard refusal. Franklin, our assistant, will not discuss them, period. We enforce this across a distinct three-layer pipeline:

Pre-LLM Interception (lib/safety/allergen.py): Before a customer’s raw text is ever sent to the Claude or Gemini APIs, it is intercepted by a regex matcher on our backend container. If an allergen keyword matches, the request is blocked before the LLM ever sees it, and the backend immediately returns a static ALLERGEN_SAFE_RESPONSE.
Mid-Stream Scrubbing (lib/chat/allergen-safety.ts): If an input somehow evades the pre-LLM layer, the outbound token stream is continuously monitored on the Next.js server side during generation. Any pattern match terminates the SSE stream immediately.
Immutable Auditing: Every single safety interception is logged to the franklin_safety_audit table, creating an unalterable trail of system compliance for long-term accountability.
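The three layers above can be sketched together in Python. This is an illustration under stated assumptions: the keyword pattern and the response text are placeholders (the real pattern set is not shown here), the mid-stream layer is written in Python even though the production version lives in TypeScript, and a plain list stands in for the franklin_safety_audit table. pre_llm_gate and scrub_stream are hypothetical names.

```python
import re
from typing import Iterable, Iterator, Optional

# Illustrative keyword list only; not the production pattern set.
ALLERGEN_PATTERN = re.compile(
    r"\b(allerg\w*|gluten|celiac|lactose|peanuts?|tree nuts?|anaphyla\w*|epipen)\b",
    re.IGNORECASE,
)
# Placeholder wording for the static response.
ALLERGEN_SAFE_RESPONSE = (
    "I can't discuss allergens or dietary restrictions. "
    "Please ask a staff member."
)


def pre_llm_gate(user_text: str, audit_log: list) -> Optional[str]:
    """Layer 1: return the static response on a match; None means forward to the LLM."""
    if ALLERGEN_PATTERN.search(user_text):
        audit_log.append({"layer": "pre_llm"})  # layer 3: audit every interception
        return ALLERGEN_SAFE_RESPONSE
    return None


def scrub_stream(tokens: Iterable[str], audit_log: list) -> Iterator[str]:
    """Layer 2: yield tokens until the accumulated output matches, then stop."""
    emitted = ""
    for token in tokens:
        candidate = emitted + token
        if ALLERGEN_PATTERN.search(candidate):
            audit_log.append({"layer": "mid_stream"})
            return  # ends the generator, which ends the outbound stream
        emitted = candidate
        yield token
```

One limitation of this simple version: if a keyword is split across two tokens, the first fragment has already been sent by the time the match fires. Holding back a short tail of the stream before flushing would close that gap at the cost of a little latency.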

The Reality of AI Skepticism

Skeptics will ask: “Doesn’t this over-engineer safety? Don’t you trust your prompts?” The answer is simple: trust is not a security model. Even a well-written prompt can be exploited or misled. We encode our constraints into infrastructure, not into the LLM’s instructions. That’s the difference between hoping the system is safe and knowing it is.

True code discipline isn’t about passing an audit to look good for a VC pitch deck; it’s about building an immutable operational architecture stable enough that you can confidently live next door to it. Line by line, we are proving that an ambitious tech stack can be used to protect a community, not exploit it.

Full architectural audit available in the BrewHub Systems code repository.

BrewHub Systems, Inc
