§ architecture note

Becoming OpenAI-shaped without becoming OpenAI

2026-05-24 · by Dennis Gubsky · ~6 min read

Loomcycle grew an OpenAI-shaped front door this week. Three releases:

v0.11.0 - POST /v1/_llm/chat. The loomcycle-native gateway. Used by the n8n Chat Model and LangChain consumers that bind to loomcycle directly.
v0.11.3 - POST /v1/chat/completions. OpenAI Chat Completions compatibility shim. Same engine, OpenAI wire format on the outside.
v0.11.4 - POST /v1/embeddings. OpenAI Embeddings compatibility shim. Drop-in for every RAG tool, vector DB, LangChain OpenAIEmbeddings consumer, and "use OpenAI embeddings" tutorial that exists on the internet.

The shape these endpoints take is the obvious one - if you've configured the OpenAI SDK in your stack, change the base_url, swap the API key, point it at your loomcycle. Done.
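
As a rough illustration of how little the caller has to change, here is a minimal Go sketch that builds an OpenAI-shaped chat request with nothing but net/http. The localhost URL, the LOOMCYCLE_API_KEY variable name, and the newChatRequest helper are assumptions made up for this example; only the /v1/chat/completions path and the bearer token in the Authorization header come from this post.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// chatMessage and chatRequest mirror the small subset of the OpenAI Chat
// Completions request body used in this sketch.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest builds an OpenAI-shaped request against whatever base URL
// it is given. Pointing it at loomcycle instead of OpenAI only changes the
// baseURL and apiKey arguments. (Hypothetical helper, not part of loomcycle.)
func newChatRequest(baseURL, apiKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Assumed deployment URL and environment variable name.
	req, err := newChatRequest("http://localhost:8080", os.Getenv("LOOMCYCLE_API_KEY"), "gpt-4o", "hello")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to http.DefaultClient.Do would send it; the sketch stops short of that so it runs without a live server.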

The shape these endpoints don't take is the part worth writing about.

Why the native gateway came first

The natural assumption when you hear "loomcycle now does OpenAI Chat Completions" is that we wrote the OpenAI shim and then added a native endpoint on top. The order was the other way around, and the reason matters.

Loomcycle's first job is the agentic loop: tool calls, MCP dispatch, hook chains, per-tenant fairness, prompt-cache control. Those things live behind /v1/runs and friends; they're stateful, multi-turn, opinionated about what the model gets to see. None of that fits inside the OpenAI Chat Completions request shape, which is fundamentally a single-turn primitive over messages and tools.

But the loop has a hot inner step - "call this provider with these messages and tools, get back a response" - that is fundamentally a single-turn primitive over messages and tools. It's the thing every agent loop in the world does twenty times per run. And the security policy you want on that inner step (provider routing via the resolver, a single auth surface with one n8n credential for all providers, retry, host allowlist, per-user quota tracking, audit logging) is the same policy you want on any direct LLM call that an upstream tool (LangChain, n8n's Chat Model, a custom workflow node, your own code) makes through loomcycle.

So the native gateway, POST /v1/_llm/chat, came first. It exposes loomcycle's loop-step primitive over HTTP with the loomcycle wire format. The compatibility shims sit in front of it, translating OpenAI's wire format into loomcycle's and back. The internal helper that does the pre-dispatch work is shared:

// internal/api/http/gateway.go

// prepareGatewayDispatch performs validation, resolver
// pinning, semaphore acquisition, and providers.Request
// construction - everything that has to happen before the
// dispatch, and that's identical for the loomcycle-native
// and OpenAI-shaped front doors. The caller serves the
// returned dispatch handle in its own wire format.
func prepareGatewayDispatch(
    ctx context.Context, req GatewayRequest,
) (*gatewayDispatch, error) { … }

handleLLMChat (the native handler) and handleOpenAIChat (the OpenAI-shim handler) both end up as parse-then-delegate. Any future bug fix to the pre-dispatch security path lands once and takes effect on both front doors. The chat-completions shim refactor was about twenty lines of net new code; the rest was the prepareGatewayDispatch extraction.

What the shim translates

The OpenAI Chat Completions request shape has a lot of fields. The shim handles the ones that map cleanly to loomcycle primitives and ignores or rejects the ones that don't:

model - translated to a loomcycle provider.model tuple via the resolver. If the caller sends "gpt-4o", the resolver picks the bound OpenAI provider entry; if they send "claude-opus-4-7", it picks Anthropic; if they send a configured shorthand, it picks whatever that shorthand is bound to. The point: the caller doesn't have to know which provider to authenticate against, because loomcycle holds all the provider credentials.
messages - passed through; OpenAI's message shape (role + content + optional name + optional tool_call_id + optional tool_calls) is structurally very close to what every other provider also expects, and loomcycle's adapter layer was already translating both ways.
tools - passed through to the dispatcher; loomcycle's tool policy applies on top (the model can declare it wants to call a tool, but the per-agent policy and the host allowlist still gate the actual dispatch).
stream - supported. SSE on the way out, OpenAI's data: { … } frame format, including the [DONE] sentinel.
tool_choice - supported with an OpenAI-to-loomcycle translation; loomcycle's underlying primitive is the same shape.
temperature, top_p, max_tokens, stop, seed - passed through to the provider as supported. Providers that don't support a parameter (e.g., seed on a non-OpenAI backend) silently drop it, the same behaviour you would see when pointing the OpenAI SDK at an alternative backend.
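
The model translation is the least obvious item in the list above, so here is a small sketch of what a resolver lookup could look like. Everything in it is an assumption for illustration: the target and resolver types, the split into exact aliases and prefix rules, and the "fast" shorthand. Loomcycle's actual resolver configuration is not shown in this post.

```go
package main

import (
	"fmt"
	"strings"
)

// target is the provider.model tuple a lookup produces.
type target struct {
	Provider string
	Model    string
}

// prefixRule binds every model name starting with Prefix to one provider.
type prefixRule struct {
	Prefix   string
	Provider string
}

// resolver holds exact aliases (shorthands or pins) and ordered prefix rules.
type resolver struct {
	aliases  map[string]target
	prefixes []prefixRule
}

// resolve checks exact aliases first, so a deployment-level pin wins over
// the default prefix-based binding, then falls back to the prefix rules.
func (r resolver) resolve(model string) (target, error) {
	if t, ok := r.aliases[model]; ok {
		return t, nil
	}
	for _, p := range r.prefixes {
		if strings.HasPrefix(model, p.Prefix) {
			return target{Provider: p.Provider, Model: model}, nil
		}
	}
	return target{}, fmt.Errorf("no provider bound for model %q", model)
}

func main() {
	r := resolver{
		// "fast" is a made-up shorthand for this example.
		aliases: map[string]target{"fast": {Provider: "ollama", Model: "llama3.1"}},
		prefixes: []prefixRule{
			{Prefix: "gpt-", Provider: "openai"},
			{Prefix: "claude-", Provider: "anthropic"},
		},
	}
	for _, m := range []string{"gpt-4o", "claude-opus-4-7", "fast", "unknown"} {
		t, err := r.resolve(m)
		fmt.Println(m, "->", t, err)
	}
}
```

Adding an alias entry for "gpt-4o" itself would redirect that name to another backend without the caller changing anything, which is the same mechanism a deployment-level pin relies on.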

What the shim deliberately drops

Three things the shim does not pretend to support:

Function-call schemas in the legacy functions field. OpenAI deprecated this in favour of tools two years ago. The shim accepts tools, rejects functions with a clear error. There's no value in carrying two ways to do the same thing.

The logprobs response field. Providers that aren't OpenAI mostly don't have a comparable surface; the providers that do have one don't agree on the shape. We'd rather return an honest "not supported" than a lossy translation that silently drops or misrepresents data.

Provider-specific OpenAI fields. response_format: { type: "json_object" } is a good example - on OpenAI it switches on a specific server-side constrained-generation mechanism. The closest analogue in Anthropic is a system-prompt instruction; in Ollama it's a grammar file. The shim passes the field through unchanged; what happens with it is provider-dependent. We don't normalize.

What the shim adds (that OpenAI doesn't)

The interesting direction. Even if you're using the OpenAI-shaped front door, you get loomcycle's policy layer for free:

Per-user quota. The bearer token in the Authorization header identifies a tenant. Loomcycle tracks tokens-in and tokens-out against the tenant's budget. When you hit the limit, you get a 429 with a quota-exhausted body; the caller retries or fails. No tenant can consume another's budget, regardless of which model they're calling.
Resolver pin precedence. The deployment can declare "all gpt-4o calls route to our self-hosted vLLM serving Llama 3.1, transparently translated". The caller's code doesn't know. New deployments of well-meaning OpenAI-defaulting SDKs work unchanged; the resolver redirects.
Single audit log. Every call through the gateway, native or shim, lands as a row in the same audit log, with the same fields. If someone has to ask "did this agent call gpt-4o on this customer's data at this time" three months from now, there's one place to look.
Host allowlist enforcement. If your loomcycle is configured to call OpenAI only on a specific proxy host, every call through the shim respects that - not because the OpenAI SDK knows about it, but because loomcycle does.
OpenTelemetry traces across the loop. Distributed trace IDs propagate from the inbound call, through the gateway, through the provider call, and back out. v0.10.0 wired OTel through loop, providers, tools, and MCP; the shim picks the same spans up for free.
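
The per-tenant quota rule in the list above can be sketched as a small in-memory tracker. The quotaTracker type, its method names, and the token numbers are all assumptions for illustration; a real deployment would presumably persist budgets and derive the tenant from the bearer token rather than take it as a plain string.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// quotaTracker keeps a remaining token budget per tenant.
type quotaTracker struct {
	mu        sync.Mutex
	remaining map[string]int
}

// charge deducts tokens from one tenant's budget. It returns false, and
// deducts nothing, when the tenant is unknown or the budget is too small.
func (q *quotaTracker) charge(tenant string, tokens int) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	left, ok := q.remaining[tenant]
	if !ok || tokens > left {
		return false
	}
	q.remaining[tenant] = left - tokens
	return true
}

// statusFor maps the quota decision to the HTTP status a caller would see.
func (q *quotaTracker) statusFor(tenant string, tokens int) int {
	if !q.charge(tenant, tokens) {
		return http.StatusTooManyRequests // 429, quota exhausted
	}
	return http.StatusOK
}

func main() {
	q := &quotaTracker{remaining: map[string]int{"tenant-a": 100, "tenant-b": 100}}
	fmt.Println(q.statusFor("tenant-a", 80)) // within budget
	fmt.Println(q.statusFor("tenant-a", 80)) // exceeds what is left for tenant-a
	fmt.Println(q.statusFor("tenant-b", 80)) // tenant-b is unaffected
}
```

Because the budget is keyed by tenant and not by model, exhausting it on one model blocks that tenant everywhere while leaving every other tenant untouched.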

Embeddings: simpler, narrower, just as useful

POST /v1/embeddings is the simpler of the two shims. No resolver path, no tier overlay, no streaming, no tool routing. Just take an array of input strings, dispatch to the single configured providers.Embedder, and return the OpenAI-shaped response.

The interesting bit is who the providers.Embedder is: it's the same instance the loomcycle Memory tool uses internally when you set embed: true on a memory write. The Memory tool went through the same provider family last week (v0.9.0, Vector Memory - semantic search on the Memory tool); the embedder is bound to the runtime once, and both the inbound HTTP shim and the inner-loop Memory tool share it. One model, one provider account, one quota, one audit trail.

Every RAG tool we've tried - LangChain's OpenAIEmbeddings, llama-index's OpenAIEmbedding, the various vector-DB embedders that default to OpenAI - works against the shim by changing only the base URL and the auth token. The cottage industry of "use OpenAI embeddings" tutorials becomes a cottage industry of "use whatever loomcycle is configured to embed with" tutorials, without anyone having to rewrite the tutorials.

The sneaky benefit: consumers can switch embedding backends - Voyage, Cohere, OpenAI text-embedding-3, a self-hosted nomic-embed - by changing one line of loomcycle config, and nothing in the consumer code needs to know. The "OpenAI shape" becomes a stable contract; the actual model becomes an operator decision.

When to use which front door

Pragmatic split:

Use the native /v1/_llm/chat gateway if you're writing loomcycle-aware code from scratch, integrating with the @loomcycle/client adapter, or building something where you control the wire format. You get the loomcycle response shape, which is closer to the agentic primitive you actually want. The n8n Chat Model uses this.
Use the /v1/chat/completions shim if you're plugging in a tool that expects OpenAI's wire format - LangChain, the OpenAI SDK, third-party tools, anyone else's code. The shim is a compatibility surface; you trade a small amount of translation overhead for not rewriting any consumer code.
Use the /v1/embeddings shim whenever you need vector embeddings outside of an agent loop. Inside an agent loop, the Memory tool's embed: true is the more idiomatic surface (it stores the vector for you and indexes it for semantic search). The HTTP shim is for embedders that aren't part of a loop - bulk ingestion jobs, RAG ingestion pipelines, vector-DB writes.

The same security policy applies to all three. Any future hardening - better quota enforcement, smarter retries, additional audit fields - lands in prepareGatewayDispatch and shows up on every front door simultaneously.

What this unlocks

The honest reason we built the LLM Gateway, in the order we built it, is that the next piece of work needed both shims and the dispatch helper. The next piece is the @loomcycle/n8n-nodes-loomcycle package - a collection of n8n nodes that lets workflow authors put a loomcycle in front of their n8n agents and get the policy layer, multi-tenant fairness, OTel traces, MCP tools, and everything else for free.

n8n's Tools Agent uses LangChain's BaseChatModel under the hood. To plug into it, you implement a Chat Model sub-node - which the n8n package now does, against /v1/_llm/chat. Nothing the workflow author writes needs to change. The model selector, the tool wiring, the system message - all of it stays the way n8n's Tools Agent already expects. The only thing that changes is which gateway is on the other end of the wire.

That post is up next: What it took to make loomcycle a first-class n8n citizen. Three releases of @loomcycle/n8n-nodes-loomcycle in two days; one LangChain @langchain/core/messages/ai.js:178 rejection trail; a defence-in-depth synthetic tool-call-id story that took longer to debug than the original integration.

Companion writeups: When the agent is in one container and its definition is in another (the substrate that lets policy and agent defs cross deployment boundaries), and Scrubbing the model's incoming mail (the content-scrubber PostTool hook that lives on top of the same hook contract any OpenAI-shim caller can plug into).