Becoming OpenAI-shaped without becoming OpenAI
Loomcycle grew an OpenAI-shaped front door this week. Three releases:
-
v0.11.0—POST /v1/_llm/chat. The loomcycle-native gateway. Used by the n8n Chat Model and LangChain consumers that bind to loomcycle directly. -
v0.11.3—POST /v1/chat/completions. OpenAI Chat Completions compatibility shim. Same engine, OpenAI wire format on the outside. -
v0.11.4—POST /v1/embeddings. OpenAI Embeddings compatibility shim. Drop-in for every RAG tool, vector DB, LangChainOpenAIEmbeddingsconsumer, and "use OpenAI embeddings" tutorial that exists on the internet.
The shape these endpoints take is the obvious one — if you've
configured the OpenAI SDK in your stack, change the
base_url, swap the API key, point it at your
loomcycle. Done.
The shape these endpoints don't take is the part worth writing about.
Why the native gateway came first
The natural assumption when you hear "loomcycle now does OpenAI Chat Completions" is that we wrote the OpenAI shim and then added a native endpoint on top. The order was the other way around, and the reason matters.
Loomcycle's first job is the agentic loop: tool calls,
MCP dispatch, hook chains, per-tenant fairness, prompt-cache
control. Those things live behind /v1/runs and
friends; they're stateful, multi-turn, opinionated about what
the model gets to see. None of that fits inside the OpenAI
Chat Completions request shape, which is fundamentally a
single-turn primitive over messages and tools.
But the loop has a hot inner step — "call this provider with these messages and tools, get back a response" — that is fundamentally a single-turn primitive over messages and tools. It's the thing every agent loop in the world does twenty times per run. And it turns out that the same security policy you want on the agent loop's inner step — provider routing via resolver, single auth surface (one n8n credential to all providers), retry, host allowlist, per-user quota tracking, audit logging — you also want on any direct LLM call that an upstream tool (LangChain, n8n's Chat Model, a custom workflow node, your own code) makes through loomcycle.
So the native gateway, POST /v1/_llm/chat, came
first. It exposes loomcycle's loop-step primitive over HTTP
with the loomcycle wire format. The compatibility shims sit
in front of it, translating OpenAI's wire format
into loomcycle's and back. The internal helper that does the
pre-dispatch work is shared:
// internal/api/http/gateway.go
// prepareGatewayDispatch performs validation, resolver
// pinning, semaphore acquisition, and providers.Request
// construction — everything that has to happen before the
// dispatch, and that's identical for the loomcycle-native
// and OpenAI-shaped front doors. The caller serves the
// returned dispatch handle in its own wire format.
func prepareGatewayDispatch(
ctx context.Context, req GatewayRequest,
) (*gatewayDispatch, error) { … }
handleLLMChat (the native handler) and
handleOpenAIChat (the OpenAI-shim handler) both
end up parse-then-delegate. Any future bug fix to the
pre-dispatch security path lands once, takes effect on both
front doors. The chat-completions shim refactor was about
twenty lines of net new code; the rest was the
prepareGatewayDispatch extraction.
What the shim translates
The OpenAI Chat Completions request shape has a lot of fields. The shim handles the ones that map cleanly to loomcycle primitives and ignores or rejects the ones that don't:
-
model— translated to a loomcycleprovider.modeltuple via the resolver. If the caller sends"gpt-4o", the resolver picks the bound OpenAI provider entry; if they send"claude-opus-4-7", it picks Anthropic; if they send a configured shorthand, that. The point: the caller doesn't have to know which provider to authenticate against, because loomcycle holds all the provider credentials. -
messages— passed through; OpenAI's message shape (role + content + optional name + optional tool_call_id + optional tool_calls) is structurally very close to what every other provider also expects, and loomcycle's adapter layer was already translating both ways. -
tools— passed through to the dispatcher; loomcycle's tool policy applies on top (the model can declare it wants to call a tool, but the per-agent policy and the host allowlist still gate the actual dispatch). -
stream— supported. SSE on the way out, OpenAI'sdata: { … }frame format, including the[DONE]sentinel. -
tool_choice— supported with an OpenAI-to-loomcycle translation; loomcycle's underlying primitive is the same shape. -
temperature,top_p,max_tokens,stop,seed— passed through to the provider as supported. Providers that don't support a parameter (e.g.,seedon a non-OpenAI backend) silently drop it, same as the OpenAI SDK would when used against an alternative.
What the shim deliberately drops
Three things the shim does not pretend to support:
Function-call schemas in the legacy
functions field. OpenAI deprecated this
in favour of tools two years ago. The shim
accepts tools, rejects functions
with a clear error. There's no value in carrying two ways
to do the same thing.
The logprobs response field.
Providers that aren't OpenAI mostly don't have a comparable
surface; the providers that do have one don't agree on the
shape. We'd rather return an honest "not supported" than a
lossy translation that quietly silently drops or misrepresents.
Provider-specific OpenAI fields.
response_format: { type: "json_object" } is a
good example — OpenAI maps it to a specific server-side
grammar-constraint mechanism. The closest analogue in
Anthropic is a system-prompt instruction; in Ollama it's
a grammar file; for OpenAI itself there's the constrained
generation mode. The shim passes the field through; what
happens with it is provider-dependent. We don't normalize.
What the shim adds (that OpenAI doesn't)
The interesting direction. Even if you're using the OpenAI-shaped front door, you get loomcycle's policy layer for free:
-
Per-user quota. The bearer token in the
Authorizationheader identifies a tenant. Loomcycle tracks tokens-in and tokens-out against the tenant's budget. When you hit the limit, you get a429with a quota-exhausted body; the caller retries or fails. No tenant can consume another's budget, regardless of which model they're calling. -
Resolver pin precedence. The deployment
can declare "all
gpt-4ocalls route to our self-hosted vLLM serving Llama 3.1, transparently translated". The caller's code doesn't know. New deployments of well-meaning OpenAI-defaulting SDKs work unchanged; the resolver redirects. -
Single audit log. Every call through the
gateway, native or shim, lands in the same audit log
row with the same fields. If someone has to ask "did this
agent call
gpt-4oon this customer's data at this time" three months from now, there's one place to look. - Host allowlist enforcement. If your loomcycle is configured to call OpenAI only on a specific proxy host, every call through the shim respects that — not because the OpenAI SDK knows about it, but because loomcycle does.
-
OpenTelemetry traces across the loop.
Distributed trace IDs propagate from the inbound call,
through the gateway, through the provider call, and back
out.
v0.10.0wired OTel through loop, providers, tools, and MCP; the shim picks the same spans up for free.
Embeddings: simpler, narrower, just as useful
POST /v1/embeddings is the simpler of the two
shims. No resolver path, no tier overlay, no streaming, no
tool routing. Just take an array of input strings, dispatch
to the single configured providers.Embedder,
and return the OpenAI-shaped response.
The interesting bit is who the providers.Embedder
is: it's the same instance the loomcycle Memory tool
uses internally when you set embed: true on a
memory write. The Memory tool went through the same provider
family last week (v0.9.0, Vector Memory — semantic search on
the Memory tool); the embedder is bound to the runtime once,
and both the inbound HTTP shim and the inner-loop Memory
tool share it. One model, one provider account, one quota,
one audit trail.
Every RAG tool we've tried — LangChain's
OpenAIEmbeddings, llama-index's
OpenAIEmbedding, the various vector-DB
embedders that default to OpenAI — works against the shim
by changing only the base URL and the auth token. The
cottage industry of "use OpenAI embeddings" tutorials
becomes a cottage industry of "use whatever loomcycle is
configured to embed with" tutorials, without anyone having
to rewrite the tutorials.
The sneaky benefit: consumers can switch embedding backends — Voyage, Cohere, OpenAI text-embedding-3, a self-hosted nomic-embed — by changing one line of loomcycle config, and nothing in the consumer code needs to know. The "OpenAI shape" becomes a stable contract; the actual model becomes an operator decision.
When to use which front door
Pragmatic split:
-
Use the native
/v1/_llm/chatgateway if you're writing loomcycle-aware code from scratch, integrating with the@loomcycle/clientadapter, or building something where you control the wire format. You get the loomcycle response shape, which is closer to the agentic primitive you actually want. The n8n Chat Model uses this. -
Use the
/v1/chat/completionsshim if you're plugging in a tool that expects OpenAI's wire format — LangChain, the OpenAI SDK, third-party tools, anyone else's code. The shim is a compatibility surface; you trade a small amount of translation overhead for not rewriting any consumer code. -
Use the
/v1/embeddingsshim whenever you need vector embeddings outside of an agent loop. Inside an agent loop, the Memory tool'sembed: trueis the more idiomatic surface (it stores the vector for you and indexes it for semantic search). The HTTP shim is for embedders that aren't part of a loop — bulk ingestion jobs, RAG ingestion pipelines, vector-DB writes.
The same security policy applies to all three. Any future
hardening — better quota enforcement, smarter retries,
additional audit fields — lands in
prepareGatewayDispatch and shows up on every
front door simultaneously.
What this unlocks
The honest reason we built the LLM Gateway, in the order we
built it, is that the next piece of work needed both shims
and the dispatch helper. The next piece is the
@loomcycle/n8n-nodes-loomcycle package — a
collection of n8n nodes that lets workflow authors put a
loomcycle in front of their n8n agents and get the policy
layer, multi-tenant fairness, OTel traces, MCP tools, and
everything else for free.
n8n's Tools Agent uses LangChain's BaseChatModel
under the hood. To plug into it, you implement a Chat Model
sub-node — which the n8n package now does, against
/v1/_llm/chat. Nothing the workflow author writes
needs to change. The model selector, the tool wiring, the
system message — all of it stays the way n8n's Tools Agent
already expects. The only thing that changes is which
gateway is on the other end of the wire.
That post is up next: What
it took to make loomcycle a first-class n8n citizen.
Three releases of @loomcycle/n8n-nodes-loomcycle
in two days; one LangChain
@langchain/core/messages/ai.js:178 rejection
trail; a defence-in-depth synthetic tool-call-id story
that took longer to debug than the original integration.
Companion writeups: When the agent is in one container and its definition is in another (the substrate that lets policy and agent defs cross deployment boundaries), and Scrubbing the model's incoming mail (the content-scrubber PostTool hook that lives on top of the same hook contract any OpenAI-shim caller can plug into).