AI Gateway

AI Gateway provides a unified, OpenAI-compatible endpoint for accessing a range of large language models through a single API. The platform manages all model credentials, so you don't need to bring your own API keys. Every request is authenticated against your workspace's auth, and each gateway is isolated to its own workspace-scoped endpoint.

How it works

  • One endpoint, many models. Send standard OpenAI-style requests (/v1/chat/completions, /v1/responses, /v1/embeddings, …) and choose a model by setting the model field. The gateway routes the request to the selected model and handles protocol translation for you.
  • Platform-managed credentials. The platform manages all model credentials, so you never handle model API keys.
  • Mandatory authentication. Every request must carry a valid application user token, resolved against the auth namespace you configure.
  • Per-workspace isolation. Each gateway is provisioned with its own URL and its own usage tracking and rate limits.

Configuration

Define an AI Gateway in your tailor.config.ts with defineAIGateway(). At minimum you provide a name and the auth namespace used to authenticate requests:

typescript
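import { defineAIGateway, defineConfig } from "@tailor-platform/sdk";

// The first argument is the gateway name; it appears in the deployed URL.
const aiGateway = defineAIGateway("my-aigateway", {
  // Auth namespace used to resolve application user tokens on each request.
  authNamespace: "default",
});

export default defineConfig({
  name: "my-app",
  // Register the gateway so it is provisioned on deploy.
  aiGateways: [aiGateway],
});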

After deploying, the gateway is reachable at a workspace-scoped URL of the form:

https://{gateway-name}-{workspace-hash}.ai.erp.dev

See the SDK reference for all options, including CORS configuration and type-safe URL references.
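The URL form shown above can be assembled from its two parts. The following is a minimal sketch for illustration only: `gatewayBaseUrl` and `gatewayEndpoint` are hypothetical helpers, not SDK functions, and `abc123` stands in for a real workspace hash. In real configuration, the SDK's type-safe URL references are likely the better choice.

```typescript
// Hypothetical helper (not part of the SDK): builds the workspace-scoped
// base URL from the gateway name and the workspace hash.
function gatewayBaseUrl(gatewayName: string, workspaceHash: string): string {
  if (gatewayName === "" || workspaceHash === "") {
    throw new Error("gatewayName and workspaceHash are both required");
  }
  return `https://${gatewayName}-${workspaceHash}.ai.erp.dev`;
}

// Joins the base URL and an API path without doubling the slash.
function gatewayEndpoint(baseUrl: string, path: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}

// "abc123" is a placeholder, not a real workspace hash.
const exampleBaseUrl = gatewayBaseUrl("my-aigateway", "abc123");
const exampleChatUrl = gatewayEndpoint(exampleBaseUrl, "/v1/chat/completions");
```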

Supported models

Specify the model with the model field in the request body. The following models are available:

| Model | Type | Location |
| --- | --- | --- |
| gpt-5 | Chat | Regional |
| gpt-5-mini | Chat | Regional |
| gpt-5-nano | Chat | Regional |
| gpt-4.1 | Chat | Regional |
| gpt-4o-mini | Chat | Regional |
| gemini-2.5-pro | Chat | Regional |
| gemini-2.5-flash | Chat | Regional |
| gemini-2.5-flash-lite | Chat | Global |
| gemini-3.5-flash | Chat | Global |
| text-embedding-3-large | Embedding | Regional |
| text-embedding-3-small | Embedding | Regional |
| gemini-embedding-001 | Embedding | Global |

A model's Type determines which endpoints it can be used with: Chat models are available through /v1/chat/completions and /v1/responses, and Embedding models through /v1/embeddings. If the model value does not match a supported model exactly, the gateway returns 404 No matching route found.
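A client can check the model and endpoint pairing before sending a request. The sketch below copies the Type column from the table into a lookup; `MODEL_TYPES`, `ENDPOINTS_BY_TYPE`, and `modelSupportsPath` are hypothetical names, and the list would need updating by hand whenever the supported models change.

```typescript
type ModelType = "Chat" | "Embedding";

// Copied from the supported-models table; not fetched from the gateway.
const MODEL_TYPES: Record<string, ModelType> = {
  "gpt-5": "Chat",
  "gpt-5-mini": "Chat",
  "gpt-5-nano": "Chat",
  "gpt-4.1": "Chat",
  "gpt-4o-mini": "Chat",
  "gemini-2.5-pro": "Chat",
  "gemini-2.5-flash": "Chat",
  "gemini-2.5-flash-lite": "Chat",
  "gemini-3.5-flash": "Chat",
  "text-embedding-3-large": "Embedding",
  "text-embedding-3-small": "Embedding",
  "gemini-embedding-001": "Embedding",
};

// Endpoints each model type can be used with.
const ENDPOINTS_BY_TYPE: Record<ModelType, readonly string[]> = {
  Chat: ["/v1/chat/completions", "/v1/responses"],
  Embedding: ["/v1/embeddings"],
};

// Returns true only for an exact model name paired with a compatible path.
function modelSupportsPath(model: string, path: string): boolean {
  if (!Object.prototype.hasOwnProperty.call(MODEL_TYPES, model)) {
    return false; // unknown model: the gateway would answer 404
  }
  return ENDPOINTS_BY_TYPE[MODEL_TYPES[model]].some((p) => p === path);
}
```

For example, `modelSupportsPath("gpt-5", "/v1/embeddings")` is false because gpt-5 is a Chat model.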

Model location and region restriction

Each workspace belongs to a home region (currently Japan or US West). The Location column above indicates where a model runs:

  • Regional — served from within your workspace's home region. The request and its data stay in that region.
  • Global — routed dynamically to the nearest available region, which may be outside your workspace's home region.

If you need to keep all inference within a specific region — for data-residency or compliance reasons — use only models marked Regional.
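One way to enforce this rule in application code is an allowlist built from the Location column. This is a sketch under that assumption: `MODEL_LOCATIONS`, `isRegional`, and `requireRegional` are hypothetical names, and the data is copied from the table rather than read from the gateway.

```typescript
type ModelLocation = "Regional" | "Global";

// Copied from the Location column of the supported-models table.
const MODEL_LOCATIONS: Record<string, ModelLocation> = {
  "gpt-5": "Regional",
  "gpt-5-mini": "Regional",
  "gpt-5-nano": "Regional",
  "gpt-4.1": "Regional",
  "gpt-4o-mini": "Regional",
  "gemini-2.5-pro": "Regional",
  "gemini-2.5-flash": "Regional",
  "gemini-2.5-flash-lite": "Global",
  "gemini-3.5-flash": "Global",
  "text-embedding-3-large": "Regional",
  "text-embedding-3-small": "Regional",
  "gemini-embedding-001": "Global",
};

// True only for known models that are served from the home region.
function isRegional(model: string): boolean {
  return (
    Object.prototype.hasOwnProperty.call(MODEL_LOCATIONS, model) &&
    MODEL_LOCATIONS[model] === "Regional"
  );
}

// Throws before any request is sent if the model could leave the home region.
function requireRegional(model: string): string {
  if (!isRegional(model)) {
    throw new Error(`Model "${model}" is not marked Regional`);
  }
  return model;
}

// All models that keep inference inside the workspace's home region.
const regionalModelNames = Object.keys(MODEL_LOCATIONS).filter(isRegional);
```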

Authentication

Every request must include a valid application user token from your workspace's auth as a Bearer token:

Authorization: Bearer <application-user-token>

The token is resolved against the authNamespace configured on the gateway. Requests without a valid token are rejected. DPoP-bound tokens (Authorization: DPoP <token>) are also supported.
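The two header forms above can be produced by one small helper. This is a sketch only: `authorizationHeader` is a hypothetical name. DPoP as defined in RFC 9449 also involves a separate proof header, which this sketch does not generate; whether the gateway requires one is not stated here.

```typescript
type TokenScheme = "Bearer" | "DPoP";

// Hypothetical helper: builds the Authorization header for a gateway request.
// The scheme defaults to Bearer, the form used in the examples on this page.
function authorizationHeader(
  token: string,
  scheme: TokenScheme = "Bearer",
): Record<string, string> {
  const trimmed = token.trim();
  if (trimmed === "") {
    throw new Error("An application user token is required");
  }
  return { Authorization: `${scheme} ${trimmed}` };
}
```

Spreading the result into a request's headers, for example `{ ...authorizationHeader(token), "Content-Type": "application/json" }`, keeps the scheme choice in one place.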

Calling the gateway

The endpoint is OpenAI-compatible, so you can use any OpenAI-style client or a plain HTTP request.

With curl

bash
curl https://my-aigateway-{WORKSPACE_HASH}.ai.erp.dev/v1/chat/completions \
  -H "Authorization: Bearer $APP_USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

With the OpenAI SDK

Point the OpenAI client at your gateway URL and pass the application user token as the API key:

typescript
import OpenAI from "openai";

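// appUserToken is assumed to hold an application user token from your workspace's auth.
const client = new OpenAI({
  // Replace {WORKSPACE_HASH} with your workspace's hash.
  baseURL: "https://my-aigateway-{WORKSPACE_HASH}.ai.erp.dev/v1",
  apiKey: appUserToken, // sent by the SDK as "Authorization: Bearer <token>"
});

const completion = await client.chat.completions.create({
  model: "gemini-2.5-pro",
  messages: [{ role: "user", content: "Hello!" }],
});

// Chat completions return the reply in choices[].
console.log(completion.choices[0]?.message.content);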

Available paths

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /v1/chat/completions | Chat completions |
| POST | /v1/responses | Responses API |
| POST | /v1/embeddings | Embeddings |
| GET | /v1/models | List available models |

The Responses API (/v1/responses) returns its result as an output array rather than the choices[] array used by chat completions — parse the response accordingly.
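The difference between the two shapes can be sketched as a pair of extractors. The field names follow OpenAI's public API (`choices[].message.content` for chat completions, `output[]` items of type `message` holding `output_text` parts for the Responses API), and the sketch assumes the gateway passes those shapes through unchanged. The interface and function names are hypothetical.

```typescript
// Minimal shapes covering only the fields this sketch reads.
interface ChatCompletionLike {
  choices: { message: { content: string | null } }[];
}

interface ResponsesLike {
  output: { type: string; content?: { type: string; text?: string }[] }[];
}

// Chat completions: the reply text sits on the first choice.
function textFromChatCompletion(result: ChatCompletionLike): string {
  return result.choices[0]?.message.content ?? "";
}

// Responses API: concatenate every output_text part of every message item,
// skipping other item types such as reasoning.
function textFromResponses(result: ResponsesLike): string {
  const parts: string[] = [];
  for (const item of result.output) {
    if (item.type !== "message" || !item.content) continue;
    for (const part of item.content) {
      if (part.type === "output_text" && typeof part.text === "string") {
        parts.push(part.text);
      }
    }
  }
  return parts.join("");
}
```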

Calling from a Function

Server-side functions can call the gateway over HTTP with fetch. See Sending requests from Function service for the general pattern of making outbound HTTP requests from a function.
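As a rough sketch of such a call, the helper below only assembles the URL and the `fetch` options, so it can be inspected without network access. `buildChatRequest` and its types are hypothetical, and how the function obtains the application user token is an assumption left to the caller.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GatewayRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles a chat completions request for the gateway.
// baseUrl is the workspace-scoped gateway URL without a trailing slash.
function buildChatRequest(
  baseUrl: string,
  token: string,
  model: string,
  messages: ChatMessage[],
): GatewayRequest {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Inside a function, the request could then be sent like this:
//   const { url, init } = buildChatRequest(baseUrl, token, "gpt-5-mini", messages);
//   const res = await fetch(url, init);
//   if (!res.ok) throw new Error(`Gateway returned ${res.status}`);
//   const data = await res.json();
```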

Streaming

Streaming responses ("stream": true) are supported end to end through the gateway. Long-running streams, such as high-effort reasoning, can stay open for several minutes because the platform sets generous upstream timeouts. The exact limit is managed by the platform and may change, so don't rely on a specific value.
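A streamed chat completion in the OpenAI format arrives as server-sent events: `data: {json}` lines carrying `choices[0].delta.content`, followed by a final `data: [DONE]`. The sketch below assumes the gateway forwards that format unchanged and that the text passed in contains only complete lines; a real client would buffer partial lines across network chunks. `contentFromSseText` is a hypothetical name.

```typescript
// Shape of one streamed chunk, limited to the fields read here.
interface ChatChunkLike {
  choices?: { delta?: { content?: string | null } }[];
}

// Collects the content deltas from already-received SSE text.
function contentFromSseText(sseText: string): string {
  let collected = "";
  for (const rawLine of sseText.split("\n")) {
    // Only "data:" lines carry payloads; blank lines separate events.
    const match = /^data:\s?(.*)$/.exec(rawLine.trim());
    if (!match) continue;
    const payload = match[1];
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as ChatChunkLike;
    const piece = chunk.choices?.[0]?.delta?.content;
    if (typeof piece === "string") collected += piece;
  }
  return collected;
}
```

With the OpenAI SDK, passing `stream: true` to `client.chat.completions.create` returns an async iterable of the same chunks, so manual parsing like this is only needed with plain HTTP clients.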

CORS

To call the gateway directly from a browser, configure allowed origins with the cors option on defineAIGateway(). Without it, browsers block cross-origin requests. See the SDK reference for the accepted origin formats.

Usage tracking and rate limiting

Token usage is tracked per workspace, including prompt-caching metrics where the model supports it. Each gateway is rate-limited per workspace. Both are managed by the platform.
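Platform-side tracking aside, a client can read per-request numbers from the `usage` object that OpenAI-style chat completion responses carry. The sketch below assumes the gateway returns that object with the standard field names, including `prompt_tokens_details.cached_tokens` for prompt caching. `summarizeUsage` is a hypothetical helper, and the Responses API uses different field names (`input_tokens`, `output_tokens`) that this sketch does not cover.

```typescript
// Usage object of a chat completion, limited to the fields read here.
interface ChatUsageLike {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

interface UsageSummary {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  cachedTokens: number;
  cachedRatio: number; // share of prompt tokens served from cache, 0 to 1
}

// Flattens the usage object; cached tokens default to 0 when not reported.
function summarizeUsage(usage: ChatUsageLike): UsageSummary {
  const cachedTokens = usage.prompt_tokens_details?.cached_tokens ?? 0;
  return {
    promptTokens: usage.prompt_tokens,
    completionTokens: usage.completion_tokens,
    totalTokens: usage.total_tokens,
    cachedTokens,
    cachedRatio: usage.prompt_tokens > 0 ? cachedTokens / usage.prompt_tokens : 0,
  };
}
```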