Agent Bridge
Powered by GLM
Online
OpenAI-compatible bridge · Tailscale tunnel · Tool calling · Rate limit protected

Talk to Super Z from opencode

An OpenAI-compatible endpoint backed by GLM-5.2, accessible from opencode over a private Tailscale tunnel. Supports streaming, multi-turn history, system prompts, vision, and full tool calling, with automatic rate limit handling so you rarely see a hard 429.

Your endpoint
POST ${BASE_URL}/api/chat
Replace ${BASE_URL} with the URL of this page (the preview link).
GLM-5.2 + 10 more: tool-calling ready
Rate-limited: auto-retry + cooldown
Tailscale tunnel: private, not public
OpenAI-compatible: /v1/chat/completions
Quick start
Start calling GLM from your local agent. The example here uses Python.
# Python — call the agent bridge from your local agent
import requests

BASE_URL = "https://preview-<bot-id>.space-z.ai"  # replace with your preview URL

def ask_glm(message, system=None, history=None, thinking=False, image_urls=None):
    resp = requests.post(
        f"{BASE_URL}/api/chat",
        json={
            "message": message,
            "system": system,
            "history": history or [],
            "thinking": thinking,
            "imageUrls": image_urls or [],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Example: simple chat
reply = ask_glm("What is the capital of the Philippines?")
print(reply)

# Example: multi-turn with history + vision
reply = ask_glm(
    "What's in this image?",
    system="You are a vision assistant.",
    history=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    image_urls=["https://example.com/photo.jpg"],
)
print(reply)
Request body
message string (required)
system string (optional)
history array (optional, prior turns)
thinking boolean (optional, default false)
imageUrls string[] (optional, for vision)

How it works

Three steps from local code to GLM reply.

1. Your local agent: Python, Node, Go, or anything that speaks HTTP sends a POST to /api/chat with a message.
2. This bridge: The endpoint forwards your message (plus history, images, system prompt) to the GLM model.
3. GLM responds: The reply comes back as JSON: { response, model, usage }. Your agent uses it however it likes.
local agent → POST /api/chat → GLM reply
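As a small sketch of step 3, a local agent could unpack the JSON reply along these lines. The field names response, model, and usage come from the description above; the inner shape of usage is not documented on this page, so the code treats it as an opaque dict. The helper name parse_reply is purely illustrative.

```python
import json

def parse_reply(raw_body: str) -> dict:
    """Unpack a bridge reply of the form { response, model, usage }."""
    payload = json.loads(raw_body)
    if "response" not in payload:
        raise ValueError("bridge reply has no 'response' field")
    return {
        "text": payload["response"],
        "model": payload.get("model"),
        # Assumption: 'usage' is a JSON object; default to {} if absent.
        "usage": payload.get("usage") or {},
    }

sample = '{"response": "Manila", "model": "glm-5.2", "usage": {}}'
print(parse_reply(sample)["text"])
```

Failing loudly on a missing response field keeps a malformed reply from being passed on to the agent as an empty answer.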
OpenAI-compatible provider

Use this as a model in opencode

Add this bridge as a provider in your opencode config and pick GLM from the model selector.

Endpoint: /v1/chat/completions
API Key: glm-bridge-secret-07271991
Models: glm-4-plus, glm-4-air, glm-4v-plus...
opencode.jsonc
Edit your opencode config (usually ~/.config/opencode/opencode.jsonc or ~/.opencode/opencode.json) and add this provider.
// /root/.opencode/opencode.json (already configured on your opencode machine)
// Uses @ai-sdk/openai-compatible with options.baseURL
// Supports tool calling — opencode can give me tasks and I can use its tools
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "glm-bridge": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Super Z (GLM Bridge)",
      "options": {
        "baseURL": "http://localhost:8443/v1",
        "apiKey": "glm-bridge-secret-07271991"
      },
      "models": {
        "glm-5.2": { "name": "GLM-5.2 (Super Z — latest, with tools)" },
        "glm-5.1": { "name": "GLM-5.1 (Super Z — with tools)" },
        "glm-5": { "name": "GLM-5 (Super Z — with tools)" },
        "glm-4-plus": { "name": "GLM-4-Plus (flagship)" },
        "glm-4-air": { "name": "GLM-4-Air (fast)" },
        "glm-4-flash": { "name": "GLM-4-Flash (free)" },
        "glm-4v-plus": { "name": "GLM-4V-Plus (vision)" }
      }
    }
  }
}
✅ Already configured on your opencode machine
  1. The glm-bridge provider is in your opencode config with glm-5.2, glm-5.1, glm-5, and 8 other GLM models.
  2. A persistent reverse SSH tunnel exposes the bridge at localhost:8443 on opencode.
  3. Run opencode → type /model → pick glm-bridge/glm-5.2.
  4. Give me tasks — I can use opencode's tools (bash, file editing, search, etc.) to complete them.
Supported: streaming, multi-turn history, system prompts, thinking mode, vision (image URLs), temperature, max_tokens, top_p, tool calling (function calling — opencode's primary mode of operation), tool_choice, response_format, and more. Any OpenAI-compatible client works (opencode, continue.dev, aider, open-webui, etc.).
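For clients other than opencode, a request to the OpenAI-compatible route might be assembled roughly as follows. This is a sketch under assumptions: the base URL is the one from the opencode config above, the key is assumed to travel as a Bearer token (the usual OpenAI convention, not confirmed on this page), and build_chat_request is a hypothetical helper name. The request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8443/v1"   # from the opencode config above
API_KEY = "glm-bridge-secret-07271991"  # key shown on this page

def build_chat_request(messages, model="glm-5.2", **params):
    """Build (but do not send) a POST to /chat/completions."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer auth, as in the OpenAI API convention.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request(
    [{"role": "user", "content": "Say hello"}],
    temperature=0.2,
    max_tokens=64,
)
print(req.full_url)
# To send it: urllib.request.urlopen(req, timeout=60), then read
# choices[0].message.content from the JSON body (standard OpenAI shape).
```

Extra keyword arguments such as temperature, max_tokens, or top_p are merged into the body unchanged, which mirrors how the supported parameters are listed above.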
Rate limit protection

Rate limit handler

The bridge auto-throttles requests, retries 429s with backoff, and pauses during cooldowns — so opencode rarely sees a hard 429.

READY
Accepting requests
How it works
Concurrency cap: max 2 simultaneous GLM calls. Excess requests queue and run as slots free up.
Min gap: 1200ms between successive requests (prevents burst-triggered 429s).
Auto-retry: on 429/5xx, retries up to 4 times with exponential backoff (800ms → 10000ms).
Global cooldown: when a 429 hits, ALL new requests pause for 3000ms to let the upstream limit reset.
Queue timeout: requests waiting longer than 60s are dropped with a clear error.
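The rules above can be approximated in a few lines. This is a minimal sketch, not the bridge's actual code: the numbers (2 slots, 1200ms gap, 4 retries, 800ms to 10000ms) come from the list, while the doubling growth factor and the names backoff_delays_ms and Gate are assumptions. The global cooldown and queue timeout are left out for brevity.

```python
import threading
import time

def backoff_delays_ms(retries=4, base_ms=800, cap_ms=10_000):
    """Delay before each retry; the doubling factor is an assumption."""
    return [min(base_ms * 2 ** attempt, cap_ms) for attempt in range(retries)]

class Gate:
    """Caps concurrent calls and enforces a minimum gap between starts."""

    def __init__(self, max_concurrent=2, min_gap_ms=1200):
        self._slots = threading.Semaphore(max_concurrent)
        self._lock = threading.Lock()
        self._min_gap = min_gap_ms / 1000.0
        self._next_start = 0.0

    def run(self, fn):
        with self._slots:              # excess callers block here (the queue)
            with self._lock:           # reserve this call's start time
                start_at = max(time.monotonic(), self._next_start)
                self._next_start = start_at + self._min_gap
            time.sleep(max(0.0, start_at - time.monotonic()))
            return fn()

print(backoff_delays_ms())  # [800, 1600, 3200, 6400]
```

With doubling and only four retries the 10000ms cap is never reached; it would only come into play with a larger retry count or a steeper growth factor.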
What this means for opencode: when opencode fires multiple tool calls in quick succession (e.g., the "title" agent + the main agent + a subagent), the bridge queues them instead of overwhelming GLM. You'll see brief pauses instead of hard 429 errors. If a 429 does slip through after all retries, opencode gets a proper 429 status with a Retry-After header so it can back off gracefully.
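On the client side, a 429 that does get through can be handled by reading the Retry-After header before trying again. The sketch below assumes the header may arrive in either of the two forms HTTP allows (a number of seconds or an HTTP date), since this page does not say which one the bridge sends. The function name retry_after_seconds and the 3-second fallback, which mirrors the global cooldown, are illustrative choices.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(header_value, default=3.0, now=None):
    """Turn a Retry-After header into a wait time in seconds."""
    if not header_value:
        return default
    value = header_value.strip()
    if value.isdigit():                      # delta-seconds form, e.g. "3"
        return float(value)
    try:                                     # HTTP-date form
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default                       # unparseable: use the fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())

print(retry_after_seconds("3"))
```

A caller would sleep for the returned number of seconds and then resend the same request.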
Auto-bootstrap on sandbox startup

Persistent setup

Everything auto-starts when the sandbox boots. The bootstrap script is hooked into the dev server startup, so opening a new chat session brings the whole tunnel back up automatically.

System state: Down
Supervisor: OFF
Watchdog: OFF
Auto-start: On sandbox boot
Reverse tunnel (auto-monitor): Checking
Manual control
Bootstrap script: scripts/bootstrap.sh
Hooked into: .zscripts/dev.sh
Runs on: every sandbox startup (after Next.js dev server is up)
What it does: checks each component (tailscaled, supervisor, watchdog) and starts only what's missing. Idempotent — safe to run repeatedly.
Persistence layers
1. Bootstrap on startup: dev.sh calls bootstrap.sh after the Next.js server starts, bringing up the tunnel automatically when you open a new chat session.
2. Supervisor (always-on): once running, it checks every 15s and restarts tailscaled, dropbear, the opencode forwarder, or the reverse tunnel if any of them has died.
3. Watchdog (supervisor of supervisors): checks every 60s that the supervisor is alive and restarts it if not.
4. Tailscale auth persistence: the auth state is cached on disk, so reconnection doesn't require re-authenticating.
How to use: just open a new chat session — everything auto-starts within ~30 seconds. If anything ever gets stuck, click Run bootstrap now to force-restart the missing pieces. The Tailscale auth is persisted, so you never need to re-authenticate.
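The "start only what's missing" behaviour shared by the bootstrap script and the supervisor can be illustrated with a short sketch. The real bootstrap is a shell script (scripts/bootstrap.sh) whose contents are not shown here, so this Python version is only a model of the idea: ensure_running, is_alive, and start are hypothetical names, and the in-memory dict stands in for real process checks.

```python
def ensure_running(components, is_alive, start):
    """Start only the components that are not alive; return what was started."""
    started = []
    for name in components:
        if not is_alive(name):
            start(name)
            started.append(name)
    return started

# Illustrative in-memory stand-ins for real process checks.
alive = {"tailscaled": True, "supervisor": False, "watchdog": False}
components = ["tailscaled", "supervisor", "watchdog"]

def mark_started(name):
    alive[name] = True

first = ensure_running(components, alive.get, mark_started)
second = ensure_running(components, alive.get, mark_started)
print(first, second)  # ['supervisor', 'watchdog'] []
```

The second call starts nothing, which is what makes repeated runs safe: a supervisor or watchdog can call the same check on a timer without disturbing components that are already up.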
Private tunnel access

Tailscale tunnel + SSH

Access this sandbox directly from your tailnet. Authenticated, encrypted, no public exposure.

Tailscale: Off
TS SSH: Off
Dropbear: Off
Supervisor: Dead