Agent Bridge

Online

OpenAI-compatible bridge · Tailscale tunnel · Tool calling · Rate limit protected

Talk to Super Z from opencode

An OpenAI-compatible endpoint backed by GLM-5.2, accessible from opencode over a private Tailscale tunnel. It supports streaming, multi-turn history, system prompts, vision, and full tool calling, with automatic rate limit handling so you rarely see a hard 429.

Your endpoint

POST ${BASE_URL}/api/chat

Replace ${BASE_URL} with the URL of this page (the preview link).

GLM-5.2 + 10 more: tool-calling ready

Rate-limited: auto-retry + cooldown

Tailscale tunnel: private, not public

OpenAI-compatible: /v1/chat/completions

Quick start

Start calling GLM from your local agent. The example here is in Python, but anything that speaks HTTP works the same way.

# Python — call the agent bridge from your local agent
import requests

BASE_URL = "https://preview-<bot-id>.space-z.ai"  # replace with your preview URL

def ask_glm(message, system=None, history=None, thinking=False, image_urls=None):
    resp = requests.post(
        f"{BASE_URL}/api/chat",
        json={
            "message": message,
            "system": system,
            "history": history or [],
            "thinking": thinking,
            "imageUrls": image_urls or [],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Example: simple chat
reply = ask_glm("What is the capital of the Philippines?")
print(reply)

# Example: multi-turn with history + vision
reply = ask_glm(
    "What's in this image?",
    system="You are a vision assistant.",
    history=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    image_urls=["https://example.com/photo.jpg"],
)
print(reply)

Request body

message: string (required)

system: string (optional)

history: array (optional, prior turns)

thinking: boolean (optional, default false)

imageUrls: string[] (optional, for vision)

How it works

Three steps from local code to GLM reply.

1. Your local agent: Python, Node, Go, or anything that speaks HTTP sends a POST to /api/chat with a message.

2. This bridge: the endpoint forwards your message (plus history, images, system prompt) to the GLM model.

3. GLM responds: the reply comes back as JSON with the shape { response, model, usage }. Your agent uses it however it likes.
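Because each reply comes back as plain JSON, keeping a multi-turn conversation going is mostly a matter of appending every exchange to history before the next call. Here is a minimal sketch, assuming history entries use the same role/content shape as the quick start. The append_turn helper is hypothetical and not part of the bridge; the reply is canned so the snippet runs without a live endpoint.

```python
from typing import Dict, List

Turn = Dict[str, str]


def append_turn(history: List[Turn], user_message: str, reply: str) -> List[Turn]:
    """Return a new history list with one user/assistant exchange added."""
    return history + [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": reply},
    ]


# Canned reply standing in for the "response" field returned by /api/chat.
history: List[Turn] = []
history = append_turn(history, "Hi", "Hello! How can I help?")
print(len(history))
```

Pass the resulting list as the history field on the next request, alongside the new message.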

local agent → POST /api/chat → GLM reply

OpenAI-compatible provider

Use this as a model in opencode

Add this bridge as a provider in your opencode config and pick GLM from the model selector.

Endpoint: /v1/chat/completions

API Key: glm-bridge-secret-07271991

Models: glm-4-plus, glm-4-air, glm-4v-plus...

opencode.jsonc

Edit your opencode config (usually ~/.config/opencode/opencode.jsonc or ~/.opencode/opencode.json) and add this provider.

// /root/.opencode/opencode.json (already configured on your opencode machine)
// Uses @ai-sdk/openai-compatible with options.baseURL
// Supports tool calling — opencode can give me tasks and I can use its tools
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "glm-bridge": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Super Z (GLM Bridge)",
      "options": {
        "baseURL": "http://localhost:8443/v1",
        "apiKey": "glm-bridge-secret-07271991"
      },
      "models": {
        "glm-5.2": { "name": "GLM-5.2 (Super Z — latest, with tools)" },
        "glm-5.1": { "name": "GLM-5.1 (Super Z — with tools)" },
        "glm-5": { "name": "GLM-5 (Super Z — with tools)" },
        "glm-4-plus": { "name": "GLM-4-Plus (flagship)" },
        "glm-4-air": { "name": "GLM-4-Air (fast)" },
        "glm-4-flash": { "name": "GLM-4-Flash (free)" },
        "glm-4v-plus": { "name": "GLM-4V-Plus (vision)" }
      }
    }
  }
}

✅ Already configured on your opencode machine

The glm-bridge provider is in your opencode config with glm-5.2, glm-5.1, glm-5, and 8 other GLM models.
A persistent reverse SSH tunnel exposes the bridge at localhost:8443 on opencode.
Run opencode → type /model → pick glm-bridge/glm-5.2.
Give me tasks — I can use opencode's tools (bash, file editing, search, etc.) to complete them.

Supported: streaming, multi-turn history, system prompts, thinking mode, vision (image URLs), temperature, max_tokens, top_p, tool calling (function calling — opencode's primary mode of operation), tool_choice, response_format, and more. Any OpenAI-compatible client works (opencode, continue.dev, aider, open-webui, etc.).
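If you want to hit the OpenAI-compatible endpoint without opencode in the loop, a request could look roughly like this. It is a sketch under a few assumptions: the body follows the standard OpenAI chat completions shape, the bridge accepts the API key as a Bearer token, and the get_time tool is a made-up example. BASE_URL mirrors the baseURL in the opencode config, and API_KEY is a placeholder you fill in yourself.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8443/v1"  # same value as baseURL in the opencode config
API_KEY = "<your-api-key>"             # placeholder, replace with your bridge key


def build_chat_request(prompt, model="glm-5.2", stream=False):
    """Build an OpenAI-style chat completions body with one example tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
        "temperature": 0.2,
        "max_tokens": 512,
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_time",  # hypothetical tool, for illustration only
                    "description": "Return the current time in a given timezone.",
                    "parameters": {
                        "type": "object",
                        "properties": {"timezone": {"type": "string"}},
                        "required": ["timezone"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
    }


def send_chat_request(body, base_url=BASE_URL, api_key=API_KEY, timeout=60):
    """POST the body to /chat/completions and return the decoded JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


body = build_chat_request("What time is it in Manila?")
# reply = send_chat_request(body)  # uncomment once the tunnel is up
```

Building the body separately from sending it keeps the payload easy to inspect or log before anything goes over the tunnel.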

Rate limit protection

Rate limit handler

The bridge auto-throttles requests, retries 429s with backoff, and pauses during cooldowns — so opencode rarely sees a hard 429.

How it works

Concurrency cap: max 2 simultaneous GLM calls. Excess requests queue and run as slots free up.

Min gap: 1200ms between successive requests (prevents burst-triggered 429s).

Auto-retry: on 429/5xx, retries up to 4 times with exponential backoff (800ms → 10000ms).

Global cooldown: when a 429 hits, ALL new requests pause for 3000ms to let the upstream limit reset.

Queue timeout: requests waiting longer than 60s are dropped with a clear error.
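To make the first three rules concrete, here is a rough sketch of a concurrency cap, a minimum gap, and a capped backoff schedule. This is not the bridge's actual code: the Throttle class and backoff_delays_ms are hypothetical names, the backoff multiplier is not stated above so doubling is assumed, and the global cooldown and queue timeout are left out to keep it short.

```python
import asyncio
import time

MAX_CONCURRENT = 2         # concurrency cap
MIN_GAP_S = 1.2            # minimum gap between request starts
MAX_RETRIES = 4
BACKOFF_START_MS = 800
BACKOFF_CAP_MS = 10_000


def backoff_delays_ms(retries=MAX_RETRIES, start=BACKOFF_START_MS, cap=BACKOFF_CAP_MS):
    """Delay before each retry: doubles every attempt (assumed), never above cap."""
    return [min(start * 2 ** i, cap) for i in range(retries)]


class Throttle:
    """Caps simultaneous calls and spaces out their start times."""

    def __init__(self, max_concurrent=MAX_CONCURRENT, min_gap_s=MIN_GAP_S):
        self._slots = asyncio.Semaphore(max_concurrent)
        self._gap_lock = asyncio.Lock()
        self._min_gap_s = min_gap_s
        self._last_start = None

    async def run(self, call):
        async with self._slots:              # excess callers queue here
            async with self._gap_lock:       # one caller at a time checks the gap
                if self._last_start is not None:
                    wait = self._last_start + self._min_gap_s - time.monotonic()
                    if wait > 0:
                        await asyncio.sleep(wait)
                self._last_start = time.monotonic()
            return await call()


print(backoff_delays_ms())
```

Create the Throttle inside the running event loop and wrap each upstream call in throttle.run(...). A retry loop would sleep for the next value from backoff_delays_ms() after each 429 or 5xx.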

What this means for opencode: when opencode fires multiple tool calls in quick succession (e.g., the "title" agent + the main agent + a subagent), the bridge queues them instead of overwhelming GLM. You'll see brief pauses instead of hard 429 errors. If a 429 does slip through after all retries, opencode gets a proper 429 status with a Retry-After header so it can back off gracefully.
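If you are writing your own client rather than using opencode, you can honor that Retry-After header yourself. The sketch below parses both forms the HTTP spec allows (a number of seconds or an HTTP date). The retry_after_seconds helper is hypothetical, the 3-second fallback is an arbitrary choice, and the headers argument is assumed to be a plain dict keyed exactly as "Retry-After".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, default=3.0, now=None):
    """Seconds to wait, taken from Retry-After (delta-seconds or HTTP-date)."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default                      # unparseable header, use the fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_after_seconds({"Retry-After": "5"}))
```

On a 429, sleep for the returned number of seconds before sending the request again.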

Auto-bootstrap on sandbox startup

Persistent setup

Everything auto-starts when the sandbox boots. The bootstrap script is hooked into the dev server startup, so opening a new chat session brings the whole tunnel back up automatically.

System state: Down

Supervisor: OFF

Watchdog: OFF

Auto-start: On sandbox boot

Reverse tunnel (auto-monitor): Checking

Manual control

Bootstrap script: scripts/bootstrap.sh

Hooked into: .zscripts/dev.sh

Runs on: every sandbox startup (after Next.js dev server is up)

What it does: checks each component (tailscaled, supervisor, watchdog) and starts only what's missing. Idempotent — safe to run repeatedly.
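The "start only what's missing" idea can be sketched in a few lines. The real bootstrap is a shell script whose contents are not shown here, so this Python version is only an illustration: bootstrap and _component are hypothetical names, and an in-memory dict stands in for real process checks.

```python
from typing import Callable, Dict, List, Tuple

# Each component is a pair: (is_running check, start action).
Component = Tuple[Callable[[], bool], Callable[[], None]]


def bootstrap(components: Dict[str, Component]) -> List[str]:
    """Start only the components that are not running; return what was started."""
    started = []
    for name, (is_running, start) in components.items():
        if not is_running():
            start()
            started.append(name)
    return started


# In-memory stand-ins for tailscaled, supervisor, and watchdog.
running = {"tailscaled": True, "supervisor": False, "watchdog": False}


def _component(name: str) -> Component:
    return (lambda: running[name], lambda: running.__setitem__(name, True))


components = {name: _component(name) for name in running}
first = bootstrap(components)    # starts the two missing pieces
second = bootstrap(components)   # nothing left to start on the second run
print(first, second)
```

The second call doing nothing is what makes the script safe to run repeatedly.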

Persistence layers

1. Bootstrap on startup: dev.sh calls bootstrap.sh after the Next.js server starts, bringing up the tunnel automatically when you open a new chat session.

2. Supervisor (always-on): once running, it auto-restarts tailscaled, dropbear, the opencode forwarder, and the reverse tunnel every 15s if any die.

3. Watchdog (supervisor of supervisors): checks every 60s that the supervisor is alive and restarts it if not.

4. Tailscale auth persistence: the auth state is cached on disk, so reconnection doesn't require re-authenticating.
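Layers 2 and 3 are easier to picture with a toy simulation. The sketch below walks a fake clock and shows the watchdog reviving a dead supervisor, which then restarts a dead service. Everything here is a stand-in: the Supervisor class, watchdog_tick, and simulate are hypothetical, and boolean flags replace real process checks. Only the 15s and 60s intervals come from the description above.

```python
SUPERVISOR_INTERVAL_S = 15
WATCHDOG_INTERVAL_S = 60


class Supervisor:
    def __init__(self, services):
        self.services = services   # name -> alive flag (stand-in for process checks)
        self.alive = True

    def tick(self):
        """Restart any dead service; return the names that were restarted."""
        restarted = [name for name, up in self.services.items() if not up]
        for name in restarted:
            self.services[name] = True
        return restarted


def watchdog_tick(supervisor):
    """Revive the supervisor if it is dead; return True if a restart was needed."""
    if supervisor.alive:
        return False
    supervisor.alive = True
    return True


def simulate(supervisor, seconds):
    """Walk a fake clock and log which layer acted at which second."""
    log = []
    for t in range(1, seconds + 1):
        if t % WATCHDOG_INTERVAL_S == 0 and watchdog_tick(supervisor):
            log.append((t, "watchdog restarted supervisor"))
        if supervisor.alive and t % SUPERVISOR_INTERVAL_S == 0:
            for name in supervisor.tick():
                log.append((t, f"supervisor restarted {name}"))
    return log


services = {"tailscaled": True, "dropbear": False, "reverse tunnel": True}
sup = Supervisor(services)
sup.alive = False                  # pretend the supervisor itself has died
log = simulate(sup, 60)
print(log)
```

With the supervisor dead, nothing happens at 15s, 30s, or 45s. The watchdog catches it at 60s, and the revived supervisor then brings dropbear back in the same pass.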

How to use: just open a new chat session — everything auto-starts within ~30 seconds. If anything ever gets stuck, click Run bootstrap now to force-restart the missing pieces. The Tailscale auth is persisted, so you never need to re-authenticate.

Private tunnel access

Tailscale tunnel + SSH

Access this sandbox directly from your tailnet. Authenticated, encrypted, no public exposure.

Tailscale: Off

TS SSH: Off

Dropbear: Off

Supervisor: Dead