Design Doc: State-Mutating Operations API Surface
| Field | Value |
|---|---|
| Status | Implemented |
| Date | 2026-03-06 |
| Author(s) | Claude, Codex |
| Issue | — |
| Supersedes | Earlier drafts in this file and gc-api-state-mutations-v0.md |
Table of Contents
- Summary
- Motivation
- Industry Analysis
- Design Principles
- The Semantic Mismatch (Critical Bug)
- Resource Model
- URL Structure
- Complete Endpoint Catalog
- StateMutator Interface Evolution
- Implementation Architecture
- Concurrency, Idempotency, and Operations
- Security
- Error Handling
- Legacy Endpoint Policy
- Delivery Phases
- Testing Strategy
- Open Questions
- Alternatives Considered
- Appendix: Quick Reference
1. Summary
Gas City needs a coherent write API that:

- Separates desired state (what the controller should do) from runtime actions (what to do to a live session right now)
- Fixes the existing semantic mismatch where CLI and API use the same verbs for different state planes
- Covers every CLI mutation with an API equivalent (26 operations across 8 categories have no API today)
- Handles pack-derived resources correctly (you can’t PATCH a derived agent — you create a patch resource)
- Adds optimistic concurrency, idempotency, dry-run, and operation tracking where they reduce ambiguity
- Follows battle-tested patterns from Kubernetes, AWS, Nomad, and Fly.io without importing their ceremony
/v0/, and ships
incrementally across 4 phases. Existing endpoints continue to work — the
migration is additive with explicit deprecation.
2. Motivation
The Two-Writer Problem
Gas City currently has two write models that disagree on semantics:

CLI writes desired state to `city.toml`:

- `gc agent suspend worker` → sets `suspended=true` in city.toml (`cmd/gc/cmd_agent.go:488`)
- `gc rig add ./payments` → writes `[[rigs]]` entry + bootstraps filesystem
- `gc suspend` → sets `workspace.suspended=true` (`cmd/gc/cmd_suspend.go:104`)

API writes runtime session metadata:

- `POST /v0/agent/worker/suspend` → calls `sp.SetMeta(sessionName, "suspended", "true")` (`cmd/gc/api_state.go:220`)
- `POST /v0/rig/payments/suspend` → sets metadata on all rig sessions (`cmd/gc/api_state.go:269`)
The Coverage Gap
Beyond the semantic mismatch, 26 CLI mutations have no API equivalent:

| Category | Missing Operations |
|---|---|
| City lifecycle | start, stop, restart, suspend, resume |
| Agent CRUD | add, destroy, start, stop, scale |
| Rig CRUD | add, remove, restart |
| Config | apply, validate, provider CRUD |
| Packs | fetch |
| Orders | run, enable/disable |
| Events | emit |
| Misc | handoff, reconcile |
The Provenance Problem
When an agent comes from a pack, the CLI already knows you can’t edit it directly:

3. Industry Analysis
Patterns We Adopt
| Pattern | Source | GC Implementation |
|---|---|---|
| Desired state vs observed state | K8s spec/status | Config is spec; runtime view is status |
| Resource-oriented URLs | K8s, Nomad | `/v0/{resource}` flat namespace |
| Standard verbs | K8s, REST | GET, POST, PUT, PATCH, DELETE |
| Action subresources | K8s, Fly.io | `POST /v0/agent/{name}/kill` |
| Blocking queries | Nomad, Consul | `?index=N&wait=30s` (already implemented) |
| SSE streaming | Nomad | `/v0/events/stream` (already implemented) |
| Dry-run | K8s, AWS EC2 | `?dry_run=true` on desired-state mutations |
| Idempotency tokens | AWS | `Idempotency-Key` header on creates/deletes |
| Optimistic concurrency | K8s resourceVersion | `If-Match` / `ETag` on desired-state writes |
| Structured errors | K8s, AWS | `{code, message, details[]}` (already implemented) |
| Operation tracking | AWS CloudFormation | Async operations return trackable operation IDs |
| Finalizer-like deletion | K8s | Drain-before-destroy for agents |
| Generation tracking | K8s | `generation` bumps on spec change; `observed_generation` on reconcile |
| Provenance/origin | K8s field ownership | `origin: inline \| patch \| derived` on resources |
Patterns We Reject
| Pattern | Source | Why Not |
|---|---|---|
| API groups / discovery docs | K8s | Too much ceremony for single-binary SDK |
| Admission webhooks | K8s | No extension model needed |
| CRDs / dynamic schema | K8s | Static types sufficient |
| Full MVCC | etcd | Event log provides similar semantics more simply |
| Request-only CRUD | AWS Cloud Control | Direct resource verbs are simpler |
| Lease-based mutation | Fly.io | Single controller, no contention |
| Separate API server binary | — | Embedded server has direct state access |
| gRPC transport | — | HTTP JSON sufficient; OpenAPI later |
Key Insight: Nomad Is Our Closest Analog
Nomad is a single-binary orchestrator with an embedded HTTP API that manages desired state (jobs) separately from runtime state (allocations). Gas City’s architecture (controller + agents, config + sessions) maps naturally to this model. We adopt Nomad’s flat URL structure, blocking queries, and plan/dry-run semantics.

4. Design Principles
Seven principles govern the write API:

- Desired state and runtime actions are different operations. Suspend is desired state (survives restarts). Kill is a runtime action (immediate, ephemeral). The API makes this explicit.
- The API is the supported writer when the controller is running. CLI should delegate to the API. Direct file edits are treated as out-of-band changes the controller re-ingests via fsnotify.
- Derived resources are overridden by patches, not edited directly. If an agent came from a pack, `PATCH /v0/agent/{name}` on `origin=derived` returns 409 and tells you to create a patch resource instead.
- All mutations are typed and auditable. Every write emits an event. Structural changes (config mutations) also create operation records.
- Optimistic concurrency on desired-state writes. Prevents lost updates from concurrent CLI + dashboard modifications.
- Idempotent creates and deletes. Safe to retry after network failures.
- The controller is the reconciler; API handlers never shell out. The API writes config and triggers reconciliation. It does not exec `gc` subcommands.
5. The Semantic Mismatch
This is the most important thing to fix. The table below shows every existing mutation endpoint and its current vs correct behavior:

| Endpoint | Current Behavior | Correct Behavior | Fix |
|---|---|---|---|
| `POST /v0/agent/{name}/suspend` | Sets session metadata | Write `suspended=true` to city.toml | Redefine as desired-state write |
| `POST /v0/agent/{name}/resume` | Removes session metadata | Write `suspended=false` to city.toml | Redefine as desired-state write |
| `POST /v0/rig/{name}/suspend` | Sets session metadata on all agents | Write `suspended=true` on rig in city.toml | Redefine as desired-state write |
| `POST /v0/rig/{name}/resume` | Removes session metadata on all agents | Write `suspended=false` on rig in city.toml | Redefine as desired-state write |
| `POST /v0/agent/{name}/kill` | Calls `sp.Stop()` | Correct (runtime action) | Keep as-is |
| `POST /v0/agent/{name}/drain` | Sets drain metadata | Correct (runtime action) | Keep as-is |
| `POST /v0/agent/{name}/undrain` | Removes drain metadata | Correct (runtime action) | Keep as-is |
| `POST /v0/agent/{name}/nudge` | Sends to session | Correct (runtime action) | Keep as-is |
6. Resource Model
6.1 Resource Envelope
Desired-state resources use a lightweight envelope inspired by Kubernetes but without the full ceremony:

| Field | Purpose |
|---|---|
| `resource_version` | Optimistic concurrency token. Changes on every mutation. Used with `If-Match`/`ETag`. |
| `generation` | Bumps only when spec changes. Unchanged by metadata-only updates. |
| `observed_generation` | Set by the reconciler when it processes a generation. `observed_generation < generation` means convergence pending. |
| `origin` | `inline` (in city.toml), `patch` (via `[[patches]]`), or `derived` (from pack expansion). Controls mutability. |
| `conditions` | Structured status signals. Types: `Ready`, `Healthy`, `Degraded`, `BootstrapComplete`. |
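As a rough illustration of the fields in the table above, the following Go sketch models the envelope as a struct. The type names (`Envelope`, `Condition`), the shape of a condition, and the `ConvergencePending` helper are assumptions made for this example; only the JSON field names come from the table.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Condition is a hypothetical shape for one structured status signal.
type Condition struct {
	Type   string `json:"type"`   // e.g. Ready, Healthy, Degraded, BootstrapComplete
	Status string `json:"status"` // assumed "True"/"False" convention
}

// Envelope is a hypothetical container for the metadata fields listed above.
type Envelope struct {
	ResourceVersion    string      `json:"resource_version"`
	Generation         int64       `json:"generation"`
	ObservedGeneration int64       `json:"observed_generation"`
	Origin             string      `json:"origin"` // inline, patch, or derived
	Conditions         []Condition `json:"conditions,omitempty"`
}

// ConvergencePending reports whether the reconciler has not yet
// processed the latest spec change.
func (e Envelope) ConvergencePending() bool {
	return e.ObservedGeneration < e.Generation
}

func main() {
	e := Envelope{ResourceVersion: "rv_184", Generation: 3, ObservedGeneration: 2, Origin: "inline"}
	out, _ := json.Marshal(e)
	fmt.Println(string(out))
	fmt.Println("convergence pending:", e.ConvergencePending())
}
```

A client could poll a resource until `ConvergencePending` turns false to learn that the reconciler has caught up with a spec change.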
6.2 Provenance and Mutability Rules
| Origin | Mutable via resource endpoint? | How to modify |
|---|---|---|
| `inline` | Yes: PATCH/PUT/DELETE work | Direct config edit |
| `patch` | Yes: PATCH/PUT/DELETE on the patch | Modifies `[[patches]]` entry |
| `derived` | No: returns 409 | Create a patch resource via `POST /v0/patches/agents` |
gc agent suspend on a
pack-derived agent tells you to use [[patches]].
6.3 Resource Kinds
Desired-state resources (persisted in city.toml):

- `City`: workspace-level settings
- `Agent`: agent definitions (includes agents with pool config)
- `Rig`: external project registrations
- `Provider`: provider presets
- `AgentPatch`: override for a derived agent
- `RigPatch`: override for a derived rig
- `ProviderPatch`: override for a derived provider

- Agent list/detail with session state, active bead, etc. (existing `/v0/agents`)
- Rig list/detail with running counts (existing `/v0/rigs`)

- `Operation`: tracks async mutation progress
6.4 Agent vs AgentPool: One Resource
An agent with a `pool` block in its spec is a pool. An agent without one is
a singleton. This matches the config model (`config.Agent` with optional
`*PoolConfig`). There is no separate `AgentPool` resource kind.
Rationale: Agents and pools share 95% of their fields. A separate kind would
force structural changes (delete singleton + create pool) for what is
logically a config change. One resource with an optional pool block is
simpler for both API consumers and the implementation.
7. URL Structure
Flat Namespace with Semantic Clarity
URLs use a flat `/v0/` namespace. We do NOT split into `/v0/state/` and
`/v0/runtime/` prefixes, despite the conceptual distinction between desired
state and runtime actions. Reasons:

- Simplicity. Users don’t want to think about which URL prefix to use for suspend vs kill. The verb on the action subresource makes it clear.
- Backward compatibility. Existing endpoints stay at their current paths. No mass migration.
- Nomad precedent. Nomad uses flat `/v1/job/{id}` for both spec updates and evaluations without a state/runtime split.
| Pattern | Semantics |
|---|---|
| `GET /v0/{resource}` | Read current state |
| `POST /v0/{resources}` | Create new resource (desired state) |
| `PUT /v0/{resource}/{id}` | Replace resource spec (desired state) |
| `PATCH /v0/{resource}/{id}` | Partial update spec (desired state) |
| `DELETE /v0/{resource}/{id}` | Remove resource (desired state) |
| `POST /v0/{resource}/{id}/{action}` | Imperative runtime action |
operation response field shows whether the
mutation is synchronous (config commit) or async (reconciliation pending).
URL Conventions
Plural for collections (`/v0/agents`), singular for instances (`/v0/agent/{name}`).
8. Complete Endpoint Catalog
8.1 Health & Status
controller_uptime, suspended,
config_generation, and observed_generation.
8.2 City Lifecycle
GET /v0/city — Returns city desired state as a resource with envelope.
Includes spec.suspended, spec.provider, spec.session_template, etc.
PATCH /v0/city — Partial update of city desired state. This is how
suspend/resume works at the city level:
POST /v0/city/start — Triggers reconciliation pass. Starts agents
per current config. Supports {"dry_run": true}.
POST /v0/city/stop — Graceful shutdown of all agents.
Accepts {"timeout": "10s"}.
POST /v0/city/restart — Stop then start. Atomic.
POST /v0/city/reconcile — Force immediate reconciliation without
restart. Like Nomad’s POST /v1/job/{id}/evaluate.
8.3 Agents
POST /v0/agents — Create Agent (desired state)
Adds agent to city.toml. Returns resource with envelope. If pool block
is present, creates a pool agent. Requires Idempotency-Key.
201:
PUT /v0/agent/{name} — Replace Agent Spec (desired state)
Full spec replacement. Requires If-Match. Returns 409 if origin=derived.
PATCH /v0/agent/{name} — Partial Agent Update (desired state)
Merge-patch semantics matching AgentPatch. Requires If-Match. Returns
409 if origin=derived with instructions to use patch resource.
DELETE /v0/agent/{name} — Destroy Agent (desired state)
Removes from city.toml. Requires Idempotency-Key. Default behavior:
drain running sessions first, then remove config.
Query params:

- `?force=true`: skip drain, immediate kill + remove
- `?drain_timeout=30s`: override default
force not set.
POST /v0/agent/{name}/suspend — (REDEFINED)
Now writes suspended=true to city.toml (desired state), matching CLI
behavior. Previously set session metadata (runtime only). This is a
semantic fix, not a new endpoint.
POST /v0/agent/{name}/resume — (REDEFINED)
Now writes suspended=false to city.toml, matching CLI behavior.
POST /v0/agent/{name}/start — Start Session (runtime action)
Starts agent session(s). For pools, accepts {"count": 2}.
POST /v0/agent/{name}/stop — Stop Session (runtime action)
Stops running session(s). For pools, accepts {"count": 1, "timeout": "10s"}.
POST /v0/agent/{name}/restart — Restart Session (runtime action)
Stops then starts. The reconciler handles the restart naturally.
POST /v0/agent/{name}/scale — Scale Pool (runtime action)
Adjusts pool instance count. Only valid for pool agents.
GET /v0/agent/{name} response:
8.4 Rigs
POST /v0/rigs — Create Rig (desired state)
Registers project directory, initializes bead store, writes city.toml.
Bootstrap work (bead init, hook install, route generation) may be async.
202 Accepted with operation:
PATCH /v0/rig/{name} — Update Rig (desired state)
DELETE /v0/rig/{name} — Remove Rig (desired state)
Stops rig agents, removes config entry. Does NOT delete the project
directory or bead data. Accepts ?force=true and ?keep_beads=true.
POST /v0/rig/{name}/suspend and resume — (REDEFINED)
Now write to city.toml (desired state), matching CLI behavior.
POST /v0/rig/{name}/restart — Restart Rig (runtime action)
Kills all agents in the rig. Reconciler restarts them.
8.5 Providers
GET /v0/providers — Lists all providers (built-in + user-defined) with
origin and in_use_by fields.
POST /v0/providers — Create custom provider.
DELETE /v0/provider/{name} — Returns 409 if agents reference it.
PATCH /v0/provider/{name} on origin=builtin creates a
[[patches.providers]] entry (you can’t edit built-in definitions, only
override them).
8.6 Patch Resources
[[patches.agent]], [[patches.rigs]], and
[[patches.providers]] sections of city.toml.
POST /v0/patches/agents — Create agent patch:
PATCH /v0/agent/payments/reviewer on an origin=derived
agent, the error response includes:
8.7 Config Operations
GET /v0/config — Returns fully-resolved config as JSON.
POST /v0/config/apply — Declarative bulk config mutation. Accepts a
partial config document and merges it into city.toml. Supports
{"dry_run": true} for preview.
POST /v0/config/validate — Validates without applying. Returns
validation errors and warnings.
GET /v0/config/explain — Returns config provenance (where each
value came from). Accepts ?agent= and ?rig= filters.
8.8 Orders
POST /v0/order/{name}/run — Manual trigger, bypasses gate.
POST /v0/order/{name}/enable / disable — Persists as
OrderOverride in city.toml.
8.9 Packs
8.10 Operations
phase: "Succeeded" inline. Slow mutations (bootstrap, drain-then-delete)
return 202 Accepted with phase: "Running".
8.11 Events
POST /v0/events — Emit custom event:
8.12 Beads, Mail, Convoys, Sling
Existing endpoints are kept with minimal additions:

`Idempotency-Key` support but no behavioral changes.
8.13 Endpoint Summary
| Category | Existing | Redefined | New | Total |
|---|---|---|---|---|
| Health/Status | 2 | 0 | 1 | 3 |
| City | 0 | 0 | 6 | 6 |
| Agents | 8 | 2 | 8 | 18 |
| Rigs | 4 | 2 | 3 | 9 |
| Providers | 0 | 0 | 6 | 6 |
| Patches | 0 | 0 | 15 | 15 |
| Config | 0 | 0 | 4 | 4 |
| Orders | 0 | 0 | 7 | 7 |
| Packs | 0 | 0 | 2 | 2 |
| Operations | 0 | 0 | 4 | 4 |
| Events | 2 | 0 | 1 | 3 |
| Beads | 7 | 0 | 4 | 11 |
| Mail | 9 | 0 | 1 | 10 |
| Convoys | 4 | 0 | 3 | 7 |
| Sling | 1 | 0 | 0 | 1 |
| Total | 37 | 4 | 65 | 106 |
9. StateMutator Interface Evolution
9.1 Current Interface
9.2 Proposed Decomposition
9.3 Capability Discovery
The API server discovers capabilities via type assertion, enabling incremental implementation:

`handleAgentAction` which already
type-asserts to `StateMutator`. The server gracefully degrades when running
against a controller that hasn’t implemented all interfaces yet.
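To make the type-assertion approach concrete, here is a minimal Go sketch. `RuntimeMutator` is named elsewhere in this document, but the method sets, the `State` stand-in, and the `killStatus` helper are assumptions for illustration and do not reflect the real interfaces.

```go
package main

import (
	"fmt"
	"net/http"
)

// State is a stand-in for the read-only state interface; its method is hypothetical.
type State interface {
	CityName() string
}

// RuntimeMutator is named in this document; the method set here is an assumption.
type RuntimeMutator interface {
	KillAgent(name string) error
}

// readOnlyState implements State but no mutator interface.
type readOnlyState struct{}

func (readOnlyState) CityName() string { return "demo" }

// fullState additionally implements RuntimeMutator.
type fullState struct{ readOnlyState }

func (fullState) KillAgent(name string) error { return nil }

// killStatus returns the HTTP status a kill handler would send for the given state.
// A state that lacks the capability degrades to 501 instead of failing to compile.
func killStatus(s State, agent string) int {
	rm, ok := s.(RuntimeMutator)
	if !ok {
		return http.StatusNotImplemented
	}
	if err := rm.KillAgent(agent); err != nil {
		return http.StatusInternalServerError
	}
	return http.StatusOK
}

func main() {
	fmt.Println(killStatus(readOnlyState{}, "worker"))
	fmt.Println(killStatus(fullState{}, "worker"))
}
```

The design choice this illustrates is that new capability interfaces can be added one at a time: handlers compile against the narrow `State` type and probe for more at request time.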
10. Implementation Architecture
10.1 Config Mutation Flow
- Durability: changes survive controller restart
- Consistency: same validation pipeline regardless of source
- Observability: `git diff city.toml` shows all API-applied changes
- Safety: out-of-band edits are detected and re-ingested
10.2 Concurrency Model
- Reads take `mu.RLock()` (concurrent, non-blocking)
- Config writes take `configMu.Lock()` (serialized, prevents lost updates)
- Runtime actions (kill, drain, nudge) take neither lock: they go directly to the session provider
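The locking rules above can be sketched as follows. The `controller` struct, its fields, and the choice to briefly take `mu` when publishing a new snapshot are assumptions for this example; the document only specifies which lock each class of operation takes.

```go
package main

import (
	"fmt"
	"sync"
)

// controller is a hypothetical holder for the two locks described above.
type controller struct {
	mu        sync.RWMutex // guards the in-memory snapshot
	configMu  sync.Mutex   // serializes config writes
	suspended map[string]bool
}

// IsSuspended is a read: it takes only the shared read lock.
func (c *controller) IsSuspended(agent string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.suspended[agent]
}

// SetSuspended is a config write: configMu serializes the whole
// read-modify-write, and mu is held only to publish the new value (assumption).
func (c *controller) SetSuspended(agent string, v bool) {
	c.configMu.Lock()
	defer c.configMu.Unlock()
	// Loading, validating, and persisting city.toml would happen here.
	c.mu.Lock()
	c.suspended[agent] = v
	c.mu.Unlock()
}

func main() {
	c := &controller{suspended: map[string]bool{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.SetSuspended("worker", true)
			_ = c.IsSuspended("worker")
		}()
	}
	wg.Wait()
	fmt.Println(c.IsSuspended("worker"))
}
```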
10.3 No Metadata Store — Derive Everything
Gas City’s design principle: no status files; query live state. State files go stale on crash and create false positives. Every piece of metadata the API needs is derivable from existing sources of truth:

| Need | Derivation |
|---|---|
| Optimistic concurrency (ETag) | SHA256 hash of the resource’s serialized TOML section |
| Provenance/origin | Raw config vs expanded config comparison (CLI already does this) |
| Convergence tracking | Event log records controller.config_reloaded events |
| Idempotency cache | In-memory map with TTL (single-process, single-user) |
| Operation tracking | Event log with correlation IDs (Phase 3) |
- Load raw config (no pack expansion) → look for agent
- Found? → `origin=inline`
- Not found? Load expanded config → found there? → `origin=derived`
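The provenance steps above can be sketched in Go as below. The `Config` type is a simplified stand-in that holds only agent names, and `detectOrigin` is a hypothetical helper; the `patch` origin is not covered because the steps above do not describe how it is detected.

```go
package main

import "fmt"

// Config is a simplified stand-in that records only which agents exist.
type Config struct {
	Agents map[string]bool
}

// detectOrigin checks the raw config first, then the expanded one.
// The empty string means the agent exists in neither.
func detectOrigin(raw, expanded Config, name string) string {
	if raw.Agents[name] {
		return "inline"
	}
	if expanded.Agents[name] {
		return "derived"
	}
	return ""
}

func main() {
	raw := Config{Agents: map[string]bool{"worker": true}}
	expanded := Config{Agents: map[string]bool{"worker": true, "payments/reviewer": true}}
	fmt.Println(detectOrigin(raw, expanded, "worker"))
	fmt.Println(detectOrigin(raw, expanded, "payments/reviewer"))
}
```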
10.4 Suspend/Resume Fix
The suspend/resume semantic fix is implemented by changing the `controllerState` methods:
Before (runtime only):
configedit.Editor handles the serialization lock, raw config load,
validation, and atomic write. The caller just provides the mutation function.
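The editor pattern described above (lock, load, mutate, validate, atomic write) might look roughly like this. It is a sketch under stated assumptions: the `Editor` and `Config` types here are hypothetical analogues, not the real `configedit.Editor`, and JSON stands in for TOML so the example needs only the standard library.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// Config is a tiny stand-in for the city config (JSON here, TOML in Gas City).
type Config struct {
	Suspended bool `json:"suspended"`
}

// Editor is a hypothetical analogue of configedit.Editor.
type Editor struct {
	mu   sync.Mutex
	path string
}

// Edit loads the config, applies mutate, and atomically writes the result.
func (e *Editor) Edit(mutate func(*Config) error) error {
	e.mu.Lock() // serialize all config writes
	defer e.mu.Unlock()

	var cfg Config
	data, err := os.ReadFile(e.path)
	if err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	if len(data) > 0 {
		if err := json.Unmarshal(data, &cfg); err != nil {
			return err
		}
	}
	if err := mutate(&cfg); err != nil {
		return err
	}
	// Validation would run here, before anything touches disk.
	out, err := json.Marshal(cfg)
	if err != nil {
		return err
	}
	// Atomic write: temp file in the same directory, then rename over the target.
	tmp, err := os.CreateTemp(filepath.Dir(e.path), ".cfg-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless after a successful rename
	if _, err := tmp.Write(out); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), e.path)
}

// demo suspends a config in a throwaway directory and reads the flag back.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "gc-sketch")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	ed := &Editor{path: filepath.Join(dir, "city.json")}
	if err := ed.Edit(func(c *Config) error { c.Suspended = true; return nil }); err != nil {
		return false, err
	}
	data, err := os.ReadFile(ed.path)
	if err != nil {
		return false, err
	}
	var cfg Config
	err = json.Unmarshal(data, &cfg)
	return cfg.Suspended, err
}

func main() {
	fmt.Println(demo())
}
```

Because the caller only supplies the mutation closure, every write path shares the same locking and write discipline.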
10.5 CSRF and Read-Only Middleware Extension
Extend to all mutation methods (currently POST-only):

`withReadOnly`.
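A minimal sketch of such a method check in a CSRF middleware follows. `isMutationMethod` and the `X-GC-Request` header are named in this document; the middleware name `withCSRF`, the header value, and the `statusFor` helper are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// isMutationMethod reports whether the HTTP method can change state.
func isMutationMethod(method string) bool {
	switch method {
	case http.MethodPost, http.MethodPut, http.MethodPatch, http.MethodDelete:
		return true
	}
	return false
}

// withCSRF rejects mutation requests that lack the X-GC-Request header.
func withCSRF(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if isMutationMethod(r.Method) && r.Header.Get("X-GC-Request") == "" {
			http.Error(w, "csrf", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor runs one request through the middleware and returns the response code.
func statusFor(method string, withHeader bool) int {
	h := withCSRF(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(method, "/v0/agent/worker", nil)
	if withHeader {
		req.Header.Set("X-GC-Request", "1") // header value is an assumption
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("GET", false))
	fmt.Println(statusFor("DELETE", false))
	fmt.Println(statusFor("PATCH", true))
}
```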
11. Concurrency, Idempotency, and Operations
11.1 Optimistic Concurrency
Required on desired-state writes (PATCH, PUT, DELETE on resources). Not required on runtime actions (kill, drain, nudge; these are inherently imperative).

Read responses include:

- `ETag: "rv_184"` header
- `metadata.resource_version: "rv_184"` in body

Write requests include:

- `If-Match: "rv_184"` header

A stale value returns `412 Precondition Failed`.
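As an illustration of the precondition check, the sketch below derives an ETag from a SHA-256 hash of the serialized resource (the derivation listed in section 10.3) and compares it with the `If-Match` value. The function names and the truncation of the hash are assumptions; how a missing header is treated is left to the caller.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
)

// etagFor derives a quoted ETag from a resource's serialized form.
// Truncating the hash to 8 bytes is an assumption made for readability.
func etagFor(serialized []byte) string {
	sum := sha256.Sum256(serialized)
	return `"` + hex.EncodeToString(sum[:8]) + `"`
}

// checkIfMatch compares the client's If-Match value with the current ETag
// and returns the status the write should proceed or fail with.
func checkIfMatch(ifMatch string, current []byte) int {
	if ifMatch != etagFor(current) {
		return http.StatusPreconditionFailed
	}
	return http.StatusOK
}

func main() {
	v1 := []byte("suspended = false\n")
	v2 := []byte("suspended = true\n")
	tag := etagFor(v1)
	fmt.Println(checkIfMatch(tag, v1)) // resource unchanged since the read
	fmt.Println(checkIfMatch(tag, v2)) // resource changed by another writer
}
```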
11.2 Idempotency
Required on non-idempotent creates and deletes:

- `POST /v0/agents`: `Idempotency-Key` required
- `POST /v0/rigs`: `Idempotency-Key` required
- `DELETE /v0/agent/{name}`: `Idempotency-Key` required

Semantics:

- Same key + same request body hash → return original result
- Same key + different body hash → `422 Unprocessable Entity`
- Expired/evicted key → treated as new request
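The three rules above can be sketched as an in-memory cache with a TTL. The `Cache` type and its `Do` method are hypothetical, and the sketch holds the lock while the handler runs, which is a simplification that serializes all keyed requests.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
	"time"
)

type entry struct {
	bodyHash [32]byte
	result   string
	status   int
	expires  time.Time
}

// Cache is a hypothetical in-memory idempotency cache with a TTL.
type Cache struct {
	mu  sync.Mutex
	ttl time.Duration
	m   map[string]entry
}

func NewCache(ttl time.Duration) *Cache {
	return &Cache{ttl: ttl, m: make(map[string]entry)}
}

// Do runs the handler for new or expired keys, replays the stored result for
// a matching key and body, and reports 422 when the body differs.
func (c *Cache) Do(key string, body []byte, run func() string) (result string, status int, replayed bool) {
	h := sha256.Sum256(body)
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.m[key]; ok && time.Now().Before(e.expires) {
		if e.bodyHash != h {
			return "", 422, false
		}
		return e.result, e.status, true
	}
	res := run()
	c.m[key] = entry{bodyHash: h, result: res, status: 201, expires: time.Now().Add(c.ttl)}
	return res, 201, false
}

func main() {
	c := NewCache(30 * time.Minute)
	create := func() string { return "agent created" }
	fmt.Println(c.Do("k1", []byte(`{"name":"worker"}`), create)) // first call runs the handler
	fmt.Println(c.Do("k1", []byte(`{"name":"worker"}`), create)) // same key and body: replay
	fmt.Println(c.Do("k1", []byte(`{"name":"other"}`), create))  // same key, different body
}
```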
11.3 Dry-Run
Supported on desired-state mutation endpoints via `?dry_run=true`.
Behavior:
- Full validation runs
- Provenance checks run
- Optimistic concurrency checks run
- Response shows the would-be resource and reconciliation preview
- No city.toml write
- No operation record
- No audit event
12. Security
Threat Model
Same as today: single-user, local-machine operation.

| Threat | Mitigation |
|---|---|
| Cross-origin browser attacks | CORS (localhost-only) + CSRF header |
| Non-localhost exposure | Automatic read-only mode |
| Stale concurrent writes | Optimistic concurrency (If-Match) |
| Config injection | Full validation on all config mutations |
| Path traversal | Rig paths validated |
| Oversized requests | 1 MiB body limit |
| Duplicate side effects | Idempotency keys |
Destructive Operation Safety
| Operation | Protection |
|---|---|
| DELETE agent | Drain-first; ?force=true to skip |
| DELETE rig | 409 if agents running; ?force=true to skip |
| City stop | No extra protection (matches Ctrl-C) |
| Config apply | Dry-run available; validation always runs |
| DELETE bead | 409 if open children exist |
Future: Token Auth
When implemented, tokens will have scoped capabilities:

| Scope | Access |
|---|---|
| `gc.read` | All GET endpoints |
| `gc.write` | Desired-state mutations |
| `gc.runtime` | Runtime actions (kill, drain, nudge) |
| `gc.admin` | City lifecycle, config apply |
| `gc.operations` | Read/cancel operations |
13. Error Handling
Error Codes
| HTTP | Code | When |
|---|---|---|
| 400 | invalid | Malformed body, invalid field values |
| 404 | not_found | Resource doesn’t exist |
| 409 | conflict | Duplicate create, derived resource direct edit, busy delete |
| 412 | precondition_failed | Stale If-Match value |
| 422 | idempotency_mismatch | Same key, different request body |
| 403 | read_only | Non-localhost mutation |
| 403 | csrf | Missing X-GC-Request header |
| 501 | not_implemented | Capability not available on this controller |
| 500 | internal | Unexpected server error |
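For illustration, a handler helper that emits the `{code, message, details[]}` shape with the codes from the table might look like this. The `apiError` type, the `writeError` helper, the element type of `details`, and the message text are assumptions, not the existing implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// apiError mirrors the {code, message, details[]} shape.
// Using plain strings for details is an assumption.
type apiError struct {
	Code    string   `json:"code"`
	Message string   `json:"message"`
	Details []string `json:"details,omitempty"`
}

// writeError sends a structured JSON error with the given HTTP status.
func writeError(w http.ResponseWriter, status int, code, msg string, details ...string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(apiError{Code: code, Message: msg, Details: details})
}

// derivedEditError renders the 409 returned when a derived resource is edited directly.
func derivedEditError() (int, string) {
	rec := httptest.NewRecorder()
	writeError(rec, http.StatusConflict, "conflict",
		"agent is pack-derived",
		"create a patch via POST /v0/patches/agents")
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := derivedEditError()
	fmt.Println(code)
	fmt.Print(body)
}
```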
Recovery Model
- Desired-state commits are atomic (succeed or fail completely)
- Follow-on reconciliation may fail independently
- Failure is represented in operation status and resource conditions
- Retryable failures can be requeued via `POST /v0/operation/{id}/retry`
- Spec persisted but not yet healthy is the correct failure mode (K8s model)
14. Legacy Endpoint Policy
| Existing Endpoint | Policy |
|---|---|
| `POST /v0/agent/{name}/suspend` | Redefined: now writes city.toml |
| `POST /v0/agent/{name}/resume` | Redefined: now writes city.toml |
| `POST /v0/rig/{name}/suspend` | Redefined: now writes city.toml |
| `POST /v0/rig/{name}/resume` | Redefined: now writes city.toml |
| `POST /v0/agent/{name}/kill` | Kept, same path |
| `POST /v0/agent/{name}/drain` | Kept, same path |
| `POST /v0/agent/{name}/undrain` | Kept, same path |
| `POST /v0/agent/{name}/nudge` | Kept, same path |
| `POST /v0/bead/{id}/update` | Kept, deprecated in favor of `PATCH /v0/bead/{id}` |
| All bead/mail/convoy/sling | Kept, gain audit events + optional idempotency |
15. Delivery Phases
Phase 1: Fix Semantics + Agent/Rig CRUD ✅
The critical fix. Suspend/resume becomes desired-state. Add structural CRUD for agents and rigs. Endpoints delivered:

Files:

- `internal/fsys/atomic.go`: atomic file write helper (temp + rename)
- `internal/fsys/fsys.go`: added `Remove` to FS interface
- `internal/configedit/configedit.go`: serialized config editor with provenance detection
- `internal/configedit/configedit_test.go`: 33 tests
- `internal/api/state.go`: `AgentUpdate`/`RigUpdate` types, extended `StateMutator`
- `internal/api/handler_agent_crud.go`: agent create/update/delete handlers
- `internal/api/handler_rig_crud.go`: rig create/update/delete handlers
- `internal/api/handler_city.go`: city suspend/resume handler
- `internal/api/middleware.go`: `isMutationMethod()` for CSRF/read-only
- `cmd/gc/api_state.go`: suspend/resume rewritten to use `configedit.Editor`
- PUT (full replace) — PATCH-only is simpler and avoids the PUT=PATCH trap
- ETags / optimistic concurrency
- start/stop/restart/scale actions (remain as existing POST actions)
- Idempotency keys, dry-run mode
Phase 2: Providers + Config + Patch Resources ✅
Status: Delivered. 20 endpoints across 3 commits. Endpoints delivered:

- `configedit.Editor` methods: CreateProvider, UpdateProvider, DeleteProvider, SetAgentPatch, DeleteAgentPatch, SetRigPatch, DeleteRigPatch, SetProviderPatch, DeleteProviderPatch
- `api.ProviderUpdate` type with `*string`/`*int` fields
- `api.StateMutator` extended with provider + patch CRUD
- `cmd/gc/api_state.go` bridge to configedit
- Handler tests for all 20 endpoints
- ConfigEdit unit tests for all 9 new Editor methods

Files:

- `internal/api/handler_providers.go`: provider list/get
- `internal/api/handler_provider_crud.go`: provider create/update/delete
- `internal/api/handler_provider_crud_test.go`: provider tests
- `internal/api/handler_config.go`: config GET/explain/validate
- `internal/api/handler_config_test.go`: config tests
- `internal/api/handler_patches.go`: patch resource handlers
- `internal/api/handler_patches_test.go`: patch tests
- `internal/api/state.go`: ProviderUpdate type, extended StateMutator
- `internal/api/fake_state_test.go`: extended fake
- `internal/configedit/configedit.go`: 9 new Editor methods
- `internal/configedit/configedit_test.go`: 15 new tests
- `cmd/gc/api_state.go`: bridge methods
- Config apply (POST /v0/config) — complex diff/merge engine
- PUT (full replace) for providers
- Optimistic concurrency (ETags)
Phase 3: City Lifecycle + Orders + Operations ✅
Status: Delivered. Orders, events, enhanced status, rig restart all implemented. Endpoints implemented:

- `GET /v0/city`: city info
- Order CRUD: list/show/enable/disable
- `POST /v0/events`: event emission
- Enhanced status with uptime, version, agent counts
- `POST /v0/rig/{name}/restart`: kills all agents in rig (reconciler restarts)
- `POST /v0/agent/{name}/restart`: kills agent session (reconciler restarts)
Phase 4: Polish + Bead/Mail Extensions + Packs ✅
Status: Delivered. All bead/mail extensions, cursor pagination, and idempotency implemented. Endpoints implemented:

- Cursor pagination on list endpoints (beads, mail, convoys, events) via `?cursor=<opaque>&limit=N` with `next_cursor` in response
- `Idempotency-Key` header on `POST /v0/beads` and `POST /v0/mail` (in-memory cache with 30-minute TTL; 422 on key reuse with different body)
- `X-GC-Request-Id` on all responses (via middleware)
Phase 5: CLI as API Client ✅
Status: Delivered. No new endpoints; CLI routes writes through API when controller is running. Implementation:

- `internal/api/client.go`: HTTP client wrapping mutation endpoints (SuspendCity, ResumeCity, SuspendAgent, ResumeAgent, SuspendRig, ResumeRig)
- `cmd/gc/apiroute.go`: `apiClient(cityPath)` detects running controller with API, returns client or nil for fallback to direct mutation
- CLI commands wired: `gc suspend`, `gc resume`, `gc agent suspend/resume`, `gc rig suspend/resume`
- `internal/api/client_test.go`: 8 tests covering all client methods, error responses, and CSRF header propagation
16. Testing Strategy
Unit Tests
Every handler gets a `*_test.go` using `httptest.NewServer` with mock
`State`/`DesiredStateMutator`/`RuntimeMutator`. Coverage:
- Happy path (create, read, update, delete)
- Validation errors (missing fields, invalid values)
- Provenance rejection (409 on derived resource PATCH)
- Optimistic concurrency (412 on stale If-Match)
- Idempotency (replay returns cached result; mismatch returns 422)
- Dry-run (validation without write)
- CSRF rejection
- Read-only mode rejection
Integration Tests
Build-tagged `//go:build integration` tests:
- Start real controller with API enabled
- Create agent via API → verify city.toml updated → agent starts
- Suspend via API → verify city.toml has `suspended=true` → survives restart
- Concurrent PATCH with stale version → verify 412
- Rig create with bootstrap → verify operation progresses to Succeeded
Backward Compatibility Tests
- All existing request/response shapes unchanged
- `POST /v0/bead/{id}/update` still works
- `POST /v0/agent/{name}/suspend` still works (now with correct semantics)
17. Open Questions
Before Accepting
- Metadata store format. Resolved: no metadata store. All metadata is derived from city.toml, the expanded config, and the event log. No state files.
- Optimistic concurrency on legacy suspend/resume: Should the redefined `POST /v0/agent/{name}/suspend` require `If-Match`? The old endpoint didn’t. Adding it is technically a breaking change. Recommendation: Optional in Phase 1. Clients that don’t send `If-Match` get last-writer-wins (same as today). Clients that do send it get safety.
- Agent vs AgentPool as separate resources: The Codex doc suggests a separate `AgentPool` kind. This doc proposes one `Agent` kind with optional pool block. Recommendation: One resource. The config model already works this way. Singleton→pool conversion is a spec change, not a type change.
- Patch resource naming: Should patch resources for agent `payments/reviewer` be named `payments-reviewer-override` (opaque) or just target `payments/reviewer` (one patch per target)? Recommendation: One patch per target, named by target. Multiple patches for the same target would be confusing.
- Config apply scope: Should `POST /v0/config/apply` accept JSON only, or also TOML? Recommendation: JSON only for v0.
During Implementation
- Default retention period for operations (recommend: 7 days)
- Whether `POST /v0/bead/{id}/update` should emit a `Deprecation` header
- Which phase adds `gc order run` API parity (recommend: Phase 3)
18. Alternatives Considered
A. Keep Current Split Model
Leave CLI as desired-state writer, API as runtime writer. Add missing endpoints ad hoc.

Rejected: Suspend/resume semantics stay broken. Every new endpoint rediscovers the same rules. Pack-derived writes stay ambiguous.

B. /v0/state/* and /v0/runtime/* URL Prefix Split
Separate URL namespaces for desired-state and runtime operations (from
gc-api-state-mutations-v0.md).
Rejected for v0: Adds cognitive overhead (users must pick the right
prefix). Backward-incompatible with existing endpoints. The HTTP method
+ action subresource already communicates intent. The conceptual distinction is preserved in documentation and error messages, not URL structure.
C. Full Kubernetes-Shaped API
Full `metadata`, API groups, discovery documents, admission webhooks, scale
subresources.
Rejected: Too much ceremony for a single-binary SDK serving one city.
We adopt K8s patterns (spec/status, generation, conditions, optimistic
concurrency) without K8s structure.
D. Thin CLI Wrapper
Shell out to `gc rig add`, `gc agent add`, etc. from API handlers.
Rejected: Couples API to CLI output format. Prevents typed idempotency,
concurrency control, and structured operations. Repeats the dashboard
subprocess problem.