# Symphony Service Specification

Status: Draft v1 (language-agnostic)

Purpose: Define a service that orchestrates coding agents to get project work done.

## 1. Problem Statement

Symphony is a long-running automation service that continuously reads work from an issue tracker
(Linear in this specification version), creates an isolated workspace for each issue, and runs a
coding agent session for that issue inside the workspace.

The service solves four operational problems:

- It turns issue execution into a repeatable daemon workflow instead of manual scripts.
- It isolates agent execution in per-issue workspaces so agent commands run only inside per-issue
  workspace directories.
- It keeps the workflow policy in-repo (`WORKFLOW.md`) so teams version the agent prompt and runtime
  settings with their code.
- It provides enough observability to operate and debug multiple concurrent agent runs.

Implementations are expected to document their trust and safety posture explicitly. This
specification does not require a single approval, sandbox, or operator-confirmation policy; some
implementations may target trusted environments with a high-trust configuration, while others may
require stricter approvals or sandboxing.

Important boundary:

- Symphony is a scheduler/runner and tracker reader.
- Ticket writes (state transitions, comments, PR links) are typically performed by the coding agent
  using tools available in the workflow/runtime environment.
- A successful run may end at a workflow-defined handoff state (for example `Human Review`), not
  necessarily `Done`.

## 2. Goals and Non-Goals

### 2.1 Goals

- Poll the issue tracker on a fixed cadence and dispatch work with bounded concurrency.
- Maintain a single authoritative orchestrator state for dispatch, retries, and reconciliation.
- Create deterministic per-issue workspaces and preserve them across runs.
- Stop active runs when issue state changes make them ineligible.
- Recover from transient failures with exponential backoff.
- Load runtime behavior from a repository-owned `WORKFLOW.md` contract.
- Expose operator-visible observability (at minimum structured logs).
- Support restart recovery without requiring a persistent database.

### 2.2 Non-Goals

- Rich web UI or multi-tenant control plane.
- Prescribing a specific dashboard or terminal UI implementation.
- General-purpose workflow engine or distributed job scheduler.
- Built-in business logic for how to edit tickets, PRs, or comments. (That logic lives in the
  workflow prompt and agent tooling.)
- Mandating strong sandbox controls beyond what the coding agent and host OS provide.
- Mandating a single default approval, sandbox, or operator-confirmation posture for all
  implementations.

## 3. System Overview

### 3.1 Main Components

1. `Workflow Loader`
   - Reads `WORKFLOW.md`.
   - Parses YAML front matter and prompt body.
   - Returns `{config, prompt_template}`.

2. `Config Layer`
   - Exposes typed getters for workflow config values.
   - Applies defaults and environment variable indirection.
   - Performs validation used by the orchestrator before dispatch.

3. `Issue Tracker Client`
   - Fetches candidate issues in active states.
   - Fetches current states for specific issue IDs (reconciliation).
   - Fetches terminal-state issues during startup cleanup.
   - Normalizes tracker payloads into a stable issue model.

4. `Orchestrator`
   - Owns the poll tick.
   - Owns the in-memory runtime state.
   - Decides which issues to dispatch, retry, stop, or release.
   - Tracks session metrics and retry queue state.

5. `Workspace Manager`
   - Maps issue identifiers to workspace paths.
   - Ensures per-issue workspace directories exist.
   - Runs workspace lifecycle hooks.
   - Cleans workspaces for terminal issues.

6. `Agent Runner`
   - Creates workspace.
   - Builds prompt from issue + workflow template.
   - Launches the coding agent app-server client.
   - Streams agent updates back to the orchestrator.

7. `Status Surface` (optional)
   - Presents human-readable runtime status (for example terminal output, dashboard, or other
     operator-facing view).

8. `Logging`
   - Emits structured runtime logs to one or more configured sinks.

### 3.2 Abstraction Levels

Symphony is easiest to port when kept in these layers:

1. `Policy Layer` (repo-defined)
   - `WORKFLOW.md` prompt body.
   - Team-specific rules for ticket handling, validation, and handoff.

2. `Configuration Layer` (typed getters)
   - Parses front matter into typed runtime settings.
   - Handles defaults, environment tokens, and path normalization.

3. `Coordination Layer` (orchestrator)
   - Polling loop, issue eligibility, concurrency, retries, reconciliation.

4. `Execution Layer` (workspace + agent subprocess)
   - Filesystem lifecycle, workspace preparation, coding-agent protocol.

5. `Integration Layer` (Linear adapter)
   - API calls and normalization for tracker data.

6. `Observability Layer` (logs + optional status surface)
   - Operator visibility into orchestrator and agent behavior.

### 3.3 External Dependencies

- Issue tracker API (Linear for `tracker.kind: linear` in this specification version).
- Local filesystem for workspaces and logs.
- Optional workspace population tooling (for example Git CLI, if used).
- Coding-agent executable that supports JSON-RPC-like app-server mode over stdio.
- Host environment authentication for the issue tracker and coding agent.

## 4. Core Domain Model

### 4.1 Entities

#### 4.1.1 Issue

Normalized issue record used by orchestration, prompt rendering, and observability output.

Fields:

- `id` (string)
  - Stable tracker-internal ID.
- `identifier` (string)
  - Human-readable ticket key (example: `ABC-123`).
- `title` (string)
- `description` (string or null)
- `priority` (integer or null)
  - Lower numbers are higher priority in dispatch sorting.
- `state` (string)
  - Current tracker state name.
- `branch_name` (string or null)
  - Tracker-provided branch metadata if available.
- `url` (string or null)
- `labels` (list of strings)
  - Normalized to lowercase.
- `blocked_by` (list of blocker refs)
  - Each blocker ref contains:
    - `id` (string or null)
    - `identifier` (string or null)
    - `state` (string or null)
- `created_at` (timestamp or null)
- `updated_at` (timestamp or null)

#### 4.1.2 Workflow Definition

Parsed `WORKFLOW.md` payload:

- `config` (map)
  - YAML front matter root object.
- `prompt_template` (string)
  - Markdown body after front matter, trimmed.

#### 4.1.3 Service Config (Typed View)

Typed runtime values derived from `WorkflowDefinition.config` plus environment resolution.

Examples:

- poll interval
- workspace root
- active and terminal issue states
- concurrency limits
- coding-agent executable/args/timeouts
- workspace hooks

#### 4.1.4 Workspace

Filesystem workspace assigned to one issue identifier.

Fields (logical):

- `path` (workspace path; current runtime typically uses absolute paths, but relative roots are
  possible if configured without path separators)
- `workspace_key` (sanitized issue identifier)
- `created_now` (boolean, used to gate `after_create` hook)

#### 4.1.5 Run Attempt

One execution attempt for one issue.

Fields (logical):

- `issue_id`
- `issue_identifier`
- `attempt` (integer or null, `null` for first run, `>=1` for retries/continuation)
- `workspace_path`
- `started_at`
- `status`
- `error` (optional)

#### 4.1.6 Live Session (Agent Session Metadata)

State tracked while a coding-agent subprocess is running.

Fields:

- `session_id` (string, `<thread_id>-<turn_id>`)
- `thread_id` (string)
- `turn_id` (string)
- `codex_app_server_pid` (string or null)
- `last_codex_event` (string/enum or null)
- `last_codex_timestamp` (timestamp or null)
- `last_codex_message` (summarized payload)
- `codex_input_tokens` (integer)
- `codex_output_tokens` (integer)
- `codex_total_tokens` (integer)
- `last_reported_input_tokens` (integer)
- `last_reported_output_tokens` (integer)
- `last_reported_total_tokens` (integer)
- `turn_count` (integer)
  - Number of coding-agent turns started within the current worker lifetime.

#### 4.1.7 Retry Entry

Scheduled retry state for an issue.

Fields:

- `issue_id`
- `identifier` (best-effort human ID for status surfaces/logs)
- `attempt` (integer, 1-based for retry queue)
- `due_at_ms` (monotonic clock timestamp)
- `timer_handle` (runtime-specific timer reference)
- `error` (string or null)

#### 4.1.8 Orchestrator Runtime State

Single authoritative in-memory state owned by the orchestrator.

Fields:

- `poll_interval_ms` (current effective poll interval)
- `max_concurrent_agents` (current effective global concurrency limit)
- `running` (map `issue_id -> running entry`)
- `claimed` (set of issue IDs reserved/running/retrying)
- `retry_attempts` (map `issue_id -> RetryEntry`)
- `completed` (set of issue IDs; bookkeeping only, not dispatch gating)
- `codex_totals` (aggregate tokens + runtime seconds)
- `codex_rate_limits` (latest rate-limit snapshot from agent events)

### 4.2 Stable Identifiers and Normalization Rules

- `Issue ID`
  - Use for tracker lookups and internal map keys.
- `Issue Identifier`
  - Use for human-readable logs and workspace naming.
- `Workspace Key`
  - Derive from `issue.identifier` by replacing any character not in `[A-Za-z0-9._-]` with `_`.
  - Use the sanitized value for the workspace directory name.
- `Normalized Issue State`
  - Compare states after `trim` + `lowercase`.
- `Session ID`
  - Compose from coding-agent `thread_id` and `turn_id` as `<thread_id>-<turn_id>`.
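
The normalization rules above are small enough to sketch directly. The following Python is one possible reading, not a reference implementation; the function names `workspace_key`, `normalize_state`, and `session_id` are hypothetical helpers chosen for illustration.

```python
import re

# Any character outside [A-Za-z0-9._-] is replaced with "_".
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def workspace_key(identifier: str) -> str:
    """Derive the workspace directory name from an issue identifier."""
    return _UNSAFE_CHARS.sub("_", identifier)


def normalize_state(state: str) -> str:
    """Normalize a tracker state name for comparison (trim + lowercase)."""
    return state.strip().lower()


def session_id(thread_id: str, turn_id: str) -> str:
    """Compose the session ID as <thread_id>-<turn_id>."""
    return f"{thread_id}-{turn_id}"
```

Under this sketch, an identifier that already matches the allowed character set (such as `ABC-123`) passes through unchanged, so the workspace directory name stays recognizable in logs.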

## 5. Workflow Specification (Repository Contract)

### 5.1 File Discovery and Path Resolution

Workflow file path precedence:

1. Explicit application/runtime setting (set by CLI startup path).
2. Default: `WORKFLOW.md` in the current process working directory.

Loader behavior:

- If the file cannot be read, return `missing_workflow_file` error.
- The workflow file is expected to be repository-owned and version-controlled.

### 5.2 File Format

`WORKFLOW.md` is a Markdown file with optional YAML front matter.

Design note:

- `WORKFLOW.md` should be self-contained enough to describe and run different workflows (prompt,
  runtime settings, hooks, and tracker selection/config) without requiring out-of-band
  service-specific configuration.

Parsing rules:

- If the file starts with `---`, parse lines until the next `---` as YAML front matter.
- Remaining lines become the prompt body.
- If front matter is absent, treat the entire file as prompt body and use an empty config map.
- YAML front matter must decode to a map/object; non-map YAML is an error.
- Prompt body is trimmed before use.

Returned workflow object:

- `config`: front matter root object (not nested under a `config` key).
- `prompt_template`: trimmed Markdown body.
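
As a rough illustration of the parsing rules above, the sketch below splits front matter from the prompt body and checks that the decoded config is a map. It is an assumption-laden example: `split_front_matter` and `load_workflow` are hypothetical names, the YAML decoder is passed in as a caller-supplied `parse_yaml` callable (for instance a safe loader from a YAML library), and the handling of an unterminated `---` block and of an empty front matter block is a guess, since the spec does not cover either case.

```python
from __future__ import annotations

from typing import Any, Callable


def split_front_matter(text: str) -> tuple[str | None, str]:
    """Return (front_matter_text or None, trimmed prompt body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No front matter: the whole file is the prompt body.
        return None, text.strip()
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            front = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).strip()
            return front, body
    # Assumption: an opening '---' without a closing one is a parse error.
    raise ValueError("workflow_parse_error: front matter is not closed by '---'")


def load_workflow(text: str, parse_yaml: Callable[[str], Any]) -> dict[str, Any]:
    """Build the {config, prompt_template} object from WORKFLOW.md text."""
    front, body = split_front_matter(text)
    if front is None:
        return {"config": {}, "prompt_template": body}
    try:
        config = parse_yaml(front)
    except Exception as exc:
        raise ValueError(f"workflow_parse_error: {exc}") from exc
    # Assumption: anything that is not a map (including an empty block that
    # decodes to None) is rejected.
    if not isinstance(config, dict):
        raise ValueError("workflow_front_matter_not_a_map")
    return {"config": config, "prompt_template": body}
```

Keeping the YAML decoder injectable keeps the split logic testable without committing the sketch to a particular YAML library.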

### 5.3 Front Matter Schema

Top-level keys:

- `tracker`
- `polling`
- `workspace`
- `hooks`
- `agent`
- `codex`

Unknown keys should be ignored for forward compatibility.

Note:

- The workflow front matter is extensible. Optional extensions may define additional top-level keys
  (for example `server`) without changing the core schema above.
- Extensions should document their field schema, defaults, validation rules, and whether changes
  apply dynamically or require restart.
- Common extension: `server.port` (integer) enables the optional HTTP server described in Section
  13.7.

#### 5.3.1 `tracker` (object)

Fields:

- `kind` (string)
  - Required for dispatch.
  - Current supported value: `linear`
- `endpoint` (string)
  - Default for `tracker.kind == "linear"`: `https://api.linear.app/graphql`
- `api_key` (string)
  - May be a literal token or `$VAR_NAME`.
  - Canonical environment variable for `tracker.kind == "linear"`: `LINEAR_API_KEY`.
  - If `$VAR_NAME` resolves to an empty string, treat the key as missing.
- `project_slug` (string)
  - Required for dispatch when `tracker.kind == "linear"`.
- `active_states` (list of strings or comma-separated string)
  - Default: `Todo`, `In Progress`
- `terminal_states` (list of strings or comma-separated string)
  - Default: `Closed`, `Cancelled`, `Canceled`, `Duplicate`, `Done`
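
Two of the `tracker` fields need a small amount of coercion: `api_key` may be an environment reference, and the state lists may arrive as a list or as a comma-separated string. The sketch below shows one way to handle both. The helper names `resolve_api_key` and `parse_states` are hypothetical, and the sketch deliberately does not fall back to `LINEAR_API_KEY` when `api_key` is absent, because the spec names that variable as canonical without stating an automatic fallback.

```python
import os
from typing import Mapping, Optional, Sequence, Union

DEFAULT_ACTIVE_STATES = ["Todo", "In Progress"]
DEFAULT_TERMINAL_STATES = ["Closed", "Cancelled", "Canceled", "Duplicate", "Done"]


def resolve_api_key(raw: Optional[str], env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolve a literal token or a $VAR_NAME reference; None means missing."""
    if not raw:
        return None
    if raw.startswith("$"):
        # An unset or empty variable is treated as a missing key.
        return env.get(raw[1:], "") or None
    return raw


def parse_states(raw: Union[None, str, Sequence[str]], default: Sequence[str]) -> list:
    """Accept a list of strings or a comma-separated string of state names."""
    if raw is None:
        return list(default)
    items = raw.split(",") if isinstance(raw, str) else list(raw)
    return [str(item).strip() for item in items if str(item).strip()]
```

Passing the environment as a parameter (defaulting to `os.environ`) is a design choice of this sketch so the resolution logic can be exercised without mutating process state.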

#### 5.3.2 `polling` (object)

Fields:

- `interval_ms` (integer or string integer)
  - Default: `30000`
  - Changes should be re-applied at runtime and affect future tick scheduling without restart.

#### 5.3.3 `workspace` (object)

Fields:

- `root` (path string or `$VAR`)
  - Default: `<system-temp>/symphony_workspaces`
  - `~` and strings containing path separators are expanded.
  - Bare strings without path separators are preserved as-is (relative roots are allowed but
    discouraged).

#### 5.3.4 `hooks` (object)

Fields:

- `after_create` (multiline shell script string, optional)
  - Runs only when a workspace directory is newly created.
  - Failure aborts workspace creation.
- `before_run` (multiline shell script string, optional)
  - Runs before each agent attempt after workspace preparation and before launching the coding
    agent.
  - Failure aborts the current attempt.
- `after_run` (multiline shell script string, optional)
  - Runs after each agent attempt (success, failure, timeout, or cancellation) once the workspace
    exists.
  - Failure is logged but ignored.
- `before_remove` (multiline shell script string, optional)
  - Runs before workspace deletion if the directory exists.
  - Failure is logged but ignored; cleanup still proceeds.
- `timeout_ms` (integer, optional)
  - Default: `60000`
  - Applies to all workspace hooks.
  - Non-positive values should be treated as invalid and fall back to the default.
  - Changes should be re-applied at runtime for future hook executions.

#### 5.3.5 `agent` (object)

Fields:

- `max_concurrent_agents` (integer or string integer)
  - Default: `10`
  - Changes should be re-applied at runtime and affect subsequent dispatch decisions.
- `max_turns` (integer)
  - Default: `20`
  - Upper bound on back-to-back coding-agent turns within one worker session (see Section 7.1).
- `max_retry_backoff_ms` (integer or string integer)
  - Default: `300000` (5 minutes)
  - Changes should be re-applied at runtime and affect future retry scheduling.
- `max_concurrent_agents_by_state` (map `state_name -> positive integer`)
  - Default: empty map.
  - State keys are normalized (`trim` + `lowercase`) for lookup.
  - Invalid entries (non-positive or non-numeric) are ignored.

#### 5.3.6 `codex` (object)

Fields:

For Codex-owned config values such as `approval_policy`, `thread_sandbox`, and
`turn_sandbox_policy`, supported values are defined by the targeted Codex app-server version.
Implementors should treat them as pass-through Codex config values rather than relying on a
hand-maintained enum in this spec. To inspect the installed Codex schema, run
`codex app-server generate-json-schema --out <dir>` and inspect the relevant definitions referenced
by `v2/ThreadStartParams.json` and `v2/TurnStartParams.json`. Implementations may validate these
fields locally if they want stricter startup checks.

- `command` (string shell command)
  - Default: `codex app-server`
  - The runtime launches this command via `bash -lc` in the workspace directory.
  - The launched process must speak a compatible app-server protocol over stdio.
- `approval_policy` (Codex `AskForApproval` value)
  - Default: implementation-defined.
- `thread_sandbox` (Codex `SandboxMode` value)
  - Default: implementation-defined.
- `turn_sandbox_policy` (Codex `SandboxPolicy` value)
  - Default: implementation-defined.
- `turn_timeout_ms` (integer)
  - Default: `3600000` (1 hour)
- `read_timeout_ms` (integer)
  - Default: `5000`
- `stall_timeout_ms` (integer)
  - Default: `300000` (5 minutes)
  - If `<= 0`, stall detection is disabled.

### 5.4 Prompt Template Contract

The Markdown body of `WORKFLOW.md` is the per-issue prompt template.

Rendering requirements:

- Use a strict template engine (Liquid-compatible semantics are sufficient).
- Unknown variables must fail rendering.
- Unknown filters must fail rendering.

Template input variables:

- `issue` (object)
  - Includes all normalized issue fields, including labels and blockers.
- `attempt` (integer or null)
  - `null`/absent on first attempt.
  - Integer on retry or continuation run.

Fallback prompt behavior:

- If the workflow prompt body is empty, the runtime may use a minimal default prompt
  (`You are working on an issue from Linear.`).
- Workflow file read/parse failures are configuration/validation errors and should not silently fall
  back to a prompt.

### 5.5 Workflow Validation and Error Surface

Error classes:

- `missing_workflow_file`
- `workflow_parse_error`
- `workflow_front_matter_not_a_map`
- `template_parse_error` (during prompt rendering)
- `template_render_error` (unknown variable/filter, invalid interpolation)

Dispatch gating behavior:

- Workflow file read/YAML errors block new dispatches until fixed.
- Template errors fail only the affected run attempt.

## 6. Configuration Specification

### 6.1 Source Precedence and Resolution Semantics

Configuration precedence:

1. Workflow file path selection (runtime setting -> cwd default).
2. YAML front matter values.
3. Environment indirection via `$VAR_NAME` inside selected YAML values.
4. Built-in defaults.

Value coercion semantics:

- Path/command fields support:
  - `~` home expansion
  - `$VAR` expansion for env-backed path values
  - Apply expansion only to values intended to be local filesystem paths; do not rewrite URIs or
    arbitrary shell command strings.

### 6.2 Dynamic Reload Semantics

Dynamic reload is required:

- The software should watch `WORKFLOW.md` for changes.
- On change, it should re-read and re-apply workflow config and prompt template without restart.
- The software should attempt to adjust live behavior to the new config (for example polling
  cadence, concurrency limits, active/terminal states, codex settings, workspace paths/hooks, and
  prompt content for future runs).
- Reloaded config applies to future dispatch, retry scheduling, reconciliation decisions, hook
  execution, and agent launches.
- Implementations are not required to restart in-flight agent sessions automatically when config
  changes.
- Extensions that manage their own listeners/resources (for example an HTTP server port change) may
  require restart unless the implementation explicitly supports live rebind.
- Implementations should also re-validate/reload defensively during runtime operations (for example
  before dispatch) in case filesystem watch events are missed.
- Invalid reloads should not crash the service; keep operating with the last known good effective
  configuration and emit an operator-visible error.
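
The "last known good" rule in the final bullet can be illustrated with a small holder object. This is only a sketch under assumptions: the class name `EffectiveConfig`, the `load` and `validate` callables, and the `last_error` field are all hypothetical, and file watching itself is left out.

```python
from __future__ import annotations

from typing import Any, Callable


class EffectiveConfig:
    """Keeps the last known good config and swaps it only on a valid reload."""

    def __init__(
        self,
        load: Callable[[], dict[str, Any]],
        validate: Callable[[dict[str, Any]], None],
    ) -> None:
        self._load = load
        self._validate = validate
        self.last_error: str | None = None
        # Startup failures propagate to the caller, so startup itself fails.
        initial = load()
        validate(initial)
        self.current = initial

    def reload(self) -> bool:
        """Try to apply a new config; keep the old one if anything fails."""
        try:
            candidate = self._load()
            self._validate(candidate)
        except Exception as exc:
            # Operator-visible error; the service keeps running on self.current.
            self.last_error = f"workflow reload rejected: {exc}"
            return False
        self.current = candidate
        self.last_error = None
        return True
```

A caller could invoke `reload()` both from a file-watch callback and defensively before each dispatch cycle, which covers the case where watch events are missed.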

### 6.3 Dispatch Preflight Validation

This validation is a scheduler preflight that runs before attempting to dispatch new work. It
validates the workflow/config needed to poll and launch workers; it is not a full audit of all
possible workflow behavior.

Startup validation:

- Validate configuration before starting the scheduling loop.
- If startup validation fails, fail startup and emit an operator-visible error.

Per-tick dispatch validation:

- Re-validate before each dispatch cycle.
- If validation fails, skip dispatch for that tick, keep reconciliation active, and emit an
  operator-visible error.

Validation checks:

- Workflow file can be loaded and parsed.
- `tracker.kind` is present and supported.
- `tracker.api_key` is present after `$` resolution.
- `tracker.project_slug` is present when required by the selected tracker kind.
- `codex.command` is present and non-empty.
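
The checks above map naturally onto a function that returns a list of problems, where an empty list means dispatch may proceed. The sketch below assumes the workflow file has already been loaded and that `tracker.api_key` has already gone through `$` resolution; the function name `preflight_errors` and the error strings are hypothetical.

```python
from typing import Any, Dict, List

SUPPORTED_TRACKER_KINDS = {"linear"}
DEFAULT_CODEX_COMMAND = "codex app-server"


def preflight_errors(config: Dict[str, Any]) -> List[str]:
    """Return human-readable validation failures for an already-parsed config."""
    errors: List[str] = []
    tracker = config.get("tracker") or {}
    kind = tracker.get("kind")

    if not kind:
        errors.append("tracker.kind is missing")
    elif kind not in SUPPORTED_TRACKER_KINDS:
        errors.append(f"tracker.kind is not supported: {kind}")

    # Assumes $VAR indirection was resolved earlier; empty means missing.
    if not tracker.get("api_key"):
        errors.append("tracker.api_key is missing")

    if kind == "linear" and not tracker.get("project_slug"):
        errors.append("tracker.project_slug is required for linear")

    codex = config.get("codex") or {}
    command = codex.get("command")
    if command is None:
        command = DEFAULT_CODEX_COMMAND
    if not isinstance(command, str) or not command.strip():
        errors.append("codex.command is empty")

    return errors
```

Returning every failure at once, rather than raising on the first, is a choice made here so a single operator-visible message can list everything that needs fixing.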

### 6.4 Config Fields Summary (Cheat Sheet)

This section is intentionally redundant so a coding agent can implement the config layer quickly.

- `tracker.kind`: string, required, currently `linear`
- `tracker.endpoint`: string, default `https://api.linear.app/graphql` when `tracker.kind=linear`
- `tracker.api_key`: string or `$VAR`, canonical env `LINEAR_API_KEY` when `tracker.kind=linear`
- `tracker.project_slug`: string, required when `tracker.kind=linear`
- `tracker.active_states`: list/string, default `Todo, In Progress`
- `tracker.terminal_states`: list/string, default `Closed, Cancelled, Canceled, Duplicate, Done`
- `polling.interval_ms`: integer, default `30000`
- `workspace.root`: path, default `<system-temp>/symphony_workspaces`
- `hooks.after_create`: shell script or null
- `hooks.before_run`: shell script or null
- `hooks.after_run`: shell script or null
- `hooks.before_remove`: shell script or null
- `hooks.timeout_ms`: integer, default `60000`
- `agent.max_concurrent_agents`: integer, default `10`
- `agent.max_turns`: integer, default `20`
- `agent.max_retry_backoff_ms`: integer, default `300000` (5m)
- `agent.max_concurrent_agents_by_state`: map of positive integers, default `{}`
- `codex.command`: shell command string, default `codex app-server`
- `codex.approval_policy`: Codex `AskForApproval` value, default implementation-defined
- `codex.thread_sandbox`: Codex `SandboxMode` value, default implementation-defined
- `codex.turn_sandbox_policy`: Codex `SandboxPolicy` value, default implementation-defined
- `codex.turn_timeout_ms`: integer, default `3600000`
- `codex.read_timeout_ms`: integer, default `5000`
- `codex.stall_timeout_ms`: integer, default `300000`
- `server.port` (extension): integer, optional; enables the optional HTTP server, `0` may be used
  for ephemeral local bind, and CLI `--port` overrides it

## 7. Orchestration State Machine

The orchestrator is the only component that mutates scheduling state. All worker outcomes are
reported back to it and converted into explicit state transitions.

### 7.1 Issue Orchestration States

This is not the same as tracker states (`Todo`, `In Progress`, etc.). This is the service's internal
claim state.

1. `Unclaimed`
   - Issue is not running and has no retry scheduled.

2. `Claimed`
   - Orchestrator has reserved the issue to prevent duplicate dispatch.
   - In practice, claimed issues are either `Running` or `RetryQueued`.

3. `Running`
   - Worker task exists and the issue is tracked in `running` map.

4. `RetryQueued`
   - Worker is not running, but a retry timer exists in `retry_attempts`.

5. `Released`
   - Claim removed because issue is terminal, non-active, missing, or retry path completed without
     re-dispatch.

Important nuance:

- A successful worker exit does not mean the issue is done forever.
- The worker may continue through multiple back-to-back coding-agent turns before it exits.
- After each normal turn completion, the worker re-checks the tracker issue state.
- If the issue is still in an active state, the worker should start another turn on the same live
  coding-agent thread in the same workspace, up to `agent.max_turns`.
- The first turn should use the full rendered task prompt.
- Continuation turns should send only continuation guidance to the existing thread, not resend the
  original task prompt that is already present in thread history.
- Once the worker exits normally, the orchestrator still schedules a short continuation retry
  (about 1 second) so it can re-check whether the issue remains active and needs another worker
  session.

### 7.2 Run Attempt Lifecycle

A run attempt transitions through these phases:

1. `PreparingWorkspace`
2. `BuildingPrompt`
3. `LaunchingAgentProcess`
4. `InitializingSession`
5. `StreamingTurn`
6. `Finishing`
7. `Succeeded`
8. `Failed`
9. `TimedOut`
10. `Stalled`
11. `CanceledByReconciliation`

Distinct terminal reasons are important because retry logic and logs differ.

### 7.3 Transition Triggers

- `Poll Tick`
  - Reconcile active runs.
  - Validate config.
  - Fetch candidate issues.
  - Dispatch until slots are exhausted.

- `Worker Exit (normal)`
  - Remove running entry.
  - Update aggregate runtime totals.
  - Schedule continuation retry (attempt `1`) after the worker exhausts or finishes its in-process
    turn loop.

- `Worker Exit (abnormal)`
  - Remove running entry.
  - Update aggregate runtime totals.
  - Schedule exponential-backoff retry.

- `Codex Update Event`
  - Update live session fields, token counters, and rate limits.

- `Retry Timer Fired`
  - Re-fetch active candidates and attempt re-dispatch, or release claim if no longer eligible.

- `Reconciliation State Refresh`
  - Stop runs whose issue states are terminal or no longer active.

- `Stall Timeout`
  - Kill worker and schedule retry.

### 7.4 Idempotency and Recovery Rules

- The orchestrator serializes state mutations through one authority to avoid duplicate dispatch.
- `claimed` and `running` checks are required before launching any worker.
- Reconciliation runs before dispatch on every tick.
- Restart recovery is tracker-driven and filesystem-driven (no durable orchestrator DB required).
- Startup terminal cleanup removes stale workspaces for issues already in terminal states.

## 8. Polling, Scheduling, and Reconciliation

### 8.1 Poll Loop

At startup, the service validates config, performs startup cleanup, schedules an immediate tick, and
then repeats every `polling.interval_ms`.

The effective poll interval should be updated when workflow config changes are re-applied.

Tick sequence:

1. Reconcile running issues.
2. Run dispatch preflight validation.
3. Fetch candidate issues from tracker using active states.
4. Sort issues by dispatch priority.
5. Dispatch eligible issues while slots remain.
6. Notify observability/status consumers of state changes.

If per-tick validation fails, dispatch is skipped for that tick, but reconciliation still happens
first.

### 8.2 Candidate Selection Rules

An issue is dispatch-eligible only if all of the following are true:

- It has `id`, `identifier`, `title`, and `state`.
- Its state is in `active_states` and not in `terminal_states`.
- It is not already in `running`.
- It is not already in `claimed`.
- Global concurrency slots are available.
- Per-state concurrency slots are available.
- Blocker rule for `Todo` state passes:
  - If the issue state is `Todo`, do not dispatch when any blocker is non-terminal.

Sorting order (stable intent):

1. `priority` ascending (1..4 are preferred; null/unknown sorts last)
2. `created_at` oldest first
3. `identifier` lexicographic tie-breaker
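
The sorting order above can be expressed as a single sort key. The sketch below is one interpretation: it treats any priority that is not an integer in the range 1..4 as "unknown" and places issues without `created_at` after those that have one. Both choices are assumptions, since the spec only states the intent, and `dispatch_sort_key` is a hypothetical name.

```python
from datetime import datetime, timezone


def dispatch_sort_key(issue: dict) -> tuple:
    """Sort key: priority ascending, then oldest created_at, then identifier."""
    priority = issue.get("priority")
    known = (
        isinstance(priority, int)
        and not isinstance(priority, bool)
        and 1 <= priority <= 4
    )
    created = issue.get("created_at")
    return (
        not known,                               # known priorities sort first
        priority if known else 0,                # lower number = higher priority
        created is None,                         # assumption: missing dates last
        created if created is not None else 0,   # only compared when both are set
        issue["identifier"],                     # lexicographic tie-breaker
    )


if __name__ == "__main__":
    issues = [
        {"identifier": "ABC-2", "priority": None,
         "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"identifier": "ABC-1", "priority": 1,
         "created_at": datetime(2024, 1, 9, tzinfo=timezone.utc)},
    ]
    print([i["identifier"] for i in sorted(issues, key=dispatch_sort_key)])
```

The boolean flags in front of each optional value keep `None` from ever being compared with a number or a timestamp, which would otherwise raise a `TypeError`.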

### 8.3 Concurrency Control

Global limit:

- `available_slots = max(max_concurrent_agents - running_count, 0)`

Per-state limit:

- `max_concurrent_agents_by_state[state]` if present (state key normalized)
- otherwise fallback to global limit

The runtime counts issues by their current tracked state in the `running` map.
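
A minimal sketch of the slot arithmetic, assuming the running issues are available as a list of their tracked state names. The helper names (`normalize_state_limits`, `global_slots`, `state_slots`) are hypothetical; the only formulas taken from the text are the `max(..., 0)` clamp and the fallback to the global limit.

```python
from typing import Any, Dict, Iterable, Mapping, Optional


def normalize_state_limits(raw: Optional[Mapping[Any, Any]]) -> Dict[str, int]:
    """Keep only positive integer limits, keyed by trimmed lowercase state."""
    limits: Dict[str, int] = {}
    for name, value in (raw or {}).items():
        try:
            if isinstance(value, bool):
                continue
            limit = int(value)
        except (TypeError, ValueError):
            continue  # non-numeric entries are ignored
        if limit > 0:
            limits[str(name).strip().lower()] = limit
    return limits


def global_slots(max_concurrent_agents: int, running_count: int) -> int:
    """available_slots = max(max_concurrent_agents - running_count, 0)."""
    return max(max_concurrent_agents - running_count, 0)


def state_slots(
    state: str,
    running_states: Iterable[str],
    max_concurrent_agents: int,
    by_state: Mapping[str, int],
) -> int:
    """Slots left for one state; falls back to the global limit if unset."""
    key = state.strip().lower()
    limit = by_state.get(key, max_concurrent_agents)
    in_state = sum(1 for s in running_states if s.strip().lower() == key)
    return max(limit - in_state, 0)
```

An issue would be dispatched only when both `global_slots(...)` and `state_slots(...)` are greater than zero.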

### 8.4 Retry and Backoff

Retry entry creation:

- Cancel any existing retry timer for the same issue.
- Store `attempt`, `identifier`, `error`, `due_at_ms`, and new timer handle.

Backoff formula:

- Normal continuation retries after a clean worker exit use a short fixed delay of `1000` ms.
- Failure-driven retries use `delay = min(10000 * 2^(attempt - 1), agent.max_retry_backoff_ms)`.
- The exponential delay is capped by the configured max retry backoff (default `300000` / 5m).
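
A worked instance of the backoff formula, written as a short Python sketch. The constant and function names are hypothetical; the numbers come directly from the formula and defaults stated above.

```python
CONTINUATION_RETRY_DELAY_MS = 1_000   # fixed delay after a clean worker exit
BASE_FAILURE_DELAY_MS = 10_000        # first failure-driven retry delay


def failure_retry_delay_ms(attempt: int, max_retry_backoff_ms: int = 300_000) -> int:
    """delay = min(10000 * 2^(attempt - 1), max_retry_backoff_ms), attempt >= 1."""
    if attempt < 1:
        raise ValueError("attempt is 1-based for the retry queue")
    return min(BASE_FAILURE_DELAY_MS * 2 ** (attempt - 1), max_retry_backoff_ms)


if __name__ == "__main__":
    for attempt in range(1, 8):
        print(attempt, failure_retry_delay_ms(attempt))
```

With the default cap, attempts 1 through 5 wait 10 s, 20 s, 40 s, 80 s, and 160 s; from attempt 6 onward the uncapped value (320 s) exceeds the cap, so the delay stays at 300 s.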

Retry handling behavior:

1. Fetch active candidate issues (not all issues).
2. Find the specific issue by `issue_id`.
3. If not found, release claim.
4. If found and still candidate-eligible:
   - Dispatch if slots are available.
   - Otherwise requeue with error `no available orchestrator slots`.
5. If found but no longer active, release claim.

Note:

- Terminal-state workspace cleanup is handled by startup cleanup and active-run reconciliation
  (including terminal transitions for currently running issues).
- Retry handling mainly operates on active candidates and releases claims when the issue is absent,
  rather than performing terminal cleanup itself.

### 8.5 Active Run Reconciliation

Reconciliation runs every tick and has two parts.

Part A: Stall detection

- For each running issue, compute `elapsed_ms` since:
  - `last_codex_timestamp` if any event has been seen, else
  - `started_at`
- If `elapsed_ms > codex.stall_timeout_ms`, terminate the worker and queue a retry.
- If `stall_timeout_ms <= 0`, skip stall detection entirely.

Part B: Tracker state refresh

- Fetch current issue states for all running issue IDs.
- For each running issue:
  - If tracker state is terminal: terminate worker and clean workspace.
  - If tracker state is still active: update the in-memory issue snapshot.
  - If tracker state is neither active nor terminal: terminate worker without workspace cleanup.
- If state refresh fails, keep workers running and try again on the next tick.
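
Both reconciliation parts reduce to small pure decisions, sketched below. Timestamps are assumed to be milliseconds on a single clock, and the function names and the three action strings are hypothetical labels introduced for this example; the actual worker termination and workspace cleanup are left to the caller.

```python
from typing import Iterable, Optional


def is_stalled(
    now_ms: int,
    started_at_ms: int,
    last_event_ms: Optional[int],
    stall_timeout_ms: int,
) -> bool:
    """Part A: measure from the last agent event, or from start if none seen."""
    if stall_timeout_ms <= 0:
        return False  # stall detection is disabled
    baseline = last_event_ms if last_event_ms is not None else started_at_ms
    return now_ms - baseline > stall_timeout_ms


def refresh_action(
    tracker_state: str,
    active_states: Iterable[str],
    terminal_states: Iterable[str],
) -> str:
    """Part B: decide what to do with a running issue after a state refresh."""
    state = tracker_state.strip().lower()
    if state in {s.strip().lower() for s in terminal_states}:
        return "terminate_and_clean_workspace"
    if state in {s.strip().lower() for s in active_states}:
        return "update_snapshot"
    return "terminate_keep_workspace"
```

Terminal states are checked first in this sketch so that a state listed in both sets is treated as terminal, which matches the eligibility rule that an active issue must not be in `terminal_states`.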

### 8.6 Startup Terminal Workspace Cleanup

When the service starts:

1. Query tracker for issues in terminal states.
2. For each returned issue identifier, remove the corresponding workspace directory.
3. If the terminal-issues fetch fails, log a warning and continue startup.

This prevents stale terminal workspaces from accumulating after restarts.

## 9. Workspace Management and Safety

### 9.1 Workspace Layout

Workspace root:

- `workspace.root` (normalized path; the current config layer expands path-like values and preserves
  bare relative names)

Per-issue workspace path:

- `<workspace.root>/<sanitized_issue_identifier>`

Workspace persistence:

- Workspaces are reused across runs for the same issue.
- Successful runs do not auto-delete workspaces.

### 9.2 Workspace Creation and Reuse

Input: `issue.identifier`

Algorithm summary:

1. Sanitize identifier to `workspace_key`.
2. Compute workspace path under workspace root.
3. Ensure the workspace path exists as a directory.
4. Mark `created_now=true` only if the directory was created during this call; otherwise
   `created_now=false`.
5. If `created_now=true`, run `after_create` hook if configured.
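
The five steps above can be sketched in a few lines of Python. This is an illustration under assumptions: `Workspace` and `ensure_workspace` are hypothetical names, the `after_create` hook is modeled as an optional callable rather than a shell script, and a hook failure simply propagates to the caller without removing the new directory.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Workspace:
    path: Path
    workspace_key: str
    created_now: bool


def ensure_workspace(
    root: Path | str,
    identifier: str,
    after_create: Callable[[Path], None] | None = None,
) -> Workspace:
    # Step 1: sanitize the identifier into a workspace key.
    key = re.sub(r"[^A-Za-z0-9._-]", "_", identifier)
    # Step 2: compute the path under the workspace root.
    path = Path(root) / key
    # Steps 3-4: create the directory and record whether this call created it.
    try:
        path.mkdir(parents=True)
        created_now = True
    except FileExistsError:
        if not path.is_dir():
            raise  # something that is not a directory occupies the path
        created_now = False
    # Step 5: run the hook only for a newly created workspace.
    if created_now and after_create is not None:
        after_create(path)
    return Workspace(path=path, workspace_key=key, created_now=created_now)
```

Using `mkdir` and catching `FileExistsError`, instead of checking for existence first, means `created_now` reflects what this call actually did even if another process creates the directory at the same moment.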
823
824Notes:
825
826- This section does not assume any specific repository/VCS workflow.
827- Workspace preparation beyond directory creation (for example dependency bootstrap, checkout/sync,
828 code generation) is implementation-defined and is typically handled via hooks.
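
A minimal sketch of the creation-and-reuse algorithm, assuming a local filesystem and a hypothetical `after_create` callable in place of the configured hook script. Path containment checks (Section 9.5) are deliberately left out here.

```python
import os
import re
from pathlib import Path


def ensure_workspace(workspace_root, issue_identifier, after_create=None):
    """Return (workspace_path, created_now) for one issue."""
    workspace_key = re.sub(r"[^A-Za-z0-9._-]", "_", issue_identifier)
    root = Path(workspace_root)
    root.mkdir(parents=True, exist_ok=True)
    path = root / workspace_key
    try:
        os.mkdir(path)  # raises FileExistsError if the path already exists
        created_now = True
    except FileExistsError:
        if not path.is_dir():
            raise NotADirectoryError(f"workspace path is not a directory: {path}")
        created_now = False
    if created_now and after_create is not None:
        after_create(path)  # a failure here is fatal to workspace creation
    return path, created_now
```

Using `os.mkdir` rather than an exists-then-create check means `created_now` reflects what this call actually did, even if another process races to create the same directory.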
829
830### 9.3 Optional Workspace Population (Implementation-Defined)
831
832The spec does not require any built-in VCS or repository bootstrap behavior.
833
834Implementations may populate or synchronize the workspace using implementation-defined logic and/or
835hooks (for example `after_create` and/or `before_run`).
836
837Failure handling:
838
839- Workspace population/synchronization failures return an error for the current attempt.
840- If failure happens while creating a brand-new workspace, implementations may remove the partially
841 prepared directory.
842- Reused workspaces should not be destructively reset on population failure unless that policy is
843 explicitly chosen and documented.
844
845### 9.4 Workspace Hooks
846
847Supported hooks:
848
849- `hooks.after_create`
850- `hooks.before_run`
851- `hooks.after_run`
852- `hooks.before_remove`
853
854Execution contract:
855
856- Execute in a local shell context appropriate to the host OS, with the workspace directory as
857 `cwd`.
858- On POSIX systems, `sh -lc <script>` (or a stricter equivalent such as `bash -lc <script>`) is a
859 conforming default.
860- Hook timeout uses `hooks.timeout_ms`; default: `60000 ms`.
861- Log hook start, failures, and timeouts.
862
863Failure semantics:
864
865- `after_create` failure or timeout is fatal to workspace creation.
866- `before_run` failure or timeout is fatal to the current run attempt.
867- `after_run` failure or timeout is logged and ignored.
868- `before_remove` failure or timeout is logged and ignored.
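
The execution contract and failure semantics might be combined as below. This sketch assumes a POSIX host with `sh` on `PATH`; `run_hook`, `HookError`, and the log message format are illustrative names. Hook output is discarded for brevity, whereas a fuller implementation would capture and truncate it.

```python
import subprocess


class HookError(Exception):
    """Raised when a hook whose failure is fatal fails or times out."""


FATAL_HOOKS = {"after_create", "before_run"}


def run_hook(name, script, workspace_path, timeout_ms=60000, log=print):
    """Run one hook in the workspace; return True on success."""
    log(f"hook_start hook={name}")
    try:
        result = subprocess.run(
            ["sh", "-lc", script],
            cwd=workspace_path,
            timeout=timeout_ms / 1000,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        failure = None if result.returncode == 0 else f"exit_code={result.returncode}"
    except subprocess.TimeoutExpired:
        failure = "timeout"
    if failure is None:
        return True
    log(f"hook_failed hook={name} reason={failure}")
    if name in FATAL_HOOKS:
        raise HookError(f"{name}: {failure}")
    return False  # after_run / before_remove failures are logged and ignored
```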
869
870### 9.5 Safety Invariants
871
These invariants are the most important portability constraint.
873
874Invariant 1: Run the coding agent only in the per-issue workspace path.
875
876- Before launching the coding-agent subprocess, validate:
877 - `cwd == workspace_path`
878
879Invariant 2: Workspace path must stay inside workspace root.
880
881- Normalize both paths to absolute.
882- Require `workspace_path` to have `workspace_root` as a prefix directory.
883- Reject any path outside the workspace root.
884
885Invariant 3: Workspace key is sanitized.
886
887- Only `[A-Za-z0-9._-]` allowed in workspace directory names.
888- Replace all other characters with `_`.
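
One way to enforce the three invariants is sketched below; the function names are hypothetical. The containment check compares resolved path components rather than raw string prefixes, so that a sibling such as `/ws-evil` does not pass for root `/ws`, and so that keys like `.` or `..` (which survive sanitization) are still rejected.

```python
import re
from pathlib import Path


def sanitize_workspace_key(identifier):
    """Invariant 3: keep [A-Za-z0-9._-], replace everything else with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", identifier)


def resolve_workspace_path(workspace_root, identifier):
    """Invariant 2: the workspace must be strictly inside the root."""
    root = Path(workspace_root).resolve()
    candidate = (root / sanitize_workspace_key(identifier)).resolve()
    if candidate == root or root not in candidate.parents:
        raise ValueError(f"workspace path escapes root: {candidate}")
    return candidate


def assert_agent_cwd(cwd, workspace_path):
    """Invariant 1: validate cwd before launching the agent subprocess."""
    if Path(cwd).resolve() != Path(workspace_path).resolve():
        raise ValueError("invalid_workspace_cwd")
```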
889
890## 10. Agent Runner Protocol (Coding Agent Integration)
891
892This section defines the language-neutral contract for integrating a coding agent app-server.
893
894Compatibility profile:
895
896- The normative contract is message ordering, required behaviors, and the logical fields that must
897 be extracted (for example session IDs, completion state, approval handling, and usage/rate-limit
898 telemetry).
899- Exact JSON field names may vary slightly across compatible app-server versions.
900- Implementations should tolerate equivalent payload shapes when they carry the same logical
901 meaning, especially for nested IDs, approval requests, user-input-required signals, and
902 token/rate-limit metadata.
903
904### 10.1 Launch Contract
905
906Subprocess launch parameters:
907
908- Command: `codex.command`
909- Invocation: `bash -lc <codex.command>`
910- Working directory: workspace path
911- Stdout/stderr: separate streams
912- Framing: line-delimited protocol messages on stdout (JSON-RPC-like JSON per line)
913
914Notes:
915
916- The default command is `codex app-server`.
917- Approval policy, cwd, and prompt are expressed in the protocol messages in Section 10.2.
918
919Recommended additional process settings:
920
921- Max line size: 10 MB (for safe buffering)
922
923### 10.2 Session Startup Handshake
924
925Reference: https://developers.openai.com/codex/app-server/
926
927The client must send these protocol messages in order:
928
929Illustrative startup transcript (equivalent payload shapes are acceptable if they preserve the same
930semantics):
931
```json
{"id":1,"method":"initialize","params":{"clientInfo":{"name":"symphony","version":"1.0"},"capabilities":{}}}
{"method":"initialized","params":{}}
{"id":2,"method":"thread/start","params":{"approvalPolicy":"<implementation-defined>","sandbox":"<implementation-defined>","cwd":"/abs/workspace"}}
{"id":3,"method":"turn/start","params":{"threadId":"<thread-id>","input":[{"type":"text","text":"<rendered prompt-or-continuation-guidance>"}],"cwd":"/abs/workspace","title":"ABC-123: Example","approvalPolicy":"<implementation-defined>","sandboxPolicy":{"type":"<implementation-defined>"}}}
```
938
9391. `initialize` request
940 - Params include:
941 - `clientInfo` object (for example `{name, version}`)
942 - `capabilities` object (may be empty)
943 - If the targeted Codex app-server requires capability negotiation for dynamic tools, include the
944 necessary capability flag(s) here.
945 - Wait for response (`read_timeout_ms`)
9462. `initialized` notification
9473. `thread/start` request
948 - Params include:
949 - `approvalPolicy` = implementation-defined session approval policy value
950 - `sandbox` = implementation-defined session sandbox value
951 - `cwd` = absolute workspace path
952 - If optional client-side tools are implemented, include their advertised tool specs using the
953 protocol mechanism supported by the targeted Codex app-server version.
9544. `turn/start` request
955 - Params include:
956 - `threadId`
957 - `input` = single text item containing rendered prompt for the first turn, or continuation
958 guidance for later turns on the same thread
959 - `cwd`
960 - `title` = `<issue.identifier>: <issue.title>`
961 - `approvalPolicy` = implementation-defined turn approval policy value
962 - `sandboxPolicy` = implementation-defined object-form sandbox policy payload when required by
963 the targeted app-server version
964
965Session identifiers:
966
967- Read `thread_id` from `thread/start` result `result.thread.id`
968- Read `turn_id` from each `turn/start` result `result.turn.id`
969- Emit `session_id = "<thread_id>-<turn_id>"`
970- Reuse the same `thread_id` for all continuation turns inside one worker run
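
Extracting the identifiers could be done along these lines. The helper names are illustrative, and the sketch only handles the exact `result.thread.id` / `result.turn.id` shape; a tolerant client may also need to accept equivalent shapes.

```python
def extract_id(response, kind):
    """Read result.<kind>.id from a thread/start or turn/start response."""
    try:
        value = response["result"][kind]["id"]
    except (KeyError, TypeError):
        raise ValueError(f"response_error: missing result.{kind}.id")
    if not isinstance(value, str) or not value:
        raise ValueError(f"response_error: empty result.{kind}.id")
    return value


def make_session_id(thread_id, turn_id):
    """Compose the emitted session identifier."""
    return f"{thread_id}-{turn_id}"
```

For continuation turns, only `turn_id` changes; the `thread_id` obtained once from `thread/start` is reused.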
971
972### 10.3 Streaming Turn Processing
973
974The client reads line-delimited messages until the turn terminates.
975
976Completion conditions:
977
978- `turn/completed` -> success
979- `turn/failed` -> failure
980- `turn/cancelled` -> failure
981- turn timeout (`turn_timeout_ms`) -> failure
982- subprocess exit -> failure
983
984Continuation processing:
985
986- If the worker decides to continue after a successful turn, it should issue another `turn/start`
987 on the same live `threadId`.
988- The app-server subprocess should remain alive across those continuation turns and be stopped only
989 when the worker run is ending.
990
991Line handling requirements:
992
993- Read protocol messages from stdout only.
994- Buffer partial stdout lines until newline arrives.
995- Attempt JSON parse on complete stdout lines.
996- Stderr is not part of the protocol stream:
997 - ignore it or log it as diagnostics
998 - do not attempt protocol JSON parsing on stderr
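
The stdout line handling can be sketched as a small buffer that accepts arbitrary byte chunks. `LineBuffer` and the `("message" | "malformed", value)` tuples are assumptions for illustration; the 10 MB default mirrors the recommended max line size from Section 10.1.

```python
import json


class LineBuffer:
    """Accumulates stdout bytes and yields one parsed result per full line."""

    def __init__(self, max_line_bytes=10 * 1024 * 1024):
        self._pending = b""
        self._max = max_line_bytes

    def feed(self, chunk):
        """Return a list of ('message', obj) or ('malformed', text) tuples."""
        self._pending += chunk
        events = []
        while b"\n" in self._pending:
            line, self._pending = self._pending.split(b"\n", 1)
            line = line.strip()
            if not line:
                continue
            try:
                events.append(("message", json.loads(line)))
            except ValueError:  # covers invalid JSON and invalid UTF-8
                events.append(("malformed", line.decode("utf-8", "replace")))
        if len(self._pending) > self._max:
            raise ValueError("stdout line exceeds max line size")
        return events
```

Stderr never passes through this buffer; it would be read on a separate stream and logged as diagnostics at most.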
999
1000### 10.4 Emitted Runtime Events (Upstream to Orchestrator)
1001
1002The app-server client emits structured events to the orchestrator callback. Each event should
1003include:
1004
1005- `event` (enum/string)
1006- `timestamp` (UTC timestamp)
1007- `codex_app_server_pid` (if available)
1008- optional `usage` map (token counts)
1009- payload fields as needed
1010
1011Important emitted events may include:
1012
1013- `session_started`
1014- `startup_failed`
1015- `turn_completed`
1016- `turn_failed`
1017- `turn_cancelled`
1018- `turn_ended_with_error`
1019- `turn_input_required`
1020- `approval_auto_approved`
1021- `unsupported_tool_call`
1022- `notification`
1023- `other_message`
1024- `malformed`
1025
1026### 10.5 Approval, Tool Calls, and User Input Policy
1027
1028Approval, sandbox, and user-input behavior is implementation-defined.
1029
1030Policy requirements:
1031
1032- Each implementation should document its chosen approval, sandbox, and operator-confirmation
1033 posture.
1034- Approval requests and user-input-required events must not leave a run stalled indefinitely. An
1035 implementation should either satisfy them, surface them to an operator, auto-resolve them, or
1036 fail the run according to its documented policy.
1037
1038Example high-trust behavior:
1039
1040- Auto-approve command execution approvals for the session.
1041- Auto-approve file-change approvals for the session.
1042- Treat user-input-required turns as hard failure.
1043
1044Unsupported dynamic tool calls:
1045
1046- Supported dynamic tool calls that are explicitly implemented and advertised by the runtime should
1047 be handled according to their extension contract.
1048- If the agent requests a dynamic tool call (`item/tool/call`) that is not supported, return a tool
1049 failure response and continue the session.
1050- This prevents the session from stalling on unsupported tool execution paths.
1051
1052Optional client-side tool extension:
1053
1054- An implementation may expose a limited set of client-side tools to the app-server session.
1055- Current optional standardized tool: `linear_graphql`.
1056- If implemented, supported tools should be advertised to the app-server session during startup
1057 using the protocol mechanism supported by the targeted Codex app-server version.
1058- Unsupported tool names should still return a failure result and continue the session.
1059
1060`linear_graphql` extension contract:
1061
1062- Purpose: execute a raw GraphQL query or mutation against Linear using Symphony's configured
1063 tracker auth for the current session.
1064- Availability: only meaningful when `tracker.kind == "linear"` and valid Linear auth is configured.
1065- Preferred input shape:
1066
  ```json
  {
    "query": "single GraphQL query or mutation document",
    "variables": {
      "optional": "graphql variables object"
    }
  }
  ```
1075
1076- `query` must be a non-empty string.
1077- `query` must contain exactly one GraphQL operation.
1078- `variables` is optional and, when present, must be a JSON object.
1079- Implementations may additionally accept a raw GraphQL query string as shorthand input.
1080- Execute one GraphQL operation per tool call.
1081- If the provided document contains multiple operations, reject the tool call as invalid input.
1082- `operationName` selection is intentionally out of scope for this extension.
1083- Reuse the configured Linear endpoint and auth from the active Symphony workflow/runtime config; do
1084 not require the coding agent to read raw tokens from disk.
1085- Tool result semantics:
1086 - transport success + no top-level GraphQL `errors` -> `success=true`
1087 - top-level GraphQL `errors` present -> `success=false`, but preserve the GraphQL response body
1088 for debugging
1089 - invalid input, missing auth, or transport failure -> `success=false` with an error payload
1090- Return the GraphQL response or error payload as structured tool output that the model can inspect
1091 in-session.
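
A partial sketch of the input validation and result mapping for `linear_graphql`. The function names and the `response` / `error` result keys are assumptions. The single-operation check is intentionally omitted because it requires a real GraphQL parser, so a complete implementation would add it.

```python
def validate_linear_graphql_input(arguments):
    """Return (query, variables) or raise ValueError for invalid tool input."""
    if isinstance(arguments, str):  # optional shorthand: raw query string
        arguments = {"query": arguments}
    if not isinstance(arguments, dict):
        raise ValueError("arguments must be an object or a query string")
    query = arguments.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    variables = arguments.get("variables")
    if variables is None:
        variables = {}
    if not isinstance(variables, dict):
        raise ValueError("variables must be a JSON object")
    return query, variables


def tool_result(response_body=None, error=None):
    """Map a GraphQL outcome onto the success/failure result semantics."""
    if error is not None:  # invalid input, missing auth, transport failure
        return {"success": False, "error": error}
    if response_body.get("errors"):
        # Preserve the body so the model can inspect the GraphQL errors.
        return {"success": False, "response": response_body}
    return {"success": True, "response": response_body}
```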
1092
1093Illustrative responses (equivalent payload shapes are acceptable if they preserve the same outcome):
1094
```json
{"id":"<approval-id>","result":{"approved":true}}
{"id":"<tool-call-id>","result":{"success":false,"error":"unsupported_tool_call"}}
```
1099
1100Hard failure on user input requirement:
1101
1102- If the agent requests user input, fail the run attempt immediately.
1103- The client detects this via:
1104 - explicit method (`item/tool/requestUserInput`), or
1105 - turn methods/flags indicating input is required.
1106
1107### 10.6 Timeouts and Error Mapping
1108
1109Timeouts:
1110
1111- `codex.read_timeout_ms`: request/response timeout during startup and sync requests
1112- `codex.turn_timeout_ms`: total turn stream timeout
1113- `codex.stall_timeout_ms`: enforced by orchestrator based on event inactivity
1114
1115Error mapping (recommended normalized categories):
1116
1117- `codex_not_found`
1118- `invalid_workspace_cwd`
1119- `response_timeout`
1120- `turn_timeout`
1121- `port_exit`
1122- `response_error`
1123- `turn_failed`
1124- `turn_cancelled`
1125- `turn_input_required`
1126
1127### 10.7 Agent Runner Contract
1128
1129The `Agent Runner` wraps workspace + prompt + app-server client.
1130
1131Behavior:
1132
11331. Create/reuse workspace for issue.
11342. Build prompt from workflow template.
11353. Start app-server session.
11364. Forward app-server events to orchestrator.
11375. On any error, fail the worker attempt (the orchestrator will retry).
1138
1139Note:
1140
1141- Workspaces are intentionally preserved after successful runs.
1142
1143## 11. Issue Tracker Integration Contract (Linear-Compatible)
1144
1145### 11.1 Required Operations
1146
1147An implementation must support these tracker adapter operations:
1148
11491. `fetch_candidate_issues()`
1150 - Return issues in configured active states for a configured project.
1151
11522. `fetch_issues_by_states(state_names)`
1153 - Used for startup terminal cleanup.
1154
11553. `fetch_issue_states_by_ids(issue_ids)`
1156 - Used for active-run reconciliation.
1157
1158### 11.2 Query Semantics (Linear)
1159
1160Linear-specific requirements for `tracker.kind == "linear"`:
1161
1162- `tracker.kind == "linear"`
1163- GraphQL endpoint (default `https://api.linear.app/graphql`)
1164- Auth token sent in `Authorization` header
1165- `tracker.project_slug` maps to Linear project `slugId`
1166- Candidate issue query filters project using `project: { slugId: { eq: $projectSlug } }`
1167- Issue-state refresh query uses GraphQL issue IDs with variable type `[ID!]`
1168- Pagination required for candidate issues
1169- Page size default: `50`
1170- Network timeout: `30000 ms`
1171
1172Important:
1173
1174- Linear GraphQL schema details can drift. Keep query construction isolated and test the exact query
1175 fields/types required by this specification.
1176
1177A non-Linear implementation may change transport details, but the normalized outputs must match the
1178domain model in Section 4.
1179
1180### 11.3 Normalization Rules
1181
1182Candidate issue normalization should produce fields listed in Section 4.1.1.
1183
1184Additional normalization details:
1185
1186- `labels` -> lowercase strings
1187- `blocked_by` -> derived from inverse relations where relation type is `blocks`
1188- `priority` -> integer only (non-integers become null)
1189- `created_at` and `updated_at` -> parse ISO-8601 timestamps
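
These rules might be applied as follows. The flat `raw` dictionary is a hypothetical pre-flattened tracker payload, not Linear's actual response shape, and `blocked_by` derivation is left out because it depends on the relation payload.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO-8601 string; return None when missing or unparseable."""
    if not isinstance(value, str):
        return None
    try:
        # Older Python versions reject a trailing 'Z', so normalize it first.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def normalize_issue(raw):
    """Normalize labels, priority, and timestamps for one candidate issue."""
    priority = raw.get("priority")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        priority = None
    return {
        "labels": [str(label).lower() for label in raw.get("labels", [])],
        "priority": priority,
        "created_at": parse_timestamp(raw.get("created_at")),
        "updated_at": parse_timestamp(raw.get("updated_at")),
    }
```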
1190
1191### 11.4 Error Handling Contract
1192
1193Recommended error categories:
1194
1195- `unsupported_tracker_kind`
1196- `missing_tracker_api_key`
1197- `missing_tracker_project_slug`
1198- `linear_api_request` (transport failures)
1199- `linear_api_status` (non-200 HTTP)
1200- `linear_graphql_errors`
1201- `linear_unknown_payload`
1202- `linear_missing_end_cursor` (pagination integrity error)
1203
1204Orchestrator behavior on tracker errors:
1205
1206- Candidate fetch failure: log and skip dispatch for this tick.
1207- Running-state refresh failure: log and keep active workers running.
1208- Startup terminal cleanup failure: log warning and continue startup.
1209
1210### 11.5 Tracker Writes (Important Boundary)
1211
1212Symphony does not require first-class tracker write APIs in the orchestrator.
1213
1214- Ticket mutations (state transitions, comments, PR metadata) are typically handled by the coding
1215 agent using tools defined by the workflow prompt.
1216- The service remains a scheduler/runner and tracker reader.
1217- Workflow-specific success often means "reached the next handoff state" (for example
1218 `Human Review`) rather than tracker terminal state `Done`.
1219- If the optional `linear_graphql` client-side tool extension is implemented, it is still part of
1220 the agent toolchain rather than orchestrator business logic.
1221
1222## 12. Prompt Construction and Context Assembly
1223
1224### 12.1 Inputs
1225
1226Inputs to prompt rendering:
1227
1228- `workflow.prompt_template`
1229- normalized `issue` object
1230- optional `attempt` integer (retry/continuation metadata)
1231
1232### 12.2 Rendering Rules
1233
1234- Render with strict variable checking.
1235- Render with strict filter checking.
1236- Convert issue object keys to strings for template compatibility.
1237- Preserve nested arrays/maps (labels, blockers) so templates can iterate.
1238
1239### 12.3 Retry/Continuation Semantics
1240
1241`attempt` should be passed to the template because the workflow prompt may provide different
1242instructions for:
1243
1244- first run (`attempt` null or absent)
1245- continuation run after a successful prior session
1246- retry after error/timeout/stall
1247
1248### 12.4 Failure Semantics
1249
1250If prompt rendering fails:
1251
1252- Fail the run attempt immediately.
1253- Let the orchestrator treat it like any other worker failure and decide retry behavior.
1254
1255## 13. Logging, Status, and Observability
1256
1257### 13.1 Logging Conventions
1258
1259Required context fields for issue-related logs:
1260
1261- `issue_id`
1262- `issue_identifier`
1263
1264Required context for coding-agent session lifecycle logs:
1265
1266- `session_id`
1267
1268Message formatting requirements:
1269
1270- Use stable `key=value` phrasing.
1271- Include action outcome (`completed`, `failed`, `retrying`, etc.).
1272- Include concise failure reason when present.
1273- Avoid logging large raw payloads unless necessary.
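
A small formatter illustrating the conventions above. The field order, the 200-character truncation of `reason`, and the quoting rule are illustrative choices, not requirements.

```python
import json


def _quote(value):
    """Quote values that would break key=value parsing."""
    text = str(value)
    if not text or any(ch.isspace() or ch in '"=' for ch in text):
        return json.dumps(text)
    return text


def format_log(action, outcome, issue_id, issue_identifier, session_id=None, reason=None):
    """Build one stable key=value log line with the required context fields."""
    fields = [
        ("action", action),
        ("outcome", outcome),
        ("issue_id", issue_id),
        ("issue_identifier", issue_identifier),
    ]
    if session_id:
        fields.append(("session_id", session_id))
    if reason:
        fields.append(("reason", reason[:200]))  # keep failure reasons concise
    return " ".join(f"{key}={_quote(value)}" for key, value in fields)
```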
1274
1275### 13.2 Logging Outputs and Sinks
1276
1277The spec does not prescribe where logs must go (stderr, file, remote sink, etc.).
1278
1279Requirements:
1280
1281- Operators must be able to see startup/validation/dispatch failures without attaching a debugger.
1282- Implementations may write to one or more sinks.
1283- If a configured log sink fails, the service should continue running when possible and emit an
1284 operator-visible warning through any remaining sink.
1285
1286### 13.3 Runtime Snapshot / Monitoring Interface (Optional but Recommended)
1287
1288If the implementation exposes a synchronous runtime snapshot (for dashboards or monitoring), it
1289should return:
1290
1291- `running` (list of running session rows)
1292- each running row should include `turn_count`
1293- `retrying` (list of retry queue rows)
1294- `codex_totals`
1295 - `input_tokens`
1296 - `output_tokens`
1297 - `total_tokens`
1298 - `seconds_running` (aggregate runtime seconds as of snapshot time, including active sessions)
1299- `rate_limits` (latest coding-agent rate limit payload, if available)
1300
1301Recommended snapshot error modes:
1302
1303- `timeout`
1304- `unavailable`
1305
1306### 13.4 Optional Human-Readable Status Surface
1307
1308A human-readable status surface (terminal output, dashboard, etc.) is optional and
1309implementation-defined.
1310
1311If present, it should draw from orchestrator state/metrics only and must not be required for
1312correctness.
1313
1314### 13.5 Session Metrics and Token Accounting
1315
1316Token accounting rules:
1317
1318- Agent events may include token counts in multiple payload shapes.
1319- Prefer absolute thread totals when available, such as:
1320 - `thread/tokenUsage/updated` payloads
1321 - `total_token_usage` within token-count wrapper events
1322- Ignore delta-style payloads such as `last_token_usage` for dashboard/API totals.
1323- Extract input/output/total token counts leniently from common field names within the selected
1324 payload.
1325- For absolute totals, track deltas relative to last reported totals to avoid double-counting.
1326- Do not treat generic `usage` maps as cumulative totals unless the event type defines them that
1327 way.
1328- Accumulate aggregate totals in orchestrator state.
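
The delta-tracking rule for absolute totals can be sketched like this. `TokenTotals` and keying by thread ID are assumptions; the sketch also assumes the lenient field extraction has already produced the three canonical field names.

```python
class TokenTotals:
    """Aggregates absolute per-thread totals without double-counting."""

    FIELDS = ("input_tokens", "output_tokens", "total_tokens")

    def __init__(self):
        self.aggregate = dict.fromkeys(self.FIELDS, 0)
        self._last_reported = {}  # thread_id -> last absolute totals seen

    def record_absolute(self, thread_id, totals):
        """Add only the growth since the last report for this thread."""
        previous = self._last_reported.get(thread_id, {})
        updated = {}
        for field in self.FIELDS:
            current = int(totals.get(field, 0))
            last = previous.get(field, 0)
            if current > last:
                self.aggregate[field] += current - last
            updated[field] = max(current, last)
        self._last_reported[thread_id] = updated
```

Re-delivering the same absolute payload adds nothing, which is why delta-style payloads such as `last_token_usage` should not be fed into this path.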
1329
1330Runtime accounting:
1331
1332- Runtime should be reported as a live aggregate at snapshot/render time.
1333- Implementations may maintain a cumulative counter for ended sessions and add active-session
1334 elapsed time derived from `running` entries (for example `started_at`) when producing a
1335 snapshot/status view.
1336- Add run duration seconds to the cumulative ended-session runtime when a session ends (normal exit
1337 or cancellation/termination).
1338- Continuous background ticking of runtime totals is not required.
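
Computing the live aggregate at snapshot time might look like this, assuming timestamps are plain epoch seconds and `ended_seconds` is the cumulative counter for finished sessions (both names are illustrative).

```python
def seconds_running(ended_seconds, running_started_at, now):
    """Return ended-session runtime plus elapsed time of active sessions."""
    active = sum(max(0.0, now - started) for started in running_started_at)
    return ended_seconds + active
```

Because the active portion is derived on demand, no background timer is needed to keep the total current.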
1339
1340Rate-limit tracking:
1341
1342- Track the latest rate-limit payload seen in any agent update.
1343- Any human-readable presentation of rate-limit data is implementation-defined.
1344
1345### 13.6 Humanized Agent Event Summaries (Optional)
1346
1347Humanized summaries of raw agent protocol events are optional.
1348
1349If implemented:
1350
1351- Treat them as observability-only output.
1352- Do not make orchestrator logic depend on humanized strings.
1353
1354### 13.7 Optional HTTP Server Extension
1355
1356This section defines an optional HTTP interface for observability and operational control.
1357
1358If implemented:
1359
1360- The HTTP server is an extension and is not required for conformance.
1361- The implementation may serve server-rendered HTML or a client-side application for the dashboard.
1362- The dashboard/API must be observability/control surfaces only and must not become required for
1363 orchestrator correctness.
1364
1365Enablement (extension):
1366
1367- Start the HTTP server when a CLI `--port` argument is provided.
1368- Start the HTTP server when `server.port` is present in `WORKFLOW.md` front matter.
1369- `server.port` is extension configuration and is intentionally not part of the core front-matter
1370 schema in Section 5.3.
1371- Precedence: CLI `--port` overrides `server.port` when both are present.
1372- `server.port` must be an integer. Positive values bind that port. `0` may be used to request an
1373 ephemeral port for local development and tests.
1374- Implementations should bind loopback by default (`127.0.0.1` or host equivalent) unless explicitly
1375 configured otherwise.
1376- Changes to HTTP listener settings (for example `server.port`) do not need to hot-rebind;
1377 restart-required behavior is conformant.
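
The enablement and precedence rules reduce to a small resolver. `resolve_http_port` is a hypothetical helper, and rejecting negative values is an assumption since the section only defines positive values and `0`.

```python
def resolve_http_port(cli_port, workflow_port):
    """Return the port to bind, or None when the HTTP server stays disabled."""
    # CLI --port overrides server.port when both are present.
    chosen = cli_port if cli_port is not None else workflow_port
    if chosen is None:
        return None
    if isinstance(chosen, bool) or not isinstance(chosen, int) or chosen < 0:
        raise ValueError("server.port must be a non-negative integer")
    return chosen  # 0 requests an ephemeral port
```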
1378
1379#### 13.7.1 Human-Readable Dashboard (`/`)
1380
1381- Host a human-readable dashboard at `/`.
1382- The returned document should depict the current state of the system (for example active sessions,
1383 retry delays, token consumption, runtime totals, recent events, and health/error indicators).
1384- It is up to the implementation whether this is server-generated HTML or a client-side app that
1385 consumes the JSON API below.
1386
1387#### 13.7.2 JSON REST API (`/api/v1/*`)
1388
1389Provide a JSON REST API under `/api/v1/*` for current runtime state and operational debugging.
1390
1391Minimum endpoints:
1392
1393- `GET /api/v1/state`
1394 - Returns a summary view of the current system state (running sessions, retry queue/delays,
1395 aggregate token/runtime totals, latest rate limits, and any additional tracked summary fields).
1396 - Suggested response shape:
1397
    ```json
    {
      "generated_at": "2026-02-24T20:15:30Z",
      "counts": {
        "running": 2,
        "retrying": 1
      },
      "running": [
        {
          "issue_id": "abc123",
          "issue_identifier": "MT-649",
          "state": "In Progress",
          "session_id": "thread-1-turn-1",
          "turn_count": 7,
          "last_event": "turn_completed",
          "last_message": "",
          "started_at": "2026-02-24T20:10:12Z",
          "last_event_at": "2026-02-24T20:14:59Z",
          "tokens": {
            "input_tokens": 1200,
            "output_tokens": 800,
            "total_tokens": 2000
          }
        }
      ],
      "retrying": [
        {
          "issue_id": "def456",
          "issue_identifier": "MT-650",
          "attempt": 3,
          "due_at": "2026-02-24T20:16:00Z",
          "error": "no available orchestrator slots"
        }
      ],
      "codex_totals": {
        "input_tokens": 5000,
        "output_tokens": 2400,
        "total_tokens": 7400,
        "seconds_running": 1834.2
      },
      "rate_limits": null
    }
    ```
1441
1442- `GET /api/v1/<issue_identifier>`
1443 - Returns issue-specific runtime/debug details for the identified issue, including any information
1444 the implementation tracks that is useful for debugging.
1445 - Suggested response shape:
1446
    ```json
    {
      "issue_identifier": "MT-649",
      "issue_id": "abc123",
      "status": "running",
      "workspace": {
        "path": "/tmp/symphony_workspaces/MT-649"
      },
      "attempts": {
        "restart_count": 1,
        "current_retry_attempt": 2
      },
      "running": {
        "session_id": "thread-1-turn-1",
        "turn_count": 7,
        "state": "In Progress",
        "started_at": "2026-02-24T20:10:12Z",
        "last_event": "notification",
        "last_message": "Working on tests",
        "last_event_at": "2026-02-24T20:14:59Z",
        "tokens": {
          "input_tokens": 1200,
          "output_tokens": 800,
          "total_tokens": 2000
        }
      },
      "retry": null,
      "logs": {
        "codex_session_logs": [
          {
            "label": "latest",
            "path": "/var/log/symphony/codex/MT-649/latest.log",
            "url": null
          }
        ]
      },
      "recent_events": [
        {
          "at": "2026-02-24T20:14:59Z",
          "event": "notification",
          "message": "Working on tests"
        }
      ],
      "last_error": null,
      "tracked": {}
    }
    ```
1494
  - If the issue is unknown to the current in-memory state, return `404` with an error response (for
    example `{"error":{"code":"issue_not_found","message":"..."}}`).
1497
1498- `POST /api/v1/refresh`
1499 - Queues an immediate tracker poll + reconciliation cycle (best-effort trigger; implementations
1500 may coalesce repeated requests).
1501 - Suggested request body: empty body or `{}`.
1502 - Suggested response (`202 Accepted`) shape:
1503
    ```json
    {
      "queued": true,
      "coalesced": false,
      "requested_at": "2026-02-24T20:15:30Z",
      "operations": ["poll", "reconcile"]
    }
    ```
1512
1513API design notes:
1514
1515- The JSON shapes above are the recommended baseline for interoperability and debugging ergonomics.
1516- Implementations may add fields, but should avoid breaking existing fields within a version.
1517- Endpoints should be read-only except for operational triggers like `/refresh`.
1518- Unsupported methods on defined routes should return `405 Method Not Allowed`.
1519- API errors should use a JSON envelope such as `{"error":{"code":"...","message":"..."}}`.
1520- If the dashboard is a client-side app, it should consume this API rather than duplicating state
1521 logic.
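
The routing rules and error envelope could be prototyped as a pure function before wiring them to any HTTP framework. `route`, `error_body`, and the `method_not_allowed` / `not_found` codes are assumptions; only `issue_not_found` appears in this section. A `None` body stands in for the real success payload.

```python
FIXED_ROUTES = {"/api/v1/state": "GET", "/api/v1/refresh": "POST"}
PREFIX = "/api/v1/"


def error_body(code, message):
    """Build the JSON error envelope."""
    return {"error": {"code": code, "message": message}}


def route(method, path, known_identifiers):
    """Return (status, error_body_or_None) for one request."""
    if path in FIXED_ROUTES:
        if method != FIXED_ROUTES[path]:
            return 405, error_body("method_not_allowed", f"{method} not allowed")
        return (202 if path == "/api/v1/refresh" else 200), None
    if path.startswith(PREFIX) and len(path) > len(PREFIX):
        identifier = path[len(PREFIX):]
        if method != "GET":
            return 405, error_body("method_not_allowed", f"{method} not allowed")
        if identifier not in known_identifiers:
            return 404, error_body("issue_not_found", f"unknown issue {identifier}")
        return 200, None
    return 404, error_body("not_found", "no such route")
```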
1522
1523## 14. Failure Model and Recovery Strategy
1524
1525### 14.1 Failure Classes
1526
15271. `Workflow/Config Failures`
1528 - Missing `WORKFLOW.md`
1529 - Invalid YAML front matter
1530 - Unsupported tracker kind or missing tracker credentials/project slug
1531 - Missing coding-agent executable
1532
15332. `Workspace Failures`
1534 - Workspace directory creation failure
1535 - Workspace population/synchronization failure (implementation-defined; may come from hooks)
1536 - Invalid workspace path configuration
1537 - Hook timeout/failure
1538
15393. `Agent Session Failures`
1540 - Startup handshake failure
1541 - Turn failed/cancelled
1542 - Turn timeout
1543 - User input requested (hard fail)
1544 - Subprocess exit
1545 - Stalled session (no activity)
1546
4. `Tracker Failures`
   - API transport errors
   - Non-200 status
   - GraphQL errors
   - Malformed payloads
1552
15535. `Observability Failures`
1554 - Snapshot timeout
1555 - Dashboard render errors
1556 - Log sink configuration failure
1557
1558### 14.2 Recovery Behavior
1559
1560- Dispatch validation failures:
1561 - Skip new dispatches.
1562 - Keep service alive.
1563 - Continue reconciliation where possible.
1564
1565- Worker failures:
1566 - Convert to retries with exponential backoff.
1567
1568- Tracker candidate-fetch failures:
1569 - Skip this tick.
1570 - Try again on next tick.
1571
1572- Reconciliation state-refresh failures:
1573 - Keep current workers.
1574 - Retry on next tick.
1575
1576- Dashboard/log failures:
1577 - Do not crash the orchestrator.
1578
1579### 14.3 Partial State Recovery (Restart)
1580
1581Current design is intentionally in-memory for scheduler state.
1582
1583After restart:
1584
1585- No retry timers are restored from prior process memory.
1586- No running sessions are assumed recoverable.
1587- Service recovers by:
1588 - startup terminal workspace cleanup
1589 - fresh polling of active issues
1590 - re-dispatching eligible work
1591
1592### 14.4 Operator Intervention Points
1593
1594Operators can control behavior by:
1595
1596- Editing `WORKFLOW.md` (prompt and most runtime settings).
1597- `WORKFLOW.md` changes should be detected and re-applied automatically without restart.
1598- Changing issue states in the tracker:
1599 - terminal state -> running session is stopped and workspace cleaned when reconciled
1600 - non-active state -> running session is stopped without cleanup
1601- Restarting the service for process recovery or deployment (not as the normal path for applying
1602 workflow config changes).
1603
1604## 15. Security and Operational Safety
1605
1606### 15.1 Trust Boundary Assumption
1607
1608Each implementation defines its own trust boundary.
1609
1610Operational safety requirements:
1611
1612- Implementations should state clearly whether they are intended for trusted environments, more
1613 restrictive environments, or both.
1614- Implementations should state clearly whether they rely on auto-approved actions, operator
1615 approvals, stricter sandboxing, or some combination of those controls.
1616- Workspace isolation and path validation are important baseline controls, but they are not a
1617 substitute for whatever approval and sandbox policy an implementation chooses.
1618
1619### 15.2 Filesystem Safety Requirements
1620
1621Mandatory:
1622
1623- Workspace path must remain under configured workspace root.
1624- Coding-agent cwd must be the per-issue workspace path for the current run.
1625- Workspace directory names must use sanitized identifiers.
1626
1627Recommended additional hardening for ports:
1628
1629- Run under a dedicated OS user.
1630- Restrict workspace root permissions.
1631- Mount workspace root on a dedicated volume if possible.
1632
1633### 15.3 Secret Handling
1634
1635- Support `$VAR` indirection in workflow config.
1636- Do not log API tokens or secret env values.
1637- Validate presence of secrets without printing them.
1638
1639### 15.4 Hook Script Safety
1640
1641Workspace hooks are arbitrary shell scripts from `WORKFLOW.md`.
1642
1643Implications:
1644
1645- Hooks are fully trusted configuration.
1646- Hooks run inside the workspace directory.
1647- Hook output should be truncated in logs.
1648- Hook timeouts are required to avoid hanging the orchestrator.

### 15.5 Harness Hardening Guidance

Running Codex agents against repositories, issue trackers, and other inputs that may contain
sensitive data or externally controlled content can be dangerous. A permissive deployment can lead
to data leaks, destructive mutations, or full machine compromise if the agent is induced to execute
harmful commands or use overly powerful integrations.

Implementations should explicitly evaluate their own risk profile and harden the execution harness
where appropriate. This specification intentionally does not mandate a single hardening posture, but
ports should not assume that tracker data, repository contents, prompt inputs, or tool arguments are
fully trustworthy just because they originate inside a normal workflow.

Possible hardening measures include:

- Tightening Codex approval and sandbox settings described elsewhere in this specification instead
  of running with a maximally permissive configuration.
- Adding external isolation layers such as OS/container/VM sandboxing, network restrictions, or
  separate credentials beyond the built-in Codex policy controls.
- Filtering which Linear issues, projects, teams, labels, or other tracker sources are eligible for
  dispatch so untrusted or out-of-scope tasks do not automatically reach the agent.
- Narrowing the optional `linear_graphql` tool so it can only read or mutate data inside the
  intended project scope, rather than exposing general workspace-wide tracker access.
- Reducing the set of client-side tools, credentials, filesystem paths, and network destinations
  available to the agent to the minimum needed for the workflow.

The correct controls are deployment-specific, but implementations should document them clearly and
treat harness hardening as part of the core safety model rather than an optional afterthought.

## 16. Reference Algorithms (Language-Agnostic)

### 16.1 Service Startup

```text
function start_service():
  configure_logging()
  start_observability_outputs()
  start_workflow_watch(on_change=reload_and_reapply_workflow)

  state = {
    poll_interval_ms: get_config_poll_interval_ms(),
    max_concurrent_agents: get_config_max_concurrent_agents(),
    running: {},
    claimed: set(),
    retry_attempts: {},
    completed: set(),
    codex_totals: {input_tokens: 0, output_tokens: 0, total_tokens: 0, seconds_running: 0},
    codex_rate_limits: null
  }

  validation = validate_dispatch_config()
  if validation is not ok:
    log_validation_error(validation)
    fail_startup(validation)

  startup_terminal_workspace_cleanup()
  schedule_tick(delay_ms=0)

  event_loop(state)
```

### 16.2 Poll-and-Dispatch Tick

```text
on_tick(state):
  state = reconcile_running_issues(state)

  validation = validate_dispatch_config()
  if validation is not ok:
    log_validation_error(validation)
    notify_observers()
    schedule_tick(state.poll_interval_ms)
    return state

  issues = tracker.fetch_candidate_issues()
  if issues failed:
    log_tracker_error()
    notify_observers()
    schedule_tick(state.poll_interval_ms)
    return state

  for issue in sort_for_dispatch(issues):
    if no_available_slots(state):
      break

    if should_dispatch(issue, state):
      state = dispatch_issue(issue, state, attempt=null)

  notify_observers()
  schedule_tick(state.poll_interval_ms)
  return state
```
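The `sort_for_dispatch` step in the tick above orders candidates by priority and then by oldest creation time (see Section 17.4). A minimal Python sketch of that ordering follows. The `Issue` dataclass is a stand-in for the normalized issue model, and three details are assumptions made for the example: a lower priority number is more urgent, issues without a priority sort last, and the identifier is used as a final tie-breaker.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Issue:
    identifier: str
    priority: Optional[int]  # assumed: lower number = more urgent, None = unset
    created_at: datetime


def sort_for_dispatch(issues):
    """Order by priority, then oldest creation time, then identifier."""
    def key(issue):
        missing = issue.priority is None
        # Unset priorities sort after every explicit priority.
        return (missing, issue.priority or 0, issue.created_at, issue.identifier)
    return sorted(issues, key=key)
```

Because `sorted` is stable and the key is total, repeated ticks over the same candidate set dispatch in the same order.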

### 16.3 Reconcile Active Runs

```text
function reconcile_running_issues(state):
  state = reconcile_stalled_runs(state)

  running_ids = keys(state.running)
  if running_ids is empty:
    return state

  refreshed = tracker.fetch_issue_states_by_ids(running_ids)
  if refreshed failed:
    log_debug("keep workers running")
    return state

  for issue in refreshed:
    if issue.state in terminal_states:
      state = terminate_running_issue(state, issue.id, cleanup_workspace=true)
    else if issue.state in active_states:
      state.running[issue.id].issue = issue
    else:
      state = terminate_running_issue(state, issue.id, cleanup_workspace=false)

  return state
```

### 16.4 Dispatch One Issue

```text
function dispatch_issue(issue, state, attempt):
  worker = spawn_worker(
    fn -> run_agent_attempt(issue, attempt, parent_orchestrator_pid) end
  )

  if worker spawn failed:
    return schedule_retry(state, issue.id, next_attempt(attempt), {
      identifier: issue.identifier,
      error: "failed to spawn agent"
    })

  state.running[issue.id] = {
    worker_handle,
    monitor_handle,
    identifier: issue.identifier,
    issue,
    session_id: null,
    codex_app_server_pid: null,
    last_codex_message: null,
    last_codex_event: null,
    last_codex_timestamp: null,
    codex_input_tokens: 0,
    codex_output_tokens: 0,
    codex_total_tokens: 0,
    last_reported_input_tokens: 0,
    last_reported_output_tokens: 0,
    last_reported_total_tokens: 0,
    retry_attempt: normalize_attempt(attempt),
    started_at: now_utc()
  }

  state.claimed.add(issue.id)
  state.retry_attempts.remove(issue.id)
  return state
```

### 16.5 Worker Attempt (Workspace + Prompt + Agent)

```text
function run_agent_attempt(issue, attempt, orchestrator_channel):
  workspace = workspace_manager.create_for_issue(issue.identifier)
  if workspace failed:
    fail_worker("workspace error")

  if run_hook("before_run", workspace.path) failed:
    fail_worker("before_run hook error")

  session = app_server.start_session(workspace=workspace.path)
  if session failed:
    run_hook_best_effort("after_run", workspace.path)
    fail_worker("agent session startup error")

  max_turns = config.agent.max_turns
  turn_number = 1

  while true:
    prompt = build_turn_prompt(workflow_template, issue, attempt, turn_number, max_turns)
    if prompt failed:
      app_server.stop_session(session)
      run_hook_best_effort("after_run", workspace.path)
      fail_worker("prompt error")

    turn_result = app_server.run_turn(
      session=session,
      prompt=prompt,
      issue=issue,
      on_message=(msg) -> send(orchestrator_channel, {codex_update, issue.id, msg})
    )

    if turn_result failed:
      app_server.stop_session(session)
      run_hook_best_effort("after_run", workspace.path)
      fail_worker("agent turn error")

    refreshed_issue = tracker.fetch_issue_states_by_ids([issue.id])
    if refreshed_issue failed:
      app_server.stop_session(session)
      run_hook_best_effort("after_run", workspace.path)
      fail_worker("issue state refresh error")

    issue = refreshed_issue[0] or issue

    if issue.state is not active:
      break

    if turn_number >= max_turns:
      break

    turn_number = turn_number + 1

  app_server.stop_session(session)
  run_hook_best_effort("after_run", workspace.path)

  exit_normal()
```

### 16.6 Worker Exit and Retry Handling

```text
on_worker_exit(issue_id, reason, state):
  running_entry = state.running.remove(issue_id)
  state = add_runtime_seconds_to_totals(state, running_entry)

  if reason == normal:
    state.completed.add(issue_id)  # bookkeeping only
    state = schedule_retry(state, issue_id, 1, {
      identifier: running_entry.identifier,
      delay_type: continuation
    })
  else:
    state = schedule_retry(state, issue_id, next_attempt_from(running_entry), {
      identifier: running_entry.identifier,
      error: format("worker exited: %reason")
    })

  notify_observers()
  return state
```

```text
on_retry_timer(issue_id, state):
  retry_entry = state.retry_attempts.pop(issue_id)
  if missing:
    return state

  candidates = tracker.fetch_candidate_issues()
  if fetch failed:
    return schedule_retry(state, issue_id, retry_entry.attempt + 1, {
      identifier: retry_entry.identifier,
      error: "retry poll failed"
    })

  issue = find_by_id(candidates, issue_id)
  if issue is null:
    state.claimed.remove(issue_id)
    return state

  if available_slots(state) == 0:
    return schedule_retry(state, issue_id, retry_entry.attempt + 1, {
      identifier: issue.identifier,
      error: "no available orchestrator slots"
    })

  return dispatch_issue(issue, state, attempt=retry_entry.attempt)
```
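The delay chosen inside `schedule_retry` is not spelled out in the pseudocode above. The Python sketch below shows one plausible reading of the behavior named in Section 17.4 (10s-based exponential backoff, capped by `agent.max_retry_backoff_ms`, with a short continuation retry after a normal exit). The exact formula `10000 * 2^(attempt - 1)` and the 1000 ms continuation delay are assumptions made for illustration, as is the function name `retry_delay_ms`.

```python
BASE_BACKOFF_MS = 10_000          # "10s-based" backoff from Section 17.4
DEFAULT_MAX_BACKOFF_MS = 300_000  # 5 minutes, the default cap listed in Section 18.1
CONTINUATION_DELAY_MS = 1_000     # assumed value for the "short" continuation retry


def retry_delay_ms(attempt: int, delay_type: str = "failure",
                   max_backoff_ms: int = DEFAULT_MAX_BACKOFF_MS) -> int:
    """Return the delay before the next attempt, in milliseconds."""
    if delay_type == "continuation":
        # Normal exits are re-checked quickly and do not back off.
        return CONTINUATION_DELAY_MS
    exponent = max(attempt - 1, 0)
    return min(BASE_BACKOFF_MS * 2 ** exponent, max_backoff_ms)
```

Under these assumptions attempts 1 through 5 wait 10 s, 20 s, 40 s, 80 s, and 160 s, and every later attempt waits the 5-minute cap.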

## 17. Test and Validation Matrix

A conforming implementation should include tests that cover the behaviors defined in this
specification.

Validation profiles:

- `Core Conformance`: deterministic tests required for all conforming implementations.
- `Extension Conformance`: required only for optional features that an implementation chooses to
  ship.
- `Real Integration Profile`: environment-dependent smoke/integration checks recommended before
  production use.

Unless otherwise noted, Sections 17.1 through 17.7 are `Core Conformance`. Bullets that begin with
`If ... is implemented` are `Extension Conformance`.

### 17.1 Workflow and Config Parsing

- Workflow file path precedence:
  - explicit runtime path is used when provided
  - cwd default is `WORKFLOW.md` when no explicit runtime path is provided
- Workflow file changes are detected and trigger re-read/re-apply without restart
- Invalid workflow reload keeps last known good effective configuration and emits an
  operator-visible error
- Missing `WORKFLOW.md` returns typed error
- Invalid YAML front matter returns typed error
- Front matter non-map returns typed error
- Config defaults apply when optional values are missing
- `tracker.kind` validation enforces currently supported kind (`linear`)
- `tracker.api_key` works (including `$VAR` indirection)
- `$VAR` resolution works for tracker API key and path values
- `~` path expansion works
- `codex.command` is preserved as a shell command string
- Per-state concurrency override map normalizes state names and ignores invalid values
- Prompt template renders `issue` and `attempt`
- Prompt rendering fails on unknown variables (strict mode)
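As an illustration of how one of these cases might be exercised, the sketch below covers the per-state concurrency override map. It is written in Python with a hypothetical helper name, `normalize_state_limits`, and it assumes that normalizing a state name means trimming and lowercasing it and that a valid value is a positive integer; an implementation should follow the rules it actually documents.

```python
def normalize_state_limits(raw) -> dict:
    """Normalize state names and silently drop invalid limits."""
    if not isinstance(raw, dict):
        return {}
    limits = {}
    for state, value in raw.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            continue
        limits[str(state).strip().lower()] = value
    return limits


def test_normalize_state_limits():
    raw = {" In Progress ": 2, "Todo": 0, "Review": "3", "Done": True}
    assert normalize_state_limits(raw) == {"in progress": 2}
    assert normalize_state_limits(None) == {}
```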

### 17.2 Workspace Manager and Safety

- Deterministic workspace path per issue identifier
- Missing workspace directory is created
- Existing workspace directory is reused
- Existing non-directory path at workspace location is handled safely (replace or fail per
  implementation policy)
- Optional workspace population/synchronization errors are surfaced
- Temporary artifacts (`tmp`, `.elixir_ls`) are removed during prep
- `after_create` hook runs only on new workspace creation
- `before_run` hook runs before each attempt and failures/timeouts abort the current attempt
- `after_run` hook runs after each attempt and failures/timeouts are logged and ignored
- `before_remove` hook runs on cleanup and failures/timeouts are ignored
- Workspace path sanitization and root containment invariants are enforced before agent launch
- Agent launch uses the per-issue workspace path as cwd and rejects out-of-root paths

### 17.3 Issue Tracker Client

- Candidate issue fetch uses active states and project slug
- Linear query uses the specified project filter field (`slugId`)
- Empty `fetch_issues_by_states([])` returns empty without API call
- Pagination preserves order across multiple pages
- Blockers are normalized from inverse relations of type `blocks`
- Labels are normalized to lowercase
- Issue state refresh by ID returns minimal normalized issues
- Issue state refresh query uses GraphQL ID typing (`[ID!]`) as specified in Section 11.2
- Error mapping for request errors, non-200, GraphQL errors, malformed payloads

### 17.4 Orchestrator Dispatch, Reconciliation, and Retry

- Dispatch sort order is priority then oldest creation time
- `Todo` issue with non-terminal blockers is not eligible
- `Todo` issue with terminal blockers is eligible
- Active-state issue refresh updates running entry state
- Non-active state stops running agent without workspace cleanup
- Terminal state stops running agent and cleans workspace
- Reconciliation with no running issues is a no-op
- Normal worker exit schedules a short continuation retry (attempt 1)
- Abnormal worker exit increments retries with 10s-based exponential backoff
- Retry backoff cap uses configured `agent.max_retry_backoff_ms`
- Retry queue entries include attempt, due time, identifier, and error
- Stall detection kills stalled sessions and schedules retry
- Slot exhaustion requeues retries with explicit error reason
- If a snapshot API is implemented, it returns running rows, retry rows, token totals, and rate
  limits
- If a snapshot API is implemented, timeout/unavailable cases are surfaced

### 17.5 Coding-Agent App-Server Client

- Launch command uses workspace cwd and invokes `bash -lc <codex.command>`
- Startup handshake sends `initialize`, `initialized`, `thread/start`, `turn/start`
- `initialize` includes client identity/capabilities payload required by the targeted Codex
  app-server protocol
- Policy-related startup payloads use the implementation's documented approval/sandbox settings
- `thread/start` and `turn/start` parse nested IDs and emit `session_started`
- Request/response read timeout is enforced
- Turn timeout is enforced
- Partial JSON lines are buffered until newline
- Stdout and stderr are handled separately; protocol JSON is parsed from stdout only
- Non-JSON stderr lines are logged but do not crash parsing
- Command/file-change approvals are handled according to the implementation's documented policy
- Unsupported dynamic tool calls are rejected without stalling the session
- User input requests are handled according to the implementation's documented policy and do not
  stall indefinitely
- Usage and rate-limit payloads are extracted from nested payload shapes
- Compatible payload variants for approvals, user-input-required signals, and usage/rate-limit
  telemetry are accepted when they preserve the same logical meaning
- If optional client-side tools are implemented, the startup handshake advertises the supported tool
  specs required for discovery by the targeted app-server version
- If the optional `linear_graphql` client-side tool extension is implemented:
  - the tool is advertised to the session
  - valid `query` / `variables` inputs execute against configured Linear auth
  - top-level GraphQL `errors` produce `success=false` while preserving the GraphQL body
  - invalid arguments, missing auth, and transport failures return structured failure payloads
  - unsupported tool names still fail without stalling the session
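The line-buffering requirement in this list is easy to get subtly wrong, so a small Python sketch may help. The class name `JsonLineBuffer` is hypothetical, and skipping a malformed stdout line instead of failing the session is an assumed policy for the example; this specification only requires that partial lines be held until their newline arrives.

```python
import json


class JsonLineBuffer:
    """Accumulate stdout chunks and return one parsed message per complete line."""

    def __init__(self):
        self._pending = ""

    def feed(self, chunk: str) -> list:
        self._pending += chunk
        # Everything before the last newline is complete; the tail stays buffered.
        *complete, self._pending = self._pending.split("\n")
        messages = []
        for line in complete:
            line = line.strip()
            if not line:
                continue
            try:
                messages.append(json.loads(line))
            except json.JSONDecodeError:
                # Assumed policy: skip malformed lines rather than crash the reader.
                continue
        return messages
```

A reader loop would call `feed` with whatever the pipe returned and forward each resulting message, regardless of how the operating system happened to split the stream.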

### 17.6 Observability

- Validation failures are operator-visible
- Structured logging includes issue/session context fields
- Logging sink failures do not crash orchestration
- Token/rate-limit aggregation remains correct across repeated agent updates
- If a human-readable status surface is implemented, it is driven from orchestrator state and does
  not affect correctness
- If humanized event summaries are implemented, they cover key wrapper/agent event classes without
  changing orchestrator behavior
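The aggregation bullet above can be made concrete with delta accounting against the `last_reported_*` fields from Section 16.4. The Python sketch below assumes that each agent update carries cumulative totals for the session, which is why only the growth since the previous report is added; the function name `apply_usage_update` and the clamp-at-zero handling of a report that goes backwards are illustrative choices.

```python
def apply_usage_update(totals: dict, entry: dict, reported: dict) -> None:
    """Add only the growth since the last report, so repeats are not double counted."""
    for kind in ("input_tokens", "output_tokens", "total_tokens"):
        last_key = f"last_reported_{kind}"
        current = reported.get(kind, 0)
        # Clamp at zero in case a report goes backwards (assumed policy).
        delta = max(current - entry[last_key], 0)
        totals[kind] += delta
        entry[f"codex_{kind}"] += delta
        entry[last_key] = max(current, entry[last_key])
```

Replaying the same update twice leaves both the per-run entry and the global totals unchanged, which is the property a repeated-update test would assert.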

### 17.7 CLI and Host Lifecycle

- CLI accepts an optional positional workflow path argument (`path-to-WORKFLOW.md`)
- CLI uses `./WORKFLOW.md` when no workflow path argument is provided
- CLI errors on nonexistent explicit workflow path or missing default `./WORKFLOW.md`
- CLI surfaces startup failure cleanly
- CLI exits with success when application starts and shuts down normally
- CLI exits nonzero when startup fails or the host process exits abnormally
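A compact Python sketch of the argument handling and exit codes described above is shown here. The program name `symphony`, the function names, and the message wording are assumptions for the example, and the service startup itself is left as a placeholder comment.

```python
import argparse
import sys
from pathlib import Path


def resolve_workflow_path(argv: list) -> Path:
    """Pick the explicit path if given, else ./WORKFLOW.md; fail if it is missing."""
    parser = argparse.ArgumentParser(prog="symphony")
    parser.add_argument("workflow_path", nargs="?", default="./WORKFLOW.md")
    args = parser.parse_args(argv)
    path = Path(args.workflow_path)
    if not path.is_file():
        raise FileNotFoundError(f"workflow file not found: {path}")
    return path


def main(argv: list) -> int:
    try:
        path = resolve_workflow_path(argv)
    except FileNotFoundError as err:
        print(f"startup failed: {err}", file=sys.stderr)
        return 1
    # A real host would start the service here and return 0 after a clean shutdown.
    print(f"using workflow {path}")
    return 0
```

An entry point would typically call `sys.exit(main(sys.argv[1:]))` so the return value becomes the process exit code.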

### 17.8 Real Integration Profile (Recommended)

These checks are recommended for production readiness and may be skipped in CI when credentials,
network access, or external service permissions are unavailable.

- A real tracker smoke test can be run with valid credentials supplied by `LINEAR_API_KEY` or a
  documented local bootstrap mechanism (for example `~/.linear_api_key`).
- Real integration tests should use isolated test identifiers/workspaces and clean up tracker
  artifacts when practical.
- A skipped real-integration test should be reported as skipped, not silently treated as passed.
- If a real-integration profile is explicitly enabled in CI or release validation, failures should
  fail that job.
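The "reported as skipped" rule can be shown with Python's standard `unittest` module, which records skips separately from passes. This is only a sketch: the test class, the placeholder assertion, and the `run_profile` helper are hypothetical, and a real smoke test would call the tracker instead of just checking that the variable exists.

```python
import os
import unittest


class LinearSmokeTest(unittest.TestCase):
    def setUp(self):
        # Checked at run time so the result is an explicit skip, not a silent pass.
        if not os.environ.get("LINEAR_API_KEY"):
            self.skipTest("LINEAR_API_KEY not set; real-integration profile skipped")

    def test_credentials_present(self):
        # Placeholder for a real tracker call made with the supplied credentials.
        self.assertTrue(os.environ["LINEAR_API_KEY"])


def run_profile() -> unittest.TestResult:
    """Run the smoke tests and return the result with skips listed separately."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LinearSmokeTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

A CI job that explicitly enables the profile could additionally fail when `result.skipped` is non-empty, so a missing credential cannot masquerade as a green run.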

## 18. Implementation Checklist (Definition of Done)

Use the same validation profiles as Section 17:

- Section 18.1 = `Core Conformance`
- Section 18.2 = `Extension Conformance`
- Section 18.3 = `Real Integration Profile`

### 18.1 Required for Conformance

- Workflow path selection supports explicit runtime path and cwd default
- `WORKFLOW.md` loader with YAML front matter + prompt body split
- Typed config layer with defaults and `$VAR` resolution
- Dynamic `WORKFLOW.md` watch/reload/re-apply for config and prompt
- Polling orchestrator with single-authority mutable state
- Issue tracker client with candidate fetch + state refresh + terminal fetch
- Workspace manager with sanitized per-issue workspaces
- Workspace lifecycle hooks (`after_create`, `before_run`, `after_run`, `before_remove`)
- Hook timeout config (`hooks.timeout_ms`, default `60000`)
- Coding-agent app-server subprocess client with JSON line protocol
- Codex launch command config (`codex.command`, default `codex app-server`)
- Strict prompt rendering with `issue` and `attempt` variables
- Exponential retry queue with continuation retries after normal exit
- Configurable retry backoff cap (`agent.max_retry_backoff_ms`, default 5m)
- Reconciliation that stops runs on terminal/non-active tracker states
- Workspace cleanup for terminal issues (startup sweep + active transition)
- Structured logs with `issue_id`, `issue_identifier`, and `session_id`
- Operator-visible observability (structured logs; optional snapshot/status surface)

### 18.2 Recommended Extensions (Not Required for Conformance)

- Optional HTTP server honors CLI `--port` over `server.port`, uses a safe default bind host, and
  exposes the baseline endpoints/error semantics in Section 13.7 if shipped.
- Optional `linear_graphql` client-side tool extension exposes raw Linear GraphQL access through the
  app-server session using configured Symphony auth.
- TODO: Persist retry queue and session metadata across process restarts.
- TODO: Make observability settings configurable in workflow front matter without prescribing UI
  implementation details.
- TODO: Add first-class tracker write APIs (comments/state transitions) in the orchestrator instead
  of only via agent tools.
- TODO: Add pluggable issue tracker adapters beyond Linear.

### 18.3 Operational Validation Before Production (Recommended)

- Run the `Real Integration Profile` from Section 17.8 with valid credentials and network access.
- Verify hook execution and workflow path resolution on the target host OS/shell environment.
- If the optional HTTP server is shipped, verify the configured port behavior and loopback/default
  bind expectations on the target environment.