microsoft/TypeAgent

Public

mirrored fromhttps://github.com/microsoft/TypeAgentAvailable

CodeCommitsIssuesPull requestsActionsInsightsSecurity
copilot/remove-deprecated-dependencies

Branches

Tags

  • No tags available.
0Branches0Tags
Go to file
Add file
Code

Clone

HTTPS

Download ZIP

ts/docs/architecture/dispatcher.md

643lines · modepreview

# Dispatcher — Architecture & Design

> **Scope:** This document describes the dispatcher package — the core
> routing engine that processes user input, translates natural language
> into typed actions, manages agent lifecycles, and orchestrates action
> execution. For the grammar language and matching algorithms, see
> `actionGrammar.md`. For the completion pipeline, see `completion.md`.

## Overview

The dispatcher is TypeAgent's central orchestration layer. It accepts
user input (natural language or structured `@`-commands), resolves it to
one or more typed actions, and dispatches those actions to the
appropriate application agents. The dispatcher is split across four npm
packages:

| Package                       | Role                                                  |
| ----------------------------- | ----------------------------------------------------- |
| `agent-dispatcher`            | Core implementation — translation, execution, context |
| `@typeagent/dispatcher-types` | Public interface (`Dispatcher`) and shared types      |
| `@typeagent/dispatcher-rpc`   | RPC serialization layer for remote clients            |
| `dispatcher-node-providers`   | Node.js agent loading and file-system storage         |

```
User input
     │
     ▼
┌───────────────────────────────────────────────────────┐
│  Dispatcher                                           │
│                                                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │   Command    │  │ Translation  │  │  Execution   │ │
│  │  Resolution  │─→│   Pipeline   │─→│    Engine    │ │
│  └──────────────┘  └──────────────┘  └──────────────┘ │
│          ↕                 ↕                 ↕        │
│  ┌──────────────────────────────────────────────────┐ │
│  │     AppAgentManager (agent registry & state)     │ │
│  └──────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────┘
     │
     ▼
  AppAgent.executeAction() / executeCommand()
```

### Key concepts

| Term                        | Meaning                                                                                                    |
| --------------------------- | ---------------------------------------------------------------------------------------------------------- |
| **Typed action**            | A structured object (`{ actionName, parameters }`) representing a user intent, validated against a schema. |
| **Schema**                  | A TypeScript action type definition that describes the set of actions an agent can handle.                 |
| **`@`-command**             | A structured command prefixed with `@` (e.g., `@config agent player`). No LLM translation needed.          |
| **Translation**             | The process of converting natural language to typed actions — via cache match or LLM call.                 |
| **`CommandHandlerContext`** | The central state object that holds all runtime state for a dispatcher session.                            |
| **`AppAgentManager`**       | The agent registry that manages agent manifests, schemas, state, and lifecycle.                            |
| **Flow**                    | A multi-step recipe (sequence of actions) that an agent can register for compound operations.              |

---

## Package structure

```
packages/dispatcher/
├── dispatcher/                     # Core implementation (agent-dispatcher)
│   └── src/
│       ├── dispatcher.ts           # Factory & Dispatcher object creation
│       ├── index.ts                # Public exports
│       ├── internal.ts             # Internal APIs
│       ├── agentProvider/          # AppAgentProvider interface
│       ├── command/                # Command resolution & execution
│       ├── context/                # Session, agent manager, memory, chat history
│       │   ├── dispatcher/         # Built-in dispatcher agent
│       │   └── system/             # Built-in system agent & handlers
│       ├── execute/                # Action execution orchestrator
│       ├── translation/            # Cache matching & LLM translation
│       ├── reasoning/              # LLM reasoning adapters (Claude, Copilot)
│       ├── search/                 # Internet search integration
│       ├── storageProvider/        # Abstract storage interface
│       ├── helpers/                # Config, console, status utilities
│       └── utils/                  # Cache factories, metrics, exceptions
├── types/                          # Public types (@typeagent/dispatcher-types)
│   └── src/
│       ├── dispatcher.ts           # Dispatcher interface & result types
│       ├── clientIO.ts             # ClientIO display interface
│       └── displayLogEntry.ts      # Display log entry types
├── rpc/                            # RPC layer (@typeagent/dispatcher-rpc)
│   └── src/
│       ├── dispatcherServer.ts     # Server-side RPC handler
│       ├── dispatcherClient.ts     # Client-side RPC proxy
│       ├── clientIOServer.ts       # ClientIO RPC server
│       └── clientIOClient.ts       # ClientIO RPC client
└── nodeProviders/                  # Node.js integration (dispatcher-node-providers)
    └── src/
        ├── agentProvider/
        │   └── npmAgentProvider.ts  # NPM-based agent loading
        └── storageProvider/
            └── fsStorageProvider.ts  # File-system storage provider
```

---

## Core interfaces

### `Dispatcher`

The public API surface — returned by `createDispatcher()` and consumed by
hosts (shell, CLI, web).

```typescript
interface Dispatcher {
  processCommand(
    command,
    clientRequestId?,
    attachments?,
    options?,
  ): Promise<CommandResult | undefined>;
  getCommandCompletion(prefix, direction): Promise<CommandCompletionResult>;
  checkCache(request): Promise<CommandResult | undefined>;
  getDynamicDisplay(appAgentName, type, displayId): Promise<DynamicDisplay>;
  getStatus(): Promise<DispatcherStatus>;
  getAgentSchemas(agentName?): Promise<AgentSchemaInfo[]>;
  respondToChoice(choiceId, response): Promise<CommandResult | undefined>;
  getDisplayHistory(afterSeq?): Promise<DisplayLogEntry[]>;
  cancelCommand(requestId): void;
  close(): Promise<void>;
}
```

### `CommandHandlerContext`

The central state object that threads through every operation. Created
once per session via `initializeCommandHandlerContext()` and shared
across all requests within that session. Key members:

| Field                  | Type                           | Purpose                                           |
| ---------------------- | ------------------------------ | ------------------------------------------------- |
| `agents`               | `AppAgentManager`              | Agent registry, schema state, lifecycle           |
| `session`              | `Session`                      | Persistent configuration (models, caching, etc.)  |
| `agentCache`           | `AgentCache`                   | Construction and grammar cache for translations   |
| `agentGrammarRegistry` | `AgentGrammarRegistry`         | NFA-based grammar matching engine                 |
| `translatorCache`      | `Map<string, Translator>`      | Per-schema LLM translator instances               |
| `chatHistory`          | `ChatHistory`                  | Conversation memory for context-aware translation |
| `conversationMemory`   | `ConversationMemory`           | Structured RAG for entity resolution              |
| `displayLog`           | `DisplayLog`                   | Persistent output log with sequence numbering     |
| `commandLock`          | `Limiter`                      | Serializes commands (one at a time)               |
| `activeRequests`       | `Map<string, AbortController>` | In-flight request cancellation                    |
| `clientIO`             | `ClientIO`                     | Abstract display interface for the host           |

### `ClientIO`

The abstract display layer that decouples the dispatcher from any
specific UI. Implemented by the shell (Electron), CLI (console), and
web clients. Handles:

- Displaying action results and status messages
- Streaming partial actions during LLM generation
- Presenting choices to the user
- Setting display metadata (emoji, agent name)

---

## Command processing pipeline

Every user interaction enters through `processCommand()`, which acquires
a command lock (ensuring serial execution), sets up request tracking, and
delegates to the resolution → translation → execution pipeline.

### 1. Command resolution

```
processCommand(input)
     │
     ▼
normalizeCommand(input)          # Add "@" prefix if missing
     │
     ▼
resolveCommand(normalizedInput)  # Greedy token matching
     │
     ├─→ "@config agent player"  → system command (config handler)
     ├─→ "@player play ..."      → agent command (player handler)
     └─→ "play Yesterday"        → natural language (translation pipeline)
```

`resolveCommand()` uses **greedy exact-match tokenization** against
a hierarchical `CommandDescriptorTable`. It consumes the first token
as a potential agent name, then walks nested subcommand tables. If a
token doesn't match, it rolls back and defaults to the system agent.

The result classifies the input as either:

- **A structured command** — routed directly to the agent's
  `executeCommand()` handler.
- **A natural language request** — forwarded to the translation
  pipeline for action resolution.

### 2. Translation pipeline

Natural language requests pass through a two-stage pipeline: cache match
first, then LLM translation as fallback.

```
interpretRequest(request)
     │
     ▼
┌──────────────────────────────────────────────┐
│  Can use cache?                               │
│  (no attachments, no special instructions,    │
│   cache enabled in session config)            │
└──────┬───────────────────────────┬────────────┘
       │ yes                       │ no
       ▼                           │
  matchRequest()                   │
  ┌────────────────────┐           │
  │ agentCache.match() │           │
  │ (grammar + constr.)│           │
  └────────┬───────────┘           │
           │                       │
    ┌──────┴──────┐                │
    │ hit         │ miss           │
    ▼             ▼                ▼
  validate   translateRequest()
  wildcards  ┌──────────────────────────────────┐
    │        │ pickInitialSchema()              │
    │        │ → semantic search or last-used   │
    │        │                                  │
    │        │ translateWithTranslator()         │
    │        │ → LLM call (GPT, Claude, etc.)  │
    │        │                                  │
    │        │ finalizeAction()                  │
    │        │ → handle schema switches         │
    │        │ → resolve unknowns               │
    │        └──────────────────────────────────┘
    │                    │
    └────────────────────┘
             │
             ▼
    confirmTranslation()   # User override if needed
             │
             ▼
       InterpretResult { requestAction, fromCache, elapsedMs }
```

#### Cache matching (`matchRequest`)

The cache layer provides the fastest path to action resolution. It checks
two sources in parallel:

1. **Grammar rules** — grammar based matching against compiled `.agr` files.
   Produces exact structural matches with wildcard captures.
2. **Constructions** — Previously cached LLM translations that matched
   similar requests. Stored as templates with wildcard slots.

If a match is found, the dispatcher validates it:

- **Entity wildcards** are checked against conversation memory — can
  the captured value resolve to a known entity?
- **Regular wildcards** are validated by the target agent's
  `validateWildcardMatch()` callback.

Only fully validated matches are accepted. Invalid matches fall through
to LLM translation.

#### LLM translation (`translateRequest`)

When no cache match exists, the dispatcher invokes an LLM to translate
the request:

1. **Schema selection** — `pickInitialSchema()` chooses the most likely
   target schema via embedding-based semantic search or falls back to
   the last-used schema.

2. **Translator creation** — `getTranslatorForSchema()` builds a
   `TypeAgentTranslator` configured with the selected schema's action
   types. If optimization is enabled, `getTranslatorForSelectedActions()`
   narrows the action set using semantic search.

3. **Translation** — The translator calls the LLM with the request,
   conversation history, and attachments. Supports streaming partial
   results (relayed to the host via `streamPartialAction()`).

4. **Finalization** — `finalizeAction()` handles multi-step resolution:
   - If the LLM returns an `"unknown"` action, the dispatcher searches
     for a better agent via `findAssistantForRequest()` (semantic search
     over all registered schemas).
   - If the LLM returns a switch action, the dispatcher re-translates
     with the new schema.
   - For `MultipleAction` results, each sub-request is finalized independently.

#### Activity context

When an activity is active (e.g., the user is in a "music" session),
the translation pipeline prioritizes activity-related schemas:

1. Translate with activity schemas first.
2. If the result contains unknown actions and the activity isn't
   restricted, retry with non-activity schemas.
3. Merge the results — executable actions from the activity translation
   combined with resolutions from the broader schema set.

#### History context

The dispatcher builds a `HistoryContext` from the chat history to
provide conversation-aware translation:

- Recent user/assistant exchanges (for coreference resolution)
- Top-K named entities mentioned in conversation
- User-provided instructions and guidance
- Current activity context

This context is passed to both the cache layer and the LLM translator.

### 3. Action execution

Once actions are resolved, the execution engine takes over:

```
executeActions(actions[], entities?)
     │
     ▼
toPendingActions()         # Entity resolution pass
     │
     ▼
canExecute()?              # Check for unknown/disabled schemas
     │ yes
     ▼
┌── action loop ────────────────────────────────────────┐
│                                                       │
│  Is pending request? ──yes──→ translatePendingRequest │
│         │ no                         │                │
│         ▼                            ▼                │
│  Check for flow? ──yes──→ processFlow()               │
│         │ no                                          │
│         ▼                                             │
│  appAgent.executeAction(action, actionContext)        │
│         │                                             │
│         ▼                                             │
│  Process ActionResult:                                │
│  - Display output to client                           │
│  - Register choices (if any)                          │
│  - Set activity context (if changed)                  │
│  - Extract entities → conversation memory             │
│  - Queue additional actions (if any)                  │
│  - Collect metrics                                    │
│                                                       │
└───────────────────────────────────────────────────────┘
```

**Entity resolution** — Before execution, `toPendingActions()` resolves
entity references. Named entities (e.g., "that song", "the meeting") are
looked up in conversation memory. Ambiguous references trigger a user
clarification prompt via the `ClientIO` layer.

**Flow execution** — Some actions have registered flow definitions
(multi-step recipes). When `getFlow(schemaName, actionName)` returns a
flow, the `flowInterpreter` executes it step-by-step with parameter
interpolation, rather than calling the agent's `executeAction()` directly.

**Streaming** — During LLM translation, the dispatcher can relay partial
action objects to agents via `streamPartialAction()`. This allows agents
to begin rendering results before translation is complete (e.g.,
progressively displaying search results).

---

## Agent management

### `AppAgentManager`

The `AppAgentManager` is the central registry for all agents. It handles
registration, lifecycle management, state tracking, and schema caching.

#### Registration flow

```
addProvider(provider)
     │
     ▼
provider.getAppAgentNames()     # Discover available agents
     │
     ▼
┌── for each agent ─────────────────────────────────────┐
│                                                       │
│  provider.getAppAgentManifest(name)                   │
│       │                                               │
│       ▼                                               │
│  addAgentManifest(name, manifest)                     │
│  ┌─────────────────────────────────────────────┐      │
│  │ For each schema in manifest:                │      │
│  │  - Parse action schema (.ts → ActionSchema) │      │
│  │  - Cache schema file on disk                │      │
│  │  - Add to semantic map (embeddings)         │      │
│  │  - Load static grammar (.agr)               │      │
│  │  - Register grammar in AgentGrammarRegistry │      │
│  │  - Load flow definitions                    │      │
│  └─────────────────────────────────────────────┘      │
│                                                       │
└───────────────────────────────────────────────────────┘
```

Agents are loaded lazily — the manifest and schemas are registered at
startup, but the agent code itself (`AppAgent` instance) is only loaded
when first needed via `ensureAppAgent()`.

#### Agent lifecycle

```
Registration     Initialization         Active                 Cleanup
───────────────────────────────────────────────────────────────────────
addAgentManifest → ensureSessionContext → executeAction/Command → closeSessionContext
                  │                     │                        │
                  ├─ ensureAppAgent()   ├─ updateAgentContext() ├─ updateAgentContext(false)
                  ├─ initializeAgent()  ├─ getDynamicSchema()   ├─ closeAgentContext()
                  └─ createSession()    └─ getDynamicGrammar()  └─ removeAgent()
```

- **`ensureSessionContext()`** — Lazily initializes an agent: loads the
  `AppAgent` instance from the provider, calls `initializeAgentContext()`,
  creates a `SessionContext`.
- **`updateAction()`** — Enables or disables a specific schema. When
  enabling, loads dynamic schemas and grammars from the agent.
- **`closeSessionContext()`** — Disables all schemas, calls
  `closeAgentContext()`, releases resources.

#### State management

Each schema's state is tracked independently:

| State         | Meaning                                            |
| ------------- | -------------------------------------------------- |
| **Enabled**   | Schema is registered and turned on in config       |
| **Active**    | Enabled AND not transiently disabled               |
| **Loading**   | Schema is in the process of being initialized      |
| **Transient** | Temporarily toggled on/off for the current request |

`setState()` computes state changes against the current configuration
and applies them — enabling schemas triggers `updateAgentContext(true)`,
disabling triggers `updateAgentContext(false)`.

The dispatcher agent and system commands are always enabled and cannot
be disabled.

#### Semantic search

The `AppAgentManager` maintains an `ActionSchemaSemanticMap` — an
embedding index of all registered action schemas. This powers:

- **Schema selection** during LLM translation (`pickInitialSchema`)
- **Unknown action resolution** (`findAssistantForRequest`)
- **Execution validation** (suggesting correct agents when a schema
  is disabled)

Embeddings are cached to disk and loaded at startup.

---

## Built-in agents

The dispatcher registers two built-in agents via `inlineAgentProvider`:

### System agent

Handles `@`-prefixed system commands:

- `@config` — Session configuration (models, caching, agents)
- `@help` — Help and documentation
- `@history` — Chat history management
- `@trace` — Debug tracing
- `@clear` — Clear display
- `@notify` — Notification management
- `@construction` — Cache construction management
- `@explain` — Explanation of cached translations

Each command has a `CommandDescriptor` that defines expected parameters,
subcommands, and help text.

### Dispatcher agent

Handles meta-actions that the dispatcher itself fulfills:

- **Unknown action** — When no agent can handle a request
- **Clarification** — When the input is ambiguous
- **Multiple actions** — When the LLM identifies multiple intents
- **Search/lookup** — Finding the right agent for a request

---

## Session & persistence

### `Session`

The `Session` object stores persistent configuration that survives across
requests and can be saved to disk:

- **Translation** — LLM model selection, schema generation strategy,
  streaming mode, history depth
- **Cache** — Whether to use construction/grammar matching, wildcard
  expansion settings
- **Explainer** — Model and settings for explaining cached translations
- **Clarification** — Rules for when to ask the user for clarification
- **Memory** — Whether to extract knowledge from requests and results
- **Agent state** — Per-agent/schema enabled/disabled flags

Sessions are stored in `~/.typeagent/sessions/` and loaded at startup.

### `ChatHistory`

Tracks the conversation log:

- User requests and assistant responses
- Named entities extracted from exchanges
- User-provided instructions
- Activity context markers

Used to build `HistoryContext` for context-aware translation and to
feed conversation memory.

### `ConversationMemory`

A structured RAG system (via the `knowledge-processor` package) that
extracts facts and entities from action results. Powers:

- Entity resolution during action execution
- Wildcard validation in cache matching
- Context-aware translation via history

### `DisplayLog`

A sequence-numbered log of all output displayed to the user. Supports:

- Streaming access (fetch entries after a sequence number)
- Session replay (reconstructing display history)
- Remote client synchronization via RPC

---

## RPC layer

The RPC package enables remote clients (e.g., the Electron shell running
the dispatcher in a separate process):

```
┌──────────────┐         RPC Channel          ┌───────────────────┐
│  Shell/CLI   │ ◄━━━━━━━━━━━━━━━━━━━━━━━━━►  │  Dispatcher       │
│              │                              │                   │
│  Dispatcher  │  processCommand ──────────►  │  DispatcherServer │
│  Client      │  getCompletion  ──────────►  │                   │
│              │                              │                   │
│  ClientIO    │  ◄──────────── appendDisplay │  ClientIO Client  │
│  Server      │  ◄──────────── proposeAction │                   │
└──────────────┘                              └───────────────────┘
```

Two RPC pairs are involved:

1. **Dispatcher RPC** — The shell calls dispatcher methods
   (`processCommand`, `getCommandCompletion`, etc.) via
   `DispatcherClient` → `DispatcherServer`.
2. **ClientIO RPC** — The dispatcher calls display methods
   (`appendDisplay`, `proposeAction`, etc.) via
   `ClientIOClient` → `ClientIOServer`.

This inversion means the dispatcher can push display updates to the
client without polling.

---

## Request lifecycle

A complete request lifecycle, from keystroke to result:

```
1. User types "play Yesterday by the Beatles"
2. Shell calls dispatcher.processCommand("play Yesterday by the Beatles")
3. processCommand() acquires commandLock, creates AbortController
4. normalizeCommand() → "@dispatcher play Yesterday by the Beatles"
5. resolveCommand() → no matching command → natural language request
6. interpretRequest() entered
7.   matchRequest() → agentCache.match()
8.     Grammar NFA matches "play $(track) by $(artist)" rule
9.     Wildcards validated: track="Yesterday", artist="the Beatles"
10.    Cache hit → InterpretResult { fromCache: "grammar" }
11. executeActions() entered
12.   toPendingActions() → resolve entities (none pending)
13.   canExecute() → player schema is active ✓
14.   executeAction({ actionName: "play", parameters: { track: "Yesterday", artist: "the Beatles" } })
15.     No flow registered → call playerAgent.executeAction()
16.     Agent returns ActionResult with display content
17.   Display result via clientIO.appendDisplay()
18.   Extract entities → conversationMemory
19.   Collect metrics
20. endProcessCommand() → return CommandResult
21. Release commandLock
```

If step 8 had missed (no grammar match), the flow would continue:

```
8'. translateRequest() entered
9'.   pickInitialSchema() → semantic search → "player" schema
10'.  translateWithTranslator() → LLM call with schema + history
11'.  LLM returns { actionName: "play", parameters: { track: "Yesterday", artist: "the Beatles" } }
12'.  finalizeAction() → action is valid, no switch needed
13'.  Cache translation as construction for future matches
14'.  Continue from step 11 above
```

---

## Error handling

The dispatcher uses structured error handling at several levels:

- **Command lock** — A `Limiter` ensures only one command executes at a
  time. Concurrent requests queue behind the lock.
- **Cancellation** — Each request gets an `AbortController`. Calling
  `cancelCommand(requestId)` signals the abort, which propagates through
  the translation and execution pipeline.
- **Unknown actions** — When no agent matches, the dispatcher displays
  an error and uses semantic search to suggest the closest matching
  agents/schemas.
- **Disabled schemas** — `canExecute()` checks that all target schemas
  are active before execution. Disabled schemas produce user-visible
  errors with guidance on enabling them.
- **Translation failures** — LLM errors are caught and surfaced via
  `clientIO.appendDisplay()`. The dispatcher does not retry automatically.
- **Agent errors** — Exceptions from `executeAction()` are caught,
  logged, and displayed without crashing the session.

---

## Dependencies

The dispatcher integrates with several sibling packages:

| Package                    | Integration point                                 |
| -------------------------- | ------------------------------------------------- |
| `@typeagent/agent-sdk`     | `AppAgent` interface that all agents implement    |
| `@typeagent/action-schema` | Parses TypeScript schemas into `ActionSchemaFile` |
| `action-grammar`           | NFA/DFA grammar compilation and matching          |
| `agent-cache`              | Construction cache and grammar store              |
| `aiclient`                 | LLM API calls (Azure OpenAI, OpenAI, Claude)      |
| `knowledge-processor`      | Structured RAG for conversation memory            |
| `telemetry`                | Event logging and performance metrics             |
| `@typeagent/agent-rpc`     | RPC channel abstraction                           |