# Accept rich user input

This guide explains how a ChatKit server accepts user input beyond plain text, such as attachments, dictation, structured follow-up answers, and @-mentions, and makes it available to your inference pipeline.

At a high level:

- Attachments let users upload files that your model can read.
- Dictation lets users record audio that is transcribed into text before sending.
- Structured input lets the assistant ask for specific follow-up answers.
- @-mentions let users tag entities so the model does not have to guess from free text.

## Attachments: let users upload files

Let users attach files and images by turning on client support, choosing an upload strategy, wiring the upload endpoints, and converting attachments to model inputs.

### Enable attachments in the client

Turn on attachments in the composer and configure client-side limits:

```ts
const chatkit = useChatKit({
  // ...
  composer: {
    attachments: {
      enabled: true,
      // configure accepted MIME types, count, and size limits here
    },
  },
});
```

Under the hood this maps to `ChatKitOptions.composer.attachments`; see the [`composer.attachments` docs](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/composeroption/#attachments) for all available options.

### Configure an upload strategy

Set [`ChatKitOptions.api.uploadStrategy`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/fileuploadstrategy/) to:

- **Direct**: your backend exposes a single upload URL that accepts the bytes and writes attachment metadata to your `Store`. Simpler and faster when you control uploads directly from the app server.
- **Two-phase**: the client makes a ChatKit API request to create an attachment metadata record (which forwards the request to `AttachmentStore`), you return an `upload_url` as part of the created attachment metadata, and the client uploads bytes in a second step. Prefer this when you front object storage with presigned/temporary URLs or want to offload upload bandwidth (for example, to a third-party blob storage).

Both strategies still require an `AttachmentStore` so that stored bytes can be cleaned up when an attachment is deleted. Choose direct for simplicity on the same origin; choose two-phase for cloud storage and larger files.

### Enforce attachment access control

ChatKit protects neither attachment metadata nor file bytes. Use the `context` passed into your `AttachmentStore` methods to authorize every create, read, and delete. Only return IDs, bytes, or signed URLs when the caller owns the attachment, and prefer short-lived download URLs. Skipping these checks can leak customer data.

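As an illustration of the ownership check described above, here is a minimal sketch. `RequestContext`, `record_owner`, `require_owner`, and the in-memory `_owners` index are hypothetical names invented for this example and are not part of ChatKit; in a real app the ownership record would live in your database next to the attachment metadata.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical per-request context; yours may carry more fields."""

    user_id: str


class AttachmentAccessError(PermissionError):
    """Raised when the caller does not own the attachment."""


# Hypothetical ownership index: attachment id -> owning user id.
_owners: Dict[str, str] = {}


def record_owner(attachment_id: str, context: RequestContext) -> None:
    """Remember who created an attachment (call this when creating it)."""
    _owners[attachment_id] = context.user_id


def require_owner(attachment_id: str, context: RequestContext) -> None:
    """Raise unless the caller owns the attachment (call on read and delete)."""
    # Unknown ids are rejected the same way as foreign ids, so a caller
    # cannot probe which attachment ids exist.
    if _owners.get(attachment_id) != context.user_id:
        raise AttachmentAccessError(f"attachment {attachment_id} is not accessible")
```

One way to use this is to call `record_owner` at the end of your create path and `require_owner` at the top of every read, delete, and URL-signing path, before touching the blob store.
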
### Direct upload

Add the upload endpoint referenced in `uploadStrategy`. It must:

- accept `multipart/form-data` with a `file` field,
- store the bytes wherever you like,
- create `Attachment` metadata, persist it via `Store.save_attachment`, and
- return the `Attachment` JSON.

Implement `AttachmentStore.delete_attachment` to delete the stored bytes; `ChatKitServer` will then call `Store.delete_attachment` to drop metadata.

Example client configuration:

```js
{
  type: "direct",
  uploadUrl: "/files",
}
```

Example FastAPI direct upload endpoint:

```python
from fastapi import FastAPI, Request, Response

app = FastAPI()


@app.post("/files")
async def upload_file(request: Request):
    form_data = await request.form()
    file = form_data.get("file")

    # Your blob store upload: store the bytes, build the Attachment metadata,
    # and persist it via Store.save_attachment before returning it.
    attachment = await upload_to_blob_store(file)

    return Response(content=attachment.model_dump_json(), media_type="application/json")
```

### Two-phase upload

Implement `AttachmentStore.create_attachment` to:

- build an `upload_url` that accepts `multipart/form-data` with a `file` field (direct PUTs are currently not supported),
- build the `Attachment` model,
- persist it via `Store.save_attachment`, and
- return it.

Implement `AttachmentStore.delete_attachment` to delete the stored bytes; `ChatKitServer` will call `Store.delete_attachment` afterward.

After the client receives the created attachment metadata in the response, it POSTs the bytes to `upload_url`.

Client configuration:

```js
{
  type: "two_phase",
}
```

Example two-phase store issuing a multipart upload URL:

```python
from uuid import uuid4


class BlobAttachmentStore(AttachmentStore[RequestContext]):
    def generate_attachment_id(self, mime_type: str, context: RequestContext) -> str:
        return f"att_{uuid4().hex}"

    async def create_attachment(
        self, input: AttachmentCreateParams, context: RequestContext
    ) -> Attachment:
        att_id = self.generate_attachment_id(input.mime_type, context)
        upload_url = issue_multipart_upload_url(att_id, input.mime_type)  # your blob store
        attachment = Attachment(
            id=att_id,
            mime_type=input.mime_type,
            name=input.name,
            upload_url=upload_url,
        )
        await data_store.save_attachment(attachment, context=context)
        return attachment

    async def delete_attachment(self, attachment_id: str, context: RequestContext) -> None:
        await delete_blob(att_id=attachment_id)  # your blob store


# Instantiate the store after the class is defined, then pass it to your server.
attachment_store = BlobAttachmentStore()
server = MyChatKitServer(store=data_store, attachment_store=attachment_store)
```

### Convert attachments to model input

Attachments arrive on `input_user_message.attachments` in `ChatKitServer.respond`. The default `ThreadItemConverter` does not handle them, so subclass it and implement `attachment_to_message_content` to return a `ResponseInputContentParam` before calling `Runner.run_streamed`.

Example using a blob fetch helper:

```python
import base64

from chatkit.agents import ThreadItemConverter
from chatkit.types import ImageAttachment
from openai.types.responses import ResponseInputFileParam, ResponseInputImageParam


async def read_bytes(attachment_id: str) -> bytes:
    ...  # fetch from your blob store


def as_data_url(mime: str, content: bytes) -> str:
    # Encode the bytes as a base64 data URL the model can read inline.
    return "data:" + mime + ";base64," + base64.b64encode(content).decode("utf-8")


class MyConverter(ThreadItemConverter):
    async def attachment_to_message_content(self, attachment):
        content = await read_bytes(attachment.id)
        if isinstance(attachment, ImageAttachment):
            return ResponseInputImageParam(
                type="input_image",
                detail="auto",
                image_url=as_data_url(attachment.mime_type, content),
            )
        if attachment.mime_type == "application/pdf":
            return ResponseInputFileParam(
                type="input_file",
                file_data=as_data_url(attachment.mime_type, content),
                filename=attachment.name or "unknown",
            )
        # For other text formats, check for API support first before
        # sending as a ResponseInputFileParam.
```

### Show image attachment previews in thread

Set `ImageAttachment.preview_url` to allow the client to render thumbnails.

- If your preview URLs are **permanent/public**, set `preview_url` once when creating the attachment and persist it.
- If your storage uses **expiring URLs**, generate a fresh `preview_url` when returning attachment metadata (for example, in `Store.load_thread_items` and `Store.load_attachment`) rather than persisting a long-lived URL. In this case, returning a short-lived signed URL directly is the simplest approach. Alternatively, you may return a redirect that resolves to a temporary signed URL, as long as the final URL serves image bytes with appropriate CORS headers.

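To make the expiring-URL option concrete, the following is a minimal sketch of issuing and checking a short-lived signed preview URL with an HMAC. Everything here is an assumption for illustration: `SIGNING_SECRET`, `PREVIEW_BASE_URL`, `sign_preview_url`, and `is_preview_request_valid` are hypothetical names, and most blob stores offer their own presigned URL mechanism that you would normally use instead.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode

# Hypothetical values: load the secret from configuration in a real app.
SIGNING_SECRET = b"replace-me"
PREVIEW_BASE_URL = "https://files.example.com/previews"


def _signature(attachment_id: str, expires_at: int) -> str:
    """HMAC-SHA256 over the attachment id and its expiry timestamp."""
    message = f"{attachment_id}:{expires_at}".encode("utf-8")
    return hmac.new(SIGNING_SECRET, message, hashlib.sha256).hexdigest()


def sign_preview_url(
    attachment_id: str, ttl_seconds: int = 300, now: Optional[float] = None
) -> str:
    """Build a preview URL that stops validating after ttl_seconds."""
    issued = time.time() if now is None else now
    expires_at = int(issued) + ttl_seconds
    query = urlencode(
        {"expires": expires_at, "sig": _signature(attachment_id, expires_at)}
    )
    return f"{PREVIEW_BASE_URL}/{attachment_id}?{query}"


def is_preview_request_valid(
    attachment_id: str, expires_at: int, sig: str, now: Optional[float] = None
) -> bool:
    """Check expiry first, then compare signatures in constant time."""
    current = time.time() if now is None else now
    if current >= expires_at:
        return False
    return hmac.compare_digest(sig, _signature(attachment_id, expires_at))
```

With a helper like this, you would assign the result of `sign_preview_url(attachment.id)` to `preview_url` each time you return attachment metadata, and the endpoint serving the image bytes would call `is_preview_request_valid` before responding.
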
## Dictation: speech-to-text input

Enable dictation so users can record audio and have it transcribed into text before sending.

### Enable dictation in the client

Turn on dictation in the composer:

```ts
const chatkit = useChatKit({
  // ...
  composer: {
    dictation: {
      enabled: true,
    },
  },
});
```

This maps to `ChatKitOptions.composer.dictation`.

### Implement `ChatKitServer.transcribe`

When dictation is enabled, the client records audio and sends it to your backend for transcription. Implement `ChatKitServer.transcribe` to accept audio input and return a transcription result.

The client sends one of:

- `"audio/webm;codecs=opus"` (preferred for Chrome/Firefox/Safari 18.4+)
- `"audio/mp4"` (fallback for older Safari/iOS)
- `"audio/ogg;codecs=opus"` (alternative for some environments)

The raw value is available as `audio_input.mime_type`. Use `audio_input.media_type` when you only need the base media type (`"audio/webm"`, `"audio/ogg"`, or `"audio/mp4"`).

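The relationship between the raw MIME type and its base media type can be sketched as follows. This is only an illustration of the idea, not how ChatKit computes `media_type`; `base_media_type`, `extension_for`, and the `EXTENSIONS` table are hypothetical helpers, which can be handy if you handle MIME strings outside of `AudioInput`.

```python
from typing import Optional

# File extensions keyed by base media type, for the three base types named above.
EXTENSIONS = {"audio/webm": "webm", "audio/mp4": "m4a", "audio/ogg": "ogg"}


def base_media_type(mime_type: str) -> str:
    """Drop parameters such as ';codecs=opus' and normalize case."""
    return mime_type.split(";", 1)[0].strip().lower()


def extension_for(mime_type: str) -> Optional[str]:
    """Return a file extension for a supported audio MIME type, else None."""
    return EXTENSIONS.get(base_media_type(mime_type))
```

Returning `None` for unknown types lets the caller decide how to reject the request, for example with a 400 response.
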
Example transcription method using the OpenAI Audio API:

```python
# Assumes `import io` and a module-level `client = AsyncOpenAI()`.
async def transcribe(self, audio_input: AudioInput, context: RequestContext) -> TranscriptionResult:
    # Map the base media type to a file extension the Audio API recognizes.
    ext = {
        "audio/webm": "webm",
        "audio/mp4": "m4a",
        "audio/ogg": "ogg",
    }.get(audio_input.media_type)
    if not ext:
        raise HTTPException(status_code=400, detail="Unexpected audio format")

    audio_file = io.BytesIO(audio_input.data)
    audio_file.name = f"audio.{ext}"
    # Await the async client so the request does not block the event loop.
    transcription = await client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )
    return TranscriptionResult(text=transcription.text)
```

Return a `TranscriptionResult` that includes the final `text` that should appear in the composer.

## Structured input: ask for specific answers

Structured input lets your assistant ask the user for small, typed follow-up answers as part of the conversation. The prompt takes over the composer with focused controls, plus a skip option. Use structured input when the next step depends on specific choices or short fields, and free-text back-and-forth would be slower or easier to misread.

The server streams a [`StructuredInputItem`](../api/chatkit/types.md#chatkit.types.StructuredInputItem) during `respond`. ChatKit renders the questions in the composer area instead of the normal free-text input, then records the result on the same thread item when the user answers or skips.

### Stream a structured input item

Yield a `StructuredInputItem` from `respond` when you need the user to answer before continuing.

```python
from datetime import datetime

from chatkit.types import (
    StructuredInputFreeform,
    StructuredInputItem,
    StructuredInputMultipleChoice,
    StructuredInputMultipleChoiceOption,
    ThreadItemDoneEvent,
)


yield ThreadItemDoneEvent(
    item=StructuredInputItem(
        id=self.store.generate_item_id("message", thread, context),
        thread_id=thread.id,
        created_at=datetime.now(),
        inputs=[
            StructuredInputMultipleChoice(
                id="priority",
                question="What priority should I use?",
                options=[
                    StructuredInputMultipleChoiceOption(value="Low"),
                    StructuredInputMultipleChoiceOption(value="Medium"),
                    StructuredInputMultipleChoiceOption(value="High"),
                ],
            ),
            StructuredInputFreeform(
                id="notes",
                question="Any extra context?",
                description="Optional details to include.",
            ),
        ],
    )
)
return
```

Use [`StructuredInputMultipleChoice`](../api/chatkit/types.md#chatkit.types.StructuredInputMultipleChoice) for choice prompts and [`StructuredInputFreeform`](../api/chatkit/types.md#chatkit.types.StructuredInputFreeform) for short text answers. Set `multiple=True` on multiple-choice input when the user may submit more than one value.

After the user submits or skips, ChatKit records the result on the structured input item.

### Convert structured input submissions into model input

The default [`ThreadItemConverter`](../api/chatkit/agents.md#chatkit.agents.ThreadItemConverter) includes structured input items in model input. It describes the prompt status and each answer as answered, skipped, or unanswered.

If your model needs a different format, override `ThreadItemConverter.structured_input_to_input` before calling `Runner.run_streamed`.

## @-mentions: tag entities in user messages

Enable @-mentions so users can tag entities (like documents, tickets, or users) instead of pasting raw identifiers. Mentions travel through ChatKit as structured tags so the model can resolve entities instead of guessing from free text.

### Enable as-you-type entity lookup in the composer

To enable entity tagging as @-mentions in the composer, configure [`entities.onTagSearch`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#ontagsearch) as a ChatKit.js option.

It should return a list of [Entity](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entity/) objects that match the query string.

If you want to hint that @-mentions are available, enable the composer `@` button by setting [`entities.showComposerMenu`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#showcomposermenu). When clicked, it inserts `@` into the composer and opens the tag search menu automatically.

```ts
const chatkit = useChatKit({
  // ...
  entities: {
    onTagSearch: async (query: string) => {
      return [
        {
          id: "article_123",
          title: "The Future of AI",
          group: "Trending",
          icon: "globe",
          data: { type: "article" }
        },
        {
          id: "article_124",
          title: "One weird trick to improve your sleep",
          group: "Trending",
          icon: "globe",
          data: { type: "article" }
        },
      ]
    },
    // Optional: show the "@" button in the composer for added discoverability.
    showComposerMenu: true,
  },
})
```

### Convert tags into model input in your server

Mentions arrive server-side as structured tags. Override `ThreadItemConverter.tag_to_message_content` to describe what each tag refers to and translate it into model-readable content.

Example converter method that wraps the tagged entity details in custom markup:

```python
from chatkit.agents import ThreadItemConverter
from chatkit.types import UserMessageTagContent
from openai.types.responses import ResponseInputTextParam


class MyThreadItemConverter(ThreadItemConverter):
    async def tag_to_message_content(
        self, tag: UserMessageTagContent
    ) -> ResponseInputTextParam:
        # The entity kind is whatever you placed in `data` in onTagSearch
        # (here, `data: { type: "article" }`).
        if tag.data.get("type") == "article":
            # Load or unpack the entity the tag refers to
            summary = await fetch_article_summary(tag.id)
            return ResponseInputTextParam(
                type="input_text",
                text=(
                    "<ARTICLE_TAG>\n"
                    f"ID: {tag.id}\n"
                    f"Title: {tag.text}\n"
                    f"Summary: {summary}\n"
                    "</ARTICLE_TAG>"
                ),
            )
        # Fallback for other entity kinds: pass the mention text through.
        return ResponseInputTextParam(type="input_text", text=f"@{tag.text}")
```

### Pair mentions with retrieval tool calls

When the referenced content is too large to inline, keep the tag lean (id + short summary) and let the model fetch details via a tool. In your system prompt, tell the assistant to call the retrieval tool when it sees an `ARTICLE_TAG`.

Example tool paired with the converter above:

```python
from agents import Agent, RunContextWrapper, function_tool
from chatkit.agents import AgentContext


@function_tool(description_override="Fetch full article content by id.")
async def fetch_article(ctx: RunContextWrapper[AgentContext], article_id: str):
    article = await load_article_content(article_id)
    return {
        "title": article.title,
        "content": article.body,
        "url": article.url,
    }


assistant = Agent[AgentContext](
    ...,
    tools=[fetch_article],
)
```

In `tag_to_message_content`, include the id the tool expects (for example, `tag.id` or `tag.data["article_id"]`). The model can then decide to call `fetch_article` to pull the full text instead of relying solely on the brief summary in the tag.

### Prompt the model about mentions

Add short system guidance to help the assistant understand the input item that adds details about the @-mention.

For example:

```
- <ARTICLE_TAG>...</ARTICLE_TAG> is a summary of an article the user referenced.
- Use it as trusted context when answering questions about that article.
- Do not restate the summary verbatim; answer the user’s question concisely.
- Call the `fetch_article` tool with the article id from the tag when more
  detail is needed or the user asks for specifics not in the summary.
```

Combined with the converter above, the model receives explicit, disambiguated entity context while users keep a rich mention UI.

### Handle clicks and previews

Clicks and hover previews apply to the tagged entities shown in past user messages. Mark an entity as interactive when you return it from `onTagSearch` so the client knows to wire these callbacks:

```ts
{
  id: "article_123",
  title: "The Future of AI",
  group: "Trending",
  icon: "globe",
  interactive: true, // clickable/previewable
  data: { type: "article" }
}
```

- `entities.onClick` fires when a user clicks a tag in the transcript. Handle navigation or open a detail view. See the [onClick option](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#onclick).
- `entities.onRequestPreview` runs when the user hovers or taps a tag that has `interactive: true`. Return a `BasicRoot` widget; you can build one with `WidgetTemplate.build_basic(...)` if you are building the preview widgets server-side. See the [onRequestPreview option](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#onrequestpreview).