# Accept rich user input

This guide explains how a ChatKit server accepts user input beyond plain text, such as attachments, dictation, and @-mentions, and makes it available to your inference pipeline.

At a high level:

- Attachments let users upload files that your model can read.
- Dictation lets users record audio that is transcribed into text before sending.
- @-mentions let users tag entities so the model does not have to guess from free text.

## Attachments: let users upload files

Let users attach files and images by turning on client support, choosing an upload strategy, wiring the upload endpoints, and converting attachments to model inputs.

### Enable attachments in the client

Turn on attachments in the composer and configure client-side limits:

```ts
const chatkit = useChatKit({
  // ...
  composer: {
    attachments: {
      enabled: true,
      // configure accepted MIME types, count, and size limits here
    },
  },
});
```

Under the hood this maps to `ChatKitOptions.composer.attachments`; see the [`composer.attachments` docs](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/composeroption/#attachments) for all available options.

### Configure an upload strategy

Set [`ChatKitOptions.api.uploadStrategy`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/fileuploadstrategy/) to:

- **Direct**: your backend exposes a single upload URL that accepts the bytes and writes attachment metadata to your `Store`. Simpler and faster when you control uploads directly from the app server.
- **Two-phase**: the client first makes a ChatKit API request to create an attachment metadata record, which `ChatKitServer` forwards to your `AttachmentStore`. You return an `upload_url` as part of the created attachment metadata, and the client uploads the bytes in a second step. Prefer this when you front object storage with presigned/temporary URLs or want to offload upload bandwidth (for example, to a third-party blob storage).

Both strategies still require an `AttachmentStore` for delete cleanup. Choose direct for simplicity on the same origin; choose two-phase for cloud storage and larger files.

### Enforce attachment access control

Neither attachment metadata nor file bytes are protected by ChatKit. Use the `context` passed into your `AttachmentStore` methods to authorize every create/read/delete. Only return IDs, bytes, or signed URLs when the caller owns the attachment, and prefer short-lived download URLs. Skipping these checks can leak customer data.

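As a minimal sketch of the ownership check described above, the helper below compares the caller in the request context with the owner recorded for an attachment. `RequestContext`, `AttachmentRecord`, and `require_owner` are hypothetical names used only for illustration; they are not part of ChatKit, and your own context type will carry whatever your authentication layer provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestContext:
    """Hypothetical per-request context populated by your auth layer."""
    user_id: str


@dataclass
class AttachmentRecord:
    """Hypothetical ownership row stored next to the attachment metadata."""
    attachment_id: str
    owner_id: str


class AttachmentAccessError(PermissionError):
    """Raised when the caller may not access the attachment."""


def require_owner(
    record: Optional[AttachmentRecord], context: RequestContext
) -> AttachmentRecord:
    # Treat "missing" and "not yours" the same way so that callers cannot
    # probe which attachment IDs exist.
    if record is None or record.owner_id != context.user_id:
        raise AttachmentAccessError("attachment not found")
    return record
```

A store method could call `require_owner(...)` before returning bytes, metadata, or a signed URL, and before deleting anything.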
### Direct upload

Add the upload endpoint referenced in `uploadStrategy`. It must:

- accept `multipart/form-data` with a `file` field,
- store the bytes wherever you like,
- create `Attachment` metadata, persist it via `Store.save_attachment`, and
- return the `Attachment` JSON.

Implement `AttachmentStore.delete_attachment` to delete the stored bytes; `ChatKitServer` will then call `Store.delete_attachment` to drop metadata.

Example client configuration:

```js
{
  type: "direct",
  uploadUrl: "/files",
}
```

Example FastAPI direct upload endpoint:

```python
from fastapi import FastAPI, Request, Response

app = FastAPI()


@app.post("/files")
async def upload_file(request: Request):
    form_data = await request.form()
    file = form_data.get("file")

    # Your blob store upload: store the bytes, build the `Attachment`
    # metadata, and persist it via `Store.save_attachment`.
    attachment = await upload_to_blob_store(file)

    return Response(content=attachment.model_dump_json(), media_type="application/json")
```

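Client-side limits can be bypassed by anything that calls the upload URL directly, so a direct upload endpoint may want to re-check type and size on the server. The sketch below shows one way to do that; the limit values and the names `validate_upload` and `UploadRejected` are assumptions for illustration, not ChatKit APIs.

```python
# Hypothetical limits; keep them in sync with your composer configuration.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "application/pdf"}


class UploadRejected(ValueError):
    """Raised when an upload violates a server-side limit."""


def validate_upload(mime_type: str, size_bytes: int) -> None:
    """Raise UploadRejected unless the upload is an allowed type and size."""
    normalized = mime_type.strip().lower()
    if normalized not in ALLOWED_MIME_TYPES:
        raise UploadRejected(f"unsupported type: {normalized}")
    if size_bytes <= 0:
        raise UploadRejected("empty upload")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise UploadRejected("file too large")
```

An endpoint could call `validate_upload(...)` before writing to blob storage and translate `UploadRejected` into a 4xx response.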
### Two-phase upload

Implement `AttachmentStore.create_attachment` to:

- build an `upload_url` that accepts `multipart/form-data` with a `file` field (direct PUTs are currently not supported),
- build the `Attachment` model,
- persist it via `Store.save_attachment`, and
- return it.

After the client receives the created attachment metadata in the response, it POSTs the bytes to `upload_url`.

Implement `AttachmentStore.delete_attachment` to delete the stored bytes; `ChatKitServer` will call `Store.delete_attachment` afterward.

Client configuration:

```js
{
  type: "two_phase",
}
```

Example two-phase store issuing a multipart upload URL:

```python
from uuid import uuid4

from chatkit.store import AttachmentStore
from chatkit.types import Attachment, AttachmentCreateParams


class BlobAttachmentStore(AttachmentStore[RequestContext]):
    def generate_attachment_id(self, mime_type: str, context: RequestContext) -> str:
        return f"att_{uuid4().hex}"

    async def create_attachment(
        self, input: AttachmentCreateParams, context: RequestContext
    ) -> Attachment:
        att_id = self.generate_attachment_id(input.mime_type, context)
        upload_url = issue_multipart_upload_url(att_id, input.mime_type)  # your blob store
        attachment = Attachment(
            id=att_id,
            mime_type=input.mime_type,
            name=input.name,
            upload_url=upload_url,
        )
        await data_store.save_attachment(attachment, context=context)
        return attachment

    async def delete_attachment(self, attachment_id: str, context: RequestContext) -> None:
        await delete_blob(att_id=attachment_id)  # your blob store


# Instantiate the store after the class is defined, then pass it to the server.
attachment_store = BlobAttachmentStore()
server = MyChatKitServer(store=data_store, attachment_store=attachment_store)
```

### Convert attachments to model input

Attachments arrive on `input_user_message.attachments` in `ChatKitServer.respond`. The default `ThreadItemConverter` does not handle them, so subclass it and implement `attachment_to_message_content` to return a `ResponseInputContentParam` before calling `Runner.run_streamed`.

Example using a blob fetch helper:

```python
import base64

from chatkit.agents import ThreadItemConverter
from chatkit.types import ImageAttachment
from openai.types.responses import ResponseInputFileParam, ResponseInputImageParam


async def read_bytes(attachment_id: str) -> bytes:
    ...  # fetch from your blob store


def as_data_url(mime: str, content: bytes) -> str:
    return "data:" + mime + ";base64," + base64.b64encode(content).decode("utf-8")


class MyConverter(ThreadItemConverter):
    async def attachment_to_message_content(self, attachment):
        content = await read_bytes(attachment.id)
        if isinstance(attachment, ImageAttachment):
            return ResponseInputImageParam(
                type="input_image",
                detail="auto",
                image_url=as_data_url(attachment.mime_type, content),
            )
        if attachment.mime_type == "application/pdf":
            return ResponseInputFileParam(
                type="input_file",
                file_data=as_data_url(attachment.mime_type, content),
                filename=attachment.name or "unknown",
            )
        # For other text formats, check for API support first before
        # sending as a ResponseInputFileParam.
```

### Show image attachment previews in thread

Set `ImageAttachment.preview_url` to allow the client to render thumbnails.

- If your preview URLs are **permanent/public**, set `preview_url` once when creating the attachment and persist it.
- If your storage uses **expiring URLs**, generate a fresh `preview_url` when returning attachment metadata (for example, in `Store.load_thread_items` and `Store.load_attachment`) rather than persisting a long-lived URL. In this case, returning a short-lived signed URL directly is the simplest approach. Alternatively, you may return a redirect that resolves to a temporary signed URL, as long as the final URL serves image bytes with appropriate CORS headers.

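If your blob storage does not issue signed URLs for you, the expiring-URL idea can be sketched with an HMAC over the attachment id and an expiry timestamp. Everything here is an assumption for illustration: the signing key, the base URL, and the function names are hypothetical, and most object stores provide their own presigning API that you would normally use instead.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode

# Hypothetical secret and route; load the real key from your secret manager.
SIGNING_KEY = b"replace-me"
PREVIEW_BASE_URL = "https://files.example.com/previews"


def _signature(attachment_id: str, expires_at: int) -> str:
    message = f"{attachment_id}:{expires_at}".encode("utf-8")
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def sign_preview_url(
    attachment_id: str, ttl_seconds: int = 300, now: Optional[float] = None
) -> str:
    """Build a preview URL that stops validating after `ttl_seconds`."""
    issued = time.time() if now is None else now
    expires_at = int(issued) + ttl_seconds
    query = urlencode(
        {"expires": expires_at, "sig": _signature(attachment_id, expires_at)}
    )
    return f"{PREVIEW_BASE_URL}/{attachment_id}?{query}"


def verify_preview_request(
    attachment_id: str, expires_at: int, sig: str, now: Optional[float] = None
) -> bool:
    """Compare signatures in constant time, then check the expiry."""
    current = time.time() if now is None else now
    expected = _signature(attachment_id, expires_at).encode("utf-8")
    return hmac.compare_digest(expected, sig.encode("utf-8")) and current < expires_at
```

You would call `sign_preview_url(...)` each time attachment metadata is loaded, and `verify_preview_request(...)` in the route that serves the image bytes.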
## Dictation: speech-to-text input

Enable dictation so users can record audio and have it transcribed into text before sending.

### Enable dictation in the client

Turn on dictation in the composer:

```ts
const chatkit = useChatKit({
  // ...
  composer: {
    dictation: {
      enabled: true,
    },
  },
});
```

This maps to `ChatKitOptions.composer.dictation`.

### Implement `ChatKitServer.transcribe`

When dictation is enabled, the client records audio and sends it to your backend for transcription. Implement `ChatKitServer.transcribe` to accept audio input and return a transcription result.

The client sends one of:

- `"audio/webm;codecs=opus"` (preferred for Chrome/Firefox/Safari 18.4+)
- `"audio/mp4"` (fallback for older Safari/iOS)
- `"audio/ogg;codecs=opus"` (alternative for some environments)

The raw value is available as `audio_input.mime_type`. Use `audio_input.media_type` when you only need the base media type (`"audio/webm"`, `"audio/ogg"`, or `"audio/mp4"`).

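To make the difference between the raw MIME type and the base media type concrete, here is a small standalone sketch of the same normalization applied to a plain string. It is not ChatKit's implementation; `base_media_type` and `extension_for` are hypothetical helpers, useful mainly if you handle MIME strings outside of `AudioInput`.

```python
from typing import Optional

# File extensions for the three base media types the client may send.
EXTENSIONS = {"audio/webm": "webm", "audio/mp4": "m4a", "audio/ogg": "ogg"}


def base_media_type(mime_type: str) -> str:
    """Drop parameters such as ";codecs=opus" and normalize case."""
    return mime_type.split(";", 1)[0].strip().lower()


def extension_for(mime_type: str) -> Optional[str]:
    """Return a file extension for a supported type, or None otherwise."""
    return EXTENSIONS.get(base_media_type(mime_type))
```

Returning `None` for unknown types lets the caller decide how to reject the request.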
Example transcription method using the OpenAI Audio API:

```python
import io

from fastapi import HTTPException
from openai import AsyncOpenAI

client = AsyncOpenAI()


# Method on your `ChatKitServer` subclass.
async def transcribe(self, audio_input: AudioInput, context: RequestContext) -> TranscriptionResult:
    ext = {
        "audio/webm": "webm",
        "audio/mp4": "m4a",
        "audio/ogg": "ogg",
    }.get(audio_input.media_type)
    if not ext:
        raise HTTPException(status_code=400, detail="Unexpected audio format")

    audio_file = io.BytesIO(audio_input.data)
    audio_file.name = f"audio.{ext}"
    # Await the async client so the request does not block the event loop.
    transcription = await client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )
    return TranscriptionResult(text=transcription.text)
```

Return a `TranscriptionResult` that includes the final `text` that should appear in the composer.

## @-mentions: tag entities in user messages

Enable @-mentions so users can tag entities (like documents, tickets, or users) instead of pasting raw identifiers. Mentions travel through ChatKit as structured tags so the model can resolve entities instead of guessing from free text.

### Enable as-you-type entity lookup in the composer

To enable entity tagging as @-mentions in the composer, configure [`entities.onTagSearch`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#ontagsearch) as a ChatKit.js option.

It should return a list of [`Entity`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entity/) objects that match the query string.

If you want to hint that @-mentions are available, enable the composer `@` button by setting [`entities.showComposerMenu`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#showcomposermenu). When clicked, it inserts `@` into the composer and opens the tag search menu automatically.

```ts
const chatkit = useChatKit({
  // ...
  entities: {
    onTagSearch: async (query: string) => {
      return [
        {
          id: "article_123",
          title: "The Future of AI",
          group: "Trending",
          icon: "globe",
          data: { type: "article" }
        },
        {
          id: "article_124",
          title: "One weird trick to improve your sleep",
          group: "Trending",
          icon: "globe",
          data: { type: "article" }
        },
      ]
    },
    // Optional: show the "@" button in the composer for added discoverability.
    showComposerMenu: true,
  },
})
```

### Convert tags into model input in your server

Mentions arrive server-side as structured tags. Override `ThreadItemConverter.tag_to_message_content` to describe what each tag refers to and translate it into model-readable content.

Example converter method that wraps the tagged entity details in custom markup:

```python
from chatkit.agents import ThreadItemConverter
from chatkit.types import UserMessageTagContent
from openai.types.responses import ResponseInputTextParam


class MyThreadItemConverter(ThreadItemConverter):
    async def tag_to_message_content(
        self, tag: UserMessageTagContent
    ) -> ResponseInputTextParam:
        # The entity kind is read from the `data` returned by `onTagSearch`.
        if tag.data.get("type") == "article":
            # Load or unpack the entity the tag refers to
            summary = await fetch_article_summary(tag.id)
            return ResponseInputTextParam(
                type="input_text",
                text=(
                    "<ARTICLE_TAG>\n"
                    f"ID: {tag.id}\n"
                    f"Title: {tag.text}\n"
                    f"Summary: {summary}\n"
                    "</ARTICLE_TAG>"
                ),
            )
        # Illustrative fallback (an assumption): pass other mentions through as plain text.
        return ResponseInputTextParam(type="input_text", text=f"@{tag.text}")
```

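Because titles and summaries are free text, it can help to build the tag markup in a small pure function that is easy to test. The sketch below reproduces the `<ARTICLE_TAG>` layout from the converter above; `render_article_tag` is a hypothetical helper, and the angle-bracket escaping is an added assumption meant to stop a field value from closing or forging the tag.

```python
def render_article_tag(article_id: str, title: str, summary: str) -> str:
    """Wrap entity details in the <ARTICLE_TAG> markup shown above."""

    def clean(value: str) -> str:
        # Neutralize angle brackets so field values cannot close or forge the tag.
        return value.replace("<", "&lt;").replace(">", "&gt;").strip()

    return (
        "<ARTICLE_TAG>\n"
        f"ID: {clean(article_id)}\n"
        f"Title: {clean(title)}\n"
        f"Summary: {clean(summary)}\n"
        "</ARTICLE_TAG>"
    )
```

A converter could then pass `render_article_tag(tag.id, tag.text, summary)` as the `text` of the `ResponseInputTextParam`.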
### Pair mentions with retrieval tool calls

When the referenced content is too large to inline, keep the tag lean (id + short summary) and let the model fetch details via a tool. In your system prompt, tell the assistant to call the retrieval tool when it sees an `ARTICLE_TAG`.

Example tool paired with the converter above:

```python
from agents import Agent, RunContextWrapper, function_tool
from chatkit.agents import AgentContext


@function_tool(description_override="Fetch full article content by id.")
async def fetch_article(ctx: RunContextWrapper[AgentContext], article_id: str):
    article = await load_article_content(article_id)
    return {
        "title": article.title,
        "content": article.body,
        "url": article.url,
    }


assistant = Agent[AgentContext](
    ...,
    tools=[fetch_article],
)
```

In `tag_to_message_content`, include the id the tool expects (for example, `tag.id` or `tag.data["article_id"]`). The model can then decide to call `fetch_article` to pull the full text instead of relying solely on the brief summary in the tag.

### Prompt the model about mentions

Add short system guidance so the assistant understands the input item that carries the @-mention details.

For example:

```
- <ARTICLE_TAG>...</ARTICLE_TAG> is a summary of an article the user referenced.
- Use it as trusted context when answering questions about that article.
- Do not restate the summary verbatim; answer the user’s question concisely.
- Call the `fetch_article` tool with the article id from the tag when more
  detail is needed or the user asks for specifics not in the summary.
```

Combined with the converter above, the model receives explicit, disambiguated entity context while users keep a rich mention UI.

### Handle clicks and previews

Clicks and hover previews apply to the tagged entities shown in past user messages. Mark an entity as interactive when you return it from `onTagSearch` so the client knows to wire these callbacks:

```ts
{
  id: "article_123",
  title: "The Future of AI",
  group: "Trending",
  icon: "globe",
  interactive: true, // clickable/previewable
  data: { type: "article" }
}
```

- `entities.onClick` fires when a user clicks a tag in the transcript. Handle navigation or open a detail view. See the [onClick option](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#onclick).
- `entities.onRequestPreview` runs when the user hovers or taps a tag that has `interactive: true`. Return a `BasicRoot` widget; you can build one with `WidgetTemplate.build_basic(...)` if you are building the preview widgets server-side. See the [onRequestPreview option](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#onrequestpreview).

365