openai/chatkit-python


Mirrored from https://github.com/openai/chatkit-python

# Stream generated images

Stream generated images to the client while your agent is running, and persist them in a storage-friendly format.

This guide covers:

- Adding an image generation tool to your agent
- Converting streamed base64 images into URLs so your datastore does not store raw base64 strings
- Converting generated image thread items to model input for continued conversation
- Streaming partial images (progressive previews)

## Add an image generation tool to your agent

To let the model generate images, add the Agents SDK image generation tool to your agent's tool list.

```python
from agents import Agent
from agents.tool import ImageGenerationTool


agent = Agent(
    name="designer",
    instructions="Generate images when asked.",
    tools=[ImageGenerationTool(tool_config={"type": "image_generation"})],
)
```

Once enabled, `stream_agent_response` will translate image generation output into ChatKit thread items:

- A `GeneratedImageItem` is added when an image generation call starts.
- It is updated (for partial images) and finalized when the result arrives.

## Avoid storing raw base64 in your datastore

By default, ChatKit stores generated images as a data URL (for example, `data:image/png;base64,...`) by using `ResponseStreamConverter.base64_image_to_url`.

That's convenient for demos, but it can bloat your persisted thread items. In production, you'll usually want to:

- Write the bytes to object storage or a file store
- Persist only a URL (or a signed URL) on the `GeneratedImageItem`

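To get a rough sense of the storage overhead, the sketch below compares the size of some raw bytes with the length of the equivalent data URL. It uses only the standard library. The 3 MB payload is an arbitrary stand-in for an image, and `to_data_url` is a hypothetical helper for illustration, not part of ChatKit.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Arbitrary placeholder payload standing in for a ~3 MB image.
raw = bytes(3_000_000)
data_url = to_data_url(raw)

# Base64 emits 4 characters for every 3 input bytes, so the encoded
# payload is about a third larger than the original bytes.
print(len(raw))       # 3000000
print(len(data_url))  # 4000022
```

Every partial and final image stored this way carries that extra third, which is one reason to persist a short URL on the thread item instead.
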
### Override `ResponseStreamConverter.base64_image_to_url`

Subclass `ResponseStreamConverter` and override `base64_image_to_url`. This method is called for both:

- Final images
- Partial images (when `partial_images` streaming is enabled)

```python
import base64

from chatkit.agents import ResponseStreamConverter


class MyResponseStreamConverter(ResponseStreamConverter):
    async def base64_image_to_url(
        self,
        image_id: str,
        base64_image: str,
        partial_image_index: int | None = None,
    ) -> str:
        # `image_id` stays the same for the whole generation call (including partial updates).
        # Use `partial_image_index` to derive distinct blob IDs for each partial image.
        blob_id = (
            image_id
            if partial_image_index is None
            else f"{image_id}-partial-{partial_image_index}"
        )
        # Replace `upload_blob(...)` with your app's storage call (S3, GCS, filesystem, etc).
        # It should return a URL that your client can load later.
        url = upload_blob(
            blob_id,
            base64.b64decode(base64_image),
            "image/png",
        )
        return url
```

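The converter above leaves `upload_blob` undefined on purpose. As a minimal sketch for local development, one possible filesystem-backed version is shown below. Everything in it is an assumption made for illustration: the `BLOB_DIR` location, the `BASE_URL` host, and the extension mapping are hypothetical, and a real deployment would more likely call an object storage client.

```python
import tempfile
from pathlib import Path

# Assumed locations; substitute your own storage root and public base URL.
BLOB_DIR = Path(tempfile.gettempdir()) / "chatkit-blobs"
BASE_URL = "https://files.example.com/blobs"

# Small, illustrative mapping from content type to file extension.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def upload_blob(blob_id: str, data: bytes, content_type: str) -> str:
    """Write `data` to local disk and return the URL it would be served from."""
    BLOB_DIR.mkdir(parents=True, exist_ok=True)
    filename = blob_id + EXTENSIONS.get(content_type, "")
    (BLOB_DIR / filename).write_bytes(data)
    return f"{BASE_URL}/{filename}"
```

This version does blocking file I/O. Because `base64_image_to_url` is async, a production implementation would typically use an async storage client or offload the write with `asyncio.to_thread`.
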
### Pass your converter to `stream_agent_response`

Create your converter and pass it into `stream_agent_response`. The URL it returns is what gets persisted on the `GeneratedImageItem`.

```python
from agents import Runner

from chatkit.agents import AgentContext, stream_agent_response


async def respond(...):
    agent_context = AgentContext(
        thread=thread,
        store=self.store,
        request_context=context,
        previous_response_id=thread.previous_response_id,
    )
    result = Runner.run_streamed(agent, input_items, context=agent_context)

    async for event in stream_agent_response(
        agent_context,
        result,
        converter=MyResponseStreamConverter(),
    ):
        yield event
```

## Convert generated image thread items to model input

On later turns, you'll often feed prior thread items (including generated images) back into the model as context.

By default, `ThreadItemConverter.generated_image_to_input` sends the generated image back to the model as:

- A short text preface
- An `input_image` content part with `image_url=item.image.url`

If `item.image.url` is not publicly reachable by the model runtime (for example, a private intranet URL, a localhost URL, or a URL that requires cookies), image understanding and image-to-image flows may fail.

Two common fixes:

- Convert the stored image back into a base64 `data:` URL when building model input
- Generate a temporary public (signed) URL for the duration of the run

### Override `ThreadItemConverter.generated_image_to_input`

Override `generated_image_to_input` and replace `image_url` with something the image API can fetch.

```python
import base64

from openai.types.responses import ResponseInputImageParam, ResponseInputTextParam
from openai.types.responses.response_input_item_param import Message

from chatkit.agents import ThreadItemConverter
from chatkit.types import GeneratedImageItem


class MyThreadItemConverter(ThreadItemConverter):
    async def generated_image_to_input(self, item: GeneratedImageItem):
        if not item.image:
            return None

        # Option A: rehydrate to a data URL (works when you can fetch bytes yourself).
        # Replace `download_blob(...)` with your app's storage call to fetch the image bytes.
        image_bytes = download_blob(item.image.id)
        b64 = base64.b64encode(image_bytes).decode("utf-8")
        image_url = f"data:image/png;base64,{b64}"

        # Option B: generate a temporary public URL instead:
        # image_url = create_signed_url(item.image.id, expires_in_seconds=60)

        return Message(
            type="message",
            role="user",
            content=[
                ResponseInputTextParam(
                    type="input_text",
                    text="The following image was generated by the agent.",
                ),
                ResponseInputImageParam(
                    type="input_image",
                    detail="auto",
                    image_url=image_url,
                ),
            ],
        )
```

When building your model input, use your custom converter instead of `simple_to_agent_input`:

```python
input_items = await MyThreadItemConverter().to_agent_input(items)
```

## Stream partial images (progressive previews)

You can stream partial images so users see progressive previews as the image is being generated.

### Enable partial images in the tool config

Set `partial_images` in the tool config:

```python
from agents.tool import ImageGenerationTool

image_tool = ImageGenerationTool(
    tool_config={"type": "image_generation", "partial_images": 3},
)
```

### Show progress for partial images

Pass the same `partial_images` value to `ResponseStreamConverter` (or your subclass). ChatKit uses it to compute a `progress` value (between 0 and 1) for each partial image update.

```python
async for event in stream_agent_response(
    agent_context,
    result,
    converter=MyResponseStreamConverter(partial_images=3),
):
    yield event
```

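The guide does not spell out how `progress` is derived, so the following is only an illustration of one plausible mapping, not ChatKit's actual formula. It assumes `partial_image_index` is zero-based and divides the number of partial images seen so far by the configured total, clamped to 1. The function name `partial_image_progress` is hypothetical.

```python
def partial_image_progress(partial_image_index: int, partial_images: int) -> float:
    """Map a zero-based partial image index to a progress value in [0, 1]."""
    if partial_images <= 0:
        return 0.0
    return min(1.0, (partial_image_index + 1) / partial_images)


for index in range(3):
    print(index, round(partial_image_progress(index, 3), 2))
# 0 0.33
# 1 0.67
# 2 1.0
```

Whatever the exact formula, the converter's `partial_images` value needs to match the tool config. Otherwise the denominator is wrong and the reported progress will not line up with the previews.
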
During the run, ChatKit will emit:

- `ThreadItemAddedEvent` for the initial `GeneratedImageItem`
- `ThreadItemUpdatedEvent` with `GeneratedImageUpdated(image=..., progress=...)` for each partial image
- `ThreadItemDoneEvent` when the final image arrives
