# Stream generated images

Stream generated images to the client while your agent is running, and persist them in a storage-friendly format.

This guide covers:

- Adding an image generation tool to your agent
- Converting streamed base64 images into URLs so your datastore does not store raw base64 strings
- Converting generated image thread items to model input for continued conversation
- Streaming partial images (progressive previews)

## Add an image generation tool to your agent

To let the model generate images, add the Agents SDK image generation tool to your agent's tool list.

```python
from agents import Agent
from agents.tool import ImageGenerationTool


agent = Agent(
    name="designer",
    instructions="Generate images when asked.",
    tools=[ImageGenerationTool(tool_config={"type": "image_generation"})],
)
```

Once enabled, `stream_agent_response` will translate image generation output into ChatKit thread items:

- A `GeneratedImageItem` is added when an image generation call starts.
- It is updated (for partial images) and finalized when the result arrives.

## Avoid storing raw base64 in your datastore

By default, ChatKit stores generated images as a data URL (for example, `data:image/png;base64,...`) by using `ResponseStreamConverter.base64_image_to_url`.

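To see why this adds up, here is a minimal, standard-library-only sketch (not ChatKit code) that wraps raw bytes in a data URL of the same shape and compares sizes. Base64 encodes every 3 bytes as 4 ASCII characters, so the persisted string is about a third larger than the image itself, plus the `data:` prefix. The payload is a synthetic stand-in, not a real image.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Wrap raw image bytes in a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Synthetic stand-in payload of 307,200 bytes.
fake_image = bytes(range(256)) * 1200

data_url = to_data_url(fake_image)
print(len(fake_image))  # → 307200 (raw bytes)
print(len(data_url))    # → 409622 (characters you would persist)
```
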
That's convenient for demos, but it can bloat your persisted thread items. In production, you'll usually want to:

- Write the bytes to object storage or a file store
- Persist only a URL (or a signed URL) on the `GeneratedImageItem`

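If you want to try this pattern before wiring up real object storage, the sketch below uses a hypothetical `InMemoryBlobStore` as a stand-in for S3, GCS, or a filesystem. Its `upload_blob` and `download_blob` methods are named after the placeholders used later in this guide, and the `https://images.example.com` base URL is an illustrative assumption. Only the returned URL would end up on the `GeneratedImageItem`; the bytes stay in the store.

```python
import base64


class InMemoryBlobStore:
    """Hypothetical stand-in for object storage that keeps bytes in a dict."""

    def __init__(self, base_url: str = "https://images.example.com") -> None:
        self.base_url = base_url.rstrip("/")
        self._blobs: dict[str, tuple[bytes, str]] = {}

    def upload_blob(self, blob_id: str, data: bytes, content_type: str) -> str:
        # Keep the bytes (and their type) in the store; only the URL is persisted.
        self._blobs[blob_id] = (data, content_type)
        return f"{self.base_url}/{blob_id}"

    def download_blob(self, blob_id: str) -> bytes:
        data, _content_type = self._blobs[blob_id]
        return data


store = InMemoryBlobStore()
png_signature = b"\x89PNG\r\n\x1a\n"
# The stream hands you a base64 string, so decode it before storing.
base64_image = base64.b64encode(png_signature).decode("ascii")
url = store.upload_blob("img_123", base64.b64decode(base64_image), "image/png")
print(url)  # → https://images.example.com/img_123
```
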
### Override `ResponseStreamConverter.base64_image_to_url`

Subclass `ResponseStreamConverter` and override `base64_image_to_url`. This method is called for both:

- Final images
- Partial images (when `partial_images` streaming is enabled)

```python
import base64

from chatkit.agents import ResponseStreamConverter


class MyResponseStreamConverter(ResponseStreamConverter):
    async def base64_image_to_url(
        self,
        image_id: str,
        base64_image: str,
        partial_image_index: int | None = None,
    ) -> str:
        # `image_id` stays the same for the whole generation call (including partial updates).
        # Use `partial_image_index` to derive distinct blob IDs for each partial image.
        blob_id = (
            image_id
            if partial_image_index is None
            else f"{image_id}-partial-{partial_image_index}"
        )
        # Replace `upload_blob(...)` with your app's storage call (S3, GCS, filesystem, etc.).
        # It should return a URL that your client can load later.
        url = upload_blob(
            blob_id,
            base64.b64decode(base64_image),
            "image/png",
        )
        return url
```

### Pass your converter to `stream_agent_response`

Create your converter and pass it into `stream_agent_response`. The URL it returns is what gets persisted on the `GeneratedImageItem`.

```python
from agents import Runner

from chatkit.agents import AgentContext, stream_agent_response


# A method on your ChatKitServer subclass. `agent` and `input_items` are assumed
# to be defined earlier (for example, `input_items` built with a ThreadItemConverter).
async def respond(self, thread, input_user_message, context):
    agent_context = AgentContext(
        thread=thread,
        store=self.store,
        request_context=context,
        previous_response_id=thread.previous_response_id,
    )
    result = Runner.run_streamed(agent, input_items, context=agent_context)

    # Stream ChatKit events, letting your converter turn image bytes into URLs.
    async for event in stream_agent_response(
        agent_context,
        result,
        converter=MyResponseStreamConverter(),
    ):
        yield event
```

## Convert generated image thread items to model input

On later turns, you'll often feed prior thread items (including generated images) back into the model as context.

By default, `ThreadItemConverter.generated_image_to_input` sends the generated image back to the model as:

- A short text preface
- An `input_image` content part with `image_url=item.image.url`

If `item.image.url` is not publicly reachable by the model runtime (for example, a private intranet URL, a localhost URL, or a URL that requires cookies), image understanding and image-to-image flows may fail.

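Before picking a fix, it can help to flag URLs the model runtime almost certainly cannot fetch. The sketch below is a rough heuristic built on `urllib.parse` and `ipaddress`; the `needs_rehydration` name and its rules are illustrative assumptions. It cannot detect URLs that require cookies, or hostnames that only resolve to private addresses through DNS.

```python
import ipaddress
from urllib.parse import urlparse


def needs_rehydration(url: str) -> bool:
    """Heuristic: True if the model runtime likely cannot fetch this URL itself."""
    parsed = urlparse(url)
    if parsed.scheme == "data":
        return False  # data URLs carry the bytes inline
    if parsed.scheme not in ("http", "https"):
        return True  # e.g. file:// or custom schemes
    host = parsed.hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname; can't tell without resolving it
    return ip.is_private or ip.is_loopback or ip.is_link_local


for candidate in [
    "data:image/png;base64,iVBORw0KGgo=",
    "http://localhost:8000/images/img_1.png",
    "http://10.0.0.12/images/img_1.png",
    "https://cdn.example.com/images/img_1.png",
]:
    print(candidate, needs_rehydration(candidate))
```
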
Two common fixes:

- Convert the stored image back into a base64 `data:` URL when building model input
- Generate a temporary public (signed) URL for the duration of the run

### Override `ThreadItemConverter.generated_image_to_input`

Override `generated_image_to_input` and replace `image_url` with a URL the model runtime can fetch.

```python
import base64

from openai.types.responses import ResponseInputImageParam, ResponseInputTextParam
from openai.types.responses.response_input_item_param import Message

from chatkit.agents import ThreadItemConverter
from chatkit.types import GeneratedImageItem


class MyThreadItemConverter(ThreadItemConverter):
    async def generated_image_to_input(self, item: GeneratedImageItem):
        if not item.image:
            return None

        # Option A: rehydrate to a data URL (works when you can fetch the bytes yourself).
        # Replace `download_blob(...)` with your app's storage call to fetch the image bytes.
        # This assumes the image was stored under `item.image.id`; use whatever key your upload used.
        image_bytes = download_blob(item.image.id)
        b64 = base64.b64encode(image_bytes).decode("utf-8")
        image_url = f"data:image/png;base64,{b64}"

        # Option B: generate a temporary public URL instead:
        # image_url = create_signed_url(item.image.id, expires_in_seconds=60)

        return Message(
            type="message",
            role="user",
            content=[
                ResponseInputTextParam(
                    type="input_text",
                    text="The following image was generated by the agent.",
                ),
                ResponseInputImageParam(
                    type="input_image",
                    detail="auto",
                    image_url=image_url,
                ),
            ],
        )
```

When building your model input, use your custom converter instead of `simple_to_agent_input`:

```python
# `items` are the thread items you loaded from your store for this turn.
input_items = await MyThreadItemConverter().to_agent_input(items)
```

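The `MyThreadItemConverter` example above hard-codes `image/png` in the data URL. If you configure the image generation tool for another output format (its `output_format` option), one way to keep the data URL accurate is to sniff the format from the stored bytes. The `sniff_image_mime` and `bytes_to_data_url` helpers below are a hypothetical sketch that recognizes the PNG, JPEG, and WebP file signatures and falls back to `image/png`.

```python
import base64


def sniff_image_mime(data: bytes, default: str = "image/png") -> str:
    """Guess an image MIME type from well-known file signatures."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return default


def bytes_to_data_url(data: bytes) -> str:
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{sniff_image_mime(data)};base64,{encoded}"


jpeg_like = b"\xff\xd8\xff\xe0" + b"\x00" * 16
print(bytes_to_data_url(jpeg_like)[:23])  # → data:image/jpeg;base64,
```
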
## Stream partial images (progressive previews)

You can stream partial images so users see progressive previews as the image is being generated.

### Enable partial images in the tool config

Set `partial_images` in the tool config:

```python
from agents.tool import ImageGenerationTool

image_tool = ImageGenerationTool(
    tool_config={"type": "image_generation", "partial_images": 3},
)
```

### Show progress for partial images

Pass the same `partial_images` value to `ResponseStreamConverter` (or your subclass). ChatKit uses it to compute a `progress` value (between 0 and 1) for each partial image update.

```python
async for event in stream_agent_response(
    agent_context,
    result,
    converter=MyResponseStreamConverter(partial_images=3),
):
    yield event
```

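This guide does not spell out the formula ChatKit uses for `progress`, so the sketch below is only an assumption: one plausible mapping from a partial image index to a value between 0 and 1. It treats `partial_image_index` as 0-based (as in the Responses API's partial image streaming events) and reserves `1.0` for the final image. The `partial_progress` name is hypothetical.

```python
def partial_progress(partial_image_index: int, partial_images: int) -> float:
    """Map a 0-based partial image index to a progress value in [0, 1).

    Assumption: with N partial images, partial i reports (i + 1) / (N + 1),
    leaving 1.0 for the final image.
    """
    if partial_images <= 0:
        return 0.0
    # Clamp out-of-range indices to the last partial image.
    index = min(max(partial_image_index, 0), partial_images - 1)
    return (index + 1) / (partial_images + 1)


print([partial_progress(i, 3) for i in range(3)])  # → [0.25, 0.5, 0.75]
```
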
During the run, ChatKit will emit:

- `ThreadItemAddedEvent` for the initial `GeneratedImageItem`
- `ThreadItemUpdatedEvent` with `GeneratedImageUpdated(image=..., progress=...)` for each partial image
- `ThreadItemDoneEvent` when the final image arrives
