openai/chatkit-python
Mirrored from https://github.com/openai/chatkit-python
docs/guides/stream-generated-images.md
# Stream generated images

Stream generated images to the client while your agent is running, and persist them in a storage-friendly format.

This guide covers:

- Adding an image generation tool to your agent
- Converting streamed base64 images into URLs so your datastore does not store raw base64 strings
- Converting generated image thread items to model input for continued conversation
- Streaming partial images (progressive previews)

## Add an image generation tool to your agent

To let the model generate images, add the Agents SDK image generation tool to your agent's tool list.

```python
from agents import Agent
from agents.tool import ImageGenerationTool


agent = Agent(
    name="designer",
    instructions="Generate images when asked.",
    tools=[ImageGenerationTool(tool_config={"type": "image_generation"})],
)
```

Once enabled, `stream_agent_response` will translate image generation output into ChatKit thread items:

- A `GeneratedImageItem` is added when an image generation call starts.
- It is updated (for partial images) and finalized when the result arrives.

## Avoid storing raw base64 in your datastore

By default, ChatKit stores generated images as a data URL (for example, `data:image/png;base64,...`) by using `ResponseStreamConverter.base64_image_to_url`.

That's convenient for demos, but it can bloat your persisted thread items. In production, you'll usually want to:

- Write the bytes to object storage / a file store
- Persist only a URL (or a signed URL) on the `GeneratedImageItem`

### Override `ResponseStreamConverter.base64_image_to_url`

Subclass `ResponseStreamConverter` and override `base64_image_to_url`. This method is called for both:

- Final images
- Partial images (when `partial_images` streaming is enabled)

```python
import base64

from chatkit.agents import ResponseStreamConverter


class MyResponseStreamConverter(ResponseStreamConverter):
    async def base64_image_to_url(
        self,
        image_id: str,
        base64_image: str,
        partial_image_index: int | None = None,
    ) -> str:
        # `image_id` stays the same for the whole generation call (including partial updates).
        # Use `partial_image_index` to derive distinct blob IDs for each partial image.
        blob_id = (
            image_id
            if partial_image_index is None
            else f"{image_id}-partial-{partial_image_index}"
        )
        # Replace `upload_blob(...)` with your app's storage call (S3, GCS, filesystem, etc).
        # It should return a URL that your client can load later.
        url = upload_blob(
            blob_id,
            base64.b64decode(base64_image),
            "image/png",
        )
        return url
```

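The `upload_blob(...)` call above is a placeholder, not part of ChatKit. As a rough illustration of what it might look like, here is a minimal filesystem-backed sketch. The directory, the base URL, and the extension table are all assumptions made for this example; a production version would more likely call your object store's SDK.

```python
import tempfile
from pathlib import Path

# Hypothetical settings: where blobs are written and the base URL they are served from.
BLOB_DIR = Path(tempfile.gettempdir()) / "chatkit-generated-images"
BLOB_BASE_URL = "https://files.example.com/generated"

# Hypothetical mapping from MIME type to file extension.
_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def upload_blob(blob_id: str, data: bytes, content_type: str) -> str:
    """Write `data` to disk and return the URL it is assumed to be served from."""
    extension = _EXTENSIONS.get(content_type, "")
    BLOB_DIR.mkdir(parents=True, exist_ok=True)
    (BLOB_DIR / f"{blob_id}{extension}").write_bytes(data)
    return f"{BLOB_BASE_URL}/{blob_id}{extension}"
```

This helper does blocking I/O. Because `base64_image_to_url` is an `async` method, you may prefer to wrap the call in `asyncio.to_thread(...)` or use an async storage client so that large images do not stall the event loop.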
### Pass your converter to `stream_agent_response`

Create your converter and pass it into `stream_agent_response`. The returned URL will be what gets persisted on the `GeneratedImageItem`.

```python
from agents import Runner

from chatkit.agents import AgentContext, stream_agent_response


async def respond(...):
    agent_context = AgentContext(
        thread=thread,
        store=self.store,
        request_context=context,
        previous_response_id=thread.previous_response_id,
    )
    result = Runner.run_streamed(agent, input_items, context=agent_context)

    async for event in stream_agent_response(
        agent_context,
        result,
        converter=MyResponseStreamConverter(),
    ):
        yield event
```

## Convert generated image thread items to model input

On later turns, you'll often feed prior thread items (including generated images) back into the model as context.

By default, `ThreadItemConverter.generated_image_to_input` sends the generated image back to the model as:

- A short text preface
- An `input_image` content part with `image_url=item.image.url`

If `item.image.url` is not publicly reachable by the model runtime (for example, it's a private intranet URL, or a localhost URL, or requires cookies), image understanding and image-to-image flows may fail.

Two common fixes:

- Convert the stored image back into a base64 `data:` URL when building model input
- Generate a temporary public (signed) URL for the duration of the run

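To make the second fix concrete, here is a minimal sketch of an HMAC-signed, expiring URL. Everything in it is an assumption for illustration: the signing key, the base URL, the query parameter names, and both function names are hypothetical, and none of it is ChatKit API. Managed object stores usually provide their own presigned-URL feature, which is generally the better choice when available.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode

# Hypothetical values: load the key from your secret store in a real app.
SIGNING_KEY = b"replace-with-a-real-secret"
BLOB_BASE_URL = "https://files.example.com/generated"


def _sign(blob_id: str, expires_at: int) -> str:
    """Return a hex HMAC-SHA256 over the blob ID and its expiry timestamp."""
    payload = f"{blob_id}:{expires_at}".encode("utf-8")
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def create_signed_url(
    blob_id: str, expires_in_seconds: int, now: Optional[float] = None
) -> str:
    """Build a URL that the file server accepts only until it expires."""
    current = time.time() if now is None else now
    expires_at = int(current + expires_in_seconds)
    query = urlencode({"expires": expires_at, "sig": _sign(blob_id, expires_at)})
    return f"{BLOB_BASE_URL}/{blob_id}?{query}"


def verify_signature(
    blob_id: str, expires_at: int, signature: str, now: Optional[float] = None
) -> bool:
    """Server-side check: reject expired links and mismatched signatures."""
    current = time.time() if now is None else now
    if current > expires_at:
        return False
    return hmac.compare_digest(_sign(blob_id, expires_at), signature)
```

Choose an expiry that comfortably covers the model run, since the model runtime fetches the URL at some point after you build the input.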
### Override `ThreadItemConverter.generated_image_to_input`

Override `generated_image_to_input` and replace `image_url` with something the image API can fetch.

```python
import base64

from openai.types.responses import ResponseInputImageParam, ResponseInputTextParam
from openai.types.responses.response_input_item_param import Message

from chatkit.agents import ThreadItemConverter
from chatkit.types import GeneratedImageItem


class MyThreadItemConverter(ThreadItemConverter):
    async def generated_image_to_input(self, item: GeneratedImageItem):
        if not item.image:
            return None

        # Option A: rehydrate to a data URL (works when you can fetch bytes yourself).
        # Replace `download_blob(...)` with your app's storage call to fetch the image bytes.
        image_bytes = download_blob(item.image.id)
        b64 = base64.b64encode(image_bytes).decode("utf-8")
        image_url = f"data:image/png;base64,{b64}"

        # Option B: generate a temporary public URL instead:
        # image_url = create_signed_url(item.image.id, expires_in_seconds=60)

        return Message(
            type="message",
            role="user",
            content=[
                ResponseInputTextParam(
                    type="input_text",
                    text="The following image was generated by the agent.",
                ),
                ResponseInputImageParam(
                    type="input_image",
                    detail="auto",
                    image_url=image_url,
                ),
            ],
        )
```

When building your model input, use your custom converter instead of `simple_to_agent_input`:

```python
input_items = await MyThreadItemConverter().to_agent_input(items)
```

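Option A above hard-codes `image/png` in the data URL. If your stored images are not always PNG, a small helper can pick the MIME type from the leading bytes. This is a sketch under that assumption; `sniff_mime_type` and `to_data_url` are hypothetical names, and only the PNG, JPEG, and WebP signatures are checked.

```python
import base64

# File signatures ("magic bytes") for two common image formats.
_MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_mime_type(image_bytes: bytes, default: str = "image/png") -> str:
    """Guess the MIME type from the leading bytes, falling back to `default`."""
    for prefix, mime_type in _MAGIC_PREFIXES:
        if image_bytes.startswith(prefix):
            return mime_type
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if image_bytes[:4] == b"RIFF" and image_bytes[8:12] == b"WEBP":
        return "image/webp"
    return default


def to_data_url(image_bytes: bytes) -> str:
    """Encode raw image bytes as a base64 `data:` URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{sniff_mime_type(image_bytes)};base64,{encoded}"
```

With this in place, the Option A lines could become `image_url = to_data_url(download_blob(item.image.id))`.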
## Stream partial images (progressive previews)

You can stream partial images so users see progressive previews as the image is being generated.

### Enable partial images in the tool config

Set `partial_images` in the tool config:

```python
from agents.tool import ImageGenerationTool

image_tool = ImageGenerationTool(
    tool_config={"type": "image_generation", "partial_images": 3},
)
```

### Show progress for partial images

Pass the same `partial_images` value to `ResponseStreamConverter` (or your subclass). ChatKit uses it to compute a `progress` value (between 0 and 1) for each partial image update.

```python
async for event in stream_agent_response(
    agent_context,
    result,
    converter=MyResponseStreamConverter(partial_images=3),
):
    yield event
```

During the run, ChatKit will emit:

- `ThreadItemAddedEvent` for the initial `GeneratedImageItem`
- `ThreadItemUpdatedEvent` with `GeneratedImageUpdated(image=..., progress=...)` for each partial image
- `ThreadItemDoneEvent` when the final image arrives

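To see why the converter needs the same `partial_images` value as the tool config, it can help to look at one plausible way to turn a partial image index into a fraction. The function below is purely illustrative and is not ChatKit's actual formula. It assumes `partial_image_index` is zero-based and reserves the last step for the final image.

```python
def partial_progress(partial_image_index: int, partial_images: int) -> float:
    """Map a zero-based partial image index to a progress fraction in [0, 1]."""
    if partial_images <= 0:
        # Without a known total there is nothing to measure progress against.
        return 0.0
    # The "+ 1" in the denominator leaves room for the final image.
    return min(1.0, (partial_image_index + 1) / (partial_images + 1))


# With partial_images=3, the three previews land at 25%, 50%, and 75%.
print([partial_progress(i, 3) for i in range(3)])
```

Under a scheme like this, a total smaller than the one in the tool config makes the bar reach 100% before the final image arrives, and a larger total leaves it short, which is why the two values should match.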