# Changelog

This is the changelog for the open source version of tiktoken.

## [v0.13.0]
- Update fancy-regex for significantly increased performance
- Branch byte pair encoding to fix performance on unusual input
- Fix AttributeError caused by incomplete redaction of experimental code
- Update version of `pyo3`
- Update version of optional dependency `blobfile`

## [v0.12.0]
- Build wheels for Python 3.14
- Build musllinux aarch64 wheels
- Support for free-threaded Python
- Update version of `pyo3` and `rustc-hash`
- Avoid use of `blobfile` for reading local files
- Recognise `gpt-5` model identifier
- Minor performance improvement for file reading

## [v0.11.0]
- Support for `GPT-5`
- Update version of `pyo3`
- Use new Rust edition
- Fix special token handling in `encode_to_numpy`
- Better error handling
- Improvements to private APIs

## [v0.10.0]
- Support for newer models
- Improvements to private APIs

## [v0.9.0]
- Support for `o1` and `o3` models
- Better error messages when loading invalid vocabulary files
- Support for encoding to numpy arrays
- Delayed imports when not strictly necessary

## [v0.8.0]

- Support for `o1-` and `chatgpt-4o-` models
- Build wheels for Python 3.13
- Add possessive quantifiers to limit backtracking in regular expressions, thanks to @l0rinc!
- Provide a better error message and type for invalid token decode
- Permit tuples in type hints
- Better error message for passing invalid input to `get_encoding`
- Better error messages during plugin loading
- Add a `__version__` attribute
- Update versions of `pyo3`, `regex`, `fancy-regex`
- Drop support for Python 3.8

## [v0.7.0]

- Support for `gpt-4o`
- Performance improvements

## [v0.6.0]

- Optimise regular expressions for a 20% performance improvement, thanks to @paplorinc!
- Add `text-embedding-3-*` models to `encoding_for_model`
- Check content hash for downloaded files
- Allow pickling `Encoding` objects. Registered `Encoding` will be pickled by reference
- Work around PyO3 bug for frozenset conversion

Thank you to @paplorinc, @mdwelsh, @Praneet460!

## [v0.5.2]

- Build wheels for Python 3.12
- Update version of PyO3 to allow multiple imports
- Avoid permission errors when using default cache logic

## [v0.5.1]

- Add `encoding_name_for_model`, undo some renames to variables that are implementation details

## [v0.5.0]

- Add `tiktoken._educational` submodule to better document how byte pair encoding works
- Ensure `encoding_for_model` knows about several new models
- Add `decode_with_offsets`
- Better error for failures with the plugin mechanism
- Make more tests public
- Update versions of dependencies

## [v0.4.0]

- Add `decode_batch` and `decode_bytes_batch`
- Improve error messages and handling

## [v0.3.3]

- `tiktoken` will now make a best-effort attempt to replace surrogate pairs with the corresponding
  Unicode character and will replace lone surrogates with the Unicode replacement character.

## [v0.3.2]

- Add encoding for GPT-4

## [v0.3.1]

- Build aarch64 wheels
- Make `blobfile` an optional dependency

Thank you to @messense for the environment variable that makes cargo not OOM under emulation!

## [v0.3.0]

- Improve performance by 5-20%; thank you to @nistath!
- Add `gpt-3.5-turbo` models to `encoding_for_model`
- Add prefix matching to `encoding_for_model` to better support future model versions
- Fix a bug in the README instructions on extending tiktoken
- Update the set of available encodings
- Add packaging metadata

## [v0.2.0]

- Add `tiktoken.encoding_for_model` to get the encoding for a specific model
- Improve portability of caching logic

Thank you to @fritzo, @arvid220u, @khanhvu207, @henriktorget for various small corrections

## [v0.1.2]

- Avoid use of `blobfile` for public files
- Add support for Python 3.8
- Add py.typed
- Improve the public tests

## [v0.1.1]

- Initial release