# Changelog

This is the changelog for the open source version of tiktoken.

## [v0.8.0]

- Support for `o1-` and `chatgpt-4o-` models
- Build wheels for Python 3.13
- Add possessive quantifiers to limit backtracking in regular expressions, thanks to @l0rinc!
- Provide a better error message and type for invalid token decode
- Permit tuples in type hints
- Better error message for passing invalid input to `get_encoding`
- Better error messages during plugin loading
- Add a `__version__` attribute
- Update versions of `pyo3`, `regex`, `fancy-regex`
- Drop support for Python 3.8

## [v0.7.0]

- Support for `gpt-4o`
- Performance improvements

## [v0.6.0]

- Optimise regular expressions for a 20% performance improvement, thanks to @paplorinc!
- Add `text-embedding-3-*` models to `encoding_for_model`
- Check content hash for downloaded files
- Allow pickling `Encoding` objects. Registered `Encoding` objects will be pickled by reference
- Work around a PyO3 bug for frozenset conversion

Thank you to @paplorinc, @mdwelsh, @Praneet460!

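The content hash check listed for v0.6.0 can be pictured with a short sketch. It assumes a SHA-256 hex digest is compared against a known-good value once the file bytes are in memory. The names `verify_content_hash` and `expected_sha256` are hypothetical and chosen for this example only.

```python
import hashlib


def verify_content_hash(data: bytes, expected_sha256: str) -> bool:
    """Return True when the SHA-256 hex digest of `data` equals `expected_sha256`."""
    actual = hashlib.sha256(data).hexdigest()
    return actual == expected_sha256


# Stand-in for the bytes of a downloaded file (hypothetical content).
payload = b"example vocabulary bytes"
known_good = hashlib.sha256(payload).hexdigest()

print(verify_content_hash(payload, known_good))         # digest matches
print(verify_content_hash(payload + b"!", known_good))  # altered content is rejected
```

A caller would typically discard a cached or downloaded file when the check fails and fetch it again.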
## [v0.5.2]

- Build wheels for Python 3.12
- Update version of PyO3 to allow multiple imports
- Avoid permission errors when using default cache logic

## [v0.5.1]

- Add `encoding_name_for_model`, undo some renames to variables that are implementation details

## [v0.5.0]

- Add `tiktoken._educational` submodule to better document how byte pair encoding works
- Ensure `encoding_for_model` knows about several new models
- Add `decode_with_offsets`
- Better error for failures with the plugin mechanism
- Make more tests public
- Update versions of dependencies

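For readers unfamiliar with the byte pair encoding that the `tiktoken._educational` entry refers to, here is a minimal sketch of the merge loop, assuming a toy rank table. `bpe_encode_sketch` and `toy_ranks` are hypothetical names made up for this example and are not claimed to match the submodule's API.

```python
from __future__ import annotations


def bpe_encode_sketch(mergeable_ranks: dict[bytes, int], word: bytes) -> list[int]:
    """Greedily merge the adjacent pair with the lowest rank until none remain."""
    parts = [bytes([b]) for b in word]  # start from single bytes
    while True:
        best_idx = None
        best_rank = None
        for i in range(len(parts) - 1):
            rank = mergeable_ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_idx, best_rank = i, rank
        if best_idx is None:
            break  # no adjacent pair is in the rank table
        merged = parts[best_idx] + parts[best_idx + 1]
        parts = parts[:best_idx] + [merged] + parts[best_idx + 2:]
    return [mergeable_ranks[part] for part in parts]


# Toy rank table (an assumption for illustration): lower rank merges first.
toy_ranks = {b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"abc": 4}

print(bpe_encode_sketch(toy_ranks, b"abcab"))  # [4, 3]
```

Here both `ab` pairs merge first because rank 3 is the lowest available, then `ab` + `c` merges into `abc`, leaving the tokens for `abc` and `ab`.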
## [v0.4.0]

- Add `decode_batch` and `decode_bytes_batch`
- Improve error messages and handling

## [v0.3.3]

- `tiktoken` will now make a best-effort attempt to replace surrogate pairs with the corresponding
  Unicode character and will replace lone surrogates with the Unicode replacement character.

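The surrogate handling described for v0.3.3 can be approximated in plain Python. This is a sketch under the assumption that a UTF-16 round trip is an acceptable way to do the repair. The helper name `fix_surrogates` is hypothetical and is not part of the tiktoken API.

```python
def fix_surrogates(text: str) -> str:
    """Best-effort repair of surrogate code points in a Python str."""
    try:
        text.encode("utf-8")  # strings without surrogates pass through unchanged
        return text
    except UnicodeEncodeError:
        # "surrogatepass" lets surrogates through the encoder; decoding then joins
        # valid pairs into one character and "replace" turns lone ones into U+FFFD.
        return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


print(fix_surrogates("\ud83d\ude00") == "\U0001F600")  # True: pair becomes one character
print(fix_surrogates("\ud83d") == "\ufffd")            # True: lone surrogate is replaced
```

One consequence of this kind of repair is that encoding followed by decoding no longer round-trips the original string exactly when it contained surrogates.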
## [v0.3.2]

- Add encoding for GPT-4

## [v0.3.1]

- Build aarch64 wheels
- Make `blobfile` an optional dependency

Thank you to @messense for the environment variable that makes cargo not OOM under emulation!

## [v0.3.0]

- Improve performance by 5-20%; thank you to @nistath!
- Add `gpt-3.5-turbo` models to `encoding_for_model`
- Add prefix matching to `encoding_for_model` to better support future model versions
- Fix a bug in the README instructions on extending tiktoken
- Update the set of available encodings
- Add packaging metadata

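The prefix matching added in v0.3.0 can be sketched as an exact lookup followed by a prefix scan. The two tables and the function name `lookup_encoding_name` below are illustrative assumptions. The library's real mappings are larger and its internals may differ.

```python
# Illustrative tables only (assumptions for this sketch).
MODEL_TO_ENCODING = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}
MODEL_PREFIX_TO_ENCODING = {
    "gpt-4-": "cl100k_base",
    "gpt-3.5-turbo-": "cl100k_base",
}


def lookup_encoding_name(model_name: str) -> str:
    """Resolve a model name to an encoding name, falling back to prefix matching."""
    if model_name in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model_name]
    # Dated or suffixed variants such as "gpt-4-0314" match a known prefix.
    for prefix, encoding_name in MODEL_PREFIX_TO_ENCODING.items():
        if model_name.startswith(prefix):
            return encoding_name
    raise KeyError(f"no encoding known for model {model_name!r}")


print(lookup_encoding_name("gpt-4"))       # cl100k_base
print(lookup_encoding_name("gpt-4-0314"))  # cl100k_base
```

The benefit of the prefix fallback is that a newly released, suffixed model name resolves without the table needing an entry for every version.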
## [v0.2.0]

- Add `tiktoken.encoding_for_model` to get the encoding for a specific model
- Improve portability of caching logic

Thank you to @fritzo, @arvid220u, @khanhvu207, @henriktorget for various small corrections

## [v0.1.2]

- Avoid use of `blobfile` for public files
- Add support for Python 3.8
- Add py.typed
- Improve the public tests

## [v0.1.1]

- Initial release