openai/tiktoken

Mirrored from https://github.com/openai/tiktoken

CHANGELOG.md

# Changelog

This is the changelog for the open source version of tiktoken.

## [v0.13.0]
- Update fancy-regex for significantly increased performance
- Branch byte pair encoding to fix performance on unusual input
- Fix AttributeError caused by incomplete redaction of experimental code
- Update version of `pyo3`
- Update version of optional dependency `blobfile`

## [v0.12.0]
- Build wheels for Python 3.14
- Build musllinux aarch64 wheels
- Support for free-threaded Python
- Update version of `pyo3` and `rustc-hash`
- Avoid use of `blobfile` for reading local files
- Recognise `gpt-5` model identifier
- Minor performance improvement for file reading

## [v0.11.0]
- Support for `GPT-5`
- Update version of `pyo3`
- Use new Rust edition
- Fix special token handling in `encode_to_numpy`
- Better error handling
- Improvements to private APIs

## [v0.10.0]
- Support for newer models
- Improvements to private APIs

## [v0.9.0]
- Support for `o1` and `o3` models
- Better error messages when loading invalid vocabulary files
- Support for encoding to numpy arrays
- Delayed imports when not strictly necessary

## [v0.8.0]

- Support for `o1-` and `chatgpt-4o-` models
- Build wheels for Python 3.13
- Add possessive quantifiers to limit backtracking in regular expressions, thanks to @l0rinc!
- Provide a better error message and type for invalid token decode
- Permit tuples in type hints
- Better error message for passing invalid input to `get_encoding`
- Better error messages during plugin loading
- Add a `__version__` attribute
- Update versions of `pyo3`, `regex`, `fancy-regex`
- Drop support for Python 3.8

## [v0.7.0]

- Support for `gpt-4o`
- Performance improvements

## [v0.6.0]

- Optimise regular expressions for a 20% performance improvement, thanks to @paplorinc!
- Add `text-embedding-3-*` models to `encoding_for_model`
- Check content hash for downloaded files
- Allow pickling `Encoding` objects. Registered `Encoding` will be pickled by reference
- Work around PyO3 bug for frozenset conversion

Thank you to @paplorinc, @mdwelsh, @Praneet460!

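As a rough illustration of checking a content hash for a downloaded file, the sketch below compares a SHA-256 digest against an expected value. The function name `content_matches` and the choice of SHA-256 are assumptions made for this example; the changelog does not say which hash or helper tiktoken uses.

```python
import hashlib


def content_matches(data: bytes, expected_sha256: str) -> bool:
    """Return True when the SHA-256 hex digest of `data` equals the expected one.

    Hypothetical helper for illustration; not tiktoken's own API.
    """
    actual = hashlib.sha256(data).hexdigest()
    return actual == expected_sha256.lower()


if __name__ == "__main__":
    payload = b"example vocabulary bytes"
    good = hashlib.sha256(payload).hexdigest()
    print(content_matches(payload, good))          # digest of the same bytes
    print(content_matches(payload + b"!", good))   # digest of altered bytes
```

A caller would typically discard the download and retry, or raise an error, when the check fails.
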
## [v0.5.2]

- Build wheels for Python 3.12
- Update version of PyO3 to allow multiple imports
- Avoid permission errors when using default cache logic

## [v0.5.1]

- Add `encoding_name_for_model`, undo some renames to variables that are implementation details

## [v0.5.0]

- Add `tiktoken._educational` submodule to better document how byte pair encoding works
- Ensure `encoding_for_model` knows about several new models
- Add `decode_with_offsets`
- Better error for failures with the plugin mechanism
- Make more tests public
- Update versions of dependencies

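For readers who want a feel for how byte pair encoding works, here is a minimal pure-Python sketch of the merge loop. It is not the code in `tiktoken._educational`; the `bpe_encode` function and the toy `TOY_RANKS` table are made up for this illustration.

```python
def bpe_encode(ranks: dict[bytes, int], piece: bytes) -> list[int]:
    """Encode `piece` by repeatedly merging the adjacent pair with the lowest rank.

    `ranks` maps byte sequences to token ids; a lower id means the merge was
    learned earlier. Every single byte in `piece` must be present in `ranks`.
    """
    parts = [bytes([b]) for b in piece]
    while True:
        best_idx = None
        best_rank = None
        # Find the mergeable adjacent pair with the lowest rank.
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_idx, best_rank = i, rank
        if best_idx is None:
            break  # no adjacent pair is in the vocabulary
        parts[best_idx:best_idx + 2] = [parts[best_idx] + parts[best_idx + 1]]
    return [ranks[part] for part in parts]


# Toy vocabulary, invented for this example.
TOY_RANKS = {b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"abc": 4}

if __name__ == "__main__":
    print(bpe_encode(TOY_RANKS, b"abcab"))  # [4, 3]
```

With the toy table, `b"abcab"` first merges both `ab` pairs, then merges `ab` + `c` into `abc`, leaving the tokens `abc` and `ab`.
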
## [v0.4.0]

- Add `decode_batch` and `decode_bytes_batch`
- Improve error messages and handling

## [v0.3.3]

- `tiktoken` will now make a best-effort attempt to replace surrogate pairs with the corresponding
  Unicode character and will replace lone surrogates with the Unicode replacement character.

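One way to get the behaviour described above in plain Python is a UTF-16 round trip, sketched below. The helper name `repair_surrogates` is hypothetical, and this is an assumption about how such a repair can be done, not a statement of tiktoken's exact implementation.

```python
def repair_surrogates(text: str) -> str:
    """Join surrogate pairs and replace lone surrogates with U+FFFD.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode("utf-8")  # succeeds when the string has no surrogates
        return text
    except UnicodeEncodeError:
        # "surrogatepass" lets surrogate code points through the encoder;
        # decoding then joins valid pairs, and "replace" turns anything
        # that is still invalid into the Unicode replacement character.
        return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


if __name__ == "__main__":
    paired = "\ud83d\udc4d"  # high + low surrogate for U+1F44D
    lone = "a\udc4db"        # low surrogate with no partner
    print(repair_surrogates(paired) == "\U0001F44D")
    print(repair_surrogates(lone) == "a\ufffdb")
```
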
## [v0.3.2]

- Add encoding for GPT-4

## [v0.3.1]

- Build aarch64 wheels
- Make `blobfile` an optional dependency

Thank you to @messense for the environment variable that makes cargo not OOM under emulation!

## [v0.3.0]

- Improve performance by 5-20%; thank you to @nistath!
- Add `gpt-3.5-turbo` models to `encoding_for_model`
- Add prefix matching to `encoding_for_model` to better support future model versions
- Fix a bug in the README instructions on extending tiktoken
- Update the set of available encodings
- Add packaging metadata

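The idea behind prefix matching can be sketched as an exact lookup followed by a prefix scan. The tables below are small illustrative subsets and `encoding_name_for` is a hypothetical helper; neither is tiktoken's actual lookup code.

```python
# Illustrative subsets only; chosen for this example.
EXACT_MODELS = {
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
}
MODEL_PREFIXES = {
    "gpt-3.5-turbo-": "cl100k_base",
}


def encoding_name_for(model_name: str) -> str:
    """Resolve an encoding name by exact match first, then by prefix."""
    if model_name in EXACT_MODELS:
        return EXACT_MODELS[model_name]
    for prefix, encoding_name in MODEL_PREFIXES.items():
        if model_name.startswith(prefix):
            return encoding_name
    raise KeyError(f"No encoding known for model {model_name!r}")


if __name__ == "__main__":
    # A dated snapshot name resolves through the prefix table.
    print(encoding_name_for("gpt-3.5-turbo-0301"))  # cl100k_base
```

Because dated or suffixed model names fall through to the prefix table, a new snapshot of a known model family resolves without a new exact entry.
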
## [v0.2.0]

- Add `tiktoken.encoding_for_model` to get the encoding for a specific model
- Improve portability of caching logic

Thank you to @fritzo, @arvid220u, @khanhvu207, @henriktorget for various small corrections

## [v0.1.2]

- Avoid use of `blobfile` for public files
- Add support for Python 3.8
- Add py.typed
- Improve the public tests

## [v0.1.1]

- Initial release
