# Changelog

This is the changelog for the open source version of tiktoken.

## [v0.13.0]
- Update fancy-regex for significantly increased performance
- Branch byte pair encoding to fix performance on unusual input
- Fix AttributeError caused by incomplete redaction of experimental code
- Update version of `pyo3`
- Update version of optional dependency `blobfile`

## [v0.12.0]
- Build wheels for Python 3.14
- Build musllinux aarch64 wheels
- Support for free-threaded Python
- Update version of `pyo3` and `rustc-hash`
- Avoid use of `blobfile` for reading local files
- Recognise `gpt-5` model identifier
- Minor performance improvement for file reading

## [v0.11.0]
- Support for `GPT-5`
- Update version of `pyo3`
- Use new Rust edition
- Fix special token handling in `encode_to_numpy`
- Better error handling
- Improvements to private APIs

## [v0.10.0]
- Support for newer models
- Improvements to private APIs

## [v0.9.0]
- Support for `o1` and `o3` models
- Better error messages when loading invalid vocabulary files
- Support for encoding to numpy arrays
- Delayed imports when not strictly necessary

## [v0.8.0]

- Support for `o1-` and `chatgpt-4o-` models
- Build wheels for Python 3.13
- Add possessive quantifiers to limit backtracking in regular expressions, thanks to @l0rinc!
- Provide a better error message and type for invalid token decode
- Permit tuples in type hints
- Better error message for passing invalid input to `get_encoding`
- Better error messages during plugin loading
- Add a `__version__` attribute
- Update versions of `pyo3`, `regex`, `fancy-regex`
- Drop support for Python 3.8

## [v0.7.0]

- Support for `gpt-4o`
- Performance improvements

## [v0.6.0]

- Optimise regular expressions for a 20% performance improvement, thanks to @paplorinc!
- Add `text-embedding-3-*` models to `encoding_for_model`
- Check content hash for downloaded files
- Allow pickling `Encoding` objects. Registered `Encoding` will be pickled by reference
- Workaround PyO3 bug for frozenset conversion

Thank you to @paplorinc, @mdwelsh, @Praneet460!

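The content hash check for downloaded files in v0.6.0 can be pictured roughly as follows. This is a minimal sketch and not tiktoken's own code: the helper names `check_hash` and `verify_download` are hypothetical, and the use of SHA-256 is an assumption, since this changelog does not name the hash function.

```python
import hashlib


def check_hash(data: bytes, expected_hash: str) -> bool:
    """Return True if the SHA-256 hex digest of `data` equals `expected_hash`."""
    # Assumption: SHA-256 is used purely for illustration.
    return hashlib.sha256(data).hexdigest() == expected_hash


def verify_download(data: bytes, expected_hash: str) -> bytes:
    """Return `data` unchanged, or raise ValueError if its hash does not match."""
    if not check_hash(data, expected_hash):
        raise ValueError("Hash mismatch: the downloaded data may be corrupted.")
    return data


# Usage: the expected hash would normally ship alongside the file's URL.
payload = b"example vocabulary bytes"
expected = hashlib.sha256(payload).hexdigest()
verified = verify_download(payload, expected)
```

Raising on a mismatch, rather than returning the data anyway, keeps a truncated or tampered file from being cached and reused silently.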
## [v0.5.2]

- Build wheels for Python 3.12
- Update version of PyO3 to allow multiple imports
- Avoid permission errors when using default cache logic

## [v0.5.1]

- Add `encoding_name_for_model`, undo some renames to variables that are implementation details

## [v0.5.0]

- Add `tiktoken._educational` submodule to better document how byte pair encoding works
- Ensure `encoding_for_model` knows about several new models
- Add `decode_with_offsets`
- Better error for failures with the plugin mechanism
- Make more tests public
- Update versions of dependencies

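For readers who want the gist of byte pair encoding without opening the `tiktoken._educational` submodule, here is a minimal standalone sketch. It does not use or reproduce that submodule's API: `bpe_merge` and `toy_ranks` are hypothetical names, and the rank table is a made-up toy vocabulary.

```python
from typing import Dict, List


def bpe_merge(word: bytes, ranks: Dict[bytes, int]) -> List[bytes]:
    """Repeatedly merge the adjacent pair with the lowest rank until none is left."""
    # Start from individual bytes.
    parts = [bytes([b]) for b in word]
    while True:
        best_idx = None
        best_rank = None
        # Find the adjacent pair whose merged form has the lowest rank.
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_idx, best_rank = i, rank
        if best_idx is None:
            break  # No mergeable pair remains.
        # Replace the two parts with their concatenation.
        parts[best_idx:best_idx + 2] = [parts[best_idx] + parts[best_idx + 1]]
    return parts


# Hypothetical toy vocabulary: a lower rank means the merge was learned earlier.
toy_ranks = {b"ab": 0, b"abc": 1, b"cd": 2}
tokens = bpe_merge(b"abcd", toy_ranks)
```

With this toy table, `b"abcd"` first merges `a` and `b`, then merges `ab` with `c`, leaving `[b"abc", b"d"]`.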
## [v0.4.0]

- Add `decode_batch` and `decode_bytes_batch`
- Improve error messages and handling

## [v0.3.3]

- `tiktoken` will now make a best effort attempt to replace surrogate pairs with the corresponding
  Unicode character and will replace lone surrogates with the Unicode replacement character.

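The surrogate handling described for v0.3.3 can be illustrated with a round trip through UTF-16 using only the Python standard library. This is a sketch of the behaviour, and the function name `replace_surrogates` is hypothetical rather than part of tiktoken's API.

```python
def replace_surrogates(text: str) -> str:
    """Join surrogate pairs into one character; turn lone surrogates into U+FFFD."""
    # "surrogatepass" lets surrogate code points through the encoder;
    # "replace" substitutes U+FFFD for anything the decoder rejects.
    return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


# A surrogate pair stored as two separate code points becomes one character.
paired = replace_surrogates("\ud83d\udc4d")

# A lone high surrogate has no valid decoding and is replaced.
lone = replace_surrogates("x\ud83d")
```

After this step the string can be encoded as UTF-8 without raising `UnicodeEncodeError`.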
## [v0.3.2]

- Add encoding for GPT-4

## [v0.3.1]

- Build aarch64 wheels
- Make `blobfile` an optional dependency

Thank you to @messense for the environment variable that makes cargo not OOM under emulation!

## [v0.3.0]

- Improve performance by 5-20%; thank you to @nistath!
- Add `gpt-3.5-turbo` models to `encoding_for_model`
- Add prefix matching to `encoding_for_model` to better support future model versions
- Fix a bug in the README instructions on extending tiktoken
- Update the set of available encodings
- Add packaging metadata

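Prefix matching, as added to `encoding_for_model` in v0.3.0, can be sketched as an exact lookup followed by a prefix scan. The function `lookup_encoding_name` and both tables below are hypothetical, illustrative subsets and not the library's actual data structures.

```python
# Illustrative subsets only; the real tables cover many more models.
MODEL_TO_ENCODING = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}
MODEL_PREFIX_TO_ENCODING = {
    "gpt-4-": "cl100k_base",
    "gpt-3.5-turbo-": "cl100k_base",
}


def lookup_encoding_name(model_name: str) -> str:
    """Return an encoding name by exact match first, then by known prefix."""
    if model_name in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model_name]
    # Fall back to prefixes so dated or future variants still resolve.
    for prefix, encoding_name in MODEL_PREFIX_TO_ENCODING.items():
        if model_name.startswith(prefix):
            return encoding_name
    raise KeyError(f"Could not map {model_name!r} to an encoding")


dated_variant = lookup_encoding_name("gpt-4-0613")
```

The prefix fallback means a newly released, dated model name resolves without a new library release, as long as it shares a known prefix.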
## [v0.2.0]

- Add `tiktoken.encoding_for_model` to get the encoding for a specific model
- Improve portability of caching logic

Thank you to @fritzo, @arvid220u, @khanhvu207, @henriktorget for various small corrections

## [v0.1.2]

- Avoid use of `blobfile` for public files
- Add support for Python 3.8
- Add py.typed
- Improve the public tests

## [v0.1.1]

- Initial release