microsoft/AI-For-Beginners

mirrored from https://github.com/microsoft/AI-For-Beginners

5544005b2b5a51f59ace027405a0e530cfec068d

5-NLP/17-GenerativeNetworks/GenerativeTF.ipynb

1{
2 "cells": [
3 {
4 "cell_type": "markdown",
5 "metadata": {},
6 "source": [
7 "# Generative networks\n",
8 "\n",
9 "Recurrent Neural Networks (RNNs) and their gated variants, such as Long Short-Term Memory (LSTM) cells and Gated Recurrent Units (GRUs), provide a mechanism for language modeling, i.e. they can learn word ordering and predict the next word in a sequence. This allows us to use RNNs for **generative tasks**, such as ordinary text generation, machine translation, and even image captioning.\n",
10 "\n",
11 "In the RNN architecture we discussed in the previous unit, each RNN unit produced the next hidden state as an output. However, we can also add another output to each recurrent unit, which allows us to output a **sequence** (equal in length to the original sequence). Moreover, we can use RNN units that do not accept an input at each step, but just take some initial state vector and then produce a sequence of outputs.\n",
12 "\n",
13 "In this notebook, we will focus on simple generative models that help us generate text. For simplicity, let's build a **character-level network**, which generates text letter by letter. During training, we need to take some text corpus and split it into letter sequences. "
14 ]
15 },
16 {
17 "cell_type": "code",
18 "execution_count": 1,
19 "metadata": {},
20 "outputs": [],
21 "source": [
22 "import tensorflow as tf\n",
23 "from tensorflow import keras\n",
24 "import tensorflow_datasets as tfds\n",
25 "import numpy as np\n",
26 "\n",
27 "ds_train, ds_test = tfds.load('ag_news_subset').values()"
28 ]
29 },
30 {
31 "cell_type": "markdown",
32 "metadata": {},
33 "source": [
34 "## Building character vocabulary\n",
35 "\n",
36 "To build a character-level generative network, we need to split text into individual characters instead of words. The `TextVectorization` layer, as we have used it so far, splits text into words, so we have two options:\n",
37 "\n",
38 "* Manually load text and do tokenization 'by hand', as in [this official Keras example](https://keras.io/examples/generative/lstm_character_level_text_generation/)\n",
39 "* Use the `Tokenizer` class for character-level tokenization.\n",
40 "\n",
41 "We will go with the second option. `Tokenizer` can also be used to tokenize into words, so one should be able to switch from char-level to word-level tokenization quite easily.\n",
42 "\n",
43 "To do character-level tokenization, we need to pass the `char_level=True` parameter:"
44 ]
45 },
46 {
47 "cell_type": "code",
48 "execution_count": 2,
49 "metadata": {},
50 "outputs": [],
51 "source": [
52 "def extract_text(x):\n",
53 "    return x['title']+' '+x['description']\n",
54 "\n",
55 "def tupelize(x):\n",
56 "    return (extract_text(x),x['label'])\n",
57 "\n",
58 "tokenizer = keras.preprocessing.text.Tokenizer(char_level=True,lower=False)\n",
59 "tokenizer.fit_on_texts([x['title'].numpy().decode('utf-8') for x in ds_train])"
60 ]
61 },
62 {
63 "cell_type": "markdown",
64 "metadata": {},
65 "source": [
66 "We also want to use one special token to denote **end of sequence**, which we will call `<eos>`. Let's add it manually to the vocabulary:"
67 ]
68 },
69 {
70 "cell_type": "code",
71 "execution_count": 3,
72 "metadata": {},
73 "outputs": [],
74 "source": [
75 "eos_token = len(tokenizer.word_index)+1\n",
76 "tokenizer.word_index['<eos>'] = eos_token\n",
77 "\n",
78 "vocab_size = eos_token + 1"
79 ]
80 },
81 {
82 "cell_type": "markdown",
83 "metadata": {},
84 "source": [
85 "Now, to encode text into sequences of numbers, we can use:"
86 ]
87 },
88 {
89 "cell_type": "code",
90 "execution_count": 4,
91 "metadata": {},
92 "outputs": [
93 {
94 "data": {
95 "text/plain": [
96 "[[48, 2, 10, 10, 5, 44, 1, 25, 5, 8, 10, 13, 78]]"
97 ]
98 },
99 "execution_count": 4,
100 "metadata": {},
101 "output_type": "execute_result"
102 }
103 ],
104 "source": [
105 "tokenizer.texts_to_sequences(['Hello, world!'])"
106 ]
107 },
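To make the vocabulary construction above more tangible, here is a minimal pure-Python sketch of character-level tokenization. It only loosely mirrors what `Tokenizer(char_level=True)` does (ids start at 1, most frequent characters first); the helper names `build_char_vocab` and `encode` and the two toy titles are assumptions made for illustration and are not part of the notebook.

```python
from collections import Counter

def build_char_vocab(texts):
    """Map each character to an integer id, most frequent first, starting at 1."""
    counts = Counter(ch for text in texts for ch in text)
    # Id 0 is left unused so that it can serve as the padding value later on.
    return {ch: i + 1 for i, (ch, _) in enumerate(counts.most_common())}

def encode(text, vocab):
    """Turn a string into a list of character ids, skipping unknown characters."""
    return [vocab[ch] for ch in text if ch in vocab]

titles = ["Hello, world!", "Hello again"]   # toy corpus, not the AG News data
vocab = build_char_vocab(titles)
eos_token = len(vocab) + 1                  # one extra id for the end-of-sequence marker
vocab["<eos>"] = eos_token
vocab_size = eos_token + 1                  # +1 because id 0 is reserved for padding

print(encode("Hello", vocab))
print(vocab_size)
```

Reserving id 0 is what makes `vocab_size = eos_token + 1` the right size for one-hot encoding: the ids run from 0 (padding) to `eos_token` inclusive.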
108 {
109 "cell_type": "markdown",
110 "metadata": {},
111 "source": [
112 "## Training a generative RNN to generate titles\n",
113 "\n",
114 "We will train the RNN to generate news titles as follows. On each step, we take one title and feed it into the RNN, and for each input character we ask the network to generate the next output character:\n",
115 "\n",
116 "![Image showing an example RNN generation of the word 'HELLO'.](./images/rnn-generate.png)\n",
117 "\n",
118 "For the last character of our sequence, we will ask the network to generate the `<eos>` token.\n",
119 "\n",
120 "The main difference between the generative RNN that we are using here and the RNNs we used before is that we take an output from each step of the RNN, and not just from the final cell. This can be achieved by passing `return_sequences=True` to the RNN layer.\n",
121 "\n",
122 "Thus, during training, the input to the network is a sequence of encoded characters of some length, and the output is a sequence of the same length, but shifted by one element and terminated by `<eos>`. A minibatch consists of several such sequences, and we need to use **padding** to align all sequences.\n",
123 "\n",
124 "Let's create functions that will transform the dataset for us. Because we want to pad sequences on the minibatch level, we will first batch the dataset by calling `.batch()`, and then `map` it to do the transformation. So, the transformation function will take a whole minibatch as a parameter:"
125 ]
126 },
127 {
128 "cell_type": "code",
129 "execution_count": 5,
130 "metadata": {},
131 "outputs": [],
132 "source": [
133 "def title_batch(x):\n",
134 "    x = [t.numpy().decode('utf-8') for t in x]\n",
135 "    z = tokenizer.texts_to_sequences(x)\n",
136 "    z = tf.keras.preprocessing.sequence.pad_sequences(z)\n",
137 "    return tf.one_hot(z,vocab_size), tf.one_hot(tf.concat([z[:,1:],tf.constant(eos_token,shape=(len(z),1))],axis=1),vocab_size)"
138 ]
139 },
140 {
141 "cell_type": "markdown",
142 "metadata": {},
143 "source": [
144 "A few important things that we do here:\n",
145 "* We first extract the actual text from the string tensor\n",
146 "* `texts_to_sequences` converts the list of strings into a list of integer sequences\n",
147 "* `pad_sequences` pads those sequences to the length of the longest one in the minibatch\n",
148 "* We finally one-hot encode all the characters, and also do the shifting and `<eos>` appending. We will soon see why we need one-hot-encoded characters\n",
149 "\n",
150 "However, this function is **Pythonic**, i.e. it runs as ordinary Python code and cannot be automatically translated into a TensorFlow computational graph. We will get errors if we try to use this function directly in the `Dataset.map` function. We need to wrap this Pythonic call using the `py_function` wrapper: "
151 ]
152 },
153 {
154 "cell_type": "code",
155 "execution_count": 6,
156 "metadata": {},
157 "outputs": [],
158 "source": [
159 "def title_batch_fn(x):\n",
160 "    x = x['title']\n",
161 "    a,b = tf.py_function(title_batch,inp=[x],Tout=(tf.float32,tf.float32))\n",
162 "    return a,b"
163 ]
164 },
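The padding and shifting done in `title_batch` can be easier to follow on plain lists. The following is a small sketch under stated assumptions: the helper name `make_batch`, the toy ids and `eos_token=9` are made up for illustration, and the one-hot step is left out. It relies on the fact that `pad_sequences` pads at the front by default, so every sequence ends at the last position and `<eos>` can simply be appended.

```python
PAD = 0  # padding id, as produced by pad_sequences

def make_batch(sequences, eos_token):
    """Left-pad id sequences to a common length and build next-character targets."""
    max_len = max(len(s) for s in sequences)
    # Pre-padding: zeros go in front, so every sequence ends at the last position.
    inputs = [[PAD] * (max_len - len(s)) + list(s) for s in sequences]
    # Target = input shifted left by one, with <eos> appended at the end.
    targets = [row[1:] + [eos_token] for row in inputs]
    return inputs, targets

batch = [[5, 3, 8], [7, 2]]          # two already-encoded toy titles
inputs, targets = make_batch(batch, eos_token=9)
for x, y in zip(inputs, targets):
    print(x, "->", y)
```

Each target position holds the character that should follow the input character at the same position, which is exactly what the network is asked to predict.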
165 {
166 "cell_type": "markdown",
167 "metadata": {},
168 "source": [
169 "> **Note**: Differentiating between Pythonic and TensorFlow transformation functions may seem a little too complex, and you may be wondering why we do not transform the dataset using standard Python functions before passing it to `fit`. While this can definitely be done, using `Dataset.map` has a big advantage: the data transformation pipeline is executed as part of the TensorFlow computational graph, which can take advantage of GPU computations and minimizes the need to pass data between CPU and GPU.\n",
170 "\n",
171 "Now we can build our generator network and start training. It can be based on any recurrent cell that we discussed in the previous unit (simple, LSTM or GRU). In our example we will use LSTM.\n",
172 "\n",
173 "Because the network takes characters as input, and the vocabulary size is pretty small, we do not need an embedding layer; the one-hot-encoded input can go directly into the LSTM cell. The output layer is a `Dense` classifier with softmax activation that converts the LSTM output into a probability distribution over token numbers.\n",
174 "\n",
175 "In addition, since we are dealing with variable-length sequences, we can use a `Masking` layer to create a mask that will ignore the padded part of the string. This is not strictly needed, because we are not very much interested in everything that goes beyond the `<eos>` token, but we will use it for the sake of getting some experience with this layer type. Keep in mind that `Masking` only skips time steps in which all features equal `mask_value` (0 by default), while `tf.one_hot` turns the padding id 0 into a vector with a 1 in position 0, so in this setup the padded steps are not actually masked. `input_shape` is `(None, vocab_size)`, where `None` indicates a sequence of variable length, and the output shape is `(None, vocab_size)` as well, as you can see from the `summary`:"
176 ]
177 },
178 {
179 "cell_type": "code",
180 "execution_count": 7,
181 "metadata": {},
182 "outputs": [
183 {
184 "name": "stdout",
185 "output_type": "stream",
186 "text": [
187 "Model: \"sequential\"\n",
188 "_________________________________________________________________\n",
189 "Layer (type) Output Shape Param # \n",
190 "=================================================================\n",
191 "masking (Masking) (None, None, 84) 0 \n",
192 "_________________________________________________________________\n",
193 "lstm (LSTM) (None, None, 128) 109056 \n",
194 "_________________________________________________________________\n",
195 "dense (Dense) (None, None, 84) 10836 \n",
196 "=================================================================\n",
197 "Total params: 119,892\n",
198 "Trainable params: 119,892\n",
199 "Non-trainable params: 0\n",
200 "_________________________________________________________________\n",
201 "15000/15000 [==============================] - 229s 15ms/step - loss: 1.5385\n"
202 ]
203 },
204 {
205 "data": {
206 "text/plain": [
207 "<tensorflow.python.keras.callbacks.History at 0x7fa40c1245e0>"
208 ]
209 },
210 "execution_count": 7,
211 "metadata": {},
212 "output_type": "execute_result"
213 }
214 ],
215 "source": [
216 "model = keras.models.Sequential([\n",
217 "    keras.layers.Masking(input_shape=(None,vocab_size)),\n",
218 "    keras.layers.LSTM(128,return_sequences=True),\n",
219 "    keras.layers.Dense(vocab_size,activation='softmax')\n",
220 "])\n",
221 "\n",
222 "model.summary()\n",
223 "model.compile(loss='categorical_crossentropy')\n",
224 "\n",
225 "model.fit(ds_train.batch(8).map(title_batch_fn))"
226 ]
227 },
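The parameter counts in the summary above can be checked by hand. The sketch below uses the standard formulas for an LSTM layer (four gates, each with input weights, recurrent weights and a bias) and a dense layer; the function names `lstm_params` and `dense_params` are illustrative, and the sizes 84 and 128 are read off the summary.

```python
def lstm_params(input_dim, units):
    """Weights of one LSTM layer: 4 gates, each with input, recurrent and bias weights."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weight matrix plus bias vector of a fully connected layer."""
    return input_dim * units + units

vocab_size, hidden = 84, 128     # sizes taken from the model summary
lstm = lstm_params(vocab_size, hidden)
dense = dense_params(hidden, vocab_size)
print(lstm, dense, lstm + dense)  # 109056 10836 119892
```

The `Masking` layer contributes no parameters, so the two numbers add up to the reported total.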
228 {
229 "cell_type": "markdown",
230 "metadata": {},
231 "source": [
232 "## Generating output\n",
233 "\n",
234 "Now that we have trained the model, we want to use it to generate some output. First of all, we need a way to decode text represented by a sequence of token numbers. We could use the `tokenizer.sequences_to_texts` function; however, it does not work well with character-level tokenization. Therefore we will take the dictionary of tokens from the tokenizer (called `word_index`), build a reverse map, and write our own decoding function:"
235 ]
236 },
237 {
238 "cell_type": "code",
239 "execution_count": 10,
240 "metadata": {},
241 "outputs": [],
242 "source": [
243 "reverse_map = {val:key for key, val in tokenizer.word_index.items()}\n",
244 "\n",
245 "def decode(x):\n",
246 "    return ''.join([reverse_map[t] for t in x])"
247 ]
248 },
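Note that `decode` raises a `KeyError` for any id that has no entry in `reverse_map`, for example the padding id 0, which is never part of `word_index`. A more forgiving variant could look like the sketch below; the name `decode_safe` and the toy vocabulary are assumptions for illustration, not part of the notebook.

```python
word_index = {"a": 1, "b": 2, "c": 3, "<eos>": 4}   # toy vocabulary, ids start at 1
reverse_map = {idx: ch for ch, idx in word_index.items()}

def decode_safe(ids):
    """Join characters for known ids; ids without an entry (such as padding 0) are dropped."""
    return "".join(reverse_map.get(int(i), "") for i in ids)

print(decode_safe([0, 0, 3, 1, 2]))
```

Silently dropping unknown ids is a design choice: it keeps generation running, at the cost of hiding the fact that the model produced a padding token.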
249 {
250 "cell_type": "markdown",
251 "metadata": {},
252 "source": [
253 "Now, let's do generation. We will start with some string `start`, encode it into a sequence `inp`, and then on each step we will call our network to infer the next character. \n",
254 "\n",
255 "The output of the network `out` is a vector of `vocab_size` elements representing the probabilities of each token, and we can find the most probable token number by using `argmax`. We then append this character to the generated list of tokens, and proceed with generation. This process of generating one character is repeated `size` times to generate the required number of characters, and we terminate early when `eos_token` is encountered."
256 ]
257 },
258 {
259 "cell_type": "code",
260 "execution_count": 12,
261 "metadata": {},
262 "outputs": [
263 {
264 "data": {
265 "text/plain": [
266 "'Today #39;s lead to strike for the strike for the strike for the strike (AFP)'"
267 ]
268 },
269 "execution_count": 12,
270 "metadata": {},
271 "output_type": "execute_result"
272 }
273 ],
274 "source": [
275 "def generate(model,size=100,start='Today '):\n",
276 "    inp = tokenizer.texts_to_sequences([start])[0]\n",
277 "    chars = list(inp)  # copy: appending to chars must not also modify inp\n",
278 "    for i in range(size):\n",
279 "        out = model(tf.expand_dims(tf.one_hot(inp,vocab_size),0))[0][-1]\n",
280 "        nc = tf.argmax(out)\n",
281 "        if nc==eos_token:\n",
282 "            break\n",
283 "        chars.append(nc.numpy())\n",
284 "        inp = inp+[nc]\n",
285 "    return decode(chars)\n",
286 "\n",
287 "generate(model)"
288 ]
289 },
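The greedy loop in `generate` does not depend on the model being an LSTM. As a rough illustration, the sketch below replaces the trained network with a toy stand-in: a table of bigram counts built from one short string. The names `train_bigram` and `generate_greedy` and the training string are assumptions made for this example only.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every character, which characters follow it."""
    follows = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        follows[cur][nxt] += 1
    return follows

def generate_greedy(follows, start, size=20):
    """Repeatedly append the most frequent successor of the last character."""
    chars = list(start)              # start is assumed to be non-empty
    for _ in range(size):
        candidates = follows.get(chars[-1])
        if not candidates:           # no known successor: stop early, like hitting <eos>
            break
        chars.append(candidates.most_common(1)[0][0])
    return "".join(chars)

follows = train_bigram("the strike for the start of the strike")
print(generate_greedy(follows, "th"))
```

Because the choice at every step is deterministic, such a generator tends to fall into a repeating cycle once it revisits a character, which is the same effect that shows up in the greedy output of the network.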
290 {
291 "cell_type": "markdown",
292 "metadata": {},
293 "source": [
294 "## Sampling output during training \n",
295 "\n",
296 "Because we do not have any useful metrics such as *accuracy*, the only way we can see that our model is getting better is by **sampling** generated strings during training. To do it, we will use **callbacks**, i.e. functions that we can pass to the `fit` function, and that will be called periodically during training. "
297 ]
298 },
299 {
300 "cell_type": "code",
301 "execution_count": 13,
302 "metadata": {},
303 "outputs": [
304 {
305 "name": "stdout",
306 "output_type": "stream",
307 "text": [
308 "Epoch 1/3\n",
309 "15000/15000 [==============================] - 226s 15ms/step - loss: 1.2703\n",
310 "Today #39;s a lead in the company for the strike\n",
311 "Epoch 2/3\n",
312 "15000/15000 [==============================] - 227s 15ms/step - loss: 1.2057\n",
313 "Today #39;s the Market Service on Security Start (AP)\n",
314 "Epoch 3/3\n",
315 "15000/15000 [==============================] - 226s 15ms/step - loss: 1.1752\n",
316 "Today #39;s a line on the strike to start for the start\n"
317 ]
318 },
319 {
320 "data": {
321 "text/plain": [
322 "<tensorflow.python.keras.callbacks.History at 0x7fa40c74e3d0>"
323 ]
324 },
325 "execution_count": 13,
326 "metadata": {},
327 "output_type": "execute_result"
328 }
329 ],
330 "source": [
331 "sampling_callback = keras.callbacks.LambdaCallback(\n",
332 "    on_epoch_end = lambda epoch, logs: print(generate(model))\n",
333 ")\n",
334 "\n",
335 "model.fit(ds_train.batch(8).map(title_batch_fn),callbacks=[sampling_callback],epochs=3)"
336 ]
337 },
338 {
339 "cell_type": "markdown",
340 "metadata": {},
341 "source": [
342 "This example already generates some pretty good text, but it can be further improved in several ways:\n",
343 "* **More text**. We have only used titles for our task, but you may want to experiment with full text. Remember that RNNs are not too great at handling long sequences, so it makes sense either to split them into shorter sentences, or to always train on a fixed sequence length of some predefined value `num_chars` (say, 256). You may try to change the example above into such an architecture, using the [official Keras tutorial](https://keras.io/examples/generative/lstm_character_level_text_generation/) as an inspiration.\n",
344 "* **Multilayer LSTM**. It makes sense to try 2 or 3 layers of LSTM cells. As we mentioned in the previous unit, each layer of LSTM extracts certain patterns from text, and in the case of a character-level generator we can expect the lower LSTM level to be responsible for extracting syllables, and the higher levels for words and word combinations. In Keras, this can be implemented by stacking several `LSTM` layers, each with `return_sequences=True`.\n",
345 "* You may also want to experiment with **GRU units** and see which ones perform better, and with **different hidden layer sizes**. A hidden layer that is too large may result in overfitting (e.g. the network will memorize the exact text), while a smaller one might not produce good results."
346 ]
347 },
348 {
349 "cell_type": "markdown",
350 "metadata": {},
351 "source": [
352 "## Soft text generation and temperature\n",
353 "\n",
354 "In the previous definition of `generate`, we were always taking the character with the highest probability as the next character in the generated text. As a result, the text often \"cycled\" between the same character sequences again and again, as in this example:\n",
355 "```\n",
356 "today of the second the company and a second the company ...\n",
357 "```\n",
358 "\n",
359 "However, if we look at the probability distribution for the next character, it could be that the difference between the few highest probabilities is not huge, e.g. one character can have probability 0.2, another 0.19, etc. For example, when looking for the next character in the sequence '*play*', the next character can equally well be either a space or **e** (as in the word *player*).\n",
360 "\n",
361 "This leads us to the conclusion that it is not always \"fair\" to select the character with the highest probability, because choosing the second highest might still lead us to meaningful text. It is wiser to **sample** characters from the probability distribution given by the network output.\n",
362 "\n",
363 "This sampling can be done using the `np.random.multinomial` function, which implements the so-called **multinomial distribution**. A function that implements this **soft** text generation is defined below:"
364 ]
365 },
366 {
367 "cell_type": "code",
368 "execution_count": 33,
369 "metadata": {
370 "scrolled": true
371 },
372 "outputs": [
373 {
374 "name": "stdout",
375 "output_type": "stream",
376 "text": [
377 "\n",
378 "--- Temperature = 0.3\n",
379 "Today #39;s strike #39; to start at the store return\n",
380 "On Sunday PO to Be Data Profit Up (Reuters)\n",
381 "Moscow, SP wins straight to the Microsoft #39;s control of the space start\n",
382 "President olding of the blast start for the strike to pay &lt;b&gt;...&lt;/b&gt;\n",
383 "Little red riding hood ficed to the spam countered in European &lt;b&gt;...&lt;/b&gt;\n",
384 "\n",
385 "--- Temperature = 0.8\n",
386 "Today countie strikes ryder missile faces food market blut\n",
387 "On Sunday collores lose-toppy of sale of Bullment in &lt;b&gt;...&lt;/b&gt;\n",
388 "Moscow, IBM Diffeiting in Afghan Software Hotels (Reuters)\n",
389 "President Ol Luster for Profit Peaced Raised (AP)\n",
390 "Little red riding hood dace on depart talks #39; bank up\n",
391 "\n",
392 "--- Temperature = 1.0\n",
393 "Today wits House buiting debate fixes #39; supervice stake again\n",
394 "On Sunday arling digital poaching In for level\n",
395 "Moscow, DS Up 7, Top Proble Protest Caprey Mamarian Strike\n",
396 "President teps help of roubler stepted lessabul-Dhalitics (AFP)\n",
397 "Little red riding hood signs on cash in Carter-youb\n",
398 "\n",
399 "--- Temperature = 1.3\n",
400 "Today wits flawer ro, pSIA figat's co DroftwavesIs Talo up\n",
401 "On Sunday hround elitwing wint EU Powerburlinetien\n",
402 "Moscow, Bazz #39;s sentries olymen winnelds' next for Olympite Huc?\n",
403 "President lost securitys from power Elections in Smiltrials\n",
404 "Little red riding hood vides profit, exponituity, profitmainalist-at said listers\n",
405 "\n",
406 "--- Temperature = 1.8\n",
407 "Today #39;It: He deat: N.KA Asside\n",
408 "On Sunday i arry Par aldeup patient Wo stele1\n"
409 ]
410 },
411 {
412 "ename": "KeyError",
413 "evalue": "0",
414 "output_type": "error",
415 "traceback": [
416 "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
417 "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
418 "\u001b[0;32m<ipython-input-33-db32367a0feb>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 18\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"\\n--- Temperature = {i}\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mj\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 20\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgenerate_soft\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0msize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m300\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mstart\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mwords\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mj\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mtemperature\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
419 "\u001b[0;32m<ipython-input-33-db32367a0feb>\u001b[0m in \u001b[0;36mgenerate_soft\u001b[0;34m(model, size, start, temperature)\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mchars\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0minp\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0minp\u001b[0m\u001b[0;34m+\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnc\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 13\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mdecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mchars\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 14\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[0mwords\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m'Today '\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m'On Sunday '\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m'Moscow, '\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m'President '\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m'Little red riding hood '\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
420 "\u001b[0;32m<ipython-input-10-3f5fa6130b1d>\u001b[0m in \u001b[0;36mdecode\u001b[0;34m(x)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mdecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0;34m''\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mreverse_map\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mt\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
421 "\u001b[0;32m<ipython-input-10-3f5fa6130b1d>\u001b[0m in \u001b[0;36m<listcomp>\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mdecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0;34m''\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mreverse_map\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mt\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
422 "\u001b[0;31mKeyError\u001b[0m: 0"
423 ]
424 }
425 ],
426 "source": [
427 "def generate_soft(model,size=100,start='Today ',temperature=1.0):\n",
428 "    inp = tokenizer.texts_to_sequences([start])[0]\n",
429 "    chars = list(inp)  # copy: appending to chars must not also modify inp\n",
430 "    for i in range(size):\n",
431 "        out = model(tf.expand_dims(tf.one_hot(inp,vocab_size),0))[0][-1]\n",
432 "        probs = tf.exp(tf.math.log(out)/temperature).numpy().astype(np.float64)\n",
433 "        probs = probs/np.sum(probs)\n",
434 "        nc = np.argmax(np.random.multinomial(1,probs,1))\n",
435 "        if nc==eos_token:\n",
436 "            break\n",
437 "        chars.append(nc)\n",
438 "        inp = inp+[nc]\n",
439 "    return decode(chars)\n",
440 "\n",
441 "words = ['Today ','On Sunday ','Moscow, ','President ','Little red riding hood ']\n",
442 "\n",
443 "for i in [0.3,0.8,1.0,1.3,1.8]:\n",
444 "    print(f\"\\n--- Temperature = {i}\")\n",
445 "    for j in range(5):\n",
446 "        print(generate_soft(model,size=300,start=words[j],temperature=i))"
447 ]
448 },
449 {
450 "cell_type": "markdown",
451 "metadata": {},
452 "source": [
453 "We have introduced one more parameter called **temperature**, which indicates how closely we should stick to the highest probability. If the temperature is 1.0, we do fair multinomial sampling; when the temperature goes to infinity, all probabilities become equal, and we select the next character uniformly at random. In the example above we can observe that the text becomes meaningless when we increase the temperature too much, and that it resembles \"cycled\" hard-generated text when the temperature gets closer to 0. At high temperatures the sampler can also pick the padding token 0, which has no entry in `reverse_map`; this is what causes the `KeyError: 0` at the end of the output above. "
454 ]
455 }
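The effect of temperature is easy to see on a small probability vector. The sketch below applies the same rescaling as `generate_soft`, i.e. `exp(log(p) / T)` followed by renormalisation, using only the standard library; the function names `apply_temperature` and `sample` and the three toy probabilities are assumptions for illustration.

```python
import math
import random

def apply_temperature(probs, temperature):
    """Rescale a probability vector: p_i ** (1 / T), renormalised to sum to 1."""
    scaled = [math.exp(math.log(p) / temperature) if p > 0 else 0.0 for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample(probs, rng=random):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.5, 0.3, 0.2]                      # toy network output for three tokens
for t in (0.3, 1.0, 1.8):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
print(sample(apply_temperature(probs, 0.8)))
```

A temperature below 1 sharpens the distribution towards the most probable token, a temperature of 1 leaves it unchanged, and a temperature above 1 flattens it towards uniform.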
456 ],
457 "metadata": {
458 "kernelspec": {
459 "display_name": "py38_tensorflow",
460 "language": "python",
461 "name": "conda-env-py38_tensorflow-py"
462 },
463 "language_info": {
464 "codemirror_mode": {
465 "name": "ipython",
466 "version": 3
467 },
468 "file_extension": ".py",
469 "mimetype": "text/x-python",
470 "name": "python",
471 "nbconvert_exporter": "python",
472 "pygments_lexer": "ipython3",
473 "version": "3.8.10"
474 }
475 },
476 "nbformat": 4,
477 "nbformat_minor": 4
478}
479