openai/openai-python

mirrored from https://github.com/openai/openai-python

v0.15.0

examples/embeddings/Regression.ipynb

1	`{`
2	`"cells": [`
3	`{`
4	`"cell_type": "markdown",`
5	`"metadata": {},`
6	`"source": [`
7	`"## Regression using the embeddings\n",`
8	`"\n",`
9	`"Regression means predicting a number rather than a category. We predict the review score from the embedding of the review's text. We split the dataset into training and test sets so that we can realistically evaluate performance on unseen data. The dataset is created in the [Obtain_dataset Notebook](Obtain_dataset.ipynb).\n",`
10	`"\n",`
11	`"We're predicting the score of the review, which is a number between 1 and 5 (1-star being negative and 5-star positive)."`
12	`]`
13	`},`
14	`{`
15	`"cell_type": "code",`
16	`"execution_count": 2,`
17	`"metadata": {},`
18	`"outputs": [`
19	`{`
20	`"name": "stdout",`
21	`"output_type": "stream",`
22	`"text": [`
23	`"Babbage similarity embedding performance on 1k Amazon reviews: mse=0.38, mae=0.39\n"`
24	`]`
25	`}`
26	`],`
27	`"source": [`
28	`"import pandas as pd\n",`
29	`"import numpy as np\n",`
30	`"\n",`
31	`"from sklearn.ensemble import RandomForestRegressor\n",`
32	`"from sklearn.model_selection import train_test_split\n",`
33	`"from sklearn.metrics import mean_squared_error, mean_absolute_error\n",`
34	`"\n",`
35	`"df = pd.read_csv('output/embedded_1k_reviews.csv')\n",`
36	`"df['babbage_similarity'] = df.babbage_similarity.apply(eval).apply(np.array)\n",`
37	`"\n",`
38	`"X_train, X_test, y_train, y_test = train_test_split(list(df.babbage_similarity.values), df.Score, test_size=0.2, random_state=42)\n",`
39	`"\n",`
40	`"rfr = RandomForestRegressor(n_estimators=100)\n",`
41	`"rfr.fit(X_train, y_train)\n",`
42	`"preds = rfr.predict(X_test)\n",`
43	`"\n",`
44	`"\n",`
45	`"mse = mean_squared_error(y_test, preds)\n",`
46	`"mae = mean_absolute_error(y_test, preds)\n",`
47	`"\n",`
48	`"print(f\"Babbage similarity embedding performance on 1k Amazon reviews: mse={mse:.2f}, mae={mae:.2f}\")"`
49	`]`
50	`},`
51	`{`
52	`"cell_type": "code",`
53	`"execution_count": 26,`
54	`"metadata": {},`
55	`"outputs": [`
56	`{`
57	`"name": "stdout",`
58	`"output_type": "stream",`
59	`"text": [`
60	`"Dummy mean prediction performance on Amazon reviews: mse=1.77, mae=1.04\n"`
61	`]`
62	`}`
63	`],`
64	`"source": [`
65	`"bmse = mean_squared_error(y_test, np.repeat(y_test.mean(), len(y_test)))\n",`
66	`"bmae = mean_absolute_error(y_test, np.repeat(y_test.mean(), len(y_test)))\n",`
67	`"print(f\"Dummy mean prediction performance on Amazon reviews: mse={bmse:.2f}, mae={bmae:.2f}\")"`
68	`]`
69	`},`
70	`{`
71	`"cell_type": "markdown",`
72	`"metadata": {},`
73	`"source": [`
74	`"The embeddings predict the scores with a mean absolute error of 0.39 per prediction, compared with 1.04 for the dummy mean baseline. This is roughly equivalent to predicting 2 out of 3 reviews perfectly and being off by one star on the remaining 1 out of 3."`
75	`]`
76	`},`
77	`{`
78	`"cell_type": "markdown",`
79	`"metadata": {},`
80	`"source": [`
81	`"You could also train a classifier to predict the label, or use the embeddings within an existing ML model to encode free-text features."`
82	`]`
83	`}`
84	`],`
85	`"metadata": {`
86	`"interpreter": {`
87	`"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"`
88	`},`
89	`"kernelspec": {`
90	`"display_name": "Python 3.7.3 64-bit ('base': conda)",`
91	`"name": "python3"`
92	`},`
93	`"language_info": {`
94	`"codemirror_mode": {`
95	`"name": "ipython",`
96	`"version": 3`
97	`},`
98	`"file_extension": ".py",`
99	`"mimetype": "text/x-python",`
100	`"name": "python",`
101	`"nbconvert_exporter": "python",`
102	`"pygments_lexer": "ipython3",`
103	`"version": "3.7.3"`
104	`},`
105	`"orig_nbformat": 4`
106	`},`
107	`"nbformat": 4,`
108	`"nbformat_minor": 2`
109	`}`
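
The notebook's closing remark likens a mean absolute error of 0.39 to getting two out of three reviews exactly right and the third off by one star. A minimal sketch of that arithmetic follows. It assumes three made-up reviews (the values in `y_true` and `y_pred` are hypothetical, not taken from the dataset) and uses plain-Python stand-ins for the two metrics rather than the scikit-learn functions the notebook imports.

```python
# Hypothetical scores: two reviews predicted exactly, one off by one star.
y_true = [5, 4, 3]
y_pred = [5, 4, 4]


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all pairs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2 over all pairs."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"mae={mae:.2f}, mse={mse:.2f}")  # prints "mae=0.33, mse=0.33"
```

The toy case gives an MAE of about 0.33, slightly below the 0.39 the notebook reports, which is why the comparison is only approximate.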