Commit 3219fdf

Merge pull request #53 from eli5-org/llm-token-probs
Explain LLM predictions, using logprobs to highlight token probability
2 parents 17338cc + 7b9dd93 commit 3219fdf

File tree: 18 files changed, +1565 −12 lines

README.rst

Lines changed: 2 additions & 0 deletions

@@ -54,6 +54,7 @@ It provides support for the following machine learning frameworks and packages:
 * sklearn-crfsuite_. ELI5 allows to check weights of sklearn_crfsuite.CRF
   models.
+* OpenAI_ python client. ELI5 allows to explain LLM predictions with token probabilities.

 ELI5 also implements several algorithms for inspecting black-box models
 (see `Inspecting Black-Box Estimators`_):
@@ -81,6 +82,7 @@ and formatting on a client.
 .. _Catboost: https://github.com/catboost/catboost
 .. _Permutation importance: https://eli5.readthedocs.io/en/latest/blackbox/permutation_importance.html
 .. _Inspecting Black-Box Estimators: https://eli5.readthedocs.io/en/latest/blackbox/index.html
+.. _OpenAI: https://github.com/openai/openai-python

 License is MIT.

docs/source/_notebooks/explain_llm_logprobs.rst

Lines changed: 553 additions & 0 deletions
Large diffs are not rendered by default.

docs/source/libraries/index.rst

Lines changed: 1 addition & 1 deletion

@@ -12,5 +12,5 @@ Supported Libraries
    catboost
    lightning
    sklearn_crfsuite
+   openai
    keras
-

docs/source/libraries/openai.rst

Lines changed: 71 additions & 0 deletions
(new file; shown without diff markers)

.. _library-openai:

OpenAI
======

OpenAI_ provides a client library for calling Large Language Models (LLMs).

.. _OpenAI: https://github.com/openai/openai-python

eli5 supports :func:`eli5.explain_prediction` for
``ChatCompletion``, ``ChoiceLogprobs`` and ``openai.Client`` objects,
highlighting tokens proportionally to their log probability,
which can help to see where the model is less confident in its predictions.
More likely tokens are highlighted in green,
while unlikely tokens are highlighted in red:

.. image:: ../static/llm-explain-logprobs.png
   :alt: LLM token probabilities visualized

Explaining with a client, invoking the model with ``logprobs`` enabled::

    import eli5
    import openai
    client = openai.Client()
    prompt = 'some string'  # or [{"role": "user", "content": "some string"}]
    explanation = eli5.explain_prediction(client, prompt, model='gpt-4o')
    explanation

You may pass extra keyword arguments to :func:`eli5.explain_prediction`;
they are passed on to ``client.chat.completions.create``,
e.g. pass ``n=2`` to get multiple responses
and see explanations for each of them.

You'd normally want to run this in a Jupyter notebook to see the explanation
formatted as HTML.

You can access the ``Choice`` object as ``explanation.targets[0].target``::

    explanation.targets[0].target.message.content

If you have already obtained a chat completion with ``logprobs`` from the OpenAI
client, you may call :func:`eli5.explain_prediction` with a
``ChatCompletion`` or ``ChoiceLogprobs`` object like this::

    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="gpt-4o",
        logprobs=True,
    )
    eli5.explain_prediction(chat_completion)  # or
    eli5.explain_prediction(chat_completion.choices[0].logprobs)

See the :ref:`tutorial <explain-llm-logprobs-tutorial>` for a more detailed
usage example.

.. note::
    While token probabilities reflect model uncertainty in many cases,
    they are not always indicative,
    e.g. in case of `Chain of Thought <https://arxiv.org/abs/2201.11903>`_
    preceding the final response.

.. note::
    Top-level :func:`eli5.explain_prediction` calls are dispatched to
    :func:`eli5.llm.explain_prediction.explain_prediction_openai_client`,
    :func:`eli5.llm.explain_prediction.explain_prediction_openai_completion`,
    or :func:`eli5.llm.explain_prediction.explain_prediction_openai_logprobs`.
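The green/red shading described in this page is a direct function of each token's logprob. As a rough sketch of the underlying arithmetic (not eli5's actual rendering code; the tokens and color thresholds below are invented for illustration), a logprob converts back to a probability with ``exp``:

```python
import math

# Hypothetical (token, logprob) pairs, shaped like the entries of
# ChoiceLogprobs.content returned when logprobs=True is requested.
token_logprobs = [("The", -0.01), ("answer", -0.36), ("is", -0.05), ("42", -2.3)]

for token, logprob in token_logprobs:
    prob = math.exp(logprob)  # logprob -> probability in [0, 1]
    # High-probability tokens shade green, low-probability ones red.
    shade = "green" if prob >= 0.8 else ("yellow" if prob >= 0.5 else "red")
    print(f"{token!r}: p={prob:.2f} -> {shade}")
```

Under this scheme a confidently generated token like ``"The"`` (p ≈ 0.99) shades green, while ``"42"`` (p ≈ 0.10) shades red, matching the screenshot above in spirit.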

docs/source/overview.rst

Lines changed: 2 additions & 0 deletions

@@ -50,6 +50,8 @@ following machine learning frameworks and packages:
 * :ref:`library-sklearn-crfsuite`. ELI5 allows to check weights of
   sklearn_crfsuite.CRF models.

+* :ref:`library-openai`. ELI5 allows to explain LLM predictions with token probabilities.
+
 ELI5 also implements several algorithms for inspecting black-box models
 (see :ref:`eli5-black-box`):
(new binary image file, 55.4 KB; preview not rendered)
Lines changed: 9 additions & 0 deletions
(new file; shown without diff markers)

.. _explain-llm-logprobs-tutorial:

.. note::

    This tutorial can be run as an IPython notebook_.

.. _notebook: https://github.com/eli5-org/eli5/blob/master/notebooks/explain_llm_logprobs.ipynb

.. include:: ../_notebooks/explain_llm_logprobs.rst

docs/update-notebooks.sh

Lines changed: 14 additions & 1 deletion
@@ -78,4 +78,17 @@ rm -r source/_notebooks/keras-image-classifiers_files
 mv ../notebooks/keras-image-classifiers_files/ \
     source/_notebooks/
 sed -i 's&.. image:: keras-image-classifiers_files/&.. image:: ../_notebooks/keras-image-classifiers_files/&g' \
-    source/_notebooks/keras-image-classifiers.rst
+    source/_notebooks/keras-image-classifiers.rst
+
+
+# LLM logprobs explain prediction tutorial
+jupyter nbconvert \
+    --to rst \
+    --stdout \
+    '../notebooks/explain_llm_logprobs.ipynb' \
+    > source/_notebooks/explain_llm_logprobs.rst
+
+sed -i '' 's/``eli5.explain_prediction``/:func:`eli5.explain_prediction`/g' \
+    source/_notebooks/explain_llm_logprobs.rst
+sed -i '' 's/\/docs\/source//g' \
+    source/_notebooks/explain_llm_logprobs.rst

eli5/__init__.py

Lines changed: 9 additions & 0 deletions
@@ -93,3 +93,12 @@
 except ImportError:
     # keras is not available
     pass
+
+try:
+    from .llm.explain_prediction import (
+        explain_prediction_openai_logprobs,
+        explain_prediction_openai_client
+    )
+except ImportError:
+    # openai not available
+    pass
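The hunk above follows the same guard pattern already used for keras: import the optional names and swallow ``ImportError``, so the package still imports without the extra dependency. A standalone sketch of the idiom (the module name is deliberately fake so the ``except`` branch runs):

```python
# Optional-dependency guard, as in eli5/__init__.py: if the extra
# package is missing, the related names are simply not exported.
try:
    import nonexistent_llm_backend  # hypothetical missing dependency
    HAS_BACKEND = True
except ImportError:
    HAS_BACKEND = False  # the rest of the package still imports cleanly

print(HAS_BACKEND)  # -> False here, since the module does not exist
```

In eli5 itself, a successful import additionally makes `explain_prediction_openai_logprobs` and `explain_prediction_openai_client` available at the top level.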

eli5/base.py

Lines changed: 7 additions & 3 deletions
@@ -1,4 +1,4 @@
-from typing import Union, Optional
+from typing import Union, Optional, Sequence

 import numpy as np

@@ -135,7 +135,7 @@ def __init__(self,
 WeightedSpan = tuple[
     Feature,
     list[tuple[int, int]],  # list of spans (start, end) for this feature
-    float,  # feature weight
+    float,  # feature weight or probability
 ]

@@ -147,16 +147,20 @@ class DocWeightedSpans:
     :document:). :preserve_density: determines how features are colored
     when doing formatting - it is better set to True for char features
     and to False for word features.
+    :with_probabilities: would interpret weights as probabilities from 0 to 1,
+    using a more suitable color scheme.
     """
     def __init__(self,
                  document: str,
-                 spans: list[WeightedSpan],
+                 spans: Sequence[WeightedSpan],
                  preserve_density: Optional[bool] = None,
+                 with_probabilities: Optional[bool] = None,
                  vec_name: Optional[str] = None,
                  ):
         self.document = document
         self.spans = spans
         self.preserve_density = preserve_density
+        self.with_probabilities = with_probabilities
         self.vec_name = vec_name
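To make the `WeightedSpan` layout concrete: each entry pairs a feature with character ranges into the document plus a weight, which `with_probabilities` reinterprets as a probability in [0, 1]. A minimal sketch with plain tuples (the document and values are invented; eli5's real classes are not used):

```python
document = "The answer is 42"

# (feature, [(start, end), ...], weight) -- the WeightedSpan shape;
# with with_probabilities=True the weight reads as a probability.
spans = [
    ("The",    [(0, 3)],   0.99),
    ("answer", [(4, 10)],  0.70),
    ("42",     [(14, 16)], 0.10),
]

for feature, ranges, prob in spans:
    for start, end in ranges:
        # Each (start, end) range slices the feature's text out of the document.
        assert document[start:end] == feature
        print(f"{feature!r} covers [{start}:{end}] with p={prob}")
```

Accepting `Sequence[WeightedSpan]` instead of `list[WeightedSpan]`, as the hunk above does, lets callers pass tuples or other immutable sequences without copying.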
