Handles binary data from gnupg incorrectly (crashes with UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 0: invalid continuation byte) #267

@thesamesam

Describe the bug

We have a small script for our openpgp-keys-gentoo-developers package which uses python-gnupg to validate a downloaded keyring.

For some users (not me, unclear why), it fails and then hangs with the following:

gpg: Total number processed: 98
gpg:               imported: 98
gpg: no ultimately trusted keys found
 * gpg --no-autostart --no-default-keyring --homedir /var/tmp/portage/sec-keys/openpgp-keys-gentoo-developers-20250915/temp/.gnupg --export --armor
 * python3.13 /var/tmp/portage/sec-keys/openpgp-keys-gentoo-developers-20250915/files/keyring-mangler.py /usr/share/openpgp-keys/gentoo-auth.asc /var/tmp/portage/sec-keys/openpgp-keys-gentoo-developers-20250915/work/gentoo-developers.asc /var/tmp/portage/sec-keys/openpgp-keys-gentoo-developers-20250915/work/gentoo-developers-sanitised.asc
Exception in thread Thread-9 (_read_response):
Traceback (most recent call last):
  File "/usr/lib/python3.13/threading.py", line 1043, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/usr/lib/python3.13/threading.py", line 994, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/gnupg.py", line 1241, in _read_response
    line = stream.readline()
  File "<frozen codecs>", line 564, in readline
  File "<frozen codecs>", line 510, in read
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 0: invalid continuation byte

I suspect this is the same as some other issues we've seen (see "Additional information" below), where consumers assume that GnuPG always emits valid UTF-8. For some fields, like NOTATION*, GnuPG doesn't guarantee that, so the output should be treated as opaque binary data.
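To illustrate the failure mode, here is a minimal sketch using only the standard library, not python-gnupg's actual code. The traceback shows the error coming from a `codecs` stream reader's `readline()`, so the sketch reads a made-up line of output through the same kind of reader. The `raw` bytes and the `read_lines` helper are hypothetical and only serve the illustration. A strict UTF-8 reader raises on the stray 0xf3 byte, while a reader created with `errors="backslashreplace"` keeps going and preserves the byte as an escape.

```python
import codecs
import io

# Hypothetical line of gpg output: 0xf3 starts a multi-byte UTF-8 sequence,
# but the byte after it is not a continuation byte, so it is invalid UTF-8.
raw = b"[GNUPG:] NOTATION_DATA \xf3abc\n"


def read_lines(data: bytes, errors: str) -> list[str]:
    """Read all lines from `data` through a UTF-8 codecs stream reader."""
    reader = codecs.getreader("utf-8")(io.BytesIO(data), errors=errors)
    lines = []
    while True:
        line = reader.readline()
        if not line:
            break
        lines.append(line)
    return lines


# Strict decoding fails the same way as in the traceback.
try:
    read_lines(raw, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# A tolerant error handler keeps the reader alive and escapes the bad byte.
tolerant = read_lines(raw, "backslashreplace")

print(strict_failed)
print(tolerant)
```

Whether an error handler like this or reading raw bytes is the better fit for the library is a design question I'm not trying to settle here. The sketch only shows that the input, not the reader thread, is what's unexpected.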

The script itself is:

#!/usr/bin/env python
#
# Distributed under the terms of the GNU General Public License v2
#
# Takes as input:
# 1. authority keys as gentoo-auth.asc (sec-keys/openpgp-keys-gentoo-auth);
# 2. a downloaded, unverified bundle of active developers from qa-reports.gentoo.org (https://qa-reports.gentoo.org/output/active-devs.gpg);
#    but this must be converted to ASCII first(!), and
# 3. an output file (armored).
#
# Outputs armored keyring with all expired keys dropped and all keys without
# a signature from the L2 developer key dropped.
#
# Usage: python keyring-mangler.py <gentoo-auth.asc> <active-devs.gpg> <output for armored keys.asc>
from xml.dom import ValidationErr

import gnupg
import os
import sys

AUTHORITY_KEYS = [
    # Gentoo Authority Key L1
    "ABD00913019D6354BA1D9A132839FE0D796198B1",
    # Gentoo Authority Key L2 for Services
    "18F703D702B1B9591373148C55D3238EC050396E",
    # Gentoo Authority Key L2 for Developers
    "2C13823B8237310FA213034930D132FF0FF50EEB",
]

L2_DEVELOPER_KEY = "30D132FF0FF50EEB"

# logging.basicConfig(level=os.environ.get("LOGLEVEL", "DEBUG"))

gentoo_auth = sys.argv[1]
active_devs = sys.argv[2]
armored_output = sys.argv[3]

gpg = gnupg.GPG(verbose=False, gnupghome=os.environ["GNUPGHOME"], options="--auto-check-trustdb")
gpg.encoding = "utf-8"

with open(gentoo_auth, "r", encoding="utf8") as keyring:
    keys = keyring.read()
    gpg.import_keys(keys)

gpg.trust_keys([AUTHORITY_KEYS[0]], "TRUST_ULTIMATE")

with open(active_devs, "r", encoding="utf8") as keyring:
    keys = keyring.read()
    gpg.import_keys(keys)

# print(keys)
# print(gpg.list_keys)

good_keys = []

# TODO: Use new 'problems' key from python-gnupg-0.5.1?
for key in gpg.list_keys(sigs=True):
    print(f"Checking key={key['keyid']}, uids={key['uids']}")

    # pprint.pprint(key)

    if key["fingerprint"] in AUTHORITY_KEYS:
        # Just add this in.
        good_keys.append(key["fingerprint"])
        continue

    # https://security.stackexchange.com/questions/41208/what-is-the-exact-meaning-of-this-gpg-output-regarding-trust
    if key["trust"] == "e":
        # If it's expired, drop the key, as we can't easily then
        # verify it because of gpg limitations.
        print(
            f"Dropping expired {key['keyid']=}, {key['uids']=} (because this prevents validation)"
        )
        continue

    if key["trust"] == "-":
        print(f"Dropping {key['keyid']=}, {key['uids']=} because no trust calculated")
        continue

    if key["trust"] != "f":
        print(
            f"Dropping {key['keyid']=}, {key['uids']=} because not calculated as fully trusted"
        )
        continue

    # As a sanity check, make sure each key has a signature from
    # the L2 developer signing key.
    got_l2_signature = any(sig[0] == "30D132FF0FF50EEB" for sig in key["sigs"])
    if not got_l2_signature:
        raise ValidationErr(f"{key['uids']=} lacks a signature from L2 key!")

    good_keys.append(key["fingerprint"])

if len(good_keys) <= len(AUTHORITY_KEYS):
    raise RuntimeError("No valid developer keys were found!")

with open(armored_output, "w", encoding="utf8") as keyring:
    keyring.write(gpg.export_keys(good_keys))

and it operates on https://qa-reports.gentoo.org/output/keys/active-devs-20250915.gpg and https://dev.gentoo.org/~mgorny/dist/openpgp-keys/gentoo-auth.asc.20240703.gz.

To Reproduce

I haven't managed to reproduce it myself yet.

Expected behavior

Successfully decoding gpg output.

Screenshots

N/A

Environment

  • OS, including version: Gentoo Linux
  • Version of this library: 0.5.5
  • Version of GnuPG: 2.5.13

Additional information

This came up in Gentoo in one of our packages that uses python-gnupg at https://bugs.gentoo.org/965447 (where regular vanilla GnuPG is in use).

It also came up in another context, outside of python-gnupg, and with Sequoia instead of GnuPG, at gentoo/portage#1495 where we made a similar error in Portage.

I have also filed https://dev.gnupg.org/T7896 with GnuPG asking them to always emit valid UTF-8 to avoid this problem.

Labels: invalid (This doesn't seem right), pending (Issue will be closed if feedback not received soon.)