False group in sc.get.aggregate when obs contains NaNs #3903

@quentinblampey

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

Hi!

Thanks for implementing sc.get.aggregate, it's very helpful.

I ran into an issue when some grouping values are NaN. For example, in the MRE below, some cell types are unknown:

import numpy as np
import pandas as pd
from anndata import AnnData
import scanpy as sc

# Toy 6 x 6 matrix (6 cells, 6 genes)
X = np.arange(6) + np.arange(6)[:, None]

obs = pd.DataFrame(index=[f"cell{i}" for i in range(X.shape[0])])
# The first two cells have an unknown (NaN) cell type
obs["cell_type"] = [np.nan, np.nan, "B", "C", "B", "B"]
obs["sample_id"] = ["s1", "s1", "s1", "s2", "s2", "s2"]
obs["patient_type"] = ["responder", "responder", "responder", "control", "control", "control"]

adata = AnnData(X, obs=obs)

# Aggregate by three obs columns, one of which contains NaNs
adata_agg = sc.get.aggregate(adata, by=["sample_id", "patient_type", "cell_type"], func="sum", layer=None)

It fails silently: adata_agg.obs contains an s1_control_C entry, which shouldn't exist. In the provided example, sample s1 is not even a control, and it has no cells of type C.

More generally, sc.get.aggregate does not seem to handle NaNs. It's relatively easy to work around upstream, but since the failure is silent, I think it can be dangerous for users who are unaware of the issue and proceed with their analysis.
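For reference, the upstream workaround could look roughly like the sketch below. It only uses pandas and builds a boolean mask of the cells whose grouping columns are all non-missing. The helper name `complete_group_mask` is hypothetical and not part of scanpy.

```python
import numpy as np
import pandas as pd


def complete_group_mask(obs: pd.DataFrame, by: list[str]) -> pd.Series:
    """Return True for rows where every grouping column is non-missing."""
    return obs[by].notna().all(axis=1)


# Same obs table as in the example above
obs = pd.DataFrame(index=[f"cell{i}" for i in range(6)])
obs["cell_type"] = [np.nan, np.nan, "B", "C", "B", "B"]
obs["sample_id"] = ["s1", "s1", "s1", "s2", "s2", "s2"]
obs["patient_type"] = ["responder"] * 3 + ["control"] * 3

by = ["sample_id", "patient_type", "cell_type"]
mask = complete_group_mask(obs, by)

# Only cells with a known value in every grouping column remain
kept = obs.loc[mask]
print(kept.index.tolist())
```

With the AnnData object from the example, the same mask could then be used to subset before aggregating, e.g. `adata[mask.to_numpy()].copy()`, so that sc.get.aggregate never sees the NaN groups.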

We could either:

  1. Raise a clear error if the grouping columns contain NaNs
  2. Handle NaNs (they should not appear in the output)
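To make option 1 concrete, here is a minimal sketch of what such a check could look like, assuming it runs on the obs table before any grouping happens. The function name `check_no_missing_groups` and the error message are hypothetical; this is not how scanpy currently behaves.

```python
import numpy as np
import pandas as pd


def check_no_missing_groups(obs: pd.DataFrame, by: list[str]) -> None:
    """Raise a ValueError naming each grouping column that contains NaNs."""
    n_missing = obs[by].isna().sum()
    bad = n_missing[n_missing > 0]
    if not bad.empty:
        details = ", ".join(f"{col!r} ({n} missing)" for col, n in bad.items())
        raise ValueError(
            f"Grouping columns contain missing values: {details}. "
            "Drop or fill them before aggregating."
        )


# Same obs table as in the example above
obs = pd.DataFrame(index=[f"cell{i}" for i in range(6)])
obs["cell_type"] = [np.nan, np.nan, "B", "C", "B", "B"]
obs["sample_id"] = ["s1", "s1", "s1", "s2", "s2", "s2"]
obs["patient_type"] = ["responder"] * 3 + ["control"] * 3

by = ["sample_id", "patient_type", "cell_type"]

try:
    check_no_missing_groups(obs, by)
except ValueError as err:
    # The message names 'cell_type' as the offending column
    print(err)
```

For option 2, a possible reference point is pandas itself: `DataFrame.groupby` drops rows with NaN keys by default (`dropna=True`), so those groups never appear in the output.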

I might work on this if nobody else takes it, but not for a few weeks.

Versions

Package                    Version       Editable project location
-------------------------- ------------- -----------------------------
anndata                    0.12.6
annotated-doc              0.0.4
annotated-types            0.7.0
anyio                      4.11.0
appnope                    0.1.4
array-api-compat           1.12.0
asttokens                  3.0.1
attrs                      25.4.0
babel                      2.17.0
backrefs                   6.1
beautifulsoup4             4.14.2
bleach                     6.3.0
certifi                    2025.11.12
cfgv                       3.4.0
charset-normalizer         3.4.4
click                      8.3.1
colorama                   0.4.6
comm                       0.2.3
contourpy                  1.3.3
coverage                   7.12.0
crc32c                     2.8
cycler                     0.12.1
debugpy                    1.8.17
decorator                  5.2.1
defusedxml                 0.7.1
distlib                    0.4.0
donfig                     0.8.1.post1
executing                  2.2.1
fastapi                    0.121.2
fastjsonschema             2.21.2
filelock                   3.20.0
fonttools                  4.60.1
formulaic                  1.2.1
formulaic-contrasts        1.0.0
ghp-import                 2.1.0
griffe                     1.15.0
gseapy                     1.1.11
h11                        0.16.0
h5py                       3.15.1
hatch                      1.15.1
hatchling                  1.27.0
httpcore                   1.0.9
httpx                      0.28.1
hyperlink                  21.0.0
identify                   2.6.15
idna                       3.11
igraph                     1.0.0
iniconfig                  2.3.0
interface-meta             1.3.0
ipykernel                  6.31.0
ipython                    9.7.0
ipython-pygments-lexers    1.1.1
ipywidgets                 8.1.8
jaraco-classes             3.4.0
jaraco-context             6.0.1
jaraco-functools           4.3.0
jedi                       0.19.2
jinja2                     3.1.6
joblib                     1.5.2
jsonschema                 4.25.1
jsonschema-specifications  2025.9.1
jupyter-client             8.6.3
jupyter-core               5.9.1
jupyterlab-pygments        0.3.0
jupyterlab-widgets         3.0.16
jupytext                   1.18.1
keyring                    25.7.0
kiwisolver                 1.4.9
legacy-api-wrap            1.5
llvmlite                   0.45.1
lxml                       6.0.2
markdown                   3.10
markdown-it-py             4.0.0
markupsafe                 3.0.3
matplotlib                 3.10.7
matplotlib-inline          0.2.1
mdit-py-plugins            0.5.0
mdurl                      0.1.2
mergedeep                  1.3.4
mistune                    3.1.4
mkdocs                     1.6.1
mkdocs-autorefs            1.4.3
mkdocs-get-deps            0.2.0
mkdocs-jupyter             0.25.1
mkdocs-material            9.7.0
mkdocs-material-extensions 1.3.1
mkdocstrings               0.30.1
mkdocstrings-python        1.19.0
mofapy2                    0.7.2
mofax                      0.3.7
more-itertools             10.8.0
mudata                     0.3.2
muon                       0.1.7
mypy                       1.18.2
mypy-extensions            1.1.0
narwhals                   2.12.0
natsort                    8.4.0
nbclient                   0.10.2
nbconvert                  7.16.6
nbformat                   5.10.4
nest-asyncio               1.6.0
networkx                   3.5
nodeenv                    1.9.1
numba                      0.62.1
numcodecs                  0.16.3
numpy                      2.3.5
packaging                  25.0
paginate                   0.5.7
pandas                     2.3.3
pandocfilters              1.5.1
parso                      0.8.5
pastel                     0.2.1
pathspec                   0.12.1
patsy                      1.0.2
pexpect                    4.9.0
pillow                     12.0.0
platformdirs               4.5.0
plotly                     6.5.0
pluggy                     1.6.0
poethepoet                 0.37.0
pre-commit                 4.4.0
prompt-toolkit             3.0.52
protobuf                   6.33.1
psutil                     7.1.3
psycopg2-binary            2.9.11
ptyprocess                 0.7.0
pure-eval                  0.2.3
pyarrow                    22.0.0
pydantic                   2.12.4
pydantic-core              2.41.5
pydeseq2                   0.5.3
pygments                   2.19.2
pymdown-extensions         10.17.1
pynndescent                0.5.13
pyparsing                  3.2.5
pytest                     9.0.1
pytest-cov                 7.0.0
python-dateutil            2.9.0.post0
python-dotenv              1.2.1
python-igraph              1.0.0
pytz                       2025.2
pyyaml                     6.0.3
pyyaml-env-tag             1.1
pyzmq                      27.1.0
referencing                0.37.0
requests                   2.32.5
rich                       14.2.0
rpds-py                    0.29.0
ruff                       0.14.5
scanpy                     1.11.5
scikit-learn               1.7.2
scipy                      1.16.3
scrnaseq-analysis          0.1.0
seaborn                    0.13.2
session-info               1.0.1
session-info2              0.2.3
shellingham                1.5.4
six                        1.17.0
sniffio                    1.3.1
soupsieve                  2.8
sqlalchemy                 2.0.44
stack-data                 0.6.3
starlette                  0.49.3
statsmodels                0.14.5
stdlib-list                0.12.0
texttable                  1.7.0
threadpoolctl              3.6.0
tinycss2                   1.4.0
tomli-w                    1.2.0
tomlkit                    0.13.3
tornado                    6.5.2
tqdm                       4.67.1
traitlets                  5.14.3
trove-classifiers          2025.11.14.15
typing-extensions          4.15.0
typing-inspection          0.4.2
tzdata                     2025.2
umap-learn                 0.5.9.post2
upsetplot                  0.9.0
urllib3                    2.5.0
userpath                   1.9.2
uv                         0.9.10
uvicorn                    0.38.0
virtualenv                 20.35.4
watchdog                   6.0.0
wcwidth                    0.2.14
webencodings               0.5.1
widgetsnbextension         4.0.15
wrapt                      2.0.1
zarr                       3.1.3
zstandard                  0.25.0


Labels: Triage 🩺 (This issue needs to be triaged by a maintainer)
