Skip to content
Draft
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 28 additions & 1 deletion Python/specialize.c
Original file line number Diff line number Diff line change
Expand Up @@ -355,6 +355,28 @@ static int function_kind(PyCodeObject *code);
static bool function_check_args(PyObject *o, int expected_argcount, int opcode);
static uint32_t function_get_version(PyObject *o, int opcode);

#ifdef Py_GIL_DISABLED
static void
maybe_enable_deferred_ref_count(PyObject *dict, PyObject *name)
{
PyObject *op;
if (PyDict_GetItemRef(dict, name, &op) != 1) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's combine this lookup with the earlier _PyDict_LookupIndex. _PyDict_LookupIndex already gets the value internally and throws it away.

if (_Py_IsOwnedByCurrentThread(op) ||
!PyType_IS_GC(Py_TYPE(op)) ||
_PyObject_HasDeferredRefcount(op)) {
Py_DECREF(op);
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The !PyType_IS_GC and _PyObject_HasDeferredRefcount checks aren't really necessary because PyUnstable_Object_EnableDeferredRefcount handles those cases internally.

if (PyFrozenSet_Check(op) || PyTuple_Check(op) || PyType_Check(op)) {
PyUnstable_Object_EnableDeferredRefcount(op);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do this for all objects, not just a few types. Otherwise, I think we will keep run into scaling bottlenecks with global variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... That's going to push a lot of objects to the GC for deallocation in programs that do a lot of work at the global level (but do have a few functions that are called enough that use those globals.) I'm thinking of naively written "scripts". Then again, it would only do that for objects that participate in GC (i.e. not strings or integers), so it's probably fine. I mean, nobody's too surprised when things aren't deallocated quite as quickly as they expect... right?

Yeah, I think this should be fine for main. I definitely wouldn't backport it to 3.14.

Copy link
Contributor

@colesbury colesbury Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree regarding not backporting to 3.14.

I think the above check that excludes owned by the current thread should help with most scripts -- it'll only affect global variables accessed by multiple threads.

Another thing that may help is that I think the PyUnstable_Object_EnableDeferredRefcount call only happens at specialization time, not every time a variable is updated or accessed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm a bit worried about increased memory usage. Since it's only for objects shared between threads and only for hot LOAD_GLOBAL/LOAD_ATTR instructions, maybe it's okay. Perhaps we should consider changing gc_should_collect() to check not only the young object count but also the process memory size increase as well. The condition with count < gcstate->long_lived_total / 4 could be done with an "or" condition with the process size check.

Py_DECREF(op);
}
#endif


static int
specialize_module_load_attr_lock_held(PyDictObject *dict, _Py_CODEUNIT *instr, PyObject *name)
{
Expand Down Expand Up @@ -384,6 +406,9 @@ specialize_module_load_attr_lock_held(PyDictObject *dict, _Py_CODEUNIT *instr, P
SPECIALIZATION_FAIL(LOAD_ATTR, SPEC_FAIL_OUT_OF_VERSIONS);
return -1;
}
#ifdef Py_GIL_DISABLED
maybe_enable_deferred_ref_count((PyObject *)dict, name);
#endif
write_u32(cache->version, keys_version);
cache->index = (uint16_t)index;
specialize(instr, LOAD_ATTR_MODULE);
Expand Down Expand Up @@ -1264,7 +1289,6 @@ specialize_attr_loadclassattr(PyObject *owner, _Py_CODEUNIT *instr,
return 1;
}


static void
specialize_load_global_lock_held(
PyObject *globals, PyObject *builtins,
Expand Down Expand Up @@ -1305,6 +1329,9 @@ specialize_load_global_lock_held(
SPECIALIZATION_FAIL(LOAD_GLOBAL, SPEC_FAIL_OUT_OF_RANGE);
goto fail;
}
#ifdef Py_GIL_DISABLED
maybe_enable_deferred_ref_count(globals, name);
#endif
cache->index = (uint16_t)index;
cache->module_keys_version = (uint16_t)keys_version;
specialize(instr, LOAD_GLOBAL_MODULE);
Expand Down
Loading