Add support for creating and restoring-from checkpoints #740

@eyalroz

With CUDA 12.8, NVIDIA introduced a checkpoint mechanism:

https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CHECKPOINT.html

  • cuCheckpointProcessCheckpoint ( int pid, CUcheckpointCheckpointArgs* args )
  • cuCheckpointProcessGetRestoreThreadId ( int pid, int* tid )
  • cuCheckpointProcessGetState ( int pid, CUprocessState* state )
  • cuCheckpointProcessLock ( int pid, CUcheckpointLockArgs* args )
  • cuCheckpointProcessRestore ( int pid, CUcheckpointRestoreArgs* args )
  • cuCheckpointProcessUnlock ( int pid, CUcheckpointUnlockArgs* args )

Let's support that.
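
As I read the driver documentation, the calls above move a process through a small set of states: lock requires a running process, checkpoint requires a locked one, restore requires a checkpointed one and leaves it locked, and unlock returns it to running. A wrapper could enforce that ordering up front. Below is a minimal sketch of that idea in plain C++; it does not include `cuda.h` or call the driver, and `process_state`, `checkpointable_process` and its member names are hypothetical, not existing wrapper or driver API.

```cpp
#include <stdexcept>
#include <string>

// Mirrors the four documented CUprocessState values
// (running, locked, checkpointed, failed). Hypothetical name.
enum class process_state { running, locked, checkpointed, failed };

// Hypothetical stand-in for a target process. A real wrapper would hold the
// pid and query cuCheckpointProcessGetState() rather than cache the state.
class checkpointable_process {
public:
    explicit checkpointable_process(int pid) : pid_(pid) {}

    int pid() const { return pid_; }
    process_state state() const { return state_; }

    // Each operation names the state it requires and the state it leaves.
    void lock()       { transition(process_state::running,      process_state::locked,       "lock"); }
    void checkpoint() { transition(process_state::locked,       process_state::checkpointed, "checkpoint"); }
    void restore()    { transition(process_state::checkpointed, process_state::locked,       "restore"); }
    void unlock()     { transition(process_state::locked,       process_state::running,      "unlock"); }

private:
    void transition(process_state required, process_state next, const char* op) {
        if (state_ != required) {
            throw std::logic_error(std::string("cannot ") + op + " in the current process state");
        }
        // Assumption: a real implementation would call the matching
        // cuCheckpointProcess* function here and check its CUresult.
        state_ = next;
    }

    int pid_;
    process_state state_ = process_state::running;
};
```

With this shape, a full suspend/resume cycle reads `p.lock(); p.checkpoint(); p.restore(); p.unlock();`, and calling `checkpoint()` on a process that was never locked throws instead of reaching the driver.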
