Tested on Rocky Linux 9, CGroup2, Slurm 23.02.8.
CGroup1 works too, but prefer CGroup2, which seems to handle the processes more cleanly (or at least the hierarchy looks nicer).
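If you are not sure which CGroup version a compute node is running, one quick check is the filesystem type of /sys/fs/cgroup:
$ stat -fc %T /sys/fs/cgroup/
cgroup2fs
Here "cgroup2fs" means CGroup2; a "tmpfs" result usually means CGroup1.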
of the same version (major and minor should be the same) as the one installed on your HPC cluster.
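To confirm which version is installed on the cluster (the output below is just an example):
$ sinfo -V
slurm 23.02.8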
You also need to know the paths used when the package was built, e.g. the --prefix=xxx. You can open an installed Slurm library, e.g. "vi libslurm.so.39", and search for "slurm.conf" to see the compiled-in config path.
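A non-interactive way to do the same, assuming the library sits under /usr/lib64 (adjust to wherever your Slurm libraries actually live); the example output below would point to --sysconfdir=/opt/etc/slurm:
$ strings /usr/lib64/libslurm.so.39 | grep slurm.conf
/opt/etc/slurm/slurm.conf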
If you have this info, configure with the same paths:
>./configure --prefix=/usr/local --sysconfdir=/opt/etc/slurm --libdir=/opt/lib64
The most important one is "--prefix=". The other paths can usually be deduced from it, or you can set them explicitly.
>cd contribs/pam_slurm_adopt
>gmake
You do not need to run "make install", as the package is already installed and you do not want to mess with it. The compiled "pam_slurm_adopt.so" is simply under "contribs/pam_slurm_adopt/.libs".
Copy it to the "/lib64/security/" folder on the compute nodes.
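For example, for one node (node005, as used later); loop over all your compute nodes or use your favorite parallel-ssh tool:
$ scp contribs/pam_slurm_adopt/.libs/pam_slurm_adopt.so root@node005:/lib64/security/
$ ssh root@node005 chmod 755 /lib64/security/pam_slurm_adopt.so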
#/etc/pam.d/sshd
#Add this as the last line
account required pam_slurm_adopt.so log_level=debug5
#Comment out the "pam_systemd.so" line in all of the PAM files, e.g.:
/etc/pam.d/password-auth:#-session optional pam_systemd.so
/etc/pam.d/runuser-l:#-session optional pam_systemd.so
/etc/pam.d/system-auth:#-session optional pam_systemd.so
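If you prefer a one-liner (the whitespace in your PAM files may differ, so verify with grep afterwards):
$ sed -i 's/^-session[[:space:]]\+optional[[:space:]]\+pam_systemd\.so/#&/' /etc/pam.d/password-auth /etc/pam.d/system-auth /etc/pam.d/runuser-l
$ grep pam_systemd.so /etc/pam.d/*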
If you cannot find or reproduce the paths like "--prefix=", "--sysconfdir", or "--libdir", you can instead modify the source file "pam_slurm_adopt.c" to hard-code the config path.
>cd contribs/pam_slurm_adopt
> vi pam_slurm_adopt.c
#change line 843: hard-code your slurm.conf path instead of passing NULL
slurm_conf_init("/opt/etc/myslurm/myslurm.conf"); // was: slurm_conf_init(NULL);
>gmake
Then copy the rebuilt pam_slurm_adopt.so to the /lib64/security folder as before.
$ ssh node005
Access denied by pam_slurm_adopt: you have no active jobs on this node
Login not allowed: no running jobs and no WLM allocations
Connection closed by 192.168.0.205 port 22
If you have a job running on the same node, you should be able to log in via SSH. All processes spawned by this new SSH session on that node will be adopted into and controlled by your Slurm job's CGroup, sharing the same resource limits (GPU devices, RAM, etc.). Once the job finishes, that SSH session is terminated immediately.
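For example, using node005 as above:
$ srun -w node005 --pty bash     # terminal 1: get a job on node005 and keep it running
$ ssh node005                    # terminal 2: from the login node, the SSH login is now accepted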
To check whether it works as expected, you can run this command on the compute node:
$ systemd-cgls
And you can check whether the extra SSH session's processes have been adopted into the proper Slurm job:
......
├─slurmstepd.scope … (#5289)
│ → user.invocation_id: 29f6242c7a2c4590ae1da8976195058b
│ → user.delegate: 1
│ ├─job_1651 (#2688323)
│ │ ├─step_0 (#2688543)
│ │ │ ├─slurm (#2688631)
│ │ │ │ └─2739202 slurmstepd: [1651.0]
│ │ │ └─user (#2688587)
│ │ │   └─task_0 (#2688719)
│ │ │     ├─2739212 /bin/bash
│ │ │     └─2739337 top <------------- my srun session
│ │ └─step_extern (#2688367)
│ │   ├─slurm (#2688455)
│ │   │ └─2739195 slurmstepd: [1651.extern]
│ │   └─user (#2688411)
│ │     └─task_special (#2688499)
│ │       ├─2739199 sleep 100000000
│ │       ├─2739273 sshd: feng [priv] <---- my SSH session
│ │       ├─2739278 sshd: feng@pts/1
│ │       ├─2739279 -bash
│ │       ├─2739338 systemd-cgls
.......
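Another quick check, run from inside the adopted SSH session itself; the exact prefix depends on your systemd layout, but the path should end in the job's extern step:
$ cat /proc/self/cgroup
0::/system.slice/slurmstepd.scope/job_1651/step_extern/user/task_special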