The `LSFClusterManager.jl` package implements code for the LSF (Load Sharing Facility) compute cluster job queue system.

`LSFManager` supports IBM's scheduler. See the `addprocs_lsf` docstring
for more information.

Implemented in this package (the `LSFClusterManager.jl` package):

| Job queue system | Command to add processors |
| ---------------- | ------------------------- |
| Load Sharing Facility (LSF) |`addprocs_lsf(np::Integer; bsub_flags=``, ssh_cmd=``)` or `addprocs(LSFManager(np, bsub_flags, ssh_cmd, retry_delays, throttle))`|

Implemented in external packages:

| Job queue system | Command to add processors |
| ---------------- | ------------------------- |
| Kubernetes (K8s) via [K8sClusterManagers.jl](https://github.com/beacon-biosignals/K8sClusterManagers.jl)|`addprocs(K8sClusterManagers(np; kwargs...))`|
| Azure scale-sets via [AzManagers.jl](https://github.com/ChevronETC/AzManagers.jl)|`addprocs(vmtemplate, n; kwargs...)`|

You can also write your own custom cluster manager; see the instructions in the [Julia manual](https://docs.julialang.org/en/v1/manual/distributed-computing/#ClusterManagers).
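
To give a sense of what writing one involves, here is a minimal sketch of a custom manager that simply launches workers on the local machine. The `MinimalLocalManager` name is hypothetical, and a real queue-backed manager would submit jobs (e.g. via `bsub`) rather than `open`ing processes directly; only the `launch`/`manage` interface itself comes from the Julia manual.

```julia
using Distributed

# Hypothetical manager: starts `np` worker processes on the local machine.
struct MinimalLocalManager <: Distributed.ClusterManager
    np::Int
end

function Distributed.launch(manager::MinimalLocalManager, params::Dict,
                            launched::Array, c::Condition)
    for _ in 1:manager.np
        # Each worker prints its address on stdout; Distributed reads it
        # from `wconfig.io` to establish the connection.
        cmd = `$(params[:exename]) $(params[:exeflags]) --worker=$(Distributed.cluster_cookie())`
        wconfig = WorkerConfig()
        wconfig.io = open(cmd, "r")
        push!(launched, wconfig)
        notify(c)
    end
end

# Lifecycle hook (`op` is :register, :interrupt, :deregister, or :finalize);
# nothing extra is needed for this sketch.
Distributed.manage(::MinimalLocalManager, id::Integer,
                   config::WorkerConfig, op::Symbol) = nothing

# Usage: addprocs(MinimalLocalManager(2))
```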
### LSF: a simple interactive example

```julia
julia> using Distributed, LSFClusterManager

julia> addprocs_lsf(2)

julia> @everywhere run(`hostname`)

julia> From worker 2: compute-6
       From worker 3: compute-6
```
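
When the session is finished, workers can be released with the standard `Distributed` API; each backing LSF job should complete once its worker process exits. A minimal sketch:

```julia
# Remove all workers added above; the LSF jobs backing them finish
# when the worker processes exit.
rmprocs(workers())
```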
Some clusters require the user to specify a list of required resources.
For example, it may be necessary to specify how much memory will be needed by the job - see this [issue](https://github.com/JuliaLang/julia/issues/10390).
The keyword `bsub_flags` can be used to specify these and other options, as in the sketch below.
Additionally, the keyword `wd` can be used to specify the working directory (which defaults to `ENV["HOME"]`).
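
For instance, a per-worker memory requirement might be passed through `bsub_flags`. The following is a sketch: the `-R "rusage[mem=4096]"` resource string and its units are site-dependent and shown only for illustration.

```julia
using Distributed, LSFClusterManager

# Hypothetical resource request: ask LSF for 4096 MB of memory per worker.
# Adjust the rusage string to match your site's LSF configuration.
addprocs_lsf(2; bsub_flags=`-R "rusage[mem=4096]"`)
```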