Commit 09ca115

Nomad: recommendations for singleton deployments
Many users have a requirement to run exactly one instance of a given allocation because it requires exclusive access to some cluster-wide resource, which we'll refer to here as a "singleton allocation". This is challenging to implement, so this document is intended to describe an accepted design to publish as a how-to/tutorial.
1 parent 0300018 commit 09ca115

File tree

  • content/nomad/v1.11.x/content/docs/job-declare/strategy

1 file changed: 300 additions & 0 deletions

---
layout: docs
page_title: Configure singleton deployments
description: |-
  Declare a job that guarantees only a single instance can run at a time, with
  minimal downtime.
---

# Configure singleton deployments

A singleton deployment is one where there is at most one instance of a given
allocation running on the cluster at one time. You might need this if the
workload needs exclusive access to a remote resource like a data store. Nomad
does not support singleton deployments as a built-in feature. Your workloads
continue to run even when the Nomad client agent has crashed, so ensuring
there's at most one allocation for a given workload requires some cooperation
from the job. This document describes how to implement singleton deployments.

## Design Goals

The configuration described here meets the following design goals:

* The design will prevent a specific process within a task from running if there
  is another instance of that task running anywhere else on the Nomad cluster.
* Nomad should be able to recover from failure of the task or the node on which
  the task is running with minimal downtime, where "recovery" means that the
  original task should be stopped and that Nomad should schedule a replacement
  task.
* Nomad should minimize false positive detection of failures to avoid
  unnecessary downtime during the cutover.

There's a tradeoff between recovery speed and false positives. The faster you
make Nomad attempt to recover from failure, the more likely it is that a
transient failure causes a replacement to be scheduled and a subsequent
downtime.

Note that it's not possible to design a perfectly zero-downtime singleton
allocation in a distributed system. This design will err on the side of
correctness: having 0 or 1 allocations running rather than the incorrect 1 or 2
allocations running.

## Overview

There are several options available for some details of the implementation, but
all of them include the following:

* You must have a distributed lock with a TTL that's refreshed from the
  allocation. The process that sets and refreshes the lock must have its
  lifecycle tied to the main task. It can be either in-process, in-task with
  supervision, or run as a sidecar. If the allocation cannot obtain the lock,
  then it must not start whatever process or operation is intended to be a
  singleton. After a configurable window without obtaining the lock, the
  allocation must fail.
* You must set the [`group.disconnect.stop_on_client_after`][] field. This
  forces a Nomad client that's disconnected from the server to stop the
  singleton allocation, which in turn releases the lock or allows its TTL to
  expire.

The values for the three timers (the lock TTL, the time it takes the alloc to
give up, and the `stop_on_client_after` duration) are the values that can be
tuned to reduce the maximum amount of downtime the application can have. For
example, with a 10 second lock TTL the lock frees up at most 10 seconds after
the last successful renewal, and with `stop_on_client_after = "1m"` a
disconnected client is guaranteed to have stopped the old instance within
about a minute.

62+
63+
The Nomad [Locks API][] can support the operations needed. In psuedo-code these
64+
operations are:
65+
66+
* `PUT /v1/var/:path?lock-acquire`
67+
* On success: start heartbeat every 1/2 TTL
68+
* On conflict or failure: retry with backoff and timeout.
69+
* Once out of attempts, exit the process with error code.
70+
* To heartbeat, `PUT /v1/var/:path?lock-renew`
71+
* On success: continue
72+
* On conflict: exit the process with error code
73+
* On failure: retry with backoff up to TTL.
74+
* If TTL expires, attempt to revoke lock, then exit the process with error code.
75+
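For illustration, here is a minimal shell sketch of that loop using `curl`
against the two endpoints above. The request payload shapes and the `.Lock.ID`
response field are assumptions; verify them against the [Locks API][]
reference before relying on this.

```shell
#!/bin/sh
# Sketch of the acquire/renew loop described above. Assumes curl and jq are
# available, and that the acquire response exposes the lock ID at .Lock.ID
# (verify against the Locks API documentation).
set -u

NOMAD_ADDR="${NOMAD_ADDR:-http://127.0.0.1:4646}"
LOCK_PATH="nomad/jobs/example/lock"
TTL=10 # lock TTL in seconds

# Acquire: retry with backoff, then give up with a non-zero exit code.
attempts=0
until resp=$(curl -fsS -X PUT \
    -H "X-Nomad-Token: ${NOMAD_TOKEN}" \
    -d "{\"Lock\": {\"TTL\": \"${TTL}s\"}}" \
    "${NOMAD_ADDR}/v1/var/${LOCK_PATH}?lock-acquire"); do
  attempts=$((attempts + 1))
  [ "$attempts" -ge 5 ] && exit 1
  sleep $((attempts * 2)) # crude backoff
done
lock_id=$(echo "$resp" | jq -r '.Lock.ID')

# Heartbeat every 1/2 TTL; exit non-zero if a renewal fails so that the
# supervising task (or Nomad) can react.
while sleep $((TTL / 2)); do
  curl -fsS -X PUT \
    -H "X-Nomad-Token: ${NOMAD_TOKEN}" \
    -d "{\"Lock\": {\"ID\": \"${lock_id}\"}}" \
    "${NOMAD_ADDR}/v1/var/${LOCK_PATH}?lock-renew" || exit 1
done
```
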
The allocation can safely use the Nomad [Task API][] socket to write to the
locks API, rather than communicating with the server directly. This reduces load
on the server and speeds up detection of failed client nodes because the
disconnected client cannot forward the Task API requests to the leader.

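For example, a task can reach the Locks API through the workload API socket in
its secrets directory. The socket path and the `NOMAD_TOKEN` variable are
available to the task when its `identity` block sets `env = true`; the request
body carries the same assumption as the sketch above.

```shell
# Acquire the lock through the task's local API socket instead of the server.
curl -fsS --unix-socket "${NOMAD_SECRETS_DIR}/api.sock" \
  -H "X-Nomad-Token: ${NOMAD_TOKEN}" \
  -X PUT -d '{"Lock": {"TTL": "10s"}}' \
  "http://localhost/v1/var/nomad/jobs/example/lock?lock-acquire"
```
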
The [`nomad var lock`][] command implements this logic and can be used to shim
the process being locked.

### ACLs

Allocations cannot write to Variables by default. You must configure a
[workload-associated ACL policy][] that allows write access in the
[`namespace.variables`][] block. For example, the following ACL policy allows
access to write a lock on the path `nomad/jobs/example/lock` in the `prod`
namespace:

```hcl
namespace "prod" {
  variables {
    path "nomad/jobs/example/lock" {
      capabilities = ["write", "read", "list"]
    }
  }
}
```

You set this policy on the job with `nomad acl policy apply -namespace prod
-job example example-lock ./policy.hcl`.

### Using `nomad var lock`

The easiest way to implement the locking logic is to use `nomad var lock` as a
shim in your task. The jobspec below assumes there's a Nomad binary in the
container image.

```hcl
job "example" {
  group "group" {

    disconnect {
      stop_on_client_after = "1m"
    }

    task "primary" {
      driver = "docker"

      config {
        image   = "example/app:1"
        command = "nomad"
        args = [
          "var", "lock", "nomad/jobs/example/lock", # lock
          "busybox", "httpd",                       # application
          "-vv", "-f", "-p", "8001", "-h", "/local" # application args
        ]
      }

      identity {
        env = true
      }
    }
  }
}
```

If you don't want to ship a Nomad binary in the container image, you can make a
read-only mount of the binary from a host volume. This will only work in cases
where the Nomad binary has been statically linked or you have glibc in the
container image.

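You can check how the binary on the client host is linked with `ldd`; the path
here is an example:

```shell
# A statically linked binary prints "not a dynamic executable", which means
# it can run in most Linux container images without glibc.
ldd /usr/local/bin/nomad
```
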
```hcl
job "example" {
  group "group" {

    disconnect {
      stop_on_client_after = "1m"
    }

    volume "binaries" {
      type      = "host"
      source    = "binaries"
      read_only = true
    }

    task "primary" {
      driver = "docker"

      config {
        image   = "example/app:1"
        command = "/opt/bin/nomad"
        args = [
          "var", "lock", "nomad/jobs/example/lock", # lock
          "busybox", "httpd",                       # application
          "-vv", "-f", "-p", "8001", "-h", "/local" # application args
        ]
      }

      identity {
        env = true # make NOMAD_TOKEN available to lock command
      }

      volume_mount {
        volume      = "binaries"
        destination = "/opt/bin"
      }
    }
  }
}
```

### Sidecar Lock

If you cannot implement the lock logic in your application or with a shim such
as `nomad var lock`, you'll need to run the task you are locking as a sidecar
of the locking task, which has [`task.leader=true`][] set.

```hcl
job "example" {
  group "group" {

    disconnect {
      stop_on_client_after = "1m"
    }

    task "lock" {
      leader = true
      driver = "raw_exec"

      config {
        command  = "/opt/lock-script.sh"
        pid_mode = "host"
      }

      identity {
        env = true # make NOMAD_TOKEN available to lock command
      }
    }

    task "application" {
      lifecycle {
        hook    = "poststart"
        sidecar = true
      }

      driver = "docker"

      config {
        image = "example/app:1"
      }
    }
  }
}
```

The locking task has the following requirements:

* The locking task must be in the same group as the task being locked.
* The locking task must be able to terminate the task being locked without the
  Nomad client being up (i.e. they share the same PID namespace, or the locking
  task is privileged).
* The locking task must have a way of signalling the task being locked that it
  is safe to start. For example, the locking task can write a sentinel file into
  the `/alloc` directory, which the locked task tries to read on startup and
  blocks until it exists, as in the sketch after this list.

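Here is a minimal sketch of a hypothetical `/opt/lock-script.sh` for the
jobspec above. It uses `nomad var lock` to hold the lock and a sentinel file in
the shared allocation directory to signal the application task; the
`lock-held` file name is illustrative, not a Nomad convention.

```shell
#!/bin/sh
# Hypothetical lock-script.sh: hold the lock for the lifetime of this task.
# `nomad var lock` acquires the lock, runs the wrapped command, and renews
# the lock until that command exits. Because this task sets leader = true,
# Nomad stops the sibling application task when this script exits.
exec nomad var lock nomad/jobs/example/lock sh -c '
  # Signal the application task that the lock is held.
  touch "${NOMAD_ALLOC_DIR}/lock-held"
  # Hold the lock until the task is stopped.
  while :; do sleep 60; done
'
```

The application task's entrypoint would then block until
`${NOMAD_ALLOC_DIR}/lock-held` exists before starting the real workload.
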
If the third requirement cannot be met, then you'll need to split the lock
acquisition and lock heartbeat into separate tasks:

```hcl
job "example" {
  group "group" {

    disconnect {
      stop_on_client_after = "1m"
    }

    task "acquire" {
      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      driver = "raw_exec"

      config {
        command = "/opt/lock-acquire-script.sh"
      }

      identity {
        env = true # make NOMAD_TOKEN available to lock command
      }
    }

    task "heartbeat" {
      leader = true
      driver = "raw_exec"

      config {
        command  = "/opt/lock-heartbeat-script.sh"
        pid_mode = "host"
      }

      identity {
        env = true # make NOMAD_TOKEN available to lock command
      }
    }

    task "application" {
      lifecycle {
        hook    = "poststart"
        sidecar = true
      }

      driver = "docker"

      config {
        image = "example/app:1"
      }
    }
  }
}
```

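As a rough illustration, the two hypothetical scripts could look like the
following, sharing the lock ID through the allocation directory. The request
payloads and the `.Lock.ID` field are the same assumptions as in the earlier
sketch and should be verified against the [Locks API][] documentation.

```shell
#!/bin/sh
# Hypothetical lock-acquire-script.sh (prestart): take the lock once and
# record its ID for the heartbeat task. Assumes curl and jq on the host.
set -eu
curl -fsS --unix-socket "${NOMAD_SECRETS_DIR}/api.sock" \
  -H "X-Nomad-Token: ${NOMAD_TOKEN}" \
  -X PUT -d '{"Lock": {"TTL": "10s"}}' \
  "http://localhost/v1/var/nomad/jobs/example/lock?lock-acquire" \
  | jq -r '.Lock.ID' > "${NOMAD_ALLOC_DIR}/lock-id"
```

```shell
#!/bin/sh
# Hypothetical lock-heartbeat-script.sh (leader): renew the lock at half the
# TTL; exit non-zero on a failed renewal so Nomad stops the sibling tasks.
set -eu
lock_id=$(cat "${NOMAD_ALLOC_DIR}/lock-id")
while sleep 5; do
  curl -fsS --unix-socket "${NOMAD_SECRETS_DIR}/api.sock" \
    -H "X-Nomad-Token: ${NOMAD_TOKEN}" \
    -X PUT -d "{\"Lock\": {\"ID\": \"${lock_id}\"}}" \
    "http://localhost/v1/var/nomad/jobs/example/lock?lock-renew" || exit 1
done
```
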
If the primary task is configured to [`restart`][], the task should be able to
restart within the lock TTL in order to minimize flapping on restart. This
improves availability but isn't required for correctness.

[`group.disconnect.stop_on_client_after`]: /nomad/docs/job-specification/disconnect#stop_on_client_after
[Locks API]: /nomad/api-docs/variables/locks
[Task API]: /nomad/api-docs/task-api
[`nomad var lock`]: /nomad/commands/var/lock
[workload-associated ACL policy]: /nomad/docs/concepts/workload-identity#workload-associated-acl-policies
[`namespace.variables`]: /nomad/docs/other-specifications/acl-policy#variables
[`task.leader=true`]: /nomad/docs/job-specification/task#leader
[`restart`]: /nomad/docs/job-specification/restart