This repository was archived by the owner on Jan 9, 2020. It is now read-only.
File tree Expand file tree Collapse file tree 1 file changed +23
-0
lines changed
resource-managers/kubernetes/architecture-docs Expand file tree Collapse file tree 1 file changed +23
-0
lines changed Original file line number Diff line number Diff line change @@ -3,4 +3,27 @@ layout: global
33title : Kubernetes Implementation of the External Shuffle Service
44---
55
6+ # External Shuffle Service
67
8+ The ` KubernetesExternalShuffleService ` was added to allow Spark to use Dynamic Allocation Mode when
9+ running in Kubernetes. The shuffle service is responsible for persisting shuffle files beyond the
10+ lifetime of the executors, allowing the number of executors to scale up and down without losing
11+ computation.
12+
13+ The implementation of choice is as a DaemonSet that runs a shuffle-service pod on each node.
14+ Shuffle-service pods and executors pods that land on the same node share disk using hostpath
15+ volumes. Spark requires that each executor must know the IP address of the shuffle-service pod that
16+ shares disk with it.
17+
18+ The user specifies the shuffle service pods they want executors of a particular SparkJob to use
19+ through two new properties:
20+
21+ * spark.kubernetes.shuffle.service.labels
22+ * spark.kubernetes.shuffle.namespace
23+
24+ KubernetesClusterSchedulerBackend is aware of shuffle service pods and the node corresponding to
25+ them in a particular namespace. It uses this data to configure the executor pods to connect with the
26+ shuffle services that are co-located with them on the same node.
27+
28+ There is additional logic in the ` KubernetesExternalShuffleService ` to watch the Kubernetes API,
29+ detect failures, and proactively cleanup files in those error cases.
You can’t perform that action at this time.
0 commit comments