@@ -15,7 +15,7 @@ Kubernetes scheduler that has been added to Spark.
 [kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster,
 you may set up a test cluster on your local machine using
 [minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
-* We recommend using the latest releases of minikube be updated to the most recent version with the DNS addon enabled.
+* We recommend using the latest release of minikube with the DNS addon enabled.
 * You must have appropriate permissions to list, create, edit and delete
 [pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
 by running `kubectl auth can-i <list|create|edit|delete> pods`.
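All four checks can be run in one loop. This is a sketch: the `run` helper only echoes each command so the loop can be read without a cluster; replace its body with `"$@"` to actually query `kubectl`.

```bash
# Sketch: check every permission listed above. `run` only echoes the command;
# swap its body for "$@" to really run kubectl against your cluster.
run() { echo "+ $*"; }
for verb in list create edit delete; do
  run kubectl auth can-i "$verb" pods
done
```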
@@ -28,12 +28,13 @@ by running `kubectl auth can-i <list|create|edit|delete> pods`.
   <img src="img/k8s-cluster-mode.png" title="Spark cluster components" alt="Spark cluster components" />
 </p>
 
-spark-submit can be directly used to submit a Spark application to a Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+<code>spark-submit</code> can be directly used to submit a Spark application to a Kubernetes cluster.
+The submission mechanism works as follows:
 
-* Spark creates a spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
 * The driver creates executors which are also running within Kubernetes pods and connects to them, and executes application code.
 * When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
-logs and remains in "completed" state in the Kubernetes API till it's eventually garbage collected or manually cleaned up.
+logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up.
 
 Note that in the completed state, the driver pod does *not* use any computational or memory resources.
 
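The flow above can be sketched as a submission command. This is a hedged example: the master URL is the example API server address used elsewhere on this page, the application jar path is a placeholder, and `run` echoes the command instead of executing it (replace its body with `"$@"` to submit for real).

```bash
# Inert sketch of a cluster-mode submission; the jar path is a placeholder.
run() { echo "+ $*"; }
run spark-submit \
  --deploy-mode cluster \
  --master k8s://http://127.0.0.1:6443 \
  local:///path/to/examples.jar
```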
@@ -54,7 +55,7 @@ and built for your usage.
 
 You may build these docker images from sources.
 There is a script, `sbin/build-push-docker-images.sh` that you can use to build and push
-customized spark distribution images consisting of all the above components.
+customized Spark distribution images consisting of all the above components.
 
 Example usage is:
 
@@ -95,14 +96,14 @@ kubectl cluster-info
 Kubernetes master is running at http://127.0.0.1:6443
 ```
 
-In the above example, the specific Kubernetes cluster can be used with spark submit by specifying
+In the above example, the specific Kubernetes cluster can be used with <code>spark-submit</code> by specifying
 `--master k8s://http://127.0.0.1:6443` as an argument to spark-submit. Additionally, it is also possible to use the
 authenticating proxy, `kubectl proxy` to communicate to the Kubernetes API.
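The `k8s://` prefixing described above can be made explicit with a small helper (the API server address is the example one from `kubectl cluster-info`):

```bash
# Prepend the k8s:// scheme to an API server address to form a --master value.
to_master_url() { echo "k8s://$1"; }
to_master_url "http://127.0.0.1:6443"   # k8s://http://127.0.0.1:6443
```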
101102
102103The local proxy can be started by:
103104
104105``` bash
-  kubectl proxy
+kubectl proxy
 ```
 
 If the local proxy is running at localhost:8001, `--master k8s://http://127.0.0.1:8001` can be used as the argument to
@@ -123,15 +124,15 @@ take actions.
 
 ### Accessing Logs
 
-Logs can be accessed using the kubernetes API and the `kubectl` CLI. When a Spark application is running, it's possible
+Logs can be accessed using the Kubernetes API and the `kubectl` CLI. When a Spark application is running, it's possible
 to stream logs from the application using:
 
 ```bash
 kubectl -n=<namespace> logs -f <driver-pod-name>
 ```
 
 The same logs can also be accessed through the
-[kubernetes dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) if installed on
+[Kubernetes dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) if installed on
 the cluster.
 
 ### Accessing Driver UI
@@ -143,13 +144,13 @@ The UI associated with any application can be accessed locally using
 kubectl port-forward <driver-pod-name> 4040:4040
 ```
 
-Then, the spark driver UI can be accessed on `http://localhost:4040`.
+Then, the Spark driver UI can be accessed on `http://localhost:4040`.
 
 ### Debugging
 
 There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
 connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there
-are errors during the running of the application, often, the best way to investigate may be through the kubernetes CLI.
+are errors during the running of the application, often, the best way to investigate may be through the Kubernetes CLI.
 
 To get some basic information about the scheduling decisions made around the driver pod, you can run:
 
@@ -165,15 +166,15 @@ kubectl logs <spark-driver-pod>
 
 Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
 application, including all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
-the spark application.
+the Spark application.
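Cleanup, per the paragraph above, is a single pod deletion. A sketch with a hypothetical driver pod name and namespace; `run` echoes rather than executes (replace its body with `"$@"` to run for real):

```bash
# Deleting the driver pod tears down the whole application: executors,
# associated service, etc. The pod name below is hypothetical.
run() { echo "+ $*"; }
run kubectl -n default delete pod spark-example-driver
```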
 
 ## Kubernetes Features
 
 ### Namespaces
 
 Kubernetes has the concept of [namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
 Namespaces are ways to divide cluster resources between multiple users (via resource quota). Spark on Kubernetes can
-use namespaces to launch spark applications. This is through the `--conf spark.kubernetes.namespace` argument to spark-submit.
+use namespaces to launch Spark applications. This is done through the `spark.kubernetes.namespace` configuration.
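For example, to submit into a hypothetical `spark-jobs` namespace (the namespace name here is illustrative), one would add:

```
--conf spark.kubernetes.namespace=spark-jobs
```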
 
 Kubernetes allows using [ResourceQuota](https://kubernetes.io/docs/concepts/policy/resource-quotas/) to set limits on
 resources, number of objects, etc on individual namespaces. Namespaces and ResourceQuota can be used in combination by
@@ -198,7 +199,7 @@ that allows driver pods to create pods and services under the default Kubernetes
 service account that has the right role granted. Spark on Kubernetes supports specifying a custom service account to
 be used by the driver pod through the configuration property
 `spark.kubernetes.authenticate.driver.serviceAccountName=<service account name>`. For example, to make the driver pod
-to use the `spark` service account, a user simply adds the following option to the `spark-submit` command:
+use the `spark` service account, a user simply adds the following option to the `spark-submit` command:
 
 ```
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
@@ -272,6 +273,7 @@ specific to Spark on Kubernetes.
   <td>
     Docker image to use for the driver. Specify this using the standard
     <a href="https://docs.docker.com/engine/reference/commandline/tag/">Docker tag</a> format.
+    This configuration is required and must be provided by the user.
   </td>
 </tr>
 <tr>
@@ -280,6 +282,7 @@ specific to Spark on Kubernetes.
   <td>
     Docker image to use for the executors. Specify this using the standard
     <a href="https://docs.docker.com/engine/reference/commandline/tag/">Docker tag</a> format.
+    This configuration is required and must be provided by the user.
   </td>
 </tr>
 <tr>
@@ -365,7 +368,7 @@ specific to Spark on Kubernetes.
   <td><code>spark.kubernetes.authenticate.driver.oauthToken</code></td>
   <td>(none)</td>
   <td>
-    OAuth token to use when authenticating against the against the Kubernetes API server from the driver pod when
+    OAuth token to use when authenticating against the Kubernetes API server from the driver pod when
     requesting executors. Note that unlike the other authentication options, this must be the exact string value of
     the token to use for the authentication. This token value is uploaded to the driver pod. If this is specified, it is
     highly recommended to set up TLS for the driver submission server, as this value is sensitive information that would
@@ -483,15 +486,17 @@ specific to Spark on Kubernetes.
   <td><code>spark.kubernetes.driver.secrets.[SecretName]</code></td>
   <td>(none)</td>
   <td>
-    Mounts the Kubernetes secret named <code>SecretName</code> onto the path specified by the value
+    Mounts the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes secret</a>
+    named <code>SecretName</code> onto the path specified by the value
     in the driver Pod. The user can specify multiple instances of this for multiple secrets.
   </td>
 </tr>
 <tr>
   <td><code>spark.kubernetes.executor.secrets.[SecretName]</code></td>
   <td>(none)</td>
   <td>
-    Mounts the Kubernetes secret named <code>SecretName</code> onto the path specified by the value
+    Mounts the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes secret</a>
+    named <code>SecretName</code> onto the path specified by the value
     in the executor Pods. The user can specify multiple instances of this for multiple secrets.
   </td>
 </tr>