
Commit aa49815

init

1 parent 42e85ea commit aa49815

File tree: 4 files changed (+3, −283 lines)

README.md

Lines changed: 0 additions & 283 deletions
@@ -1,283 +0,0 @@
# Deploying Spark on Kubernetes

This post details how to deploy Spark on a Kubernetes cluster.

*Dependencies:*

- Docker v18.06.1-ce
- Minikube v0.29.0
- Spark v2.2.1
- Hadoop v2.7.3
## Minikube

[Minikube](https://kubernetes.io/docs/setup/minikube/) is a tool used to run a single-node Kubernetes cluster locally.

Follow the official [Install Minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/) guide to install it along with a [Hypervisor](https://kubernetes.io/docs/tasks/tools/install-minikube/#install-a-hypervisor) (like [VirtualBox](https://www.virtualbox.org/wiki/Downloads) or [HyperKit](https://github.com/moby/hyperkit)), to manage virtual machines, and [Kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/), to deploy and manage apps on Kubernetes.
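
Once installed, it's worth confirming both tools are on your PATH before going further (a quick sanity check, not part of the original guide; output will vary with your versions):

```sh
$ minikube version
$ kubectl version --client
```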
By default, the Minikube VM is configured to use 1GB of memory and 2 CPU cores. This is [not sufficient](https://spark.apache.org/docs/2.3.1/hardware-provisioning.html) for Spark jobs, so be sure to increase the memory in your Docker [client](https://docs.docker.com/docker-for-mac/#advanced) (for HyperKit) or directly in VirtualBox. Then, when you start Minikube, pass the memory and CPU options to it:
```sh
$ minikube start --vm-driver=hyperkit --memory 8192 --cpus 4
```

or, with the default VirtualBox driver:

```sh
$ minikube start --memory 8192 --cpus 4
```
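
Once the VM boots, confirm that the cluster is actually up and that kubectl can talk to it (a minimal verification step, not in the original post):

```sh
$ minikube status
$ kubectl get nodes
```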
## Docker

Next, let's build a custom Docker image for Spark [2.2.1](https://spark.apache.org/releases/spark-release-2-2-1.html), designed for Spark [Standalone mode](https://spark.apache.org/docs/latest/spark-standalone.html).
*Dockerfile*:

```dockerfile
# base image
FROM java:openjdk-8-jdk

# define spark and hadoop versions
ENV HADOOP_VERSION 2.7.3
ENV SPARK_VERSION 2.2.1

# download and install hadoop
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz | \
        tar -zx hadoop-${HADOOP_VERSION}/lib/native && \
    ln -s hadoop-${HADOOP_VERSION} hadoop && \
    echo Hadoop ${HADOOP_VERSION} native libraries installed in /opt/hadoop/lib/native

# download and install spark
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz | \
        tar -zx && \
    ln -s spark-${SPARK_VERSION}-bin-hadoop2.7 spark && \
    echo Spark ${SPARK_VERSION} installed in /opt

# add scripts and update spark default config
ADD common.sh spark-master spark-worker /
ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf
ENV PATH $PATH:/opt/spark/bin
```
You can find the above *Dockerfile* along with the Spark config file and scripts in the [spark-kubernetes](foo) repo on GitHub.

Build the image, pointing your shell at Minikube's Docker daemon first so the image is built inside the VM and is available to the cluster without pushing it to a registry:

```sh
$ eval $(minikube docker-env)
$ docker build -t spark-hadoop:2.2.1 .
```
> If you don't want to spend the time building the image locally, feel free to use my pre-built Spark image from [Docker Hub](https://hub.docker.com/) - `mjhea0/spark-hadoop:2.2.1`.

View:

```sh
$ docker image ls spark-hadoop

REPOSITORY     TAG     IMAGE ID       CREATED         SIZE
spark-hadoop   2.2.1   3ebc80d468bb   3 minutes ago   875MB
```
## Spark Master

*spark-master-deployment.yaml*:

```yaml
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: spark-master
spec:
  replicas: 1
  selector:
    matchLabels:
      component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: spark-hadoop:2.2.1
          command: ["/spark-master"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
```

*spark-master-service.yaml*:

```yaml
kind: Service
apiVersion: v1
metadata:
  name: spark-master
spec:
  ports:
    - name: webui
      port: 8080
      targetPort: 8080
    - name: spark
      port: 7077
      targetPort: 7077
  selector:
    component: spark-master
```
Create the Spark master Deployment and start the Services:

```sh
$ kubectl create -f ./kubernetes/spark-master-deployment.yaml
$ kubectl create -f ./kubernetes/spark-master-service.yaml
```

Verify:

```sh
$ kubectl get deployments

NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
spark-master   1         1         1            1           11s


$ kubectl get pods

NAME                            READY     STATUS    RESTARTS   AGE
spark-master-698c46ff7d-vxv7r   1/1       Running   0          41s
```
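
If the pod gets stuck in a non-Running state, the master's logs are the first place to look (an optional troubleshooting step; the pod name will differ in your cluster):

```sh
$ kubectl logs spark-master-698c46ff7d-vxv7r
```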
## Spark Workers

*spark-worker-deployment.yaml*:

```yaml
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: spark-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: spark-hadoop:2.2.1
          command: ["/spark-worker"]
          ports:
            - containerPort: 8081
          resources:
            requests:
              cpu: 100m
```
Create the Spark worker Deployment:

```sh
$ kubectl create -f ./kubernetes/spark-worker-deployment.yaml
```

Verify:

```sh
$ kubectl get deployments

NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
spark-master   1         1         1            1           1m
spark-worker   2         2         2            2           3s


$ kubectl get pods

NAME                            READY     STATUS    RESTARTS   AGE
spark-master-698c46ff7d-vxv7r   1/1       Running   0          1m
spark-worker-c49766f54-r5p9t    1/1       Running   0          21s
spark-worker-c49766f54-rh4bc    1/1       Running   0          21s
```
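
Because the workers are managed by an ordinary Deployment, the pool can be resized after the fact. For example, to run three workers instead of two (an optional aside, not part of the original walkthrough):

```sh
$ kubectl scale deployment spark-worker --replicas=3
```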
208-
209-
## Ingress
210-
211-
Did you notice that we exposed the Spark web UI on port 8080? In order to access it outside the cluster, let's configure an [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) object.
212-
213-
*minikube-ingress.yaml*:
214-
215-
```yaml
216-
apiVersion: extensions/v1beta1
217-
kind: Ingress
218-
metadata:
219-
name: minikube-ingress
220-
annotations:
221-
spec:
222-
rules:
223-
- host: spark-kubernetes
224-
http:
225-
paths:
226-
- path: /
227-
backend:
228-
serviceName: spark-master
229-
servicePort: 8080
230-
```
Enable the Ingress [addon](https://github.com/kubernetes/minikube/tree/master/deploy/addons/ingress):

```sh
$ minikube addons enable ingress
```

Create the Ingress object:

```sh
$ kubectl apply -f ./kubernetes/minikube-ingress.yaml
```

Next, update your */etc/hosts* file to route requests from the host we defined, `spark-kubernetes`, to the Minikube instance:

```sh
$ echo "$(minikube ip) spark-kubernetes" | sudo tee -a /etc/hosts
```
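
Before switching to the browser, you can confirm that the Ingress object exists and that the controller answers for the host (a quick check, not in the original post):

```sh
$ kubectl get ingress
$ curl -H "Host: spark-kubernetes" http://$(minikube ip)/
```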
Test it out in the browser at [http://spark-kubernetes/](http://spark-kubernetes/):

TODO: add image
## Test

To test, run the PySpark shell from the master container:

```sh
$ kubectl exec spark-master-698c46ff7d-vxv7r -it pyspark
```
Then run the following code after the PySpark prompt appears:

```python
# the pyspark shell already provides a SparkContext as `sc`,
# so there's no need to create one
words = (
    'the quick brown fox jumps over the '
    'lazy dog the quick brown fox jumps over the lazy dog'
)
seq = words.split()
data = sc.parallelize(seq)
counts = data.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).collect()
dict(counts)
sc.stop()
```
You should see:

```sh
{'brown': 2, 'lazy': 2, 'over': 2, 'fox': 2, 'dog': 2, 'quick': 2, 'the': 4, 'jumps': 2}
```
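
To run the same job non-interactively, you could submit a script with `spark-submit` instead of using the shell. This is a sketch under assumptions: the pod name comes from the earlier `kubectl get pods` output, and *wordcount.py* is a hypothetical script containing the code above plus its own `SparkContext` setup:

```sh
# copy a local script into the master pod (hypothetical file name)
$ kubectl cp wordcount.py spark-master-698c46ff7d-vxv7r:/tmp/wordcount.py
# submit it against the standalone master via the spark-master Service
$ kubectl exec spark-master-698c46ff7d-vxv7r -- \
    spark-submit --master spark://spark-master:7077 /tmp/wordcount.py
```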
TODO: add video

create.sh

Lines changed: 3 additions & 0 deletions

```diff
@@ -2,5 +2,8 @@
 
 kubectl create -f ./kubernetes/spark-master-deployment.yaml
 kubectl create -f ./kubernetes/spark-master-service.yaml
+
+sleep 10
+
 kubectl create -f ./kubernetes/spark-worker-deployment.yaml
 kubectl apply -f ./kubernetes/minikube-ingress.yaml
```

spark-web-ui.png

-248 KB
Binary file not shown.

test.py

Whitespace-only changes.
