Update: horde is a library "built to address some perceived shortcomings of Swarm's design" (from its introductory blog post), and is currently (as of December 2020) more actively maintained than Swarm. It also works together with libcluster. We used horde for some new use cases and recommend checking it out if Swarm doesn't fit your needs.
Collaborated with Rocky Neurock
We run Elixir on Kubernetes. While there were some rough edges for such a setup a few years ago, it has become much easier thanks to libraries such as libcluster and Swarm. Recently, we had a use case which those two libraries fit perfectly. However, since the libraries are relatively new, you might find the documentation and examples a bit lacking and it took us a while to get everything working together. We wrote this comprehensive walk-through to document our experience.
Note that you'd need to run your Elixir instances as releases, either with Distillery or the built-in releases (Elixir 1.9+), in order for this to work. This walk-through is based on our setup using Distillery.
First, a bit of background information:
The ability for different nodes to form a cluster and maintain location transparency has always been a huge selling point for the BEAM VM. libcluster is a library that helps with automatic node discovery and cluster formation, especially when the nodes are run in Kubernetes pods.
The main reason you might want a cluster is so you can organize and orchestrate processes across different nodes. This is where Swarm comes into the picture. Swarm maintains a global process registry and automates tasks such as process migration/restart after the cluster topology changes.
In our case, we have some permanent worker processes which should only have one instance of each running across all nodes at any given time. The problem is that we perform k8s rolling deployments with our CD pipeline, thus pods (and therefore nodes) will get destroyed and recreated throughout the day.
With Swarm, we register all above-mentioned worker processes globally, so whenever a pod gets destroyed, its worker processes will be automatically restarted on another healthy pod, ensuring that they are running exactly as envisioned all the time.
(Note: Initially, Swarm contained both the auto-clustering functionality and the process orchestration functionality. Later, the maintainer decided it would be better to split them into two separate libraries, which became the libcluster and Swarm we see today.)
In our first step, we’re going to ensure that automatic cluster formation in Kubernetes takes place successfully.
Before we add libcluster to our project, we need to ensure that every Elixir node has a unique name. By default, the rel/vm.args file generated by Distillery contains a line like:

```
-name <%= release_name %>@127.0.0.1
```

This means every node will be started as yourapp@127.0.0.1, which is not what we want.
We could first expose the Kubernetes pod IP as an environment variable in the kube configuration (e.g. kube/deploy.yaml) for the Elixir app:

```yaml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: your-app
  # ...
spec:
  # ...
  template:
    # ...
    spec:
      # ...
      containers:
        - name: your-app
          # ...
          env:
            - name: MY_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
```
Then you can change the line in rel/vm.args so the pod IP address will be substituted at runtime (with Distillery, remember to set REPLACE_OS_VARS=true in the release environment so the ${...} substitution actually happens on boot):

```
-name <%= release_name %>@${MY_POD_IP}
```
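To sanity-check the wiring before going further, you can compare the pod IP Kubernetes reports with the variable inside the container. The pod, namespace, and container names below are placeholders from our example and would be adapted to your setup:

```shell
# Pod IP as Kubernetes sees it
kubectl get pod your-pod -n your-namespace -o jsonpath='{.status.podIP}'

# The same IP should be exposed inside the container
kubectl exec -n your-namespace your-pod -c your-container -- printenv MY_POD_IP
```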
After ensuring unique node names, you can already test clustering manually via your console:

```shell
# On your local machine
~ kubectl exec -it -n your-namespace your-pod -c your-container -- sh

# Launch an Elixir console after connecting to your k8s pod.
~ bin/console
iex(app@<POD_1_IP>)> Node.connect(:"app@<POD_2_IP>")
iex(app@<POD_1_IP>)> Node.list()
# If things are set up correctly, Node.list() should return [:"app@<POD_2_IP>"]
```
Having tested manual clustering, we can then move on to automatic clustering with libcluster.
You may notice that there are three Kubernetes clustering strategies in libcluster and wonder which one you should use. In our experience, Cluster.Strategy.Kubernetes.DNS is the easiest to set up. All it requires is adding a headless service to your cluster; no modifications are needed for the already existing (Elixir) pods.
In your k8s config:
```yaml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: your-app
  # ...
spec:
  template:
    metadata:
      # The selector for the headless service can
      # find the app by this label
      labels:
        app: your-app
        role: service
    # ...
---
apiVersion: v1
kind: Service
metadata:
  name: your-app-headless
spec:
  clusterIP: None
  # The selector is just a way to filter the pods on which the app is deployed.
  # It should match up with the labels specified above
  selector:
    app: your-app
    role: service
```
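You can verify that the headless service behaves as expected: it should resolve to one A record per pod, which is exactly what Cluster.Strategy.Kubernetes.DNS polls. For instance, from a remote IEx console on any pod (service name as in our example), using the standard Erlang resolver:

```elixir
# Each tuple is the IP address of one pod backing the headless service,
# e.g. [{10, 0, 1, 12}, {10, 0, 2, 7}]
:inet_res.lookup(~c"your-app-headless", :in, :a)
```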
After adding the headless service, we're finally ready for libcluster. Add the Cluster.Supervisor to the list of supervisors started with your app:
```elixir
### In mix.exs
defp deps do
  [
    # ...
    {:libcluster, "~> 3.1"},
    # ...
  ]
end
```

```elixir
### In config/prod.exs
config :libcluster,
  topologies: [
    your_app: [
      strategy: Cluster.Strategy.Kubernetes.DNS,
      config: [
        service: "your-app-headless",
        # The part before the @ in the node name yourapp@<POD_IP>
        application_name: "yourapp",
        polling_interval: 10_000
      ]
    ]
  ]
```

```elixir
### In lib/application.ex
def start(_type, _args) do
  # List all child processes to be supervised
  children = [
    {
      Cluster.Supervisor,
      [
        Application.get_env(:libcluster, :topologies),
        [name: Yourapp.ClusterSupervisor]
      ]
    },
    # Start the Ecto repository
    Yourapp.Repo,
    # Start the endpoint when the application starts
    YourappWeb.Endpoint
  ]

  opts = [strategy: :one_for_one, name: Yourapp.Supervisor]
  Supervisor.start_link(children, opts)
end
```
If everything goes well, your pods should already be automatically clustered when you start them. You can verify this by running Node.list()
in the console, similar to what we did above.
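If you want to watch the topology change during a rolling deployment, you can also subscribe to node up/down events from an attached IEx session. This uses the standard :net_kernel API, nothing libcluster-specific:

```elixir
# The calling shell process will now receive
# {:nodeup, node} / {:nodedown, node} messages.
:net_kernel.monitor_nodes(true)

# ... trigger a deploy or kill a pod, then inspect the mailbox:
flush()
```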
During development, you won't want to deploy your changes to an actual k8s cluster every time just to validate them, or run a local k8s cluster for that matter. A more lightweight approach is to make libcluster work together with docker-compose and form a local node cluster. We found Cluster.Strategy.Gossip the easiest to set up for this purpose.
The following assumes the app is started with docker-compose directly, without using Distillery releases.

First, we need to make sure that each Erlang node has a unique name, just as we did for the production environment. We will do it in our entrypoint script for docker-compose:
```dockerfile
# In Dockerfile:
ENTRYPOINT ["/opt/app/docker-entrypoint-dev.sh"]
```

```shell
# In docker-entrypoint-dev.sh:
# Default to the container's own IP if NODE_IP is not provided.
if [ -z "${NODE_IP+x}" ]; then
  export NODE_IP="$(hostname -i | cut -f1 -d' ')"
fi

elixir --name "yourapp@${NODE_IP}" --cookie "your_dev_erlang_cookie" -S mix phx.server
```
Then, we need to scale the number of containers for our service to 2. You can easily do it by adding another service in your docker-compose.yml file:

```yaml
services:
  your_app: &app
    build:
      context: ./app
    ports:
      - 4000:4000
    # ...
  your_app_2:
    <<: *app
    ports:
      - 4001:4000
```
(Note: Another way to achieve the same is to use the --scale flag of the docker-compose up command.)
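For reference, the --scale variant looks like this. Note that a fixed host-port mapping such as 4000:4000 would collide between replicas, so you would drop it (or let Docker pick ephemeral host ports) when scaling this way:

```shell
# Start two replicas of the same service instead of duplicating it in YAML
docker-compose up --scale your_app=2
```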
Finally, we just need to specify our clustering strategy correctly:
```elixir
### In config/config.exs
config :libcluster,
  topologies: [
    your_app: [
      strategy: Cluster.Strategy.Gossip,
      config: [
        port: 45892,
        if_addr: "0.0.0.0",
        multicast_addr: "230.1.1.251",
        multicast_ttl: 1
      ]
    ]
  ]
```
With the default docker-compose network, the port and the multicast address should already be usable between containers; if not, check your docker-compose network configuration.

By this point, the two local nodes should automatically find and connect to each other whenever you start your app via docker-compose.
After the foundation has been laid with libcluster, we may now move on to Swarm.
This sentence from the documentation is key to using Swarm to its maximum potential:
Swarm is intended to be used by registering processes before they are created, and letting Swarm start them for you on the proper node in the cluster.
Therefore, we found the example from the documentation, which shows Swarm being used together with a normal Supervisor, slightly confusing: a normal Supervisor must start with some initial child worker processes, and those will not be managed by Swarm. DynamicSupervisor seems to suit Swarm's use case the most: we can start a DynamicSupervisor without any children and ensure all child processes are registered with Swarm before they are dynamically started later.
We can write our DynamicSupervisor module as such:

```elixir
defmodule Yourapp.YourSupervisor do
  use DynamicSupervisor

  def start_link(state) do
    DynamicSupervisor.start_link(__MODULE__, state, name: __MODULE__)
  end

  def init(_) do
    DynamicSupervisor.init(strategy: :one_for_one)
  end

  def register(worker_name) do
    DynamicSupervisor.start_child(__MODULE__, worker_name)
  end
end
```
Note that in the init callback we didn't need to provide any actual children to be started. The register function is a convenience function whose name we pass to Swarm.register_name/4 whenever we want to start a worker process with Swarm. It simply calls start_child and returns {:ok, pid} for the started worker, which is the shape Swarm expects.
This is how you would dynamically start your worker process anywhere in your app:
```elixir
Swarm.register_name(
  :unique_name_for_this_worker_process,
  Yourapp.YourSupervisor,
  :register,
  [Yourapp.YourWorker]
)
```
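Swarm.register_name/4 returns {:ok, pid} on success, or {:error, {:already_registered, pid}} if a process is already registered under that name anywhere in the cluster. A small idempotent wrapper (our own convenience helper, not part of Swarm) makes such start-up code easier to call repeatedly:

```elixir
# Hypothetical helper around Swarm.register_name/4: returns the worker's
# pid whether we just started it or it was already running in the cluster.
defmodule Yourapp.WorkerStarter do
  def ensure_started(name, worker_module) do
    case Swarm.register_name(name, Yourapp.YourSupervisor, :register, [worker_module]) do
      {:ok, pid} -> {:ok, pid}
      {:error, {:already_registered, pid}} -> {:ok, pid}
    end
  end
end
```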
Finally, we come to the definition for the worker process itself. Below is a minimal working example which would simply restart a killed worker process on another node, without preserving its state:
```elixir
defmodule Yourapp.YourWorker do
  use GenServer

  def start_link(state) do
    GenServer.start_link(__MODULE__, state)
  end

  def init(opts) do
    initial_state = initialize_worker(opts)
    {:ok, initial_state}
  end

  # Triggered by Swarm before the process is killed; replying :restart
  # means "restart me elsewhere, discarding my state".
  def handle_call({:swarm, :begin_handoff}, _from, state) do
    {:reply, :restart, state}
  end

  # Triggered by Swarm when the process should shut down.
  def handle_info({:swarm, :die}, state) do
    {:stop, :shutdown, state}
  end

  defp initialize_worker(opts) do
    # ...
  end
end
```
Optionally, if you need to make use of the state handover functionality, you would need to make your worker more complicated with these additions:
```elixir
# Change the handling of :begin_handoff
# This is triggered whenever a registered process is about to be killed.
def handle_call({:swarm, :begin_handoff}, _from, current_state) do
  {:reply, {:resume, produce_outgoing_state(current_state)}, current_state}
end

# Handle :end_handoff
# This is triggered on the new node after the process has been restarted
# there; Swarm delivers the handed-off state as a cast.
def handle_cast({:swarm, :end_handoff, incoming_state}, current_state) do
  {:noreply, end_handoff_new_state(current_state, incoming_state)}
end
```
Now, if you kill a node, you should see all the workers that were originally running on it automatically restarted on another node in the cluster.
Thanks for reading. We hope this walk-through has been of some help to you.