Nomad - cluster / node install
Plan topology
Before you start, stop and think:
- Do you want to add a node?
- How big is the cluster?
- What is the purpose of the node?
- How far is the node from the cluster you want to extend?
Sizing
- 1 or 3 nodes: each node has to be both a server and a client.
- 2 nodes: don't install it!!! The only supported cluster sizes are 1, 3, 4 … 12 nodes. With 2 servers, Raft needs both of them for a quorum, so losing either one stops the cluster.
- 4 and more nodes: set up 3 servers; the rest will be clients.
- More than 11 nodes: split it into two clusters ;-)
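The sizing rules above can be captured in a small helper, for example to use in a pre-flight script. This is a minimal sketch. The function name `cluster_layout` and its output strings are hypothetical. The upper bound follows the "more than 11 nodes" rule.

```shell
# Hypothetical helper encoding the sizing rules above.
# Prints the recommended layout; returns 1 for unsupported sizes (0 or 2 nodes).
cluster_layout() {
  n="$1"
  case "$n" in
    ''|*[!0-9]*)
      echo "usage: cluster_layout <node-count>" >&2
      return 2 ;;
  esac
  if [ "$n" -eq 1 ] || [ "$n" -eq 3 ]; then
    # every node runs both the server and the client
    echo "all $n nodes: server+client"
  elif [ "$n" -ge 4 ] && [ "$n" -le 11 ]; then
    # three servers, every remaining node is a client
    echo "3 servers, $((n - 3)) clients"
  elif [ "$n" -gt 11 ]; then
    echo "split into two clusters"
  else
    # 0 or 2 nodes
    echo "unsupported"
    return 1
  fi
}
```

Usage: `cluster_layout 7` prints `3 servers, 4 clients`.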
Networking
- Nomad's only networking requirement is low latency, so keep the servers in the same DC, or at least on the same continent
- we use a firewall to block everything from outside and to allow everything between the nodes and from our internal IPs; Nomad does the rest.
Prepare servers
HDD mountpoints
- / (root): 50G
- /home: the rest of the disk
In the future we may use /home differently!!!
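Before running Ansible, you can quickly check a freshly installed server against the mountpoint layout above. This is a minimal sketch built on POSIX `df -P` output, and all function names are hypothetical. The size is only printed, not enforced, because a 50G partition shows up in `df` slightly smaller than 50 GiB.

```shell
# fs_mountpoint PATH: print the mountpoint of the filesystem that holds PATH
fs_mountpoint() {
  df -P "$1" | awk 'NR == 2 {print $6}'
}

# fs_size_gib PATH: print the size of that filesystem in whole GiB (rounded down)
fs_size_gib() {
  df -Pk "$1" | awk 'NR == 2 {printf "%d\n", $2 / 1048576}'
}

# check_layout: warn when /home is not its own filesystem and report the size of /
check_layout() {
  layout_ok=0
  if [ "$(fs_mountpoint /home)" != "/home" ]; then
    echo "/home is not a separate filesystem" >&2
    layout_ok=1
  fi
  echo "/ is $(fs_size_gib /) GiB (expected about 50G)"
  return "$layout_ok"
}
```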
OS
Our Ansible is ready for Debian 10 and 11, but it should also run on Ubuntu, Mint, etc.
Packages
Python must be installed on every node, because Ansible requires it.
Create the new cluster
The whole cluster is managed by Ansible; no one should modify anything any other way. Let's say we are creating a new cluster called milky-way…
Use the latest release and the template
- make sure you are on the master branch and that it is up to date
- create a new branch named after the cluster you will manage
git checkout -b milky-way_create_cluster
- create a new env from the template
cp -a env/template env/milky-way
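The copy step can be wrapped in a guard, so that it never overwrites an env that already exists. This is a minimal sketch: the function name `new_cluster_env` is hypothetical, and it assumes you run it from the repository root after the git steps above. `cp -a` matters here because it keeps the template's symlinks as symlinks instead of copying their targets.

```shell
# Hypothetical helper: create env/<cluster> from env/template.
# Run it from the root of the ansible repository.
new_cluster_env() {
  cluster="$1"
  if [ -z "$cluster" ]; then
    echo "usage: new_cluster_env <cluster-name>" >&2
    return 2
  fi
  if [ ! -d env/template ]; then
    echo "env/template not found; run this from the repository root" >&2
    return 1
  fi
  if [ -e "env/$cluster" ]; then
    echo "env/$cluster already exists; refusing to overwrite it" >&2
    return 1
  fi
  # -a preserves symlinks, permissions and timestamps
  cp -a env/template "env/$cluster"
}
```

Usage: `new_cluster_env milky-way`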
Edit parameters
- env/milky-way/hosts - the main file where the changes should be made
- env/milky-way/group_vars/all/firewall-cluster.yaml - update the list of IPs in the cluster
- env/milky-way/group_vars/all/vault.yaml - update secrets with ansible-vault
The rest of the files are symlinks, because their variables are the same for all envs. If you have to modify one of them, replace the symlink with a copy of the original file and then make your changes. Alternatively, put the change into the hosts file with a comment explaining why…
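Replacing a symlink with a copy of the original file can be done safely in two steps: copy the link's target next to it, then rename the copy over the link. This is a minimal sketch, and the function name `unshare_file` is hypothetical. `find env/milky-way -type l` lists which files in an env are still symlinks.

```shell
# Hypothetical helper: replace a symlink with a regular copy of the file it points to,
# so the copy can be edited for this env only.
unshare_file() {
  f="$1"
  if [ ! -L "$f" ]; then
    echo "$f is not a symlink" >&2
    return 1
  fi
  # cp -L follows the link and copies the target's content;
  # mv then replaces the link itself, so the shared original stays untouched
  cp -L "$f" "$f.tmp" && mv -f "$f.tmp" "$f"
}
```

Usage: `unshare_file env/milky-way/group_vars/all/<file>.yaml`, then edit the copy.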
Run deploy
The deploy is pretty fast; it takes less than 15 minutes.
ansible-playbook -i env/milky-way/hosts nomad-deploy.yml --diff
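A thin wrapper can check that the inventory exists before starting the playbook, and it makes a dry run easy. This is a minimal sketch: the function name `deploy_cluster` and the `DRY_RUN` variable are hypothetical. Extra arguments go straight to `ansible-playbook`, so `--check` gives Ansible's own dry run.

```shell
# Hypothetical wrapper around the deploy command above.
# Extra arguments are passed through to ansible-playbook (e.g. --check).
# With DRY_RUN set to a non-empty value, the command is only printed.
deploy_cluster() {
  cluster="$1"
  if [ -z "$cluster" ]; then
    echo "usage: deploy_cluster <cluster-name> [ansible-playbook args...]" >&2
    return 2
  fi
  shift
  inventory="env/$cluster/hosts"
  if [ ! -f "$inventory" ]; then
    echo "inventory $inventory not found" >&2
    return 1
  fi
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty
  ${DRY_RUN:+echo} ansible-playbook -i "$inventory" nomad-deploy.yml --diff "$@"
}
```

Usage: `deploy_cluster milky-way --check` first, then `deploy_cluster milky-way`. Because vault.yaml is encrypted, you may also need `--ask-vault-pass`, unless a vault password file is already configured.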
Manual steps after install
After the deployment the cluster is fully working and production ready, but in most cases a few manual steps are still needed…
Configure wildcard in DNS
In our case the record looks like this:
CNAME *.milky-way.easy2.cloud
The record points to all nodes in this cluster.
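Once the record is in place, you can check that an arbitrary name under the wildcard resolves to every node. This is a minimal sketch: the function name and the node IPs are placeholders, and it assumes `dig` is installed. `dig +short` prints only the answer section, which is the CNAME target (if any) followed by the addresses.

```shell
# Hypothetical check: does <name> resolve to every given node IP?
# Prints each missing IP; returns 1 if any is missing.
check_wildcard() {
  name="$1"
  shift
  answer=$(dig +short "$name" A)
  missing=0
  for ip in "$@"; do
    # -x: whole-line match, -F: treat the IP as a fixed string, not a regex
    if ! printf '%s\n' "$answer" | grep -qxF "$ip"; then
      echo "missing: $ip"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_wildcard anything.milky-way.easy2.cloud <node-ip> <node-ip> …`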
Configure ACL
Check whether the ACL policies are present in the cluster:
# check nomad acl policies
nomad acl policy list
If the list does not contain developers, admins, …, run:
sh ~/acl/init-policy.sh
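The check for missing policies can be scripted. This is a minimal sketch: the function name is hypothetical, and it assumes the default table output of `nomad acl policy list`, which is a header line followed by one policy per line with the name in the first column.

```shell
# Hypothetical helper: print each required policy that is not yet in the cluster.
# Exit status: 0 = all present, 1 = some missing, 2 = nomad could not be queried.
missing_policies() {
  listing=$(nomad acl policy list) || return 2
  # skip the header line, keep only the policy names
  existing=$(printf '%s\n' "$listing" | awk 'NR > 1 {print $1}')
  missing=0
  for policy in "$@"; do
    if ! printf '%s\n' "$existing" | grep -qxF "$policy"; then
      echo "$policy"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `missing_policies developers admins`, extended with the other policy names your cluster needs. Run `init-policy.sh` only if the exit status is 1.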
To create a token for each member of the admins group, you can use this for loop:
# '^admins:' matches only the admins group line (plain "admins" would also match e.g. "sysadmins")
for usr in $(grep '^admins:' /etc/group | cut -d: -f4 | tr ',' ' '); do
    echo "Create token for user $usr"
    nomad acl token create -name="$usr" -policy=admins
    echo ""
done
Do NOT run this loop more than once: it creates new tokens without checking whether the users already have one!!!
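If the loop has to be rerun, a variant that skips users who already have a token can limit the damage. This is a minimal sketch: the function name is hypothetical, and it assumes your Nomad version supports `nomad acl token list -json` and that `jq` is installed. Nomad does not enforce unique token names, so matching by name is only a heuristic.

```shell
# Hypothetical idempotent variant of the loop above.
create_missing_admin_tokens() {
  tokens_json=$(nomad acl token list -json) || return 2
  existing=$(printf '%s\n' "$tokens_json" | jq -r '.[].Name')
  # member list of the admins group, commas turned into spaces
  members=$(getent group admins | cut -d: -f4 | tr ',' ' ')
  for usr in $members; do
    if printf '%s\n' "$existing" | grep -qxF "$usr"; then
      echo "Skip $usr: a token with this name already exists"
      continue
    fi
    echo "Create token for user $usr"
    nomad acl token create -name="$usr" -policy=admins
  done
}
```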
Deploy traefik and nomad-proxy
Deploy traefik first, because nomad-proxy depends on it.
- Clone the repository where we store the templates
# clone repository
git clone git@git.easy.cz:nomad/templates
You have to be able to access Nomad with the nomad CLI tool (you may run it from the first master node).
If you want to run it remotely, you need the nomad binary installed locally and these env variables exported:
- NOMAD_TOKEN=
- NOMAD_ADDR=
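When running the CLI remotely, it helps to fail early if either variable is empty. Otherwise `nomad` falls back to its default address, the local agent at http://127.0.0.1:4646. This is a minimal sketch, and the function name is hypothetical.

```shell
# Hypothetical guard: report every required Nomad variable that is empty.
require_nomad_env() {
  env_status=0
  for var in NOMAD_ADDR NOMAD_TOKEN; do
    # indirect lookup of the variable named in $var, empty if unset
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "$var is not set" >&2
      env_status=1
    fi
  done
  return "$env_status"
}
```

Usage: `require_nomad_env && nomad server members` confirms that the variables are set and the cluster answers.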
Deploy traefik
cd templates/jobs/traefik-ssl/
nomad run deploy.nomad
- check the proxy (“Not sure how”)
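One possible first check is to ask Nomad what status it reports for the traefik job. This is a minimal sketch. The job name `traefik` is an assumption; take the real name from deploy.nomad. This only shows the scheduler's view, not whether traefik actually routes traffic.

```shell
# Hypothetical helper: print the job status line reported by `nomad job status -short`.
job_state() {
  # lines look like "Status        = running"; split on "=" plus following spaces
  nomad job status -short "$1" | awk -F'= *' '/^Status[[:space:]]*=/ {print $2; exit}'
}
```

Usage: `[ "$(job_state traefik)" = running ] || echo "traefik is not running"`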
Deploy nomad proxy
- go to the directory with the template
cd templates/jobs/nomad-proxy/
- prepare a new variable file
# create variable file for new cluster
cat > var_milky-way.tf <<EOF
fqdn = "nomad.milky-way.easy2.cloud"
dcs  = ["milky-way"]
EOF
- deploy the proxy to nomad
nomad run -var-file=var_milky-way.tf deploy.nomad
- check the proxy and then commit the new file to the repository
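For further clusters, the variable file can be generated from the cluster name. This is a minimal sketch. The function name is hypothetical, and it assumes every cluster follows the milky-way pattern: fqdn nomad.<cluster>.easy2.cloud and a single datacenter named after the cluster. String values are quoted because HCL variable files require quoted strings.

```shell
# Hypothetical helper: write var_<cluster>.tf for nomad-proxy.
write_proxy_vars() {
  cluster="$1"
  if [ -z "$cluster" ]; then
    echo "usage: write_proxy_vars <cluster-name>" >&2
    return 2
  fi
  cat > "var_$cluster.tf" <<EOF
fqdn = "nomad.$cluster.easy2.cloud"
dcs  = ["$cluster"]
EOF
}
```

Usage: `write_proxy_vars milky-way`, then `nomad run -var-file=var_milky-way.tf deploy.nomad`.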
Deploy loki, prometheus and promtail
This lives in a different repository and is a bit more complex. Please follow the readme in the repository below to complete this installation…
Restart everything
Now restart the whole cluster and check its state after the reboot.
Check the cluster
- all the nodes should be up and operational
- all the deployments should be up
- …
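Part of the node check can be scripted. This is a minimal sketch: the function name is hypothetical, and it assumes the default table output of `nomad node status`, where Status is the last column. It prints nothing when every node is ready. Server health is shown separately by `nomad server members`.

```shell
# Hypothetical post-reboot check: list nodes whose status is not "ready".
not_ready_nodes() {
  # skip the header line; $NF is the last column (Status)
  nomad node status | awk 'NR > 1 && $NF != "ready"'
}
```

Usage: `[ -z "$(not_ready_nodes)" ] && echo "all nodes ready"`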