Nomad - cluster / node install

Plan topology

Before you start, take a moment to think.

  • Do you want to add a node?
  • How big is the cluster?
  • What is the purpose of the node?
  • How far is the node from the cluster you want to extend?

Sizing

  • 1 or 3 nodes

    Each node has to be both server and client

  • 2 nodes

    Don’t install it! With two servers the cluster loses quorum as soon as one of them fails. The only allowed cluster sizes are 1, 3, 4, … 12 nodes

  • 4 and more nodes

    Set up 3 servers; the rest will be clients

  • More than 11 nodes

    Split it into two clusters ;-)
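
The sizing rules above can be condensed into a small helper. This is only a sketch: the function name nomad_layout is hypothetical and not part of our ansible repository, and it takes the "more than 11 nodes" bullet as the upper bound for a single cluster.

```shell
# Hypothetical helper: print the recommended role layout for a node count.
# It only encodes the sizing list above; nothing here talks to a cluster.
nomad_layout() {
    nodes=$1
    case "$nodes" in
        ''|*[!0-9]*) echo "usage: nomad_layout <node-count>" >&2; return 2 ;;
    esac
    if [ "$nodes" -eq 1 ] || [ "$nodes" -eq 3 ]; then
        # Small clusters: every node runs both roles
        echo "server+client on all $nodes node(s)"
    elif [ "$nodes" -eq 0 ] || [ "$nodes" -eq 2 ]; then
        # Two nodes cannot keep quorum after a single failure
        echo "unsupported"
        return 1
    elif [ "$nodes" -le 11 ]; then
        echo "3 servers, $((nodes - 3)) clients"
    else
        echo "split into two clusters"
    fi
}
```

For example, nomad_layout 5 prints "3 servers, 2 clients", while nomad_layout 2 prints "unsupported" and returns a non-zero status.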

Networking

  • the only requirement from nomad is low latency, so keep the servers in the same DC, or at least on the same continent
  • we use a firewall that blocks everything from outside and allows everything between the nodes and from our internal IPs; nomad does the rest.

Prepare servers

HDD mountpoints

/           50G
/home       the rest

In the future we may use /home differently!

OS

Our ansible is ready for Debian 10 and 11, but it should also run on Ubuntu, Mint, etc.

Packages

Python is required by ansible

Create the new cluster

The whole cluster is managed by ansible; no one should modify anything in any other way. Let’s say we are creating a new cluster called milky-way

Use latest release and template

  • make sure you are on the master branch at its latest version
  • create a new branch named after the cluster you will manage
    git checkout -b milky-way_create_cluster
    
  • create a new env from the template
    cp -a env/template env/milky-way
    

Edit parameters

  • env/milky-way/hosts - main file where changes should be done
  • env/milky-way/group_vars/all/firewall-cluster.yaml - update the list of IPs in the cluster
  • env/milky-way/group_vars/all/vault.yaml - update secrets with ansible-vault

The rest of the files are symlinks, because the variables are the same for all envs. If you have to modify one of them, replace the symlink with a copy of the original file and then make your changes. Or put the change into the hosts file with a comment explaining why…
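
Replacing a symlink with a copy can be done in one step. The helper below is a minimal sketch, assuming GNU coreutils (readlink -f) as shipped with Debian; the name unlink_to_copy is hypothetical and not part of the repository.

```shell
# Hypothetical helper: replace a symlink with a regular copy of its target,
# so the file can be edited for one env without touching the shared original.
unlink_to_copy() {
    link=$1
    [ -L "$link" ] || { echo "$link is not a symlink" >&2; return 1; }
    # Resolve the link to the real file (also works for relative symlinks)
    target=$(readlink -f "$link") || return 1
    [ -f "$target" ] || { echo "$target is not a regular file" >&2; return 1; }
    # Copy first, then rename over the link, so a failed copy leaves the link intact
    cp -a "$target" "$link.tmp" && mv -f "$link.tmp" "$link"
}
```

Call it with the path of the symlinked variable file inside env/milky-way/group_vars/all/ and then edit the resulting regular file.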

Run deploy

The deploy is pretty fast; it takes less than 15 minutes.

ansible-playbook -i env/milky-way/hosts nomad-deploy.yml --diff

Manual steps after install

After the deployment the cluster is fully working and production ready. In most cases, however, a few manual steps are still needed…

Configure wildcard in DNS

In our case the record will look like this

CNAME *.milky-way.easy2.cloud

The record points to all nodes in this cluster

Configure ACL

Check whether the ACL policies are present in the cluster

# check nomad acl policies
nomad acl policy list

If the list does not contain developers, admins, … run the following command

sh ~/acl/init-policy.sh

To create the users’ tokens, you can use this for loop

# Create one admins token per member of the admins group
for usr in $(grep '^admins:' /etc/group | cut -d: -f4 | tr ',' ' '); do
    echo "Create token for user $usr"
    nomad acl token create -name="${usr}" -policy=admins
    echo ""
done

Do NOT run this loop more than once: it creates new tokens without checking whether a user already has one!
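
If the loop has to be re-run, a guard that skips users who already have a token avoids duplicates. This is a sketch, not something shipped in ~/acl: the function names are hypothetical, and it assumes that nomad acl token list accepts a Go template via -t and that tokens are named after the user, as in the loop above.

```shell
# Succeed if an ACL token with exactly this name already exists.
# Assumes the current NOMAD_TOKEN is allowed to list tokens.
token_exists() {
    nomad acl token list -t '{{range .}}{{.Name}}{{"\n"}}{{end}}' \
        | grep -Fxq -- "$1"
}

# Create an admins token for the user only when none exists yet.
create_token_once() {
    if token_exists "$1"; then
        echo "Token for $1 already exists, skipping"
    else
        nomad acl token create -name="$1" -policy=admins
    fi
}
```

Inside the for loop, call create_token_once "$usr" instead of running nomad acl token create directly.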

Deploy traefik and nomad-proxy

Deploy traefik first, because nomad-proxy uses it

  • Clone the repository where we store the templates
    # clone repository
    git clone git@git.easy.cz:nomad/templates
    

You have to be able to access nomad via the nomad CLI tool (you may run it from the first master node).

If you want to run it remotely, you need the nomad binary installed locally and these env variables exported:

  • NOMAD_TOKEN=
  • NOMAD_ADDR=
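
A small pre-flight check can catch a missing variable before any nomad command is run. The function name below is hypothetical, and the commented export lines use placeholders; only the port 4646 (Nomad’s default HTTP port) is a known value.

```shell
# Placeholders only; fill in the real address and token for your cluster:
#   export NOMAD_ADDR=https://<server>:4646
#   export NOMAD_TOKEN=<your-token>

# Hypothetical pre-flight check: fail early when the CLI is not configured.
nomad_env_ready() {
    if [ -z "${NOMAD_ADDR:-}" ] || [ -z "${NOMAD_TOKEN:-}" ]; then
        echo "NOMAD_ADDR and NOMAD_TOKEN must both be exported" >&2
        return 1
    fi
}
```

Running nomad_env_ready && nomad node status then either lists the nodes or tells you which setup step is missing.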

Deploy traefik

cd jobs/traefik-ssl/
nomad run deploy.nomad
  • check the proxy (how to verify it is not documented yet)

Deploy nomad proxy

  • go to the directory with the template

    cd templates/jobs/nomad-proxy/
    

  • Prepare new variable file

    # create variable file for new cluster
    cat > var_milky-way.tf<<EOF
    fqdn = "nomad.milky-way.easy2.cloud"
    dcs  = ["milky-way"]
    EOF
    

  • Deploy proxy to nomad

    nomad run -var-file=var_milky-way.tf deploy.nomad
    

  • check the proxy and then commit the new file to the repository

Deploy loki, prometheus and promtail

This lives in a different repository and is a bit more complex. Please follow the readme in the repository below to complete this installation…

Restart everything

Now restart the whole cluster and check its state after the reboot

Check the cluster

  • all the nodes should be up and operational
  • all the deployments should be up
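
The node check can be scripted with the nomad CLI. The helper below is a sketch under one assumption: that Status is the last column of the nomad node status table. The function name is hypothetical.

```shell
# Print every node whose status is not "ready"; succeed only if there are none.
# Assumes the first output line is a header and Status is the last column.
all_nodes_ready() {
    nomad node status \
        | awk 'NR > 1 && $NF != "ready" { print; bad = 1 } END { exit bad }'
}
```

Alongside it, nomad server members shows the state of the servers and nomad job status lists the jobs, which covers the second bullet.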