## Adding a static node to a cluster
You can add a static node manually or using the Cluster API Provider Static (CAPS).
### Adding a static node manually
To add a bare-metal server to a cluster as a static node, follow these steps:
1. Use the existing NodeGroup custom resource or create a new one, setting the `Static` or `CloudStatic` value for the `nodeType` parameter.

   Example of a NodeGroup resource named `worker`:

   ```yaml
   apiVersion: deckhouse.io/v1
   kind: NodeGroup
   metadata:
     name: worker
   spec:
     nodeType: Static
   ```
2. Get the Base64-encoded script code to add and configure the node.

   Example command to get the Base64-encoded script code to add a node to the `worker` NodeGroup:

   ```shell
   NODE_GROUP=worker
   kubectl -n d8-cloud-instance-manager get secret manual-bootstrap-for-${NODE_GROUP} -o json | jq '.data."bootstrap.sh"' -r
   ```
3. Pre-configure the new node according to your environment specifics:

   - Add all the necessary mount points (NFS, Ceph, etc.) to the `/etc/fstab` file.
   - Install the necessary packages (for example, `ceph-common`).
   - Set up network connectivity between the new node and the other nodes of the cluster.
4. Connect to the new node over SSH and run the following command, inserting the Base64 string you got in step 2:

   ```shell
   echo <Base64-CODE> | base64 -d | bash
   ```
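When the script finishes, the node should register in the cluster and join its NodeGroup. As an optional check (a minimal sketch, assuming `kubectl` access to the cluster and the `worker` NodeGroup from the example above), you can list the nodes of the group and the NodeGroup status:

```shell
# List the nodes that joined the worker NodeGroup (Deckhouse labels nodes with their group name).
kubectl get nodes -l node.deckhouse.io/group=worker

# Check the NodeGroup status; the new node should be counted in the READY column.
kubectl get nodegroup worker
```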
### Adding a static node using CAPS
To learn more about Cluster API Provider Static (CAPS), refer to Configuring a node via CAPS.

To add a static node (a bare-metal server or a virtual machine) to a cluster using CAPS, follow these steps:
1. Allocate a server with an installed operating system (OS) and set up network connectivity. If necessary, install additional OS-specific packages and add mount points to use on the node.
2. Create a user (named `caps` in the following example) capable of running `sudo` by executing the following commands on the server:

   ```shell
   useradd -m -s /bin/bash caps
   usermod -aG sudo caps
   ```
3. Allow the user to run `sudo` commands without having to enter a password. To do that, add the following line to the `sudo` configuration on the server, either by editing the `/etc/sudoers` file, by running the `sudo visudo` command, or via any other method:

   ```text
   caps ALL=(ALL) NOPASSWD: ALL
   ```
4. Generate a pair of SSH keys with an empty passphrase on the server using the following command:

   ```shell
   ssh-keygen -t rsa -f caps-id -C "" -N ""
   ```

   The public and private keys of the `caps` user will be stored in the `caps-id.pub` and `caps-id` files in the current directory on the server.

5. Add the generated public key to the `/home/caps/.ssh/authorized_keys` file of the `caps` user by running the following commands in the directory storing the keys on the server:

   ```shell
   mkdir -p /home/caps/.ssh
   cat caps-id.pub >> /home/caps/.ssh/authorized_keys
   chmod 700 /home/caps/.ssh
   chmod 600 /home/caps/.ssh/authorized_keys
   chown -R caps:caps /home/caps/
   ```
6. Create an SSHCredentials resource in the cluster:

   1. To access the added server, CAPS requires the private key of the `caps` service user. The Base64-encoded key is added to the SSHCredentials resource.

      To encode the private key to Base64, run the following command in the user key directory on the server:

      ```shell
      base64 -w0 caps-id
      ```

   2. On any computer configured to manage the cluster, create an environment variable with the value of the Base64-encoded private key you generated earlier. To prevent the key from being saved in the shell history, add a whitespace character at the beginning of the command:

      ```shell
       CAPS_PRIVATE_KEY_BASE64=<PRIVATE_KEY_IN_BASE64>
      ```

   3. Create an SSHCredentials resource with the service user name and the associated private key:

      ```shell
      d8 k create -f - <<EOF
      apiVersion: deckhouse.io/v1alpha1
      kind: SSHCredentials
      metadata:
        name: static-0-access
      spec:
        user: caps
        privateSSHKey: "${CAPS_PRIVATE_KEY_BASE64}"
      EOF
      ```
7. Create a StaticInstance resource in the cluster.

   The StaticInstance resource defines the IP address of the static node server and the data required to access the server:

   ```shell
   d8 k create -f - <<EOF
   apiVersion: deckhouse.io/v1alpha1
   kind: StaticInstance
   metadata:
     name: static-0
   spec:
     # Specify the static node server's IP address.
     address: "<SERVER-IP>"
     credentialsRef:
       kind: SSHCredentials
       name: static-0-access
   EOF
   ```
8. Create a NodeGroup resource in the cluster:

   ```shell
   d8 k create -f - <<EOF
   apiVersion: deckhouse.io/v1
   kind: NodeGroup
   metadata:
     name: worker
   spec:
     nodeType: Static
     staticInstances:
       count: 1
   EOF
   ```
9. Wait until the NodeGroup resource is in the `Ready` state. To check the resource state, run the following command:

   ```shell
   d8 k get ng worker
   ```

   In the NodeGroup state, 1 node should appear in the `READY` column:

   ```console
   NAME     TYPE     READY   NODES   UPTODATE   INSTANCES   DESIRED   MIN   MAX   STANDBY   STATUS   AGE   SYNCED
   worker   Static   1       1       1                                                              15m   True
   ```
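After completing the steps above, you can optionally verify each created resource from a host with cluster access. The commands below are a sketch that assumes the resource names used in this example (`static-0-access`, `static-0`, `worker`):

```shell
# SSHCredentials and StaticInstance resources created in the previous steps.
d8 k get sshcredentials static-0-access
d8 k get staticinstances static-0

# The bootstrapped node is labeled with the name of its NodeGroup.
d8 k get nodes -l node.deckhouse.io/group=worker
```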
### Adding a static node using CAPS and label selector filters
To connect different StaticInstance resources to different NodeGroup resources, you can use the label selector specified in the NodeGroup and StaticInstance metadata.

In the following example, three static nodes are distributed between two NodeGroup resources: one into the `worker` group, and the other two into the `front` group.
1. Prepare the required resources (three servers) and create the SSHCredentials resources for them, following the server preparation and SSHCredentials creation steps from the previous scenario.
2. Create two NodeGroup resources in the cluster.

   Specify `labelSelector`, so that only the corresponding servers can connect to each NodeGroup:

   ```shell
   d8 k create -f - <<EOF
   apiVersion: deckhouse.io/v1
   kind: NodeGroup
   metadata:
     name: front
   spec:
     nodeType: Static
     staticInstances:
       count: 2
       labelSelector:
         matchLabels:
           role: front
   ---
   apiVersion: deckhouse.io/v1
   kind: NodeGroup
   metadata:
     name: worker
   spec:
     nodeType: Static
     staticInstances:
       count: 1
       labelSelector:
         matchLabels:
           role: worker
   EOF
   ```
3. Create the StaticInstance resources in the cluster.

   Specify the actual IP addresses of the servers and set the `role` label in the metadata:

   ```shell
   d8 k create -f - <<EOF
   apiVersion: deckhouse.io/v1alpha1
   kind: StaticInstance
   metadata:
     name: static-front-1
     labels:
       role: front
   spec:
     address: "<SERVER-FRONT-IP1>"
     credentialsRef:
       kind: SSHCredentials
       name: front-1-credentials
   ---
   apiVersion: deckhouse.io/v1alpha1
   kind: StaticInstance
   metadata:
     name: static-front-2
     labels:
       role: front
   spec:
     address: "<SERVER-FRONT-IP2>"
     credentialsRef:
       kind: SSHCredentials
       name: front-2-credentials
   ---
   apiVersion: deckhouse.io/v1alpha1
   kind: StaticInstance
   metadata:
     name: static-worker-1
     labels:
       role: worker
   spec:
     address: "<SERVER-WORKER-IP>"
     credentialsRef:
       kind: SSHCredentials
       name: worker-1-credentials
   EOF
   ```
4. To check the result, run the following command:

   ```shell
   d8 k get ng
   ```

   In the output, you should see a list of created NodeGroup resources, with static nodes distributed between them:

   ```console
   NAME     TYPE     READY   NODES   UPTODATE   INSTANCES   DESIRED   MIN   MAX   STANDBY   STATUS   AGE   SYNCED
   master   Static   1       1       1                                                              1h    True
   front    Static   2       2       2                                                              1h    True
   ```
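To see which servers were claimed into which group, you can also list the StaticInstance resources together with their labels (an optional check using the standard `--show-labels` flag):

```shell
d8 k get staticinstances --show-labels
```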
## How do I know if something went wrong?
If a node in a NodeGroup isn't updated (the `UPTODATE` value is less than the `NODES` value in the `kubectl get nodegroup` output), or you suspect other problems that may be related to the `node-manager` module, check the logs of the `bashible` service. The `bashible` service runs on each node managed by the `node-manager` module.

To view the logs of the `bashible` service, run the following command on the node:

```shell
journalctl -fu bashible
```
Example of output when the `bashible` service has performed all necessary actions:

```console
May 25 04:39:16 kube-master-0 systemd[1]: Started Bashible service.
May 25 04:39:16 kube-master-0 bashible.sh[1976339]: Configuration is in sync, nothing to do.
May 25 04:39:16 kube-master-0 systemd[1]: bashible.service: Succeeded.
```
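If the log is empty or the service does not seem to run at all, it may help to check the unit state first. These are standard systemd commands, shown here as a sketch:

```shell
# Current state of the bashible unit and its most recent log lines.
systemctl status bashible.service

# Limit the log output to the last hour.
journalctl -u bashible --since "1 hour ago"
```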
## Removing a node from a cluster
The procedure is valid both for a node configured manually (using the bootstrap script) and for a node configured using CAPS.

To disconnect a node from a cluster and clean up the server (VM), run the following command on the node:

```shell
bash /var/lib/bashible/cleanup_static_node.sh --yes-i-am-sane-and-i-understand-what-i-am-doing
```
## How do I clean up a node for adding to another cluster?
This is only necessary if you need to move a static node from one cluster to another. Note that these operations result in removing data from the local storage. If you only need to change a NodeGroup, follow the NodeGroup changing procedure instead.

If the node you are cleaning up has LINSTOR/DRBD storage pools, use the corresponding procedure of the `sds-replicated-volume` module to evict resources from the node and remove the LINSTOR/DRBD node.
To clean up a node for adding to another cluster, follow these steps:
1. Remove the node from the Kubernetes cluster:

   ```shell
   kubectl drain <node> --ignore-daemonsets --delete-local-data
   kubectl delete node <node>
   ```

2. Run the clean-up script on the node:

   ```shell
   bash /var/lib/bashible/cleanup_static_node.sh --yes-i-am-sane-and-i-understand-what-i-am-doing
   ```

3. After the node is restarted, it can be added to another cluster.
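Before adding the server to another cluster, you can optionally confirm from the old cluster that the Node object was removed in step 1 (using the same `<node>` placeholder as above); the command should return a `NotFound` error:

```shell
kubectl get node <node>
```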
## FAQ
### Can I delete a StaticInstance?
A StaticInstance in the `Pending` state can be deleted safely.

To delete a StaticInstance in any state other than `Pending` (such as `Running`, `Cleaning`, or `Bootstrapping`), do the following (example commands follow the list):

1. Add the `"node.deckhouse.io/allow-bootstrap": "false"` label to the StaticInstance.
2. Wait until the StaticInstance status changes to `Pending`.
3. Delete the StaticInstance.
4. Decrease the value of the `NodeGroup.spec.staticInstances.count` parameter by 1.
5. Wait until the NodeGroup is in the `Ready` state.
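As a reference, the label and count changes from the list above can be applied with standard `kubectl`-style commands via `d8 k`. This is a sketch assuming the `static-0` StaticInstance and the `worker` NodeGroup from the earlier example:

```shell
# 1. Forbid bootstrapping on the StaticInstance.
d8 k label staticinstances static-0 node.deckhouse.io/allow-bootstrap=false --overwrite

# 2-3. Wait until the status becomes Pending, then delete the StaticInstance.
d8 k get staticinstances static-0
d8 k delete staticinstances static-0

# 4. Decrease spec.staticInstances.count in the NodeGroup by 1 (here: from 1 to 0).
d8 k patch nodegroup worker --type merge -p '{"spec":{"staticInstances":{"count":0}}}'
```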
### How do I change the IP address of a StaticInstance?
You cannot change the IP address in the StaticInstance resource. If an incorrect address is specified in the StaticInstance, you have to delete the StaticInstance and create a new one.
### How do I migrate a manually configured static node under CAPS control?
You need to clean up the node and then hand it over under CAPS control.
### How do I change the NodeGroup of a static node?
If a node is under CAPS control, you can’t change the NodeGroup membership of such a node. The only way is to delete a StaticInstance and create a new one.
To switch an existing manually added static node to another NodeGroup, change its group label and delete its role label using the following commands:

```shell
kubectl label node --overwrite <node_name> node.deckhouse.io/group=<new_node_group_name>
kubectl label node <node_name> node-role.kubernetes.io/<old_node_group_name>-
```
Applying the changes will take some time.
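To confirm the move once the changes have been applied, you can filter the nodes by the group label (using the same placeholder as above):

```shell
kubectl get nodes -l node.deckhouse.io/group=<new_node_group_name>
```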
### How do I know what is running on a node while it is being created?
To find out what's happening on a node (for example, when it's taking too long to create or is stuck in the `Pending` state), you can check the `cloud-init` logs.

To do that, follow these steps:
1. Find the node that is currently bootstrapping:

   ```shell
   kubectl get instances | grep Pending
   ```

   An output example:

   ```console
   dev-worker-2a6158ff-6764d-nrtbj   Pending   46s
   ```
2. Get information about connection parameters for viewing the logs:

   ```shell
   kubectl get instances dev-worker-2a6158ff-6764d-nrtbj -o yaml | grep 'bootstrapStatus' -B0 -A2
   ```

   An output example:

   ```yaml
   bootstrapStatus:
     description: Use 'nc 192.168.199.178 8000' to get bootstrap logs.
     logsEndpoint: 192.168.199.178:8000
   ```
3. To view the `cloud-init` logs for diagnostics, run the command you got (`nc 192.168.199.178 8000` according to the example above).

   The logs of the initial node configuration are located at `/var/log/cloud-init-output.log`.
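If you are connected to the node itself, the same initial configuration log can be followed directly; a minimal example:

```shell
tail -f /var/log/cloud-init-output.log
```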