Use context: kubectl config use-context k8s-c3-CCC
There seems to be an issue with the kubelet not running on cluster3-node1. Fix it and confirm that the cluster has node cluster3-node1 available in Ready state afterwards. You should be able to schedule a Pod on cluster3-node1 afterwards.
Write the reason for the issue into /opt/course/18/reason.txt.
Troubleshooting a Non-Responsive Kubernetes Node: A Step-by-Step Guide
In a Kubernetes cluster, ensuring that all nodes are operational is crucial for the stability and performance of your applications. When a node becomes unresponsive or enters a “NotReady” state, it can cause disruptions. In this guide, we’ll walk through the process of troubleshooting a non-responsive node, identifying issues with the kubelet service, and resolving them.
Step 1: Checking Node Status
The first step in troubleshooting a non-responsive node is to check its status using kubectl get nodes:

    kubectl get node

Example output:

    NAME                     STATUS     ROLES           AGE   VERSION
    cluster3-controlplane1   Ready      control-plane   14d   v1.30.1
    cluster3-node1           NotReady   <none>          14d   v1.30.1

Here, we see that cluster3-node1 is in the NotReady state, indicating that the node is unresponsive.
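On a cluster with many nodes, the affected ones can be filtered out of that listing. The following is a minimal sketch, assuming the default column layout of `kubectl get nodes --no-headers` (name first, status second); the function name `not_ready_nodes` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` style lines on stdin.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage (assumes kubectl points at the affected cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Because the comparison is exact, a cordoned node reported as `Ready,SchedulingDisabled` is listed as well, which is usually what you want when looking for nodes that cannot take new Pods.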
Step 2: Checking the Kubelet Service
The kubelet is the primary “node agent” that runs on each node in the cluster. If the kubelet service is not running, the node will not be able to communicate with the control plane. First, SSH into the problematic node and check if the kubelet is running:

    ssh cluster3-node1
    ps aux | grep kubelet

If the kubelet is not running, check its status using systemd:

    service kubelet status

Example output:

    ● kubelet.service - kubelet: The Kubernetes Node Agent
         Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
        Drop-In: /usr/lib/systemd/system/kubelet.service.d
                 └─10-kubeadm.conf
         Active: inactive (dead) (Result: exit-code) since Thu 2024-01-04 13:12:54 UTC; 1h 23min ago
           Docs: https://kubernetes.io/docs/

In this case, the kubelet service is inactive, so we try to start it:

    service kubelet start

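If the start attempt does not bring the service up, the unit's recent journal entries are a natural next place to look. The sketch below assumes a systemd-based node where `systemctl` and `journalctl` are available; the function name `kubelet_diagnose` and the default of 20 lines are illustrative choices, not part of the original walkthrough.

```shell
#!/usr/bin/env bash
# Show the last N journal lines for the kubelet unit, but only when the
# unit is not currently active. N defaults to 20.
kubelet_diagnose() {
  local lines="${1:-20}"
  # `systemctl is-active --quiet` returns non-zero unless the unit is active.
  if ! systemctl is-active --quiet kubelet; then
    journalctl -u kubelet --no-pager -n "$lines"
  fi
}

# Usage on the node:
#   kubelet_diagnose 50
```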
Step 3: Analyzing Kubelet Startup Issues
If the kubelet fails to start, check the output of the service status command for errors. One common issue is a misconfigured path to the kubelet binary. You can manually attempt to run the kubelet binary to verify its location:

    /usr/local/bin/kubelet
    -bash: /usr/local/bin/kubelet: No such file or directory
    whereis kubelet
    kubelet: /usr/bin/kubelet

In this case, the kubelet binary was incorrectly specified as /usr/local/bin/kubelet, but the correct path is /usr/bin/kubelet.
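The same check can be made directly against the service configuration rather than by typing the path by hand. This is a rough sketch, assuming the binary is named on an `ExecStart=` line of the drop-in file and that an empty `ExecStart=` reset line may precede it; the helper names `exec_start_binary` and `check_exec_start` are hypothetical.

```shell
#!/usr/bin/env bash
# Print the binary named on the last non-empty ExecStart= line of a unit file.
exec_start_binary() {
  local unit_file="$1"
  awk -F= '/^ExecStart=./ { split($2, parts, " "); bin = parts[1] }
           END { if (bin != "") print bin }' "$unit_file"
}

# Report whether that binary exists and is executable.
check_exec_start() {
  local bin
  bin="$(exec_start_binary "$1")"
  if [ -z "$bin" ]; then
    echo "no ExecStart binary found"
    return 1
  elif [ -x "$bin" ]; then
    echo "ok: $bin"
  else
    echo "missing: $bin"
    return 1
  fi
}

# Usage on the node:
#   check_exec_start /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
```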
Step 4: Correcting the Kubelet Service Configuration
To fix the issue, edit the kubelet service configuration file and correct the path to the kubelet binary:

    vim /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf

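For a non-interactive alternative to editing the file in vim, the path can be rewritten with sed. This is a sketch under the assumption that the wrong path appears literally as /usr/local/bin/kubelet in the file; the function name `fix_kubelet_path` is hypothetical, and a `.bak` copy of the original is kept alongside it.

```shell
#!/usr/bin/env bash
# Replace the wrong kubelet path with the correct one in a unit file.
# sed writes a backup of the original file with a .bak suffix.
fix_kubelet_path() {
  local unit_file="$1"
  sed -i.bak 's#/usr/local/bin/kubelet#/usr/bin/kubelet#g' "$unit_file"
}

# Usage on the node:
#   fix_kubelet_path /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
```

Using `#` as the sed delimiter avoids having to escape every slash in the two paths.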
After updating the path, reload the systemd daemon and restart the kubelet service:

    systemctl daemon-reload
    service kubelet restart
    service kubelet status  # Check if it's now running

The kubelet should now be running correctly, and the node should return to a Ready state after a few moments.
Step 5: Verifying Node Status
After fixing the kubelet service, check the status of the node again to ensure it’s back to normal:

    kubectl get node

Expected output:

    NAME                     STATUS   ROLES           AGE   VERSION
    cluster3-controlplane1   Ready    control-plane   14d   v1.30.1
    cluster3-node1           Ready    <none>          14d   v1.30.1

The node cluster3-node1 should now be in the Ready state, indicating that it has successfully rejoined the cluster.
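Instead of re-running the listing until the status changes, `kubectl wait` can block until the node reports the Ready condition. The wrapper below is a small sketch; the function name `wait_node_ready` and the 120-second default timeout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Block until the given node reports condition Ready, or the timeout expires.
# The timeout defaults to 120s.
wait_node_ready() {
  local node="$1" timeout="${2:-120s}"
  kubectl wait --for=condition=Ready "node/${node}" --timeout="${timeout}"
}

# Usage (assumes kubectl points at the affected cluster):
#   wait_node_ready cluster3-node1
```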
Step 6: Documenting the Issue
Finally, it’s important to document the cause of the issue and the steps taken to resolve it. This information can be valuable for future reference or for other team members:

    # /opt/course/18/reason.txt
    wrong path to kubelet binary specified in service config

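One way to produce that file from the command line is sketched below. The helper name `write_reason` is hypothetical; it creates the parent directory if needed and overwrites any existing file.

```shell
#!/usr/bin/env bash
# Write a one-line reason into the given file, creating its directory first.
write_reason() {
  local reason_file="$1" reason="$2"
  mkdir -p "$(dirname "$reason_file")"
  printf '%s\n' "$reason" > "$reason_file"
}

# Usage:
#   write_reason /opt/course/18/reason.txt "wrong path to kubelet binary specified in service config"
```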
Conclusion
In this guide, we’ve walked through troubleshooting a non-responsive Kubernetes node by checking the kubelet service, identifying a misconfiguration, and resolving the issue. Ensuring that all nodes are in a Ready state is essential for maintaining the health and performance of your Kubernetes cluster.