Asking For Help: Kubernetes Pods Unable To Reach Each Other (Flannel, Networking)
I am not entirely sure whether this sub is also intended for asking questions, but after posting the question on Stack Overflow and only getting an AI-generated answer, I thought it would be worth a shot to ask here. What follows is a rather long question, but most of it is debugging output included to preempt the obvious follow-up questions.
I have created a small Kubernetes cluster (one control-plane node and six workers) using kubeadm and Flannel as the CNI in an OpenStack project. This is my first time running a Kubernetes cluster with more than a single node.
I set up the cluster's control-plane node with the following Ansible tasks:
```yaml
# tasks file for kubernetes_master
- name: Install required packages
  apt:
    name:
      - curl
      - gnupg2
      - software-properties-common
      - apt-transport-https
      - ca-certificates
    state: present
    update_cache: yes
- name: Install Docker
  apt:
    name: docker.io
    state: present
    update_cache: yes
- name: Remove Keyrings Directory (if it exists)
  ansible.builtin.shell: rm -rf /etc/apt/keyrings
- name: Remove Existing Kubernetes Directory (if it exists)
  ansible.builtin.shell: sudo rm -rf /etc/apt/sources.list.d/pkgs_k8s_io_core_stable_v1_30_deb.list
- name: Disable swap
  ansible.builtin.command:
    cmd: swapoff -a
#- name: Ensure swap is disabled on boot
#  ansible.builtin.command:
#    cmd: sudo sed -i -e '/\/swap.img\s\+none\s\+swap\s\+sw\s\+0\s\+0/s/^/#/' /etc/fstab
- name: Ensure all swap entries are disabled on boot
  ansible.builtin.command:
    cmd: sudo sed -i -e '/\s\+swap\s\+/s/^/#/' /etc/fstab
- name: Add kernel modules for Containerd
  ansible.builtin.copy:
    dest: /etc/modules-load.d/containerd.conf
    content: |
      overlay
      br_netfilter
- name: Load kernel modules for Containerd
  ansible.builtin.shell:
    cmd: modprobe overlay && modprobe br_netfilter
  become: true
- name: Add kernel parameters for Kubernetes
  ansible.builtin.copy:
    dest: /etc/sysctl.d/kubernetes.conf
    content: |
      net.bridge.bridge-nf-call-ip6tables = 1
      net.bridge.bridge-nf-call-iptables = 1
      net.ipv4.ip_forward = 1
- name: Load kernel parameter changes
  ansible.builtin.command:
    cmd: sudo sysctl --system
- name: Configuring Containerd (building the configuration file)
  ansible.builtin.command:
    cmd: sudo sh -c "containerd config default > /opt/containerd/config.toml"
- name: Configuring Containerd (Setting SystemdCgroup Variable to True)
  ansible.builtin.command:
    cmd: sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /opt/containerd/config.toml
- name: Reload systemd configuration
  ansible.builtin.command:
    cmd: systemctl daemon-reload
- name: Restart containerd service
  ansible.builtin.service:
    name: containerd
    state: restarted
- name: Allow 6443/tcp through firewall
  ansible.builtin.command:
    cmd: sudo ufw allow 6443/tcp
- name: Allow 2379:2380/tcp through firewall
  ansible.builtin.command:
    cmd: sudo ufw allow 2379:2380/tcp
- name: Allow 22/tcp through firewall
  ansible.builtin.command:
    cmd: sudo ufw allow 22/tcp
- name: Allow 8080/tcp through firewall
  ansible.builtin.command:
    cmd: sudo ufw allow 8080/tcp
- name: Allow 10250/tcp through firewall
  ansible.builtin.command:
    cmd: sudo ufw allow 10250/tcp
- name: Allow 10251/tcp through firewall
  ansible.builtin.command:
    cmd: sudo ufw allow 10251/tcp
- name: Allow 10252/tcp through firewall
  ansible.builtin.command:
    cmd: sudo ufw allow 10252/tcp
- name: Allow 10255/tcp through firewall
  ansible.builtin.command:
    cmd: sudo ufw allow 10255/tcp
- name: Allow 5473/tcp through firewall
  ansible.builtin.command:
    cmd: sudo ufw allow 5473/tcp
- name: Enable the firewall
  ansible.builtin.ufw:
    state: enabled
- name: Reload the firewall
  ansible.builtin.command:
    cmd: sudo ufw reload
- name: Prepare keyrings directory and update permissions
  file:
    path: /etc/apt/keyrings
    state: directory
    mode: '0755'
- name: Download Kubernetes GPG key securely
  ansible.builtin.shell: curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
- name: Add Kubernetes repository
  ansible.builtin.apt_repository:
    repo: "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /"
    state: present
- name: Install kubeadm, kubelet, kubectl
  ansible.builtin.apt:
    name:
      - kubelet
      - kubeadm
      - kubectl
    state: present
    update_cache: yes
- name: Hold kubelet, kubeadm, kubectl packages
  ansible.builtin.command:
    cmd: sudo apt-mark hold kubelet kubeadm kubectl
- name: Replace /etc/default/kubelet contents
  ansible.builtin.copy:
    dest: /etc/default/kubelet
    content: 'KUBELET_EXTRA_ARGS="--cgroup-driver=cgroupfs"'
- name: Reload systemd configuration
  ansible.builtin.command:
    cmd: systemctl daemon-reload
- name: Restart kubelet service
  ansible.builtin.service:
    name: kubelet
    state: restarted
- name: Update System-Wide Profile for Kubernetes
  ansible.builtin.copy:
    dest: /etc/profile.d/kubernetes.sh
    content: |
      export KUBECONFIG=/etc/kubernetes/admin.conf
      export ANSIBLE_USER="sysadmin"
# only works if not executing on master
#- name: Reboot the system
#  ansible.builtin.reboot:
#    msg: "Reboot initiated by Ansible for Kubernetes setup"
#    reboot_timeout: 150
- name: Replace Docker daemon.json configuration
  ansible.builtin.copy:
    dest: /etc/docker/daemon.json
    content: |
      {
        "exec-opts": ["native.cgroupdriver=systemd"],
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "100m"
        },
        "storage-driver": "overlay2"
      }
- name: Reload systemd configuration
  ansible.builtin.command:
    cmd: systemctl daemon-reload
- name: Restart Docker service
  ansible.builtin.service:
    name: docker
    state: restarted
- name: Update Kubeadm Environment Variable
  ansible.builtin.command:
    cmd: sudo sed -i -e '/^\[Service\]/a Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"' /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
- name: Reload systemd configuration
  ansible.builtin.command:
    cmd: systemctl daemon-reload
- name: Restart kubelet service
  ansible.builtin.service:
    name: kubelet
    state: restarted
- name: Pull kubeadm container images
  ansible.builtin.command:
    cmd: sudo kubeadm config images pull
- name: Initialize Kubernetes control plane
  ansible.builtin.command:
    cmd: kubeadm init --pod-network-cidr=10.244.0.0/16
    creates: /tmp/kubeadm_output
  register: kubeadm_init_output
  become: true
  changed_when: false
- name: Set permissions for Kubernetes Admin
  file:
    path: /etc/kubernetes/admin.conf
    state: file
    mode: '0755'
- name: Store Kubernetes initialization output to file
  copy:
    content: "{{ kubeadm_init_output.stdout }}"
    dest: /tmp/kubeadm_output
  become: true
  delegate_to: localhost
- name: Generate the Join Command
  ansible.builtin.shell: cat /tmp/kubeadm_output | tail -n 2 | sed ':a;N;$!ba;s/\\\n\s*/ /g' > /tmp/join-command
  delegate_to: localhost
- name: Set permissions for the Join Executable
  file:
    path: /tmp/join-command
    state: file
    mode: '0755'
  delegate_to: localhost
```
I then manually rebooted the node and installed Flannel via `kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml`.
Workers are created in a similar way (without Flannel). I omit their playbook for now, but I can add it if it seems important.
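For completeness, this is roughly how the network config Flannel is running with can be checked against the `--pod-network-cidr` passed to `kubeadm init` (a minimal sketch; depending on the manifest version the ConfigMap lives in the kube-flannel or kube-system namespace):

```sh
# Print the net-conf.json that the flannel DaemonSet reads; with the default
# manifest this should show Network 10.244.0.0/16 and Backend type "vxlan",
# matching the --pod-network-cidr used during kubeadm init.
# Adjust -n to kube-system if an older manifest layout is in use.
kubectl get configmap kube-flannel-cfg -n kube-flannel \
  -o jsonpath='{.data.net-conf\.json}'
```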
I then ran into DNS resolution issues with a Helm chart, which is why I started investigating the networking and noticed that pods on different nodes are unable to ping each other.
I am unsure how to debug this issue further.
Debug Info
```
kubectl get nodes
NAME           STATUS   ROLES           AGE     VERSION
k8s-master-0   Ready    control-plane   4h38m   v1.30.14
k8s-worker-0   Ready    <none>          4h35m   v1.30.14
k8s-worker-1   Ready    <none>          4h35m   v1.30.14
k8s-worker-2   Ready    <none>          4h35m   v1.30.14
k8s-worker-3   Ready    <none>          4h35m   v1.30.14
k8s-worker-4   Ready    <none>          4h35m   v1.30.14
k8s-worker-5   Ready    <none>          4h34m   v1.30.14
```
```
kube-flannel-ds-275hx   1/1   Running   0   150m   192.168.33.149   k8s-worker-0   <none>   <none>
kube-flannel-ds-2rplc   1/1   Running   0   150m   192.168.33.38    k8s-worker-5   <none>   <none>
kube-flannel-ds-2w98x   1/1   Running   0   150m   192.168.33.113   k8s-worker-1   <none>   <none>
kube-flannel-ds-g4vb6   1/1   Running   0   150m   192.168.33.167   k8s-worker-4   <none>   <none>
kube-flannel-ds-mpwbz   1/1   Running   0   150m   192.168.33.163   k8s-worker-2   <none>   <none>
kube-flannel-ds-qmbgc   1/1   Running   0   150m   192.168.33.117   k8s-master-0   <none>   <none>
kube-flannel-ds-sgdgs   1/1   Running   0   150m   192.168.33.243   k8s-worker-3   <none>   <none>
```
```
ip addr show flannel.1
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default
    link/ether a2:4a:11:1f:84:ef brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::a04a:11ff:fe1f:84ef/64 scope link
       valid_lft forever preferred_lft forever
```
```
ip route
default via 192.168.33.1 dev ens3 proto dhcp src 192.168.33.117 metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
10.244.3.0/24 via 10.244.3.0 dev flannel.1 onlink
10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink
10.244.5.0/24 via 10.244.5.0 dev flannel.1 onlink
10.244.6.0/24 via 10.244.6.0 dev flannel.1 onlink
169.254.169.254 via 192.168.33.3 dev ens3 proto dhcp src 192.168.33.117 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.33.0/24 dev ens3 proto kernel scope link src 192.168.33.117 metric 100
192.168.33.1 dev ens3 proto dhcp scope link src 192.168.33.117 metric 100
192.168.33.2 dev ens3 proto dhcp scope link src 192.168.33.117 metric 100
192.168.33.3 dev ens3 proto dhcp scope link src 192.168.33.117 metric 100
192.168.33.4 dev ens3 proto dhcp scope link src 192.168.33.117 metric 100
```
```
kubectl run -it --rm dnsutils --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
If you don't see a command prompt, try pressing enter.
Address 1: 10.96.0.10
nslookup: can't resolve 'kubernetes.default'
pod "dnsutils" deleted
pod default/dnsutils terminated (Error)
```
```
kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS        AGE
coredns-55cb58b774-6vb7p   1/1     Running   1 (4h19m ago)   4h38m
coredns-55cb58b774-wtrz6   1/1     Running   1 (4h19m ago)   4h38m
```
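Since the CoreDNS pods themselves are Running, my assumption is that the failed nslookup is a symptom of the pod-to-pod connectivity problem shown below rather than a CoreDNS problem. One way to check that would be to query a CoreDNS pod IP directly instead of the 10.96.0.10 service VIP (sketch only; the pod IP placeholder needs to be replaced with a real one):

```sh
# Get the pod IPs of the CoreDNS replicas.
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide

# Query one CoreDNS pod directly, bypassing the kube-dns service VIP.
# Replace 10.244.0.x with an actual pod IP from the previous command.
kubectl run -it --rm dnsutils --image=busybox:1.28 --restart=Never -- \
  nslookup kubernetes.default 10.244.0.x
```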
Ping Test
```
ubuntu@k8s-master-0:~$ kubectl run pod1 --image=busybox:1.28 --restart=Never --command -- sleep 3600
pod/pod1 created
ubuntu@k8s-master-0:~$ kubectl run pod2 --image=busybox:1.28 --restart=Never --command -- sleep 3600
pod/pod2 created
ubuntu@k8s-master-0:~$ kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          15m   10.244.5.2   k8s-worker-1   <none>           <none>
pod2   1/1     Running   0          15m   10.244.4.2   k8s-worker-3   <none>           <none>
ubuntu@k8s-master-0:~$ kubectl exec -it pod1 -- sh
/ # ping 10.244.5.2
PING 10.244.5.2 (10.244.5.2): 56 data bytes
64 bytes from 10.244.5.2: seq=0 ttl=64 time=0.107 ms
64 bytes from 10.244.5.2: seq=1 ttl=64 time=0.091 ms
64 bytes from 10.244.5.2: seq=2 ttl=64 time=0.090 ms
^C
--- 10.244.5.2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.090/0.096/0.107 ms
/ # 10.244.4.2
sh: 10.244.4.2: not found
/ # ping 10.244.4.2
PING 10.244.4.2 (10.244.4.2): 56 data bytes
^C
--- 10.244.4.2 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
/ # exit
command terminated with exit code 1
```
If I understand Flannel correctly, it is expected that pods on different nodes land in different per-node /24 subnets, and the routes via flannel.1 shown above should take care of forwarding the traffic between them.
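Given that, my best guess is that the encapsulated traffic between the nodes is being dropped somewhere (the playbook above only opens TCP ports in ufw, and the OpenStack security groups might also filter it). This is roughly how I would try to confirm that, as a sketch; Flannel's VXLAN backend uses UDP port 8472 by default:

```sh
# Confirm the VXLAN settings of the flannel interface (VNI and UDP port).
ip -d link show flannel.1

# Check which forwarding-database entries flannel has programmed for the
# other nodes (one entry per peer node is expected).
bridge fdb show dev flannel.1

# While pinging pod2 (10.244.4.2) from pod1, watch for encapsulated VXLAN
# packets on the node's primary interface. If packets leave k8s-worker-1
# but never arrive on k8s-worker-3, something between the nodes
# (ufw, OpenStack security group) is dropping UDP 8472.
sudo tcpdump -ni ens3 udp port 8472
```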