Troubleshooting DNS on Kubernetes with NSX-T

17. June 2020

After integrating NSX-T with K8S, I sometimes run into issues with coredns not working.

Common root cause: the K8S internal DNS infrastructure needs non-NAT'ed network access from container PODs to the K8S Nodes and vice versa. As the default behaviour of NSX-T NCP is to NAT your K8S Namespaces, this can, depending on your overall architecture, cause connection issues.

In general, the issue described here only occurs when your K8S overlay networks are connected to a different Distributed Router than the management ports of the K8S Nodes (see 2.4).
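
A quick way to confirm such a NAT boundary is to watch the traffic arriving on a node while pinging from a POD (the interface name ens192 is an assumption, use your node's management NIC; the dnsutils pod is deployed in 2.1):

vm@k8s-node1:~$ sudo tcpdump -ni ens192 icmp
vm@k8s-master:~$ kubectl exec -i -t dnsutils -- ping 172.16.30.11 -c1

If nothing arrives, or the packets carry your Gateway's SNAT IP instead of the POD IP, POD traffic crosses a NAT boundary on its way to the node.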

1. Short Version

Ensure no Distributed Firewall rule is blocking DNS traffic 😉

  • try to ping your K8S Nodes (management port) from a container pod
  • try to ping your K8S container pod from your K8S Nodes
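
Using the lab addresses from the detailed analysis below (172.16.30.11 is a node management IP, 10.0.1.34 a coredns POD IP), the two checks look like this:

vm@k8s-master:~$ kubectl exec -i -t dnsutils -- ping 172.16.30.11 -c4
vm@k8s-master:~$ ping 10.0.1.34 -c4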

Both ways must work. If not, identify the Gateway doing NAT for your K8S environment and implement a NO-SNAT rule between your Container Network (Source) and your K8S Node Management network (Destination), prior to all other SNAT rules.

2. Detailed analysis

2.1 Issue

For troubleshooting, you can deploy the dnsutils pod:

vm@k8s-master:~$ kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
pod/dnsutils created

and check if it's running:

vm@k8s-master:~$ kubectl get pods dnsutils
NAME       READY   STATUS    RESTARTS   AGE
dnsutils   1/1     Running   0          [time]

Verify that internal name resolution fails:

vm@k8s-master:~$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
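
It is also worth confirming that the pod actually points at the kube-dns ClusterIP; on a default kubeadm setup the output should look similar to this (10.96.0.10 matches the lab below):

vm@k8s-master:~$ kubectl exec -i -t dnsutils -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5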

2.2 coredns Deployment / kube-dns Service

The kube-dns service exists:

vm@k8s-master:~$ kubectl get service -nkube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   [time]

but no endpoints are shown

vm@k8s-master:~$ kubectl get endpoints kube-dns -nkube-system
NAME       ENDPOINTS   AGE
kube-dns   <none>      [time]

The coredns PODs are running, but not ready:

vm@k8s-master:~$ kubectl get pods -nkube-system -o wide -lk8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE      IP          NODE         NOMINATED NODE   READINESS GATES
coredns-5d999964c5-bpnbj   0/1     Running   0          [time]   10.0.1.34   k8s-master   <none>           <none>
coredns-5d999964c5-kwjbg   0/1     Running   0          [time]   10.0.1.37   k8s-node1    <none>           <none>
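
To see why the PODs stay unready, check the events and logs. With a kubeadm deployment the readiness probe is an HTTP GET against /ready on port 8181, and it keeps failing here, presumably because coredns cannot reach the API server across the NAT boundary:

vm@k8s-master:~$ kubectl describe pods -nkube-system -lk8s-app=kube-dns
vm@k8s-master:~$ kubectl logs -nkube-system -lk8s-app=kube-dns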

A ping to the K8S Node management IPs from the dnsutils pod will fail:

vm@k8s-master:~$ kubectl exec -i -t dnsutils -- ping 172.16.30.11 -c4
PING 172.16.30.11 (172.16.30.11): 56 data bytes

--- 172.16.30.11 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
command terminated with exit code 1

2.3 Check the NSX-T Distributed Firewall (DFW)

Use NSX-T Traceflow to check whether the DFW might be blocking DNS traffic from your PODs to the coredns PODs. Access to UDP/TCP port 53 on all coredns PODs should work. As the previous ping also failed, you should ensure it's not getting blocked by the DFW…

NSX-T Traceflow: dnsutils pod UDP/5000 to coredns pod UDP/53
Traceflow looks awesome 🙂 DFW isn’t blocking traffic.
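
You can cross-check this from the CLI by querying a coredns POD directly by IP (10.0.1.34 from the POD list above), which exercises the same UDP/53 path the DFW would have to block:

vm@k8s-master:~$ kubectl exec -i -t dnsutils -- dig @10.0.1.34 kubernetes.default.svc.cluster.local

A timeout would point at the DFW or POD-to-POD routing; any answer, even SERVFAIL (coredns is running but cannot sync with the API server yet), confirms that port 53 between PODs is open.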

2.4 K8S / NSX-T deployment options

There are different options for how K8S PODs / Namespaces and K8S Node management ports can be connected. This determines where traffic coming from container PODs is NATed. Whenever there is a NAT boundary between container PODs and the K8S Node management ports, this DNS issue can occur. Figure B and Figure D should therefore not be affected.

Figure A: shared T1, Node Management on T0
Figure B: shared T1, Node Management on same T1
Figure C: shared T1, Node Management out of NSX
Figure D: per-namespace T1 linked to T0, Node Management on T0
Figure E: per-namespace T1 linked to T0, Node Management out of NSX

2.5 Resolution

Identify the Gateway responsible for NAT of your K8S environment and create a NO-SNAT rule between your Container Network (Source) and your K8S Node Management network (Destination), prior to all other SNAT rules. In my K8S / NSX-T post, the Container Network was referred to as K8S-Container-Network.
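
The rule can be created in the NSX-T UI or via the Policy API. A minimal sketch, assuming a Tier-0 Gateway with ID t0-k8s, POD network 10.0.0.0/16 and node management network 172.16.30.0/24 (all placeholders; for a Tier-1 Gateway use the analogous /infra/tier-1s/... path). The low sequence_number places the rule before the existing SNAT rules:

curl -k -u admin -X PATCH \
  https://nsx-manager.corp.local/policy/api/v1/infra/tier-0s/t0-k8s/nat/USER/nat-rules/k8s-no-snat-dns \
  -H 'Content-Type: application/json' \
  -d '{
        "action": "NO_SNAT",
        "source_network": "10.0.0.0/16",
        "destination_network": "172.16.30.0/24",
        "sequence_number": 10,
        "enabled": true
      }'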

Restart the coredns deployment:

vm@k8s-master:~$ kubectl rollout restart deployment coredns -nkube-system
deployment.apps/coredns restarted

The coredns pods should come up and become ready:

vm@k8s-master:~$ kubectl get pods -nkube-system -o wide -lk8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE   IP          NODE        NOMINATED NODE   READINESS GATES
coredns-78d4fb7fff-99h8c   1/1     Running   0          98s   10.0.1.35   k8s-node2   <none>           <none>
coredns-78d4fb7fff-v6n5r   1/1     Running   0          98s   10.0.1.38   k8s-node1   <none>           <none>

The kube-dns endpoints should exist now:

vm@k8s-master:~$ kubectl get endpoints kube-dns -nkube-system
NAME       ENDPOINTS                                            AGE
kube-dns   10.0.1.35:53,10.0.1.38:53,10.0.1.35:9153 + 3 more…   [time]

K8S internal DNS should work now. Forward lookup:

vm@k8s-master:~$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53

Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1

and reverse lookup:

vm@k8s-master:~$ kubectl exec -i -t dnsutils -- nslookup 10.96.0.10
10.0.96.10.in-addr.arpa name = kube-dns.kube-system.svc.cluster.local.

