Linux DNS stopped working

The newest Linux/Ubuntu 22.04+ DNS resolve stack is a fucking spaghetti of resolvconf, systemd-resolved, Network Manager and 34724 other cryptic shit. In the good old days you would add nameserver to /etc/resolv.conf and be done with it. Now you can’t even edit the file since it’s a symlink to ../run/resolvconf/resolv.conf with nameserver 127.0.0.53 which is a DNS server running locally on your machine by systemd-resolved.

If you see something like this, read on:

$ ping www.google.com
ping: www.google.com: Name or service not known
$ host -v www.google.com
Trying "www.google.com"
;; connection timed out; no servers could be reached
$ host -v www.google.com 127.0.0.53
Trying "www.google.com"
;; connection timed out; no servers could be reached
$ host -v www.google.com 8.8.8.8
Trying "www.google.com"
;; now works

How DNS resolution work in Linux as of 2024

As a rule of thumb, most programs (ping, ssh, firefox) use the library calls getaddrinfo or gethostbyname provided by glibc. glibc then employs GNU Name Service Switch (NSS) to perform a lookup. NSS is configured via a file called /etc/nsswitch.conf to determine the lookup priority and policy of how to perform different lookups.

The important line from that file is this one:

hosts:          files mdns4_minimal [NOTFOUND=return] dns mymachines

See man page for nsswitch.conf. The rule says:

  1. files: Consult a file for known host names, which is /etc/hosts.
  2. If nothing is found: try mdns4_minimal which calls nss-mdns to perform a .local mDNS lookup (more on this later)
  3. [NOTFOUND=return] says: if mdns4_minimal was not able to resolve .local domain, stop the lookup: no point resolving .local in a DNS server since DNS servers don’t contain such records. However, if mdns4_minimal is not able to resolve www.google.com this rule is ignored and next approach is tried out.
  4. dns: ask a DNS server mentioned in /etc/resolv.conf. Nowadays it’s 127.0.0.53:53 which goes to local systemd-resolved.

Some programs are designed to specifically perform DNS-server requests only; those are host, dig and resolvectl query. They skip the NSS configuration and go directly to the DNS, which these days is systemd-resolved. They usually fail to resolve .local stuff since DNS is not responsible for holding .local records. That’s why host can fail to resolve machines on your LAN while ping can. But more on this later.

systemd-resolved

Since /etc/resolv.conf isn’t powerful enough to handle all sorts of crazy scenarios like VPNs, different network interfaces having different DNS servers, VMs, a different solution was designed: systemd-resolved. systemd-resolved handles DNS resolution in modern Linux distros. It exposes a local (your-machine-only) DNS server on 127.0.0.53:53; /etc/resolv.conf then contains nameserver 127.0.0.53 which tells all Linux commands to resolve DNS via systemd-resolved.

To learn of the systemd-resolved DNS routing scheme, type in the following:

$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (enp2s0f0)
Current Scopes: none
     Protocols: -DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 5 (wlp3s0)
    Current Scopes: DNS
         Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.0.1
       DNS Servers: 192.168.0.1

Link 6 (virbr0)
Current Scopes: none
     Protocols: -DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 7 (docker0)
Current Scopes: none

Since DNS is NOT responsible for .local resolution, generally you want -mDNS and -LLMNR (those services disabled). They’re disabled by default.

However systemd-resolved should resolve all other DNS names. You can test this out by using the following commands:

$ host www.google.com
$ host www.google.com 8.8.8.8
$ dig www.google.com
$ resolvectl query www.google.com

Additional DNS servers (e.g. local dnsmasq)

You can force systemd-resolved to use additional DNS servers you need (e.g. a local dnsmasq server). Edit /etc/systemd/resolved.conf and set DNS=127.0.0.1 (or other IP; see resolved.conf man pages for more info). Then, restart systemd-resolved and check that your DNS server is now in effect:

$ sudo systemctl restart systemd-resolved
$ resolvectl status

systemd-resolved will now try your custom DNS server first, but it will fall back to whatever it used before, since additional DNS servers are coming from your link (via dhclient or network manager). This is exactly what we want: first try my custom DNS, but then fallback to a DNS that can resolve everything, so that my network resolution continues to work correctly.

Troubleshooting

Print DNS servers for all links via resolvectl status. Make sure you can ping the DNS servers by their IP address (that they are actually accessible from your client machine).

You can view its logs and/or status, maybe they will reveal something of interest:

$ journalctl -u systemd-resolved.service
$ systemctl status systemd-resolved.service

Alternatively, try to restart the systemd-resoved service:

$ sudo systemctl restart systemd-resolved.service
$ resolvectl status

wireguard + Ubuntu 22.10

If you run wg-quick up mywg and suddenly DNS stops working (while it worked on Ubuntu 22.04), the reason is that resolvectl is now configured a bit differently when wireguard conf file contains a DNS entry.

  • On Ubuntu 22.04, any wireguard DNS entry would go into the Global section of resolvectl status, and your wireguard link’s Current Scopes would read none
  • On Ubuntu 22.10, the wireguard DNS entry goes into your wireguard link, sets the Current Scopes to DNS and adds DNS Domain: ~. which somehow breaks your DNS.

Workaround: comment out the DNS entry in your wireguard conf file.

Local/mDNS/LLMNR

The .local top-level-domain (TLD) is reserved to be used for your LAN. Every machine has its own hostname, and if configured properly, it is accessible on your LAN by the name of hostname.local.

There are two competing standards, mDNS/multicast DNS and LLMNR; both use the same principle of listening to broadcasts. mDNS is primarily used by Linux and Apple, LLMNR is used primarily by Windows; I’ll focus on mDNS.

mDNS performs lookup by broadcasting on UDP 224.0.0.251:5353 which sends the query to all machines on your LAN. All mDNS-capable server machines on LAN use avahi-daemon to listen on UDP port 5353 and reply to the original sender with its IP address if the lookup targets that particular machine.

mDNS Resolvers

In modern Linux desktops, you have multiple mDNS resolvers:

  1. nss-mdns plugs as GNU Name Service Switch (NSS) into glibc and resolves mDNS via avahi-daemon (see StackExchange). You can test this resolver via avahi-resolve --name foo.local. All linux programs including ping and ssh uses this one.
  2. host, dig and resolvectl query uses systemd-resolved to resolve mDNS.

Since ping uses avahi-daemon, it’s possible that ping is able to resolve foo.local stuff while host and resolvectl query can’t, since they go through systemd-resolved.

Remember that even though systemd-resolved is technically capable of performing mDNS resolution, it really shouldn’t. It’s the right configuration to have mDNS turned off in systemd-resolved.

Enabling nss-mdns

Enabling this one is more important than systemd-resolved since all Linux programs except DNS clients such as host and dig use this method.

See nss-mdns: Activation. In order to activate this nss module you need to edit /etc/nsswitch.conf and make sure mdns4 or mdns4_minimal are included in the hosts: line. Also make sure libnss-mdns is installed: sudo apt install libnss-mdns.

Your resolvectl query foo.local/host/dig will fail to resolve the local server by default. That’s recommended, but maybe in your use-case you may need to enable mDNS. First, run $ resolvectl status and check whether mDNS is enabled. If not, see ArchLinux Wiki on enabling mDNS:

  1. sudo vim /etc/systemd/resolved.conf and enable MulticastDNS in [Resolve].
  2. Restart systemd-resolved: sudo systemctl restart systemd-resolved.service

mDNS from a virtual machine

The broadcast may not traverse more complex network setups which pretty much include VM networks. One solution is to run the VM in the bridge networking mode, but beware: this exposes your VM to all machines on your LAN. For example:

  • Linux-on-Linux KVM with virtio and NAT: the guest can only resolve the host but not any other machine on the network.
  • Linux-on-Mac UTM with virtio-net-pci and Shared Network: the same situation, the guest can only resolve the host but not any other machine on the network. See UTM Network docs for more details.
    • Switching to bridged mode solved this: it exposed the VM as if running on the LAN along with the host MacBook, and mDNS lookups started to pick up other machines on the LAN. The VM name was also resolvable from other LAN machines and was even pingable.

Another way could be NAT traversal but I have no idea whether mDNS can be configured that way.

Further Down The Rabbit Hole

I haven’t touched VPNs and other more complex stuff, please see the gnome.org blogpost on understanding DNS for more details.

Written on October 15, 2022