I'll preface this by saying that these issues are on a Surface Tablet that I've been using to play around with, so I haven't been too diligent in documenting what changes were made when.
I've got a Surface Go 2 tablet with the LTE modem that I installed Linux Mint onto several months ago. When I first made the switch, cellular connectivity seemed very "touch and go" but Wi-Fi had been solid.
Roughly six months ago, I switched my home network to Control D for DNS resolution. After about two months I decided it wasn't what I wanted and went back to my default setup: a UniFi UCG Max gateway using the AdGuard public DNS servers, coupled with the gateway's built-in ad blocking. This feeds a separate Wi-Fi mesh network in my home.
About a month ago I noticed that the tablet could no longer reach internet sites when connected to my home Wi-Fi, but it could still access other computers on my LAN just fine, so Wi-Fi itself was working. Cellular connectivity seemed to have stopped working entirely, even though I ran the "lte_modem_fix" that is on GitHub and was seeing several bars of signal in the status bar.
Websites were inaccessible (Firefox gave me an error saying there was no network connection), but while trying anything I could think of I found that I could still visit the Control D website, even though I stopped subscribing months ago.
On a lark I pulled up my Mullvad VPN app which I have an active subscription to and it let me connect to a server. As soon as I did this, ALL internet sites became available.
Next I took the tablet with me away from home, disabled Wi-Fi and activated the cellular network. Again the bars appeared but I couldn't access any sites. I loaded up Mullvad and was able to connect, after which I could reliably connect to all internet sites. Again, cellular connectivity was never 100% but Wi-Fi was.
How do I even begin troubleshooting and fixing this? Needing a VPN isn't the end of the world, but when at home it gets in the way of accessing local computers so I'd like to get to where the tablet works on Wi-Fi or cellular, with and without a VPN active.
I feel like I had a problem very much like this with Debian Testing on my Surface Go 1 (and I think my desktop too) a couple of years back, and it turned out there were issues with /etc/nsswitch.conf. I can't remember exactly what I did, but these are the current contents of that file:
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: files systemd
group: files systemd
shadow: files
gshadow: files
hosts: files mdns4_minimal [NOTFOUND=RETURN] dns myhostname
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis
Compare yours - maybe even post it so I can try to reproduce the issue on my machine. Anyhow, hope it helps, and good luck.
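For comparing files quickly, here is a minimal Python sketch that pulls the lookup order out of the hosts: line. The function name hosts_lookup_order is made up for illustration, and the sample string is just the hosts: line from the file above.

```python
def hosts_lookup_order(nsswitch_text):
    """Return the sources and actions on the 'hosts:' line, in order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []  # no hosts: line found


# Sample taken from the file posted above.
sample = "hosts: files mdns4_minimal [NOTFOUND=RETURN] dns myhostname\n"
order = hosts_lookup_order(sample)
print(order)
```

To check a real machine, pass in the text of /etc/nsswitch.conf instead of the sample. If dns is missing from the list, or sits behind something that returns early for ordinary names, that would be worth a closer look.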
My nsswitch.conf file looks identical to yours, so nothing to edit there.
I also looked at my resolv.conf and systemd/resolved.conf files.
resolv.conf is a symlink, but it is the only one of the two with any uncommented lines:
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 127.0.0.53
options edns0 trust-ad
search .
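A file like the one above can be summarised with a few lines of Python. This is only a sketch: parse_resolv_conf and uses_local_stub are hypothetical helper names, and the sample text is a shortened copy of the file above.

```python
import ipaddress


def parse_resolv_conf(text):
    """Collect nameserver, options and search entries, skipping comments."""
    nameservers, options, search = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, *values = line.split()
        if key == "nameserver" and values:
            nameservers.append(values[0])
        elif key == "options":
            options.extend(values)
        elif key == "search":
            search.extend(values)
    return {"nameservers": nameservers, "options": options, "search": search}


def uses_local_stub(config):
    """True when every listed nameserver is a loopback address."""
    servers = config["nameservers"]
    return bool(servers) and all(
        ipaddress.ip_address(server).is_loopback for server in servers
    )


# Shortened copy of the file posted above.
sample = """# Do not edit.
nameserver 127.0.0.53
options edns0 trust-ad
search .
"""
config = parse_resolv_conf(sample)
print(config["nameservers"], uses_local_stub(config))
```

When the only nameserver is a loopback address such as 127.0.0.53, applications are talking to a local stub resolver and this file does not show the real upstream servers. With systemd-resolved those are listed in /run/systemd/resolve/resolv.conf, or by the "resolvectl status" command that the file's own comments mention.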
Ok, so does the VPN bring its own DNS? Some VPNs do, so it may explain why everything suddenly works fine when you connect.
When not connected to VPN, are you able to dig or nslookup internet names? Local names? A server timeout looks very different from an NXDOMAIN or an empty SOA in the response.
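If dig isn't handy, the same distinction can be roughly approximated from Python. This is a sketch, not a replacement for dig: it goes through the system resolver (so nsswitch.conf and the local stub are involved), classify_lookup is a made-up name, and the mapping of error codes to causes assumes glibc on Linux.

```python
import socket


def classify_lookup(name, resolver=socket.getaddrinfo):
    """Resolve a name and say whether it worked, was not found, or timed out."""
    try:
        results = resolver(name, None)
    except socket.gaierror as exc:
        if exc.errno == socket.EAI_NONAME:
            # The resolver answered, but the name does not exist (NXDOMAIN-like).
            return "not-found"
        if exc.errno == socket.EAI_AGAIN:
            # No usable answer came back: server unreachable or timing out.
            return "temporary-failure"
        return f"error: {exc}"
    addresses = sorted({info[4][0] for info in results})
    return "ok: " + ", ".join(addresses)


if __name__ == "__main__":
    # localhost is answered from /etc/hosts, so this needs no network.
    print(classify_lookup("localhost"))
```

Running classify_lookup against an internet name and then a LAN name, with the VPN off, should show whether the resolver is answering at all ("not-found") or never getting a reply ("temporary-failure").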
Are you able to telnet to a public web server on TCP/443?
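If telnet isn't installed, a plain TCP connect from Python does the same job. A minimal sketch, with can_connect as a hypothetical helper name; it reports a name lookup failure separately from a failed connection, which is the distinction that matters here.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Try a TCP connection and report whether DNS or the connection failed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        # Must be caught before OSError, which is its parent class.
        return False, f"name lookup failed: {exc}"
    except OSError as exc:
        # Covers refused connections, timeouts and unreachable networks.
        return False, f"connection failed: {exc}"
```

Calling it once with a hostname and once with that server's IP address as a string separates the two cases: if the IP connects and the hostname reports a name lookup failure, routing is fine and only DNS is broken.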
One thought I'm having is, maybe at some point you set a static IP on your wifi interface, but screwed up the subnetting.
Have you ever messed with NetworkManager or systemd-resolved internal settings, maybe trying to set up multicast DNS or caching?
Yes, I believe that Mullvad routes you to their DNS server, so that explains why it works when connected to VPN. If I attempt an nslookup when NOT connected to VPN it fails, and the server it attempts to contact is 127.0.0.53. When I connect to VPN the nslookup succeeds, and it uses the same server address.
I then disconnect from VPN and ping the IP address that I just looked up (I chose etsy) and the ping goes through, so this seems to be a DNS lookup issue. Is 127.0.0.53 the right server address? I would expect it to use my DHCP server address in the 192.168.x.x format.
Ok, so something set up 127.0.0.53 as your DNS server, and isn't removing it correctly. I think it's safe to say it's Mullvad, since it works using that DNS server IP when connected. Is that IP in your resolv.conf, or is resolv.conf maybe a stub, and you're using systemd-resolved?
Test the network from the lowest level if you haven't already, using ping and the IPv4 address of a common server (for instance, ping 8.8.8.8) to bypass DNS.
If it works, your DNS is borked.
If it doesn't, then there's something more fundamentally wrong with your network configuration. I'd guess it was an issue with the gateway IP address, which would mean it can't figure out how to get to the wider Internet, although it seems super-weird to have that happening with DHCP in the mix. Maybe you left some vestiges of your old configuration behind in a file that your admin GUI doesn't clean up and it's overriding DHCP, I don't know.
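For the gateway guess, one way to see what the kernel actually has as a default route is to read /proc/net/route. The sketch below assumes a little-endian machine (true for x86 tablets like the Surface), and the interface name wlan0 and the addresses in the sample are invented for illustration; default_gateways is a hypothetical helper name.

```python
import socket
import struct


def default_gateways(route_table_text):
    """Return (interface, gateway IP) for each default route in the table."""
    gateways = []
    for line in route_table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        iface, destination, gateway = fields[0], fields[1], fields[2]
        if destination == "00000000":  # destination 0.0.0.0 is the default route
            # The address is stored as little-endian hex on x86.
            packed = struct.pack("<L", int(gateway, 16))
            gateways.append((iface, socket.inet_ntoa(packed)))
    return gateways


# Invented sample in the /proc/net/route layout.
sample = """Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
wlan0 00000000 0101A8C0 0003 0 0 600 00000000 0 0 0
wlan0 0001A8C0 00000000 0001 0 0 600 00FFFFFF 0 0 0
"""
print(default_gateways(sample))
```

On the tablet itself, pass in the contents of /proc/net/route. An empty list, or a gateway that isn't the router's address, would point at the routing problem described above rather than at DNS.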
Thanks for the tip. If I bypass DNS it does appear to work so that's likely the problem. I need to figure out why now and I think it has something to do with a local DNS override of some sort.