1mo ago

Reverse proxy without a single point of failure

I'm thinking of expanding my homelab to support running some paid SaaS projects out of my house, and so I need to start thinking about uptime guarantees.

I want to set up a cluster where every service lives on at least two machines, so that no single machine dying can take a service down. The problem is the reverse proxy: the router still has to point port 443 at a single fixed IP address running Caddy, and that machine will always be a single point of failure. How would I go about running two or more reverse proxy servers with failover?

I'm guessing the answer has something to do with the router, and possibly getting a more advanced router or running an actual OS on the router that can handle failover. But at that point the router is a single point of failure! And yes, that's unavoidable... but I'm reasonably confident that the unmodified commodity router I've used for years is unlikely to spontaneously die, whereas I've had very bad luck with cheap fanless and single-board computers, so anything I buy to use as an advanced router is just a new SPOF and I might as well have used it for the reverse proxy.

38 comments

My personal opinion: as soon as you’re charging and providing SLAs, you’ve exceeded what you should be doing on a residential ISP.
I’d really recommend putting your app in a real cloud solution, which can provide actual load balancing via DNS natively for regional failover if you desire.
- I feel like op is about to find out why businesses pay for cloud services.
- I get it, and I've seen this response other places I've asked about this too. But a license agreement can just offer refunds for downtime, it doesn't have to promise any specific amount of availability. For small, cheap, experimental subscription apps, that should be enough; it's not like I'm planning on selling software to businesses or hosting anything that users would store critically important data in. The difference in cost between home servers and cloud hosting is MASSIVE. It's the difference between being able to make a small profit on small monthly subscriptions, versus losing hundreds or thousands per month until subscriber numbers go up.
  (also fwiw this entire plan is dependent on getting fiber internet, which should be available in my area soon; without fiber it would be impractical to run something like this from home)
  
  You aren't going to get high reliability unless you spend big time. Instead, could you just offer uptime during business hours? Maybe give yourself a window to do planned changes.
  
  This will blow up in your face. You know enough to be dangerous but not enough to know that uptime is very hard.
  AWS or Azure really isn't that expensive if you are just running a VM with some containers. You don't need to overthink it. Create a VM and spin up some Docker containers.
  
  That's not the point. It's unprofessional. Someone is going to smash and grab OP's idea and actually have the skills to host it properly. Probably at a fraction of the cost, because OP doesn't understand that hosting SaaS products out of his house isn't professional or effective.
  Also, cloud is cheaper than self hosting at any small amount of scale. This wouldn't cost much to run in AWS if built properly. The people who struggle with AWS costs are not professionals and have no business hosting anything.

Keepalived to set up a floating IP between two proxy hosts. The VIP is where the traffic points to, the two hosts act as active/passive HA.
- Looking into this a little, it might be what I need. The documentation I've found on this says it uses VRRP, which creates a "virtual" IP address; will that be different from the machine's own IP address? And will an ordinary router be able to forward a port to this kind of virtual IP address without any special configuration?
  
  Yes. Your machines would have one main IP address, and one virtual IP address that would be assigned to either machine depending on the priority or health check status. That IP can be on the same physical interface, or a separate one. It’s very flexible, pretty standard config for high availability setups.
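
To make the active/passive idea above concrete, here is a toy model in Python. It is not keepalived's code, just a sketch of the rule it applies: among the proxies that pass their health check, the one with the highest priority holds the virtual IP. In real VRRP the current holder sends periodic advertisements on the LAN and the backup claims the VIP when they stop arriving. The host names and priority values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProxyNode:
    name: str             # hypothetical host name
    priority: int         # higher wins, similar to a VRRP priority
    healthy: bool = True  # result of a local health check (e.g. "is the proxy answering?")


def elect_vip_holder(nodes):
    """Return the healthy node with the highest priority, or None if all are down."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.priority)


proxy_a = ProxyNode("proxy-a", priority=150)
proxy_b = ProxyNode("proxy-b", priority=100)
nodes = [proxy_a, proxy_b]

before = elect_vip_holder(nodes).name  # proxy-a holds the VIP while it is healthy
proxy_a.healthy = False                # proxy-a dies or its health check fails
after = elect_vip_holder(nodes).name   # proxy-b takes over the VIP
print(before, "->", after)
```

The router's port forward never changes in this picture: it keeps pointing at the VIP, and only the machine answering for that address changes.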

Additional SPoFs: Your upstream internet connection, your modem/router, electricity supply, your home (not burning, flooded, collapsed, etc.). And you.

Congrats, you're officially at the point where you should probably be looking at Kubernetes. High availability, failover, and load balancers. It's a steep learning curve, but if you're looking for this level of availability you're probably ready for it.
- Already considering using Kube, though I haven't read much about it yet. Does it support this specific use case (making multiple servers share a single LAN IP with failover), in a way that an ordinary router can use that IP without special configuration?
  
  I use k3s as my base with Istio to handle routing, so each node has the same ports open and Istio is the proxy. Internally there's a load balancer to distribute to whatever pod the traffic needs to go to. Outside the cluster, DNS is my only single point of failure, but it routes to multiple hosts. I doubt you'd have trouble finding a DNS that can do that. I don't think you can get much more separated from single points of failure than that.
  
  You want proper Kubernetes. Kube is for learning and testing purposes only. In Kubernetes there are plenty of different Ingress services available depending on your provider. I would look into something like Traefik or MetalLB.

I do this with HAProxy and keepalived. My DNS servers resolve my domains to a single virtual IP that keepalived manages. If one HAProxy node goes down, the other picks right up.
And this is one of the few things I’ve got set up with Ansible, so deploying and making changes is pretty easy.

SLAs?
You're going to need a redundant ISP and a generator. You've left the territory where it's economical to self host something if that's what you're looking at. You still have several other single points of failure.
And I'll be honest, your setup isn't ready for an SLA either. Just having a second machine is such a small part of what you need to do before doing any guarantees. Are you using a Dynamic DNS service? What's the networking setup look like? Router to Compute?
From the sounds of it, you're not a professional. It might be time to engage an expert if you want to grow this.

No, the router being the SPOF (single point of failure) is totally avoidable.
At my home (no SaaS services offered, but services critical "enough" for my life) I have two different ISPs on two different technologies: one is FTTC via copper cable (aka the successor to good old ADSL), plus an FWA 5G link (much faster but with a data cap). Those two are connected to one OPNsense router (which, indeed, is a SPOF at this time). But you can also remove this SPOF by adding a second OPNsense and tying the two together in failover.
So the setup would be:
FTTC -> ISP1 router -> LAN cable 1 to port 1 of OPNsense n.1
FTTC -> ISP1 router -> LAN cable 2 to port 1 of OPNsense n.2
FWA -> ISP2 router -> LAN cable 1 to port 2 of OPNsense n.1
FWA -> ISP2 router -> LAN cable 2 to port 2 of OPNsense n.2
Then on both OPNsense boxes I would set up failover multi-WAN and bridge them together so that one dying will trigger the second one.
edit: fixed small errors

Disappointed to see the cloud people preaching uptime when most cloud offerings have severe downtime issues weekly.
Stop living in a bubble.
Github was down yesterday and that isn't fun.
Stuff still goes down all the time on the cloud. More than on prem in my experience.
They don't even properly track their downtime and lie about 99.9%.
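
For reference, this is what a figure like 99.9% works out to in allowed downtime. A minimal sketch, assuming a 365-day year; the function name is made up for illustration. At 99.9% the budget is roughly 8.76 hours per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours of downtime per year permitted by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% -> {allowed_downtime_hours(pct):.2f} hours/year")
```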

So you have 2 or 3 SPOFs: your home internet, your home router, and your reverse proxy container.
You can solve most of that with a second internet connection on its own router and some k3s/k8s.
Your current router points to one container, then you have your second router point to the other container. You can use DNS load balancing to share the connections over your 2 internet connections.
Depending on your monitoring system, if a connection goes down you could then trigger a DNS update to remove the offline connection from DNS. You will have to set a low TTL on the record so the change takes effect more quickly.
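
A rough sketch of the monitoring half of that idea, in Python. It probes each public IP on port 443 and returns the list of addresses that should stay in the DNS record. The IPs are placeholders from the documentation ranges, the function names are made up, and the step that actually pushes the result to a DNS provider is left out because it depends entirely on the provider's API. The check would also need to run from somewhere outside the home network to be meaningful.

```python
import socket


def tcp_probe(ip, port=443, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def records_to_publish(wan_ips, probe=tcp_probe):
    """Keep only the public IPs that currently pass the health check.

    If every probe fails, keep the full list: the monitor itself may be
    the broken part, and publishing an empty record helps nobody.
    """
    alive = [ip for ip in wan_ips if probe(ip)]
    return alive or list(wan_ips)


# Hypothetical public IPs, one per internet connection.
WAN_IPS = ["203.0.113.10", "198.51.100.20"]

# Simulate the second connection being down with a fake probe, so the
# example runs without touching the network.
published = records_to_publish(WAN_IPS, probe=lambda ip: ip != "198.51.100.20")
print(published)
```

Even with a short TTL, clients and resolvers that cached the old record keep trying the dead address until it expires, so failover this way is measured in minutes rather than seconds.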

OPNsense and HAProxy might be a place to start; they work well together. You can define a backend pool of servers for round-robin, and if you buy a block of IPs you can round-robin the incoming requests as well. I run OPNsense as a VM so that I can use Proxmox's high availability service for the router, and it'll fail over, or I can manually live-migrate it if I'm doing maintenance. You can VLAN the servers off from the rest of the network as well with OPNsense, and set up VPNs there for clients if needed, or use the SDN functions in the hypervisor to segregate servers if you're running them on the hypervisor.

This is a rabbit hole that's going to be very expensive. Caddy isn't going to do what you are wanting. You likely need enterprise systems which are complex and require at least 3 machines.
I would use AWS or Azure instead

The term you're looking for is load balancing. DNS load balancing will work fine for your purposes. Use a DNS host that supports health checks to the endpoints, and you're all set. If one endpoint goes down, the DNS host stops returning the downed host's address when the record is queried.
- For what OP is asking DNS has no part in DNAT, they need a load balancer.
  Personally, asking about high uptime on a residential ISP is the larger issue here, but alas.
- I don't think this is it. The router doesn't know anything about DNS, it only knows "this port goes to this IP address". It seems like I either need multiple devices sharing an IP address or router software that inherently supports load balancing.
  
  You just described a load balancer. The router doesn't know about DNS but clients using your service use DNS. You can do some simple load balancing behind DNS. If you want to do it by IP address you want a load balancer though.
  
  If your current router doesn't support static DNS entries or advanced management of them, you could run a DNS service, or just get a router that runs OpenWRT. GL.Inet makes solid devices for relatively cheap.
  
  You need something like HAProxy or Traefik.
