YSK: Subscriber counts on communities only show the number of users subscribed from your specific instance. The real number might be much larger than you think.
This is an interesting problem with federation by design. I do wonder if there's some space to create a pipeline-type application that shares this kind of data. Or an integration with the site you listed.
I'm not convinced it's a federation issue; it seems more like it's by design. After all, it does show you the active user counts. Presumably you could get the total subscriber count just by making an API call to the community's home instance to ask for it.
I'm going to share a sentence my father blew my mind with when I was 16:
"Unreliability is the internet's biggest, best feature."
By this, he meant that the internet is extremely fault tolerant: one server, one site, one component goes down, and the rest of it keeps working.
I think that's at play here. An instance can keep up with its own local members and subscribers; I imagine that's just a database operation, MySQL or something. But when trying to add up the total number of subscribers from other instances, very realistic problems start to pop up.
A member from Instance A subscribes to a community on Instance B. How does Instance B keep up with that subscription? A sends B a message that someone has subscribed, so it adds an entry to a "foreign subscribers" list? Cool. And I suppose an "unsubscribe" message would also be sent to remove that entry, right?
What if that user deletes their account or it's banned? What if Instance A just...shuts down one day and never boots back up? You'll end up with these ghost entries inflating numbers. It's not an easy problem to work around.
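To make the "foreign subscribers" list idea concrete, here is a minimal Python sketch. It is only an illustration of the bookkeeping described above, not how Lemmy actually stores this; the class and method names are made up, and I'm assuming subscribe/unsubscribe messages arrive as plain (instance, user) pairs.

```python
from collections import defaultdict


class CommunitySubscribers:
    """Hypothetical tracker for one community's remote subscribers."""

    def __init__(self):
        # instance domain -> set of user names subscribed from that instance
        self.remote = defaultdict(set)

    def subscribe(self, instance, user):
        # Instance A tells us one of its users subscribed.
        self.remote[instance].add(user)

    def unsubscribe(self, instance, user):
        # Also covers account deletion or a ban, if those are announced.
        self.remote[instance].discard(user)
        if not self.remote[instance]:
            del self.remote[instance]

    def drop_instance(self, instance):
        # If an instance is judged dead, remove all of its "ghost" entries.
        self.remote.pop(instance, None)

    def total(self):
        return sum(len(users) for users in self.remote.values())
```

Grouping entries by instance is a deliberate choice here: it makes the "Instance A never boots back up" case a single removal instead of a scan over every subscriber. The hard part, deciding when an instance is really gone, is not solved by this sketch.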
At a high level you've pretty much nailed what is happening.
What if that user deletes their account or it’s banned?
Lemmy federates these to let other instances know. Check the mod log (linked at the bottom of every Lemmy instance website) to see the record of this.
What if Instance A just…shuts down one day and never boots back up? You’ll end up with these ghost entries inflating numbers. It’s not an easy problem to work around
This is already an issue, but a solvable one. Currently some instances are blocking hundreds of other instances that used to exist but no longer do, because Lemmy keeps trying to contact them and when it fails it retries.
But the solution probably isn't that hard. Someone smarter than me can work it out, but I imagine it working something like this: retry every 5 mins for an hour, then every hour for a week, then don't retry until you get a new request from that instance (e.g. for one of their users to subscribe to a community on your instance).
In fact, Mastodon is a lot more mature than Lemmy and I expect it would have the same problem, so we can probably copy whatever their solution is.
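The retry schedule suggested above (every 5 minutes for an hour, every hour for a week, then stop) could be sketched roughly like this in Python. This is just one reading of that idea, not Lemmy's or Mastodon's actual behavior. The function name is hypothetical, and I'm assuming the week of hourly retries starts after the first hour ends.

```python
from datetime import timedelta

FIVE_MINUTES = timedelta(minutes=5)
ONE_HOUR = timedelta(hours=1)
ONE_WEEK = timedelta(weeks=1)


def next_retry_delay(time_since_first_failure):
    """Return how long to wait before the next retry, or None to stop retrying."""
    if time_since_first_failure < ONE_HOUR:
        return FIVE_MINUTES
    # Assumption: the week of hourly retries begins once the first hour is over.
    if time_since_first_failure < ONE_HOUR + ONE_WEEK:
        return ONE_HOUR
    # Give up until the remote instance contacts us again.
    return None
```

A caller would reset the failure timer whenever the remote instance sends a new request, which is what brings a "dead" instance back into rotation.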
I imagine the simplest solution would be to add up the subscriber count of each instance you're federated with and show a 'federated subscribers' count per community.
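As a rough Python sketch of that adding-up step: assume you already have, for one community, the subscriber count reported by each federated instance, with None standing in for an instance that didn't answer. How those counts get fetched is left out, and the function name is made up for illustration.

```python
def federated_subscriber_count(counts_by_instance):
    """Sum per-instance subscriber counts, skipping instances with no answer.

    Returns (total, missing), where missing lists the silent instances.
    """
    known = {inst: n for inst, n in counts_by_instance.items() if n is not None}
    missing = sorted(set(counts_by_instance) - set(known))
    return sum(known.values()), missing
```

Returning the list of silent instances alongside the total lets the UI say the number is a lower bound rather than pretending it is exact.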
The instance that the community is on has the total list, and since the active user count is accurate I presume it's already sending that information in some way. Easiest would be to include it with that data, I'd think.
I actually like the idea of a server that polls all the instances on some reasonable frequency (could even be just once a day), and then holds information about users and communities in aggregate. That way, all the instances could just go to that one place to see totals like this without each instance having to poll every other instance.
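A minimal Python sketch of that kind of aggregator, under heavy assumptions: fetch_counts is a hypothetical callable that does the actual polling of every instance and returns a dict of community name to total subscribers. Nothing here reflects a real Lemmy API. The sketch only shows the "poll at most once a day, serve cached totals in between" part.

```python
import time


class SubscriberAggregator:
    """Hypothetical cache of fediverse-wide subscriber totals."""

    def __init__(self, fetch_counts, max_age_seconds=24 * 60 * 60, clock=time.time):
        self._fetch_counts = fetch_counts  # assumed: returns {community: total}
        self._max_age = max_age_seconds    # default: refresh once a day
        self._clock = clock                # injectable so it can be tested
        self._totals = {}
        self._fetched_at = None

    def totals(self):
        """Return cached totals, re-polling only when the cache is stale."""
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._max_age
        if stale:
            self._totals = self._fetch_counts()
            self._fetched_at = now
        return self._totals
```

Instances would then query this one service instead of each polling every other instance, which is the overhead saving described above.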
That seems to add a single point of failure for some key functionality. And who owns that server? Can they be bought out by Meta pretending to be a good citizen?
I wouldn't call that functionality "key"; in fact, we're doing okay without it now. It would be an easy way to add some nice-to-have functionality without a lot of overhead.