sh.itjust.works Main Community @sh.itjust.works kersploosh @sh.itjust.works 1 yr. ago

Pushing back against the wave of bot accounts on Lemmy

Hi everyone. I wanted to share some Lemmy-related activism I’ve been up to. I got really interested in the apparent surge of bot accounts that happened in June. Recently, I was able to play a small part in removing some of them. Hopefully by getting the word out we can ensure Lemmy is a place for actual human users and not legions of spam bots.

First some background. This won't be new to many of you, but I'll include it anyway. During the week of June 18 to June 25, as the Reddit migration to Lemmy was in full swing, there was a surge of suspicious account creation on Lemmy instances that had open registration and no captcha or email verification. Hundreds of thousands of accounts appeared and then sat inactive. We can only guess what they’re for, but I assume they are being planted for future malicious use (spamming ads, subversive electioneering, influencing upvotes to drive content to our front pages, etc.)

If you look at the stats on The Federation you might notice that even the shape of the Total Users graph is the same across many instances. User numbers ramped up on June 18, grew almost linearly throughout the week, and peaked on June 24. (I’m puzzled by the slight drop at the end. I assume it's due to some smoothing or rate-sensitive averaging that The Federation uses for the graphs?)

Here are total user graphs for a few representative instances showing the typical shape:

Clearly this is suspicious, and I wasn’t the only one to notice. Lemmy.ninja documented how they discovered and removed suspicious accounts from this time period: (https://lemmy.ninja/post/30492). Several other posts detailed how admins were trying to purge suspicious accounts. From June 24 to June 30 The Federation showed a drop in the total number of Lemmy users from 1,822,313 to 1,589,412. That’s 232,901 suspicious accounts removed! Great success! Right?

Well, no, not yet. There are still dozens of instances with wildly suspicious user numbers. I took data from The Federation and compared total users to active users on all listed instances. The instances in the screenshot below collectively have 1.22 million accounts but only 46 active users. These look like small self-hosted instances that have been infected by swarms of bot accounts.

As of this writing The Federation shows approximately 1.9 million total Lemmy accounts. That means the majority of all Lemmy accounts are sitting dormant on these instances, potentially to be used for future abuse.

This bothers me. I want Lemmy to be a place where actual humans interact. I don’t want it to become another cesspool of spam bots and manipulative shenanigans. The internet has enough places like that already.

So, after stewing on it for a few days, I decided to do something. I started messaging admins at some of these instances, pointing out their odd account numbers and referencing the lemmy.ninja post above. I suggested they consider removing the suspicious accounts. Then I waited.

And they responded! Some admins were simply unaware of their inflated user counts. Some had noticed but assumed it was a bug causing Lemmy to report an incorrect number. Others weren’t sure how to purge the suspicious accounts without nuking their instances and starting over. In any case, several instance admins checked their databases, agreed the accounts were suspicious, and managed to delete them. I’m told that the lemmy.ninja post was very helpful.

Check out these early results!

Awesome! Another 144k suspicious accounts are gone. A few other admins have said they are working on doing the same on their instances. I plan to message the admins at all the instances where the total accounts to active users ratio is above 10,000. Maybe, just maybe, scrubbing these suspected bot accounts will reduce future abuse and prevent this place from becoming the next internet cesspool.
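
For anyone who wants to repeat the total-versus-active comparison, here is a minimal sketch in Python. It assumes you have already copied per-instance numbers out of The Federation by hand; the `InstanceStats` class, the function names, and the sample figures are all made up for illustration and are not part of any Lemmy or Federation API. The 10,000 cutoff is the one mentioned above.

```python
from dataclasses import dataclass


@dataclass
class InstanceStats:
    name: str          # instance hostname (hypothetical examples below)
    total_users: int   # total registered accounts
    active_users: int  # users counted as active


def accounts_per_active_user(stats: InstanceStats) -> float:
    # Treat zero active users as one so the ratio is always defined.
    return stats.total_users / max(stats.active_users, 1)


def flag_suspicious(instances, threshold=10_000):
    # Keep instances whose ratio exceeds the threshold, worst first.
    flagged = [s for s in instances if accounts_per_active_user(s) > threshold]
    return sorted(flagged, key=accounts_per_active_user, reverse=True)


# Made-up numbers, not real instance data.
sample = [
    InstanceStats("small.example", 60_000, 2),
    InstanceStats("busy.example", 120_000, 9_000),
    InstanceStats("empty.example", 35_000, 0),
]

for s in flag_suspicious(sample):
    print(s.name, round(accounts_per_active_user(s)))
```

A high ratio only marks an instance as worth a closer look. It says nothing about any individual account.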

That’s all for now. Thanks for reading! Also, special thanks to the following people:

@RotaryKeyboard@lemmy.ninja for your helpful post!

@brightside@demotheque.com, @davidisgreat@lemmy.sedimentarymountains.com, and @SoupCanDrew@lemmy.fyi for being so quick to take action on your instances!

41 comments

That's great. Thank you for your hard work. It's a worthy cause.
Nice! Please don't remove me tho, I migrated from reddit during that wave and I don't interact or comment much, I mostly like looking at posts and scrolling mindlessly while searching for interesting communities to join.
- So just like everyone else
- I generally lurk as well. Usually I'm finding out about something too late to be the first post about it or someone else has already commented what I was thinking. In 13 years on Reddit, I probably have under 100 contributions.
- Don't remove me either pls :(
How do you tell if these suspicious accounts are actually bots? What if they're users who are actually just inactive?
- To be honest I don't know of a way to objectively distinguish a legit user's inactive account from an automated account created via software. I'm looking at graphs and playing the odds. It's possible there are a small number of legit accounts in there, though IMO that's very unlikely.
  
  Looking back, I suppose I didn't elaborate on why I think the user counts on the highlighted instances look suspicious. The ones with huge total-vs-active user ratios look like pure bot pools to me due to two characteristics: (1) The number of active users doesn't change as the number of accounts increases. 30,000 or 60,000 new accounts appear and none of them show any activity? No way. (2) The user count growth occurs evenly within a certain date range and then abruptly stops. If these instances were really being used by the general public then I would expect accounts to be generated before June 18 and after June 25. And the growth within that window should be uneven. The fact that multiple instances saw the same growth pattern on the same dates smells like automation at work.
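
The two characteristics in the reply above can be turned into a rough check. This is only a sketch under stated assumptions: it takes one total-user count and one active-user count per day, days are given as day-of-June integers, and the function name, the `max_spread` and `min_share` cutoffs, and both sample series are invented for illustration. It is a heuristic for graphs like the ones described, not a bot detector.

```python
from itertools import accumulate


def looks_automated(days, totals, actives, window_start, window_end,
                    max_spread=0.25, min_share=0.95):
    """Return True if nearly all growth falls inside the window, is roughly
    even from day to day there, and the active-user count never changes."""
    # growth[i] is the number of accounts added on days[i + 1]
    growth = [b - a for a, b in zip(totals, totals[1:])]
    inside = [g for d, g in zip(days[1:], growth)
              if window_start <= d <= window_end]
    total_growth = sum(growth)
    if total_growth <= 0 or not inside:
        return False
    # (2) growth is concentrated in the window and starts/stops abruptly
    concentrated = sum(inside) / total_growth >= min_share
    mean = sum(inside) / len(inside)
    even = mean > 0 and (max(inside) - min(inside)) <= max_spread * mean
    # (1) active users stay flat while accounts pile up
    flat_activity = max(actives) == min(actives)
    return concentrated and even and flat_activity


days = list(range(15, 29))  # June 15 through June 28

# Made-up "bot pool": 5,000 new accounts every day from June 18 to June 25.
bot_totals = [10 + 5000 * min(max(d - 17, 0), 8) for d in days]
bot_actives = [3] * len(days)

# Made-up organic instance: uneven growth before, during, and after the window.
organic_growth = [40, 55, 300, 900, 650, 400, 520, 310, 280, 260, 150, 120, 90]
organic_totals = list(accumulate(organic_growth, initial=200))
organic_actives = list(range(50, 50 + len(days)))

print(looks_automated(days, bot_totals, bot_actives, 18, 25))
print(looks_automated(days, organic_totals, organic_actives, 18, 25))
```

The cutoffs are arbitrary starting points. An instance that passes this check still deserves a look at its database before anything gets deleted.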
Thank you for your service o7
This is brilliant, but I think you might be overlooking something here. I’m part of the reddit migration and like a lot of new users I didn’t know what I was doing or really understand instances when I joined. So I ended up signing up on a few different instances before I understood. 3 of my accounts are inactive, but I don’t want to delete them necessarily - having alt accounts makes sense.

Lemmy.world is my main account, but it was completely overwhelmed for a couple of days at the start of the migration and was pretty much unusable. Some instances have already defederated from other instances, or are debating doing so in future. And then there was the hack that rendered a bunch of instances unusable. Not to mention I might want a separate porn account, professional account etc… I wouldn’t be surprised if there were double the number of genuine (but inactive) accounts created 3 weeks ago as there were new sign ups.

From your numbers the bot accounts still far outnumber the genuine accounts, even if every new user made 4 like me. But I’d be concerned about genuine inactive accounts being chucked out with the bots. Although maybe that’s better than the alternative?
- The way I did it was just deleting any account that signed up, but didn't complete email verification within one hour.
  
  The bots weren't completing email verification.
  
  Though that was only during 0.18, when captcha didn't work. I don't have email verification on now and didn't before 0.18, so I don't know how they identify them now.
  
  There's a bug currently going around that allows people to gain your account info, there is no way I'm giving lemmy my email address.
  
  That sounds like a good compromise. Is there a way to message all inactive accounts and ask them to complete a captcha before a certain date, or the accounts will be deleted?
- That's me. I've been creating multiple accounts and hopping around. It's only been a week so nothing permanently planted quite yet.
  
  I’d like to wait a month or so and let everything settle before I decide which instance I make my main account. There’s so much going on at the moment the admins, hosts, mods and developers/programmers deservedly need a minute to get on top of everything. I’d like to retain my inactive accounts until then, especially as we’re talking about what these inactive accounts might do in the future, not an army of malicious bots attacking now.
- I did this too. I initially signed up at beehaw but they then defederated from many of the larger instances. So it's already become a very specific vibe and focus when I log in there. I have a local instance I joined that I use to keep up with sports and local news. And another for general scrolling and discovering new communities. I tried to create a specific use for each one rather than having several abandoned accounts I made before I understood how it worked.
- Same here. I signed up with the same username on 9 different instances. Had no idea how it all worked. 8 of the 9 accounts are mostly inactive now as I stick to just this one
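
The one-hour email verification rule described in an earlier reply is simple enough to sketch. This is not Lemmy's actual schema or admin tooling: the `Account` class, its field names, and the sample accounts are hypothetical, and the function only selects candidates rather than deleting anything. It also only applies on an instance that had email verification switched on when the accounts were created.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Account:
    name: str
    created_at: datetime
    email_verified: bool


def unverified_past_deadline(accounts, now, grace=timedelta(hours=1)):
    # Select accounts that never verified their email and are older than
    # the grace period. Verified accounts are never selected, however
    # inactive they are.
    return [a for a in accounts
            if not a.email_verified and now - a.created_at > grace]


# Arbitrary timestamp and made-up accounts for illustration.
now = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)
accounts = [
    Account("verified_lurker", now - timedelta(days=20), True),
    Account("unverified_old", now - timedelta(hours=3), False),
    Account("unverified_new", now - timedelta(minutes=10), False),
]

for a in unverified_past_deadline(accounts, now):
    print(a.name)
```

Under this rule the lurkers and alt-account holders in this thread would be safe as long as they verified an email, which is the appeal of it over deleting by inactivity alone.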
Why not simply ask if they are robots, they wouldn't lie, their programming won't allow it.
- as an AI language model, I am not able to respond to this.
  
  A shibboleth for sentience.
A bunch of lurkers are probably sweating after reading this lol
- Quick! Make a comment to show activity!
- Better get a reply in just to make sure then
Great job man. I alerted voltage.vn about their bot problem back when this was all going down and their admin was able to remove over 10,000 accounts pretty easily.

I've also been following the bot activity with some interest, and to be honest I'm not sure whether it's actually coming from malicious actors or people wanting to help Lemmy by creating news/attention.

However, regardless of the motives, I still feel uneasy about sharing the Threadiverse with a bunch of bot accounts. It just seems like a problem waiting to happen. Thanks for putting in the legwork to fix the biggest offenders, by targeting the servers on your list we can clean up most of the problem with comparatively little effort. Nice work.
I'm not deep into the tech behind Lemmy, so forgive my simple question. What counts as an active user? Someone who posts, comments or votes? What about lurkers?
- An active user is anyone who posts or comments within a specified timeframe. I used 6 months for my spreadsheet.
  
  It's at the bottom of this page in the Lemmy documentation:
  
  https://join-lemmy.org/docs/contributors/07-ranking-algo.html
Good idea, although to add to the above some of us are just here to read and don't post often. It'd be a shame to be deleted as a bot
You're a real one! 🙏
Good job 👍
Amazing job. This whole post deserves way more upvotes.
Does this take into account accounts that are marked as bots? I mean, I have my friend @csm10495_bot@sh.itjust.works. He's a nice guy.
I salute you for your service 🫡
@kersploosh I’m a real boy!
When I was on Reddit I felt that I was at the whim of the company or the admins/mods. On Lemmy I feel like I have the power to make a positive change like you just did. Thanks
Great post. Half of the time, when I come across a bot, it's not for malicious purposes. It's usually because they want to help grow an instance. The wheels begin to fall off when they start to populate (sometimes spam) communities with unwanted posts, thinking they're helping.

The ad bots are the worst. I've started to see an increase in those. I guess it means Lemmy and the Fediverse are becoming more popular.
Great work!!
As for the reason to create bot accounts, maybe some people tried to increase the number of Lemmy users to gain momentum and trigger a (more) massive migration from reddit?
Thank you very much for such excellent work! I like Lemmy a lot, and it would be a shame to see that many bots ruining instances.
Would it be possible to do a certificate-based authentication scheme?

The idea is that Lemmy instances could collaboratively maintain API keys that grant access to posting.
