Bots are running rampant. How do we stop them from ruining Lemmy?

Social media platforms like Twitter and Reddit are increasingly infested with bots and fake accounts, leading to significant manipulation of public discourse. These bots don't just annoy users—they skew visibility through vote manipulation. Fake accounts and automated scripts systematically downvote posts opposing certain viewpoints, distorting the content that surfaces and amplifying specific agendas.

Before coming to Lemmy, I was systematically downvoted by bots on Reddit for completely normal comments that were relatively neutral and not controversial at all. There seemed to be no pattern to it... One time I commented that my favorite game was WoW and got downvoted to -15 for no apparent reason.

For example, a bot on Twitter that used API calls to GPT-4o ran out of funding and started posting its prompts and system information publicly.

https://www.dailydot.com/debug/chatgpt-bot-x-russian-campaign-meme/

Bots like these probably number in the tens or hundreds of thousands. They did a huge ban wave of bots on Reddit, and some major top-level subreddits were quiet for days because of it. Unbelievable...

How do we even fix this issue or prevent it from affecting Lemmy??

299 comments
  • We already did the first things we could to keep this from affecting Lemmy:

    1. No corporate ownership
    2. Small user base that is already somewhat resistant to misinformation

    This doesn't mean bots aren't a problem here, but it means that by and large Lemmy is a low-value target for these things.

    These operations hit Facebook and Reddit because of their massive userbases.

    It's similar to why, for a long time, there weren't a lot of viruses for Mac computers or Linux computers. It wasn't because there was anything special about macOS or Linux; for a long time, neither simply had enough market share to justify writing viruses, malware, etc. for them. Linux became a hotbed when it became a popular server choice, and Macs and the iOS ecosystem have become hotbeds in their own right because of their popularity in the modern era (although marginally less so, thanks to Apple's tight software controls).

    Another example is bittorrent piracy and private tracker websites. Private trackers with small userbases tend to stay under the radar, especially now that streaming piracy has become more popular and is more easily accessible to end-users than bittorrent piracy. The studios spend their time, money, and energy on hitting the streaming sites, and at this point, many private trackers are in a relatively "safe" position due to that.

    So, in terms of bots coming to Lemmy and whether or not that has value for the people using the bots, I'd argue we don't actually provide enough value to be a common target, overall. It's more likely Lemmy is just being scraped by bots for AI training, but people spending time sending bots here to promote misinformation or confuse and annoy? I think the number doing that is pretty low at the moment.


    This can change, in the long-term, however, as the Fediverse grows. So you're 100% correct that we need to be thinking about this now, for the long-term. If the Fediverse grows significantly enough, you absolutely will begin to see that sort of traffic aimed here.

    So, in the end, this is a good place to start this conversation.

    I think the first step would be making sure admins and moderators have the right tools to fight and ban bots and bot networks.

  • Lemmy.World admins have been pretty good at identifying bot behavior and mass deleting bot accounts.

    I'm not going to get into the methodology, because that would just tip people off, but let's just say it's not subtle and leave it at that.

  • This is another reason why a lack of transparency with user votes is bad.

    As to why it is seemingly done at random on Reddit: it is to decrease your global karma score, to make you less influential, and to discourage you from making new comments. You probably pissed off someone's troll farm in what they considered an influential subreddit. It might also interest you that Reddit was explicitly named as part of a Russian influence effort here: https://www.justice.gov/opa/media/1366201/dl - maybe some day we will see something similar for other obvious troll farms operating on Reddit.

  • No current social network can be bot-proof. And Lemmy is in the most unprotected situation here, saved only by its relative obscurity. On Twitter, I have personally blocked about 15,000 Russian bots, but that's less than 1% of the existing ones. I've seen head bot accounts with 165,000 followers. Just imagine all 165,000 registering accounts on Lemmy; there is nothing to stop them. I used to work on a theory for a new social network where bots could exist as much as they wanted but could not influence your circle of subscriptions and subscribers. But it's complicated...

    • Also, the "bot"/"human" distinction doesn't have to be binary. Say one has an account that mostly has a bot post generated text, but then if it receives a message, hands it off to a human to handle. Or has a certain percentage of content be human-crafted. That may potentially defeat a lot of approaches for detecting a bot.

  • A chain/tree of trust. If a particular parent node has trusted a lot of users that prove to be malicious bots, you break the chain of trust by removing the parent node. Orphaned real users would then need to find a new account that is willing to trust them, while the bots are left hanging.

    Not sure how well it would work on federated platforms though.
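A minimal sketch of the chain-of-trust idea above, assuming a single instance keeps one invite tree in memory. The `TrustTree` class, its method names, and the account names are all hypothetical; nothing here is an existing Lemmy feature, and it does not address the federation question.

```python
from collections import defaultdict


class TrustTree:
    """Toy invite tree: every account except the root is vouched for by one parent."""

    def __init__(self, root):
        self.parent = {root: None}         # account -> account that vouched for it
        self.children = defaultdict(set)   # account -> accounts it vouched for
        self.flagged = set()               # accounts confirmed as malicious bots

    def is_trusted(self, account):
        return account in self.parent

    def vouch(self, parent, child):
        if parent not in self.parent:
            raise KeyError(f"{parent} is not a trusted account")
        if child in self.parent:
            raise ValueError(f"{child} is already trusted")
        self.parent[child] = parent
        self.children[parent].add(child)

    def flag_bot(self, account):
        self.flagged.add(account)

    def bad_ratio(self, parent):
        """Share of a parent's direct invitees that turned out to be bots."""
        kids = self.children.get(parent, set())
        return len(kids & self.flagged) / len(kids) if kids else 0.0

    def remove(self, account):
        """Drop an account and everything below it.

        Returns the unflagged descendants, i.e. the orphaned users who
        now need someone else to vouch for them.
        """
        grandparent = self.parent[account]
        if grandparent is not None:
            self.children[grandparent].discard(account)
        removed, stack = [], [account]
        while stack:
            node = stack.pop()
            stack.extend(self.children.pop(node, ()))
            del self.parent[node]
            removed.append(node)
        return sorted(n for n in removed if n != account and n not in self.flagged)
```

An admin could watch `bad_ratio` and call `remove` once it crosses some threshold; the orphans it returns can rejoin through a later `vouch` from any account that is still trusted.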

    • I don't think that would work well, because I knew no one when I came here.

      • You could always ask someone to vouch for you. It could also be that you have open communities and closed communities. So you would build up trust in an open community before being trusted by someone to be allowed to interact with the closed communities. Open communities could be communities less interesting/harder for the bots to spam and closed communities could be the high risk ones, such as news and politics.

        Would this greatly reduce the user friendliness of the site? Yes. But it would be an option if bots turn into a serious problem.

        I haven't really thought through the details, and I'm not sure how well it would work for a decentralised network though. Would each instance run its own trust tree, or would trusted instances share a single trust database? 🤷‍♂️

  • I've been thinking postcard based account validation for online services might be a strategy to fight bots.

    As in, rather than an email address, you register with a physical address and get mailed a postcard.

    A server operator would then have to approve mailing 1,000 postcards to whatever address the bot operator was working out of. The cost of starting and maintaining a bot farm skyrockets as a result (you not only have to pay to get the postcard, you have to maintain a physical presence somewhere ... and potentially a lot of them if you get banned/caught with any frequency).

    Similarly, most operators would presumably only mail to folks within their nation's mail system. So if Russia wanted to create a bunch of US accounts on "mainstream" US hosted services, they'd have to physically put agents inside of the United States that are receiving these postcards ... and now the FBI can treat this like any other organized domestic crime syndicate.

    • I am absolutely not giving some Lemmy admin my address.

    • Easy way to get around that with "virtual" addresses: https://ipostal1.com/virtual-address.php

      Just pay $10 for every account that you want to create.... you may as well just go with the solution of charging everyone $10 to create an account. At least that way the instance owner is getting supported and it would have the same effect.

      • Just pay $10 for every account that you want to create

        So, making identities expensive helps. It'd probably filter out some. But, look at the bot in OP's image. The bot's operator clearly paid for a blue checkmark. That's (checks) $8/mo, so the operator paid at least $8, and it clearly wasn't enough to deter them. In fact, they chose the blue checkmark because the additional credibility was worth it; X doesn't mandate that they get one.

        And it also will deter humans. I don't personally really care about the $10 because I like this environment, but creating that kind of up-front barrier is going to make a lot of people not try a system. And a lot of times financial transactions come with privacy issues, because a lot of governments get really twitchy about money-laundering via anonymous transactions.

        EDIT: I think that maybe a better route is to try to give users a "credibility score". So, that's not a binary "in" or "out". But other people can see some kind of automated assessment of how likely, for example, a person might be to be a bot.

        thinks more

        I mean, this is just spitballing, but could even be done not at a global level, but at a per-other-user level. Like, okay, suppose you have what amounts to a small neural network, right? So the instance computes a bunch of statistics about each user, like account age, stuff like that, and then provides that to the client. But it doesn't determine the importance of those metrics in whether the other user should see that post, just provides the raw data. You've got a bunch of inputs to a neural net, then. Then the other user can have a set of classifications. Maybe just "hide", but also maybe something like "bot" or "political activism" or whatever. And it takes those input metrics from the instances, and trains that neural net to produce client-side classifications, and then auto-tags users based on that. That's gonna be a pain to try to defeat, because the bot operator can't even see how they're being scored -- they haven't "gotten over the hurdle" or not.

        But you don't want to make every end user train a neural net from scratch. Hmm.

        So maybe what you do is let users create their own scores and expose those to other users, right? I think that I read that BlueSky does something like that, was working on letting users create "curated feeds" for other users. They're doing something simpler, no machine learning, but that's got some drawbacks, means that you have to spend more time determining whether a score is good. So, okay. Say I'm gonna try to score a user based on whether-or-not I think that they're a bot. I have the option to make that score publicly-available. Other users can "subscribe" to that metric, and when they do, there's a new input node added to their local classifier's list of input nodes. Like, "Don's Bot List".

        But I don't have to subscribe to Don's Bot List, and even if I do, it doesn't mean that I automatically consider that other user a bot. Don's rating is just an input into whether my own classifier considers them a bot. If I regularly disagree with Don, even if I'm subscribed to his list, my local neural net will slash the importance of his rating. If I agree with Don unless some other input to my classifier's neural net is triggered, then the classifier can learn that.
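A rough sketch of the per-user classifier described above, under some loud assumptions: a single-layer logistic regression stands in for the "small neural net", the feature names `new_account` and `dons_bot_list` are made up, and the four accounts are toy data. It is only meant to show that a subscribed list becomes one more input whose weight is learned locally from the user's own tags.

```python
import math


class LocalClassifier:
    """Client-side "is this a bot?" scorer trained on the user's own tags."""

    def __init__(self, feature_names, learning_rate=0.5):
        self.weights = {name: 0.0 for name in feature_names}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def subscribe(self, list_name):
        # A subscribed list is just a new input; its weight starts at zero.
        self.weights.setdefault(list_name, 0.0)

    def predict(self, features):
        z = self.bias + sum(w * features.get(n, 0.0) for n, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, features, is_bot):
        # One stochastic gradient step on the logistic loss.
        error = self.predict(features) - (1.0 if is_bot else 0.0)
        for name in self.weights:
            self.weights[name] -= self.learning_rate * error * features.get(name, 0.0)
        self.bias -= self.learning_rate * error


def fit(classifier, samples, epochs=200):
    for _ in range(epochs):
        for features, is_bot in samples:
            classifier.train(features, is_bot)
    return classifier


def new_user():
    classifier = LocalClassifier(["new_account"])  # metric provided by the instance
    classifier.subscribe("dons_bot_list")          # 1.0 when Don's list flags the account
    return classifier


# Toy accounts (hypothetical data).
ACCOUNTS = [
    {"new_account": 1.0, "dons_bot_list": 1.0},
    {"new_account": 0.0, "dons_bot_list": 0.0},
    {"new_account": 1.0, "dons_bot_list": 0.0},
    {"new_account": 0.0, "dons_bot_list": 1.0},
]

# A user whose own tags always match Don's ratings...
agrees = fit(new_user(), [(a, a["dons_bot_list"] == 1.0) for a in ACCOUNTS])
# ...and a user whose tags follow account age and ignore Don entirely.
ignores = fit(new_user(), [(a, a["new_account"] == 1.0) for a in ACCOUNTS])

print("weight on Don's list when I agree with him:", round(agrees.weights["dons_bot_list"], 2))
print("weight on Don's list when I don't:", round(ignores.weights["dons_bot_list"], 2))
```

Both users are subscribed to the same list, but only the first ends up giving it real weight. A bot operator cannot see either set of weights, because they never leave the client.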

      • Hm... I'm not sure if this is enough to defeat the strategy.

        It looks like even with that service, you have to sign up for Form 1583.

        Even if they're willing to incur the cost, there's a real paper trail pointing back to a real person or organization. In other words, the bot operator can be identified.

        As you note, this is yet another additional cost. So, you'd have say ... $2-3 for the card + an address for the account. If you require every unique address to have no more than 1 account ... that's $13 per bot plus a paper trail to set everything up.

        That certainly wouldn't stop every bot out there ... but large-scale bot farms seem like they would be significantly deterred, no?
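To put the $13-per-bot estimate above in farm-sized terms, here is a back-of-the-envelope sketch. It assumes the upper end of the "$2-3" postcard figure and the $10 virtual-address figure from the parent comment, with one account per address; the function name is made up, and recurring costs and replacement of banned accounts are ignored.

```python
def bot_farm_cost(accounts, postcard_usd=3.0, address_usd=10.0):
    """Up-front cost in USD when every account needs its own address and postcard."""
    return accounts * (postcard_usd + address_usd)


for n in (1, 1_000, 100_000):
    print(f"{n:>7} accounts: ${bot_farm_cost(n):,.0f}")
```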

    • I was thinking physical mail too. But I think it would definitely require some sort of system, either third-party or government-backed, that anonymises you, like how the COVID Bluetooth tracing system worked (stupidly called track and trace in the UK). Plus you'd have to interact with someone at a post office to legitimise it. But I'm talking just a worker at a counter.

      So you'd get a one-time unique anonymised postal address. You go to a post office and hand your letter over to someone. You, and perhaps they, will not know the address, but the system will. Maybe a process further down the line re-envelopes the letter into one with the real address on it.

      This way, you've kept the server owner private and you've had to involve some form of person-to-person interaction, meaning not a bot!

      This system could be used for all sorts of verification beyond social media, so governments or third parties may have enough incentive to set it up.

      Could it be abused, though, and if so, are there ways to mitigate that?

  • You can't get rid of bots or spammers. The only thing you can do is have a more aggressive automated punishment system, which will inevitably also punish good users along with the bad ones.

  • I think the only way to solve this problem for good would be to tie social media accounts to proof of identity. However, apart from what would certainly be a difficult technical implementation, this would create a whole bunch of different problems. The benefits would probably not outweigh the costs.

  • Signup safeguards will never be enough because the people who create these accounts have demonstrated that they are more than willing to do that dirty work themselves.

    Let's look at the anatomy of the average Reddit bot account:

    1. Rapid points acquisition. These are usually new accounts, but they don't have to be. These posts and comments are often done manually by the seller if the account is being sold at a significant premium.
    2. A sudden shift in contribution style, usually preceded by a gap in activity. The account has now been fully matured to the desired amount of points, and is pending sale or set aside to be "aged". If the seller hasn't loaded on any points, the account is much cheaper but the activity gap still exists.
       • When the end buyer receives the account, they probably won't be posting anything related to what the seller was originally involved in as they set about their own mission, unless they're extremely invested in the account. It becomes much easier to stay active in old forums if the account is now AI-controlled, but the account suddenly ceases making image contributions and mostly sticks to comments instead. Either way, the new account owner is probably accumulating far fewer points than the account was before.
       • A buyer may attempt to hide this obvious shift in contribution style by deleting all the activity from before the account came into their possession, but now they have months of inactivity leading up to the beginning of the account's contributions and thousands of points unaccounted for.
    3. Limited forum diversity. Fortunately, platforms like this have a major advantage over platforms like Facebook and Twitter, because propaganda bots there can post on their own pages and gain exposure with hashtags without having to interact with other users or separate forums. On Lemmy, programming an effective bot means that it has to interact with a separate forum to achieve meaningful outreach, and these forums probably have to be manually programmed in. When a bot has one sole objective with a specific topic in mind, it makes great and telling use of a very narrow swath of forums. This makes platforms like Reddit and Lemmy less preferred for automated propaganda bot activity, and more preferred for OnlyFans sellers, undercover small business advertisers, and scammers who do most of the legwork of posting and commenting themselves.

    My solution? Implement a weighted visual timeline for a user's points and posts to make it easier for admins to single out accounts that have already been found to be acting suspiciously. There are other types of malicious accounts that can be troublesome such as self-run engagement farms which express consistent front page contributions featuring their own political or whatever lean, but the type first described is a major player in Reddit's current shitshow and is much easier to identify.

    Most important is moderator and admin willingness to act. Many subreddit moderators on Reddit already know their subreddit has a bot problem but choose to do nothing because it drives traffic. Others are just burnt out and rarely even lift a finger to answer modmail, doing the bare minimum to keep their subreddit from being banned.
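The timeline idea above boils down to a few numbers an admin view might plot for an account that is already under suspicion. The sketch below is only an illustration: the `(date, points, forum)` history format, the 60-day threshold, and the sample data are assumptions, not anything Lemmy or Reddit actually exposes in this form.

```python
from datetime import date


def longest_gap(history):
    """Return (days, last_before, first_after) for the longest pause between posts."""
    ordered = sorted(entry[0] for entry in history)
    best = (0, None, None)
    for earlier, later in zip(ordered, ordered[1:]):
        days = (later - earlier).days
        if days > best[0]:
            best = (days, earlier, later)
    return best


def points_per_day(entries):
    """Average points per day over the span the entries cover (at least one day)."""
    days = [entry[0] for entry in entries]
    span = max((max(days) - min(days)).days, 1)
    return sum(entry[1] for entry in entries) / span


def handoff_report(history, min_gap_days=60):
    """Compare activity before and after the longest gap, or return None if the gap is short."""
    gap_days, last_before, first_after = longest_gap(history)
    if gap_days < min_gap_days:
        return None
    before = [e for e in history if e[0] <= last_before]
    after = [e for e in history if e[0] >= first_after]
    forums_before = {e[2] for e in before}
    forums_after = {e[2] for e in after}
    return {
        "gap_days": gap_days,
        "points_per_day_before": points_per_day(before),
        "points_per_day_after": points_per_day(after),
        # Jaccard overlap: 1.0 means same forums, 0.0 means a complete change.
        "forum_overlap": len(forums_before & forums_after) / len(forums_before | forums_after),
    }


# Hypothetical account: fast point farming, a long pause, then low-scoring
# comments in entirely different forums.
example_history = [
    (date(2023, 1, 1), 500, "pics"),
    (date(2023, 1, 5), 700, "pics"),
    (date(2023, 1, 10), 800, "memes"),
    (date(2023, 6, 1), 5, "politics"),
    (date(2023, 6, 3), 3, "politics"),
    (date(2023, 6, 10), 2, "news"),
]

print(handoff_report(example_history))
```

A long gap, a collapse in points per day, and near-zero forum overlap together match the "matured, sold, repurposed" pattern in points 1 to 3. Any one of them alone is weak evidence, which is why this is framed as an aid for a human reviewer and not an automatic ban rule.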

  • On an instance level, you can close registration once you reach a number of users you are comfortable with. Then you can defederate from instances that are driven by capitalistic ideals like eternal growth (e.g. Threads from Meta).

  • Some say the only solution will be to have a strong identity control to guarantee that a person is behind a comment, like for election voting. But it raises a lot of concerns with privacy and freedom of expression.

  • Maybe stop letting any random person create an account with no verification whatsoever

    • Are you THE AlexanderESmith of social.alexanderesmith.com fame??

      • Indeed I am! But I don't let all that fame go to my head (I have a special deal for autographs right now, just $20!)

        But seriously, while I consider lackluster (or completely missing) new-account verification to be the much larger issue, federation is one to watch as well. My instance is so-named because I'm the only one who uses it.

        At least it's a fairly significant effort to set up an entire instance for a single user. That should keep spam from single-user instances reasonably low. And if someone sets up a vaguely legitimate-looking instance, but enough users are muted/blocked/moderated/etc, you can just block the entire instance. Changing instance names is more of a hassle than nuking it entirely and starting over (new domain, new database, new IPs if the admins are paying attention, etc).

  • Is this a problem here? One thing we should also avoid is letting paranoia divide the community. It's very easy to take something like this and then assume everyone you disagree with must be some kind of bot, which itself is damaging.

    • Yeah, it’s a problem. You just don’t see it as often yet. A while back there were a large number of communities being blasted by bots, and they would make it into the hot category because nothing else was going on at the time.

    • Is this a problem here?

      Not yet, but it most certainly will be once Lemmy grows big enough.
