30 comments
  • LLM scraping is a parasite on the internet, in the actual ecological sense of the word: scrapers place a burden on other unwitting computer systems, making it harder for the host to survive or carry out its own necessary processes, solely for the parasite's benefit and while giving the host nothing in return.

    I know there's an ongoing debate (both in the courts and on social media) about whether AI companies should have to pay royalties for their training data under copyright law. At the very least, though, I think they should pay for the infrastructure they use while collecting that data, even when the data itself is free. Being scraped costs the organisation hosting the data real money and resources, orders of magnitude more than serving the same data to individual people.

    The case can certainly be made that copying is not theft, but copying is by no means free either, especially when done at the scales LLMs do.

  • One of my sites was close to being DoS'd by OpenAI's crawler, along with a couple of other crawlers. Blocking them made the site much faster.

    I'll admit that the software's design, which offers search suggestions as HTML links, didn't exactly help (it is FOSS used by hundreds of sites, so the issue likely applies to similar sites too), but the crawlers' rapid rate of requests turned what would have been pointless queries into a negligent security threat.
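
    For anyone in the same position, here is a minimal sketch of the two usual defences: refusing known crawler user agents and capping the request rate per client. It is not what the site above actually runs. `CrawlerGate`, `BLOCKED_AGENT_TOKENS`, and the limits are hypothetical names and values, and `GPTBot` is assumed here to be the user-agent token to match.

```python
import time
from collections import defaultdict, deque

# Assumption: substrings of crawler User-Agent headers to refuse outright.
BLOCKED_AGENT_TOKENS = ("GPTBot",)


class CrawlerGate:
    """Hypothetical helper: user-agent blocklist plus a sliding-window rate limit."""

    def __init__(self, max_requests=10, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently accepted requests, keyed by client IP.
        self._hits = defaultdict(deque)

    def allow(self, client_ip, user_agent, now=None):
        """Return True if the request should be served, False if refused."""
        now = time.monotonic() if now is None else now

        # 1. Refuse any request whose User-Agent names a blocked crawler.
        agent = user_agent.lower()
        if any(token.lower() in agent for token in BLOCKED_AGENT_TOKENS):
            return False

        # 2. Drop timestamps that have fallen out of the window.
        hits = self._hits[client_ip]
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        # 3. Refuse if this client already used up its budget for the window.
        if len(hits) >= self.max_requests:
            return False

        hits.append(now)
        return True
```

    A request handler would call `gate.allow(ip, headers.get("User-Agent", ""))` and answer 403 or 429 when it returns False. Crawlers that ignore or spoof their user agent only get caught by the rate limit, which is why the sketch has both checks.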
