Fediverse @lemmy.world blue_berry @lemmy.world 1 yr. ago

The Living Web: Will there eventually be a protocol on top of (or as a part of) ActivityPub to run distributed LLMs on instances, with the LLMs feeding on the data generated by the Fediverse?

Last year, the corporate-dominated web came alive, largely thanks to corporate-owned social media platforms. How long will it take until the open sections of the web do the same?

And if that has been achieved: with stuff happening in the digital world that doesn't happen in the real world, it could actually be worthwhile for people to immerse themselves further in the digital world, which until now has always been the problem with the idea of the "Metaverse" (I'm not necessarily for this; it's just something that came to my mind yesterday). Could that be the actual next iteration of the web and realize what was once called the "smart" web, or would it be a dystopia?

And one last question that came to my mind: would it be possible to make the LLMs somehow run independently (could the blockchain maybe be finally put to some use here?) and how would all of that be experienced from the user perspective?

Fediverse @lemmy.ml blue_berry @lemmy.world 1 yr. ago


I'm not sure what you mean at all. Why would someone run an LLM on an instance? If you're saying how long it'll take for people to use the Fediverse for training data, it's already happening. It's no secret the web is being scraped entirely for training data.

would it be possible to make the LLMs somehow run independently (could the blockchain maybe be finally put to some use here?)

???

I think you're confusing a lot of different topics together.
- OK, I got carried away a bit here.
Why would instances want to train an LLM on text users post? What does it mean for an LLM to be distributed?
- But wasn't ChatGPT trained on a huge amount of data from social media, Reddit, etc.? By distributed I mean that if the server fails, the LLM still stays alive, like this. So I was thinking about fail-safety. Maybe I didn't think it through enough...
  
  What do you mean by "stay alive"? LLMs are statistical models that require a lot of number crunching to produce a response. That's not something you can just host without major costs (far more than running any single Fediverse instance, for sure; GPUs are not cheap to rent).
There's no need to run an LLM on the same system it was trained on. Once the model is built, it already contains all the information. If you want a model to live on long term, you would just release the file(s) publicly, like Hugging Face does with theirs; then anyone could use it or host an interface for it.
- Is Hugging Face a non-profit, by the way?
  
  I don't believe so. I'm not sure what their long term goals look like.
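To illustrate the point above that a trained model is just numbers in a file, separate from the machine that trained it, here is a minimal sketch. It assumes a toy bigram word counter as a stand-in for an LLM (real LLMs are neural networks with billions of parameters, and this is not how Hugging Face actually stores models); the function names and the `toy_model.json` file name are hypothetical. The model is "trained" once, written to a file, and then loaded and used for next-word prediction without the training data or the original server.

```python
import json
import tempfile
from collections import Counter, defaultdict
from pathlib import Path
from typing import Optional


def train_bigram(text: str) -> dict:
    """Toy 'training': count how often each word follows each other word."""
    words = text.lower().split()
    counts: dict = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    # Convert to plain dicts so the model can be serialized as JSON.
    return {word: dict(followers) for word, followers in counts.items()}


def save_model(model: dict, path: Path) -> None:
    """Write the learned counts (the whole 'model') to a file."""
    path.write_text(json.dumps(model))


def load_model(path: Path) -> dict:
    """Load the model from the file; no training data is needed."""
    return json.loads(path.read_text())


def predict_next(model: dict, word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen before another word."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return max(followers, key=followers.get)


if __name__ == "__main__":
    corpus = "the fediverse is federated the fediverse is open the web is loud"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "toy_model.json"  # hypothetical file name
        save_model(train_bigram(corpus), path)
        # This part could in principle run on a different machine: only the file is needed.
        model = load_model(path)
        print(predict_next(model, "the"))  # prints: fediverse
```

Publishing a model works the same way at a much larger scale: whoever has the file can run it, which is why a model does not need a particular server to "stay alive".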
Last year, the corporate-dominated web came alive, largely thanks to corporate-owned social media platforms. How long will it take until the open sections of the web do the same?

Honestly, I'm not sure how someone could come to this conclusion. The "corporate-dominated web" has been around a really long time. Remember Digg? Facebook? Myspace? Several corporations have also been using ActivityPub for federation for a few years now. Companies are really good at adopting technologies that work, for better or worse.

with stuff happening in the digital world that doesn’t in the real world

What? This is the real world.

it could actually be worthwhile for people to immerse themselves further in the digital world

That seems unlikely. But it's a good set of tools to use.

Could that be the actual next iteration of the web and realize what was in the past considered the “smart” web or would it be a dystopia?

I think what you're asking here is whether the "next iteration" will be users using more services. "Metaverse" is simply Facebook adding a bunch of new services beyond just Facebook. Like Google does, really: get your email, chat, social media (once upon a time), video service, books, etc. all from one place. Baidu does this too.

would it be possible to make the LLMs somehow run independently

This question doesn't make sense, TBH. Independently of what? They already run independently. You can turn off websites arbitrarily and an LLM will keep predicting the next word for you.

and how would all of that be experienced from the user perspective

You already experience this. Whenever you use the translation app on your phone, or your phone offers the next word you might want to type? That's you experiencing this.

could the blockchain maybe be finally put to some use here?

Blockchains have nothing at all to do with LLMs, or with neural networks in general. At this point, I'm not sure whether this post was produced by a prompt given to an LLM.
The real world is where things that matter happen. Life, love, nature. The Web is distracting and loud, but it's a big, flimsy illusion. So I don't think there is any chance that Metaverse ideas will take off.

Regarding LLMs: good news! You can already run them at home. Check out KoboldAI. LLMs will become smaller as time goes on, too. There's lots of room for improvement in that field.
I see many good uses of LLMs:

Auto-generated image descriptions (untapped potential)

Video summariser

News summariser (already exists; thanks to the dev(s)!)

Automatic credibility checks of news by searching other sources
