Hot take: LLM technology is being purposefully framed as AI to avoid accountability
Which of the following sounds more reasonable?
I shouldn't have to pay for the content that I use to tune my LLM model and algorithm.
We shouldn't have to pay for the content we use to train and teach an AI.
By calling it AI, the corporations are able to advocate for a position that's blatantly pro-corporate and anti-writer/artist, and trick people into supporting it under the guise of technological development.
I think it’s the same reason the CEOs of these corporations are clamoring about their own products being doomsday devices: it gives them massive power over crafting regulatory policy, thus letting them make sure it’s favorable to their business interests.
It's even more frustrating when you realize (and feel free to correct me if I’m wrong) that these new “AI” programs and LLMs aren’t really novel in terms of theoretical approach: the real revolution is the amount of computing power and data thrown at them.
IMO content created by either AI or LLMs should carry a special license and be considered AI public domain (unless the company can prove it owns all the content the AI was trained on). Commercial content based on material marked with this license would be subject to a flat percentage tax, applied to the product price and earmarked for a fund distributing to human creators (coders, writers, musicians, etc.).
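A minimal sketch of how that levy might work; the 3% rate, the price, and the fund mechanics are all hypothetical numbers made up for illustration, not anything that exists in law today:

```python
# Hypothetical flat levy on products built from AI-public-domain content.
# The 3% rate below is an invented illustration value.

AI_CONTENT_LEVY_RATE = 0.03  # hypothetical 3% of product price

def creator_fund_levy(product_price: float) -> float:
    """Amount earmarked for the human-creator fund."""
    return round(product_price * AI_CONTENT_LEVY_RATE, 2)

# A $200 product based on AI-public-domain content would route
# $6.00 into the fund for coders, writers, musicians, etc.
print(creator_fund_levy(200.00))  # 6.0
```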
Both sound the same to me: private companies scraping ostensibly public data to sell it. No matter how you word it, they are trying to monetize stuff that is out in the open.
Our legal system has the concept of mechanical licensing. If your song exists, someone can demand the right to cover it, and the law will favor them. The output of an LLM has less to do with your art than a cover of your song does.
There are plenty of cases of a cover eclipsing the original version of a song in popularity, and yet I have never heard a single person argue that we should get rid of the right to cover a song.
I'm not sure what you're trying to say here; LLMs are absolutely under the umbrella of AI. They are not AGI/strong AI, but they are 100% a form of AI. There's no "reframing" necessary.
No matter how you frame it, though, there's always going to be a battle between the entities that want to use a large amount of data for profit (corporations) and the people who produce said content.
I'll note that there are plenty of models out there that aren't LLMs and that are also being trained on large datasets gathered from public sources.
Image generation models, music generation models, etc.
Heck, it doesn't even need to be about generation. Music recognition and image recognition models can also be trained on the same sort of datasets, and arguably raise similar IP-rights questions.
It's definitely a broader topic than just LLMs, and attempting to exhaustively enumerate the flavors of AIs/models/whatever that should be part of this discussion is fairly futile given the fast-evolving nature of the field.
If an LLM was trained on a single page of GPL code or a single piece of CC-BY art, the entire set of model weights and any outputs from the model must be licensed the same way. Otherwise this whole thing is just blatant license laundering.
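As a toy sketch of the strict-propagation rule being proposed here (this is the commenter's proposed rule, not how courts currently treat training data, and the license names are just examples):

```python
# Under strict propagation, any share-alike or attribution obligation
# in the training corpus attaches to the model weights and everything
# they generate. Proposed rule only; not current law.

RESTRICTED = {"GPL-3.0", "CC-BY-4.0", "CC-BY-SA-4.0"}

def propagated_licenses(corpus_licenses: set[str]) -> set[str]:
    """Licenses that would 'taint' the weights under this rule."""
    return corpus_licenses & RESTRICTED

# One GPL page in the corpus taints the whole model.
weights_obligations = propagated_licenses({"MIT", "public-domain", "GPL-3.0"})
print(weights_obligations or "unencumbered")  # {'GPL-3.0'}
```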
In fairness, AI is a buzzword that came out well before LLMs. It's used to mean "tHe cOmpUtER cAn tHink!". We play against "AI" in games all the time, but those aren't AI as we know it today.
ML (machine learning) is a more accurate descriptor, but it doesn't have the same pizzazz that AI does.
The larger issue is that innovation is sometimes done for innovation's sake. Profit gets mixed up in there, a board has to show profits to shareholders, and then you get VCs trying to "productize" and monetize everything.
What's more, there are only a handful of players in the AI space, but because they are giving API access to other companies, those companies are building more and more sketchy uses of that tech.
It wouldn't be a huge deal if LLMs trained on copyrighted material and then gave the service away for free. As it stands, some LLMs are churning out work that, had a human made it, could be protected under copyright law (AI work can't be copyrighted under US law), and turning a profit.
I don't think "it was AI" will hold up in court though. May need to do some more innovation.
Also, some LLMs are being trained on public domain info to avoid copyright problems. But works only enter the public domain 70 years after the copyright holder's death (Disney being the biggest extender of that rule), so your AI will be a tad outdated in its "knowledge".
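The arithmetic behind that staleness, as a quick sketch (simplified to the US life-plus-70 rule for individual authors; works for hire and older works follow different terms):

```python
from datetime import date

def public_domain_year(author_death_year: int) -> int:
    # US life-plus-70 rule, simplified: the work enters the public
    # domain on Jan 1 of the year after the 70th anniversary of death.
    return author_death_year + 70 + 1

def is_public_domain(author_death_year: int) -> bool:
    return date.today().year >= public_domain_year(author_death_year)

# An author who died in 1950 entered the public domain in 2021,
# so a "safe" training corpus tops out around the mid-20th century.
print(public_domain_year(1950))  # 2021
```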
I think you are likely right, but it's more general than just about training costs. The term "AI" carries a ton of baggage, both good and bad.
To some extent, I think we also keep pushing back the boundary of what we consider "intelligence" as we learn more and better understand what we're creating. I wonder if every future tech generation will continue this cycle until/unless humanity actually does create a general artificial intelligence--every iteration getting slightly closer but still falling short of "true" AI, then being looked at as a disappointment and not worthy of the term anymore. Rinse and repeat.
That's absolutely not correct. AI is a field of computer science/scientific computing built on the idea that some capabilities of biological intelligences could be simulated or even reproduced "in silicon", i.e. by using computers.
Nowadays it is an extremely broad term that covers a lot of computational methodologies. LLMs in particular are an evolution of methods born to simulate and act like human neural networks. They work very differently now, but they still provide great insight into how an "artificial" intelligence can be built. They are only one small corner of what a real general artificial intelligence will be, and a small step in that direction.
AI as a name says nothing about how programs built on these methodologies are made or used.
Humans are in charge of the entire copyright side. AI and copyright are orthogonal; it's people who can't tell the two apart that keep talking about "AI".
There is AI, and there is copyright. It is time for all of us to properly frame this as a copyright discussion about <company>'s product.
Both of those statements are reasonable. You shouldn't have to pay to utilize anything you scrape from the internet, so long as you don't violate copyright by redistributing it.
Honestly, I see zero difference. I think you're suggesting that giving content to an "AI" for free somehow sounds more reasonable than giving it to an LLM (which is absolutely AI). I see no reason at all to believe that. Maybe you can elaborate?
What is meant by the term "AI" has definitely shifted over time. What I would have considered an AI back then is nowadays referred to as an "AGI". So they simply changed the language. LLMs are not really capable of "intelligence"; they are just automated statistics. On the other hand, what really is intelligence? The output does appear intelligent. Maybe in the end it does not matter how it is generated.
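For a sense of what "automated statistics" means at its most stripped-down, here's a toy bigram model that picks the next word purely from co-occurrence counts. Real LLMs are enormously more sophisticated, but the task shape (predict the next token from context) is the same:

```python
from collections import Counter, defaultdict

# Toy "automated statistics": predict the next word from raw
# bigram counts. No understanding involved, just frequency.
corpus = "the cat sat on the mat the cat ate the fish".split()

counts: defaultdict = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(word: str) -> str:
    return counts[word].most_common(1)[0][0]

print(next_word("the"))  # 'cat' -- the statistically likeliest successor
```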
If we're unmasking tech: LLMs right now are also just computer vision models with a lot more abstraction layers thrown at them. Nothing but fit-assessment machines with a ludicrous number of extra steps.
I am convinced this is all pedantry, and these models are going to become the de facto basis for true AI at some point. It was already weird enough that this type of tech got discovered while chasing the goal of checking whether an image has a cat in it or not.
It’s just a happy coincidence for them. They call it AI because calling it “a search engine that steals stuff instead of linking to it, and blends different sources together to look smarter” wouldn’t be as interesting to clueless financial-markets people.