Do we know whether content federated with these sites (say, from Lemmy or Mastodon) might be covered by the deal as well?
They don’t need a deal with Lemmy.
It’s an open platform, they can just scrape all the data.
Is that legal? When you sign up for a proprietary platform, you usually sign away your rights, but with Lemmy this isn’t the case, so scraping your data to train AI would violate copyright law, right?
They could set up their own fully federated Lemmy instance and scrape themselves.
They could, and we couldn’t stop them, but I think they legally couldn’t use content from other instances or even from users from other instances. Not that that will stop them, of course.
This may make it easier, though. As in, why set up another instance when you can just buy the data from a well-known player?
When you chose to use their free service, you already sold your soul to the devil.
This is such a flawed argument, though. Many of us remember when these services first came out, and the general zeitgeist was “wow! What an amazing and interesting way to connect with each other!” There wasn’t much public concern that our work would be sold to companies, because these were just “platforms”: places where you could shout out to the world about your passion.
The idea that this was a mistake the end user should have known better about is wrong, because there was no expectation that your creative work was at any sort of risk. AI didn’t exist, and it was commonly accepted that “of course you own this, you made it.”
If you apply such a modern lens to the very early stages of the internet, of course it’s going to look stupid. But remember that most people at the time thought they were safe and would never have willingly subjected themselves to this kind of treatment.
I don’t disagree with you, nor am I trying to blame people who didn’t know. I didn’t know either, 20 years ago. I’m just stating a fact and hope people can learn from it, and if they still choose one over the other, they shouldn’t come crying.
Say the line Bart!
If you’re not paying for the product, then you are the product.
Also applies: no such thing as a free lunch.
Why would someone pay to train an LLM on Tumblr???
At least with Google paying for Reddit content, there would be something useful mixed in with all the memes.
But Tumblr would definitely poison the results.
Trying to get ahead of the 4chan AI.