The Internet Archive just lost its appeal over ebook lending

fossilesque@mander.xyz · edit-2 1 year ago

DrCake@lemmy.world · 1 year ago

So when’s the ruling against OpenAI and the like using the same copyrighted material to train their models?

irotsoma@lemmy.world · 1 year ago

But OpenAI not being allowed to use the content for free means they are being prevented from making a profit, whereas the Internet Archive is giving away the stuff for free and taking away the right of the authors to profit. /s

Disclaimer: this is the argument that OpenAI is using currently, not my opinion.

norimee@lemmy.world · edit-2 1 year ago

Ah, I see you got that all wrong.

OpenAI uses that content to generate billions in profit on the backs of The People. The Internet Archive just does it for the good of The People.

We can’t have that. “Good for The People” is not how the economy works, pal. We need profit and exploitation for the world to work…

v_krishna@lemmy.ml · 1 year ago

OpenAI is burning billions of dollars not making profit.

Agret@lemmy.world · 1 year ago

Sounds like they are operating the same as all the other big tech companies then

buddascrayon@lemmy.world · 1 year ago

Wrong

https://futurism.com/the-byte/openai-copyrighted-material-parliament

v_krishna@lemmy.ml · 1 year ago

Eh? That article says nothing about their profit margins. Today they have something like $3.5B in ARR (not really, that’s annualized from their latest peak, in Feb they had like $2B ARR). Meanwhile they have operating costs over $7B. Meaning they are losing money hand over fist and not making a profit.

I’m not suggesting anything else, just that they are not profitable and personally I don’t see a road to profitability beyond subsidizing themselves with investment.

buddascrayon@lemmy.world · 1 year ago

It’s in the first bloody paragraph. 😮‍💨

OpenAI is begging the British Parliament to allow it to use copyrighted works because it’s supposedly “impossible” for the company to train its artificial intelligence models — and continue growing its multi-billion-dollar business — without them.

And if you follow the link the title of the article says it all:

#OpenAI is set to see its valuation at $80 billion—making it the third most valuable startup in the world

v_krishna@lemmy.ml · 1 year ago

I take it you don’t understand how startups work?

OpenAI is not making any profit and is losing money hand over fist today. Valuation and raising investment rounds isn’t profit.

finitebanjo@lemmy.world · 1 year ago

I think you accidentally swapped OpenAI and Open IA which happens to initialize Internet Archive, a little confusing.

norimee@lemmy.world · 1 year ago

I didn’t even realise. Thank you for pointing it out, I fixed it.

Anyolduser@lemmynsfw.com · 1 year ago

Hot on the heels of this one, I’d imagine.

iAmTheTot@sh.itjust.works · 1 year ago

Fat chance. Line must go up.

shrugs@lemmy.world · 1 year ago

So, let’s say we create an LLM that is fed all the copyrighted data, and we design it so that it recalls the originals when asked?! Does that count as piracy or as the kind of legal shenanigans OpenAI is doing?

wizblizz@lemmy.world · 1 year ago

Aaaaaany minute now.

PriorityMotif@lemmy.world · 1 year ago

deleted by creator

Gsus4@mander.xyz · edit-2 1 year ago

The matter is not LLMs reproducing what they have learned, it is that they didn’t pay for the books they read, like people are supposed to do legally.

This is not about free use, this is about free access, which at the scale of an individual reading books is marketed as “piracy”…at the scale of reading all books known to man…it’s omnipiracy?

We need some kind of deal where commercial LLMs have to pay a rent to a fund that distributes that among creators or remain nonprofit, which is never gonna happen, because it’ll be a bummer for all the grifters rushing into that industry.

PriorityMotif@lemmy.world · 1 year ago

I think we need to re-examine what copyright should be. There’s nothing inherently immoral about “piracy” when the original creator gets almost nothing for their work after the initial release.

barsoap@lemm.ee · 1 year ago

it is that they didn’t pay for the books they read, like people are supposed to do legally.

If I can read a book from a library, why shouldn’t OpenAI or anybody else?

…but yes from what I’ve heard they (or whoever, don’t remember) actually trained on libgen. OpenAI can be scummy without the general process of feeding AI books you only have read access to being scummy.

General_Effort@lemmy.world · 1 year ago

Meta is defending because they trained on books3 which contained all of Bibliotik. https://en.wikipedia.org/wiki/The_Pile_(dataset)

Gsus4@mander.xyz · edit-2 1 year ago

This is not like reading a book from a library…unless you want to force the LLM to only train one book per day and keep no copies after that day.

barsoap@lemm.ee · 1 year ago

They don’t keep copies. And why would learning speed matter? Why one day? Does it count if I skim through a book?

Gsus4@mander.xyz · 1 year ago

deleted by creator

index@sh.itjust.works · 1 year ago

stop asking questions and go back to work

MigratingtoLemmy@lemmy.world · 1 year ago

If OpenAI can get away with going through copyrighted material, then the answer to piracy is simple: round up a bunch of talented devs from the internet who are writing and training AI models, and let’s make a fantastic model trained on what the Internet Archive has. Tell you what, let Mistral’s engineers lead that charge, and put an AGPL license on the project so that companies can’t fuck us over.

I refuse to believe that nobody has thought of this yet

bandwidthcrisis@lemmy.world · 1 year ago

An AI trained on old Internet material would be like a synthetic Grandpa Simpson:

“In my day we said ‘all your base’ and laughed all day long, because it took all day to download the video.”

General_Effort@lemmy.world · 1 year ago

What do you think Mistral trains its models on? Public domain stuff?

Dkarma@lemmy.world · 1 year ago

“AI write Hamlet” AI writes Idiocracy.

werefreeatlast@lemmy.world · 1 year ago

Better yet! Train an AI to re-write the books into brand new books and let us read, review the content, add notes etc so that the AI can refresh the books if we find errors.

Kick the private collections to the curb! Teeth in like in American History X.

capital@lemmy.world · 1 year ago

We get it, y’all hate LLMs and the companies who make them.

This comparison is disingenuous and I have to think you’re smart enough to know that, making this disinformation.

If/when an LLM like ChatGPT spits out a full copy of training text, that’s considered a bug and is remediated fairly quickly. It’s not a feature.

What IA was doing was sharing the full text as a feature.

As far as I know, there are some court cases pending to determine whether companies like OpenAI are guilty of copyright infringement, but I haven’t seen any convictions yet (happy to be corrected here).

All that said, I love IA and have a Warrior container scheduled to run nightly to help contribute.

MigratingtoLemmy@lemmy.world · 1 year ago

Hmm, true. IA wouldn’t be as supported if we couldn’t get the full text of the source.

Can you tell me more about the “warrior container”?

capital@lemmy.world · 1 year ago

It’s mentioned in the OP but it’s this:

https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

Basically, distributed collection.

JackGreenEarth@lemm.ee · 1 year ago

Another sad day for pro-preservation advocates

/home/pineapplelover@lemm.ee · 1 year ago

A sad day for intellects

Lettuce eat lettuce@lemmy.ml · 1 year ago

Artificial scarcity at its finest. Imagine recording a song digitally, then pretending there are a limited number of copies of that song in existence. Then you sell an agreement to another person that says they have to pretend there is only a certain made-up number of copies that they bought, and if they allow more than that number of people to listen to those copies at the same time, they will get sued for “stealing” additional pretend copies?

I hope everybody can see how this is the insane and pathetic result of Capitalism’s unrelenting drive to commodify everything it possibly can in the pursuit of profit.

As always, the solution is sailing the high seas. Throughout history, those who created or saved illegal copies/translations of literature and art were important to preserving and furthering human knowledge.

Many incredibly powerful people, empires, and countries have tried very hard to suppress that, but they keep failing. You cannot suppress the human drive for curiosity and knowledge.

Ming@lemmy.dbzer0.com · 1 year ago

True, and the fleet is big and strong. There are many people seeding hundreds of terabytes of books/research papers/etc. The knowledge will not be lost. Yarr, can’t catch me in the high seas…

fpslem@lemmy.world · 1 year ago

Not a surprise, but still somehow crushing. It’s a loss for us all.

HexesofVexes@lemmy.world · 1 year ago

Ah, I see we’re burning the Library of Alexandria again… Just as with last time, the survival of texts will rely upon copies.

Stern@lemmy.world · 1 year ago

Oh sure, when I want to read copyrighted books it’s an issue, but OpenAI does it and it’s vital to their business so they can keep going.

drislands@lemmy.world · edit-2 1 year ago

My understanding is that the IA had implemented a digital library, where they had (whether paid or not) some number of licenses for a selection of books. This implementation had DRM of some variety that meant you could only read the book while it was checked out. In theory, this means if the IA has 10 licenses of a book, only 10 people have a usable copy they borrowed from the IA at a time.

And then the IA disabled the DRM system, somehow, and started limitlessly lending the books they had copies of to anyone that asked.

I definitely don’t like the obnoxious copyright system in the USA, but what the IA did seems obviously ~~wrong~~ against the agreement they entered into. Like if your local library got a copy of Book X and then when someone wanted to borrow it they just copied it right there and let you keep the copy.

ETA: updated my wording. I don’t believe what the IA did was morally wrong, per se, but rather against the agreement I presume they entered into with the owners of the books they lent.
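For illustration only, the "10 licenses means at most 10 simultaneous borrowers" rule described above can be sketched as a simple cap on active loans. This is a minimal sketch under stated assumptions: `LendingPool`, `check_out`, and `check_in` are hypothetical names, and nothing here reflects how the IA's actual DRM or lending system is built.

```python
class LendingPool:
    """Hypothetical model of capped lending for a single title."""

    def __init__(self, licensed_copies):
        # Number of copies the library is assumed to hold for this title.
        self.licensed_copies = licensed_copies
        # Readers who currently have the title checked out.
        self.borrowers = set()

    def check_out(self, reader):
        """Lend a copy if one is free; return False when all copies are out."""
        if reader in self.borrowers:
            return True  # this reader already holds a loan
        if len(self.borrowers) >= self.licensed_copies:
            return False  # every licensed copy is on loan
        self.borrowers.add(reader)
        return True

    def check_in(self, reader):
        """Return a copy so another reader can borrow it."""
        self.borrowers.discard(reader)


pool = LendingPool(licensed_copies=10)
# Eleven readers ask for the same title; only the first ten get a loan.
results = [pool.check_out(f"reader{i}") for i in range(11)]
```

Under this model, removing the cap (the `len(self.borrowers) >= self.licensed_copies` check) is roughly what "limitlessly lending" would amount to.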

MrScottyTay@sh.itjust.works · 1 year ago

They disabled drm during lockdown so people had something to do

accideath@lemmy.world · 1 year ago

Which was nice of them, but that doesn’t mean they should’ve done that, especially in the eyes of the law. (Also, if you’re after free ebooks, why are you pirating them on archive.org instead of libgen?)

CondensedPossum@lemmy.world · 1 year ago

Removed by mod

accideath@lemmy.world · 1 year ago

Where did I say that I find it good that they got sued or lost their appeal? I just said that the reason why they lost the appeal is because according to the law they’re bound to, what they did was wrong. And maybe they should’ve left that to a platform that enjoys a little more immunity from said law, because there are plenty of those. It was stupid of them. They painted an unnecessary target on their back that doesn’t help their cause and I’d prefer them not to have to shut down at some point because I’m all for the Internet Archive archiving anything and everything. They should’ve stayed a legitimate library and everything would have been fine and would have served their cause sufficiently well.

CondensedPossum@lemmy.world · 1 year ago

Removed by mod

accideath@lemmy.world · 1 year ago

Ah, so you’re one of those people that would be well at home at lemmygrad. And what fate are you talking about? Not getting sued?

azuth@sh.itjust.works · 1 year ago

The decision is that even lending out ebooks against owned copies is illegal.

What the IA did may be illegal but is certainly not wrong.

finitebanjo@lemmy.world · 1 year ago

Wrong? No.

Against the terms of agreements they made? Yes.

Actions also protected by laws exempting nonprofits and archives from copyright restrictions? Also supposed to be yes.

drislands@lemmy.world · 1 year ago

Against the terms of agreements they made? Yes.

To be fair, this is what I meant when I said wrong. Enough people have taken umbrage with my wording that I think I should update it, though. Thank you for your reply.

CondensedPossum@lemmy.world · 1 year ago

Removed by mod

eskimofry@lemmy.world · 1 year ago

Like if your local library got a copy of Book X and then when someone wanted to borrow it they just copied it right there and let you keep the copy.

That’s how it works in the rest of the world.

TheGrandNagus@lemmy.world · 1 year ago

No it isn’t.

fossilesque@mander.xyz · 1 year ago

Direct link to the court document: https://storage.courtlistener.com/recap/gov.uscourts.ca2.60988/gov.uscourts.ca2.60988.306.1.pdf

✺roguetrick✺@lemmy.world · 1 year ago

Side note: CourtListener’s RECAP is often quite disliked by the legal system. They do not like it when people put stuff from PACER fee-waived sources on there, like Aaron Swartz did. https://en.m.wikipedia.org/wiki/Free_Law_Project

NotAnotherLemmyUser@lemmy.world · 1 year ago

Woah, I wish I had known about this sooner. Thanks!

ZILtoid1991@lemmy.world · 1 year ago

They need to rename themselves “Intelligent Archive” then claim they’re an AI service that can just happen to regenerate whole books.

bamfic@lemmy.world · 1 year ago

Libgen.rs

sircac@lemmy.world · 1 year ago

But I’m training my organic LLM, can’t I?

Grass@sh.itjust.works · 1 year ago

What does Warrior do? The git readme seems to just be setup instructions.

zzx@lemmy.world · 1 year ago

I had the same question. Here’s the answer:

The Archive Team Warrior is a virtual archiving appliance. You can run it to help with the Archive Team archiving efforts. It will download sites and upload them to our archive—and it’s really easy to do!

The warrior is a container running inside a virtual machine, so there is almost no security risk to your computer. (“Almost”, because in practice nothing is 100% secure.) The warrior will only use your bandwidth and some of your disk space, as well as some of your CPU and memory. It will get tasks from and report progress to the Tracker.

fossilesque@mander.xyz · edit-2 1 year ago

click wiki link in readme: https://wiki.archiveteam.org/index.php?title=ArchiveTeam_Warrior

ɯᴉuoʇuɐ@lemmy.dbzer0.com · 1 year ago

Yeah I’m wondering as well. It seems to save webpages, whereas the issue is with scanned books which may be removed from IA…

Parabola@lemmy.world · 1 year ago

If only the readme clearly said what it was with a link you could click…

Grass@sh.itjust.works · 1 year ago

somehow I didn’t see anything above getting started. Looking again I don’t know how I missed it with the big logos unless they didn’t load and the rest was behind a notification or something.

shotgun_crab@lemmy.world · 1 year ago

Can we make the internet archive archive?