OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.
If I memorize the text of Harry Potter, my brain does not thereby become a copyright infringement.
A copyright infringement only occurs if I then reproduce that text, e.g. by writing it down or reciting it in a public performance.
Training an LLM from a corpus that includes a piece of copyrighted material does not necessarily produce a work that is legally a derivative work of that copyrighted material. The copyright status of that LLM’s “brain” has not yet been adjudicated by any court anywhere.
If the developers have taken steps to ensure that the LLM cannot recite copyrighted material, that should count in their favor, not against them. Calling it “hiding” is backwards.
You are a human, you are allowed to create derivative works under the law. Copyright law as it relates to machines regurgitating what humans have created is fundamentally different. Future legislation will have to address a lot of the nuance of this issue.
And allowed get sued anyway
Another sensationalist title. The article makes it clear that the problem is users reconstructing large portions of a copyrighted work word for word. OpenAI is trying to implement a solution that prevents ChatGPT from regurgitating entire copyrighted works using “maliciously designed” prompts. OpenAI doesn’t hide the fact that these tools were trained using copyrighted works and legally it probably isn’t an issue.
you bought the book to memorize from, anyway.
No, I shoplifted it from an Aldi