cross-posted from: https://nom.mom/post/121481
OpenAI could be fined up to $150,000 for each piece of infringing content.https://arstechnica.com/tech-policy/2023/08/report-potential-nyt-lawsuit-could-force-openai-to-wipe-chatgpt-and-start-over/#comments
While that’s understandable, I think it’s important to recognize that this is something where we’re going to have to treat pretty carefully.
If a human wants to become a writer, we tell them to read. If you want to write science fiction, you should both study the craft of writing ranging from plots and storylines to character development to Stephen King’s advice on avoiding adverbs. You also have to read science fiction so you know what has been done, how the genre handles storytelling, what is allowed versus shunned, and how the genre evolved and where it’s going. The point is not to write exactly like Heinlein (god forbid), but to throw Heinlein into the mix with other classic and contemporary authors.
Likewise, if you want to study fine art, you do so by studying other artists. You learn about composition, perspective, and color by studying works of other artists. You study art history, broken down geographically and by period. You study DaVinci’s subtle use of shading and Mondrian’s bold colors and geometry. Art students will sit in museums for hours reproducing paintings or working from photographs.
Generative AI is similar. Being software (and at a fairly early stage at that), it’s both more naive and in some ways more powerful than human artists. Once trained, it can crank out a hundred paintings or short stories per hour, but some of the people will have 14 fingers and the stories might be formulaic and dull. AI art is always better when glanced at on your phone than when looked at in detail on a big screen.
In both the cases of human learners and generative AI, a neural network(-like) structure is being conditioned to associate weights between concepts, whether it’s how to paint a picture or how to create one by using 1000 words.
A friend of mine who was an attorney used to say “bad facts make bad law.” It means that misinterpretation, over-generalization, politicization, and a sense of urgency can make for both bad legislation and bad court decisions. That’s especially true when the legislators and courts aren’t well educated in the subjects they’re asked to judge.
In a sense, it’s a new technology that we don’t fully understand - and by “we” I’m including the researchers. It’s theoretically and in some ways mechanically grounded in old technology that we also don’t understand - biological neural networks and complex adaptive systems.
We wouldn’t object to a journalism student reading articles online to learn how to write like a reporter, and we rightfully feel anger over the situation of someone like Aaron Swartz. As a scientist, I want my papers read by as many people as possible. I’ve paid thousands of dollars per paper to make sure they’re freely available and not stuck behind a paywall. On the other hand, I was paid while writing those papers. I am not paid for the paper, but writing the paper was part of my job.
I realize that is a case of the copyright holder (me) opening up my work to whoever wants a copy. On the other other hand, we would find it strange if an author forbade their work being read by someone who wants to learn from it, even if they want to learn how to write. We live in a time where technology makes things like DRM possible, which attempts to make it difficult or impossible to create a copy of that work. We live in societies that will send people to prison for copying literal bits of information without a license to do so. You can play a game, and you can make a similar game. You can play a thousand games, and make one that blends different elements of all of them. But if you violate IP, you can be sued.
I think that’s what it comes down to. We need to figure out what constitutes intellectual property and what rights go with it. What constitutes cultural property, and what rights do people have to works made available for reading or viewing? It’s easy to say that a company shouldn’t be able to hack open a paywall to get at WSJ content, but does that also go for people posting open access to Medium?
I don’t have the answers, and I do want people treated fairly. I recognize the tremendous potential for abuse of LLMs in generating viral propaganda, and I recognize that in another generation they may start making a real impact on the economy in terms of dislocating people. I’m not against legislation. I don’t expect the industry to regulate itself, because that’s not how the world works. I’d just like for it to be done deliberately and realistically and with the understanding that we’re not going to get it right and will have to keep tuning the laws as the technology and our understanding continue to evolve.
Sorry this is a bit too level-headed for me, can you please repeat with a bullhorn, and use 4-letter words instead? I need to know who to blame here.
Well written sir.