• abhibeckert@lemmy.world
      English
      arrow-up
      51
      arrow-down
      6
      ·
      edit-2
      9 months ago

      Every video ever created is copyrighted.

      The question is — do they need a license? Time will tell. This is obviously going to court.

      • iknowitwheniseeit@lemmynsfw.com
        English
        arrow-up
        15
        arrow-down
        2
        ·
        9 months ago

        There are definitely non-copyrighted videos! Both old videos (all still black and white, I think) and things released into the public domain by copyright holders.

        But for sure that’s a very small subset of videos.

    • Bogasse@lemmy.ml
      English
      arrow-up
      5
      arrow-down
      1
      ·
      9 months ago

      And on the other hand, it is a very obvious question to expect. If you have something to hide, how in the world are you not prepared for this question!? 🤡

    • VirtualOdour@sh.itjust.works
      English
      arrow-up
      6
      arrow-down
      3
      ·
      9 months ago

      It’s a question based on a purposeful misunderstanding of the technology. It’s like expecting a beekeeper to know each bee’s name and bedtime. Really, it’s like asking a bricklayer where each brick in the pile came from: he can tell you the batch, but he’s not going to know that this brick came from the fourth row of the sixth pallet, two from the left. There is no reason to remember that; it’s not important to anyone.

      They don’t log it because it would take huge amounts of resources and gain nothing.

  • CosmoNova@lemmy.world
    English
    arrow-up
    46
    arrow-down
    7
    ·
    edit-2
    9 months ago

    I almost want to believe they legitimately do not know or care that they’re committing a gigantic data and labour heist, but the truth is they know exactly what they’re doing and they’re rubbing our noses in it.

    • laxe@lemmy.world
      English
      arrow-up
      16
      arrow-down
      3
      ·
      9 months ago

      Of course they know what they’re doing. Everybody knows this, how could they be the only ones that don’t?

    • Bogasse@lemmy.ml
      English
      arrow-up
      14
      arrow-down
      3
      ·
      9 months ago

      Yeah, the fact that AI progress just relies on “we will make so much money that no lawsuit will significantly alter our growth” is really infuriating. The fact that the general public apparently doesn’t care is even more infuriating.

      • toddestan@lemmy.world
        English
        arrow-up
        2
        arrow-down
        2
        ·
        9 months ago

        I’d say not really, Tolkien was a writer, not an artist.

        What you are doing is violating the trademark Middle-Earth Enterprises has on the Gandalf character.

        • A_Very_Big_Fan@lemmy.world
          English
          arrow-up
          2
          ·
          9 months ago

          The point was that I absorbed that information to inform my “art”, since we’re equating training with stealing.

          I guess this would have been a better example lol. It’s clearly not Gandalf, but I wouldn’t have ever come up with it if I hadn’t seen that scene

    • jaemo@sh.itjust.works
      English
      arrow-up
      7
      arrow-down
      7
      ·
      9 months ago

      It also tells us how hypocritical we all are since absolutely every single one of us would make the same decisions they have if we were in their shoes. This shit was one bajillion percent inevitable; we are in a river and have been since we tilled soil with a plough in the Nile valley millennia ago.

      • adrian783@lemmy.world
        English
        arrow-up
        9
        arrow-down
        4
        ·
        9 months ago

        most of us would never be in their shoes because most of us are not sociopathic techbros

    • blazeknave@lemmy.world
      English
      arrow-up
      5
      arrow-down
      3
      ·
      9 months ago

      I feel like at their scale, if there’s going to be a figurehead, marketable CTO, it’s going to be at this company. If not, you’re right, and she’s lying lol

  • TheObviousSolution@lemm.ee
    English
    arrow-up
    16
    arrow-down
    4
    ·
    9 months ago

    Then wipe it out and start again once you have sorted out where your data is coming from. Are we acting like you haven’t built datacenters packed full of NVIDIA processors just for this sort of retraining? They are choosing to build AI without proper sourcing; that’s not an AI limitation.

  • Fedizen@lemmy.world
    English
    arrow-up
    5
    ·
    edit-2
    9 months ago

    this is why code AND cloud services shouldn’t be copyrightable or licensable without some kind of transparency legislation to ensure people are honest. Either forced open source or some kind of code review submission to a government authority that can be unsealed in legal disputes.

  • RatBin@lemmy.world
    English
    arrow-up
    2
    ·
    9 months ago

    Obviously nobody fully knows where so much training data comes from. They used web scraping tools like there’s no tomorrow, and with that amount of information you can’t tell where all the training material comes from. Which doesn’t mean that the tool is unreliable, but that we don’t truly know why it’s that good, unless you can somehow access all the layers of the digital brains operating these machines; that isn’t doable with a closed-source model, so we can only speculate. This is what is called a black box, and we use it because we trust the output enough to do so. Knowing in detail the process behind each query would thus be taxing. Anyway… I’m starting to see more and more AI-generated content. YouTube is slowly but surely losing significance and importance as I don’t search for information there any longer, AI being one of the reasons for this.

  • dezmd@lemmy.world
    English
    arrow-up
    7
    arrow-down
    20
    ·
    9 months ago

    LLM is just another iteration of Search. Search engines do the same thing. Do we outlaw search engines?

    • AliasAKA@lemmy.world
      English
      arrow-up
      15
      arrow-down
      3
      ·
      9 months ago

      Sora is a generative video model, not exactly a large language model.

      But to answer your question: if all LLMs did was redirect you to where the content was hosted, then they would be search engines. But instead they reproduce what someone else was hosting, which may include copyrighted material. So they’re fundamentally different from a simple search engine. They don’t direct you to the source; they reproduce a facsimile of the source material without acknowledging or directing you to it. Sora is similar. It produces video content, but it doesn’t redirect you to the similar video content that it is reproducing from. And we can argue about how close something needs to be to an existing artwork to count as a reproduction, but I think for AI models we should enforce citation models.

      • dezmd@lemmy.world
        English
        arrow-up
        4
        arrow-down
        11
        ·
        9 months ago

        How does a search engine know where to point you? It ingests all that data and processes it ‘locally’ on the search engine’s systems, using algorithms to organize the data for search. It’s effectively the same dataset.

        LLM is absolutely another iteration of Search, with natural language output for the same input data. Are you arguing that search engine data ingestion is not fair use and is a copyright violation as well?

        You equate LLM to Intelligence, which it is not. It is an algorithmic search iteration with natural language responses, but that doesn’t sound as cool as AI. It’s neat, it’s useful, and yes, it should cite the sourcing details (upon request), but it’s not (yet?) a real intelligence and is equal to search in terms of fair use and copyright arguments.

        • AliasAKA@lemmy.world
          English
          arrow-up
          7
          arrow-down
          2
          ·
          9 months ago

          I never equated LLMs to intelligence. And indexing the data is not the same as reproducing the webpage or the content on a webpage. For you to get beyond a small snippet that held your query when you search, you have to follow a link to the source material. Now of course Google doesn’t like this, so they did that stupid AMP thing, which has its own issues, and I disagree with AMP as a general rule as well. So, LLMs can look at the data; I just don’t think they can reproduce that data without attribution (or payment to the original creator). Perplexity.ai is a little better in this regard because it does link back to sources and is attempting to be a search-engine-like entity. But OpenAI, in almost all cases, does not.
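
To make the indexing-versus-reproducing distinction concrete, here is a rough toy sketch of what "indexing" means in the simplest case. It is not how Google or any real search engine works; the page URLs, the text, and the function names `build_index` and `search` are all made up for illustration. The point it tries to show is only that an index maps words to locations, and a query gets back a link plus a short snippet rather than the full content.

```python
from collections import defaultdict

# Hypothetical toy corpus: URL -> page text (illustrative only).
PAGES = {
    "https://example.org/a": "Bees make honey in the hive during summer",
    "https://example.org/b": "A bricklayer stacks bricks from each pallet",
}


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, pages, query, snippet_words=5):
    """Return (url, snippet) pairs for pages containing every query word.

    The snippet is a short window of words around the first query word,
    so the caller still has to follow the URL to read the whole page.
    """
    words = query.lower().split()
    if not words:
        return []
    # A page is a hit only if it contains all of the query words.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    results = []
    for url in sorted(hits):
        tokens = pages[url].split()
        lowered = [t.lower() for t in tokens]
        pos = lowered.index(words[0])
        start = max(0, pos - snippet_words // 2)
        snippet = " ".join(tokens[start:start + snippet_words])
        results.append((url, snippet))
    return results


INDEX = build_index(PAGES)
```

Calling `search(INDEX, PAGES, "honey")` hands back the URL of the first page and a five-word excerpt, which is the "small snippet plus link" behaviour described above. A generative model has no equivalent of the URL column unless attribution is deliberately built in.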

    • dantheclamman@lemmy.world
      English
      arrow-up
      4
      ·
      9 months ago

      I feel conflicted about the whole thing. Technically it’s a model. I don’t feel that people should be able to sue me as a scientist for making a model based on publicly available data. I myself am merely trying to use the model itself to explain stuff about the world. But OpenAI are also selling access to the outputs of the model, that can very closely approximate the intellectual property of people. Also, most of the training data was accessed via scraping and other gray market methods that were often explicitly violating the TOU of the various places they scraped from. So it all is very difficult to sort through ethically.