• Falmarri@lemmy.world
    1 year ago

    What’s the basis for this? Why can a human read a thing and base their knowledge on it, but not a machine?

    • BURN@lemmy.world
      1 year ago

      Because a human understands and transforms the work. The machine runs statistical analysis and regurgitates a mix of what it was given. There’s no understanding or transformation; it’s just picking whichever word is statistically likely to come next. Humans add to the work, LLMs don’t.

      Machines do not learn. LLMs do not “know” anything. They make guesses based on their inputs. The reason they appear to be so right is the scale of data they’re trained on.

      This is going to become a crazy copyright battle that will likely lead to the entirety of copyright law being rewritten.

      • fkn@lemmy.world
        1 year ago

        I don’t know if I agree with everything you wrote, but I think the argument about LLMs basically transforming the text is important.

        Converting written text into numbers doesn’t fundamentally change the text. It’s still the author’s original work, just translated into a vector format. Reproduction of that vector format is still reproduction without citation.

      • Dojan@lemmy.world
        1 year ago

        It’s also the scale of their context, not just the data. More (good) data and lots of (good) varied data is obviously better, but the perceived cleverness isn’t owed to data alone.

        I do hope copyright law gets rewritten. It is dated and hasn’t kept up with society or technology at all.

      • atzanteol@sh.itjust.works
        1 year ago

        “This is going to become a crazy copyright battle that will likely lead to the entirety of copyright law being rewritten.”

        I think this is very unlikely. All of law is precedent.

        Google uses copyrighted works for many things that are “algorithmic” but not AI and people aren’t shitting themselves over it.

        Why would AI be different? So long as copyright isn’t infringed at least.

    • gcheliotis@lemmy.world
      1 year ago

      That machine is a commercial product, quite unlike a human being in essence, purpose and function. So I do not think the comparison is valid here, unless it were perhaps a sentient artificial being, free to act of its own accord. But that is not what we’re talking about here. We must not be carried away by our imaginations: these language models are (often proprietary and for-profit) products.

      • Falmarri@lemmy.world
        1 year ago

        I don’t see how that’s relevant. A company can pay someone to read copyrighted work, learn from it, and then perform a task for the benefit of the company related to the learning.

        • krische@lemmy.world
          1 year ago

          But how did that person acquire the copyrighted work? Was the copyrighted material paid for?

          That’s the crux of the issue: OpenAI isn’t paying for the copyrighted work they are “reading”, are they?

          • Falmarri@lemmy.world
            1 year ago

            What does paying for anything have to do with what we’re talking about here? They’re ingesting freely available content that anyone with a web browser could read.