A new paper suggests diminishing returns from larger and larger generative AI models. Dr Mike Pound discusses.

The Paper (No “Zero-Shot” Without Exponential Data): https://arxiv.org/abs/2404.04125

  • Lvxferre@mander.xyz
    8 months ago

    My personal take is that the current generation of generative models has peaked, for the reasons stated in the video (diminishing returns). This current gen will be useful, but progress-wise it’ll be a dead end.

    In the future, however, I believe that models with a different architecture will cause a breakthrough, being able to perform better with less training. And probably with lower energy requirements, too.

  • bamboo@lemm.ee
    8 months ago

    I think it’s incredibly naïve to think that, because we’ve hit a boundary on one particular aspect of LLMs, the technology has peaked as a whole. There are lots of ways to improve LLMs that aren’t just increasing the parameter size. For example, there’s been an uptick in smaller models that are optimized to run on client devices without large GPUs. There is probably a future where we have small 3-7B models that are competitive with today’s best 70B models, but can run in real time on any smartphone. We’ll have larger context windows, allowing LLMs to work on larger problems. And we’ll have better techniques for getting high quality information out of LLMs: there are already adversarial methods, where two LLMs hold a debate on a subject, that have shown that more accurate and comprehensive output is possible. They’ll also continue to be embedded into different places in software that make them more useful, not just as a chatbot that lives in its own world.

    • barsoap@lemm.ee (OP)
      8 months ago

      “There are lots of ways to improve LLMs that aren’t just increasing the parameter size”

      The paper isn’t about parameter size but the need for exponentially more training data to get a mere linear increase in output performance.
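
To make the "exponential data for linear gains" point concrete, here is a rough sketch in Python. It assumes a toy log-linear law in which every fixed step in score costs a constant multiple of the data. The function names, the starting size of 1,000 examples and the factor of 10 are made up for illustration; none of these numbers come from the paper.

```python
import math

# Toy model (an assumption for illustration, not the paper's fitted curve):
# each additional "step" of score multiplies the required data by `factor`.
def examples_needed(score_steps, base_examples=1_000, factor=10):
    """Training examples needed to reach `score_steps` equal-sized score steps."""
    return base_examples * factor ** score_steps

# Inverse view: how many score steps a given amount of data buys.
def score_steps_reached(num_examples, base_examples=1_000, factor=10):
    """Score steps reached with `num_examples`; grows only logarithmically."""
    return math.log10(num_examples / base_examples) / math.log10(factor)

if __name__ == "__main__":
    for steps in range(5):
        print(f"{steps} score steps -> {examples_needed(steps):,} examples")
```

Under these assumed numbers, each extra step of score costs ten times the data of the previous one, which is what a logarithmic return on data size means in practice.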

    • Murvel@lemm.ee
      8 months ago

      What you mentioned is assumed in the video and paper in question.

      The main argument is that, no matter our computational techniques, diminishing returns in predictive precision are reached far sooner than we achieve general intelligence.

      • Womble@lemmy.world
        8 months ago

        No, the argument is that current techniques give logarithmic returns in data size, which is bad. But it said nothing about other potential techniques, nor did it suggest that this was a general result.

        • Murvel@lemm.ee
          8 months ago

          Well, obviously they cannot rule out techniques no one has thought of, but likewise they obviously accounted for what they deemed to be within the realm of possibility.

  • randon31415@lemmy.world
    8 months ago

    I am just waiting until it makes the leap to 3D. With that, you will start seeing 3D assets in videogames become cheaper and quicker to make, VR rigging will soon follow, and when the tech reaches its peak: automated design for 3D printing.

  • chrash0@lemmy.world
    8 months ago

    gotem!

    seriously tho, you don’t think OpenAI is tracking this? architectural improvements and training strategies are developing all the time

    • barsoap@lemm.ee (OP)
      8 months ago

      …and aren’t making progress on that front: A linear increase in generalisation still requires a more than linear increase in the amount of data.

      Also, btw, it’s not as if we didn’t know that our current architectures won’t lead to proper intelligence. tl;dr: While current architectures can learn and represent information, they cannot develop learning strategies or decide smartly on how to represent a particular bit of information. All the improvements that are happening are in that “how to learn better” area; we have no idea whatsoever how to make the jump to teaching an AI to learn how to learn. AlphaZero is able to learn the rules of a game, yes, but it can’t learn arbitrary information: once you throw something other than a game at it, it has no idea how to make sense of anything.

      • chrash0@lemmy.world
        8 months ago

        “we don’t know how” != “it’s not possible”

        i think OpenAI more than anyone knows the challenges with scaling data and training. anyone working on AI knows the line: “a baby can learn to recognize elephants from a single instance”. reducing training data and time is fundamental to advancement. don’t get me wrong, it’s great to put numbers to these things. i just don’t think this paper is super groundbreaking or profound. a bit clickbaity and sensational for Computerphile

        • barsoap@lemm.ee (OP)
          8 months ago

          …and a baby doesn’t use the same architecture, not even close, as generative AIs. Babies are T3 systems, they aren’t simply systems which have rules on how to learn, they are systems which have rules on how to develop learning strategies that they then use to learn.

          I’m not doubting, in the slightest, that AI can get there: It’s definitely possible. It’s just not possible with the current approaches, and the iterative refinements that “oh OpenAI is constantly coming up with new topologies” refers to are just more of the same. Show me a topology that can come up with topologies, then we’ll have a chance to break through the need for exponential amounts of data.