Wikipedia has a new initiative called WikiProject AI Cleanup. It is a task force of volunteers currently combing through Wikipedia articles, editing or removing false information that appears to have been posted by people using generative AI.

Ilyas Lebleu, a founding member of the cleanup crew, told 404 Media that the crisis began when Wikipedia editors and users began seeing passages that were unmistakably written by a chatbot of some kind.

  • sbv@sh.itjust.works · 75 upvotes · 2 months ago

    As for why this is happening, the cleanup crew thinks there are three primary reasons.

    “[The] main reasons that motivate editors to add AI-generated content: self-promotion, deliberate hoaxing, and being misinformed into thinking that the generated content is accurate and constructive,”

    That last one. Ouch.

    • BigDanishGuy@sh.itjust.works · 6 upvotes, 1 downvote · 2 months ago

      Well, I was in doubt, so I asked the AI whether I could trust the answers and it told me not to worry about it. That must mean that I only get accurate answers, right? /s

  • narc0tic_bird@lemm.ee · 36 upvotes, 1 downvote · 2 months ago

    Best case is that the model used to generate this content was originally trained on data from Wikipedia, so it “just” generates a worse, hallucinated “variant” of the original information. Goes to show how stupid this idea is.

    Imagine this in a loop: AI trained on Wikipedia that then alters content on Wikipedia, which in turn gets picked up by the next model trained. It would just get worse and worse, similar to how converting the same video over and over again yields continuously worse results.

    • Wrench@lemmy.world · 9 upvotes · 2 months ago

      Yes, this is what many of us worry the internet in general will become: AI content generated by AI trained on AI garbage.

      AI bots can trivially outpace humans.

      • kboy101222@sh.itjust.works · 7 upvotes, 1 downvote · 2 months ago

        I was just discussing with a friend of mine how we’re rapidly approaching the dead internet. At some point, many websites will likely just be chat bots talking to other chat bots, with the output then used to train further chat bots. Human-made content is already becoming harder and harder to find on algorithm-heavy websites like Reddit and Facebook’s suite of sites. The bots can easily outpace any algorithmic changes the sites might make to deter them. My FB-using family members all constantly block those weird Jesus accounts, and they still show up constantly.

    • 8uurg@lemmy.world · 4 upvotes · 2 months ago

      A very similar situation to that analysed in this paper that was recently published. The quality of what is generated degrades significantly.

      They mostly investigate replacing the data with AI-generated data in each step, though, so I doubt the effect will be as pronounced in practice. Human writing will still be included, and even curation of AI-generated text by people can skew the distribution of the training data (as the process by these editors would inevitably do, since reasonable text could slip through the cracks).
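      The difference between replacing all the data and mixing in human data can be illustrated with a toy simulation. This is only a sketch under loud assumptions: the "model" is a one-dimensional Gaussian refit each generation, the "human" data is drawn from N(0, 1), and the function name `simulate` and its parameters are made up for illustration. It is not the setup used in the paper mentioned above.

```python
import random
import statistics


def simulate(generations, sample_size, human_fraction, seed=0):
    """Refit a Gaussian each generation on data sampled from the previous fit.

    human_fraction is the share of each generation's training data that is
    drawn from the original "human" distribution N(0, 1) rather than from
    the current model. Returns the model's standard deviation at the end.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 matches the human data exactly
    n_human = int(sample_size * human_fraction)
    for _ in range(generations):
        human = [rng.gauss(0.0, 1.0) for _ in range(n_human)]
        synthetic = [rng.gauss(mu, sigma) for _ in range(sample_size - n_human)]
        data = human + synthetic
        # "Training" here is just re-estimating the mean and spread.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
    return sigma


# Every generation trained only on the previous model's output.
replaced = simulate(2000, 50, human_fraction=0.0)
# Half of every generation's data is still fresh human data.
mixed = simulate(2000, 50, human_fraction=0.5)

print(f"all synthetic: std = {replaced:.3g}")
print(f"half human:    std = {mixed:.3g}")
```

      In this toy setting the fully synthetic chain loses its spread over many generations, while the chain that keeps seeing human data stays close to the original distribution. Real language models are far more complicated, so treat this as intuition only.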

      • Blaster M@lemmy.world · 2 upvotes · edited 1 month ago

        AI model makers are very well aware of this, and there is a move from ingesting everything to curating datasets more aggressively. Data prep is something many upstarts have no idea is critical, but everyone is learning about it, sometimes the hard way.

    • Bahnd Rollard@lemmy.world · 25 upvotes · 2 months ago

      They used to be contained: every village had its idiot. Now that the internet is the global village, all the formerly isolated idiots have a place to chat.

  • randon31415@lemmy.world · 29 upvotes, 1 downvote · 2 months ago

    If anyone can survive the AI text apocalypse, it is Wikipedia. They have been fending off and regulating article-writing bots since someone coded up a US town article writer from the 2000 census (not the 2010 or 2020 census, the 2000 census; this bot was writing Wikipedia articles in 2003).

  • nutsack@lemmy.world · 21 upvotes · 2 months ago

    Why the fuck would anyone stick AI shit on Wikipedia? That doesn’t make any sense.

    • NateNate60@lemmy.world · 28 upvotes · 2 months ago

      “[The] main reasons that motivate editors to add AI-generated content: self-promotion, deliberate hoaxing, and being misinformed into thinking that the generated content is accurate and constructive,” Lebleu said.

    • InverseParallax@lemmy.world · 6 upvotes · 2 months ago

      The irony being that a huge amount of the LLM knowledge was based on WP in the first place, that and scientific papers.

  • RubberDuck@lemmy.world · 6 upvotes, 2 downvotes · 2 months ago

    Require anyone who wants to add stuff to pay a small amount to the Wikimedia Foundation to activate their account, and refund it once they have done a certain amount of moderation.

    • aubertlone@lemmy.world · 6 upvotes · 2 months ago

      Yeah, I mean, I’ve had minor edits reversed because I didn’t source the fact properly.

      And that was like 10 years ago. I’m surprised these edits are getting through in the first place.

      • Shdwdrgn@mander.xyz · 4 upvotes · 2 months ago

        Seems like that would be an easy problem to solve… require all edits to have a peer review by someone with a minimum credibility before they go live. I can understand when Wikipedia was new, allowing anyone to post edits or new content helped them get going. But now? Why do they still allow any random person to post edits without a minimal amount of verification? Sure it self-corrects given enough time, but meanwhile what happens to all the people looking for factual information and finding trash?

        • sugar_in_your_tea@sh.itjust.works · 2 upvotes · 2 months ago

          Or at least give it a certain amount of time before it goes live. So if nobody comes around to approve it in 24 hours, it goes live.

          Usually bad edits are corrected within hours, if not minutes, so that should catch the lion’s share w/o bogging down the approval queue too much.
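          A rough sketch of that time-delay rule, purely for illustration: the class `PendingEdit`, its fields, and the `REVIEW_WINDOW` constant are hypothetical names, and this is not how Wikipedia's software actually handles edits. An edit goes live when a reviewer approves it, stays hidden when a reviewer rejects it, and otherwise goes live by default once the window has passed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed delay, taken from the "24 hours" suggested in the comment above.
REVIEW_WINDOW = timedelta(hours=24)


@dataclass
class PendingEdit:
    text: str
    submitted_at: datetime
    approved: Optional[bool] = None  # None means nobody has reviewed it yet

    def is_live(self, now: datetime) -> bool:
        """Return True if readers should see this edit at time `now`."""
        if self.approved is not None:
            # An explicit reviewer decision always wins.
            return self.approved
        # Unreviewed edits go live once the review window has elapsed.
        return now - self.submitted_at >= REVIEW_WINDOW


submitted = datetime(2024, 10, 1, 12, 0)
edit = PendingEdit("Fixed a date in the infobox.", submitted)
rejected = PendingEdit("Obvious spam.", submitted, approved=False)

print(edit.is_live(submitted + timedelta(hours=1)))    # still waiting
print(edit.is_live(submitted + timedelta(hours=24)))   # window elapsed
print(rejected.is_live(submitted + timedelta(days=3))) # rejected stays hidden
```

          The design choice here is that silence counts as approval, which keeps the queue from blocking edits forever but only helps if reviewers usually get to bad edits within the window.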

        • RubberDuck@lemmy.world · 1 upvote, 1 downvote · 2 months ago

          Crowdsourcing is the strength that led to this vast resource, and also the weakness on display here. So there will probably be a need for some form of barrier. Hence my suggestion.