A ‘Shocking’ Amount of the Web Is Already AI-Translated Trash, Scientists Determine

Researchers warn that most of the text we view online has been poorly translated into one or more languages, usually by a machine.

  • Brkdncr@lemmy.world
    English
    arrow-up
    63
    ·
    1 year ago

    I was recently searching for some tips on overlanding routes. So many sites are just long, strung-together SEO word salad.

  • grue@lemmy.world
    English
    arrow-up
    51
    arrow-down
    1
    ·
    1 year ago

    I’ve been saying for quite a while now that the Internet was best in the '90s and early 2000s back before it was commercialized, even despite all the “under construction” gifs and whatnot. The signal/noise ratio has only continued to drop since then.

    • maness300@lemmy.world
      English
      arrow-up
      18
      arrow-down
      1
      ·
      1 year ago

      Counterpoint: the Internet still exists as it did back then, but relatively smaller compared to what it’s become.

      You just need to find the right people and content to interact with, which is harder now because there’s so much more garbage. I’d say those corners of the Internet have actually grown in absolute numbers.

    • jawa21@startrek.website
      English
      arrow-up
      9
      arrow-down
      4
      ·
      1 year ago

      You forgot the pop-ups, forced MIDI music, easily injected malware, difficulty in verifying sources, HTML frames that frequently broke, the entire concept of needing a site map, fucking keywords, true banner ads that could force clicks with JavaScript, and RealPlayer, to name a few. I don’t miss it at all.

      • grue@lemmy.world
        English
        arrow-up
        16
        arrow-down
        1
        ·
        edit-2
        1 year ago

        No, I didn’t forget anything. It was still better even despite all that.

    • wikibot@lemmy.world
      English
      arrow-up
      50
      arrow-down
      1
      ·
      1 year ago

      Here’s the summary for the Wikipedia article you mentioned in your comment:

      The dead Internet theory is an online conspiracy theory that asserts that the Internet now consists mainly of bot activity and automatically generated content that is manipulated by algorithmic curation, marginalizing organic human activity. Proponents of the theory believe these bots are created intentionally to help manipulate algorithms and boost search results in order to ultimately manipulate consumers. Furthermore, some proponents of the theory accuse government agencies of using bots to manipulate public perception, stating “The U.S. government is engaging in an artificial intelligence powered gaslighting of the entire world population”.

  • SomeGuy69@lemmy.world
    English
    arrow-up
    31
    ·
    1 year ago

    I need an AI Firefox extension that detects badly translated AI text and automatically blocks those domains.

  • aesthelete@lemmy.world
    English
    arrow-up
    30
    arrow-down
    1
    ·
    edit-2
    1 year ago

    For a time I thought this Fediverse thing would help or change things or something, but honestly…the Internet is just plain boring now…and it’s pretty clear what is causing that: AI / SEO trash content, social media’s rise, and commercialization of the Internet generally.

    One day I was even feeling nostalgic so I went back to where I spent hours upon hours of my youth: EFNet on IRC…there was basically nobody there and of the few channels I saw some were even Trump-leaning weirdo “communities”.

    It’s basically finished. I can’t even find a decent place to procrastinate or hang out anymore on this POS. It’s all just a giant ad surface and e-commerce portal. The fucking owners won.

    • Just_Pizza_Crust@lemmy.world
      English
      arrow-up
      7
      ·
      edit-2
      1 year ago

      “The fucking owners won.”

      Always has been 🔫

      That said, I would suggest smaller communities and private messaging. Find your niche and make it home.

      • Jax@sh.itjust.works
        English
        arrow-up
        1
        ·
        edit-2
        1 year ago

        Yep, it might have been hijacked by consumers but it’s still a communication network.

    • AMDIsOurLord@lemmy.ml
      English
      arrow-up
      6
      arrow-down
      2
      ·
      1 year ago

      EFNet is boomer shit. Most of IRC happens on other servers now, like Libera.Chat, or on new protocols like Matrix.

      We’re still here, we’re still alive

  • Jayu@lemm.ee
    English
    arrow-up
    25
    ·
    1 year ago

    The most annoying aspect of this is when you know the actual information has to be out there, but it is being drowned out by dozens of sites reposting less relevant, low-quality information… And then you go to search in another language and you see substandard machine translations of all the garbage you were just fleeing, lol.

    • TheRealKuni@lemmy.world
      English
      arrow-up
      2
      ·
      edit-2
      1 year ago

      I was trying to find the radius of the corner of the iPad Pro. Not the screen, the actual device. No matter how I modified my search term, all I could find was information about the screen corner (and how it isn’t a true radius and blah blah blah) or AI-generated bullshit.

      Eventually I gave up and changed the way I was tackling my project. I know the info is out there, people make cases for these things.

    • Misconduct@lemmy.world
      English
      arrow-up
      1
      arrow-down
      1
      ·
      1 year ago

      It’s getting to the point where I have to use AI to help me sift through all the AI bullshit :(

  • maegul (he/they)@lemmy.ml
    English
    arrow-up
    11
    ·
    1 year ago

    The whole webring idea needs to come back. Human-curated recommendations of good resources and pages. So long as these pages remain in the control of humans, are dedicated to curation, and are decentralised, unlike the search engines, they’ll be reliable.

    Plug in some social and community organisation, perhaps like a wiki, and you could get even more out of it.

  • BetaDoggo_@lemmy.world
    English
    arrow-up
    10
    arrow-down
    2
    ·
    edit-2
    1 year ago

    This isn’t shocking at all. The markets for obscure language content are incredibly small so there’s no incentive for most to spend resources on it. I’d argue mediocre machine translation is better than nothing at all in many cases, but for unsupervised training it does pose a challenge.

    • xantoxis@lemmy.world
      English
      arrow-up
      8
      ·
      edit-2
      1 year ago

      They didn’t only look at low-resource languages; they just started there because that was the problem domain. They found that 57% of ALL sentences on the Internet appeared to be machine translated, including translations into high-resource languages. The remaining 43% might also be machine generated; it just wasn’t found to be part of a multi-way parallel group.

  • Falcon@lemmy.world
    English
    arrow-up
    6
    arrow-down
    1
    ·
    1 year ago

    Translation is very different from generation.

    As a matter of fact, even AI generation has different grades of quality.

    SEO garbage is certainly not the same as an article with AI-generated components, and very different from a translated article.

  • Meowoem@sh.itjust.works
    English
    arrow-up
    3
    arrow-down
    6
    ·
    1 year ago

    This is basic math: articles are written in one language, but there are lots of languages they can be translated into. So if a site written in English has Spanish, French, and Portuguese versions, 75% of that counts as AI-translated garbage, because apparently having stuff available to non-English speakers is a bad thing now?

    As for ‘poorly’: what’s their mechanism for determining it? How much is well translated, or are they just assuming it’s poor because it’s possible it could be? Likewise, what percentage is human translated, and how do they determine that? Or is it another assumption to fit their narrative?

    Clickbait doomer nonsense.
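
The 75% figure in the comment above is plain counting, and it can be sketched in a few lines of Python. This is only an illustration of that commenter's arithmetic, not the method the researchers used: the function name `translated_share` is made up here, and the sketch assumes every language version holds the same amount of text and that every non-original version is a translation.

```python
def translated_share(num_translations: int) -> float:
    """Fraction of a site's language versions that are translations,
    assuming exactly one original and equally sized versions (an assumption)."""
    if num_translations < 0:
        raise ValueError("num_translations must be non-negative")
    return num_translations / (num_translations + 1)


# One English original plus Spanish, French, and Portuguese versions.
share = translated_share(3)
print(f"{share:.0%}")  # prints 75%
```

Under these assumptions the translated share climbs quickly as languages are added (nine translations would give 90%), which is the commenter's point: a high percentage on its own says nothing about translation quality.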