BBC will block ChatGPT AI from scraping its content

L4sBot@lemmy.world · 2 years ago

BBC will block ChatGPT AI from scraping its content

csm10495@sh.itjust.works · 2 years ago

I wonder if anyone actually thinks robots.txt is binding, or that it isn’t ignored by anyone who wants to ignore it.

totallynotfbi@lemm.ee · 2 years ago

I mean, you could just block OpenAI’s crawlers’ IP addresses, if you wanted to
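For illustration, here is a minimal sketch of what an IP-based block could look like on the server side. The ranges below are RFC 5737 documentation placeholders, not OpenAI’s real crawler addresses, and `BLOCKED_RANGES` and `is_blocked` are hypothetical names; a real setup would need the ranges the crawler operator actually publishes and would more likely live in a firewall or web server config.

```python
import ipaddress

# ASSUMPTION: placeholder ranges from RFC 5737 (reserved for documentation).
# These are NOT OpenAI's actual crawler addresses.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in BLOCKED_RANGES)
```

A request handler could call `is_blocked(remote_addr)` and return a 403 when it is true. This only works as long as the crawler keeps using known addresses.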

Noite_Etion@lemmy.world · 2 years ago

Big businesses won’t lift a finger to halt global warming, but the second their precious copyrights are attacked they go into full force.

Moneo@lemmy.world · 2 years ago

I mean, yeah? Corporations are always going to act in their best interest, that’s why regulation exists.

Free Palestine 🇵🇸@sh.itjust.works · 2 years ago

Kinda late

porkins@sh.itjust.works · 2 years ago

I’d rather have ChatGPT know about news content than not. I appreciate the convenience. The news shouldn’t have barriers.

Free Palestine 🇵🇸@sh.itjust.works · 2 years ago

But ChatGPT often takes correct and factual sources and adds a whole bunch of nonsense and then spits out false information. That’s why it’s dangerous. Just go to the fucking news websites and get your information from there. You don’t need ChatGPT for that.

Touching_Grass@lemmy.world · 2 years ago

You just described news

guacupado@lemmy.world · 2 years ago

More data fixes that flaw, not less.

CurlyMoustache@lemmy.world · 2 years ago

It is not “a flaw”, it is the way large language models work. They try to replicate how humans write by guessing based on a language model. They have no knowledge of what is a fact or not, and that is why using LLMs to do research, or using them as a search engine, is both stupid and dangerous.

Touching_Grass@lemmy.world · edit-2 2 years ago

How would it hallucinate information from an article you gave it? I haven’t seen it make up information when summarizing text yet. I have seen it happen when I ask it random questions.

CurlyMoustache@lemmy.world · 2 years ago

It does not hallucinate, it guesses based on the model to make you think the text could be written by a human. Personal experience from when I ask it to summarize a text: it has errors in it, and sometimes it adds stuff to it. Same if you, for instance, ask it to make an alphabetical list of X items. It may add random items.

Touching_Grass@lemmy.world · 2 years ago

I’ve had it make up things if I ask it for a list of, say, 5 things but there are only 4 things worth listing. I haven’t seen it stray from summarizing something I’ve fed it though. If it’s given text, it’s been pretty accurate. It only gets funky when you ask it things where information isn’t available. Then it goes with what you probably want.

Free Palestine 🇵🇸@sh.itjust.works · 2 years ago

Not too long ago, ChatGPT didn’t know what year it is. You’re telling me it needs more data than it already has to figure out the current year? I like AI for certain things (mostly some programming/scripting stuff) but you definitely don’t need it to read the news.

ours@lemmy.film · edit-2 2 years ago

Yes. The LLM doesn’t know what year it currently is, it needs to get that info from a service and then answer.

It’s a Large Language Model. Not an actual sentient being.

Free Palestine 🇵🇸@sh.itjust.works · 2 years ago

That’s a fucking lame excuse. AI is not reliable, and you definitely shouldn’t use it to get your news.

ours@lemmy.film · 2 years ago

It’s not an excuse, relax, it’s just how it works and I don’t see where I’m endorsing it to get your news.

Orygin@sh.itjust.works · 2 years ago

deleted by creator

Apollo@sh.itjust.works · edit-2 2 years ago

Who gets their news from ChatGPT lol

Flying Squid@lemmy.world · 2 years ago

A disturbing number of people.

Touching_Grass@lemmy.world · edit-2 2 years ago

You don’t get your news from it, but building tools can be useful. Scraping news websites to run different articles through things like semantic analysis, or to identify media tricks that manipulate readers, is a fun practice. You can use an LLM to identify propaganda much more easily. I can get why media would be scared that regular people can easily run these tools on their propaganda machine.

spez_@lemmy.world · 2 years ago

I do

Apollo@sh.itjust.works · 2 years ago

Why?

prashanthvsdvn@lemmy.world · 2 years ago

It’s funny seeing Apollo and spez_ fighting on a topic regarding ChatGPT.

Apollo@sh.itjust.works · 2 years ago

Natural enemies must fight

abhibeckert@lemmy.world · edit-2 2 years ago

Because ChatGPT doesn’t do clickbait headlines or have auto-play video ads, auto play video news that follows me if I try to scroll past it, or a house ad that tries to convince me to stop reading the news and instead read a puff piece about how to clean my water bottle. Which I’d bet fifty bucks will result in me seeing ads for new water bottles every day for the next month. No thanks.

With the “Web Browsing” plugin, which essentially does a Bing search then summarises the result, ChatGPT is a far better experience if you want to find out what’s going on in Israel today for example.

Ad4mWayn3@lemmy.world · 2 years ago

Neither does Lemmy. Here (and on other instances) there are plenty of communities for news, with better control of misinformation.

ManOMorphos@lemmy.world · edit-2 2 years ago

Reuters is pretty good. No autoplay vids, only 1-2 quiet ads an article, and is mainly cut-and-dry news.

No news source is 100% reliable, but I can easily see AI picking up bad information or misinterpreting human text. Nothing wrong with AI news by itself, but it’s a good habit to verify any source by yourself.

Regardless, I recommend uBlock for any device or browser. Ads are over the line nowadays so I don’t feel bad blocking them when possible.

C4d@lemmy.world · edit-2 2 years ago

The pure ChatGPT output would probably be garbage. The dataset will be full of all manner of sources (together with their inherent biases) together with spin, untruths and outright parody and it’s not apparent that there is any kind of curation or quality assurance on the dataset (please correct me if I’m wrong).

I don’t think it’s a good tool for extracting factual information from. It does seem to be good at synthesising prose and helping with writing ideas.

I am quite interested in things like this where the output from a “knowledge engine” is paired with something like ChatGPT, but that would be for writing a science paper, for example, rather than news.

Touching_Grass@lemmy.world · 2 years ago

I don’t think it’s generating news. Sounds like people are using it to reformat articles that are already written, to remove all the bullshit propaganda from the news. Like taking a Fox News article and just pulling out the key information.

C4d@lemmy.world · 2 years ago

Exactly. The data harvest has had years in the making.

patawan@lemmy.world · 2 years ago

Curious what the mechanism for this will be. CAPTCHA can sometimes be relatively easy to pass and at worst can be farmed out to humans.

body_by_make@lemmy.dbzer0.com · 2 years ago

ChatGPT took down its Internet search to implement a robots.txt rule it would obey and allow content providers time to add it to their lists. This was done because they were being used to get around paywalls. So it’s actually very easy for them to do this for ChatGPT, specifically, which makes articles like this ridiculous.
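As a rough sketch of how that kind of robots.txt rule works, the snippet below feeds an example file to Python’s standard `urllib.robotparser` and asks whether a given crawler may fetch a URL. `GPTBot` and `ChatGPT-User` are the user-agent tokens OpenAI documents for its crawlers; the file contents, the `example.com` URL and the `is_allowed` helper are assumptions for illustration, not the BBC’s actual configuration.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: an illustrative robots.txt, not the BBC's real file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this file, `is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/news")` is false while any other crawler name is allowed. The check only matters for crawlers that choose to consult robots.txt in the first place; nothing here enforces it.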

callmepk@lemmy.world · 2 years ago

Also FYI, you can see which of the most popular websites have already blocked ChatGPT: https://wayde.gg/websites-blocking-openai

flossdaily@lemmy.world · 2 years ago

This is a bit like companies blocking Google from their websites.

You’re only hurting yourself.

xenomor@lemmy.world · 2 years ago

It should be illegal for entities like BBC to do this. Copyright is meant to be a temporary, limited construct that carves out an opportunity for creators to profit from their works. It is not perpetual legal dominion over specific ideas. Entities that harvest content to train LLMs should pay for access like everyone else, but after that, they can use the information they learn however they see fit. Now, if their product plagiarizes, or doesn’t properly attribute authorship, that is a problem. But it’s a different issue from what the BBC is fighting here.

I think there are some content creators that believe they are owed royalties if you even think about a piece they wrote or drew. That is, of course, absurd in terms of human minds. It’s also absurd in terms of other kinds of minds.

Touching_Grass@lemmy.world · edit-2 2 years ago

News doesn’t want people to capture their daily propaganda pieces and be able to analyze it.

Meanwhile, news media will buy up all kinds of scraped data on users to better target their propaganda.

Cambridge Analytica for me but none for thee.
