What are some tricks to efficiently search for information on the internet?

cll7793@lemmy.world · 13 hours ago

What are some tricks to efficiently search for information on the internet?

ieatpwns@lemmy.world · 31 minutes ago

I know you came here for answers, but how would one start making their own metasearch engine? Got any guides to point me towards? I hate Google so much I’m willing to learn to make my own search engine.

theneverfox@pawb.social · 2 hours ago

Go back to 2022 and run your search then

porcoesphino@mander.xyz · 8 hours ago

Deny list plugins!?? I’d been looking for a search engine with that built in. It seems so obvious. I didn’t even think to look up a plugin. I had been writing browser keyword searches that manually added the query params to filter out particularly frustrating results.
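For anyone wanting to try the manual query-param route, here is a minimal sketch. It assumes the engine accepts the `-site:` exclusion operator and a `q` query parameter (DuckDuckGo is used as the base URL here); the function name `search_url` and the example domains are made up for illustration.

```python
from urllib.parse import urlencode


def search_url(query: str, blocked_domains: list[str],
               base: str = "https://duckduckgo.com/") -> str:
    """Build a search URL that excludes every domain in blocked_domains."""
    # One "-site:domain" term per blocked domain, appended to the query.
    exclusions = " ".join(f"-site:{domain}" for domain in blocked_domains)
    full_query = f"{query} {exclusions}".strip()
    params = urlencode({"q": full_query})
    return f"{base}?{params}"


print(search_url("sourdough starter", ["example.com", "example.org"]))
```

A browser keyword search does the same thing with a fixed template; the advantage of a script is that the deny list can live in one file.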

evasive_chimpanzee@lemmy.world · edit-2 2 hours ago

Just found uBlacklist.

Now to find something for whitelist searches (basically I only ever want recipes or medical information from a small list of sites).

Edit: duckduckgo has the capability built in, too

Tywèle [she|her]@lemmy.dbzer0.com · 7 hours ago

Kagi has that feature built in though it is a paid search engine.

Truscape@lemmy.blahaj.zone · edit-2 10 hours ago

Use books from a shadow library like Anna’s Archive (you can use Wikipedia to find the right domains). You can read prior written material for academic subjects, relevant books on various subjects from the pre-internet era, and so forth. Some users from newer fields (such as 3D printing/CAD) are going as far as uploading their PDF works onto Anna’s for distributed access.

bluemoon@piefed.social · edit-2 7 hours ago

StartPage, Mojeek, SearXNG, YaCy
hyperlink surfing “extranets”, as you would WikiMedia WikiPedia InternetArchive FediVerse posts etc.
webscrapers like Monolith etc. for offline PIR and just as you say convenience of having it all there

i look forward to reading what you come up with, because i am still kinda at the theoretical stage with keeping such a knowledgebase.

edit: i keep thinking a plaintext document of information is way simpler to deal with than webpages. at what point is information posted online preserved in its “original” form? just dumping this FediThread into a plaintext file or a folder of plaintext files with names being ‘hierarchy•postID•username’ or something so it is presented self-organized.

OP is ¤, 1st rank comments are ¤a ¤b ¤c and 2nd rank comments attached to comment ¤a are ¤a-a ¤a-b ¤a-c and 3rd rank comments attached to ¤a-c are ¤a-c-a ¤a-c-b ¤a-c-c and so on. this then lists itself in a self-organized way, given all ASCII & unicode characters are provided in order. not just a-Z… because that would limit the size of posts it can take on.
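a rough sketch of that naming scheme, to make it concrete. the `Comment` class, `sibling_label` and `label_thread` are hypothetical names, and the sketch assumes at most 26 replies per comment (plain a to z) rather than the wider character range mentioned above.

```python
from dataclasses import dataclass, field

ROOT = "¤"  # marker for the original post


@dataclass
class Comment:
    post_id: str
    username: str
    replies: list["Comment"] = field(default_factory=list)


def sibling_label(index: int) -> str:
    # Assumption: letters a..z only, so at most 26 replies per comment.
    if not 0 <= index < 26:
        raise ValueError("this sketch only labels 26 replies per comment")
    return chr(ord("a") + index)


def label_thread(op: Comment) -> list[str]:
    """Return 'hierarchy•postID•username' names in thread order."""
    names: list[str] = []

    def walk(node: Comment, path: list[str]) -> None:
        hierarchy = ROOT + "-".join(path)  # e.g. ¤, ¤a, ¤a-c, ¤a-c-b
        names.append(f"{hierarchy}•{node.post_id}•{node.username}")
        for i, reply in enumerate(node.replies):
            walk(reply, path + [sibling_label(i)])

    walk(op, [])
    return names


thread = Comment("1", "op", [
    Comment("2", "alice", [Comment("4", "carol")]),
    Comment("3", "bob"),
])
for name in label_thread(thread):
    print(name)
```

one caveat: in this sketch, sorting on the hierarchy part alone keeps replies under their parent, but sorting on the whole file name may not, because `•` sorts after `-`.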

of course more difficult and complicated solutions like selfhosting webservers and managing ports and databases exist… not that i grasp the necessity for so many services.

CrocodilloBombardino@piefed.social · 12 hours ago

Use resources available through your library’s website

Tollana1234567@lemmy.today · edit-2 11 hours ago

For research, especially STEM, look for sites like ResearchGate and others that host peer-reviewed papers. Articles, magazines, and blogs are not good sources unless they cite the research paper and link you to the proper site, and it is important not to take findings out of context, which can lull people into pseudoscientific beliefs. Some people jump the gun on these sites, which are basically articles, often using dumbed-down wording. Universities and colleges often have access to most if not all of the papers that usually sit behind publishers’ paywalls; if you can somehow get access to those, go for it.

paequ2@lemmy.today · 12 hours ago

Ask Lemmy.

INHALE_VEGETABLES@aussie.zone · 7 hours ago

You know how unhinged our grasp on reality is, right?

remon@ani.social · 7 hours ago

Well, OP just said “efficiently” … nothing about the quality. So you are technically correct.

Kissaki@feddit.org · 12 hours ago

Help, I’m in a loop - between this ask Lemmy and this comment

e0qdk@reddthat.com · 11 hours ago

If you’re interested in building a new general purpose search engine, it probably makes the most sense to start with Common Crawl’s data set and augment it rather than starting from scratch.
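As a starting point, here is a small sketch that only builds a query URL for Common Crawl's public URL index, to see what a crawl holds for a domain before downloading anything. The crawl ID `CC-MAIN-2024-10` and the helper name `cc_index_url` are assumptions for illustration; check the Common Crawl site for the current list of crawls.

```python
from urllib.parse import urlencode

INDEX_HOST = "https://index.commoncrawl.org"


def cc_index_url(url_pattern: str, crawl_id: str = "CC-MAIN-2024-10") -> str:
    """Build an index lookup URL for pages matching url_pattern."""
    # crawl_id is a placeholder; each crawl has its own index endpoint.
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{params}"


print(cc_index_url("example.com/*"))
```

Fetching that URL should return one JSON record per captured page, which points at where the page sits in the crawl archives.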

your_paranoid_neighbour@lemmy.dbzer0.com · 11 hours ago

SearXNG with brave, duckduckgo, google, mullvadleta, mullvadleta brave and qwant as the search engines. The law of large numbers makes it quite useful.
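To illustrate why combining engines helps, here is a toy sketch that merges several ranked result lists by summing reciprocal ranks, so a URL that several engines agree on rises to the top. This is not how SearXNG actually scores results; `merge_results` and the sample data are invented for the example.

```python
from collections import defaultdict


def merge_results(results_by_engine: dict[str, list[str]]) -> list[str]:
    """Merge ranked URL lists; URLs ranked well by many engines come first."""
    scores: dict[str, float] = defaultdict(float)
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank  # rank 1 adds 1.0, rank 2 adds 0.5, ...
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


results = {
    "engine_a": ["x.example", "y.example"],
    "engine_b": ["y.example", "z.example"],
    "engine_c": ["y.example", "x.example"],
}
print(merge_results(results))
```

Here `y.example` wins because all three engines return it, even though one of them ranks it second.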

WhatsHerBucket@lemmy.world · 12 hours ago

There are some paid options that are pretty good (I’m thinking Kagi).

Easy, but one obvious downside.

porcoesphino@mander.xyz · edit-2 8 hours ago

Does Kagi let you add a domain to a denylist (like a new, well-SEOed genAI site with inaccuracies you’ve noticed), or positively bias search results (like saying you know you want Wikipedia entries high in the list)?

evasive_chimpanzee@lemmy.world · 2 hours ago

For whatever reason, Wikipedia seems to have been really pushed down the page on search engines specifically for medical information. It’s a shame because I can acquire the surface level of information (which is all I really ever need) way faster from Wikipedia than the other sites that come to the top of the list (Mayo Clinic, Johns Hopkins, Cleveland Clinic, govt sites).

I really shouldn’t complain about it too much, cause they could be pushing pseudoscience blogs.

baggachipz@sh.itjust.works · 6 hours ago

It’s one of their best features. No ads being the best, since that also means you get real results and no “sponsored” bullshit. They also have AI slop filters.

Tywèle [she|her]@lemmy.dbzer0.com · edit-2 7 hours ago

It does. You can outright block domains, rank them higher or lower and I think even pin them to the top.
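For a feel of what block/raise/lower/pin rules do to a result list, here is a hypothetical sketch. It is not Kagi's implementation; the `rerank` function, the rule names and the example domains are made up, and it matches on the exact hostname only.

```python
from urllib.parse import urlparse

# Lower number sorts earlier; "keep" is the default for domains with no rule.
PRIORITY = {"pin": 0, "raise": 1, "keep": 2, "lower": 3}


def rerank(urls: list[str], rules: dict[str, str]) -> list[str]:
    """Apply per-domain rules ('block', 'pin', 'raise', 'lower') to results."""
    kept = []
    for position, url in enumerate(urls):
        domain = urlparse(url).hostname or ""
        action = rules.get(domain, "keep")
        if action == "block":
            continue  # blocked domains are dropped entirely
        # The original position keeps the engine's order within each group.
        kept.append((PRIORITY.get(action, PRIORITY["keep"]), position, url))
    return [url for _, _, url in sorted(kept)]


urls = [
    "https://spam.example/a",
    "https://blog.example/b",
    "https://en.wikipedia.org/wiki/X",
    "https://forum.example/c",
]
rules = {"spam.example": "block", "en.wikipedia.org": "pin", "blog.example": "lower"}
print(rerank(urls, rules))
```

With these rules the Wikipedia link moves to the top, the blocked site disappears, and the lowered blog ends up last.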