Choices have slowly been running out when it comes to effective search engines. It seems inevitable that an open-source search engine project independent from big tech will be needed.
Some of my own tricks are:
- Use the blacklist plugin to block sites from search.
- Search for forum sites and communities instead of specific queries. (Wikipedia has a list of forums that might be useful)
- For technical questions, favor Q&A websites like Stack Exchange.
- YouTube videos often offer better information than results from search engines. (Use search engines instead of YT search)
- Look for blogs and journals that specialize in the topic you’re searching for.
- Use boolean search when possible.
- Self-host and customize your own metadata search engine. Create a graph network linking websites based on subject/topic. You may not be able to ask specific questions, but you can discover sites that you otherwise wouldn’t find through traditional search. This is a great way to discover hidden gems! (Example: https://internet-map.net/)
- (Difficult) Self-host and scrape sites across the web in order to create your own query-able database. This would be the most effective way to search the internet and would be completely independent from potential enshittification and censorship. The cost, however, is quite high in terms of both hardware and time. Kiwix offers a way to download websites for offline use (e.g. Wikipedia, Stack Exchange). This is a good starting point for building your own custom search engine.
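To make the last bullet a bit more concrete, here is a minimal sketch of the "query-able database" idea: an inverted index over text you have already downloaded, with a boolean AND search on top. The page names and contents are made up for illustration, and a real setup would need crawling, storage and ranking that this leaves out.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word in the query (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages; in practice the text would come from scraped sites
# or from content extracted from an offline archive.
pages = {
    "sourdough.txt": "Sourdough bread needs a starter and a long rise",
    "flatbread.txt": "Flatbread is a quick bread with no starter",
    "compost.txt": "A compost pile needs air and moisture",
}
index = build_index(pages)
print(sorted(search(index, "bread starter")))
```

The index only answers "which pages mention all of these words", which is roughly what boolean search gives you. Ranking the hits is a separate problem.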
I would love to hear the tips and tricks you use! I hope this post helps others in more efficiently finding information on the internet!
I know you came here for answers, but how would one start making their own metadata search engine? Do you have any guides to point me towards? I hate Google so much that I’m willing to learn to make my own search engine.
Go back to 2022 and run your search then
Deny list plugins!?? I’d been looking for a search engine with that built in. It seems so obvious. I didn’t even think to look up a plugin. I had been writing keyword searches for browsers that manually added the query params for particularly frustrating results.
Just found uBlacklist.
Now to find something for whitelist searches (basically I only ever want recipes or medical information from a small list of sites).
Edit: duckduckgo has the capability built in, too
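For anyone who wants to keep doing the query-param approach by hand, here is a rough sketch of what such a keyword search boils down to. It assumes the engine understands `site:` and `-site:` operators and takes the query in a `q` parameter. Support for `OR` and parentheses varies between engines, and the domains below are placeholders.

```python
from urllib.parse import urlencode

def build_query(terms, allow=(), deny=()):
    """Combine search terms with site: (allow) and -site: (deny) operators."""
    parts = [terms]
    if allow:
        sites = " OR ".join(f"site:{domain}" for domain in allow)
        # Parentheses keep the OR group together when there are several sites.
        parts.append(f"({sites})" if len(allow) > 1 else sites)
    parts.extend(f"-site:{domain}" for domain in deny)
    return " ".join(parts)

def search_url(base, terms, allow=(), deny=()):
    """Build a search URL, assuming the engine reads the query from 'q'."""
    return f"{base}?{urlencode({'q': build_query(terms, allow, deny)})}"

# Placeholder domains, not recommendations.
print(build_query("lentil soup", allow=["example-recipes.org", "example-kitchen.net"]))
print(search_url("https://duckduckgo.com/", "knee pain", deny=["example-spam.com"]))
```

The allow list covers the "only these sites" case, the deny list the "never this site" case.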
Kagi has that feature built in though it is a paid search engine.
Utilizing books from a shadow library like Anna’s archive (you can use Wikipedia to find the right domains), you can read prior written material for academic subjects, relevant books on various subjects from the pre-internet era, and so forth. Some users from newer fields (such as 3d printing/CAD) are going as far as to upload their PDF works onto Anna’s for distributed access.
- StartPage, Mojeek, SearXNG, YaCy
- hyperlink surfing “extranets”, as you would WikiMedia WikiPedia InternetArchive FediVerse posts etc.
- webscrapers like Monolith etc. for offline PIR and just as you say convenience of having it all there
i look forward to reading what you come up with, because i am still kinda at the theoretical stage with keeping such a knowledgebase.
edit: i keep thinking a plaintext document of information is way simpler to deal with than webpages. at what point is information posted online preserved in its “original” form? just dumping this FediThread into a plaintext file or a folder of plaintext files with names being ‘hierarchy•postID•username’ or something so it is presented self-organized.
OP is ¤, 1st rank comments are ¤a ¤b ¤c and 2nd rank comments attached to comment ¤a are ¤a-a ¤a-b ¤a-c and 3rd rank comments attached to ¤a-c are ¤a-c-a ¤a-c-b ¤a-c-c so on. this then lists itself in a self-organized way, given all ASCII & unicode characters are provided in order. not just a-Z… because that would limit size of posts to take on.
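a rough sketch of that naming scheme, assuming the thread is already available as nested dictionaries. the `id`, `author`, `text` and `replies` keys are made up for the example, and it only uses a-z, so it stops at 26 replies per post instead of the wider character range described above.

```python
import string

def label_thread(post, label="¤"):
    """Yield (filename, text) pairs for a post and all of its nested replies."""
    yield f"{label}•{post['id']}•{post['author']}.txt", post["text"]
    replies = post.get("replies", [])
    if len(replies) > len(string.ascii_lowercase):
        raise ValueError("this sketch only labels 26 replies per post")
    for letter, reply in zip(string.ascii_lowercase, replies):
        # First-rank comments become ¤a, ¤b, ...; deeper ones ¤a-a, ¤a-b, ...
        child = label + letter if label == "¤" else f"{label}-{letter}"
        yield from label_thread(reply, child)

# Hypothetical thread data, only to show the labels that come out.
thread = {
    "id": 1, "author": "op", "text": "Tips?",
    "replies": [
        {"id": 2, "author": "ann", "text": "Use a denylist",
         "replies": [
             {"id": 3, "author": "bob", "text": "Which plugin?"},
             {"id": 4, "author": "cy", "text": "uBlacklist"},
         ]},
        {"id": 5, "author": "dee", "text": "Ask Lemmy."},
    ],
}
files = dict(label_thread(thread))
for name in files:
    print(name)
```

writing each entry of `files` to disk would give the folder of plaintext files. the names come out in thread order here, which is not necessarily the order a file manager would sort them in.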
of course more difficult and complicated solutions like selfhosting webservers and managing ports and databases exist… not that i grasp the necessity for so many services.
Use resources available through your library’s website
for research and STEM topics, look for sites like researchgate and others that host peer-reviewed papers. articles, magazines and blogs are not good sources unless they cite the research paper and link you to the proper site. it is also important not to take findings out of context, which might lull people into pseudoscience beliefs. some people jump the gun on these sites, which are basically articles, often using dumbed-down wording. universities/colleges often have access to most if not the full library of papers, which are usually behind publishers’ paywalls. if you can somehow get access to those, go for it.
Ask Lemmy.
You know how unhinged our grasp on reality is, right?
Well, OP just said “efficiently” … nothing about the quality. So you are technically correct.
Help, I’m in a loop - between this ask Lemmy and this comment
If you’re interested in building a new general purpose search engine, it probably makes the most sense to start with Common Crawl’s data set and augment it rather than starting from scratch.
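As a starting point, Common Crawl publishes a URL index that can be queried over HTTP before you download any archive files. The sketch below only builds the query URL and parses a newline-delimited JSON body. The crawl ID is an example and should be checked against Common Crawl's current list of crawls, and the sample response is made up to show the parsing step.

```python
import json
from urllib.parse import urlencode

def index_query_url(crawl_id, url_pattern):
    """Build a Common Crawl URL-index query that asks for JSON lines."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{params}"

def parse_index_lines(body):
    """Parse a newline-delimited JSON response into a list of dicts."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]

# Example crawl ID (an assumption); substitute one from the published list.
print(index_query_url("CC-MAIN-2024-10", "example.com/*"))

# Made-up response body, used only to demonstrate parsing.
sample = '{"url": "https://example.com/", "status": "200"}\n'
records = parse_index_lines(sample)
print(records[0]["url"])
```

Fetching the URL and then pulling the matching records out of the archive files are the next steps. They are left out here because they depend on how much data you plan to keep locally.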
SearXNG with brave, duckduckgo, google, mullvadleta, mullvadleta brave and qwant as the search engines. The law of large numbers makes it quite useful.
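A toy illustration of why combining engines helps: results that several engines agree on float to the top. This is not how SearXNG scores results internally, just a simple vote-counting sketch with placeholder engine names and URLs.

```python
from collections import defaultdict

def merge_results(results_by_engine):
    """Rank URLs by how many engines returned them, then by best position."""
    votes = defaultdict(int)
    best_rank = {}
    for urls in results_by_engine.values():
        # dict.fromkeys drops duplicates within one engine, keeping order.
        for rank, url in enumerate(dict.fromkeys(urls)):
            votes[url] += 1
            best_rank[url] = min(rank, best_rank.get(url, rank))
    return sorted(votes, key=lambda u: (-votes[u], best_rank[u], u))

# Placeholder result lists for three hypothetical engines.
results = {
    "engine_a": ["https://a.example/guide", "https://b.example/post", "https://c.example/spam"],
    "engine_b": ["https://b.example/post", "https://a.example/guide"],
    "engine_c": ["https://b.example/post", "https://d.example/wiki"],
}
for url in merge_results(results):
    print(url)
```

A page that only one engine returns, like the spam entry here, ends up at the bottom of the merged list.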
There are some paid options that are pretty good (I’m thinking Kagi).
Easy, but one obvious downside.
Does Kagi let you add a domain to a denylist (like a new, well-SEOed site that’s genAI with inaccuracies you’ve noticed), or positively bias search results (like saying you know you want Wikipedia entries high in the list)?
For whatever reason, Wikipedia seems to have been really pushed down the page on search engines specifically for medical information. It’s a shame because I can acquire the surface level of information (which is all I really ever need) way faster from Wikipedia than the other sites that come to the top of the list (Mayo Clinic, Johns Hopkins, Cleveland Clinic, govt sites).
I really shouldn’t complain about it too much, cause they could be pushing pseudoscience blogs.
It’s one of their best features. No ads being the best, since that also means you get real results and no “sponsored” bullshit. They also have AI slop filters.
It does. You can outright block domains, rank them higher or lower and I think even pin them to the top.
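If you want the same behaviour on top of results from any engine, the idea is easy to approximate locally. This is only a sketch of block/raise/lower/pin as a re-ranking step, not Kagi's implementation, and the rule names and URLs are invented for the example.

```python
from urllib.parse import urlparse

def rerank(urls, rules):
    """Apply per-domain rules: 'block', 'lower', 'raise' or 'pin'."""
    # Lower weight sorts first; domains without a rule sit in the middle.
    weight = {"pin": 0, "raise": 1, None: 2, "lower": 3}
    kept = []
    for position, url in enumerate(urls):
        rule = rules.get(urlparse(url).hostname)
        if rule == "block":
            continue
        # The original position breaks ties, so engine order is preserved.
        kept.append((weight[rule], position, url))
    return [url for _, _, url in sorted(kept)]

# Hypothetical results and rules.
urls = [
    "https://spam.example/migraine",
    "https://clinic.example/migraine",
    "https://en.wikipedia.org/wiki/Migraine",
    "https://blog.example/migraine",
]
rules = {"spam.example": "block", "en.wikipedia.org": "pin", "blog.example": "lower"}
for url in rerank(urls, rules):
    print(url)
```

Rules here match the exact hostname, so subdomains would each need their own entry.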








