Firefox is the only major browser not backed by a billionaire, and our independence shapes everything we build. This independence allows us to prioritize …
Pocket is one service of theirs I did use from time to time: save an article you want to read later without committing it to a bookmark.
Wish they’d make bookmarks not suck so much that using them feels like a commitment to organizational chores. The bookmark system is largely unchanged since the Netscape days.
You can’t search the text inside bookmarks because they only store the URL, which will eventually break, instead of saving the HTML itself, as if we didn’t have hundreds of gigabytes to spare.
It should have a library-level search system, capable of not just plain-text search but intelligent summarization, categorization, search by relevance, content discovery, and RSS feed support, all fully local and offline-capable.
It should save the whole thing: metadata, HTML, images, video, files, code, and a replay of the changes over time. Yes, I should be able to replay clicking “read more” as I expand comments on Facebook. I should never lose my work to a page reload again. And no, that’s not “too much space”: web pages are largely text sent very efficiently; it is not that much information, even compared to a single gigabyte.
What you’re describing is so much more difficult from a technical standpoint than you give it credit for.
Static pages – sure. The plague of single-page applications – oof, that’s a challenge.
I’ve actually been thinking about this a lot. “Save Webpage” is useless nowadays because everything is loaded externally through scripts. What if it saved a timeline of requests and responses somehow and could play it back? This might require recording the entire JS state though… and so much more with browser APIs. Saving just the requests+responses as a cache would fail if the scripting was non-deterministic. Maybe it would make sense to literally save a “recording” of the HTML and CSS changes, playing back only the results of any network requests or JS?
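Roughly what I have in mind for that last idea, using nothing beyond the standard MutationObserver API; the event format, the path helper, and the replay strategy are all made up for illustration and skip plenty of cases (shadow DOM, CSSOM edits, fine-grained attribute diffs):

```typescript
// Minimal sketch of recording DOM changes for later replay.
// DomSnapshotEvent is an invented shape; a real format would need to handle
// shadow DOM, CSSOM changes, and much finer-grained diffs.

interface DomSnapshotEvent {
  timestamp: number;    // ms since recording started
  targetPath: number[]; // child-index path from document.body to the mutated node
  outerHTML: string;    // re-serialized subtree after the mutation
}

function pathTo(node: Node): number[] {
  const path: number[] = [];
  let current: Node = node;
  while (current !== document.body) {
    const parent = current.parentNode;
    if (!parent) break;
    path.unshift(Array.prototype.indexOf.call(parent.childNodes, current));
    current = parent;
  }
  return path;
}

function startRecording(): DomSnapshotEvent[] {
  const startedAt = performance.now();
  const events: DomSnapshotEvent[] = [];

  const observer = new MutationObserver((mutations) => {
    for (const mutation of mutations) {
      const target =
        mutation.target instanceof Element ? mutation.target : mutation.target.parentElement;
      if (!target) continue;
      events.push({
        timestamp: performance.now() - startedAt,
        targetPath: pathTo(target),
        outerHTML: target.outerHTML,
      });
    }
  });

  observer.observe(document.body, {
    childList: true,
    attributes: true,
    characterData: true,
    subtree: true,
  });

  return events; // a replay step would walk this array and re-apply each outerHTML in order
}
```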
Making interactivity work would be a whole new pipeline. Emulating a server with cached responses would allow reusing the JS part of websites and is easier to do. I have no doubt that some pages wouldn’t work, and there would be a shitton of security considerations I can’t imagine.
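For the “emulate a server with cached responses” route, a service worker replaying recorded responses is probably the rough shape of it – a sketch assuming the responses were captured into a cache at save time (the cache name, and failing closed on misses, are my own choices):

```typescript
// archive-sw.ts - sketch of a service worker that replays previously cached
// responses so a saved page's own JS can keep "fetching" after the site is gone.
// Compile against the webworker lib; non-deterministic or authenticated
// endpoints would still break, as noted above.

const ARCHIVE_CACHE = 'bookmark-archive-v1'; // assumed to be populated at save time

// The DOM lib types `self` as Window, so cast for service-worker typings.
const sw = self as unknown as ServiceWorkerGlobalScope;

sw.addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(
    caches.open(ARCHIVE_CACHE).then(async (cache) => {
      const recorded = await cache.match(event.request);
      if (recorded) {
        return recorded; // replay the response captured when the page was saved
      }
      // Nothing recorded for this request: fail closed so the archived page
      // never silently falls through to the live network.
      return new Response('Not in archive', { status: 504 });
    })
  );
});
```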
We can save entire operating systems that way; the heavy lifting is borne by the hardware. As far as the software is concerned, all it has to do is dump a memory snapshot of the engine into a file and reload it later.
I mean, it’s been almost 30 years and this aspect hasn’t evolved, because of a long-expired belief that we’ll be able to re-download it all later, as if the internet weren’t eventually going to churn and every link weren’t eventually going to break.
OK, so your average site doesn’t serve its content directly. The initial load is just the framework required to fetch and render the content dynamically.
Short of just crawling the whole site, there is no real way to know what, when, or why a thing is loaded into memory.
You can’t even be sure that some pages will stay the same after every single refresh.
Comparing it to saving the state of an OS isn’t fair, because there the state is all in one place: on the machine running the code. The difference here is that the state of the website is not under the browser’s control, and there’s no standard way to access it that would allow what you’re describing.
Now, again, saving rendered HTML is trivial, but saving the whole state of a dynamic website requires a full-on web crawler, and then not only loading the saved pages and scripts but also emulating the servers that provide the data being rendered.
It could crawl the elements within the DOM and save a word cloud of the visible text for each bookmark as metadata for later searches. I think it’s doable. Separating visible from non-visible content is very difficult, though.
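A sketch of that word-cloud idea, leaning on innerText, which only returns rendered text, so it is a cheap approximation of “visible” rather than a real solution to the visible/non-visible problem:

```typescript
// Sketch: boil the visible text of the current page down to a word-frequency
// map that could be stored as bookmark metadata. innerText skips display:none
// content, but off-screen or script-hidden content is not handled specially.

type WordCloud = Map<string, number>;

function visibleWordCloud(root: HTMLElement = document.body, maxWords = 200): WordCloud {
  const counts: WordCloud = new Map();

  const words = root.innerText
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)     // split on anything that isn't a letter or digit
    .filter((w) => w.length > 2); // drop very short tokens

  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }

  // Keep only the most frequent terms so the metadata stays small.
  return new Map([...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, maxWords));
}

// Usage sketch: serialize next to the bookmark record, however bookmarks are persisted.
// const metadata = JSON.stringify(Object.fromEntries(visibleWordCloud()));
```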
This is already supported, just not integrated into bookmark lookup. I mean, if you hit Ctrl+S, the browser will save the currently rendered HTML. No crawling required. Hooking up some text indexing for search seems perfectly doable.
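The “hook up some text indexing” part could start as small as an in-memory inverted index over the saved pages’ text; the SavedPage shape is made up, and persistence (IndexedDB or whatever the bookmark store uses) and real ranking are left out:

```typescript
// Sketch of a tiny inverted index over the text of saved bookmark pages.
// SavedPage is an invented shape; a real version would persist the postings
// and do proper stemming/ranking instead of raw term counts.

interface SavedPage {
  url: string;
  title: string;
  text: string; // extracted from the saved HTML at save time
}

class BookmarkIndex {
  // term -> (url -> occurrence count)
  private postings = new Map<string, Map<string, number>>();

  private tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter((t) => t.length > 2);
  }

  add(page: SavedPage): void {
    for (const term of this.tokenize(page.title + ' ' + page.text)) {
      let perUrl = this.postings.get(term);
      if (!perUrl) {
        perUrl = new Map();
        this.postings.set(term, perUrl);
      }
      perUrl.set(page.url, (perUrl.get(page.url) ?? 0) + 1);
    }
  }

  search(query: string): string[] {
    // Sum term counts per URL; pages matching more query terms float to the top.
    const scores = new Map<string, number>();
    for (const term of this.tokenize(query)) {
      for (const [url, count] of this.postings.get(term) ?? []) {
        scores.set(url, (scores.get(url) ?? 0) + count);
      }
    }
    return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([url]) => url);
  }
}
```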
I understand a VM isn’t the same, since it is at least somewhat self-contained.
But at the end of the day, a browser does end up showing you something, sitting in a stable state waiting for your input. These stable moments are like checkpoints or snapshots of the whole render-engine state machine that could be saved in place, and they could be saved at multiple points in time, similar to how the Internet Archive takes periodic static snapshots of websites.
It should be trivial, a one-click action for the user, to save the last couple of these checkpoint states in a format that can be consulted later, offline, or after the website is gone; whether that means saving “everything” needed to recreate the machine state, or saving only the machine state itself.
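The cheapest version of that checkpoint is just serializing the DOM as it stands, which is roughly what Ctrl+S already captures for the markup; external resources and the JS heap are exactly what this does not cover:

```typescript
// Sketch: serialize the currently rendered DOM to a single HTML string,
// the most basic "checkpoint". External images/CSS/JS and any script state
// are not captured; inlining those is where the real work begins.

function snapshotCurrentPage(): string {
  const doctype = document.doctype
    ? `<!DOCTYPE ${document.doctype.name}>`
    : '<!DOCTYPE html>';
  return `${doctype}\n${document.documentElement.outerHTML}`;
}

// Usage sketch (hypothetical storage call, not a real browser API):
// saveBookmarkAttachment(bookmarkId, snapshotCurrentPage());
```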
That doesn’t mean the whole website will remain interactive, but it will at the very least preserve what was inside the scroll buffer of the browser.
And that is a LOT better than just saving a link that will break, or even just a scrolling screenshot, which would itself already be an improvement over the current state of things.
It would also allow a text search of the page content of all bookmarked pages, which would be huge, since the current bookmark manager can barely search titles, and does even that very poorly.
The bookmarks system is long, LONG overdue for a full overhaul.
Defining and manipulating this “machine state” is exactly the hard part of the concept. I’m not saying it can’t be done, but it’s a beast of a problem.
Our best current solutions are just dumb web crawler bots.
To me, a simple page-saving (Ctrl+S) integration seems like the most realistic solution.
Now imagine having Google, Bing, Qwant, DuckDuckGo, and Ecosia bookmarked.
You’d get a mostly empty page with a search box in the middle … and a few hundred megs of tracking software.