Show HN: Self-hosted offline Internet from your browsing history

Last modified on November 12, 2020

🏛️ - An archivist browser controller that caches all of the items you browse, a library server with plump textual convey search to once more your archive.

News - distinctive binaries

  • Overview
    • License
    • About
    • Net 22120
    • Utilizing
      • Deem set up mode or once more mode
      • Exploring your 22120 archive
    • Structure
    • Why now not WARC (or one different format like MHTML) ?
    • How it really works
    • FAQ
      • Discontinue I've to safe one factor?
      • Can I exhaust this with a browser that's not often any longer Chrome-essentially based mostly?
      • How does this work in conjunction with Ad blockers?
      • How accumulate is operating chrome with a great distance away debugging port launch?
      • Is that this free?
      • What's the roadmap?
      • What about streaming convey?
      • Can I unhappy guidelines domains to now not archive them?
      • Is there a DEBUG mode for troubleshooting?
      • Can I mannequin the archive?
      • Can I commerce the archive path?
      • Can I commerce this factor extra?

License

Copyright (c) 2018, 2020, Dosyago and/or its associates. All rights reserved.

That is a launch of 22120, a web archiver.

License data may effectively effectively moreover be declare within the LICENSE file.

This instrument is dual-licensed. For data about industrial licensing, look Dosyago Commercial License for OEMs, ISVs and VARs.

Prime

About

This enterprise actually makes your web having a look out there COMPLETELY OFFLINE. Your browser would not even know the excellence. Or now not it's actually that unparalleled. Sure.

Build your having a look, then swap off the on-line and fling to http://localhost: 22120 and swap mode to once more then browse what you browsed previous to. All of it mute works.

warning: should you dangle obtained Chrome launch, this might effectively truthful shut it robotically everytime you occur to launch 22120, and relaunch it. Which you can also truthful lose any unsaved work.

Prime

Net 22120

Three methods to get it:

  1. Net binary from the releases web page., or
  2. Speed with npx: npx archivist1@most original, or
    • npm i -g archivist1@most original && archivist1
  3. Clone this repo and scamper as a Node.JS app: npm i && npm originate

Also, coming quickly is a Chrome Extension.

Prime

Utilizing

Deem set up mode or once more mode

Bolt to http://localhost: 22120 in your browser,
and follow the directions.

Prime

Exploring your 22120 archive

Archive might be situated in 22120-arc/public/library*

However or now not it's now not public, do not effort!

Which you can also moreover are trying the archive index, for a itemizing of each and every title within the archive. The index is out there from the postpone a watch on web page, which by default is at http://localhost: 22120 (besides you modified the port).

*Demonstrate: 22120-arc is the archive root of a single archive, and by defualt it's a great distance positioned in your condominium itemizing. However that you must commerce the daddy or mom itemizing for 22120-arc to dangle a couple of archvies.

Prime

Structure

The archive format is:

22120-arc/public/library//.json

Inner the JSON file, is a JSON object with headers, response code, key and a tainted 64 encoded response physique.

Prime

Why now not WARC (or one different format like MHTML) ?

The case for the 22120 format.

Different codecs set up translations of the sources you archive. They develop changes, truthful like altering the inside development of the HTML, altering hyperlinks and URLs into "flat" embedded information URIs, and require different "hacksin declare to ascertain a "perceptually an identical" copy of the archived useful useful resource.

22120 throws all that out, and calls garbage on it. 22120 saves a verbatim high-fidelity copy of the sources your archive. It would not alter their inside development in any technique. As a change it data every and every useful useful resource in its dangle metadata file.

Why?

At 22120, we ponder within the sources and in verbatim copies. We do not annoint ourselves as all spicy sufficient to change the useful useful resource supply of reality previous to we archive it, truthful so it may actually effectively effectively "match the formatwe dangle. We do not ponder we should mute be modifying or altering sources we archive. We belive we should mute set up them exactly as that they had been introduced. We ponder the format should mute match (or at the least accommodate, and be agreeable to) the useful useful resource, now not the completely different technique round. We do not ponder in conflating metadata with convey; so we separate them. We ponder isolating metadata and convey, and conserving the convey pure and altered in the way forward for the archiving path of is never any longer best the trusty factor to ticket, it simplifies every and every part of the audit path, as a result of we all know that the changes between archived copies of a useful useful resource of which talent that of changes to the sources themselves, now not artefacts of the format or archiving path of.

Each and every WARC and MHTML require mutilatious changes of the sources in order that the sources may effectively effectively moreover be "compelled to check" the format. At 22120, we ponder proper this is now not required (and regardless of all of the items should mute on no yarn be carried out). We look it as truthful like lopping off the arms of a Roman statue in declare to check it staunch right into a presentation and security reveal area. How ridiculous! The accumulate could be a extra "pliable" medium however that will not imply we should mute deal with it with out respect for its inherent convey.

Why is altering the inside development of sources so execrable?

In our glimpse, the inside development of the useful useful resource as introduced, is the cannon. Inner development is never any longer solely substitutable "presentation" - no, mainly it encodes a should dangle semantic data truthful like hyperlink relationships, supply picks, and the "strokes" of the useful useful resource creator as they develop their convey, although or now not it's mediated by a web server or web framework.

Why else is 22120 the obtrusive and pure completely different?

22120 moreover archives sources exactly as they're despatched to the browser. It runs linked to a browser, and so is able to get right of entry to the plump-scope of sources (with, at the moment, the exception of video, audio and websockets, for now) of their highest constancy, with out modification, that the browser receives and is able to archive them within the actual format introduced to the individual. Many sources endure presentational and processing changes previous to they're introduced to the individual. That is the ever-declare, "web app", the construct shopper-aspect scripting enabled by JavaScript, creates sources and useful useful resource views on the fly. These types of "hyper sources" or "realtime" or "shopper facet" sources, prevalent in SPAs, are now not in a neighborhood to be archived, at the least now not using the licensed archive float, inside dilapidated wget-essentially based mostly archiving instruments.

In non permanent, the accumulate is an on-line medium, and it may actually effectively effectively mute be archived and introduced within the equivalent fashion. 22120 archives convey exactly as it's a great distance obtained and introduced by a browser, and it moreover replays that convey exactly as if the useful useful resource had been being taken from on-line. Sure, it requires a browser for this command, however that browser want now not be linked to the accumulate. It's miles best pure that viewing a web useful useful resource requires the accumulate browser. And attributable to 22120 the browser would not know the excellence! Assets introduced to the browser ticket a much away web state of affairs, and sources given to the browser by 22120, are thought of by the browser as exactly the equivalent. This ensures that the opposite of us viewing the archive are moreover now not let down and are given the commerce to dangle the precise equivalent talents as if that they had been viewing the useful useful resource on-line.

Prime

How it really works

Makes exhaust of DevTools protocol to intercept all requests, and caches responses towards a key fabricated from (METHOD and URL) onto disk. It moreover maintains an in reminiscence dwelling of keys so it's a great distance acutely aware of what it has on disk.

Prime

FAQ

Discontinue I've to safe one factor?

Sure. However....If you occur to love 22120, you'd respect the clientless hosted mannequin coming in future. It's good to be in a neighborhood to operate your archives on-line from any instrument, with none safe, then safe the archive to scamper on any desktop. You will wish to research in to exhaust it, however that you must soar the queue and take a look at in proper this second.

Can I exhaust this with a browser that's not often any longer Chrome-essentially based mostly?

No.

However...look #57. Appropriate resolve on to dwelling some expectations, proper this is best an investigation and considering it, it may actually effectively effectively additionally truthful now not ever get completed. However, your voices made a distinction, as I wasn't even considering it previous to.

Prime

How does this work in conjunction with Ad blockers?

Interacts truthful comely. The issues advert blockers destroy is never any longer going to be archived.

Prime

How accumulate is operating chrome with a great distance away debugging port launch?

Appears fairly accumulate. Or now not it's now not uncovered to most of the people web, and pages you load that attempted to exhaust it may actually effectively effectively now not exhaust the protocol for one factor else (excluding to launch a novel tab, which they'll ticket anyway).

Prime

Is that this free?

Sure proper this is completely free to safe and exhaust. Or now not it's moreover launch supply (beneath AGPL-3.0) so ticket what that it's most likely going you will effectively effectively per probability like with it. For extra data about licensing, look the license fragment.

Prime

What's the roadmap?

  • Beefy textual convey search
  • Library server to once more archive publicly.
  • Distributed p2p web browser on IPFS

Prime

What about streaming convey?

The following are doubtlessly onerous (and I dangle not concept worthy about):

  • Streaming convey (audio, video)
  • "Impure" request response pairs (truthful like should you name GET /endpoint 1 time you get "A", should you name it a 2nd time you get "AA", and different examples like this).
  • WebSockets (the style to obtain and replay that faithfully?)

Potentially some technique to ticket this tho.

Prime

Can I unhappy guidelines domains to now not archive them?

Sure! Build any domains into 22120-arc/no.json*, eg:

[
  "*.horribleplantations.com",
  "*.cactusfernfurniture.com",
  "*.gustymeadows.com",
  "*.nytimes.com",
  "*.cnn.co?"
]

Is not any longer going to cache any useful useful resource with a bunch matching these. Wildcards:

  • * (Zero or extra one factor else) and
  • ? (Zero or 1 one factor else)

*Demonstrate: the no file is per-archive. 22120-arc is the archive root of a single archive, and by defualt it's a great distance positioned in your condominium itemizing. However that you must commerce the daddy or mom itemizing for 22120-arc to dangle a couple of archvies, and every and every archive requires its dangle no file, to confirm that you simply a blacklist in that archive.

Prime

Is there a DEBUG mode for troubleshooting?

Sure, truthful make sure you dwelling an atmosphere variable referred to as DEBUG_22120 to 1 factor else non empty.

So as an illustration in posix methods:

Prime

Can I mannequin the archive?

Sure! However or now not it's a need to to exhaust git for versioning. Appropriate provoke a git repo in your archive repository. And everytime you occur to ought to ascertain a snapshot, ticket a novel git commit.

Prime

Can I commerce the archive path?

Sure, there's a postpone a watch on for altering the archive path within the postpone a watch on web page: http://localhost: 22120

Prime

Can I commerce this factor extra?

There's a pair of repeat line arguments. You will look the format printed as a result of essentially the most main printed line everytime you occur to originate this system.

For different issues that you must seek for the supply code.

Prime

Read More

Similar Products:

    None Found

Recent Content