From ab680479c404d9031aaa7e83c7951da4e75d1b4d Mon Sep 17 00:00:00 2001
From: Nick Sweeting
Date: Fri, 23 Apr 2021 20:58:06 -0400
Subject: [PATCH] add re-snapshot ui button to readme

---
 README.md | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index b341ff7c..44489c71 100644
--- a/README.md
+++ b/README.md
@@ -530,20 +530,20 @@ Installing directly on **Windows without Docker or WSL/WSL2/Cygwin is not offici
 
 ## Caveats
 
-### Archiving Private URLs
+### Archiving Private Content
 
-If you're importing URLs containing secret slugs or pages with private content (e.g Google Docs, unlisted videos, etc), **you may want to disable some of the extractor modules to avoid leaking private URLs to 3rd party APIs** during the archiving process.
+If you're importing pages with private content or URLs containing secret tokens you don't want public (e.g Google Docs, paywalled content, unlisted videos, etc.), **you may want to disable some of the extractor methods to avoid leaking that content to 3rd party APIs or the public**.
 
 ```bash
-# don't do this:
-archivebox add 'https://docs.google.com/document/d/12345somelongsecrethere'
-archivebox add 'https://example.com/any/url/you/want/to/keep/secret/'
+# don't save private content to ArchiveBox, e.g.:
+archivebox add 'https://docs.google.com/document/d/12345somePrivateDocument'
+archivebox add 'https://vimeo.com/somePrivateVideo'
 
-# without first disabling share the URL with 3rd party APIs:
+# without first disabling saving to Archive.org:
 archivebox config --set SAVE_ARCHIVE_DOT_ORG=False # disable saving all URLs in Archive.org
 
-# if extra paranoid or anti-google:
-archivebox config --set SAVE_FAVICON=False # disable favicon fetching (it calls a google API)
+# if extra paranoid or anti-Google:
+archivebox config --set SAVE_FAVICON=False # disable favicon fetching (it calls a Google API passing the URL's domain part only)
 archivebox config --set CHROME_BINARY=chromium # ensure it's using Chromium instead of Chrome
 ```
 
@@ -571,6 +571,9 @@ archivebox add 'https://example.com#2020-10-24'
 archivebox add 'https://example.com#2020-10-25'
 ```
 
+There is also a "Re-Snapshot" button in the UI to do this automatically.
+Re-Snapshot Button in Admin UI
+
 ### Storage Requirements
 
 Because ArchiveBox is designed to ingest a firehose of browser history and bookmark feeds to a local disk, it can be much more disk-space intensive than a centralized service like the Internet Archive or Archive.today. However, as storage space gets cheaper and compression improves, you should be able to use it continuously over the years without having to delete anything.