mirror of synced 2024-05-17 02:43:16 +12:00

add re-snapshot ui button to readme

Nick Sweeting 2021-04-23 20:58:06 -04:00
parent 07f4a63635
commit ab680479c4


@ -530,20 +530,20 @@ Installing directly on **Windows without Docker or WSL/WSL2/Cygwin is not offici
## Caveats
### Archiving Private URLs
### Archiving Private Content
If you're importing URLs containing secret slugs or pages with private content (e.g. Google Docs, unlisted videos, etc.), **you may want to disable some of the extractor modules to avoid leaking private URLs to 3rd party APIs** during the archiving process.
If you're importing pages with private content or URLs containing secret tokens you don't want public (e.g. Google Docs, paywalled content, unlisted videos, etc.), **you may want to disable some of the extractor methods to avoid leaking that content to 3rd party APIs or the public**.
```bash
# don't do this:
archivebox add 'https://docs.google.com/document/d/12345somelongsecrethere'
archivebox add 'https://example.com/any/url/you/want/to/keep/secret/'
# don't save private content to ArchiveBox, e.g.:
archivebox add 'https://docs.google.com/document/d/12345somePrivateDocument'
archivebox add 'https://vimeo.com/somePrivateVideo'
# without first disabling sharing the URL with 3rd party APIs:
# without first disabling saving to Archive.org:
archivebox config --set SAVE_ARCHIVE_DOT_ORG=False # disable saving all URLs in Archive.org
# if extra paranoid or anti-google:
archivebox config --set SAVE_FAVICON=False # disable favicon fetching (it calls a google API)
# if extra paranoid or anti-Google:
archivebox config --set SAVE_FAVICON=False # disable favicon fetching (it calls a Google API passing the URL's domain part only)
archivebox config --set CHROME_BINARY=chromium # ensure it's using Chromium instead of Chrome
```
@ -571,6 +571,9 @@ archivebox add 'https://example.com#2020-10-24'
archivebox add 'https://example.com#2020-10-25'
```
There is also a "Re-Snapshot" button in the UI to do this automatically.<br/>
<img src="https://user-images.githubusercontent.com/511499/115942091-73c02300-a476-11eb-958e-5c1fc04da488.png" alt="Re-Snapshot Button in Admin UI" height="24px"/>
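
For scripted re-snapshots from the command line, the dated `#fragment` pattern shown above can be generated instead of typed by hand. The following is a minimal sketch, assuming a POSIX-style shell with `date` available; the `dated_url` helper is a hypothetical name used for illustration and is not part of ArchiveBox.

```bash
# dated_url is a hypothetical helper, not an ArchiveBox command.
# It appends a date as a #fragment so the same page can be added again
# as a separate snapshot, mirroring the example.com#2020-10-24 pattern.
dated_url() {
    url="$1"
    stamp="${2:-$(date +%Y-%m-%d)}"   # second argument is optional; defaults to today's date
    printf '%s#%s\n' "$url" "$stamp"
}

# Usage (assumes archivebox is installed and run from inside a collection):
# archivebox add "$(dated_url 'https://example.com')"
```

This sketch does not handle URLs that already carry a fragment; such URLs would need the existing fragment adjusted by hand.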
### Storage Requirements
Because ArchiveBox is designed to ingest a firehose of browser history and bookmark feeds to a local disk, it can be much more disk-space intensive than a centralized service like the Internet Archive or Archive.today. However, as storage space gets cheaper and compression improves, you should be able to use it continuously over the years without having to delete anything.