From 843ebc8831f40cd23e7a1b534a25544034fc5cdf Mon Sep 17 00:00:00 2001
From: Nick Sweeting
Date: Mon, 29 May 2017 13:05:16 -0500
Subject: [PATCH] Update README.md

---
 README.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index adc40aca..8dee9b29 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 (Your own personal Way-Back Machine)
 
-Save an archived copy of all websites you star using Pocket or Pinboard, indexed in an html file. Powered by the new [headless](https://developers.google.com/web/updates/2017/04/headless-chrome) Google Chrome and good 'ol `wget`.
+Save an archived copy of all websites you star using Pocket or Pinboard, indexed in an html file. Powered by the new [headless](https://developers.google.com/web/updates/2017/04/headless-chrome) Google Chrome and good 'ol `wget`. NEW: Also submits each link to archive.org!
 
 ![](screenshot.png)
 
@@ -34,6 +34,9 @@ add-apt-repository ppa:fkrull/deadsnakes && apt update && apt install python3.6
 ```
 If you still need help, [the official Python docs](https://docs.python.org/3.6/using/unix.html) are a good place to start.
 
+To swtich from Google Chrome to chromium, change the `CHROME_BINARY` variable at the top of `archive.py`.
+If you're missing `wget` or `curl`, simply install them using `apt` or your package manager of choice.
+
 **Archiving:**
 
 1. Download your pocket export file `ril_export.html` from https://getpocket.com/export
@@ -47,6 +50,7 @@ organized by timestamp. For each sites it saves:
  - wget of site, e.g. `en.wikipedia.org/wiki/Example.html` with .html appended if not present
  - `sreenshot.png` 1440x900 screenshot of site using headless chrome
  - `output.pdf` Printed PDF of site using headless chrome
+ - `archive.org.txt` A link to the saved site on archive.org
 
 You can tweak parameters like screenshot size, file paths, timeouts, etc. in `archive.py`.
 You can also tweak the outputted html index in `index_template.html`. It just uses python