From 6c957556e17d869678929cc18a41d49941fd2be6 Mon Sep 17 00:00:00 2001 From: Nick Sweeting Date: Fri, 5 May 2017 05:10:50 -0400 Subject: [PATCH] readme --- README.md | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 52 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 16669911..969f2145 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,53 @@ -# pocket-archive-stream +# Pocket Stream Archive + Save an archived copy of all websites starred using Pocket, indexed in an html file. + +![](screenshot.png) + +## Quickstart + +**Dependencies:** Google Chrome headless, wget + +```bash +brew install Caskroom/versions/google-chrome-canary +brew install wget + +# OR on linux + +wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add - +sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list' +apt update; apt install google-chrome-beta +``` + +**Usage:** + +1. Download your pocket export file `ril_export.html` from https://getpocket.com/export +2. Download this repo `git clone https://github.com/pirate/pocket-archive-stream` +3. `cd pocket-archive-stream/` +4. `./archive.py path/to/ril_export.html` + +It produces a folder `pocket/` containing an `index.html`, and archived copies of all the sites, +organized by timestamp. For each sites it saves: + - wget of site, with .html appended if not present + - screenshot of site using headless chrome + - PDF of site using headless chrome + +The wget archive is suitable for serving on your personal server, you can upload the pocket +archive to `/var/www/pocket` and allow people to access your saved copies of sites. + + +## Info + +This is basically an open-source version of [Pocket Premium](https://getpocket.com/). I got tired of sites I saved going offline, +or changing their URLS, so I want to archive a copy of them whenever I save them now. + +Now I can rest soundly knowing important articles and resources I like wont dissapear off the internet. + +**WARNING:** + +Hosting other people's site content has security implications for your domain, make sure you understand +the dangers of hosting other people's CSS & JS files on your domain. It's best to put this on a domain +of its own to slightly mitigate CSRF attacks. + +It might also be prudent to blacklist your archive in your `robots.txt` so that search engines dont index +the content on your domain.