
Merge pull request #124 from f0086/new-name-propagation

Propagate the new name of the project
Nick Sweeting 2018-12-21 11:39:37 -08:00 committed by GitHub
commit 0b48fd376f
GPG key ID: 4AEE18F83AFDEB23
12 changed files with 48 additions and 48 deletions


@ -20,7 +20,7 @@ RUN apt-get update && apt-get install -y curl --no-install-recommends \
# ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
# RUN chmod +x /usr/local/bin/dumb-init
RUN git clone https://github.com/pirate/bookmark-archiver /home/chromeuser/app \
RUN git clone https://github.com/pirate/ArchiveBox /home/chromeuser/app \
&& pip3 install -r /home/chromeuser/app/archiver/requirements.txt
# Add user so we are a strong, independent chrome that don't need --no-sandbox.


@ -1,10 +1,10 @@
# ArchiveBox: Open source local web archiving <img src="https://nicksweeting.com/images/archive.png" height="22px"/> [![Github Stars](https://img.shields.io/github/stars/pirate/bookmark-archiver.svg)](https://github.com/pirate/bookmark-archiver) [![Twitter URL](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/thesquashSH)
# ArchiveBox: Open source local web archiving <img src="https://nicksweeting.com/images/archive.png" height="22px"/> [![Github Stars](https://img.shields.io/github/stars/pirate/bookmark-archiver.svg)](https://github.com/pirate/ArchiveBox) [![Twitter URL](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/thesquashSH)
### (Recently [renamed](https://github.com/pirate/ArchiveBox/issues/108) from `Bookmark Archiver`)
### (Recently [renamed](https://github.com/pirate/ArchiveBox/issues/108) from `Bookmark Archiver`)
"Your own personal Way-Back Machine"
▶️ [Quickstart](#quickstart) | [Details](#details) | [Configuration](#configuration) | [Manual Setup](#manual-setup) | [Troubleshooting](#troubleshooting) | [Demo](https://archive.sweeting.me) | [Source](https://github.com/pirate/bookmark-archiver/tree/master) | [Changelog](#changelog) | [Donate](https://github.com/pirate/bookmark-archiver/blob/master/DONATE.md)
▶️ [Quickstart](#quickstart) | [Details](#details) | [Configuration](#configuration) | [Manual Setup](#manual-setup) | [Troubleshooting](#troubleshooting) | [Demo](https://archive.sweeting.me) | [Source](https://github.com/pirate/ArchiveBox/tree/master) | [Changelog](#changelog) | [Donate](https://github.com/pirate/ArchiveBox/blob/master/DONATE.md)
---
@ -62,8 +62,8 @@ Follow the links here to find instructions for exporting a list of URLs from eac
**2. Create your archive:**
```bash
git clone https://github.com/pirate/bookmark-archiver
cd bookmark-archiver/
git clone https://github.com/pirate/ArchiveBox
cd ArchiveBox/
./setup # install all dependencies
# add a list of links from a file
@ -95,8 +95,8 @@ it will keep the index up-to-date without duplicate links.
This example archives a pocket RSS feed and an export file every 24 hours, and saves the output to a logfile.
```bash
0 0 * * * yourusername /opt/bookmark-archiver/archive https://getpocket.com/users/yourusername/feed/all > /var/log/bookmark_archiver_rss.log
0 0 * * * yourusername /opt/bookmark-archiver/archive /home/darth-vader/Desktop/bookmarks.html > /var/log/bookmark_archiver_firefox.log
0 0 * * * yourusername /opt/ArchiveBox/archive https://getpocket.com/users/yourusername/feed/all > /var/log/archivebox_rss.log
0 0 * * * yourusername /opt/ArchiveBox/archive /home/darth-vader/Desktop/bookmarks.html > /var/log/archivebox_firefox.log
```
(Add the above lines to `/etc/crontab`)
@ -190,13 +190,13 @@ The chrome/chromium dependency is _optional_ and only required for screenshots,
The archive produced by `./archive` is suitable for serving on any provider that can host static html (e.g. github pages!).
You can also serve it from a home server or VPS by uploading the outputted `output` folder to your web directory, e.g. `/var/www/bookmark-archiver` and configuring your webserver.
You can also serve it from a home server or VPS by uploading the outputted `output` folder to your web directory, e.g. `/var/www/ArchiveBox` and configuring your webserver.
Here's a sample nginx configuration that works to serve archive folders:
```nginx
location / {
alias /path/to/bookmark-archiver/output/;
alias /path/to/ArchiveBox/output/;
index index.html;
autoindex on; # see directory listing upon clicking "The Files" links
try_files $uri $uri/ =404;
@ -266,8 +266,8 @@ Follow the instruction links above in the "Quickstart" section to download your
**3. Run the archive script:**
1. Clone this repo `git clone https://github.com/pirate/bookmark-archiver`
3. `cd bookmark-archiver/`
1. Clone this repo `git clone https://github.com/pirate/ArchiveBox`
3. `cd ArchiveBox/`
4. `./archive ~/Downloads/bookmarks_export.html`
You may optionally specify a second argument to `archive.py export.html 153242424324` to resume the archive update at a specific timestamp.
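As a rough illustration of what resuming at a timestamp could look like, here is a minimal sketch. It assumes each link is a dict with a string `timestamp` field and that links are processed newest-first, so resuming skips anything newer than the given value. `links_from_timestamp` is a hypothetical helper for illustration, not necessarily how `archive.py` does it.

```python
from typing import Dict, Iterable, Iterator, Optional


def links_from_timestamp(links: Iterable[Dict[str, str]],
                         resume: Optional[str] = None) -> Iterator[Dict[str, str]]:
    """Yield links at or older than `resume`; yield everything if resume is None.

    Assumption: links are processed newest-first, so "resuming" means
    skipping entries whose timestamp is newer than the resume value.
    """
    if resume is None:
        yield from links
        return
    cutoff = float(resume)
    for link in links:
        if float(link["timestamp"]) <= cutoff:
            yield link


# Made-up example data, newest first.
links = [
    {"url": "https://example.com/c", "timestamp": "1500000300"},
    {"url": "https://example.com/b", "timestamp": "1500000200"},
    {"url": "https://example.com/a", "timestamp": "1500000100"},
]
remaining = list(links_from_timestamp(links, resume="1500000200"))
print([link["url"] for link in remaining])
```

With the sample data, only the two entries at or before `1500000200` are left to process.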
@ -369,7 +369,7 @@ a bug in versions `<=1.19.1_1` that caused wget to fail for perfectly valid site
**No links parsed from export file:**
Please open an [issue](https://github.com/pirate/bookmark-archiver/issues) with a description of where you got the export, and
Please open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of where you got the export, and
preferably your export file attached (you can redact the links). We'll fix the parser to support your format.
**Lots of skipped sites:**
@ -383,12 +383,12 @@ If you're still having issues, try deleting or moving the `output/archive` folde
**Lots of errors:**
Make sure you have all the dependencies installed and that you're able to visit the links from your browser normally.
Open an [issue](https://github.com/pirate/bookmark-archiver/issues) with a description of the errors if you're still having problems.
Open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of the errors if you're still having problems.
**Lots of broken links from the index:**
Not all sites can be effectively archived with each method; that's why it's best to use a combination of `wget`, PDFs, and screenshots.
If it seems like more than 10-20% of sites in the archive are broken, open an [issue](https://github.com/pirate/bookmark-archiver/issues)
If it seems like more than 10-20% of sites in the archive are broken, open an [issue](https://github.com/pirate/ArchiveBox/issues)
with some of the URLs that failed to be archived and I'll investigate.
**Removing unwanted links from the index:**
@ -398,7 +398,7 @@ If you accidentally added lots of unwanted links into index and they slow down y
### Hosting the Archive
If you're having issues trying to host the archive via nginx, make sure you already have nginx running with SSL.
If you don't, google around, there are plenty of tutorials to help get that set up. Open an [issue](https://github.com/pirate/bookmark-archiver/issues)
If you don't, google around, there are plenty of tutorials to help get that set up. Open an [issue](https://github.com/pirate/ArchiveBox/issues)
if you have a problem with a particular nginx config.
@ -468,10 +468,10 @@ If you feel like contributing a PR, some of these tasks are pretty easy. Feel f
- Index links now work without nginx url rewrites, archive can now be hosted on github pages
- added setup.sh script & docstrings & help commands
- made Chromium the default instead of Google Chrome (yay free software)
- added [env-variable](https://github.com/pirate/bookmark-archiver/pull/25) configuration (thanks to https://github.com/hannah98!)
- added [env-variable](https://github.com/pirate/ArchiveBox/pull/25) configuration (thanks to https://github.com/hannah98!)
- renamed from **Pocket Archive Stream** -> **Bookmark Archiver**
- added [Netscape-format](https://github.com/pirate/bookmark-archiver/pull/20) export support (thanks to https://github.com/ilvar!)
- added [Pinboard-format](https://github.com/pirate/bookmark-archiver/pull/7) export support (thanks to https://github.com/sconeyard!)
- added [Netscape-format](https://github.com/pirate/ArchiveBox/pull/20) export support (thanks to https://github.com/ilvar!)
- added [Pinboard-format](https://github.com/pirate/ArchiveBox/pull/7) export support (thanks to https://github.com/sconeyard!)
- front-page of HN, oops! apparently I have users to support now :grin:?
- added Pocket-format export support
- v0.0.0 released: created Pocket Archive Stream 2017/05/05
@ -485,4 +485,4 @@ If you feel like contributing a PR, some of these tasks are pretty easy. Feel f
talented engineers. If you want to help sponsor this project long-term or just say thanks or suggest changes, contact
me at bookmark-archiver@sweeting.me.
[Grants / Donations](https://github.com/pirate/bookmark-archiver/blob/master/DONATE.md)
[Grants / Donations](https://github.com/pirate/ArchiveBox/blob/master/DONATE.md)


@ -1,7 +1,7 @@
#!/usr/bin/env python3
# Bookmark Archiver
# ArchiveBox
# Nick Sweeting 2017 | MIT License
# https://github.com/pirate/bookmark-archiver
# https://github.com/pirate/ArchiveBox
import os
import sys
@ -39,14 +39,14 @@ from util import (
__AUTHOR__ = 'Nick Sweeting <git@nicksweeting.com>'
__VERSION__ = GIT_SHA
__DESCRIPTION__ = 'Bookmark Archiver: Create a browsable html archive of a list of links.'
__DOCUMENTATION__ = 'https://github.com/pirate/bookmark-archiver'
__DESCRIPTION__ = 'ArchiveBox: Create a browsable html archive of a list of links.'
__DOCUMENTATION__ = 'https://github.com/pirate/ArchiveBox'
def print_help():
print(__DESCRIPTION__)
print("Documentation: {}\n".format(__DOCUMENTATION__))
print("Usage:")
print(" ./bin/bookmark-archiver ~/Downloads/bookmarks_export.html\n")
print(" ./bin/archivebox ~/Downloads/bookmarks_export.html\n")
def merge_links(archive_path=OUTPUT_DIR, import_path=None, only_new=False):


@ -28,7 +28,7 @@ CHECK_SSL_VALIDITY = os.getenv('CHECK_SSL_VALIDITY', 'True'
OUTPUT_PERMISSIONS = os.getenv('OUTPUT_PERMISSIONS', '755' )
CHROME_BINARY = os.getenv('CHROME_BINARY', 'chromium-browser' ) # change to google-chrome browser if using google-chrome
WGET_BINARY = os.getenv('WGET_BINARY', 'wget' )
WGET_USER_AGENT = os.getenv('WGET_USER_AGENT', 'Bookmark Archiver')
WGET_USER_AGENT = os.getenv('WGET_USER_AGENT', 'ArchiveBox')
CHROME_USER_DATA_DIR = os.getenv('CHROME_USER_DATA_DIR', None)
TIMEOUT = int(os.getenv('TIMEOUT', '60'))
FOOTER_INFO = os.getenv('FOOTER_INFO', 'Content is hosted for personal archiving purposes only. Contact server owner for any takedown requests.',)
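The settings above are all read from environment variables with string defaults. A minimal sketch of that pattern follows, assuming boolean options are written as strings such as `True` or `false`. `env_bool` and `env_int` are hypothetical helpers added here for illustration and are not part of the original config module.

```python
import os


def env_bool(name: str, default: bool) -> bool:
    """Read a boolean option from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def env_int(name: str, default: int) -> int:
    """Read an integer option from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    return default if raw is None else int(raw)


# Same option names and defaults as the hunk above.
CHECK_SSL_VALIDITY = env_bool("CHECK_SSL_VALIDITY", True)
TIMEOUT = env_int("TIMEOUT", 60)
WGET_USER_AGENT = os.getenv("WGET_USER_AGENT", "ArchiveBox")
```

Running with `TIMEOUT=120 WGET_USER_AGENT="my-archiver"` in the environment would override the defaults without editing the file.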


@ -43,8 +43,8 @@ def write_json_links_index(out_dir, links):
path = os.path.join(out_dir, 'index.json')
index_json = {
'info': 'Bookmark Archiver Index',
'help': 'https://github.com/pirate/bookmark-archiver',
'info': 'ArchiveBox Index',
'help': 'https://github.com/pirate/ArchiveBox',
'version': GIT_SHA,
'num_links': len(links),
'updated': str(datetime.now().timestamp()),
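For context, a self-contained sketch of how an index like this might be written to disk is shown below. The `links` key, the `indent` setting, and the placeholder `GIT_SHA` value are assumptions made for illustration; only the `info`, `help`, `version`, `num_links`, and `updated` fields come from the hunk above.

```python
import json
import os
import tempfile
from datetime import datetime

GIT_SHA = "0000000"  # hypothetical placeholder; the real value comes from the repo


def write_json_links_index(out_dir, links):
    """Write index.json into out_dir and return its path (illustrative sketch)."""
    path = os.path.join(out_dir, "index.json")
    index_json = {
        "info": "ArchiveBox Index",
        "help": "https://github.com/pirate/ArchiveBox",
        "version": GIT_SHA,
        "num_links": len(links),
        "updated": str(datetime.now().timestamp()),
        "links": links,  # assumed key, not shown in the hunk
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index_json, f, indent=4)
    return path


# Usage: write to a throwaway directory and read the result back.
with tempfile.TemporaryDirectory() as out_dir:
    index_path = write_json_links_index(out_dir, [{"url": "https://example.com"}])
    with open(index_path, encoding="utf-8") as f:
        print(json.load(f)["num_links"])
```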


@ -1,5 +1,5 @@
"""
In Bookmark Archiver, a Link represents a single entry that we track in the
In ArchiveBox, a Link represents a single entry that we track in the
json index. All links pass through all archiver functions and the latest,
most up-to-date canonical output for each is stored in "latest".


@ -110,7 +110,7 @@
<img src="static/archive.png" style="height: 100%;"/>
</a>
<br/>
<a href="https://github.com/pirate/bookmark-archiver">
<a href="https://github.com/pirate/ArchiveBox">
Github
</a>
</div>
@ -143,8 +143,8 @@
<br/>
<center>
<small>
Archive created using <a href="https://github.com/pirate/bookmark-archiver" title="Github">Bookmark Archiver</a>
version <a href="https://github.com/pirate/bookmark-archiver/commit/$git_sha" title="Git commit">$short_git_sha</a> &nbsp; | &nbsp;
Archive created using <a href="https://github.com/pirate/ArchiveBox" title="Github">ArchiveBox</a>
version <a href="https://github.com/pirate/ArchiveBox/commit/$git_sha" title="Git commit">$short_git_sha</a> &nbsp; | &nbsp;
Download index as <a href="index.json" title="JSON summary of archived links.">JSON</a>
<br/><br/>
$footer_info


@ -56,7 +56,7 @@
<hr/>
<a href="./../../index.html" class="nav-icon" title="Archived Sites">
<img src="https://nicksweeting.com/images/archive.png" alt="Archive Icon" height="20px">
Bookmark Archiver: Link Index
ArchiveBox: Link Index
</a>
</footer>
</body>


@ -21,7 +21,7 @@
<DT><A HREF="https://duckduckgo.com/?q=firefox+export+bookmarks&t=ffhp&ia=web" ADD_DATE="1497562974" LAST_MODIFIED="1497562974" ICON_URI="https://duckduckgo.com/favicon.ico" ICON="">firefox export bookmarks at DuckDuckGo</A>
<DT><A HREF="https://duckduckgo.com/?q=archive+firefox+bookmarks&t=ffab&ia=web" ADD_DATE="1497562974" LAST_MODIFIED="1497562974" ICON_URI="https://duckduckgo.com/favicon.ico" ICON="">archive firefox bookmarks at DuckDuckGo</A>
<DT><A HREF="https://github.com/nodiscc" ADD_DATE="1497562974" LAST_MODIFIED="1497562974" ICON_URI="https://assets-cdn.github.com/favicon.ico" ICON="">nodiscc (nodiscc) · GitHub</A>
<DT><A HREF="https://github.com/pirate/bookmark-archiver#troubleshooting" ADD_DATE="1497562975" LAST_MODIFIED="1497562975" ICON_URI="https://assets-cdn.github.com/favicon.ico" ICON="">pirate/bookmark-archiver · Github</A>
<DT><A HREF="https://github.com/pirate/ArchiveBox#troubleshooting" ADD_DATE="1497562975" LAST_MODIFIED="1497562975" ICON_URI="https://assets-cdn.github.com/favicon.ico" ICON="">pirate/ArchiveBox · Github</A>
<DT><A HREF="http://www.cs.unc.edu/~fabian/papers/foniks-oak11.pdf" ADD_DATE="1497562976" LAST_MODIFIED="1497562976" ICON_URI="https://assets-cdn.github.com/favicon.ico" ICON="">Phonotactic Reconstruction of Encrypted VoIP Conversations</A>
<DT><A HREF="https://www.ghacks.net/2009/07/23/firefox-bookmarks-archiver/" ADD_DATE="1497562974" LAST_MODIFIED="1497562974" ICON_URI="https://www.ghacks.net/wp-content/uploads/2005/10/favicon.ico" ICON="">Firefox Bookmarks Archiver - gHacks Tech News</A>
</DL><p>


@ -49,14 +49,14 @@ def check_dependencies():
python_vers = float('{}.{}'.format(sys.version_info.major, sys.version_info.minor))
if python_vers < 3.5:
print('{}[X] Python version is not new enough: {} (>3.5 is required){}'.format(ANSI['red'], python_vers, ANSI['reset']))
print(' See https://github.com/pirate/bookmark-archiver#troubleshooting for help upgrading your Python installation.')
print(' See https://github.com/pirate/ArchiveBox#troubleshooting for help upgrading your Python installation.')
raise SystemExit(1)
if FETCH_PDF or FETCH_SCREENSHOT or FETCH_DOM:
if run(['which', CHROME_BINARY], stdout=DEVNULL).returncode:
print('{}[X] Missing dependency: {}{}'.format(ANSI['red'], CHROME_BINARY, ANSI['reset']))
print(' Run ./setup.sh, then confirm it was installed with: {} --version'.format(CHROME_BINARY))
print(' See https://github.com/pirate/bookmark-archiver for help.')
print(' See https://github.com/pirate/ArchiveBox for help.')
raise SystemExit(1)
# parse chrome --version e.g. Google Chrome 61.0.3114.0 canary / Chromium 59.0.3029.110 built on Ubuntu, running on Ubuntu 16.04
@ -68,33 +68,33 @@ def check_dependencies():
if int(version) < 59:
print(version_lines)
print('{red}[X] Chrome version must be 59 or greater for headless PDF, screenshot, and DOM saving{reset}'.format(**ANSI))
print(' See https://github.com/pirate/bookmark-archiver for help.')
print(' See https://github.com/pirate/ArchiveBox for help.')
raise SystemExit(1)
except (IndexError, TypeError, OSError):
print('{red}[X] Failed to parse Chrome version, is it installed properly?{reset}'.format(**ANSI))
print(' Run ./setup.sh, then confirm it was installed with: {} --version'.format(CHROME_BINARY))
print(' See https://github.com/pirate/bookmark-archiver for help.')
print(' See https://github.com/pirate/ArchiveBox for help.')
raise SystemExit(1)
if FETCH_WGET:
if run(['which', 'wget'], stdout=DEVNULL).returncode or run(['wget', '--version'], stdout=DEVNULL).returncode:
print('{red}[X] Missing dependency: wget{reset}'.format(**ANSI))
print(' Run ./setup.sh, then confirm it was installed with: {} --version'.format('wget'))
print(' See https://github.com/pirate/bookmark-archiver for help.')
print(' See https://github.com/pirate/ArchiveBox for help.')
raise SystemExit(1)
if FETCH_FAVICON or SUBMIT_ARCHIVE_DOT_ORG:
if run(['which', 'curl'], stdout=DEVNULL).returncode or run(['curl', '--version'], stdout=DEVNULL).returncode:
print('{red}[X] Missing dependency: curl{reset}'.format(**ANSI))
print(' Run ./setup.sh, then confirm it was installed with: {} --version'.format('curl'))
print(' See https://github.com/pirate/bookmark-archiver for help.')
print(' See https://github.com/pirate/ArchiveBox for help.')
raise SystemExit(1)
if FETCH_AUDIO or FETCH_VIDEO:
if run(['which', 'youtube-dl'], stdout=DEVNULL).returncode or run(['youtube-dl', '--version'], stdout=DEVNULL).returncode:
print('{red}[X] Missing dependency: youtube-dl{reset}'.format(**ANSI))
print(' Run ./setup.sh, then confirm it was installed with: {} --version'.format('youtube-dl'))
print(' See https://github.com/pirate/bookmark-archiver for help.')
print(' See https://github.com/pirate/ArchiveBox for help.')
raise SystemExit(1)
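One detail worth keeping in mind with checks like these: comparing Python versions as floats treats 3.10 as 3.1, which would fail a `< 3.5` test on a perfectly new interpreter. A minimal sketch of a tuple-based comparison, together with a `shutil.which` lookup for the external binaries, might look like the following. `python_is_new_enough` and `missing_binaries` are hypothetical names introduced for illustration and are not part of the original `check_dependencies()`.

```python
import shutil
import sys

MIN_PYTHON = (3, 5)  # minimum (major, minor) accepted, matching the check above


def python_is_new_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Compare (major, minor) as a tuple so that 3.10 sorts after 3.5."""
    return tuple(version_info[:2]) >= minimum


def missing_binaries(names):
    """Return the names that cannot be found on PATH (hypothetical helper)."""
    return [name for name in names if shutil.which(name) is None]


if not python_is_new_enough():
    print("[X] Python version is not new enough: {}.{}".format(*sys.version_info[:2]))
    raise SystemExit(1)

# Report, rather than abort on, any optional tools that are not installed.
for name in missing_binaries(["wget", "curl", "youtube-dl"]):
    print("[!] Missing dependency: {}".format(name))
```

Using `shutil.which` avoids spawning a `which` subprocess, though the original approach of also running `--version` gives a stronger guarantee that the binary actually works.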
@ -174,7 +174,7 @@ def progress(seconds=TIMEOUT, prefix=''):
return end
def pretty_path(path):
"""convert paths like .../bookmark-archiver/archiver/../output/abc into output/abc"""
"""convert paths like .../ArchiveBox/archiver/../output/abc into output/abc"""
return path.replace(REPO_DIR + '/', '')
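A plain string replace only shortens the path when it starts with `REPO_DIR` verbatim, and it leaves any `..` segments in place. As a hedged alternative, the sketch below normalizes the path first and then makes it relative to the repo directory. The explicit `repo_dir` parameter and the example paths are assumptions for illustration; this is not the project's implementation.

```python
import os


def pretty_path(path, repo_dir):
    """Collapse '..' segments and return the path relative to repo_dir."""
    normalized = os.path.normpath(path)
    return os.path.relpath(normalized, os.path.normpath(repo_dir))


# Example with a made-up install location (POSIX-style paths).
print(pretty_path("/opt/ArchiveBox/archiver/../output/abc", "/opt/ArchiveBox"))
```

On a POSIX system this prints `output/abc`, which is the form the docstring describes.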
@ -319,7 +319,7 @@ def manually_merge_folders(source, target):
assert answer in ('', 'a', 'b', 'q'), 'Invalid choice.'
if answer == 'q':
print('\nJust run Bookmark Archiver again to pick up where you left off.')
print('\nJust run ArchiveBox again to pick up where you left off.')
raise SystemExit(0)
elif answer == '':
return
@ -409,7 +409,7 @@ def cleanup_archive(archive_path, links):
for folder, link in bad_folders:
fix_folder_path(archive_path, folder, link)
elif bad_folders:
print('[!] Warning! {} folders need to be merged, fix by running bookmark archiver.'.format(len(bad_folders)))
print('[!] Warning! {} folders need to be merged, fix by running ArchiveBox.'.format(len(bad_folders)))
if unmatched:
print('[!] Warning! {} unrecognized folders in html/archive/'.format(len(unmatched)))


@ -1,9 +1,9 @@
#!/bin/bash
# Bookmark Archiver Setup Script
# ArchiveBox Setup Script
# Nick Sweeting 2017 | MIT License
# https://github.com/pirate/bookmark-archiver
# https://github.com/pirate/ArchiveBox
echo "[i] Installing bookmark-archiver dependencies. 📦"
echo "[i] Installing ArchiveBox dependencies. 📦"
echo ""
echo " You may be prompted for a password in order to install the following dependencies:"
echo " - Chromium Browser (see README for Google-Chrome instructions instead)"
@ -84,5 +84,5 @@ echo ""
echo "[X] Failed to install some dependencies! ‼️"
echo " - Try the Manual Setup instructions in the README.md"
echo " - Try the Troubleshooting: Dependencies instructions in the README.md"
echo " - Open an issue on github to get help: https://github.com/pirate/bookmark-archiver/issues"
echo " - Open an issue on github to get help: https://github.com/pirate/ArchiveBox/issues"
exit 1