Update README.md

Nick Sweeting 2020-10-12 02:57:31 -04:00 committed by GitHub
parent c38d13a355
commit 274fd40c9d
@ -87,11 +87,12 @@ At the end of the day, the goal is to sleep soundly knowing that the part of the
- [**Free & open source**](https://github.com/pirate/ArchiveBox/blob/master/LICENSE), doesn't require signing up for anything, stores all data locally
- [**Few dependencies**](https://github.com/pirate/ArchiveBox/wiki/Install#dependencies) and [simple command line interface](https://github.com/pirate/ArchiveBox/wiki/Usage#CLI-Usage)
- [**Comprehensive documentation**](https://github.com/pirate/ArchiveBox/wiki), [active development](https://github.com/pirate/ArchiveBox/wiki/Roadmap), and [rich community](https://github.com/pirate/ArchiveBox/wiki/Web-Archiving-Community)
- **Doesn't require a constantly-running server**, proxy, or native app
- Easy to set up **[scheduled importing](https://github.com/pirate/ArchiveBox/wiki/Scheduled-Archiving) from multiple sources**
- Uses common, **durable, [long-term formats](#saves-lots-of-useful-stuff-for-each-imported-link)** like HTML, JSON, PDF, PNG, and WARC
- ~~**Suitable for paywalled / [authenticated content](https://github.com/pirate/ArchiveBox/wiki/Configuration#chrome_user_data_dir)** (can use your cookies)~~ (do not do this until v0.5 is released with some security fixes)
- Can [**run scripts during archiving**](https://github.com/pirate/ArchiveBox/issues/51) to [scroll pages](https://github.com/pirate/ArchiveBox/issues/80), [close modals](https://github.com/pirate/ArchiveBox/issues/175), expand comment threads, etc.
- **Doesn't require a constantly-running daemon**, proxy, or native app
- Provides a CLI, Python API, self-hosted web UI, and REST API (WIP)
- Architected to be able to run [**many varieties of scripts during archiving**](https://github.com/pirate/ArchiveBox/issues/51), e.g. to extract media, summarize articles, [scroll pages](https://github.com/pirate/ArchiveBox/issues/80), [close modals](https://github.com/pirate/ArchiveBox/issues/175), expand comment threads, etc.
- Can also [**mirror content to 3rd-party archiving services**](https://github.com/pirate/ArchiveBox/wiki/Configuration#submit_archive_dot_org) automatically for redundancy
## Input formats
@ -193,22 +194,22 @@ apt install python3 python3-pip python3-dev git curl wget youtube-dl chromium-br
# Install Node + NPM
curl -s https://deb.nodesource.com/gpgkey/nodesource.gpg.key | apt-key add - \
&& echo "deb https://deb.nodesource.com/node_14.x $(lsb_release -cs) main" >> /etc/apt/sources.list \
&& apt-get update -qq \
&& apt-get install -qq -y --no-install-recommends nodejs
&& apt-get update \
&& apt-get install --no-install-recommends nodejs
# Make a directory to hold your collection
mkdir data && cd data # (doesn't have to be called data)
mkdir data && cd data # (can be anywhere, doesn't have to be called data)
# Install python package (or do this in a .venv if you want)
pip install --upgrade archivebox
# Install node packages (needed for SingleFile, Readability, and Puppeteer)
npm install --prefix data 'git+https://github.com/pirate/ArchiveBox.git'
npm install --prefix . 'git+https://github.com/pirate/ArchiveBox.git'
archivebox init
archivebox add 'https://example.com' # add URLs via args or stdin
archivebox add 'https://example.com'  # add URLs as args or pipe them in via stdin
# or import an RSS/JSON/XML/TXT feed/list of links
# it can ingest links from many formats, including RSS/JSON/XML/MD/TXT and more
curl https://getpocket.com/users/USERNAME/feed/all | archivebox add
archivebox add --depth=1 https://example.com/table-of-contents.html
```
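Since `archivebox add` accepts links on stdin, a list can be tidied before it is piped in. The following is a minimal sketch, not part of ArchiveBox itself: `clean_urls` is a hypothetical helper name, and the sample links are placeholders. It keeps only lines that start with `http://` or `https://` and drops repeats while preserving first-seen order.

```shell
# Hypothetical helper (not an ArchiveBox command): filter stdin down to unique http(s) links.
clean_urls() {
    # grep keeps lines beginning with http:// or https://;
    # awk prints a line only the first time it is seen.
    grep -E '^https?://' | awk '!seen[$0]++'
}

# Demo with placeholder input: the comment, the blank line, and the repeated link are dropped.
clean_urls <<'EOF'
https://example.com
# a note to self, not a link

https://example.com/table-of-contents.html
https://example.com
EOF
```

With ArchiveBox installed and `archivebox init` already run in the current directory, the cleaned list could then be piped in with `clean_urls < links.txt | archivebox add`, where `links.txt` is a placeholder file name.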