1
0
Fork 0
mirror of synced 2024-05-17 10:53:34 +12:00

Update README.md

This commit is contained in:
Nick Sweeting 2023-11-08 02:27:40 -08:00 committed by GitHub
parent e65b80b084
commit ee6bf58170
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23

View file

@ -31,11 +31,21 @@ curl -sSL 'https://get.archivebox.io' | sh
➡️ Use ArchiveBox as a [command-line package](#quickstart), [self-hosted web app](#quickstart), or [desktop app](https://github.com/ArchiveBox/electron-archivebox) (alpha), on Linux, macOS, and Windows (via Docker).
```python3
pip install archivebox
```
**You can feed it URLs one at a time, or schedule regular imports** from browser bookmarks or history, feeds like RSS, bookmark services like Pocket/Pinboard, and more. See <a href="#input-formats">input formats</a> for a full list.
**It saves snapshots of the URLs you feed it in several formats:** HTML, PDF, PNG screenshots, WARC, and more out-of-the-box, with a wide variety of content extracted and preserved automatically (social media, article text, PDFs/audio/video, git repos, etc.). See <a href="#output-formats">output formats</a> for a full list.
> 🏛️ Whether you're a `journalist` `collecting media snippets`, a `human rights lawyer` `reviewing vast social media feeds`, a `researcher` `analyzing or training on web content`, or in any ony other role that needs better web preservation tools, ArchiveBox [can help](https://zulip.archivebox.io/#narrow/stream/167-enterprise/topic/welcome/near/1191102).
---
> ArchiveBox is for [professionals](https://zulip.archivebox.io/#narrow/stream/167-enterprise/topic/welcome/near/1191102) and hobbyists alike:
> **Journalists:** `collecting media snippets`, `preserving quotes`, `fact-checking`
> **Lawyers:** `aggregating social media posts`, `extracting and tagging content`, `preserving evidence`, `documenting custody`
> **Researchers:** `automating scheduled crawling`, `adding archive post-processing steps`, `feeding archives into training or analysis pipelines`
> **Individuals:** `preserving browsing history`, `curating bookmarks`, `extracting content to common formats`
The goal is to sleep soundly knowing the part of the internet you care about will be automatically preserved in durable, easily accessible formats [for decades](#background--motivation) after it goes down.