ArchiveBox

mirror of synced 2024-09-30 17:17:12 +13:00


Latest commit 310b4d1242 by Ross Williams (2023-10-23 21:42:32 -04:00): Add htmltotext extractor

Saves HTML text nodes and selected element attributes in `htmltotext.txt` for each Snapshot. Primarily intended to be used for search indexing.
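The commit message describes the htmltotext extractor only at a high level. The following is a minimal sketch of the idea using Python's standard-library `html.parser`. It rests on several assumptions: `<script>`/`<style>` bodies are skipped, and `alt` and `title` stand in for the "selected element attributes". The names `TextAndAttrCollector`, `html_to_text`, and `save_htmltotext` are hypothetical. This is not the code in `utils.py` or the real extractor.

```python
from html.parser import HTMLParser
from pathlib import Path

# Assumption (hypothetical): which attributes count as "selected".
ATTRS_TO_KEEP = {"alt", "title"}
# Assumption: script/style bodies are code, not searchable page text.
SKIP_TAGS = {"script", "style"}


class TextAndAttrCollector(HTMLParser):
    """Collects text nodes and a few attribute values from an HTML document."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so entities arrive decoded
        self.chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        for name, value in attrs:
            value = (value or "").strip()
            if name in ATTRS_TO_KEEP and value:
                self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the page's text nodes and kept attribute values, space-joined."""
    parser = TextAndAttrCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def save_htmltotext(html: str, snapshot_dir: Path) -> Path:
    """Write the extracted text to <snapshot_dir>/htmltotext.txt (hypothetical helper)."""
    out_path = Path(snapshot_dir) / "htmltotext.txt"
    out_path.write_text(html_to_text(html), encoding="utf-8")
    return out_path
```

For example, `html_to_text('<p>Hello <b>world</b></p><img alt="A cat">')` returns `"Hello world A cat"`. The sketch joins every chunk with a single space. That suits a search index, which cares about tokens rather than layout, but it can split words that the markup breaks mid-word.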
backends	bail out on sonic indexing after 5 errors	2021-04-10 05:18:03 -04:00
__init__.py
utils.py	Add htmltotext extractor	2023-10-23 21:42:32 -04:00
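The latest commit on `backends` makes Sonic indexing bail out after 5 errors. Below is a minimal sketch of that bail-out pattern. It assumes errors are counted in total rather than consecutively, and that a caller-supplied function indexes each item. `index_all` and `index_one` are hypothetical names and do not reflect the Sonic client API or the actual backend code.

```python
from typing import Callable, Iterable

MAX_ERRORS = 5  # the threshold named in the commit message


def index_all(
    texts: Iterable[str],
    index_one: Callable[[str], None],
    max_errors: int = MAX_ERRORS,
) -> int:
    """Index each text via index_one; bail out once max_errors calls have failed.

    Assumption: failures are counted in total, not consecutively.
    Returns the number of texts indexed successfully.
    """
    errors = 0
    indexed = 0
    for text in texts:
        try:
            index_one(text)
            indexed += 1
        except Exception as err:  # e.g. a dropped connection to the backend
            errors += 1
            if errors >= max_errors:
                raise RuntimeError(
                    f"giving up after {errors} indexing errors"
                ) from err
    return indexed
```

Raising instead of silently returning lets the caller tell a partial index apart from a complete one. When the backend is down, the loop stops after five attempts rather than retrying every remaining item.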