mirror of synced 2024-09-30 17:17:12 +13:00
ArchiveBox/archivebox/search
Commit 310b4d1242 by Ross Williams, 2023-10-23 21:42:32 -04:00

Add htmltotext extractor

Saves HTML text nodes and selected element attributes in `htmltotext.txt` for each Snapshot. Primarily intended to be used for search indexing.
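The commit message describes the extractor only at a high level. The Python sketch below uses only the standard library's `html.parser` and shows one way to collect text nodes plus a few attributes into `htmltotext.txt`. It assumes that `<script>`/`<style>` contents are skipped and that `alt` and `title` are the "selected" attributes. Those choices, and the names `TextAndAttrExtractor`, `html_to_text`, and `write_htmltotext`, are hypothetical illustrations and not the actual code in `utils.py` or the extractor.

```python
from html.parser import HTMLParser
from pathlib import Path

# Hypothetical attribute selection; the real extractor's list may differ.
INDEXED_ATTRIBUTES = {"alt", "title"}
# Elements whose contents are code/styling rather than readable text (assumption).
SKIPPED_TAGS = {"script", "style"}


class TextAndAttrExtractor(HTMLParser):
    """Collects text nodes and selected attribute values, in document order."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        for name, value in attrs:
            stripped = (value or "").strip()  # boolean attrs have value None
            if name in INDEXED_ATTRIBUTES and stripped:
                self.chunks.append(stripped)

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if self._skip_depth == 0 and stripped:
            self.chunks.append(stripped)


def html_to_text(html: str) -> str:
    """Flatten an HTML document into one space-separated string for indexing."""
    parser = TextAndAttrExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def write_htmltotext(html: str, snapshot_dir: Path) -> Path:
    """Write the extracted text to <snapshot_dir>/htmltotext.txt (hypothetical helper)."""
    out_path = snapshot_dir / "htmltotext.txt"
    out_path.write_text(html_to_text(html), encoding="utf-8")
    return out_path
```

For example, `html_to_text('<p>Hello <b>world</b></p><img alt="A cat"><script>var x=1;</script>')` yields `"Hello world A cat"`. Joining chunks with single spaces discards layout, which is usually acceptable for a full-text search index.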
backends/      bail out on sonic indexing after 5 errors      2021-04-10 05:18:03 -04:00
__init__.py
utils.py       Add htmltotext extractor                       2023-10-23 21:42:32 -04:00
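The `backends` entry's commit message, "bail out on sonic indexing after 5 errors", describes an error budget for indexing. The Python sketch below shows that pattern under stated assumptions. `ingest` stands in for whatever call sends a document to the Sonic backend, and `index_texts` and `MAX_ERRORS` are hypothetical names. The sketch also does not model whether the real backend logs, re-raises, or silently stops after the fifth failure.

```python
from typing import Callable, Iterable

MAX_ERRORS = 5  # threshold from the commit message; the name is hypothetical


def index_texts(
    texts: Iterable[str],
    ingest: Callable[[str], None],
    max_errors: int = MAX_ERRORS,
) -> int:
    """Feed each text to `ingest`, stopping once `max_errors` calls have failed.

    Returns the number of texts ingested successfully.
    """
    errors = 0
    succeeded = 0
    for text in texts:
        try:
            ingest(text)
            succeeded += 1
        except Exception:
            errors += 1
            if errors >= max_errors:
                # Bail out: repeated failures suggest the backend is unreachable,
                # so further attempts would likely fail the same way.
                break
    return succeeded
```

Capping failures this way keeps a down or misconfigured search server from turning one indexing run into thousands of slow, failing requests.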