Mirror, synced 2024-05-03 20:12:52 +12:00
Commit graph

87 commits

Author SHA1 Message Date
jim winstead 5478d13d52 Add generic_jsonl parser
Resolves #1369
2024-03-14 15:42:29 -07:00
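
For readers unfamiliar with the format, JSONL stores one JSON object per line. The sketch below shows the general idea of reading such input; the function name `parse_jsonl_lines` and the `url`/`title` keys are assumptions made for illustration, not the parser added by this commit.

```python
import json
from typing import Iterator


def parse_jsonl_lines(text: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of JSON Lines input (hypothetical helper)."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if isinstance(record, dict):
            yield record  # ignore lines that are valid JSON but not objects


# Illustrative input; the field names are assumptions, not a documented schema.
sample = (
    '{"url": "https://example.com", "title": "Example"}\n'
    "\n"
    '{"url": "https://example.org"}\n'
)
links = list(parse_jsonl_lines(sample))
```

Parsing line by line keeps one malformed record from hiding which line failed, since `json.loads` raises on the offending line only.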
Nick Sweeting 099f7d00fe
Use feedparser for RSS parsing (#1362)
Fixes #1171
Fixes #870 (probably, would need to test against a Wallabag Atom file to
Fixes #135
Fixes #123
Fixes #106
2024-03-14 01:51:45 -07:00
jim winstead 741ff5f1a8 Make it a little easier to run specific tests
Changes ./bin/test.sh to pass command-line options to pytest, and to default to
running only the tests in the tests/ directory instead of running everything
except a few excluded directories, which is more error-prone.

Also keeps the mock_server used in testing quiet so access log entries don't
appear on stdout.
2024-03-01 12:43:53 -08:00
jim winstead 0f402df42f Merge with latest dev 2024-03-01 12:05:43 -08:00
jim winstead e7119adb0b Add tests for generic_rss and pinboard_rss parsers 2024-03-01 11:27:59 -08:00
jim winstead 1f828d9441 Add tests for generic_rss and pinboard_rss parsers 2024-03-01 11:22:28 -08:00
jim winstead ccabda4c7d Handle list of tags in JSON, and be more clever about comma vs. space 2024-02-28 17:38:49 -08:00
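
One plausible reading of "comma vs. space" handling is sketched below: accept a list as-is, split a string on commas when it contains any, and otherwise split on whitespace. The helper name `normalize_tags` and this exact rule are assumptions; the commit may use a different heuristic.

```python
from typing import List, Optional, Union


def normalize_tags(raw: Optional[Union[str, List[str]]]) -> List[str]:
    """Turn a tags value into a clean list of tag names (hypothetical helper)."""
    if raw is None:
        return []
    if isinstance(raw, list):
        parts = [str(tag) for tag in raw]  # JSON already gave us a list
    elif "," in raw:
        parts = raw.split(",")  # commas present: tags may contain spaces
    else:
        parts = raw.split()  # no commas: treat whitespace as the separator
    return [part.strip() for part in parts if part.strip()]
```

Under this rule `"a, b c"` yields the two tags `a` and `b c`, while `"a b"` yields `a` and `b`.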
jim winstead 178e676e0f Fix JSON parser by not always mangling the input
Rather than assuming the JSON file we are parsing has junk at the beginning
(which maybe only used to happen?), try parsing it as-is first, and then fall
back to trying again after skipping the first line.

Fixes #1347
2024-02-27 14:48:19 -08:00
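
The try-then-fall-back approach described in this message can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `load_json_with_fallback` is hypothetical and the code is not taken from the commit.

```python
import json


def load_json_with_fallback(text: str):
    """Parse JSON as-is; if that fails, retry after dropping the first line."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Assume the first line is non-JSON junk and retry with the remainder.
        # If there is no second line, this raises JSONDecodeError again.
        _, _, rest = text.partition("\n")
        return json.loads(rest)
```

Trying the unmodified input first means a well-formed file is never altered, and the first-line skip only applies when the plain parse has already failed.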
Nick Sweeting a680724367
Merge branch 'dev' into search_index_extract_html_text 2023-10-27 23:09:28 -07:00
Ross Williams 310b4d1242 Add htmltotext extractor
Saves HTML text nodes and selected element attributes in
`htmltotext.txt` for each Snapshot. Primarily intended to be used
for search indexing.
2023-10-23 21:42:32 -04:00
Ross Williams b44f7e68b1 Add URL-specific method allow/deny lists
Allows enabling only allow-listed extractors or disabling specific
deny-listed extractors for a regular expression matched against an added
site's URL.
2023-08-02 09:36:40 -04:00
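
As an illustration of the idea, the sketch below filters a list of extractor methods using regular expressions matched against a URL. The setting names `URL_ALLOWLIST` and `URL_DENYLIST`, the method names, and the exact semantics are assumptions for this example; the commit's actual configuration format is not shown in this log.

```python
import re
from typing import Dict, List

# Hypothetical extractor names and settings, for illustration only.
ALL_METHODS = ["title", "wget", "pdf", "screenshot"]
URL_ALLOWLIST: Dict[str, List[str]] = {r"^https?://example\.com/": ["title", "wget"]}
URL_DENYLIST: Dict[str, List[str]] = {r"\.pdf$": ["screenshot"]}


def methods_for_url(
    url: str,
    all_methods: List[str] = ALL_METHODS,
    allow: Dict[str, List[str]] = URL_ALLOWLIST,
    deny: Dict[str, List[str]] = URL_DENYLIST,
) -> List[str]:
    """Return the extractor methods that should run for the given URL."""
    methods = list(all_methods)
    for pattern, allowed in allow.items():
        if re.search(pattern, url):
            # A matching allow rule keeps only the listed methods.
            methods = [m for m in methods if m in allowed]
    for pattern, denied in deny.items():
        if re.search(pattern, url):
            # A matching deny rule removes the listed methods.
            methods = [m for m in methods if m not in denied]
    return methods
```

With these sample rules, a URL under `example.com` runs only `title` and `wget`, a URL ending in `.pdf` elsewhere skips `screenshot`, and any other URL runs every method.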
Sascha Ißbrücker 40c122515a fix: make oneshot command return successful exit code 2023-05-29 10:01:27 +02:00
Nick Sweeting 9f1470cf03 fix output permissions tests 2021-05-31 20:57:46 -04:00
Nick Sweeting eef9adbfcb fix select invalid test 2021-04-03 15:50:48 -04:00
Nick Sweeting 354b4627ed fix tests 2021-03-30 23:39:15 -04:00
Nick Sweeting bd6d9c165b enforce utf8 on literally all file operations because windows sucks 2021-03-27 01:16:29 -04:00
Nick Sweeting 33df9c1ebe fix after and before in remove tests 2021-02-18 06:21:44 -05:00
Nick Sweeting 4f5bb3776c fix sql err 2021-02-18 05:51:53 -05:00
Nick Sweeting 46a4197514 fix tests 2021-02-18 04:26:56 -05:00
Cristian e82161a768 refactor: Remove setup_django from search 2020-12-11 16:43:48 -05:00
Nick Sweeting e03d17c208 test extract flag on oneshot 2020-12-11 16:49:18 +02:00
Cristian f6c73f9aeb fix: Issue with oneshot command 2020-12-08 18:42:25 -05:00
Nick Sweeting 1b22f8eeef
Merge pull request #515 from cdvv7788/POC-setup-django-on-init 2020-11-27 23:56:37 -05:00
Nick Sweeting efe3027797
Merge branch 'master' into archive-result 2020-11-27 23:18:11 -05:00
Nick Sweeting 0e2ccbc10d update urls to new repo path 2020-11-23 02:06:46 -05:00
Nick Sweeting fdd4effc92
Merge pull request #535 from cdvv7788/extractors-flag 2020-11-13 14:53:17 -05:00
JDC b1dbfcb73f Add test remove tag filter 2020-11-13 14:17:12 -05:00
Cristian 44eede96e5 feat: Add extract flag to add command 2020-11-13 09:24:34 -05:00
Cristian 33182fd53c fix: Add missing assignment 2020-11-04 15:07:45 -05:00
Cristian d064a3eeff fix: Handle case when update tries to re-add a link that is not in the sql index 2020-11-04 15:02:54 -05:00
Cristian e7e33ea7a5 tests: Add tests for several different ways to extract the title 2020-10-30 08:04:26 -05:00
Cristian f6ce1de882 fix: archivebox version was being called as root 2020-10-27 09:15:14 -05:00
Cristian a6bee5f111 feat: Move setup_django to an inner module 2020-10-26 08:02:04 -05:00
Cristian e1d0b8bce7 feat: Initialize django at the beginning 2020-10-26 07:45:21 -05:00
Cristian ae1484b8bf feat: Remove index.json and index.html generation from the regular process 2020-10-23 06:45:56 -05:00
Cristian Vargas a850b4a9d9
Merge branch 'master' into tags 2020-10-20 08:23:25 -05:00
Cristian 62c78e1d10 refactor: Remove django-taggit and replace it with a local tags setup 2020-10-12 13:47:03 -05:00
Angel Rey 73418836f8 Replaced os.path in server.py 2020-10-02 15:46:39 -05:00
Angel Rey 62c9028212 Improved tags 2020-09-24 15:34:23 -05:00
Cristian 0158efb1d0 test: Improve oneshot test 2020-09-24 12:56:16 -05:00
Cristian 62ed11a5ca fix: Improve headers handling 2020-09-24 12:55:51 -05:00
Angel Rey ee6caca3ca Added more asserts 2020-09-23 11:07:00 -05:00
Angel Rey 1cce786d6d Added test headers extractor 2020-09-23 11:07:00 -05:00
Cristian 46b9e3d536 fix: Fix mercury extractor test 2020-09-23 10:34:05 -05:00
ttimasdf e3329be291 tests: add test for mercury-parser 2020-09-22 18:44:12 -05:00
Cristian fa622d3e14 refactor: Replace --index with --with-headers in the list command to make it more explicit. Change it so it affects the csv output too. 2020-09-15 08:05:46 -05:00
Cristian 2aa8d69b72 fix: Save history in main index (to mimic previous behaviour) 2020-09-15 08:05:46 -05:00
Cristian 7e9d195d13 feat: Update list command to sort using sqlite 2020-09-15 08:05:46 -05:00
Cristian f55153eab3 feat: Update update command to work with querysets 2020-09-15 08:05:46 -05:00
Cristian dafa1dd63c tests: Add tests for before and after flags in remove command 2020-09-15 08:05:46 -05:00