mirror of synced 2024-06-28 02:50:24 +12:00
Commit graph

124 commits

Author SHA1 Message Date
Angel Rey dc160daba8 Fixed lint 2020-09-23 11:07:00 -05:00
Angel Rey 7fd7dced9a Added curl params 2020-09-23 11:07:00 -05:00
Angel Rey 852e3c9cff Added headers extractor 2020-09-23 11:07:00 -05:00
Cristian eb34a6af62 lint: Fix mercury extractor lint issues 2020-09-23 10:35:39 -05:00
Cristian 46b9e3d536 fix: Fix mercury extractor test 2020-09-23 10:34:05 -05:00
ttimasdf 357b677363 fix: add mercury-parser to extractors list 2020-09-22 18:44:12 -05:00
ttimasdf 706bd895e0 feat: Add mercury-parser 2020-09-22 18:44:12 -05:00
Cristian b18bbf8874 test: Fix tests post-rebase 2020-09-17 09:09:52 -05:00
Cristian 50f3f16203 lint: Remove unused import 2020-09-15 08:05:46 -05:00
Cristian 0a83392cbf fix: Replace any typing with Union[Iterable[Link], QuerySet] in archive_links 2020-09-15 08:05:46 -05:00
Cristian 018bd91745 refactor: Remove get_iter lambda from archive_links 2020-09-15 08:05:46 -05:00
Cristian 01fb44fd40 refactor: Change archive_links check to focus on queryset, so it allows other iterables and not just lists 2020-09-15 08:05:46 -05:00
Cristian fe9604a772 feat: Add tests for remove command 2020-09-15 08:05:46 -05:00
Cristian be520d137a feat: Refactor add method to use querysets 2020-09-15 08:05:46 -05:00
Cristian 874403e667 feat: Remove patch_main_index 2020-09-15 08:05:46 -05:00
Cristian 31343c1367 feat: Update extractors and add command to use sql index as source of truth 2020-09-15 08:05:46 -05:00
Cristian bd3c824d45 fix: Escape JSON output on command failure so the user can run the command manually 2020-09-04 10:23:41 -05:00
Nick Sweeting a645f36b87 add comment about fake cmd 2020-09-01 19:42:22 -04:00
Cristian 66037535fd feat: Add curl command on readability as default command to debug 2020-09-01 10:16:24 -05:00
Cristian bf3ea42141 fix: Add a default cmd value to handle case where the html cannot be retrieved 2020-08-27 09:51:33 -05:00
Nick Sweeting a2c158e43e catch OSErrors due to missing path 2020-08-18 19:09:45 -04:00
Nick Sweeting 7144e0bdce search for node dependencies in output dir first 2020-08-18 18:40:19 -04:00
Nick Sweeting e87f1d57a3 fix linters 2020-08-18 09:22:12 -04:00
Nick Sweeting c9b3bab84d fix pull title not working 2020-08-18 08:49:26 -04:00
Nick Sweeting b0c0a676f8 re-enable readability and singlefile by default now that its less noisy 2020-08-18 08:29:46 -04:00
Nick Sweeting d7d53cfb12 dont show skipped extractors to reduce visual noise 2020-08-18 08:13:35 -04:00
Nick Sweeting 92de20af15 better detect missing dependencies on startup 2020-08-18 04:38:13 -04:00
Nick Sweeting b681a477ae add overwrite flag to add command to force re-archiving 2020-08-18 04:37:54 -04:00
Cristian 05c71fc302 fix: Organize readability extractor so a timeout does not break the whole process 2020-08-17 08:34:40 -05:00
Nick Sweeting 58e928520a tweak log output for skipped methods 2020-08-14 13:12:50 -04:00
Nick Sweeting 03b73bfe77 Update archivebox/extractors/readability.py 2020-08-14 12:55:22 -04:00
Cristian b7aa3df8d2 feat: Disable singlefile and readability by default 2020-08-12 14:42:21 -05:00
Cristian 5dc7e63792 feat: Update dockerfile to support readability 2020-08-11 11:52:43 -05:00
Cristian 2a68af1b94 tests: Add readability tests 2020-08-11 11:15:15 -05:00
Cristian 8aa7b34de7 tests: Add readability to ignored methods in tests 2020-08-11 08:58:49 -05:00
Cristian dc87d8b68c tests: Update failing tests 2020-08-11 08:48:13 -05:00
Cristian 0ec747f64e feat: Look in wget, singlefile or dom outputs before attempting to download the information again 2020-08-11 08:37:12 -05:00
Cristian a14762640e feat: Avoid running readability when the target is a file 2020-08-11 08:37:12 -05:00
Cristian 61e08a7c43 docs: Update docs link 2020-08-11 08:37:12 -05:00
Cristian b33c66a9f7 feat: Split output of readability into multiple files 2020-08-11 08:37:12 -05:00
Cristian 7e2b249388 feat: Initial version of readability extractor 2020-08-11 08:37:12 -05:00
Nick Sweeting 430be7bc68 add missing staticfile check to singlefile 2020-08-10 13:42:20 -04:00
Cristian 06d0e9de6c feat: Add support for singlefile in docker 2020-08-03 13:23:05 -05:00
Nick Sweeting 5b6eb5e4ad make filenames consistent with program name 2020-08-03 13:23:05 -05:00
Cristian 42b0c80465 feat: Add singlefile to link_details 2020-08-03 13:22:06 -05:00
Cristian 787a5ad43e fix: Commit code review suggestions 2020-08-03 13:22:06 -05:00
Cristian 853685668c feat: Add initial support for singlefile extractor 2020-08-03 13:22:06 -05:00
Cristian e6c571beb2 fix: Remove title from extractors for oneshot 2020-07-31 10:24:58 -05:00
Cristian 8bcb171e74 fix: Remove support for multiple urls in oneshot command 2020-07-31 09:05:40 -05:00
Cristian 3afb2401bc fix: Add condition to avoid breaking the add command 2020-07-29 11:53:49 -05:00
Cristian c073ea141d feat: Initial oneshot command proposal 2020-07-29 11:19:06 -05:00
Nick Sweeting 2e0b751376 accept methods argument to filter archive_link 2020-07-28 05:58:38 -04:00
Nick Sweeting 032c2458de add missing setup_django import 2020-07-28 05:58:13 -04:00
Nick Sweeting 55a237a435 also set snapshot title inside of fetch_title directly 2020-07-28 05:56:34 -04:00
Nick Sweeting 273059f054 accept gzipped responses when using curl 2020-07-28 05:55:54 -04:00
Nick Sweeting af9084ee95 update Snapshot.title to latest_title after fetching 2020-07-28 05:55:09 -04:00
Nick Sweeting 943453a9a8 pass overwrite properly 2020-07-28 05:54:42 -04:00
Cristian a5550b2105 fix: Rename logging folder to avoid naming conflicts (and circular import issues) 2020-07-22 11:02:13 -05:00
Nick Sweeting 0965031d8f fix archive_org header rename 2020-07-22 01:46:38 -04:00
Cristian f4d1b5121e refactor: Move logging.py to main module to avoid circular import issues 2020-07-17 18:00:04 -05:00
Cristian 23e6803f02 fix: Add change to calculate wget folder when there is a port present 2020-07-17 16:55:56 -05:00
Nick Sweeting ae208435c9 fix the add links form 2020-07-13 12:21:37 -04:00
Nick Sweeting 215d5eae32 normal git clone instead of mirror 2020-07-13 11:41:37 -04:00
Nick Sweeting b4ce20cbe5 write link details json before and after archiving 2020-07-13 11:41:27 -04:00
Nick Sweeting d3bfa98a91 fix depth flag and tweak logging 2020-07-13 11:26:34 -04:00
Nick Sweeting df593dea0a fix missing imports 2020-06-30 05:55:34 -04:00
Nick Sweeting 602e141f08 fix config file atomic writing bugs 2020-06-30 02:04:16 -04:00
Nick Sweeting 79b19ddf35 use atomic writes for config file writing as well 2020-06-30 01:12:06 -04:00
Nick Sweeting 5c2bbe7efe bugfixes 2020-06-25 22:14:40 -04:00
Nick Sweeting cb67b09f9d Merge branch 'master' into django 2020-06-25 21:30:29 -04:00
Nick Sweeting 43c471e4af cli experience improvements 2020-06-25 17:47:55 -04:00
Nick Sweeting 2829b18b0b new save playlists option 2020-04-22 21:14:20 -04:00
Nick Sweeting 95007d9137 split up utils into separate files 2019-04-30 23:13:04 -04:00
Nick Sweeting 1b8abc0961 move everything out of legacy folder 2019-04-27 17:26:24 -04:00