mirror of synced 2024-07-03 21:40:51 +12:00

1246 commits

Author SHA1 Message Date
Nick Sweeting b6d7c74680 speed up the Snapshot handling view and show index page when extractor output is missing or multiple snapshots returned 2021-02-15 20:52:08 -05:00
Nick Sweeting 0375853683 log error tracebacks to logs/errors.log file and filter noisy 404s and 200s from log output 2021-02-15 20:51:23 -05:00
Nick Sweeting 0ec9bfb971 fix dead missing template variables 2021-02-15 20:50:12 -05:00
Nick Sweeting b3a50a2c10 fix server quick-init param not being passed properly to subcommand 2021-02-15 20:49:40 -05:00
Nick Sweeting b06e256ad9 fix add command not updating snapshot detail index pages when passed index-only and overwrite flags together 2021-02-15 20:49:23 -05:00
Nick Sweeting 8e98cef7ad fix after and before args flipped when filtering 2021-02-15 20:48:51 -05:00
Nick Sweeting 33d180afe7 allow filtering snapshots by timestamp in list, update, and remove cmds 2021-02-15 20:48:35 -05:00
Nick Sweeting 0c9db1c554 remove symbols from random secret key for easier copy pasting 2021-02-15 20:45:42 -05:00
Nick Sweeting 4faef03ba3 compute snapshot properties directly without loading whole Link 2021-02-15 20:44:08 -05:00
Nick Sweeting 9ce3bd5bdc use index.LINK_FILTERS to validate filter-type args instead of hardcoding them twice 2021-02-15 20:43:36 -05:00
Nick Sweeting c28ad8bd1b fix AddLinkForm widget complaining about missing template var class 2021-02-15 20:42:59 -05:00
Nick Sweeting 78463c243a remove unused GIT_SHA config option 2021-02-15 20:42:33 -05:00
Nick Sweeting 9cd4ba38f0 add new SNAPSHOTS_PER_PAGE pagination limit config 2021-02-15 20:42:00 -05:00
Nick Sweeting 00ae1f15a7 ignore shm db file and config files in archivebox data dir on init 2021-02-15 14:52:37 -05:00
Nick Sweeting 3c3bae02d2 add quick-init option to skip reimporting all snapshot dirs on init 2021-02-15 14:52:10 -05:00
Nick Sweeting e61e12c889 use setup.py to determine dependencies in Dockerfile instead of egg-info requires.txt 2021-02-15 14:51:32 -05:00
Nick Sweeting 0407d03b6b add cli tests file back 2021-02-15 13:39:49 -05:00
Nick Sweeting 611216765d switch sqlite to use WAL mode by default to prevent database locked errors 2021-02-15 13:39:03 -05:00
Nick Sweeting 683a08772b change wording of db not found error 2021-02-08 23:27:46 -05:00
Nick Sweeting 6705354e57 fix assertion 2021-02-08 23:24:48 -05:00
Nick Sweeting a49884ade8 fix emptystrings in cmd_version causing exception 2021-02-08 23:22:02 -05:00
Nick Sweeting 171bbeb69b catch exception on import of old index.json into ArchiveResult 2021-02-01 16:31:29 -05:00
Nick Sweeting 0aea5ed3e8 fix handling of skipped ArchiveResult entries with null output 2021-02-01 14:37:34 -05:00
Nick Sweeting c4b02be24d remove dead tests code 2021-02-01 05:14:43 -05:00
Nick Sweeting 783f597955 minor build fixes 2021-02-01 05:13:46 -05:00
Nick Sweeting aa84a7ff2b fix migration creating conflicting tags based on slug 2021-02-01 05:13:23 -05:00
Nick Sweeting 7d0f5653c3 fix lgtm alerts 2021-02-01 02:27:24 -05:00
Nick Sweeting 04c951cdd5 fix alerts 2021-02-01 02:22:02 -05:00
Nick Sweeting 534ead2440 use the db exclusively for icons instead of hammering filesystem 2021-02-01 02:18:13 -05:00
Nick Sweeting 923f517a8f minor fixes 2021-02-01 02:17:54 -05:00
Nick Sweeting 560d3103a8 cleanup snapshot detail page UI 2021-01-30 22:04:24 -05:00
Nick Sweeting 54c5331693 check for output existence when rendering files icons 2021-01-30 22:04:14 -05:00
Nick Sweeting 15e87353bd only show archive.org if enabled 2021-01-30 22:03:59 -05:00
Nick Sweeting 846c966c4d use globbing to find wget output path 2021-01-30 22:02:39 -05:00
Nick Sweeting e6fa16e13a only chmod wget output if it exists 2021-01-30 22:02:11 -05:00
Nick Sweeting 385daf9af8 save the url as title for staticfiles or non html files 2021-01-30 22:01:49 -05:00
Nick Sweeting 24e24934f7 add headers.json and fix relative singlefile path resolving for sonic 2021-01-30 21:59:34 -05:00
Nick Sweeting c089501073 add response status code to headers.json 2021-01-30 20:44:49 -05:00
Nick Sweeting b9b1c3d9e8 fix singlefile output path not relative 2021-01-30 20:44:49 -05:00
Nick Sweeting d072f1d413 hide ssl warnings when checking SSL is disabled 2021-01-30 20:44:49 -05:00
Nick Sweeting 9d24bfd0dc disable progress bars on mac again 2021-01-30 20:44:49 -05:00
Nick Sweeting 326ce78496 simplify debug 2021-01-30 06:09:26 -05:00
Nick Sweeting d6de04a83a fix lgtm errors 2021-01-30 06:07:35 -05:00
Nick Sweeting cc80ceb0a2 fix icons in public index 2021-01-30 05:47:55 -05:00
Nick Sweeting 1ce0eca217 add trailing slashes to canonical paths 2021-01-30 05:47:55 -05:00
Nick Sweeting 6edae6a17f add future api spec design 2021-01-30 05:47:55 -05:00
Nick Sweeting a98298103d cleanup templates and views 2021-01-30 05:47:55 -05:00
Nick Sweeting ed13ec7655 remove active theme 2021-01-30 05:47:55 -05:00
Nick Sweeting c2aaa41c76 fix missing str path 2021-01-30 01:25:08 -05:00
Nick Sweeting ff7d2ffa09 fix version in legacy footer 2021-01-29 09:18:38 -05:00
Nick Sweeting 6e84890abd improve loading snapshots tooltips 2021-01-29 09:09:23 -05:00
Nick Sweeting 8a4edb45e7 also search url, timestamp, tags on public index 2021-01-29 09:08:03 -05:00
Nick Sweeting f6c3683ab8 fix snapshot favicon loading spinner height 2021-01-29 00:15:32 -05:00
Nick Sweeting 3227f54b52 limit youtubedl download size to 750m and stop splitting out audio files 2021-01-29 00:15:32 -05:00
Nick Sweeting d7df9e58ea hide footer on add page 2021-01-28 23:15:05 -05:00
Nick Sweeting 5c54bcc1f3 fix files icons greying out on public index 2021-01-28 22:57:12 -05:00
Nick Sweeting 7d8fe66d43 consistent tags styling 2021-01-28 22:35:21 -05:00
Nick Sweeting 6a8f6992d8 reuse admin styling for public index and add page 2021-01-28 22:28:10 -05:00
Nick Sweeting f0040580c8 fix files icons escaping 2021-01-28 22:27:17 -05:00
Nick Sweeting 39ec77e46c add createsuperuser flag to server command 2021-01-28 22:27:02 -05:00
Nick Sweeting 4b7550c23f Merge pull request #632 from aggroskater/bugfix/issue-617 2021-01-28 17:03:57 +02:00
Nick Sweeting 15e58bd366 fix using os.path calls on pathlib paths 2021-01-27 11:27:40 -05:00
Preston Maness 1810426774 Remove now-unused mark_safe import 2021-01-25 21:16:06 -06:00
Preston Maness b647581115 Update archivebox/index/html.py
mark_safe is dangerous, as the URL's filename could have malicious HTML fragments in it.

Co-authored-by: Nick Sweeting <git@sweeting.me>
2021-01-25 20:47:57 -06:00
Nick Sweeting 9764a8ed9b check for non html files from wget 2021-01-25 18:15:16 -05:00
Preston Maness 1989275944 Fix issue #617 by using mark_safe in combination with format_html
I have no experience with Django, so all I'm really going off of is this
stackoverflow

https://stackoverflow.com/a/64498319

which cited this bit of Django documentation:

https://docs.djangoproject.com/en/3.1/ref/utils/#django.utils.html.format_html

After using this method, I no longer get the 500 error or KeyError
exception, and can browse the local server and interact with the single
entry in it (the problematic URL in ArchiveBox#617 with curly braces).

Whether this is the "right" method or not, I have no idea. But it is at
least a start.
2021-01-23 20:32:56 -06:00
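The failure mode described in this commit (a `KeyError` when a URL contains curly braces) can be illustrated with a minimal sketch. It uses only the standard library: `str.format` plus `html.escape` stand in for Django's `format_html`, which escapes its arguments and then formats a template. The function name `render_link` and the sample URL are hypothetical and are not taken from the ArchiveBox code.

```python
from html import escape


def render_link(url: str) -> str:
    """Escape the URL, then substitute it into a constant template."""
    # The template is a fixed literal. The URL is only ever passed as an
    # argument, so braces inside it are never parsed as format fields.
    return '<a href="{}">{}</a>'.format(escape(url), escape(url))


url = "https://example.com/page?id={token}"

# Problem pattern: the URL becomes part of the format string itself, so
# "{token}" is treated as a replacement field with no matching argument.
try:
    ('<a href="' + url + '">{}</a>').format("link")
    failed = False
except KeyError:
    failed = True

print(failed)            # whether the concatenated template raised KeyError
print(render_link(url))  # braces survive, HTML-special characters are escaped
```

Keeping the template constant and passing untrusted values as arguments also means the values are escaped, which is the concern raised in the follow-up commit about `mark_safe`.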
Dan Arnfield 5420903102 Refactor should_save_extractor methods to accept overwrite parameter 2021-01-21 15:56:32 -06:00
Nick Sweeting ef7711ffa0 fix cookies file arg is path 2021-01-20 19:13:53 -05:00
Nick Sweeting a07ed3989e fix import path 2021-01-20 19:02:31 -05:00
Nick Sweeting 72e2c7b95d use relative imports for util 2021-01-20 18:44:28 -05:00
Nick Sweeting 02bdb3bdeb fix DATABASE_NAME posixpath 2021-01-20 18:42:12 -05:00
jdcaballerov 14df0cbb7c Update sonic.py
Sonic buffer accepts 20.000 bytes not unicode characters, since the chunking here is on unicode characters, sending 20.000 characters will overflow sonic's buffer.
UTF-8 can take up to 6 bytes, so sending less than (20.000 / 6) rounded minus should be ok.
2021-01-20 14:51:46 -05:00
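The reasoning in this commit body can be sketched as follows: if the buffer limit is counted in bytes but the text is sliced by characters, the slice length has to assume the worst-case bytes per character. The constants mirror the figures quoted in the message. The function name `chunk_text` and the sample text are hypothetical and not taken from `sonic.py`. Note that under the current UTF-8 standard (RFC 3629) a code point takes at most 4 bytes, so dividing by 6 is a conservative bound.

```python
from typing import Iterator

SONIC_BUFFER_BYTES = 20_000  # buffer size quoted in the commit message
MAX_BYTES_PER_CHAR = 6       # worst-case figure used in the commit message


def chunk_text(text: str, max_bytes: int = SONIC_BUFFER_BYTES) -> Iterator[str]:
    """Yield slices of `text` whose UTF-8 encoding fits in `max_bytes`."""
    # Slicing by characters is only safe if every character is assumed
    # to take the worst-case number of bytes.
    chars_per_chunk = max_bytes // MAX_BYTES_PER_CHAR
    for start in range(0, len(text), chars_per_chunk):
        yield text[start:start + chars_per_chunk]


# Mixed sample: two-byte characters followed by one-byte characters.
sample = "é" * 10_000 + "a" * 5_000
chunks = list(chunk_text(sample))

# Every chunk stays under the byte limit even for multi-byte text.
print(max(len(c.encode("utf-8")) for c in chunks) <= SONIC_BUFFER_BYTES)
```

An alternative design would measure each chunk's encoded length directly, which packs chunks more tightly at the cost of extra encoding work. The fixed divisor above is the simpler approach that the commit message describes.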
Dan Arnfield 5c7842ffb3 Fix dependency dict entries 2021-01-20 09:24:34 -06:00
Nick Sweeting a3008c8189 fix migration failing due to null cmd_versions in older archives 2021-01-12 12:56:06 +02:00
Nick Sweeting f2a0068c17 Merge pull request #608 from cdvv7788/extractor-bugs 2021-01-07 16:38:56 +02:00
Cristian 6031ffa3b2 fix: Mercury extractor error was incorrectly initialized 2021-01-07 09:22:46 -05:00
Cristian e9e4adfc34 fix: wget_output_path failing on some extractors. Add a new condition 2021-01-07 09:07:29 -05:00
Cristian 14d1b3209e fix: Make cmd_version nullable 2021-01-06 20:03:40 +02:00
Cristian c21af37ed4 fix: Give cmd_version a default value in case it is not present 2021-01-06 20:03:40 +02:00
Tim Gates 7bf63d91ff docs: fix simple typo, timstamp -> timestamp
There is a small typo in archivebox/index/__init__.py.

Should read `timestamp` rather than `timstamp`.
2021-01-06 20:03:40 +02:00
Nick Sweeting 9784dcb816 better config comments and docstrings 2020-12-20 03:11:19 +02:00
Nick Sweeting 72b8119881 Merge pull request #587 from jdcaballerov/move-vendored-as-submodules
Add submodules and links
2020-12-16 09:59:56 -05:00
jdcaballerov a2694a3e8a Add submodules and links 2020-12-16 08:53:59 -05:00
jdcaballerov c29ce7e7f0 Add border for card select 2020-12-14 16:00:59 -05:00
jdcaballerov 7b66e1514d Merge branch 'v0.5.0' of github.com:ArchiveBox/ArchiveBox into feat-snapshots-grid 2020-12-14 15:05:19 -05:00
jdcaballerov 243fcccd89 Allow actions on grid view 2020-12-14 15:01:24 -05:00
jdcaballerov 6b5c881555 Fix search to include filters 2020-12-14 13:40:38 -05:00
jdcaballerov 45e97ea278 Reverse test condition to avoid redirects with change details 2020-12-14 13:27:06 -05:00
jdcaballerov d4255be077 use localStorage var 2020-12-14 13:00:13 -05:00
jdcaballerov 8fca36a7cd Restore preferred snapshots view from localstorage 2020-12-14 12:52:15 -05:00
jdcaballerov 7db6b0a8a6 Preserve query string between snapshot list views 2020-12-14 12:11:44 -05:00
Nick Sweeting 326fe69eea fix lint error 2020-12-12 12:35:32 -05:00
jdcaballerov 9b6afa36a3 Update archivebox/search/backends/ripgrep.py
Co-authored-by: Nick Sweeting <git@sweeting.me>
2020-12-12 08:36:08 -05:00
jdcaballerov aa53f4f088 Update archivebox/search/backends/ripgrep.py
Co-authored-by: Nick Sweeting <git@sweeting.me>
2020-12-12 08:36:01 -05:00
jdcaballerov 50df108863 Update archivebox/config.py
Co-authored-by: Nick Sweeting <git@sweeting.me>
2020-12-12 08:34:00 -05:00
jdcaballerov 24d4c44624 Add ripgrep configs 2020-12-12 07:36:31 -05:00
jdcaballerov 254d2502fd Feature implementation 2020-12-11 23:03:46 -05:00
Cristian a57a5b6b83 refactor: call setup_django with the check_db attribute for the commands that actually need the database 2020-12-11 18:02:56 -05:00
Cristian 57d1a3d4e5 refactor: Remove setup_django from html.py 2020-12-11 17:49:16 -05:00
Cristian ce53b0220c refactor: Remove setup_django from index 2020-12-11 17:36:31 -05:00
Cristian e82161a768 refactor: Remove setup_django from search 2020-12-11 16:43:48 -05:00
Cristian a28547cbca refactor: Remove get_empty_snapshot queryset function and generate it directly 2020-12-11 16:27:15 -05:00
Cristian 81d766aba1 refactor: Remove setup_django from title.py 2020-12-11 16:03:50 -05:00
Nick Sweeting 335732649b tweak node dependency version detection order 2020-12-11 21:03:17 +02:00
Nick Sweeting 1c87c27105 patch migration JSONField as well 2020-12-11 20:50:45 +02:00
Nick Sweeting 081d94d799 fallback to old JSONField from lib if django version is old 2020-12-11 20:45:44 +02:00
Nick Sweeting 2db5e51b54 fix windows shutil not able to handle pathlib 2020-12-11 19:33:18 +02:00
Nick Sweeting e90cf05141 fix lint errors 2020-12-11 16:51:11 +02:00
Nick Sweeting 30f8d3f191 show python implementation name and flip version output order for easier reading when wrapped on small screens 2020-12-11 16:21:52 +02:00
Nick Sweeting 6623497f18 fix MERCURY_PATH in version output when missing 2020-12-11 16:21:33 +02:00
Nick Sweeting c084e70ea8 fix TEMPLATES_DIR location 2020-12-11 16:21:09 +02:00
Nick Sweeting 9fa70b3452 add extractors arg to oneshot command and bump version to v0.5.1 2020-12-11 15:48:46 +02:00
Nick Sweeting a194bb6301 Merge pull request #580 from BlipRanger/master 2020-12-10 12:48:30 -05:00
BlipRanger 6f462b45d7 Update archivebox/core/forms.py
Cleaner handling of the ARCHIVE_METHODS values

Co-authored-by: Nick Sweeting <git@sweeting.me>
2020-12-10 12:46:16 -05:00
BlipRanger 35809eab1c Update archivebox/core/views.py
Cleaner handling of the archive methods input

Co-authored-by: Nick Sweeting <git@sweeting.me>
2020-12-10 12:45:30 -05:00
BlipRanger 7ce1f63183 Update archivebox/core/forms.py
Format cleanup

Co-authored-by: Nick Sweeting <git@sweeting.me>
2020-12-10 12:44:38 -05:00
BlipRanger 8b0ff2dfee update instead of append 2020-12-10 11:08:27 -05:00
BlipRanger d9fd1e3811 Add selector for archive modes 2020-12-10 10:51:57 -05:00
Cristian 275ad22db7 refactor: Remove skip_index from archive related functions 2020-12-08 18:42:25 -05:00
Cristian 9745a5ac56 fix: Migrations should be silent when running in setup_django 2020-12-08 18:42:25 -05:00
Cristian 9aac09a5e1 feat: Patch setup_django so we can use an inmemory db in specific commands 2020-12-08 18:42:25 -05:00
Cristian 35a5700c73 fix: Move the setup_django command to a place where we already know what the actual subcommand is 2020-12-08 18:42:25 -05:00
Cristian f6c73f9aeb fix: Issue with oneshot command 2020-12-08 18:42:25 -05:00
Cristian db73d92f83 docs: Update shell message to import models 2020-12-06 12:26:22 -05:00
Cristian 8d22ebf988 feat: Remove walrus operator (we still need to support python3.7) 2020-12-06 12:23:02 -05:00
Nick Sweeting 6ac48d7c35 tweak warning msg 2020-12-06 02:11:36 +02:00
Nick Sweeting a0a79cead8 move utils and vendored libs into subfolders 2020-12-06 02:01:18 +02:00
jdcaballerov 172197ae01 refactor: Remove if LENGTH and use text chunker for every input 2020-12-06 01:14:39 +02:00
jdcaballerov 5a6b814c79 Add exception handling for indexable content reader 2020-12-06 01:14:38 +02:00
JDC 15fbd81480 Change MAX_SONIC_TEXT_LENGTH 2020-12-06 01:14:38 +02:00
JDC db9c2edccc Add log print for url indexing 2020-12-06 01:14:38 +02:00
JDC 0acf479b70 Partition long strings in chunks for sonic 2020-12-06 01:14:38 +02:00
JDC caf4660ac8 Add indexing to update command and utilities 2020-12-06 01:14:37 +02:00
JDC 273c9d91c6 Add tag filter to update command 2020-12-06 01:13:39 +02:00
JDC 7903db6dfb Add ArchiveResult Manager and sorted indexable filter 2020-12-06 01:13:39 +02:00
JDC 23a9beb4e0 Add ignored extensions in ripgrep search 2020-12-06 01:13:39 +02:00
JDC 95382b3812 Add ripgrep rg search backend and set as default 2020-12-06 01:13:39 +02:00
JDC 8484bdb973 Fix add search filter to update 2020-12-06 01:13:39 +02:00
JDC c5b1b91708 fix: flush_search_index must be called before removing snapshots 2020-12-06 01:13:39 +02:00
JDC 70cc0c1950 Add search filter-type 2020-12-06 01:13:39 +02:00
JDC 4eeedae815 Exception handling for indexing and searching 2020-12-06 01:13:39 +02:00
JDC 0ed53cc117 Add search filter type for update 2020-12-06 01:13:39 +02:00
JDC 0f7dba07df feat: add search filter-type to list command 2020-12-06 01:13:37 +02:00
JDC fb67d6684c fix: Return empty QuerySet instead of list 2020-12-06 01:12:47 +02:00
JDC 823df34080 Use QuerySets for search backend API instead of pks 2020-12-06 01:12:47 +02:00
JDC f383648ffc Use a generator for snapshot flush from index 2020-12-06 01:12:47 +02:00
JDC 47daa038eb Implement flush for search backend after remove command 2020-12-06 01:12:47 +02:00
JDC c2c01af3ad Add config for search backend 2020-12-06 01:12:47 +02:00
JDC 5f6673c72c Implement backend architecture for search engines 2020-12-06 01:12:46 +02:00
JDC b1f70b2197 Initial implementation 2020-12-06 01:12:45 +02:00
Nick Sweeting 7bc13204e6 Merge branch 'master' into v0.5.0 2020-12-05 17:45:16 -05:00
Nick Sweeting 3b280e6b02 Merge pull request #569 from cdvv7788/extract-command-update
feat: Add --extract flag to update command
2020-12-05 17:43:28 -05:00
Cristian 35389608d1 feat: Add --extract flag to update command 2020-12-05 12:20:47 -05:00
Cristian 34cad4fe8d fix: Update function with --index-only flag was not behaving as expected 2020-12-05 12:10:17 -05:00
mAAdhaTTah ac7ad9e942 Add parser for Pocket API
Pass a url like `pocket://Username` to import that username's archived Pocket
library. Tokens need to be stored in ArchiveBox.conf with the following keys:

```
POCKET_CONSUMER_KEY = key-from-custom-pocket-app
POCKET_ACCESS_TOKENS = {"YourUsername": "pocket-token-for-app"}
```

`POCKET_ACCESS_TOKENS` MUST be on a single line, or the JSON will be
misinterpreted by the parser as a new key/value pair.
2020-12-04 22:54:39 -05:00
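To make the single-line requirement concrete, here is a minimal sketch of reading these two keys with Python's standard `configparser` and decoding the token map with `json`. It assumes an INI-style file. The section name `EXAMPLE_SECTION` is hypothetical, and the code is not the parser ArchiveBox itself uses.

```python
import configparser
import json

# Hypothetical config contents; only the two keys come from the commit message.
CONFIG_TEXT = """
[EXAMPLE_SECTION]
POCKET_CONSUMER_KEY = key-from-custom-pocket-app
POCKET_ACCESS_TOKENS = {"YourUsername": "pocket-token-for-app"}
"""

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep key names case-sensitive
parser.read_string(CONFIG_TEXT)

section = parser["EXAMPLE_SECTION"]

# The whole JSON object sits on one line, so it arrives as a single value
# that json.loads can decode into a username -> token mapping.
tokens = json.loads(section["POCKET_ACCESS_TOKENS"])
print(tokens["YourUsername"])
```

Because INI-style parsers split on `=` and `:`, a JSON object broken across unindented lines would no longer arrive as one value, which is consistent with the warning in the commit message.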
Nick Sweeting 00dfe2d449 Merge branch 'v0.5.0' into cleanup 2020-12-04 20:40:24 -05:00
jdcaballerov 54b25d9a81 Linting 2020-12-03 15:59:45 -05:00
jdcaballerov d4bca80b50 Use uppercase for constants 2020-12-03 15:44:59 -05:00
jdcaballerov c8d8346e4d Remove duplicate context after rebase 2020-12-03 15:44:21 -05:00
jdcaballerov a1fba9887d Remove write_html_main_index 2020-12-03 09:25:38 -05:00
jdcaballerov 7f39702bd2 Delete legacy/ folder 2020-12-03 09:17:34 -05:00
jdcaballerov 367b12ba40 Replace legacy templates for django templates 2020-12-03 09:16:18 -05:00
jdcaballerov 8ac7a760c9 Fix num_links missing in public index 2020-12-03 08:32:49 -05:00
jdcaballerov 69897f6121 Hotfix public page search
No ordering causes warning and fallback to default unfiltered QuerySet
2020-12-03 08:32:49 -05:00
Hawken Rives 7299b1f5ae fix "inconsisntencies" typo in error message 2020-12-02 16:28:26 -06:00
Nick Sweeting 193dde03f0 Merge pull request #559 from jdcaballerov/hotfix-public-search
Hotfix public page search
2020-12-01 10:56:32 -05:00
jdcaballerov 4d972571d0 Hotfix public page search
No ordering causes warning and fallback to default unfiltered QuerySet
2020-12-01 10:46:11 -05:00
Cristian 7008f9b735 feat: move import 2020-11-28 13:11:15 -05:00
Cristian 648b4c8aab feat: Remove unused function write_json_main_index 2020-11-28 13:02:39 -05:00
Nick Sweeting a846916b82 remove unused argument 2020-11-28 12:38:15 -05:00
Cristian 10ee6db02f lint: Remove unused variable 2020-11-28 12:35:13 -05:00
Cristian fa5de72f9f refactor: Move indexing logic out of logging module 2020-11-28 12:34:40 -05:00
Nick Sweeting bee1f3e263 fix lint errors 2020-11-28 04:09:59 -05:00
Nick Sweeting 104553489f remove redundant utils file 2020-11-28 02:12:27 -05:00
Nick Sweeting 84507b68b5 add legacy code warning to schema.py 2020-11-28 02:03:40 -05:00
Nick Sweeting 7fdea91311 fix static html num_outputs info 2020-11-28 02:01:53 -05:00
Nick Sweeting fde65c3b7d fix public index missing template context 2020-11-28 01:29:34 -05:00
Nick Sweeting 7d7ce3a790 fix Snapshot count in column header 2020-11-28 01:22:58 -05:00
Nick Sweeting 9fc965d3da remove broken json download link 2020-11-28 01:19:01 -05:00
Nick Sweeting 46a53eafdb simplify history helper 2020-11-28 01:14:45 -05:00
Nick Sweeting c9162a6d09 remove finished/not finished spinners 2020-11-28 01:07:02 -05:00
Nick Sweeting 9661c863b3 css style tweaks for icons 2020-11-28 01:06:23 -05:00
Nick Sweeting 910f3d65c7 default function args can never be mutable 2020-11-28 01:06:11 -05:00
Nick Sweeting 411fdcac87 use database for num_outputs instead of legacy json 2020-11-28 01:05:53 -05:00
Nick Sweeting 7f2c834ea3 fix check_data_folder mypy types 2020-11-28 01:05:35 -05:00
Nick Sweeting 1b22f8eeef Merge pull request #515 from cdvv7788/POC-setup-django-on-init 2020-11-27 23:56:37 -05:00
Nick Sweeting 00bb55203e always show WARC icon with opacity set based on exists 2020-11-27 23:45:49 -05:00
Nick Sweeting efe3027797 Merge branch 'master' into archive-result 2020-11-27 23:18:11 -05:00
Nick Sweeting e4d2ac432d improve OS kernel output in archivebox version 2020-11-27 23:08:23 -05:00
Nick Sweeting 07a56f9d46 also print platform and CPU info in version output 2020-11-27 22:59:18 -05:00
Nick Sweeting d9ef3d0bf8 ignore lost+found dir in data folder 2020-11-27 19:39:19 -05:00
Cristian 4b3f72202b feat: Bump django, update migration and change cmd to use JSONField 2020-11-27 16:23:27 -05:00
Cristian f61e6a74bb feat: Re-add unused icons in list view 2020-11-27 15:55:37 -05:00
Nick Sweeting f84f288bef Apply suggestions from code review
minor nit
2020-11-27 00:01:34 -05:00
Nick Sweeting 5e7c2d0ab8 show archivebox and node versions in version cmd output 2020-11-23 20:24:44 -05:00
mAAdhaTTah be7a7f8548 Fix string checks in schedule
`s` comes through as a `PosixPath`, so both the `' ' in s` & return value, later
used by `join`, complain.
2020-11-23 18:34:07 -05:00
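The two complaints named in this commit body can be reproduced with the standard library alone. The sketch below shows that a path object supports neither `in` membership tests nor `str.join`, and that converting to `str` first avoids both. The helper name `quote_if_needed` and the sample arguments are hypothetical and not taken from the schedule code.

```python
from pathlib import PurePosixPath


def quote_if_needed(arg) -> str:
    """Return `arg` as a string, wrapped in quotes when it contains a space."""
    s = str(arg)  # convert first: path objects do not support `in` or join
    return f'"{s}"' if ' ' in s else s


path_arg = PurePosixPath('/data/my archive')

# A membership test directly on the path object raises TypeError,
# because path objects are not iterable containers of characters.
try:
    ' ' in path_arg
    membership_ok = True
except TypeError:
    membership_ok = False

# After conversion, both the space check and the join work as expected.
command = ' '.join(quote_if_needed(a) for a in ['echo', path_arg])
print(membership_ok, command)
```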
Cristian 34a1a6d30d fix: Update model according to code review 2020-11-23 18:28:43 -05:00
Nick Sweeting 02551c0152 minor packaging fixes and bump to 0.4.21 2020-11-23 17:28:45 -05:00
Nick Sweeting 83693a5c03 add packaging setup with stdeb for debian and apt
vendor the base32_crockford lib
add build script for debian packages
2020-11-23 16:57:05 -05:00
Nick Sweeting 0e2ccbc10d update urls to new repo path 2020-11-23 02:06:46 -05:00
Nick Sweeting b11d562445 fix splitting on multiple equals in val 2020-11-22 12:33:15 -05:00
Nick Sweeting afe9319c25 Merge pull request #537 from TrAyZeN/master 2020-11-18 23:20:41 -05:00
Nick Sweeting d32b27abcb Merge pull request #540 from jdcaballerov/hotfix-search-fields
hotfix: Fixes 500 error on Admin search
2020-11-17 10:36:18 -05:00
JDC 8b0250caeb Fixes 500 error on search
The class SnapshotAdmin search_fields includes the
tags ManyToMany field causing a
django.core.exceptions.FieldError: Related Field got invalid lookup: icontains
error.
A related search field tags__name should be used.
2020-11-17 08:36:03 -05:00
TrAyZeN 88cc75a045 Change opacity of nonexistent archive type on public view 2020-11-14 17:48:29 +01:00
TrAyZeN a05485f85c Fix file icons order 2020-11-14 17:44:06 +01:00
Nick Sweeting fdd4effc92 Merge pull request #535 from cdvv7788/extractors-flag 2020-11-13 14:53:17 -05:00
Nick Sweeting 257d3f2a98 Update archivebox/cli/archivebox_add.py 2020-11-13 14:52:21 -05:00
JDC d54c3eec9d Add tag filter argument to remove command 2020-11-13 14:16:48 -05:00
Cristian 54df0a035b fix: Move csv split to the add function to avoid optional nullable argument 2020-11-13 13:10:17 -05:00
Cristian 1ec8276514 fix: Use a comma separated input instead of nargs for the extract flag 2020-11-13 13:01:11 -05:00
JDC cbb3d04c12 Allow list filtering by tag name 2020-11-13 12:06:12 -05:00
Cristian db523c9d82 fix: Avoid mutable default input argument 2020-11-13 11:41:50 -05:00
Cristian 44eede96e5 feat: Add extract flag to add command 2020-11-13 09:24:34 -05:00
Nick Sweeting 4372cb6eec stop execution entirely when atomic_write is unsupported 2020-11-12 14:55:21 -05:00
Nick Sweeting 3f160eab8e correctly handle WGET_AUTO_COMPRESSION failing when wget is missing 2020-11-12 14:28:43 -05:00
Cristian 0f13087a09 refactor: Remove unneeded prefetch related 2020-11-12 13:58:13 -05:00
Cristian c565fad75c feat: Use prefetch related to reduce the number of queries to the database on public index view 2020-11-12 11:37:56 -05:00
Cristian 8cfad64271 feat: Add specific logic for archive_org icon 2020-11-12 11:09:34 -05:00
Cristian e594e6a75a feat: WARC link points to the first warc result in target path 2020-11-12 10:57:31 -05:00
Cristian b237e412df feat: Finish reversal. Add ArchiveResults that are not found in the index.json 2020-11-12 10:30:41 -05:00
Cristian f7f0bebdcc feat: Modify migration reverse function to restore index (WIP) 2020-11-11 15:26:54 -05:00
Cristian 508a0bb06e refactor: Unpack extractors tuple instead of using the index to access the relevant information 2020-11-10 12:38:29 -05:00
Nick Sweeting fbd9a7caa6 add explicit error when FSYNC is not supported on filesystem 2020-11-10 01:07:56 -05:00
Cristian 71655220ad feat: Add warc to list and limit check to succeeded archive results 2020-11-05 07:54:40 -05:00
Cristian 33182fd53c fix: Add missing assignation 2020-11-04 15:07:45 -05:00
Cristian d064a3eeff fix: Handle case when update tries to re-add a link that is not in the sql index 2020-11-04 15:02:54 -05:00
Cristian f292cface2 fix: Add condition for oneshot when archiving links 2020-11-04 14:40:44 -05:00
Cristian 4484491fb7 feat: Create ArchiveResult after finishing an extractor process 2020-11-04 11:22:55 -05:00
Cristian b3e0400bc0 feat: initial functional version with icons calculated based on archive results 2020-11-04 10:31:20 -05:00
Cristian 309a87e8fe feat: Add extractor field to the database 2020-11-04 07:28:02 -05:00
Cristian 8f3c03a0f9 feat: Initial (and naive) ArchiveResult model 2020-11-03 09:54:02 -05:00
Cristian ac0ec160d1 lint: Fix warnings in master branch 2020-11-02 08:51:48 -05:00
Nick Sweeting 7d4738a674 fix intermittent BrokenPipe error on macOS when SHOW_PROGRESS=True 2020-10-31 19:38:54 -04:00
Nick Sweeting 9c6ff5036c add suppress output helper 2020-10-31 19:33:17 -04:00
Nick Sweeting 22fb9c2ad7 tweak icons 2020-10-31 19:32:43 -04:00
Nick Sweeting cafe35c595 show pending in light font 2020-10-31 16:33:31 -04:00
Nick Sweeting 5cae05ae76 tweak tags css and add tags to navbar 2020-10-31 07:57:11 -04:00
Nick Sweeting c47398851b nicer timeout hints 2020-10-31 07:57:11 -04:00
Nick Sweeting 651d6c4447 bold snapshots over 50MB 2020-10-31 07:57:11 -04:00
Nick Sweeting b8bbb75f9c logarithmic progress bars woohoo 2020-10-31 07:57:11 -04:00
Nick Sweeting ac9e0e356d config fixes 2020-10-31 07:57:11 -04:00
Nick Sweeting 79051ca15b new package build 2020-10-31 03:08:41 -04:00
Nick Sweeting 18355dc2c6 clean up config loading in settings and config file layout 2020-10-31 03:08:03 -04:00
Cristian e7e33ea7a5 tests: Add tests for several different ways to extract the title 2020-10-30 08:04:26 -05:00
Nick Sweeting aede134ab3 temporarily disable icon highlighting in favor of performance 2020-10-30 05:12:33 -04:00
Nick Sweeting f727ece7b3 add regex fallback back to title parser 2020-10-30 04:57:31 -04:00
Nick Sweeting 79bef1384e Merge pull request #493 from ttimasdf/feat-ogtitle
Feature: add og:title metadata as alternative title
2020-10-30 04:51:14 -04:00
Nick Sweeting cac3912439 small type fixes 2020-10-30 04:50:14 -04:00
Nick Sweeting 1e5fbf4bd2 Update archivebox/config/__init__.py 2020-10-29 13:46:03 -04:00
Cristian 81dd626b85 fix: CHROME_USER_DATA_DIR was causing an error after the update to posix paths 2020-10-29 11:09:18 -05:00
Cristian a6bee5f111 feat: Move setup_django to an inner module 2020-10-26 08:02:04 -05:00
Cristian e1d0b8bce7 feat: Initialize django at the beginning 2020-10-26 07:45:21 -05:00
Nick Sweeting 5faadee7d1 workaround for mercury version output 2020-10-24 22:59:09 -04:00
Nick Sweeting e727af6f22 allow Path args to get_dir_size and copy_and_overwrite 2020-10-24 22:47:18 -04:00
Cristian f330e6428b lint: Remove unused imports from utils 2020-10-23 06:45:56 -05:00
Cristian f397634dd2 feat: Rename old indexes at the end of init process 2020-10-23 06:45:56 -05:00
Cristian 7fc9b7d456 refactor: Update mentions of the html index in the logs 2020-10-23 06:45:56 -05:00
Cristian 572b46cecf lint: Remove unused imports 2020-10-23 06:45:56 -05:00
Cristian ae1484b8bf feat: Remove index.json and index.html generation from the regular process 2020-10-23 06:45:56 -05:00
Nick Sweeting 494af5f2e1 Merge pull request #507 from ehainry/master
Add parser for Wallabag Atom feeds
2020-10-22 14:04:57 -04:00
Cristian 14f56a868a refactor: Change typing for new stubs 2020-10-22 08:46:16 -05:00
Cristian c12fe0e3d7 feat: Use CURL_ARGS on title extractor 2020-10-22 08:46:16 -05:00
Cristian 563d0f94ec feat: Use CURL_ARGS in favicon extractor 2020-10-22 08:46:16 -05:00
Cristian 2e1cdca789 feat: Use CURL_ARGS on header extractor 2020-10-22 08:46:16 -05:00
Cristian 972d57bd08 feat: Add CURL_ARGS to control curl arguments 2020-10-22 08:46:16 -05:00
Cristian 24e7a74855 feat: Add WGET_ARGS to control wget arguments 2020-10-22 08:46:16 -05:00
Cristian 65530e1e5b refactor: Use json.loads instead of split for list arguments 2020-10-22 08:46:16 -05:00
Cristian bc02e0ffe3 feat: Add config for youtubedl (YOUTUBEDL_ARGS) 2020-10-22 08:46:16 -05:00
Cristian Vargas a850b4a9d9 Merge branch 'master' into tags 2020-10-20 08:23:25 -05:00
Emmanuel Hainry aebc83659d Add parser for Wallabag Atom feeds 2020-10-18 11:20:07 +02:00
Cristian 62c78e1d10 refactor: Remove django-taggit and replace it with a local tags setup 2020-10-12 13:47:03 -05:00
Nick Sweeting 6c704fa8cf Merge pull request #498 from adamwolf/bookmarklet
Add a bookmarklet
2020-10-09 21:59:07 -04:00
Cristian 10384a8a6f style: Improve look of tags in admin list 2020-10-07 10:15:56 -05:00
Cristian b9e5b781a7 fix: Avoid creating empty tag on migration 2020-10-07 09:59:49 -05:00
Cristian 62f3d648d4 fix: reverse_func functional 2020-10-07 09:46:10 -05:00
Adam Wolf 8d3295458c Add a bookmarklet
The bookmarklet lets you quickly open the Add page
with the URL already populated in the URLs box.
2020-10-03 14:57:55 -05:00
Angel Rey 3e26ab3ce3 Replaced os.path in cli tests 2020-10-02 15:46:39 -05:00
Angel Rey 01461a98a7 Replaced os.path in logging_util.py 2020-10-02 15:46:39 -05:00
Angel Rey 25ac18c8b7 Replaced os.path in system.py 2020-10-02 15:46:39 -05:00
Angel Rey 16b5ca3207 Replaced os.path in init config 2020-10-02 15:46:39 -05:00
Angel Rey 897bace84d Fixed paths in settings 2020-10-02 15:46:39 -05:00
Angel Rey 0e7c337dcb Replaced os.path in settings.py 2020-10-02 15:46:39 -05:00
Angel Rey ce71747538 replaced os.path in init extractors 2020-10-02 15:46:39 -05:00
Angel Rey fa364ed728 Replaced os.path in init cli 2020-10-02 15:46:39 -05:00
Angel Rey 3fb410a604 Replaced os.path in favicon.py 2020-10-02 15:46:39 -05:00
Angel Rey ad04fb5300 Replaced os.path in init index 2020-10-02 15:46:39 -05:00
Angel Rey 78f7062761 Replaced os.path in html.py 2020-10-02 15:46:39 -05:00
Angel Rey 8b03c37fbb Replaced os.path in json.py 2020-10-02 15:46:39 -05:00
Angel Rey 9264ad88e0 Fixed string casting 2020-10-02 15:46:39 -05:00
Angel Rey 7d513b9b19 Replaced os.path in schema.py 2020-10-02 15:46:39 -05:00
Angel Rey 2c62abb270 Replaced os.path in init parsers 2020-10-02 15:46:39 -05:00
ttimasdf eda3836dee feat: add og:title metadata as alternative title 2020-09-27 12:54:52 +08:00
Cristian 5975c27a6a fix: Remove trailing slash from public index 2020-09-25 13:48:19 -05:00
Cristian abde871a3c fix: Wget absolute path generating issues 2020-09-25 08:24:06 -05:00
Angel Rey 4581ea956f Fixed empty tags 2020-09-24 15:34:23 -05:00
Angel Rey 533ae7413c Removed comments 2020-09-24 15:34:23 -05:00
Angel Rey e06d3f9128 Fixed Link schema 2020-09-24 15:34:23 -05:00
Angel Rey 45775c607c Fixed empty tags 2020-09-24 15:34:23 -05:00
Angel Rey f26c0c6cd8 Fix serialization 2020-09-24 15:34:23 -05:00
Angel Rey 62c9028212 Improved tags 2020-09-24 15:34:23 -05:00
Cristian 7d3767b882 fix: oneshot command not running extractors 2020-09-24 12:56:16 -05:00
Cristian 62ed11a5ca fix: Improve headers handling 2020-09-24 12:55:51 -05:00
Angel Rey a40af98ced removed static file check 2020-09-24 12:55:51 -05:00
Angel Rey f0915a56aa Replaced get method 2020-09-24 12:55:51 -05:00
Cristian e0939d7fe4 fix: Syntax issue on config module 2020-09-24 08:48:58 -05:00
Nick Sweeting a7cd01ad4f Merge pull request #480 from apkallum/master 2020-09-23 17:30:11 -04:00
Nick Sweeting 38c1f96e2c Update archivebox/config/__init__.py 2020-09-23 17:29:57 -04:00
Karim 2b987421fb simpler check for CHROME_USER_DATA_DIR 2020-09-23 17:23:53 -04:00
apkallum 508984c941 fix: ensure chrome data dir is none when appropriate 2020-09-23 13:22:10 -04:00
Angel Rey dc160daba8 Fixed lint 2020-09-23 11:07:00 -05:00
Angel Rey 7fd7dced9a Added curl params 2020-09-23 11:07:00 -05:00
Angel Rey a8a8fd14ac Fixed indent headers.json 2020-09-23 11:07:00 -05:00
Angel Rey 852e3c9cff Added headers extractor 2020-09-23 11:07:00 -05:00
Cristian eb34a6af62 lint: Fix mercury extractor lint issues 2020-09-23 10:35:39 -05:00
Cristian 46b9e3d536 fix: Fix mercury extractor test 2020-09-23 10:34:05 -05:00
ttimasdf 2bf496e7e9 feat: Add mercury-parsed content to summary page 2020-09-22 18:44:12 -05:00
ttimasdf 357b677363 fix: add mercury-parser to extractors list 2020-09-22 18:44:12 -05:00
ttimasdf 706bd895e0 feat: Add mercury-parser 2020-09-22 18:44:12 -05:00
Cristian b18bbf8874 test: Fix tests post-rebase 2020-09-17 09:09:52 -05:00
apkallum 422664079a fix test type casting for folder['path'] 2020-09-17 09:09:52 -05:00
apkallum 0144f19227 fix github action folder listing 2020-09-17 09:09:52 -05:00
apkallum 1aa7bac85b fix oneshot command type signature 2020-09-17 09:09:52 -05:00
apkallum 95157427c2 update stubs file 2020-09-17 09:09:52 -05:00
apkallum 008769d296 add support for Paths in json encoder 2020-09-17 09:09:52 -05:00
apkallum abf68e5437 no home() in Paths 2020-09-17 09:09:52 -05:00
apkallum b99784b919 pathlib with / syntax for config, index 2020-09-17 09:09:52 -05:00
apkallum 594d9e49ce first attempt to migrate to Pathlib 2020-09-17 09:09:52 -05:00
Cristian b2ed96c35a feat: Redirect old add view to the main one 2020-09-17 09:08:20 -05:00
Cristian b3ec170e39 fix: Remove unused imports 2020-09-16 08:50:56 -05:00
Cristian bc116c25f8 refactor: Change View to FormView 2020-09-16 08:50:56 -05:00
apkallum a06bd715a9 remove reference to old home 2020-09-16 08:50:56 -05:00
apkallum 1cdaad00a8 no more oldhome, cbvs uniform across views 2020-09-16 08:50:56 -05:00
apkallum 94a590b31a factor out a base.html template 2020-09-16 08:50:56 -05:00
apkallum 5e8c115f3f unify public archive view 2020-09-16 08:50:56 -05:00
apkallum 3288f8579b add public add view + toggle setting 2020-09-16 08:50:56 -05:00
apkallum 6f7cc2b3ef ensure results have icons 2020-09-16 08:50:56 -05:00
apkallum 3048c0f6dc add icons to new public view 2020-09-16 08:50:56 -05:00
apkallum c50af04cce search view inherits from modified public view 2020-09-16 08:50:56 -05:00
apkallum 948b2469f6 no files count in public view 2020-09-16 08:50:56 -05:00
apkallum 5c4ac3cf3d new public view derived from django 2020-09-16 08:50:56 -05:00
Cristian 50f3f16203 lint: Remove unused import 2020-09-15 08:05:46 -05:00
Cristian 0a83392cbf fix: Replace any typing with Union[Iterable[Link], QuerySet] in archive_links 2020-09-15 08:05:46 -05:00
Cristian 779a446085 feat: Make title and tags editable in admin 2020-09-15 08:05:46 -05:00
Cristian 5348f4735a fix: Change check to avoid issues with empty querysets 2020-09-15 08:05:46 -05:00
Cristian cf18130f85 feat: Add deprecation warning for index.json 2020-09-15 08:05:46 -05:00
Cristian 018bd91745 refactor: Remove get_iter lambda from archive_links 2020-09-15 08:05:46 -05:00
Cristian Vargas 5e9b3099c6 Update fix_duplicate_links_in_index docstring
Co-authored-by: Nick Sweeting <git@sweeting.me>
2020-09-15 08:05:46 -05:00
Cristian 01fb44fd40 refactor: Change archive_links check to focus on queryset, so it allows other iterables and not just lists 2020-09-15 08:05:46 -05:00
Cristian fa622d3e14 refactor: Replace --index with --with-headers in the list command to make it more explicit. Change it so it affects the csv output too. 2020-09-15 08:05:46 -05:00