mirror of synced 2024-06-12 07:25:01 +12:00
37 commits

Author SHA1 Message Date
Nick Sweeting c5fc3e1e65 --ammend 2022-05-09 23:59:27 -07:00
Nick Sweeting a6767671fb append content of referenced files to imports 2022-05-09 21:21:39 -07:00
Nick Sweeting f6d6a06c78 always show all totals in log output 2022-05-09 21:21:26 -07:00
Nick Sweeting 38e54b93fe allow parsing to continue even when fetching URL contents fails 2022-05-09 19:56:24 -07:00
Nick Sweeting acd53c854d handle new wallabag export format with newlines mid-tag attributes 2022-05-09 19:07:48 -07:00
Nick Sweeting 44f5338470
fix typo in pocket_api articl variable name 2021-11-12 19:23:47 -05:00
Bruno Tavares bb2a2e758a
Avoid KeyError on Pocket API parser
When trying to import my Pocket library I got a lot of `KeyError`s in Python. The Pocket API has a few idiosyncrasies, such as sometimes returning certain keys in the JSON and sometimes not.

```sh
archivebox add --parser pocket_api pocket://my_username
```

gave me these errors:
```
  File "/app/archivebox/parsers/pocket_api.py", line 54, in link_from_article
    title = article['resolved_title'] or article['given_title'] or url
KeyError: 'resolved_title'
```

This commit contains the changes I made to import my library successfully.
2021-09-07 21:53:36 -03:00
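The failure in the commit above comes from indexing keys that the Pocket API sometimes omits. A minimal sketch of the defensive pattern, assuming an article is a plain `dict`; the helper name `pick_title` and the sample payloads are hypothetical and are not taken from the actual patch:

```python
def pick_title(article: dict, url: str) -> str:
    """Return the first non-empty title, falling back to the URL."""
    # dict.get() returns None for a missing key instead of raising KeyError,
    # so the `or` chain keeps working when the API omits a field.
    return article.get('resolved_title') or article.get('given_title') or url


# Hypothetical payloads: one with a title, one missing both title keys.
print(pick_title({'resolved_title': 'Example'}, 'https://example.com'))
print(pick_title({}, 'https://example.com'))
```

The `or` chain also skips keys that are present but empty, which matches the fallback order in the traceback line quoted in the commit message.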
Ross Williams f6cf35a45d Fix Pinboard RSS parsing valid links as None
`item.find(p)` returns either an `ElementTree.Element` or `None`.  The
[lambda on line 24][lambda] coerces the return value to a bool, which is
`False` if the `<link>` element has no children (see
[`ElementTree.py` line 207][etbooldef]), so the lambda returns `None`.

Further, returning a `Link` with `url=None` violates
[an assertion in `index/schema.py`][assertion], which crashes
the `archivebox add` command.

[lambda]: 3d54b1321b/archivebox/parsers/pinboard_rss.py (L24)
[etbooldef]: 3d8993a744/Lib/xml/etree/ElementTree.py (L207)
[assertion]: 3d54b1321b/archivebox/index/schema.py (L165)
2021-08-04 10:13:37 -04:00
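The pitfall described in the commit above can be avoided by comparing the result of `find()` against `None` explicitly. A minimal sketch, assuming a simplified RSS `<item>`; the helper name `get_text` and the sample XML are hypothetical and are not the code from the actual fix:

```python
from typing import Optional
from xml.etree import ElementTree


def get_text(item: ElementTree.Element, path: str) -> Optional[str]:
    """Return the text of the first matching child, or None if absent."""
    found = item.find(path)
    # Compare against None explicitly. As the commit message explains, a
    # truth test on an Element depends on its number of children, so a leaf
    # element such as <link>...</link> would be treated as missing.
    return found.text if found is not None else None


# Hypothetical, simplified feed item for illustration.
item = ElementTree.fromstring(
    '<item><title>Example</title><link>https://example.com</link></item>'
)
link = item.find('link')
print(len(link))                  # child count of <link>
print(get_text(item, 'link'))     # text of an existing leaf element
print(get_text(item, 'missing'))  # element that does not exist
```

An explicit `is not None` test separates "element not found" from "element found but has no children", which a bare truth test conflates.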
Nick Sweeting a9986f1f05 add timezone support, tons of CSS and layout improvements, more detailed snapshot admin form info, ability to sort by recently updated, better grid view styling, better table layouts, better dark mode support 2021-04-10 04:21:36 -04:00
Nick Sweeting f59b6d4189 only add url-list lines that are real urls 2021-04-01 14:00:07 -04:00
Nick Sweeting 5d3a03b299 use stderr and hint in case of parser returning no urls instead of bare exception 2021-03-31 01:39:01 -04:00
Nick Sweeting 8ce93ff787 use KEY, NAME, and PARSER to define parsers instead of hardcoding in init 2021-03-31 01:05:49 -04:00
Nick Sweeting 36f0646501
Merge pull request #669 from FliegendeWurst/fix-issue-235
add command: --parser option (fixes #235)
2021-03-31 00:53:47 -04:00
FliegendeWurst 60bd9a902e add command: --parser option 2021-03-28 10:09:11 +02:00
Nick Sweeting 5fb9ca389f check more url parsing invariants on startup 2021-03-27 03:57:22 -04:00
Nick Sweeting d6de04a83a fix lgtm errors 2021-01-30 06:07:35 -05:00
Nick Sweeting a0a79cead8 move utils and vendored libs into subfolders 2020-12-06 02:01:18 +02:00
mAAdhaTTah ac7ad9e942
Add parser for Pocket API
Pass a URL like `pocket://Username` to import that username's archived Pocket
library. Tokens need to be stored in ArchiveBox.conf with the following keys:

```
POCKET_CONSUMER_KEY = key-from-custom-pocket-app
POCKET_ACCESS_TOKENS = {"YourUsername": "pocket-token-for-app"}
```

`POCKET_ACCESS_TOKENS` MUST be on a single line, or the JSON will be
misinterpreted by the parser as a new key/value pair.
2020-12-04 22:54:39 -05:00
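To illustrate the single-line requirement from the commit above, here is a rough sketch of reading such a value and decoding it as JSON. It uses Python's standard `configparser` as a stand-in, which is not necessarily how ArchiveBox loads its config; the section name `EXAMPLE_SECTION` is an assumption made only for this example:

```python
import configparser
import json

# Hypothetical config text; the section name is assumed for this sketch.
CONFIG_TEXT = """
[EXAMPLE_SECTION]
POCKET_CONSUMER_KEY = key-from-custom-pocket-app
POCKET_ACCESS_TOKENS = {"YourUsername": "pocket-token-for-app"}
"""

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep key names case-sensitive
parser.read_string(CONFIG_TEXT)

# The whole JSON object sits on one line, so it arrives as a single value.
raw = parser['EXAMPLE_SECTION']['POCKET_ACCESS_TOKENS']
tokens = json.loads(raw)
print(tokens['YourUsername'])
```

Keeping the JSON on one line means the value reaches `json.loads` intact, with no chance of a later line being read as a separate key.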
Emmanuel Hainry aebc83659d Add parser for Wallabag Atom feeds 2020-10-18 11:20:07 +02:00
Angel Rey 2c62abb270 Replaced os.path in init parsers 2020-10-02 15:46:39 -05:00
apkallum 594d9e49ce first attempt to migrate to Pathlib 2020-09-17 09:09:52 -05:00
Nick Sweeting 61ab952dab fix parser docstring 2020-08-18 09:20:05 -04:00
Nick Sweeting 15efb2d5ed new generic_html parser for extracting hrefs 2020-08-18 08:29:05 -04:00
Nick Sweeting a682a9c478 make all parsers accept arbitrary meta kwargs 2020-08-18 08:27:47 -04:00
Nick Sweeting 2e2b4f8150 fix url is too long to be a path error 2020-08-18 08:23:57 -04:00
Nick Sweeting e3ac4c2405 htmldecode downloaded sources before parsing for links 2020-08-18 08:23:20 -04:00
Cristian c073ea141d feat: Initial oneshot command proposal 2020-07-29 11:19:06 -05:00
Nick Sweeting 3fe7a9b70c also parse and archive sub-urls in generic_txt input 2020-07-27 18:52:57 -04:00
Cristian 6006b4f93b refactor: Organize code to remove flake8 issues 2020-07-24 12:25:25 -05:00
Cristian a5550b2105 fix: Rename logging folder to avoid naming conflicts (and circular import issues) 2020-07-22 11:02:13 -05:00
Cristian f4d1b5121e refactor: Move logging.py to main module to avoid circular import issues 2020-07-17 18:00:04 -05:00
Nick Sweeting d3bfa98a91 fix depth flag and tweak logging 2020-07-13 11:26:34 -04:00
Nick Sweeting 96b1e4a8ec accept local paths as valid link URLs when parsing 2020-07-13 11:22:58 -04:00
Nick Sweeting cb67b09f9d Merge branch 'master' into django 2020-06-25 21:30:29 -04:00
Nick Sweeting 204de37eb9 fix parsing errors for older archive index formats 2019-05-01 02:28:48 -04:00
Nick Sweeting 95007d9137 split up utils into separate files 2019-04-30 23:13:04 -04:00
Nick Sweeting 1b8abc0961 move everything out of legacy folder 2019-04-27 17:26:24 -04:00