
more troubleshooting help

This commit is contained in:
Nick Sweeting 2017-07-02 12:02:29 -05:00 committed by GitHub
parent 9f5fe726dd
commit 1431cc3290


@@ -253,6 +253,10 @@ env CHROME_BINARY=/path/from/step/1/chromium-browser ./archive.py bookmarks_expo
If you're missing `wget` or `curl`, simply install them using `apt` or your package manager of choice.
See the "Manual Setup" instructions for more details.
If wget times out or randomly fails to download some sites that you have confirmed are online,
upgrade wget to the most recent version with `brew upgrade wget` or `apt upgrade wget`. Versions
`<=1.19.1_1` have a bug that causes wget to fail on perfectly valid sites.
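To check whether your installed wget falls in the affected range before upgrading, a small sketch like the following can help (the `version_lte` helper is illustrative, not part of the archiver):

```shell
# version_lte A B -- succeeds if version A <= version B (uses GNU sort -V)
version_lte() { [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]; }

# Detect the installed wget version (empty if wget is missing)
ver=$(wget --version 2>/dev/null | head -n1 | awk '{print $3}')

if [ -n "$ver" ] && version_lte "$ver" "1.19.1"; then
  echo "wget $ver is affected by the download bug -- upgrade it:"
  echo "  brew upgrade wget                      # macOS"
  echo "  sudo apt install --only-upgrade wget   # Debian/Ubuntu"
fi
```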
### Archiving
**No links parsed from export file:**
@@ -285,6 +289,10 @@ If you're having issues trying to host the archive via nginx, make sure you alre
If you don't, there are plenty of tutorials online to help you get that set up. Open an [issue](https://github.com/pirate/bookmark-archiver/issues)
if you have problems with a particular nginx config.
If you're getting many 404s when visiting links from the index, this is caused by `wget` appending `.html`
to all downloaded content that doesn't already end in it. The index links will be corrected to
account for this soon, but in the meantime use the nginx config above, which automatically appends `.html` to links before 404-ing.
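The nginx config referenced above isn't reproduced in this hunk; a minimal sketch of the relevant idea is a `location` block whose `try_files` falls back to the wget-style `.html` variant before returning a 404 (the paths here are hypothetical placeholders):

```nginx
# Hypothetical server block -- adjust root to wherever your archive output lives
server {
    listen 80;
    root /var/www/bookmark-archiver;

    location / {
        # Try the exact path first, then the ".html" variant wget produced,
        # then a directory index, and only 404 if none of those exist
        try_files $uri $uri.html $uri/ =404;
    }
}
```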
## TODO
- body text extraction using [fathom](https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/)