1
0
Fork 0
mirror of synced 2024-06-25 01:20:30 +12:00

better title regex to match titles surrounded by newlines

This commit is contained in:
Nick Sweeting 2019-03-19 18:09:06 -04:00
parent 1b5201fd58
commit 914750c453

View file

@ -66,9 +66,9 @@ URL_REGEX = re.compile(
re.IGNORECASE,
)
HTML_TITLE_REGEX = re.compile(
r'<title>' # start matching text after <title> tag
r'<title.*?>' # start matching text after <title> tag
r'(.[^<>]+)', # get everything up to these symbols
re.IGNORECASE,
re.IGNORECASE | re.MULTILINE | re.DOTALL | re.UNICODE,
)
### Checks & Tests