bulk-downloader-for-reddit/bdfr/site_downloaders/imgur.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
import re
from typing import Optional

from praw.models import Submission

from bdfr.exceptions import SiteDownloaderError
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
from bdfr.site_downloaders.base_downloader import BaseDownloader


class Imgur(BaseDownloader):
    """Downloader for Imgur images and albums, resolved through the Imgur API."""

    def __init__(self, post: Submission):
        super().__init__(post)
        self.raw_data = {}

    def find_resources(self, authenticator: Optional[SiteAuthenticator] = None) -> list[Resource]:
        self.raw_data = self._get_data(self.post.url)

        # Albums carry their media under "images"; a single image is its own entry.
        images = self.raw_data["images"] if "is_album" in self.raw_data else [self.raw_data]
        out = []
        for image in images:
            # Prefer the mp4 rendition when the API response includes one.
            url = image["mp4"] if "mp4" in image else image["link"]
            out.append(Resource(self.post, url, Resource.retry_download(url)))
        return out

    @staticmethod
    def _get_data(link: str) -> dict:
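        # Map the submitted URL onto the matching Imgur API endpoint.
        try:
            # removesuffix() is a no-op when there is no trailing slash.
            link = link.removesuffix("/")
            if re.search(r".*/(.*?)(gallery/|a/)", link):
                # Album or gallery URL: the ID is the path segment after "gallery/" or "a/".
                imgur_id = re.match(r".*/(?:gallery/|a/)(.*?)(?:/.*)?$", link).group(1)
                link = f"https://api.imgur.com/3/album/{imgur_id}"
            else:
                # Single image: take the last path segment, dropping a "_d"
                # thumbnail suffix and any file extension.
                imgur_id = re.match(r".*/(.*?)(?:_d)?(?:\..{0,})?$", link).group(1)
                link = f"https://api.imgur.com/3/image/{imgur_id}"
        except AttributeError:
            # re.match() returned None, so .group() failed: no ID could be found.
            raise SiteDownloaderError(f"Could not extract Imgur ID from {link}")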

        # The API is queried anonymously with a public Client-ID.
        headers = {
            "referer": "https://imgur.com/",
            "origin": "https://imgur.com",
            "content-type": "application/json",
            "Authorization": "Client-ID 546c25a59c58ad7",
        }
        res = Imgur.retrieve_url(link, headers=headers)

        try:
            image_dict = json.loads(res.text)
        except json.JSONDecodeError as e:
            raise SiteDownloaderError(f"Could not parse received response as JSON: {e}")

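        try:
            return image_dict["data"]
        except KeyError:
            raise SiteDownloaderError("Imgur API response did not contain a 'data' field")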