
Merge pull request #562 from aliparlakci/development

This commit is contained in:
Serene 2021-11-24 13:17:04 +10:00 committed by GitHub
commit 8718295ee5
19 changed files with 192 additions and 41 deletions

@@ -7,6 +7,8 @@ This is a tool to download submissions or submission data from Reddit. It can be
If you wish to open an issue, please read [the guide on opening issues](docs/CONTRIBUTING.md#opening-an-issue) to ensure that your issue is clear and contains everything the developers need to investigate.
Included in this README are a few example Bash tricks to get certain behaviour. For that, see [Common Command Tricks](#common-command-tricks).
## Installation
*Bulk Downloader for Reddit* needs Python version 3.9 or above. Please update Python before installation to meet the requirement. Then, you can install it as such:
```bash
@@ -76,6 +78,9 @@ The following options are common between both the `archive` and `download` commands.
  - Can be specified multiple times
  - Disables certain modules from being used
  - See [Disabling Modules](#disabling-modules) for more information and a list of module names
- `--ignore-user`
  - This will add a user to ignore
  - Can be specified multiple times
- `--include-id-file`
  - This will add any submission with the IDs in the files provided
  - Can be specified multiple times
@@ -208,6 +213,16 @@ The following options are for the `archive` command specifically.
The `clone` command can take all the options listed above for both the `archive` and `download` commands since it performs the functions of both.
## Common Command Tricks
A common use case is for subreddits/users to be loaded from a file. The BDFR doesn't support this directly but it is simple enough to do through the command-line. Consider a list of usernames to download; they can be passed through to the BDFR with the following command, assuming that the usernames are in a text file:
```bash
cat users.txt | xargs -L 1 echo --user | xargs -L 50 python3 -m bdfr download <ARGS>
```
The `-L 50` argument ensures that the maximum command-line length for a single invocation isn't exceeded, though it may not be necessary. This can also be used to load subreddits from a file; simply exchange `--user` with `--subreddit` and so on.
## Authentication and Security
The BDFR uses OAuth2 authentication to connect to Reddit if authentication is required. This means that it is a secure, token-based system for making requests. This also means that the BDFR only has access to specific parts of the account authenticated, by default only saved posts, upvoted posts, and the identity of the authenticated account. Note that authentication is not required unless accessing private things like upvoted posts, saved posts, and private multireddits.
@@ -320,10 +335,14 @@ The BDFR can be run in multiple instances with multiple configurations, either c
Running these scenarios consecutively is done easily, like any single run. Configuration files that differ may be specified with the `--config` option to switch between tokens, for example. Otherwise, almost all configuration for data sources can be specified per-run through the command line.
Running scenarios concurrently (at the same time) however, is more complicated. The BDFR will look to a single, static place to put the detailed log files, in a directory with the configuration file specified above. If there are multiple instances, or processes, of the BDFR running at the same time, they will all be trying to write to a single file. On Linux and other UNIX based operating systems, this will succeed, though there is a substantial risk that the logfile will be useless due to garbled and jumbled data. On Windows however, attempting this will raise an error that crashes the program as Windows forbids multiple processes from accessing the same file.
The way to fix this is to use the `--log` option to manually specify where the logfile is to be stored. If the given location is unique to each instance of the BDFR, then it will run fine.
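For example (a sketch, not part of the BDFR itself: the instance names, log directory, and command line are illustrative), each concurrent instance can be given its own logfile:

```python
from pathlib import Path

def unique_log_path(log_dir: Path, instance_name: str) -> Path:
    """Build a per-instance logfile path so concurrent runs never share a file."""
    return log_dir / f'bdfr_{instance_name}.log'

# Two concurrent instances, each with its own --log target:
# for name, sub in [('one', 'python'), ('two', 'learnpython')]:
#     subprocess.Popen(['python3', '-m', 'bdfr', 'download', './downloads',
#                       '--subreddit', sub,
#                       '--log', str(unique_log_path(Path('./logs'), name))])
```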
## Manipulating Logfiles
The logfiles that the BDFR outputs are consistent and quite detailed and in a format that is amenable to regex. To this end, a number of bash scripts have been [included here](./scripts). They show examples for how to extract successfully downloaded IDs, failed IDs, and more besides.
## List of currently supported sources
- Direct links (links leading to a file)

@@ -17,6 +17,7 @@ _common_options = [
    click.option('--authenticate', is_flag=True, default=None),
    click.option('--config', type=str, default=None),
    click.option('--disable-module', multiple=True, default=None, type=str),
    click.option('--ignore-user', type=str, multiple=True, default=None),
    click.option('--include-id-file', multiple=True, default=None),
    click.option('--log', type=str, default=None),
    click.option('--saved', is_flag=True, default=None),

@@ -28,6 +28,11 @@ class Archiver(RedditConnector):
    def download(self):
        for generator in self.reddit_lists:
            for submission in generator:
                if submission.author.name in self.args.ignore_user:
                    logger.debug(
                        f'Submission {submission.id} in {submission.subreddit.display_name} skipped'
                        f' due to {submission.author.name} being an ignored user')
                    continue
                logger.debug(f'Attempting to archive submission {submission.id}')
                self.write_entry(submission)
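In isolation, the new skip check behaves like this (a sketch using `SimpleNamespace` stand-ins for PRAW submissions; `ignore_users` plays the role of `self.args.ignore_user`):

```python
from types import SimpleNamespace

def should_ignore(submission, ignore_users) -> bool:
    """Mirror of the check above: skip a submission whose author is ignored."""
    return submission.author.name in ignore_users

# A stand-in submission; the ID and author are taken from the new tests below.
post = SimpleNamespace(id='m3hxzd', author=SimpleNamespace(name='ArjanEgges'))
```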

@@ -18,6 +18,7 @@ class Configuration(Namespace):
        self.exclude_id_file = []
        self.file_scheme: str = '{REDDITOR}_{TITLE}_{POSTID}'
        self.folder_scheme: str = '{SUBREDDIT}'
        self.ignore_user = []
        self.include_id_file = []
        self.limit: Optional[int] = None
        self.link: list[str] = []

@@ -51,6 +51,11 @@ class RedditDownloader(RedditConnector):
        elif submission.subreddit.display_name.lower() in self.args.skip_subreddit:
            logger.debug(f'Submission {submission.id} in {submission.subreddit.display_name} in skip list')
            return
        elif submission.author.name in self.args.ignore_user:
            logger.debug(
                f'Submission {submission.id} in {submission.subreddit.display_name} skipped'
                f' due to {submission.author.name} being an ignored user')
            return
        elif not isinstance(submission, praw.models.Submission):
            logger.warning(f'{submission.id} is not a submission')
            return

@@ -110,31 +110,39 @@ class FileNameFormatter:
        index = f'_{str(index)}' if index else ''
        if not resource.extension:
            raise BulkDownloaderException(f'Resource from {resource.url} has no extension')
        file_name = str(self._format_name(resource.source_submission, self.file_format_string))
        if not re.match(r'.*\.$', file_name) and not re.match(r'^\..*', resource.extension):
            ending = index + '.' + resource.extension
        else:
            ending = index + resource.extension
        try:
            file_path = self.limit_file_name_length(file_name, ending, subfolder)
        except TypeError:
            raise BulkDownloaderException(f'Could not determine path name: {subfolder}, {index}, {resource.extension}')
        return file_path

    @staticmethod
    def limit_file_name_length(filename: str, ending: str, root: Path) -> Path:
        root = root.resolve().expanduser()
        possible_id = re.search(r'((?:_\w{6})?$)', filename)
        if possible_id:
            ending = possible_id.group(1) + ending
            filename = filename[:possible_id.start()]
        max_path = FileNameFormatter.find_max_path_length()
        max_file_part_length_chars = 255 - len(ending)
        max_file_part_length_bytes = 255 - len(ending.encode('utf-8'))
        max_path_length = max_path - len(ending) - len(str(root)) - 1

        out = Path(root, filename + ending)
        while any([len(filename) > max_file_part_length_chars,
                   len(filename.encode('utf-8')) > max_file_part_length_bytes,
                   len(str(out)) > max_path_length,
                   ]):
            filename = filename[:-1]
            out = Path(root, filename + ending)
        return out

    @staticmethod
    def find_max_path_length() -> int:
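The renamed `limit_file_name_length` now checks the length of the whole candidate path, not just the stem. The trimming loop can be exercised standalone (a simplified sketch: no submission-ID handling, and `max_path` is an illustrative stand-in for `find_max_path_length()`):

```python
from pathlib import Path

def trim_to_limits(filename: str, ending: str, root: Path, max_path: int = 4096) -> Path:
    # Shorten the stem one character at a time until the file name fits the
    # usual 255 char/byte limits and the full path fits max_path.
    max_part_chars = 255 - len(ending)
    max_part_bytes = 255 - len(ending.encode('utf-8'))
    max_path_length = max_path - len(ending) - len(str(root)) - 1
    out = Path(root, filename + ending)
    while any([len(filename) > max_part_chars,
               len(filename.encode('utf-8')) > max_part_bytes,
               len(str(out)) > max_path_length]):
        filename = filename[:-1]
        out = Path(root, filename + ending)
    return out
```

Multi-byte characters are why both a character and a byte bound are needed: an emoji-heavy title hits the 255-byte filesystem limit long before 255 characters.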

@@ -9,7 +9,7 @@ from bdfr.exceptions import NotADownloadableLinkError
from bdfr.site_downloaders.base_downloader import BaseDownloader
from bdfr.site_downloaders.direct import Direct
from bdfr.site_downloaders.erome import Erome
from bdfr.site_downloaders.fallback_downloaders.ytdlp_fallback import YtdlpFallback
from bdfr.site_downloaders.gallery import Gallery
from bdfr.site_downloaders.gfycat import Gfycat
from bdfr.site_downloaders.imgur import Imgur
@@ -24,7 +24,7 @@ class DownloadFactory:
    @staticmethod
    def pull_lever(url: str) -> Type[BaseDownloader]:
        sanitised_url = DownloadFactory.sanitise_url(url)
        if re.match(r'(i\.)?imgur.*\.gif.+$', sanitised_url):
            return Imgur
        elif re.match(r'.*/.*\.\w{3,4}(\?[\w;&=]*)?$', sanitised_url) and \
                not DownloadFactory.is_web_resource(sanitised_url):
@@ -49,8 +49,8 @@ class DownloadFactory:
            return PornHub
        elif re.match(r'vidble\.com', sanitised_url):
            return Vidble
        elif YtdlpFallback.can_handle_link(sanitised_url):
            return YtdlpFallback
        else:
            raise NotADownloadableLinkError(f'No downloader module exists for url {url}')
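The change from `\.gifv$` to `\.gif.+$` widens the Imgur route to misspelled GIF extensions (`.giff`, `.gift`, ...) while still excluding plain `.gif` files, which stay with the direct downloader. A quick check of the new pattern (URLs shown post-sanitisation, i.e. without scheme):

```python
import re

# The updated pattern from pull_lever above.
IMGUR_GIF_PATTERN = r'(i\.)?imgur.*\.gif.+$'

def routes_to_imgur(sanitised_url: str) -> bool:
    return bool(re.match(IMGUR_GIF_PATTERN, sanitised_url))
```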

@@ -6,6 +6,7 @@ from typing import Optional
from praw.models import Submission

from bdfr.exceptions import NotADownloadableLinkError
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
from bdfr.site_downloaders.fallback_downloaders.fallback_downloader import BaseFallbackDownloader
@@ -14,9 +15,9 @@ from bdfr.site_downloaders.youtube import Youtube
logger = logging.getLogger(__name__)


class YtdlpFallback(BaseFallbackDownloader, Youtube):
    def __init__(self, post: Submission):
        super(YtdlpFallback, self).__init__(post)

    def find_resources(self, authenticator: Optional[SiteAuthenticator] = None) -> list[Resource]:
        out = Resource(
@@ -29,8 +30,9 @@ class YoutubeDlFallback(BaseFallbackDownloader, Youtube):
    @staticmethod
    def can_handle_link(url: str) -> bool:
        try:
            attributes = YtdlpFallback.get_video_attributes(url)
        except NotADownloadableLinkError:
            return False
        if attributes:
            return True
        else:
            return False

@@ -42,9 +42,9 @@ class Imgur(BaseDownloader):
    @staticmethod
    def _get_data(link: str) -> dict:
        link = link.rstrip('?')
        if re.match(r'(?i).*\.gif.+$', link):
            link = link.replace('i.imgur', 'imgur')
            link = re.sub('(?i)\\.gif.+$', '', link)
        res = Imgur.retrieve_url(link, cookies={'over18': '1', 'postpagebeta': '0'})

@@ -6,6 +6,7 @@ from typing import Optional
from praw.models import Submission

from bdfr.exceptions import SiteDownloaderError
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
from bdfr.site_downloaders.youtube import Youtube
@@ -22,10 +23,15 @@ class PornHub(Youtube):
            'format': 'best',
            'nooverwrites': True,
        }
        if video_attributes := super().get_video_attributes(self.post.url):
            extension = video_attributes['ext']
        else:
            raise SiteDownloaderError()
        out = Resource(
            self.post,
            self.post.url,
            super()._download_video(ytdl_options),
            extension,
        )
        return [out]

@@ -27,10 +27,7 @@ class Youtube(BaseDownloader):
            'nooverwrites': True,
        }
        download_function = self._download_video(ytdl_options)
        extension = self.get_video_attributes(self.post.url)['ext']
        res = Resource(self.post, self.post.url, download_function, extension)
        return [res]
@@ -67,6 +64,10 @@ class Youtube(BaseDownloader):
        with yt_dlp.YoutubeDL({'logger': yt_logger, }) as ydl:
            try:
                result = ydl.extract_info(url, download=False)
            except Exception as e:
                logger.exception(e)
                raise NotADownloadableLinkError(f'Video info extraction failed for {url}')
        if 'ext' in result:
            return result
        else:
            raise NotADownloadableLinkError(f'Video info extraction failed for {url}')
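With this change both failure modes, extraction raising and a result without an `ext` key, surface as `NotADownloadableLinkError` instead of crashing later with a `KeyError`. The validation step in isolation (the exception class is redefined locally for the sketch):

```python
class NotADownloadableLinkError(Exception):
    pass

def validate_video_info(result: dict, url: str) -> dict:
    # Mirrors the new check: a result lacking 'ext' cannot name the output file.
    if 'ext' in result:
        return result
    raise NotADownloadableLinkError(f'Video info extraction failed for {url}')
```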

@@ -4,7 +4,7 @@ description_file = README.md
description_content_type = text/markdown
home_page = https://github.com/aliparlakci/bulk-downloader-for-reddit
keywords = reddit, download, archive
version = 2.5.0
author = Ali Parlakci
author_email = parlakciali@gmail.com
maintainer = Serene Arc

@@ -106,3 +106,18 @@ def test_cli_archive_long(test_args: list[str], tmp_path: Path):
    result = runner.invoke(cli, test_args)
    assert result.exit_code == 0
    assert re.search(r'Writing entry .*? to file in .*? format', result.output)


@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.skipif(not does_test_config_exist, reason='A test config file is required for integration tests')
@pytest.mark.parametrize('test_args', (
    ['--ignore-user', 'ArjanEgges', '-l', 'm3hxzd'],
))
def test_cli_archive_ignore_user(test_args: list[str], tmp_path: Path):
    runner = CliRunner()
    test_args = create_basic_args_for_archive_runner(test_args, tmp_path)
    result = runner.invoke(cli, test_args)
    assert result.exit_code == 0
    assert 'being an ignored user' in result.output
    assert 'Attempting to archive submission' not in result.output

@@ -337,3 +337,18 @@ def test_cli_download_include_id_file(tmp_path: Path):
    result = runner.invoke(cli, test_args)
    assert result.exit_code == 0
    assert 'Downloaded submission' in result.output


@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.skipif(not does_test_config_exist, reason='A test config file is required for integration tests')
@pytest.mark.parametrize('test_args', (
    ['--ignore-user', 'ArjanEgges', '-l', 'm3hxzd'],
))
def test_cli_download_ignore_user(test_args: list[str], tmp_path: Path):
    runner = CliRunner()
    test_args = create_basic_args_for_download_runner(test_args, tmp_path)
    result = runner.invoke(cli, test_args)
    assert result.exit_code == 0
    assert 'Downloaded submission' not in result.output
    assert 'being an ignored user' in result.output

@@ -4,8 +4,9 @@ from unittest.mock import MagicMock
import pytest

from bdfr.exceptions import NotADownloadableLinkError
from bdfr.resource import Resource
from bdfr.site_downloaders.fallback_downloaders.ytdlp_fallback import YtdlpFallback


@pytest.mark.online
@@ -13,12 +14,22 @@ from bdfr.site_downloaders.fallback_downloaders.youtubedl_fallback import YoutubeDlFallback
    ('https://www.reddit.com/r/specializedtools/comments/n2nw5m/bamboo_splitter/', True),
    ('https://www.youtube.com/watch?v=P19nvJOmqCc', True),
    ('https://www.example.com/test', False),
    ('https://milesmatrix.bandcamp.com/album/la-boum/', False),
))
def test_can_handle_link(test_url: str, expected: bool):
    result = YtdlpFallback.can_handle_link(test_url)
    assert result == expected


@pytest.mark.online
@pytest.mark.parametrize('test_url', (
    'https://milesmatrix.bandcamp.com/album/la-boum/',
))
def test_info_extraction_bad(test_url: str):
    with pytest.raises(NotADownloadableLinkError):
        YtdlpFallback.get_video_attributes(test_url)
@pytest.mark.online
@pytest.mark.slow
@pytest.mark.parametrize(('test_url', 'expected_hash'), (
@@ -30,7 +41,7 @@ def test_can_handle_link(test_url: str, expected: bool):
def test_find_resources(test_url: str, expected_hash: str):
    test_submission = MagicMock()
    test_submission.url = test_url
    downloader = YtdlpFallback(test_submission)
    resources = downloader.find_resources()
    assert len(resources) == 1
    assert isinstance(resources[0], Resource)

@@ -9,7 +9,7 @@ from bdfr.site_downloaders.base_downloader import BaseDownloader
from bdfr.site_downloaders.direct import Direct
from bdfr.site_downloaders.download_factory import DownloadFactory
from bdfr.site_downloaders.erome import Erome
from bdfr.site_downloaders.fallback_downloaders.ytdlp_fallback import YtdlpFallback
from bdfr.site_downloaders.gallery import Gallery
from bdfr.site_downloaders.gfycat import Gfycat
from bdfr.site_downloaders.imgur import Imgur
@@ -30,6 +30,7 @@ from bdfr.site_downloaders.youtube import Youtube
    ('https://imgur.com/BuzvZwb.gifv', Imgur),
    ('https://i.imgur.com/6fNdLst.gif', Direct),
    ('https://imgur.com/a/MkxAzeg', Imgur),
    ('https://i.imgur.com/OGeVuAe.giff', Imgur),
    ('https://www.reddit.com/gallery/lu93m7', Gallery),
    ('https://gfycat.com/concretecheerfulfinwhale', Gfycat),
    ('https://www.erome.com/a/NWGw0F09', Erome),
@@ -41,10 +42,10 @@ from bdfr.site_downloaders.youtube import Youtube
    ('https://i.imgur.com/3SKrQfK.jpg?1', Direct),
    ('https://dynasty-scans.com/system/images_images/000/017/819/original/80215103_p0.png?1612232781', Direct),
    ('https://m.imgur.com/a/py3RW0j', Imgur),
    ('https://v.redd.it/9z1dnk3xr5k61', YtdlpFallback),
    ('https://streamable.com/dt46y', YtdlpFallback),
    ('https://vimeo.com/channels/31259/53576664', YtdlpFallback),
    ('http://video.pbs.org/viralplayer/2365173446/', YtdlpFallback),
    ('https://www.pornhub.com/view_video.php?viewkey=ph5a2ee0461a8d0', PornHub),
))
def test_factory_lever_good(test_submission_url: str, expected_class: BaseDownloader, reddit_instance: praw.Reddit):

@@ -150,6 +150,14 @@ def test_imgur_extension_validation_bad(test_extension: str):
        'https://i.imgur.com/uTvtQsw.gifv',
        ('46c86533aa60fc0e09f2a758513e3ac2',),
    ),
    (
        'https://i.imgur.com/OGeVuAe.giff',
        ('77389679084d381336f168538793f218',)
    ),
    (
        'https://i.imgur.com/OGeVuAe.gift',
        ('77389679084d381336f168538793f218',)
    ),
))
def test_find_resources(test_url: str, expected_hashes: list[str]):
    mock_download = Mock()

@@ -5,6 +5,7 @@ from unittest.mock import MagicMock
import pytest

from bdfr.exceptions import SiteDownloaderError
from bdfr.resource import Resource
from bdfr.site_downloaders.pornhub import PornHub
@@ -13,6 +14,7 @@ from bdfr.site_downloaders.pornhub import PornHub
@pytest.mark.slow
@pytest.mark.parametrize(('test_url', 'expected_hash'), (
    ('https://www.pornhub.com/view_video.php?viewkey=ph6074c59798497', 'd9b99e4ebecf2d8d67efe5e70d2acf8a'),
))
def test_find_resources_good(test_url: str, expected_hash: str):
    test_submission = MagicMock()
@@ -23,3 +25,15 @@ def test_find_resources_good(test_url: str, expected_hash: str):
    assert isinstance(resources[0], Resource)
    resources[0].download()
    assert resources[0].hash.hexdigest() == expected_hash


@pytest.mark.online
@pytest.mark.parametrize('test_url', (
    'https://www.pornhub.com/view_video.php?viewkey=ph5ede121f0d3f8',
))
def test_find_resources_bad(test_url: str):
    test_submission = MagicMock()
    test_submission.url = test_url
    downloader = PornHub(test_submission)
    with pytest.raises(SiteDownloaderError):
        downloader.find_resources()

@@ -2,6 +2,7 @@
# coding=utf-8
import platform
import sys
import unittest.mock
from datetime import datetime
from pathlib import Path
@@ -13,6 +14,8 @@ import pytest
from bdfr.file_name_formatter import FileNameFormatter
from bdfr.resource import Resource
from bdfr.site_downloaders.base_downloader import BaseDownloader
from bdfr.site_downloaders.fallback_downloaders.ytdlp_fallback import YtdlpFallback


@pytest.fixture()
@@ -185,7 +188,7 @@ def test_format_multiple_resources():
    ('😍💕✨' * 100, '_1.png'),
))
def test_limit_filename_length(test_filename: str, test_ending: str):
    result = FileNameFormatter.limit_file_name_length(test_filename, test_ending, Path('.'))
    assert len(result.name) <= 255
    assert len(result.name.encode('utf-8')) <= 255
    assert len(str(result)) <= FileNameFormatter.find_max_path_length()
@@ -204,15 +207,16 @@ def test_limit_filename_length(test_filename: str, test_ending: str):
    ('😍💕✨' * 100 + '_aaa1aa', '_1.png', '_aaa1aa_1.png'),
))
def test_preserve_id_append_when_shortening(test_filename: str, test_ending: str, expected_end: str):
    result = FileNameFormatter.limit_file_name_length(test_filename, test_ending, Path('.'))
    assert len(result.name) <= 255
    assert len(result.name.encode('utf-8')) <= 255
    assert result.name.endswith(expected_end)
    assert len(str(result)) <= FileNameFormatter.find_max_path_length()


@pytest.mark.skipif(sys.platform == 'win32', reason='Test broken on windows github')
def test_shorten_filename_real(submission: MagicMock, tmp_path: Path):
    submission.title = 'A' * 500
    submission.author.name = 'test'
    submission.subreddit.display_name = 'test'
    submission.id = 'BBBBBB'
@@ -223,6 +227,21 @@ def test_shorten_filenames(submission: MagicMock, tmp_path: Path):
    result.touch()


@pytest.mark.parametrize(('test_name', 'test_ending'), (
    ('a', 'b'),
    ('a', '_bbbbbb.jpg'),
    ('a' * 20, '_bbbbbb.jpg'),
    ('a' * 50, '_bbbbbb.jpg'),
    ('a' * 500, '_bbbbbb.jpg'),
))
def test_shorten_path(test_name: str, test_ending: str, tmp_path: Path):
    result = FileNameFormatter.limit_file_name_length(test_name, test_ending, tmp_path)
    assert len(str(result.name)) <= 255
    assert len(str(result.name).encode('UTF-8')) <= 255
    assert len(str(result.name).encode('cp1252')) <= 255
    assert len(str(result)) <= FileNameFormatter.find_max_path_length()
@pytest.mark.parametrize(('test_string', 'expected'), (
    ('test', 'test'),
    ('test😍', 'test'),
@@ -377,6 +396,26 @@ def test_get_max_path_length():
def test_windows_max_path(tmp_path: Path):
    with unittest.mock.patch('platform.system', return_value='Windows'):
        with unittest.mock.patch('bdfr.file_name_formatter.FileNameFormatter.find_max_path_length', return_value=260):
            result = FileNameFormatter.limit_file_name_length('test' * 100, '_1.png', tmp_path)
            assert len(str(result)) <= 260
            assert len(result.name) <= (260 - len(str(tmp_path)))


@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.parametrize(('test_reddit_id', 'test_downloader', 'expected_names'), (
    ('gphmnr', YtdlpFallback, {'He has a lot to say today.mp4'}),
    ('d0oir2', YtdlpFallback, {"Crunk's finest moment. Welcome to the new subreddit!.mp4"}),
))
def test_name_submission(
        test_reddit_id: str,
        test_downloader: type[BaseDownloader],
        expected_names: set[str],
        reddit_instance: praw.reddit.Reddit,
):
    test_submission = reddit_instance.submission(id=test_reddit_id)
    test_resources = test_downloader(test_submission).find_resources()
    test_formatter = FileNameFormatter('{TITLE}', '', '')
    results = test_formatter.format_resource_paths(test_resources, Path('.'))
    results = set([r[0].name for r in results])
    assert expected_names == results