
Merge pull request #361 from aliparlakci/development

Release v2.1.0
Ali Parlakçı 2021-05-11 13:29:26 +03:00 committed by GitHub
commit 6c086e70f7
43 changed files with 634 additions and 275 deletions


@ -9,7 +9,7 @@ assignees: ''
- [ ] I am reporting a bug.
- [ ] I am running the latest version of BDfR
- [ ] I have read the [Opening an issue](README.md#configuration)
- [ ] I have read the [Opening an issue](../../README.md#configuration)
## Description
A clear and concise description of what the bug is.


@ -9,7 +9,7 @@ assignees: ''
- [ ] I am requesting a feature.
- [ ] I am running the latest version of BDfR
- [ ] I have read the [Opening an issue](README.md#configuration)
- [ ] I have read the [Opening an issue](../../README.md#configuration)
## Description
Clearly state the current situation and issues you experience. Then, explain how this feature would solve these issues and make life easier. Also, explain the feature in as much detail as possible.


@ -9,7 +9,7 @@ assignees: ''
- [ ] I am requesting a site support.
- [ ] I am running the latest version of BDfR
- [ ] I have read the [Opening an issue](README.md#configuration)
- [ ] I have read the [Opening an issue](../../README.md#configuration)
## Site
Provide a URL to the domain of the site.


@ -8,15 +8,17 @@ on:
jobs:
test:
runs-on: ubuntu-latest
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest]
python-version: [3.9]
ext: [.sh]
include:
- os: windows-latest
python-version: 3.9
ext: .ps1
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v2
@ -26,19 +28,19 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip flake8 pytest pytest-cov
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
pip install -r requirements.txt
- name: Setup test configuration
- name: Make configuration for tests
env:
REDDIT_TOKEN: ${{ secrets.REDDIT_TEST_TOKEN }}
run: |
cp bdfr/default_config.cfg ./test_config.cfg
echo -e "\nuser_token = ${{ secrets.REDDIT_TEST_TOKEN }}" >> ./test_config.cfg
./devscripts/configure${{ matrix.ext }}
- name: Lint w/ flake8
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
- name: Test w/ PyTest
- name: Test with pytest
run: |
pytest -m 'not slow' --verbose --cov=./bdfr/ --cov-report term:skip-covered --cov-report html


@ -12,6 +12,11 @@ If you wish to open an issue, please read [the guide on opening issues](docs/CON
python3 -m pip install bdfr
```
If on Arch Linux or derivative operating systems such as Manjaro, the BDFR can be installed through the AUR.
- Latest Release: https://aur.archlinux.org/packages/python-bdfr/
- Latest Development Build: https://aur.archlinux.org/packages/python-bdfr-git/
If you want to use the source code or make contributions, refer to [CONTRIBUTING](docs/CONTRIBUTING.md#preparing-the-environment-for-development).
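For a rough sketch of preparing a source checkout, assuming nothing beyond the repository's own `requirements.txt` (the file referenced by the project's CI), see below; CONTRIBUTING remains the authoritative guide:
```bash
git clone https://github.com/aliparlakci/bulk-downloader-for-reddit.git
cd bulk-downloader-for-reddit
python3 -m pip install -r requirements.txt
```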
## Usage
@ -55,6 +60,9 @@ The following options are common between both the `archive` and `download` comma
- `--config`
- If the path to a configuration file is supplied with this option, the BDFR will use the specified config
- See [Configuration Files](#configuration) for more details
- `--log`
- This allows one to specify the location of the logfile
- This must be done when running multiple instances of the BDFR, see [Multiple Instances](#multiple-instances) below
- `--saved`
- This option will make the BDFR use the supplied user's saved posts list as a download source
- This requires an authenticated Reddit instance, using the `--authenticate` flag, as well as `--user` set to `me`
@ -106,6 +114,9 @@ The following options are common between both the `archive` and `download` comma
- `week`
- `month`
- `year`
- `--time-format`
- This specifies the format of the datetime string that replaces `{DATE}` in file and folder naming schemes
- See [Time Formatting Customisation](#time-formatting-customisation) for more details, and the formatting scheme
- `-u, --user`
- This specifies the user to scrape in concert with other options
- When using `--authenticate`, `--user me` can be used to refer to the authenticated user
@ -208,13 +219,20 @@ It is highly recommended that the file name scheme contain the parameter `{POSTI
## Configuration
The configuration files are, by default, stored in the configuration directory for the user. This differs depending on the OS that the BDFR is being run on. For Windows, this will be:
- `C:\Users\<User>\AppData\Local\BDFR\bdfr`
If Python has been installed through the Windows Store, the folder will appear in a different place. Note that the hash included in the file path may change from installation to installation.
- `C:\Users\<User>\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\Local\BDFR\bdfr`
On macOS, this will be:
- `~/Library/Application Support/bdfr`.
Lastly, on a Linux system, this will be:
- `~/.local/share/bdfr`
- `~/.config/bdfr/`
The logging output for each run of the BDFR will be saved to this directory in the file `log_output.txt`. If you need to submit a bug, it is this file that you will need to submit with the report.
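As a minimal sketch (not the BDFR's internals), the directory resolution described above can be reproduced with the `appdirs` library, which the BDFR itself imports; the `'bdfr'`/`'BDFR'` application names below are chosen to match the paths listed above:
```python
# Minimal sketch: resolve the per-user configuration directory with appdirs.
from pathlib import Path

import appdirs

# 'bdfr' as the app name and 'BDFR' as the author match the paths above.
config_directory = Path(appdirs.user_config_dir('bdfr', 'BDFR'))
log_file = config_directory / 'log_output.txt'
print(config_directory)  # e.g. ~/.config/bdfr on Linux
```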
@ -222,16 +240,26 @@ The logging output for each run of the BDFR will be saved to this directory in t
The `config.cfg` is the file that supplies the BDFR with the configuration to use. At the moment, the following keys **must** be included in the configuration file supplied.
- `backup_log_count`
- `max_wait_time`
- `client_id`
- `client_secret`
- `scopes`
The following keys are optional, and defaults will be used if they cannot be found.
- `backup_log_count`
- `max_wait_time`
- `time_format`
None of these values should be modified unless you know what you're doing, as the defaults will enable the BDFR to function just fine. A configuration file is included with the BDFR when it is installed, and it will be placed in the configuration directory as the default.
Most of these values have to do with OAuth2 configuration and authorisation. The key `backup_log_count`, however, controls log rollover. The logs in the configuration directory can be verbose and, for long runs of the BDFR, can grow quite large. To combat this, the BDFR will overwrite previous logs. This value determines how many logs from previous runs will be kept. The default is 3, which means that the BDFR will keep at most three past logs plus the current one. Any run beyond this will overwrite the oldest log file, a process called "rolling over". If you want more records of past runs, increase this number.
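As a hedged illustration of this rollover, the behaviour maps onto the standard library's `RotatingFileHandler`, which the BDFR uses for its logfile; the path below is illustrative:
```python
# Illustrative sketch of log rollover with backup_log_count backups.
import logging.handlers

backup_log_count = 3  # the default described above
file_handler = logging.handlers.RotatingFileHandler(
    'log_output.txt',
    mode='a',
    backupCount=backup_log_count,
)
# doRollover() renames log_output.txt to log_output.txt.1, shifts older
# backups up by one, and discards anything beyond backupCount.
file_handler.doRollover()
```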
#### Time Formatting Customisation
The option `time_format` specifies the format of the timestamp that replaces `{DATE}` in filename and folder name schemes. By default, this is the [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format, which is highly recommended due to its standardised nature. If you don't **need** to change it, it is recommended that you do not. However, you can set it to anything required with this option. The `--time-format` option supersedes any specification in the configuration file.
The format can be specified through the [format codes](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior) that are standard in the Python `datetime` library.
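For example, a minimal sketch of how a `time_format` value shapes the `{DATE}` string; the timestamp is illustrative:
```python
# Minimal sketch: the same timestamp rendered with different format codes.
from datetime import datetime

timestamp = datetime(2021, 4, 21, 9, 30, 0)
print(timestamp.isoformat())           # 2021-04-21T09:30:00 (the 'ISO' default)
print(timestamp.strftime('%Y%m%d'))    # 20210421
print(timestamp.strftime('%d %B %Y'))  # 21 April 2021
```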
### Rate Limiting
The option `max_wait_time` has to do with retrying downloads. Certain HTTP errors mean that no number of requests will return the wanted data, but other errors come from rate-limiting. Rate-limiting occurs when a single client makes so many requests that the remote website cuts it off to preserve the function of the site, a common situation when downloading many resources from the same site. It is polite and best practice to obey the website's wishes in these cases.
@ -240,6 +268,16 @@ To this end, the BDFR will sleep for a time before retrying the download, giving
The option `--max-wait-time` and the configuration option `max_wait_time` both specify the maximum time the BDFR will wait. If both are present, the command-line option takes precedence. For instance, the default is 120, so the BDFR will wait for 60 seconds, then 120 seconds, and then move on. **Note that this results in a total time of 180 seconds trying the same download**. If you wish to try to bypass the rate-limiting system on the remote site, increasing the maximum wait time may help. However, note that the total wait time grows rapidly if the resource is not downloaded, i.e. specifying a max value of 300 (5 minutes) can make the BDFR pause for 15 minutes on one submission, not 5, in the worst case.
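As an illustrative sketch, assuming the wait time grows by 60 seconds per retry (an assumption chosen because it reproduces both figures above), the total time spent waiting can be computed as:
```python
# Sketch of cumulative retry waits under an assumed 60-second step.
def total_wait(max_wait_time: int, step: int = 60) -> int:
    wait_time, total = step, 0
    while wait_time <= max_wait_time:
        total += wait_time  # sleep this long before the next retry
        wait_time += step
    return total

print(total_wait(120))  # 180 seconds, as in the default example above
print(total_wait(300))  # 900 seconds, i.e. 15 minutes in the worst case
```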
## Multiple Instances
The BDFR can be run in multiple instances with multiple configurations, either concurrently or consecutively. Scripting files make this easiest: PowerShell on Windows operating systems, or Bash elsewhere. This allows multiple scenarios to be run with data scraped from different sources, since some scenarios are mutually exclusive, i.e. it is not possible to download every combination of data in a single run of the BDFR. To download from multiple users, for example, multiple runs of the BDFR are required.
Running these scenarios consecutively is done easily, like any single run. Configuration files that differ may be specified with the `--config` option to switch between tokens, for example. Otherwise, almost all configuration for data sources can be specified per-run through the command line.
Running scenarios concurrently (at the same time), however, is more complicated. The BDFR will look to a single, static place to put the detailed log files, in a directory with the configuration file specified above. If there are multiple instances, or processes, of the BDFR running at the same time, they will all try to write to a single file. On Linux and other UNIX-based operating systems, this will succeed, though there is a substantial risk that the logfile will be useless due to garbled and jumbled data. On Windows, however, attempting this will raise an error that crashes the program, as Windows forbids multiple processes from accessing the same file.
The way to fix this is to use the `--log` option to manually specify where the logfile is to be stored. If the given location is unique to each instance of the BDFR, then it will run fine.
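For example, a hedged sketch of two concurrent runs, each with its own logfile via `--log`; the subreddits and paths are illustrative:
```bash
# Two BDFR instances running at once, kept apart by distinct logfiles.
python3 -m bdfr download ./earthporn --subreddit EarthPorn --log ./earthporn.log &
python3 -m bdfr download ./gifs --subreddit gifs --log ./gifs.log &
wait
```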
## List of currently supported sources
- Direct links (links leading to a file)
@ -252,6 +290,7 @@ The option `--max-wait-time` and the configuration option `max_wait_time` both s
- Reddit Videos
- Redgifs
- YouTube
- Streamable
## Contributing


@ -20,10 +20,12 @@ _common_options = [
click.option('-m', '--multireddit', multiple=True, default=None, type=str),
click.option('-L', '--limit', default=None, type=int),
click.option('--authenticate', is_flag=True, default=None),
click.option('--log', type=str, default=None),
click.option('--submitted', is_flag=True, default=None),
click.option('--upvoted', is_flag=True, default=None),
click.option('--saved', is_flag=True, default=None),
click.option('--search', default=None, type=str),
click.option('--time-format', type=str, default=None),
click.option('-u', '--user', type=str, default=None),
click.option('-t', '--time', type=click.Choice(('all', 'hour', 'day', 'week', 'month', 'year')), default=None),
click.option('-S', '--sort', type=click.Choice(('hot', 'top', 'new',
@ -73,7 +75,7 @@ def cli_download(context: click.Context, **_):
@cli.command('archive')
@_add_common_options
@click.option('--all-comments', is_flag=True, default=None)
@click.option('-f,', '--format', type=click.Choice(('xml', 'json', 'yaml')), default=None)
@click.option('-f', '--format', type=click.Choice(('xml', 'json', 'yaml')), default=None)
@click.pass_context
def cli_archive(context: click.Context, **_):
config = Configuration()


@ -89,7 +89,7 @@ class Archiver(RedditDownloader):
def _write_content_to_disk(self, resource: Resource, content: str):
file_path = self.file_name_formatter.format_path(resource, self.download_directory)
file_path.parent.mkdir(exist_ok=True, parents=True)
with open(file_path, 'w') as file:
with open(file_path, 'w', encoding="utf-8") as file:
logger.debug(
f'Writing entry {resource.source_submission.id} to file in {resource.extension[1:].upper()}'
f' format at {file_path}')


@ -17,6 +17,7 @@ class Configuration(Namespace):
self.exclude_id_file = []
self.limit: Optional[int] = None
self.link: list[str] = []
self.log: Optional[str] = None
self.max_wait_time = None
self.multireddit: list[str] = []
self.no_dupes: bool = False
@ -32,6 +33,7 @@ class Configuration(Namespace):
self.submitted: bool = False
self.subreddit: list[str] = []
self.time: str = 'all'
self.time_format = None
self.upvoted: bool = False
self.user: Optional[str] = None
self.verbose: int = 0


@ -3,4 +3,5 @@ client_id = U-6gk4ZCh3IeNQ
client_secret = 7CZHY6AmKweZME5s50SfDGylaPg
scopes = identity, history, read, save
backup_log_count = 3
max_wait_time = 120
max_wait_time = 120
time_format = ISO


@ -4,6 +4,8 @@
import logging
import re
from bdfr.resource import Resource
logger = logging.getLogger(__name__)
@ -21,13 +23,20 @@ class DownloadFilter:
else:
return True
def _check_extension(self, url: str) -> bool:
def check_resource(self, res: Resource) -> bool:
if not self._check_extension(res.extension):
return False
elif not self._check_domain(res.url):
return False
return True
def _check_extension(self, resource_extension: str) -> bool:
if not self.excluded_extensions:
return True
combined_extensions = '|'.join(self.excluded_extensions)
pattern = re.compile(r'.*({})$'.format(combined_extensions))
if re.match(pattern, url):
logger.log(9, f'Url "{url}" matched with "{str(pattern)}"')
if re.match(pattern, resource_extension):
logger.log(9, f'Url "{resource_extension}" matched with "{str(pattern)}"')
return False
else:
return True


@ -14,7 +14,7 @@ from datetime import datetime
from enum import Enum, auto
from multiprocessing import Pool
from pathlib import Path
from typing import Iterator
from typing import Callable, Iterator
import appdirs
import praw
@ -105,6 +105,12 @@ class RedditDownloader:
logger.log(9, 'Wrote default download wait time to config file')
self.args.max_wait_time = self.cfg_parser.getint('DEFAULT', 'max_wait_time')
logger.debug(f'Setting maximum download wait time to {self.args.max_wait_time} seconds')
if self.args.time_format is None:
option = self.cfg_parser.get('DEFAULT', 'time_format', fallback='ISO')
if re.match(r'^[ \'\"]*$', option):
option = 'ISO'
logger.debug(f'Setting datetime format string to {option}')
self.args.time_format = option
# Update config on disk
with open(self.config_location, 'w') as file:
self.cfg_parser.write(file)
@ -190,7 +196,12 @@ class RedditDownloader:
def _create_file_logger(self):
main_logger = logging.getLogger()
log_path = Path(self.config_directory, 'log_output.txt')
if self.args.log is None:
log_path = Path(self.config_directory, 'log_output.txt')
else:
log_path = Path(self.args.log).resolve().expanduser()
if not log_path.parent.exists():
raise errors.BulkDownloaderException(f'Designated location for logfile does not exist')
backup_count = self.cfg_parser.getint('DEFAULT', 'backup_log_count', fallback=3)
file_handler = logging.handlers.RotatingFileHandler(
log_path,
@ -198,7 +209,13 @@ class RedditDownloader:
backupCount=backup_count,
)
if log_path.exists():
file_handler.doRollover()
try:
file_handler.doRollover()
except PermissionError as e:
logger.critical(
'Cannot rollover logfile, make sure this is the only '
'BDFR process or specify alternate logfile location')
raise
formatter = logging.Formatter('[%(asctime)s - %(name)s - %(levelname)s] - %(message)s')
file_handler.setFormatter(formatter)
file_handler.setLevel(0)
@ -207,7 +224,7 @@ class RedditDownloader:
@staticmethod
def _sanitise_subreddit_name(subreddit: str) -> str:
pattern = re.compile(r'^(?:https://www\.reddit\.com/)?(?:r/)?(.*?)(?:/)?$')
pattern = re.compile(r'^(?:https://www\.reddit\.com/)?(?:r/)?(.*?)/?$')
match = re.match(pattern, subreddit)
if not match:
raise errors.BulkDownloaderException(f'Could not find subreddit name in string {subreddit}')
@ -225,10 +242,14 @@ class RedditDownloader:
def _get_subreddits(self) -> list[praw.models.ListingGenerator]:
if self.args.subreddit:
out = []
sort_function = self._determine_sort_function()
for reddit in self._split_args_input(self.args.subreddit):
try:
reddit = self.reddit_instance.subreddit(reddit)
try:
self._check_subreddit_status(reddit)
except errors.BulkDownloaderException as e:
logger.error(e)
continue
if self.args.search:
out.append(reddit.search(
self.args.search,
@ -265,7 +286,7 @@ class RedditDownloader:
supplied_submissions.append(self.reddit_instance.submission(url=sub_id))
return [supplied_submissions]
def _determine_sort_function(self):
def _determine_sort_function(self) -> Callable:
if self.sort_filter is RedditTypes.SortType.NEW:
sort_function = praw.models.Subreddit.new
elif self.sort_filter is RedditTypes.SortType.RISING:
@ -304,8 +325,10 @@ class RedditDownloader:
def _get_user_data(self) -> list[Iterator]:
if any([self.args.submitted, self.args.upvoted, self.args.saved]):
if self.args.user:
if not self._check_user_existence(self.args.user):
logger.error(f'User {self.args.user} does not exist')
try:
self._check_user_existence(self.args.user)
except errors.BulkDownloaderException as e:
logger.error(e)
return []
generators = []
if self.args.submitted:
@ -329,17 +352,19 @@ class RedditDownloader:
else:
return []
def _check_user_existence(self, name: str) -> bool:
def _check_user_existence(self, name: str):
user = self.reddit_instance.redditor(name=name)
try:
if not user.id:
return False
if user.id:
return
except prawcore.exceptions.NotFound:
return False
return True
raise errors.BulkDownloaderException(f'Could not find user {name}')
except AttributeError:
if hasattr(user, 'is_suspended'):
raise errors.BulkDownloaderException(f'User {name} is banned')
def _create_file_name_formatter(self) -> FileNameFormatter:
return FileNameFormatter(self.args.file_scheme, self.args.folder_scheme)
return FileNameFormatter(self.args.file_scheme, self.args.folder_scheme, self.args.time_format)
def _create_time_filter(self) -> RedditTypes.TimeType:
try:
@ -375,9 +400,6 @@ class RedditDownloader:
if not isinstance(submission, praw.models.Submission):
logger.warning(f'{submission.id} is not a submission')
return
if not self.download_filter.check_url(submission.url):
logger.debug(f'Download filter removed submission {submission.id} with URL {submission.url}')
return
try:
downloader_class = DownloadFactory.pull_lever(submission.url)
downloader = downloader_class(submission)
@ -394,12 +416,14 @@ class RedditDownloader:
for destination, res in self.file_name_formatter.format_resource_paths(content, self.download_directory):
if destination.exists():
logger.debug(f'File {destination} already exists, continuing')
elif not self.download_filter.check_resource(res):
logger.debug(f'Download filter removed {submission.id} with URL {submission.url}')
else:
try:
res.download(self.args.max_wait_time)
except errors.BulkDownloaderException as e:
logger.error(
f'Failed to download resource {res.url} with downloader {downloader_class.__name__}: {e}')
logger.error(f'Failed to download resource {res.url} in submission {submission.id} '
f'with downloader {downloader_class.__name__}: {e}')
return
resource_hash = res.hash.hexdigest()
destination.parent.mkdir(parents=True, exist_ok=True)
@ -446,3 +470,14 @@ class RedditDownloader:
for line in file:
out.append(line.strip())
return set(out)
@staticmethod
def _check_subreddit_status(subreddit: praw.models.Subreddit):
if subreddit.display_name == 'all':
return
try:
assert subreddit.id
except prawcore.NotFound:
raise errors.BulkDownloaderException(f'Source {subreddit.display_name} does not exist or cannot be found')
except prawcore.Forbidden:
raise errors.BulkDownloaderException(f'Source {subreddit.display_name} is private and cannot be scraped')


@ -1,6 +1,6 @@
#!/usr/bin/env python3
# coding=utf-8
import datetime
import logging
import platform
import re
@ -26,18 +26,18 @@ class FileNameFormatter:
'upvotes',
)
def __init__(self, file_format_string: str, directory_format_string: str):
def __init__(self, file_format_string: str, directory_format_string: str, time_format_string: str):
if not self.validate_string(file_format_string):
raise BulkDownloaderException(f'"{file_format_string}" is not a valid format string')
self.file_format_string = file_format_string
self.directory_format_string: list[str] = directory_format_string.split('/')
self.time_format_string = time_format_string
@staticmethod
def _format_name(submission: (Comment, Submission), format_string: str) -> str:
def _format_name(self, submission: (Comment, Submission), format_string: str) -> str:
if isinstance(submission, Submission):
attributes = FileNameFormatter._generate_name_dict_from_submission(submission)
attributes = self._generate_name_dict_from_submission(submission)
elif isinstance(submission, Comment):
attributes = FileNameFormatter._generate_name_dict_from_comment(submission)
attributes = self._generate_name_dict_from_comment(submission)
else:
raise BulkDownloaderException(f'Cannot name object {type(submission).__name__}')
result = format_string
@ -65,8 +65,7 @@ class FileNameFormatter:
in_string = in_string.replace(match, converted_match)
return in_string
@staticmethod
def _generate_name_dict_from_submission(submission: Submission) -> dict:
def _generate_name_dict_from_submission(self, submission: Submission) -> dict:
submission_attributes = {
'title': submission.title,
'subreddit': submission.subreddit.display_name,
@ -74,12 +73,18 @@ class FileNameFormatter:
'postid': submission.id,
'upvotes': submission.score,
'flair': submission.link_flair_text,
'date': submission.created_utc
'date': self._convert_timestamp(submission.created_utc),
}
return submission_attributes
@staticmethod
def _generate_name_dict_from_comment(comment: Comment) -> dict:
def _convert_timestamp(self, timestamp: float) -> str:
input_time = datetime.datetime.fromtimestamp(timestamp)
if self.time_format_string.upper().strip() == 'ISO':
return input_time.isoformat()
else:
return input_time.strftime(self.time_format_string)
def _generate_name_dict_from_comment(self, comment: Comment) -> dict:
comment_attributes = {
'title': comment.submission.title,
'subreddit': comment.subreddit.display_name,
@ -87,7 +92,7 @@ class FileNameFormatter:
'postid': comment.id,
'upvotes': comment.score,
'flair': '',
'date': comment.created_utc,
'date': self._convert_timestamp(comment.created_utc),
}
return comment_attributes
@ -155,9 +160,8 @@ class FileNameFormatter:
result = any([f'{{{key}}}' in test_string.lower() for key in FileNameFormatter.key_terms])
if result:
if 'POSTID' not in test_string:
logger.warning(
'Some files might not be downloaded due to name conflicts as filenames are'
' not guaranteed to be unique without {POSTID}')
logger.warning('Some files might not be downloaded due to name conflicts as filenames are'
' not guaranteed to be unique without {POSTID}')
return True
else:
return False


@ -81,7 +81,7 @@ class OAuth2Authenticator:
return client
@staticmethod
def send_message(client: socket.socket, message: str):
def send_message(client: socket.socket, message: str = ''):
client.send(f'HTTP/1.1 200 OK\r\n\r\n{message}'.encode('utf-8'))
client.close()


@ -8,7 +8,7 @@ from typing import Optional
import requests
from praw.models import Submission
from bdfr.exceptions import ResourceNotFound
from bdfr.exceptions import ResourceNotFound, SiteDownloaderError
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
@ -27,7 +27,11 @@ class BaseDownloader(ABC):
@staticmethod
def retrieve_url(url: str, cookies: dict = None, headers: dict = None) -> requests.Response:
res = requests.get(url, cookies=cookies, headers=headers)
try:
res = requests.get(url, cookies=cookies, headers=headers)
except requests.exceptions.RequestException as e:
logger.exception(e)
raise SiteDownloaderError(f'Failed to get page {url}')
if res.status_code != 200:
raise ResourceNotFound(f'Server responded with {res.status_code} to {url}')
return res


@ -9,13 +9,12 @@ from bdfr.exceptions import NotADownloadableLinkError
from bdfr.site_downloaders.base_downloader import BaseDownloader
from bdfr.site_downloaders.direct import Direct
from bdfr.site_downloaders.erome import Erome
from bdfr.site_downloaders.fallback_downloaders.youtubedl_fallback import YoutubeDlFallback
from bdfr.site_downloaders.gallery import Gallery
from bdfr.site_downloaders.gfycat import Gfycat
from bdfr.site_downloaders.gif_delivery_network import GifDeliveryNetwork
from bdfr.site_downloaders.imgur import Imgur
from bdfr.site_downloaders.redgifs import Redgifs
from bdfr.site_downloaders.self_post import SelfPost
from bdfr.site_downloaders.vreddit import VReddit
from bdfr.site_downloaders.youtube import Youtube
@ -33,22 +32,21 @@ class DownloadFactory:
return Gallery
elif re.match(r'gfycat\.', sanitised_url):
return Gfycat
elif re.match(r'gifdeliverynetwork', sanitised_url):
return GifDeliveryNetwork
elif re.match(r'(m\.)?imgur.*', sanitised_url):
return Imgur
elif re.match(r'redgifs.com', sanitised_url):
elif re.match(r'(redgifs|gifdeliverynetwork)', sanitised_url):
return Redgifs
elif re.match(r'reddit\.com/r/', sanitised_url):
return SelfPost
elif re.match(r'v\.redd\.it', sanitised_url):
return VReddit
elif re.match(r'(m\.)?youtu\.?be', sanitised_url):
return Youtube
elif re.match(r'i\.redd\.it.*', sanitised_url):
return Direct
elif YoutubeDlFallback.can_handle_link(sanitised_url):
return YoutubeDlFallback
else:
raise NotADownloadableLinkError(f'No downloader module exists for url {url}')
raise NotADownloadableLinkError(
f'No downloader module exists for url {url}')
@staticmethod
def _sanitise_url(url: str) -> str:


@ -0,0 +1,15 @@
#!/usr/bin/env python3
# coding=utf-8
from abc import ABC, abstractmethod
from bdfr.site_downloaders.base_downloader import BaseDownloader
class BaseFallbackDownloader(BaseDownloader, ABC):
@staticmethod
@abstractmethod
def can_handle_link(url: str) -> bool:
"""Returns whether the fallback downloader can download this link"""
raise NotImplementedError


@ -0,0 +1,40 @@
#!/usr/bin/env python3
# coding=utf-8
import logging
from typing import Optional
import youtube_dl
from praw.models import Submission
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
from bdfr.site_downloaders.fallback_downloaders.fallback_downloader import BaseFallbackDownloader
from bdfr.site_downloaders.youtube import Youtube
logger = logging.getLogger(__name__)
class YoutubeDlFallback(BaseFallbackDownloader, Youtube):
def __init__(self, post: Submission):
super(YoutubeDlFallback, self).__init__(post)
def find_resources(self, authenticator: Optional[SiteAuthenticator] = None) -> list[Resource]:
out = super()._download_video({})
return [out]
@staticmethod
def can_handle_link(url: str) -> bool:
yt_logger = logging.getLogger('youtube-dl')
yt_logger.setLevel(logging.CRITICAL)
with youtube_dl.YoutubeDL({
'logger': yt_logger,
}) as ydl:
try:
result = ydl.extract_info(url, download=False)
if result:
return True
except Exception as e:
logger.exception(e)
return False
return False


@ -10,10 +10,10 @@ from praw.models import Submission
from bdfr.exceptions import SiteDownloaderError
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
from bdfr.site_downloaders.gif_delivery_network import GifDeliveryNetwork
from bdfr.site_downloaders.redgifs import Redgifs
class Gfycat(GifDeliveryNetwork):
class Gfycat(Redgifs):
def __init__(self, post: Submission):
super().__init__(post)
@ -26,15 +26,15 @@ class Gfycat(GifDeliveryNetwork):
url = 'https://gfycat.com/' + gfycat_id
response = Gfycat.retrieve_url(url)
if 'gifdeliverynetwork' in response.url:
return GifDeliveryNetwork._get_link(url)
if re.search(r'(redgifs|gifdeliverynetwork)', response.url):
return Redgifs._get_link(url)
soup = BeautifulSoup(response.text, 'html.parser')
content = soup.find('script', attrs={'data-react-helmet': 'true', 'type': 'application/ld+json'})
try:
out = json.loads(content.contents[0])['video']['contentUrl']
except (IndexError, KeyError) as e:
except (IndexError, KeyError, AttributeError) as e:
raise SiteDownloaderError(f'Failed to download Gfycat link {url}: {e}')
except json.JSONDecodeError as e:
raise SiteDownloaderError(f'Did not receive valid JSON data: {e}')


@ -1,36 +0,0 @@
#!/usr/bin/env python3
from typing import Optional
import json
from bs4 import BeautifulSoup
from praw.models import Submission
from bdfr.exceptions import NotADownloadableLinkError, SiteDownloaderError
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
from bdfr.site_downloaders.base_downloader import BaseDownloader
class GifDeliveryNetwork(BaseDownloader):
def __init__(self, post: Submission):
super().__init__(post)
def find_resources(self, authenticator: Optional[SiteAuthenticator] = None) -> list[Resource]:
media_url = self._get_link(self.post.url)
return [Resource(self.post, media_url, '.mp4')]
@staticmethod
def _get_link(url: str) -> str:
page = GifDeliveryNetwork.retrieve_url(url)
soup = BeautifulSoup(page.text, 'html.parser')
content = soup.find('script', attrs={'data-react-helmet': 'true', 'type': 'application/ld+json'})
try:
content = json.loads(content.string)
out = content['video']['contentUrl']
except (json.JSONDecodeError, KeyError, TypeError):
raise SiteDownloaderError('Could not find source link')
return out


@ -7,7 +7,7 @@ from typing import Optional
import bs4
from praw.models import Submission
from bdfr.exceptions import NotADownloadableLinkError, SiteDownloaderError
from bdfr.exceptions import SiteDownloaderError
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
from bdfr.site_downloaders.base_downloader import BaseDownloader
@ -26,12 +26,12 @@ class Imgur(BaseDownloader):
if 'album_images' in self.raw_data:
images = self.raw_data['album_images']
for image in images['images']:
out.append(self._download_image(image))
out.append(self._compute_image_url(image))
else:
out.append(self._download_image(self.raw_data))
out.append(self._compute_image_url(self.raw_data))
return out
def _download_image(self, image: dict) -> Resource:
def _compute_image_url(self, image: dict) -> Resource:
image_url = 'https://i.imgur.com/' + image['hash'] + self._validate_extension(image['ext'])
return Resource(self.post, image_url)


@ -7,18 +7,19 @@ from typing import Optional
from bs4 import BeautifulSoup
from praw.models import Submission
from bdfr.exceptions import NotADownloadableLinkError, SiteDownloaderError
from bdfr.exceptions import SiteDownloaderError
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
from bdfr.site_downloaders.gif_delivery_network import GifDeliveryNetwork
from bdfr.site_downloaders.base_downloader import BaseDownloader
class Redgifs(GifDeliveryNetwork):
class Redgifs(BaseDownloader):
def __init__(self, post: Submission):
super().__init__(post)
def find_resources(self, authenticator: Optional[SiteAuthenticator] = None) -> list[Resource]:
return super().find_resources(authenticator)
media_url = self._get_link(self.post.url)
return [Resource(self.post, media_url, '.mp4')]
@staticmethod
def _get_link(url: str) -> str:
@ -27,24 +28,19 @@ class Redgifs(GifDeliveryNetwork):
except AttributeError:
raise SiteDownloaderError(f'Could not extract Redgifs ID from {url}')
url = 'https://redgifs.com/watch/' + redgif_id
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)'
' Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/90.0.4430.93 Safari/537.36',
}
page = Redgifs.retrieve_url(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
content = soup.find('script', attrs={'data-react-helmet': 'true', 'type': 'application/ld+json'})
content = Redgifs.retrieve_url(f'https://api.redgifs.com/v1/gfycats/{redgif_id}', headers=headers)
if content is None:
raise SiteDownloaderError('Could not read the page source')
try:
out = json.loads(content.contents[0])['video']['contentUrl']
except (IndexError, KeyError):
out = json.loads(content.text)['gfyItem']['mp4Url']
except (KeyError, AttributeError):
raise SiteDownloaderError('Failed to find JSON data in page')
except json.JSONDecodeError as e:
raise SiteDownloaderError(f'Received data was not valid JSON: {e}')


@ -1,21 +0,0 @@
#!/usr/bin/env python3
import logging
from typing import Optional
from praw.models import Submission
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
from bdfr.site_downloaders.youtube import Youtube
logger = logging.getLogger(__name__)
class VReddit(Youtube):
def __init__(self, post: Submission):
super().__init__(post)
def find_resources(self, authenticator: Optional[SiteAuthenticator] = None) -> list[Resource]:
out = super()._download_video({})
return [out]


@ -30,7 +30,10 @@ class Youtube(BaseDownloader):
return [out]
def _download_video(self, ytdl_options: dict) -> Resource:
yt_logger = logging.getLogger('youtube-dl')
yt_logger.setLevel(logging.CRITICAL)
ytdl_options['quiet'] = True
ytdl_options['logger'] = yt_logger
with tempfile.TemporaryDirectory() as temp_dir:
download_path = Path(temp_dir).resolve()
ytdl_options['outtmpl'] = str(download_path) + '/' + 'test.%(ext)s'

devscripts/configure.ps1 Normal file

@ -0,0 +1,2 @@
copy .\\bdfr\\default_config.cfg .\\test_config.cfg
echo "`nuser_token = $env:REDDIT_TOKEN" >> ./test_config.cfg

devscripts/configure.sh Executable file

@ -0,0 +1,2 @@
cp ./bdfr/default_config.cfg ./test_config.cfg
echo -e "\nuser_token = $REDDIT_TOKEN" >> ./test_config.cfg

scripts/README.md Normal file

@ -0,0 +1,69 @@
# Useful Scripts
Due to the verbosity of the logs, a great deal of information can be gathered quite easily from the BDFR's logfiles. This folder contains a selection of scripts that parse these logs and scrape useful bits of information. Since the logfiles consist of recurring patterns of strings, it is a fairly simple matter to write scripts that utilise tools included on most Linux systems.
- [Script to extract all successfully downloaded IDs](#extract-all-successfully-downloaded-ids)
- [Script to extract all failed download IDs](#extract-all-failed-ids)
- [Timestamp conversion](#converting-bdfrv1-timestamps-to-bdfrv2-timestamps)
- [Printing summary statistics for a run](#printing-summary-statistics)
## Extract all Successfully Downloaded IDs
This script is contained [here](extract_successful_ids.sh) and will produce a file containing the IDs of everything that was successfully downloaded without an error. That is, it creates a list of submissions that, with the `--exclude-id-file` option, can be supplied so that the BDFR will not attempt to redownload those submissions/comments. This is likely to cause a performance increase, especially when a BDFR run finds many resources.
The script can be used with the following signature:
```bash
./extract_successful_ids.sh LOGFILE_LOCATION <OUTPUT_FILE>
```
By default, if the second argument is not supplied, the script will write the results to `successful.txt`.
An example of the script being run on a Linux machine is the following:
```bash
./extract_successful_ids.sh ~/.config/bdfr/log_output.txt
```
## Extract all Failed IDs
[This script](extract_failed_ids.sh) will output a file of all IDs that failed to be downloaded from the logfile in question. This may be used to prevent subsequent runs of the BDFR from re-attempting those submissions if that is desired, potentially increasing performance.
The script can be used with the following signature:
```bash
./extract_failed_ids.sh LOGFILE_LOCATION <OUTPUT_FILE>
```
By default, if the second argument is not supplied, the script will write the results to `failed.txt`.
An example of the script being run on a Linux machine is the following:
```bash
./extract_failed_ids.sh ~/.config/bdfr/log_output.txt
```
## Converting BDFRv1 Timestamps to BDFRv2 Timestamps
BDFRv2 uses an internationally recognised and standardised timestamp format, namely ISO 8601, which is highly recommended because it is such a widespread and well-understood standard. BDFRv1, however, does not use this standard. Because of this, if you've used the old timestamp in filenames or folders, the BDFR will no longer recognise them as the same files and may redownload duplicate resources.
To prevent this, it is recommended that you rename existing files to ISO 8601 standard. This can be done using the [timestamp-converter](https://github.com/Serene-Arc/timestamp-converter) tool made for this purpose. Instructions specifically for the BDFR are available in that project.
## Printing Summary Statistics
A simple script has been included to print summary statistics for a run of the BDFR. This is mainly to showcase how easy it is to extract statistics from the logfiles. You can extend this quite easily; for example, you could print how often the Imgur module is used, how many 404 errors occurred in the last run, or which module has caused the most errors (see the sketch after the example output below). The possibilities really are endless.
```bash
./print_summary.sh LOGFILE_LOCATION
```
This will create an output like the following:
```
Downloaded submissions: 250
Failed downloads: 103
Files already downloaded: 20073
Hard linked submissions: 30
Excluded submissions: 1146
Files with existing hash skipped: 0
Submissions from excluded subreddits: 0
```
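As a hedged sketch of such an extension, assuming the log message formats shown elsewhere in this release (`Server responded with <code>` and `with downloader <ClassName>` in failure messages):
```bash
# Illustrative additions to print_summary.sh; patterns assume the formats above.
echo "Imgur downloader failures: $( grep -c 'with downloader Imgur' "$file" )"
echo "404 errors: $( grep -c 'Server responded with 404' "$file" )"
```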

scripts/extract_failed_ids.sh Executable file

@ -0,0 +1,18 @@
#!/bin/bash
if [ -e "$1" ]; then
file="$1"
else
echo 'CANNOT FIND LOG FILE'
exit 1
fi
if [ -n "$2" ]; then
output="$2"
echo "Outputting IDs to $output"
else
output="failed.txt"
fi
grep 'Could not download submission' "$file" | awk '{ print $12 }' | rev | cut -c 2- | rev >>"$output"
grep 'Failed to download resource' "$file" | awk '{ print $15 }' >>"$output"


@ -0,0 +1,17 @@
#!/bin/bash
if [ -e "$1" ]; then
file="$1"
else
echo 'CANNOT FIND LOG FILE'
exit 1
fi
if [ -n "$2" ]; then
output="$2"
echo "Outputting IDs to $output"
else
output="successful.txt"
fi
grep 'Downloaded submission' "$file" | awk '{ print $(NF-2) }' >> "$output"

scripts/print_summary.sh Executable file

@ -0,0 +1,16 @@
#!/bin/bash
if [ -e "$1" ]; then
file="$1"
else
echo 'CANNOT FIND LOG FILE'
exit 1
fi
echo "Downloaded submissions: $( grep -c 'Downloaded submission' "$file" )"
echo "Failed downloads: $( grep -c 'failed to download submission' "$file" )"
echo "Files already downloaded: $( grep -c 'already exists, continuing' "$file" )"
echo "Hard linked submissions: $( grep -c 'Hard link made' "$file" )"
echo "Excluded submissions: $( grep -c 'in exclusion list' "$file" )"
echo "Files with existing hash skipped: $( grep -c 'downloaded elsewhere' "$file" )"
echo "Submissions from excluded subreddits: $( grep -c 'in skip list' "$file" )"


@ -4,7 +4,7 @@ description_file = README.md
description_content_type = text/markdown
home_page = https://github.com/aliparlakci/bulk-downloader-for-reddit
keywords = reddit, download, archive
version = 2.0.3
version = 2.1.0
author = Ali Parlakci
author_email = parlakciali@gmail.com
maintainer = Serene Arc
@ -16,7 +16,6 @@ classifiers =
Natural Language :: English
Environment :: Console
Operating System :: OS Independent
requires_python = >=3.9
platforms = any
[files]


@ -3,4 +3,4 @@
from setuptools import setup
setup(setup_requires=['pbr', 'appdirs'], pbr=True, data_files=[('config', ['bdfr/default_config.cfg'])])
setup(setup_requires=['pbr', 'appdirs'], pbr=True, data_files=[('config', ['bdfr/default_config.cfg'])], python_requires='>=3.9.0')


@ -13,7 +13,11 @@ from bdfr.oauth2 import OAuth2TokenManager
@pytest.fixture(scope='session')
def reddit_instance():
rd = praw.Reddit(client_id='U-6gk4ZCh3IeNQ', client_secret='7CZHY6AmKweZME5s50SfDGylaPg', user_agent='test')
rd = praw.Reddit(
client_id='U-6gk4ZCh3IeNQ',
client_secret='7CZHY6AmKweZME5s50SfDGylaPg',
user_agent='test',
)
return rd
@ -27,8 +31,10 @@ def authenticated_reddit_instance():
if not cfg_parser.has_option('DEFAULT', 'user_token'):
pytest.skip('Refresh token must be provided to authenticate with OAuth2')
token_manager = OAuth2TokenManager(cfg_parser, test_config_path)
reddit_instance = praw.Reddit(client_id=cfg_parser.get('DEFAULT', 'client_id'),
client_secret=cfg_parser.get('DEFAULT', 'client_secret'),
user_agent=socket.gethostname(),
token_manager=token_manager)
reddit_instance = praw.Reddit(
client_id=cfg_parser.get('DEFAULT', 'client_id'),
client_secret=cfg_parser.get('DEFAULT', 'client_secret'),
user_agent=socket.gethostname(),
token_manager=token_manager,
)
return reddit_instance


@ -0,0 +1,37 @@
#!/usr/bin/env python3
from unittest.mock import MagicMock
import pytest
from bdfr.resource import Resource
from bdfr.site_downloaders.fallback_downloaders.youtubedl_fallback import YoutubeDlFallback
@pytest.mark.online
@pytest.mark.parametrize(('test_url', 'expected'), (
('https://www.reddit.com/r/specializedtools/comments/n2nw5m/bamboo_splitter/', True),
('https://www.youtube.com/watch?v=P19nvJOmqCc', True),
('https://www.example.com/test', False),
))
def test_can_handle_link(test_url: str, expected: bool):
result = YoutubeDlFallback.can_handle_link(test_url)
assert result == expected
@pytest.mark.online
@pytest.mark.slow
@pytest.mark.parametrize(('test_url', 'expected_hash'), (
('https://streamable.com/dt46y', '1e7f4928e55de6e3ca23d85cc9246bbb'),
('https://streamable.com/t8sem', '49b2d1220c485455548f1edbc05d4ecf'),
('https://www.reddit.com/r/specializedtools/comments/n2nw5m/bamboo_splitter/', '21968d3d92161ea5e0abdcaf6311b06c'),
('https://v.redd.it/9z1dnk3xr5k61', '351a2b57e888df5ccbc508056511f38d'),
))
def test_find_resources(test_url: str, expected_hash: str):
test_submission = MagicMock()
test_submission.url = test_url
downloader = YoutubeDlFallback(test_submission)
resources = downloader.find_resources()
assert len(resources) == 1
assert isinstance(resources[0], Resource)
assert resources[0].hash.hexdigest() == expected_hash


@ -9,18 +9,17 @@ from bdfr.site_downloaders.base_downloader import BaseDownloader
from bdfr.site_downloaders.direct import Direct
from bdfr.site_downloaders.download_factory import DownloadFactory
from bdfr.site_downloaders.erome import Erome
from bdfr.site_downloaders.fallback_downloaders.youtubedl_fallback import YoutubeDlFallback
from bdfr.site_downloaders.gallery import Gallery
from bdfr.site_downloaders.gfycat import Gfycat
from bdfr.site_downloaders.gif_delivery_network import GifDeliveryNetwork
from bdfr.site_downloaders.imgur import Imgur
from bdfr.site_downloaders.redgifs import Redgifs
from bdfr.site_downloaders.self_post import SelfPost
from bdfr.site_downloaders.vreddit import VReddit
from bdfr.site_downloaders.youtube import Youtube
@pytest.mark.online
@pytest.mark.parametrize(('test_submission_url', 'expected_class'), (
('https://v.redd.it/9z1dnk3xr5k61', VReddit),
('https://www.reddit.com/r/TwoXChromosomes/comments/lu29zn/i_refuse_to_live_my_life'
'_in_anything_but_comfort/', SelfPost),
('https://i.imgur.com/bZx1SJQ.jpg', Direct),
@ -35,12 +34,16 @@ from bdfr.site_downloaders.youtube import Youtube
('https://www.erome.com/a/NWGw0F09', Erome),
('https://youtube.com/watch?v=Gv8Wz74FjVA', Youtube),
('https://redgifs.com/watch/courageousimpeccablecanvasback', Redgifs),
('https://www.gifdeliverynetwork.com/repulsivefinishedandalusianhorse', GifDeliveryNetwork),
('https://www.gifdeliverynetwork.com/repulsivefinishedandalusianhorse', Redgifs),
('https://youtu.be/DevfjHOhuFc', Youtube),
('https://m.youtube.com/watch?v=kr-FeojxzUM', Youtube),
('https://i.imgur.com/3SKrQfK.jpg?1', Direct),
('https://dynasty-scans.com/system/images_images/000/017/819/original/80215103_p0.png?1612232781', Direct),
('https://m.imgur.com/a/py3RW0j', Imgur),
('https://v.redd.it/9z1dnk3xr5k61', YoutubeDlFallback),
('https://streamable.com/dt46y', YoutubeDlFallback),
('https://vimeo.com/channels/31259/53576664', YoutubeDlFallback),
('http://video.pbs.org/viralplayer/2365173446/', YoutubeDlFallback),
))
def test_factory_lever_good(test_submission_url: str, expected_class: BaseDownloader, reddit_instance: praw.Reddit):
result = DownloadFactory.pull_lever(test_submission_url)


@ -1,37 +0,0 @@
#!/usr/bin/env python3
# coding=utf-8
from unittest.mock import Mock
import pytest
from bdfr.resource import Resource
from bdfr.site_downloaders.gif_delivery_network import GifDeliveryNetwork
@pytest.mark.online
@pytest.mark.parametrize(('test_url', 'expected'), (
('https://www.gifdeliverynetwork.com/regalshoddyhorsechestnutleafminer',
'https://thumbs2.redgifs.com/RegalShoddyHorsechestnutleafminer.mp4'),
('https://www.gifdeliverynetwork.com/maturenexthippopotamus',
'https://thumbs2.redgifs.com/MatureNextHippopotamus.mp4'),
))
def test_get_link(test_url: str, expected: str):
result = GifDeliveryNetwork._get_link(test_url)
assert result == expected
@pytest.mark.online
@pytest.mark.parametrize(('test_url', 'expected_hash'), (
('https://www.gifdeliverynetwork.com/maturenexthippopotamus', '9bec0a9e4163a43781368ed5d70471df'),
('https://www.gifdeliverynetwork.com/regalshoddyhorsechestnutleafminer', '8afb4e2c090a87140230f2352bf8beba'),
))
def test_download_resource(test_url: str, expected_hash: str):
mock_submission = Mock()
mock_submission.url = test_url
test_site = GifDeliveryNetwork(mock_submission)
resources = test_site.find_resources()
assert len(resources) == 1
assert isinstance(resources[0], Resource)
resources[0].download(120)
assert resources[0].hash.hexdigest() == expected_hash


@ -15,6 +15,10 @@ from bdfr.site_downloaders.redgifs import Redgifs
'https://thumbs2.redgifs.com/FrighteningVictoriousSalamander.mp4'),
('https://redgifs.com/watch/springgreendecisivetaruca',
'https://thumbs2.redgifs.com/SpringgreenDecisiveTaruca.mp4'),
('https://www.gifdeliverynetwork.com/regalshoddyhorsechestnutleafminer',
'https://thumbs2.redgifs.com/RegalShoddyHorsechestnutleafminer.mp4'),
('https://www.gifdeliverynetwork.com/maturenexthippopotamus',
'https://thumbs2.redgifs.com/MatureNextHippopotamus.mp4'),
))
def test_get_link(test_url: str, expected: str):
result = Redgifs._get_link(test_url)
@ -25,6 +29,8 @@ def test_get_link(test_url: str, expected: str):
@pytest.mark.parametrize(('test_url', 'expected_hash'), (
('https://redgifs.com/watch/frighteningvictorioussalamander', '4007c35d9e1f4b67091b5f12cffda00a'),
('https://redgifs.com/watch/springgreendecisivetaruca', '8dac487ac49a1f18cc1b4dabe23f0869'),
('https://www.gifdeliverynetwork.com/maturenexthippopotamus', '9bec0a9e4163a43781368ed5d70471df'),
('https://www.gifdeliverynetwork.com/regalshoddyhorsechestnutleafminer', '8afb4e2c090a87140230f2352bf8beba'),
))
def test_download_resource(test_url: str, expected_hash: str):
mock_submission = Mock()


@ -1,23 +0,0 @@
#!/usr/bin/env python3
# coding=utf-8
import praw
import pytest
from bdfr.resource import Resource
from bdfr.site_downloaders.vreddit import VReddit
@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.parametrize(('test_submission_id'), (
('lu8l8g'),
))
def test_find_resources(test_submission_id: str, reddit_instance: praw.Reddit):
test_submission = reddit_instance.submission(id=test_submission_id)
downloader = VReddit(test_submission)
resources = downloader.find_resources()
assert len(resources) == 1
assert isinstance(resources[0], Resource)
resources[0].download(120)
assert resources[0].content is not None


@ -1,9 +1,12 @@
#!/usr/bin/env python3
# coding=utf-8
from unittest.mock import MagicMock
import pytest
from bdfr.download_filter import DownloadFilter
from bdfr.resource import Resource
@pytest.fixture()
@ -11,13 +14,14 @@ def download_filter() -> DownloadFilter:
return DownloadFilter(['mp4', 'mp3'], ['test.com', 'reddit.com'])
@pytest.mark.parametrize(('test_url', 'expected'), (
('test.mp4', False),
('test.avi', True),
('test.random.mp3', False),
@pytest.mark.parametrize(('test_extension', 'expected'), (
('.mp4', False),
('.avi', True),
('.random.mp3', False),
('mp4', False),
))
def test_filter_extension(test_url: str, expected: bool, download_filter: DownloadFilter):
result = download_filter._check_extension(test_url)
def test_filter_extension(test_extension: str, expected: bool, download_filter: DownloadFilter):
result = download_filter._check_extension(test_extension)
assert result == expected
@ -42,7 +46,8 @@ def test_filter_domain(test_url: str, expected: bool, download_filter: DownloadF
('http://reddit.com/test.gif', False),
))
def test_filter_all(test_url: str, expected: bool, download_filter: DownloadFilter):
result = download_filter.check_url(test_url)
test_resource = Resource(MagicMock(), test_url)
result = download_filter.check_resource(test_resource)
assert result == expected
@ -54,5 +59,6 @@ def test_filter_all(test_url: str, expected: bool, download_filter: DownloadFilt
))
def test_filter_empty_filter(test_url: str):
download_filter = DownloadFilter()
result = download_filter.check_url(test_url)
test_resource = Resource(MagicMock(), test_url)
result = download_filter.check_resource(test_resource)
assert result is True


@ -22,6 +22,7 @@ from bdfr.site_authenticator import SiteAuthenticator
@pytest.fixture()
def args() -> Configuration:
args = Configuration()
args.time_format = 'ISO'
return args
@ -458,3 +459,75 @@ def test_read_excluded_submission_ids_from_file(downloader_mock: MagicMock, tmp_
downloader_mock.args.exclude_id_file = [test_file]
results = RedditDownloader._read_excluded_ids(downloader_mock)
assert results == {'aaaaaa', 'bbbbbb'}
@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.parametrize('test_redditor_name', (
'Paracortex',
'crowdstrike',
'HannibalGoddamnit',
))
def test_check_user_existence_good(
test_redditor_name: str,
reddit_instance: praw.Reddit,
downloader_mock: MagicMock,
):
downloader_mock.reddit_instance = reddit_instance
RedditDownloader._check_user_existence(downloader_mock, test_redditor_name)
@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.parametrize('test_redditor_name', (
'lhnhfkuhwreolo',
'adlkfmnhglojh',
))
def test_check_user_existence_nonexistent(
test_redditor_name: str,
reddit_instance: praw.Reddit,
downloader_mock: MagicMock,
):
downloader_mock.reddit_instance = reddit_instance
with pytest.raises(BulkDownloaderException, match='Could not find'):
RedditDownloader._check_user_existence(downloader_mock, test_redditor_name)
@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.parametrize('test_redditor_name', (
'Bree-Boo',
))
def test_check_user_existence_banned(
test_redditor_name: str,
reddit_instance: praw.Reddit,
downloader_mock: MagicMock,
):
downloader_mock.reddit_instance = reddit_instance
with pytest.raises(BulkDownloaderException, match='is banned'):
RedditDownloader._check_user_existence(downloader_mock, test_redditor_name)
@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.parametrize(('test_subreddit_name', 'expected_message'), (
('donaldtrump', 'cannot be found'),
('submitters', 'private and cannot be scraped')
))
def test_check_subreddit_status_bad(test_subreddit_name: str, expected_message: str, reddit_instance: praw.Reddit):
test_subreddit = reddit_instance.subreddit(test_subreddit_name)
with pytest.raises(BulkDownloaderException, match=expected_message):
RedditDownloader._check_subreddit_status(test_subreddit)
@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.parametrize('test_subreddit_name', (
'Python',
'Mindustry',
'TrollXChromosomes',
'all',
))
def test_check_subreddit_status_good(test_subreddit_name: str, reddit_instance: praw.Reddit):
test_subreddit = reddit_instance.subreddit(test_subreddit_name)
RedditDownloader._check_subreddit_status(test_subreddit)


@ -1,9 +1,11 @@
#!/usr/bin/env python3
# coding=utf-8
from datetime import datetime
from pathlib import Path
from typing import Optional
from unittest.mock import MagicMock
import platform
import praw.models
import pytest
@ -21,29 +23,45 @@ def submission() -> MagicMock:
test.id = '12345'
test.score = 1000
test.link_flair_text = 'test_flair'
test.created_utc = 123456789
test.created_utc = datetime(2021, 4, 21, 9, 30, 0).timestamp()
test.__class__ = praw.models.Submission
return test
def do_test_string_equality(result: str, expected: str) -> bool:
if platform.system() == 'Windows':
expected = FileNameFormatter._format_for_windows(expected)
return expected == result
def do_test_path_equality(result: Path, expected: str) -> bool:
if platform.system() == 'Windows':
expected = expected.split('/')
expected = [FileNameFormatter._format_for_windows(part) for part in expected]
expected = Path(*expected)
else:
expected = Path(expected)
return result == expected
@pytest.fixture(scope='session')
def reddit_submission(reddit_instance: praw.Reddit) -> praw.models.Submission:
return reddit_instance.submission(id='lgilgt')
@pytest.mark.parametrize(('format_string', 'expected'), (
@pytest.mark.parametrize(('test_format_string', 'expected'), (
('{SUBREDDIT}', 'randomreddit'),
('{REDDITOR}', 'person'),
('{POSTID}', '12345'),
('{UPVOTES}', '1000'),
('{FLAIR}', 'test_flair'),
('{DATE}', '123456789'),
('{DATE}', '2021-04-21T09:30:00'),
('{REDDITOR}_{TITLE}_{POSTID}', 'person_name_12345'),
('{RANDOM}', '{RANDOM}'),
))
def test_format_name_mock(format_string: str, expected: str, submission: MagicMock):
result = FileNameFormatter._format_name(submission, format_string)
assert result == expected
def test_format_name_mock(test_format_string: str, expected: str, submission: MagicMock):
test_formatter = FileNameFormatter(test_format_string, '', 'ISO')
result = test_formatter._format_name(submission, test_format_string)
assert do_test_string_equality(result, expected)
@pytest.mark.parametrize(('test_string', 'expected'), (
@ -62,7 +80,7 @@ def test_check_format_string_validity(test_string: str, expected: bool):
@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.parametrize(('format_string', 'expected'), (
@pytest.mark.parametrize(('test_format_string', 'expected'), (
('{SUBREDDIT}', 'Mindustry'),
('{REDDITOR}', 'Gamer_player_boi'),
('{POSTID}', 'lgilgt'),
@ -70,9 +88,10 @@ def test_check_format_string_validity(test_string: str, expected: bool):
('{SUBREDDIT}_{TITLE}', 'Mindustry_Toxopid that is NOT humane >:('),
('{REDDITOR}_{TITLE}_{POSTID}', 'Gamer_player_boi_Toxopid that is NOT humane >:(_lgilgt')
))
def test_format_name_real(format_string: str, expected: str, reddit_submission: praw.models.Submission):
result = FileNameFormatter._format_name(reddit_submission, format_string)
assert result == expected
def test_format_name_real(test_format_string: str, expected: str, reddit_submission: praw.models.Submission):
test_formatter = FileNameFormatter(test_format_string, '', '')
result = test_formatter._format_name(reddit_submission, test_format_string)
assert do_test_string_equality(result, expected)
@pytest.mark.online
@ -100,9 +119,9 @@ def test_format_full(
expected: str,
reddit_submission: praw.models.Submission):
test_resource = Resource(reddit_submission, 'i.reddit.com/blabla.png')
test_formatter = FileNameFormatter(format_string_file, format_string_directory)
test_formatter = FileNameFormatter(format_string_file, format_string_directory, 'ISO')
result = test_formatter.format_path(test_resource, Path('test'))
assert str(result) == expected
assert do_test_path_equality(result, expected)
@pytest.mark.online
@ -117,7 +136,7 @@ def test_format_full_conform(
format_string_file: str,
reddit_submission: praw.models.Submission):
test_resource = Resource(reddit_submission, 'i.reddit.com/blabla.png')
test_formatter = FileNameFormatter(format_string_file, format_string_directory)
test_formatter = FileNameFormatter(format_string_file, format_string_directory, 'ISO')
test_formatter.format_path(test_resource, Path('test'))
@ -137,9 +156,9 @@ def test_format_full_with_index_suffix(
reddit_submission: praw.models.Submission,
):
test_resource = Resource(reddit_submission, 'i.reddit.com/blabla.png')
test_formatter = FileNameFormatter(format_string_file, format_string_directory)
test_formatter = FileNameFormatter(format_string_file, format_string_directory, 'ISO')
result = test_formatter.format_path(test_resource, Path('test'), index)
assert str(result) == expected
assert do_test_path_equality(result, expected)
def test_format_multiple_resources():
@@ -151,7 +170,7 @@ def test_format_multiple_resources():
new_mock.source_submission.title = 'test'
new_mock.source_submission.__class__ = praw.models.Submission
mocks.append(new_mock)
test_formatter = FileNameFormatter('{TITLE}', '')
test_formatter = FileNameFormatter('{TITLE}', '', 'ISO')
results = test_formatter.format_resource_paths(mocks, Path('.'))
results = set([str(res[0]) for res in results])
assert results == {'test_1.png', 'test_2.png', 'test_3.png', 'test_4.png'}
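# All four mock resources share the title 'test', so the formatter must append
# an index suffix to keep the generated paths distinct.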
@@ -195,7 +214,7 @@ def test_shorten_filenames(submission: MagicMock, tmp_path: Path):
submission.subreddit.display_name = 'test'
submission.id = 'BBBBBB'
test_resource = Resource(submission, 'www.example.com/empty', '.jpeg')
test_formatter = FileNameFormatter('{REDDITOR}_{TITLE}_{POSTID}', '{SUBREDDIT}')
test_formatter = FileNameFormatter('{REDDITOR}_{TITLE}_{POSTID}', '{SUBREDDIT}', 'ISO')
result = test_formatter.format_path(test_resource, tmp_path)
result.parent.mkdir(parents=True)
result.touch()
@@ -236,7 +255,8 @@ def test_strip_emojies(test_string: str, expected: str):
))
def test_generate_dict_for_submission(test_submission_id: str, expected: dict, reddit_instance: praw.Reddit):
test_submission = reddit_instance.submission(id=test_submission_id)
result = FileNameFormatter._generate_name_dict_from_submission(test_submission)
test_formatter = FileNameFormatter('{TITLE}', '', 'ISO')
result = test_formatter._generate_name_dict_from_submission(test_submission)
assert all([result.get(key) == expected[key] for key in expected.keys()])
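# The name-dict builder is likewise called on a formatter instance now instead
# of as a static helper.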
@@ -252,7 +272,8 @@ def test_generate_dict_for_submission(test_submission_id: str, expected: dict, reddit_instance: praw.Reddit):
))
def test_generate_dict_for_comment(test_comment_id: str, expected: dict, reddit_instance: praw.Reddit):
test_comment = reddit_instance.comment(id=test_comment_id)
result = FileNameFormatter._generate_name_dict_from_comment(test_comment)
test_formatter = FileNameFormatter('{TITLE}', '', 'ISO')
result = test_formatter._generate_name_dict_from_comment(test_comment)
assert all([result.get(key) == expected[key] for key in expected.keys()])
@@ -271,10 +292,10 @@ def test_format_archive_entry_comment(
reddit_instance: praw.Reddit,
):
test_comment = reddit_instance.comment(id=test_comment_id)
test_formatter = FileNameFormatter(test_file_scheme, test_folder_scheme)
test_formatter = FileNameFormatter(test_file_scheme, test_folder_scheme, 'ISO')
test_entry = Resource(test_comment, '', '.json')
result = test_formatter.format_path(test_entry, tmp_path)
assert result.name == expected_name
assert do_test_string_equality(result.name, expected_name)
@pytest.mark.parametrize(('test_folder_scheme', 'expected'), (
@@ -287,13 +308,13 @@ def test_multilevel_folder_scheme(
tmp_path: Path,
submission: MagicMock,
):
test_formatter = FileNameFormatter('{POSTID}', test_folder_scheme)
test_formatter = FileNameFormatter('{POSTID}', test_folder_scheme, 'ISO')
test_resource = MagicMock()
test_resource.source_submission = submission
test_resource.extension = '.png'
result = test_formatter.format_path(test_resource, tmp_path)
result = result.relative_to(tmp_path)
assert str(result.parent) == expected
assert do_test_path_equality(result.parent, expected)
assert len(result.parents) == (len(expected.split('/')) + 1)
@@ -307,8 +328,9 @@ def test_multilevel_folder_scheme(
))
def test_preserve_emojis(test_name_string: str, expected: str, submission: MagicMock):
submission.title = test_name_string
result = FileNameFormatter._format_name(submission, '{TITLE}')
assert result == expected
test_formatter = FileNameFormatter('{TITLE}', '', 'ISO')
result = test_formatter._format_name(submission, '{TITLE}')
assert do_test_string_equality(result, expected)
@pytest.mark.parametrize(('test_string', 'expected'), (
@@ -318,3 +340,27 @@ def test_preserve_emojis(test_name_string: str, expected: str, submission: MagicMock):
def test_convert_unicode_escapes(test_string: str, expected: str):
result = FileNameFormatter._convert_unicode_escapes(test_string)
assert result == expected
@pytest.mark.parametrize(('test_datetime', 'expected'), (
(datetime(2020, 1, 1, 8, 0, 0), '2020-01-01T08:00:00'),
(datetime(2020, 1, 1, 8, 0), '2020-01-01T08:00:00'),
(datetime(2021, 4, 21, 8, 30, 21), '2021-04-21T08:30:21'),
))
def test_convert_timestamp(test_datetime: datetime, expected: str):
test_timestamp = test_datetime.timestamp()
test_formatter = FileNameFormatter('{POSTID}', '', 'ISO')
result = test_formatter._convert_timestamp(test_timestamp)
assert result == expected
@pytest.mark.parametrize(('test_time_format', 'expected'), (
('ISO', '2021-05-02T13:33:00'),
('%Y_%m', '2021_05'),
('%Y-%m-%d', '2021-05-02'),
))
def test_time_string_formats(test_time_format: str, expected: str):
test_time = datetime(2021, 5, 2, 13, 33)
test_formatter = FileNameFormatter('{TITLE}', '', test_time_format)
result = test_formatter._convert_timestamp(test_time.timestamp())
assert result == expected
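# The time format string accepts either the 'ISO' preset or a strftime-style
# pattern, as the parametrized cases above show (e.g. '%Y_%m' -> '2021_05').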

View file
@@ -11,6 +11,28 @@ from bdfr.__main__ import cli
does_test_config_exist = Path('test_config.cfg').exists()
def create_basic_args_for_download_runner(test_args: list[str], tmp_path: Path):
out = [
'download', str(tmp_path),
'-v',
'--config', 'test_config.cfg',
'--log', str(Path(tmp_path, 'test_log.txt')),
] + test_args
return out
def create_basic_args_for_archive_runner(test_args: list[str], tmp_path: Path):
out = [
'archive',
str(tmp_path),
'-v',
'--config', 'test_config.cfg',
'--log', str(Path(tmp_path, 'test_log.txt')),
] + test_args
return out
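# Both helpers prepend the shared flags (-v, --config, --log) to each test's
# arguments; pointing --log at a file inside tmp_path gives every test run its
# own logfile.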
@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.skipif(not does_test_config_exist, reason='A test config file is required for integration tests')
@@ -35,7 +57,7 @@ does_test_config_exist = Path('test_config.cfg').exists()
))
def test_cli_download_subreddits(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'Added submissions from subreddit ' in result.output
@@ -53,7 +75,7 @@ def test_cli_download_subreddits(test_args: list[str], tmp_path: Path):
))
def test_cli_download_links(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
@@ -69,7 +91,7 @@ def test_cli_download_links(test_args: list[str], tmp_path: Path):
))
def test_cli_download_multireddit(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'Added submissions from multireddit ' in result.output
@@ -83,7 +105,7 @@ def test_cli_download_multireddit(test_args: list[str], tmp_path: Path):
))
def test_cli_download_multireddit_nonexistent(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'Failed to get submissions for multireddit' in result.output
@@ -104,7 +126,7 @@ def test_cli_download_multireddit_nonexistent(test_args: list[str], tmp_path: Path):
))
def test_cli_download_user_data_good(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'Downloaded submission ' in result.output
@@ -119,7 +141,7 @@ def test_cli_download_user_data_good(test_args: list[str], tmp_path: Path):
))
def test_cli_download_user_data_bad_me_unauthenticated(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'To use "me" as a user, an authenticated Reddit instance must be used' in result.output
@@ -134,7 +156,7 @@ def test_cli_download_user_data_bad_me_unauthenticated(test_args: list[str], tmp_path: Path):
def test_cli_download_search_existing(test_args: list[str], tmp_path: Path):
Path(tmp_path, 'test.txt').touch()
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'Calculating hashes for' in result.output
@@ -145,13 +167,14 @@ def test_cli_download_search_existing(test_args: list[str], tmp_path: Path):
@pytest.mark.skipif(not does_test_config_exist, reason='A test config file is required for integration tests')
@pytest.mark.parametrize('test_args', (
['--subreddit', 'tumblr', '-L', '25', '--skip', 'png', '--skip', 'jpg'],
['--subreddit', 'MaliciousCompliance', '-L', '25', '--skip', 'txt'],
))
def test_cli_download_download_filters(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'Download filter removed submission' in result.output
assert 'Download filter removed ' in result.output
@pytest.mark.online
@@ -163,7 +186,7 @@ def test_cli_download_download_filters(test_args: list[str], tmp_path: Path):
))
def test_cli_download_long(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
@@ -173,11 +196,12 @@ def test_cli_download_long(test_args: list[str], tmp_path: Path):
@pytest.mark.skipif(not does_test_config_exist, reason='A test config file is required for integration tests')
@pytest.mark.parametrize('test_args', (
['-l', 'gstd4hk'],
['-l', 'm2601g'],
['-l', 'm2601g', '-f', 'yaml'],
['-l', 'n60t4c', '-f', 'xml'],
))
def test_cli_archive_single(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['archive', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_archive_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert re.search(r'Writing entry .*? to file in .*? format', result.output)
@@ -196,7 +220,7 @@ def test_cli_archive_single(test_args: list[str], tmp_path: Path):
))
def test_cli_archive_subreddit(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['archive', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_archive_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert re.search(r'Writing entry .*? to file in .*? format', result.output)
@@ -210,7 +234,7 @@ def test_cli_archive_subreddit(test_args: list[str], tmp_path: Path):
))
def test_cli_archive_all_user_comments(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['archive', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_archive_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
@@ -225,7 +249,7 @@ def test_cli_archive_all_user_comments(test_args: list[str], tmp_path: Path):
))
def test_cli_archive_long(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['archive', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_archive_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert re.search(r'Writing entry .*? to file in .*? format', result.output)
@@ -239,10 +263,12 @@ def test_cli_archive_long(test_args: list[str], tmp_path: Path):
['--user', 'sdclhgsolgjeroij', '--submitted', '-L', 10],
['--user', 'me', '--upvoted', '-L', 10],
['--user', 'sdclhgsolgjeroij', '--upvoted', '-L', 10],
['--subreddit', 'submitters', '-L', 10], # Private subreddit
['--subreddit', 'donaldtrump', '-L', 10], # Banned subreddit
))
def test_cli_download_soft_fail(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
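# Soft failures (nonexistent users, private or banned subreddits) must still
# exit with code 0, in contrast to the hard-fail cases below.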
@@ -257,7 +283,7 @@ def test_cli_download_soft_fail(test_args: list[str], tmp_path: Path):
))
def test_cli_download_hard_fail(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code != 0
@@ -277,7 +303,7 @@ def test_cli_download_use_default_config(tmp_path: Path):
))
def test_cli_download_links_exclusion(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'in exclusion list' in result.output
@@ -293,7 +319,7 @@ def test_cli_download_links_exclusion(test_args: list[str], tmp_path: Path):
))
def test_cli_download_subreddit_exclusion(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'in skip list' in result.output
@@ -309,7 +335,7 @@ def test_cli_download_subreddit_exclusion(test_args: list[str], tmp_path: Path):
))
def test_cli_file_scheme_warning(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = ['download', str(tmp_path), '-v', '--config', 'test_config.cfg'] + test_args
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'Some files might not be downloaded due to name conflicts' in result.output