mirror of synced 2024-05-29 00:19:59 +12:00

Merge pull request #30 from aliparlakci/SelfDownloader

- Added self post download feature
- Made the searching process quicker by writing posts to file at the end of the search
- Added long file bug solution to remaining download classes
- Updated the README file to make it minimal
This commit is contained in:
aliparlakci 2018-07-10 02:46:37 +03:00 committed by GitHub
commit 975246c7f0
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 113 additions and 48 deletions

View file

@ -5,23 +5,28 @@ This program downloads imgur, gfycat and direct image and video links of saved p
## Table of Contents
- [What it can do?](#what-it-can-do)
- [Requirements](#requirements)
- [Setting up the script](#setting-up-the-script)
- [Creating an imgur app](#creating-an-imgur-app)
- [Program Modes](#program-modes)
- [saved mode](#saved-mode)
- [submitted mode](#submitted-mode)
- [upvoted mode](#upvoted-mode)
- [subreddit mode](#subreddit-mode)
- [multireddit mode](#multireddit-mode)
- [link mode](#link-mode)
- [log read mode](#log-read-mode)
- [Running the script](#running-the-script)
- [Using the command line arguments](#using-the-command-line-arguments)
- [Examples](#examples)
- [FAQ](#faq)
- [Changelog](#changelog)
- [release-1.0.0](#release-100)
## What it can do?
### It...
- can get posts from: frontpage, subreddits, multireddits, redditor's submissions, upvoted and saved posts; search results or just plain reddit links
- sorts posts by hot, top, new and so on
- downloads imgur albums, gfycat links, [self posts](#i-cant-open-the-self-posts) and any link to a direct image
- skips the existing ones
- puts post titles in file names
- puts every post in its subreddit's folder
- saves a reusable copy of the details of the posts it finds so that they can be downloaded again
- logs failed ones in a file so that you can try to download them later
- can be run with double-clicking on Windows (but I don't recommend it)
## Requirements
- Python 3.x*
@ -49,38 +54,27 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl
## Program Modes
All the program modes are activated with command-line arguments as shown [here](#using-the-command-line-arguments)
### saved mode
In saved mode, the program gets posts from given user's saved posts.
### submitted mode
In submitted mode, the program gets posts from given user's submitted posts.
### upvoted mode
In upvoted mode, the program gets posts from given user's upvoted posts.
### subreddit mode
In subreddit mode, the program gets posts from given subreddits* that is sorted by given type and limited by given number.
Multiple subreddits can be given
*You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).*
### multireddit mode
In multireddit mode, the program gets posts from given user's given multireddit that is sorted by given type and limited by given number.
### link mode
In link mode, the program gets posts from given reddit link.
You may customize the behaviour with `--sort`, `--time`, `--limit`.
*You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).*
## log read mode
Two log files are created each time *script.py* runs.
- **POSTS** Saves all the posts without filtering.
- **FAILED** Keeps track of posts that are tried to be downloaded but failed.
In log read mode, the program takes a log file that it created itself, reads the posts in it and tries downloading them again.
Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur.
- **saved mode**
- Gets posts from given user's saved posts.
- **submitted mode**
- Gets posts from given user's submitted posts.
- **upvoted mode**
- Gets posts from given user's upvoted posts.
- **subreddit mode**
- Gets posts from the given subreddit or subreddits, sorted by the given type and limited to the given number.
- You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).
- **multireddit mode**
- Gets posts from the given user's given multireddit, sorted by the given type and limited to the given number.
- **link mode**
- Gets posts from given reddit link.
- You may customize the behaviour with `--sort`, `--time`, `--limit`.
- You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).
- **log read mode**
- Takes a log file that the script created itself (json files), reads the posts in it and tries downloading them again.
- Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur.
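As a rough illustration of what log read mode has to do, the sketch below pulls the post dictionaries back out of a log. It assumes the log is a single JSON object whose values are lists ending in the post details, which matches the entries the script adds (`{number: [details]}` for POSTS and `{number: [error, details]}` for FAILED). `posts_from_log` is a hypothetical helper, not a function in the script, and the real log files may carry extra fields.

```python
import json


def posts_from_log(text):
    """Return the post dictionaries stored in a POSTS- or FAILED-style log.

    Assumed layout (not confirmed by the script): one JSON object whose
    values are lists, with the post details as the last list element.
    """
    entries = json.loads(text)
    posts = []
    for value in entries.values():
        # Skip anything that is not a non-empty list, e.g. header fields.
        if isinstance(value, list) and value:
            posts.append(value[-1])
    return posts
```

The returned list could then be handed to the same download step that a normal run uses.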
## Running the script
**WARNING** *DO NOT* let more than *1* instance of script run as it interferes with IMGUR Request Rate.
**DO NOT** let more than one instance of the script run as it interferes with IMGUR Request Rate.
### Using the command line arguments
If no arguments are passed, the program prompts you for the arguments below, which means you can start the script by double-clicking it (at least on Windows).
@ -89,7 +83,7 @@ Open up the [terminal](https://www.reddit.com/r/NSFW411/comments/8vtnl8/meta_i_m
Run the script.py file from terminal with command-line arguments. Here is the help page:
**ATTENTION** Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird.
Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird.
```console
$ py -3 script.py --help
@ -166,6 +160,10 @@ py -3 script.py C:\\NEW_FOLDER\\ANOTHER_FOLDER --log UNNAMED_FOLDER\\FAILED.json
### I can't startup the script no matter what.
- Try `python3`, `python` or `py -3`, as the name of the Python executable differs between installations
### I can't open the self posts.
- Self posts are stored on Reddit as Markdown, so the script downloads them as Markdown in order not to lose their styling. There is a Chrome extension [here](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with their styling. Install it and open the files with Chrome.
## Changelog
### v1.0.0
- Initial release
### 10/07/2018
- Added support for *self* posts
- Getting posts is now quicker

View file

@ -11,7 +11,7 @@ import sys
import time
from pathlib import Path, PurePath
from src.downloader import Direct, Gfycat, Imgur
from src.downloader import Direct, Gfycat, Imgur, Self
from src.parser import LinkDesigner
from src.searcher import getPosts
from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector,
@ -451,7 +451,22 @@ def download(submissions):
                print(exception)
                FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]})
                downloadedCount -= 1
        elif submissions[i]['postType'] == 'self':
            print("SELF")
            try:
                Self(directory,submissions[i])
            except FileAlreadyExistsError:
                print("It already exists")
                downloadedCount -= 1
                duplicates += 1
            except Exception as exception:
                print(exception)
                FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]})
                downloadedCount -= 1
        else:
            print("No match found, skipping...")
            downloadedCount -= 1
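The new `self` branch follows the same pattern as the other post types: try the downloader, count duplicates, and log any other failure with the post's number. A minimal standalone sketch of that pattern is below. The `handlers` mapping and `download_all` are hypothetical names for illustration, `FileAlreadyExistsError` is a stand-in for the script's own exception, and the sketch counts successes upward rather than decrementing a running total as the script does.

```python
class FileAlreadyExistsError(Exception):
    """Stand-in for the script's exception of the same name."""


def download_all(submissions, handlers):
    """Run the matching handler for each post and tally the outcomes.

    `handlers` (hypothetical) maps a postType string to a callable that
    takes the post dictionary and raises on failure.
    """
    downloaded = 0
    duplicates = 0
    failed = {}
    for number, post in enumerate(submissions, start=1):
        handler = handlers.get(post["postType"])
        if handler is None:
            print("No match found, skipping...")
            continue
        try:
            handler(post)
            downloaded += 1
        except FileAlreadyExistsError:
            print("It already exists")
            duplicates += 1
        except Exception as exception:
            # Same entry shape as the FAILED log: number -> [error, post]
            print(exception)
            failed[number] = [str(exception), post]
    return downloaded, duplicates, failed
```

Keeping the per-type logic in a mapping means a new post type only needs a new entry, not another `elif` branch with copied error handling.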

View file

@ -1,3 +1,4 @@
import io
import os
import sys
import urllib.request
@ -16,7 +17,7 @@ except ModuleNotFoundError:
install("imgurpython")
from imgurpython import *
VanillaPrint = print
print = printToFile
def dlProgress(count, blockSize, totalSize):
@ -294,3 +295,45 @@ class Direct:
tempDir = directory / (POST['postId']+".tmp")
getFile(fileDir,tempDir,POST['postURL'])
class Self:
    def __init__(self,directory,post):
        if not os.path.exists(directory): os.makedirs(directory)

        title = nameCorrector(post['postTitle'])
        print(title+"_"+post['postId']+".md")

        fileDir = title+"_"+post['postId']+".md"
        fileDir = directory / fileDir

        if Path.is_file(fileDir):
            raise FileAlreadyExistsError

        try:
            self.writeToFile(fileDir,post)
        except FileNotFoundError:
            fileDir = post['postId']+".md"
            fileDir = directory / fileDir

            self.writeToFile(fileDir,post)

    @staticmethod
    def writeToFile(directory,post):
        content = ("## ["
                   + post["postTitle"]
                   + "]("
                   + post["postURL"]
                   + ")\n"
                   + post["postContent"]
                   + "\n\n---\n\n"
                   + "submitted by [u/"
                   + post["postSubmitter"]
                   + "](https://www.reddit.com/user/"
                   + post["postSubmitter"]
                   + ")")

        with io.open(directory,"w",encoding="utf-8") as FILE:
            VanillaPrint(content,file=FILE)

        print("Downloaded")

View file

@ -308,6 +308,10 @@ def redditSearcher(posts,SINGLE_POST=False):
imgurCount = 0
global directCount
directCount = 0
global selfCount
selfCount = 0
allPosts = {}
postsFile = createLogFile("POSTS")
@ -356,13 +360,15 @@ def redditSearcher(posts,SINGLE_POST=False):
printSubmission(submission,subCount,orderCount)
subList.append(details)
postsFile.add({subCount:[details]})
allPosts[subCount] = [details]
postsFile.add(allPosts)
if not len(subList) == 0:
print(
"\nTotal of {} submissions found!\n"\
"{} GFYCATs, {} IMGURs and {} DIRECTs\n"
.format(len(subList),gfycatCount,imgurCount,directCount)
"{} GFYCATs, {} IMGURs, {} DIRECTs and {} SELF POSTS\n"
.format(len(subList),gfycatCount,imgurCount,directCount,selfCount)
)
return subList
else:
@ -372,6 +378,7 @@ def checkIfMatching(submission):
global gfycatCount
global imgurCount
global directCount
global selfCount
try:
details = {'postId':submission.id,
@ -397,13 +404,15 @@ def checkIfMatching(submission):
imgurCount += 1
return details
elif isDirectLink(submission.url) is True:
elif isDirectLink(submission.url):
details['postType'] = 'direct'
directCount += 1
return details
elif submission.is_self:
details['postType'] = 'self'
details['postContent'] = submission.selftext
selfCount += 1
return details
def printSubmission(SUB,validNumber,totalNumber):
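The two searcher changes in this commit, checking for self posts after the link-based types and writing the collected posts to the log once at the end, can be sketched together as follows. This is a simplified standalone illustration, not the script's code: `classify` and `collect` are hypothetical names, the domain and extension checks are assumed stand-ins for the script's real matching rules, and submissions are plain dictionaries rather than PRAW objects. Posts are keyed by their running number, as in the earlier per-post log entries.

```python
import json
from collections import Counter

# Assumed list of extensions that count as a direct link.
DIRECT_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".mp4", ".webm")


def classify(url, is_self):
    """Return the post type, or None when nothing matches."""
    if "gfycat" in url:
        return "gfycat"
    if "imgur" in url:
        return "imgur"
    if url.lower().endswith(DIRECT_EXTENSIONS):
        return "direct"
    if is_self:
        return "self"
    return None


def collect(submissions):
    """Gather matching posts in memory, keyed by their running number."""
    counts = Counter()
    all_posts = {}
    for submission in submissions:
        post_type = classify(submission["url"], submission["is_self"])
        if post_type is None:
            continue
        details = {
            "postId": submission["id"],
            "postURL": submission["url"],
            "postType": post_type,
        }
        if post_type == "self":
            details["postContent"] = submission["selftext"]
        counts[post_type] += 1
        all_posts[len(all_posts) + 1] = [details]
    return all_posts, counts


def dump_posts(all_posts):
    """Serialise everything in one go instead of once per post."""
    return json.dumps(all_posts, indent=4)
```

Collecting in a dictionary and serialising once avoids rewriting the log file for every post, which is where the speed-up described in the commit message comes from.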