Merge pull request #30 from aliparlakci/SelfDownloader
- Added self post download feature
- Made the searching process quicker by writing posts to file at the end of the search
- Added long file bug solution to remaining download classes
- Updated the README file to make it minimal
commit
975246c7f0
80
README.md
|
@ -5,23 +5,28 @@ This program downloads imgur, gfycat and direct image and video links of saved p
|
|||
|
||||
## Table of Contents
|
||||
|
||||
- [What it can do?](#what-it-can-do)
|
||||
- [Requirements](#requirements)
|
||||
- [Setting up the script](#setting-up-the-script)
|
||||
- [Creating an imgur app](#creating-an-imgur-app)
|
||||
- [Program Modes](#program-modes)
|
||||
- [saved mode](#saved-mode)
|
||||
- [submitted mode](#submitted-mode)
|
||||
- [upvoted mode](#upvoted-mode)
|
||||
- [subreddit mode](#subreddit-mode)
|
||||
- [multireddit mode](#multireddit-mode)
|
||||
- [link mode](#link-mode)
|
||||
- [log read mode](#log-read-mode)
|
||||
- [Running the script](#running-the-script)
|
||||
- [Using the command line arguments](#using-the-command-line-arguments)
|
||||
- [Examples](#examples)
|
||||
- [FAQ](#faq)
|
||||
- [Changelog](#changelog)
|
||||
- [release-1.0.0](#release-100)
|
||||
|
||||
## What it can do?
|
||||
### It...
|
||||
- can get posts from: frontpage, subreddits, multireddits, redditor's submissions, upvoted and saved posts; search results or just plain reddit links
|
||||
- sorts posts by hot, top, new and so on
|
||||
- downloads imgur albums, gfycat links, [self posts](#i-can-t-open-the-self-posts-) and any link to a direct image
|
||||
- skips the existing ones
|
||||
- puts post titles in the file names
|
||||
- puts every post to its subreddit's folder
|
||||
- saves a reusable copy of the details of the posts that are found so that they can be downloaded again
|
||||
- logs failed ones in a file so that you can try to download them later
|
||||
- can be run with double-clicking on Windows (but I don't recommend it)
|
||||
|
||||
## Requirements
|
||||
- Python 3.x*
|
||||
|
@ -49,38 +54,27 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl
|
|||
|
||||
## Program Modes
|
||||
All the program modes are activated with command-line arguments as shown [here](#using-the-command-line-arguments)
|
||||
### saved mode
|
||||
In saved mode, the program gets posts from given user's saved posts.
|
||||
### submitted mode
|
||||
In submitted mode, the program gets posts from given user's submitted posts.
|
||||
### upvoted mode
|
||||
In upvoted mode, the program gets posts from given user's upvoted posts.
|
||||
### subreddit mode
|
||||
In subreddit mode, the program gets posts from given subreddits* that are sorted by given type and limited by given number.
|
||||
|
||||
Multiple subreddits can be given
|
||||
|
||||
*You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).*
|
||||
### multireddit mode
|
||||
In multireddit mode, the program gets posts from given user's given multireddit that is sorted by given type and limited by given number.
|
||||
### link mode
|
||||
In link mode, the program gets posts from given reddit link.
|
||||
|
||||
You may customize the behaviour with `--sort`, `--time`, `--limit`.
|
||||
|
||||
*You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).*
|
||||
|
||||
### log read mode
|
||||
Two log files are created each time *script.py* runs.
|
||||
- **POSTS** Saves all the posts without filtering.
|
||||
- **FAILED** Keeps track of posts whose download was attempted but failed.
|
||||
|
||||
In log read mode, the program takes a log file that it created itself, reads the posts and tries downloading them again.
|
||||
|
||||
Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur.
|
||||
- **saved mode**
|
||||
- Gets posts from given user's saved posts.
|
||||
- **submitted mode**
|
||||
- Gets posts from given user's submitted posts.
|
||||
- **upvoted mode**
|
||||
- Gets posts from given user's upvoted posts.
|
||||
- **subreddit mode**
|
||||
- Gets posts from given subreddit or subreddits, sorted by given type and limited by given number.
|
||||
- You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).
|
||||
- **multireddit mode**
|
||||
- Gets posts from given user's given multireddit that is sorted by given type and limited by given number.
|
||||
- **link mode**
|
||||
- Gets posts from given reddit link.
|
||||
- You may customize the behaviour with `--sort`, `--time`, `--limit`.
|
||||
- You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).
|
||||
- **log read mode**
|
||||
- Takes a log file that the script itself created (a json file), reads the posts and tries downloading them again.
|
||||
- Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur.
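For illustration, the first step of log read mode could look roughly like the sketch below. It assumes the FAILED log is a JSON object whose values are `[error message, post details]` pairs, which matches how `script.py` calls `FAILED_FILE.add` in this commit. The helper name `load_failed_posts` and the exact file layout are assumptions, not the project's code.

```python
import json


def load_failed_posts(log_path):
    """Return the post-detail dicts stored in a FAILED-style log file.

    Assumed (hypothetical) layout: {"1": ["error text", {...post...}], ...}
    """
    with open(log_path, encoding="utf-8") as log_file:
        entries = json.load(log_file)

    posts = []
    for value in entries.values():
        # Keep only [error, post] pairs; skip any other fields in the log.
        if isinstance(value, list) and len(value) == 2 and isinstance(value[1], dict):
            posts.append(value[1])
    return posts
```

The returned list could then be handed to the same download routine that handles freshly found posts.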
|
||||
|
||||
## Running the script
|
||||
**WARNING** *DO NOT* let more than *1* instance of script run as it interferes with IMGUR Request Rate.
|
||||
**DO NOT** let more than one instance of the script run as it interferes with IMGUR Request Rate.
|
||||
|
||||
### Using the command line arguments
|
||||
If no arguments are passed, the program will prompt you for the arguments below, which means you may start the script by double-clicking on it (at least on Windows).
|
||||
|
@ -89,7 +83,7 @@ Open up the [terminal](https://www.reddit.com/r/NSFW411/comments/8vtnl8/meta_i_m
|
|||
|
||||
Run the script.py file from terminal with command-line arguments. Here is the help page:
|
||||
|
||||
**ATTENTION** Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird.
|
||||
Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird.
|
||||
|
||||
```console
|
||||
$ py -3 script.py --help
|
||||
|
@ -166,6 +160,10 @@ py -3 script.py C:\\NEW_FOLDER\\ANOTHER_FOLDER --log UNNAMED_FOLDER\\FAILED.json
|
|||
### I can't startup the script no matter what.
|
||||
- Try `python3`, `python` or `py -3`, as the name of the Python executable differs between installations
|
||||
|
||||
### I can't open the self posts.
|
||||
- Self posts are stored on reddit as Markdown, so the script downloads them as Markdown in order not to lose their styling. There is a Chrome extension [here](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with their styling. Install it and open the files with Chrome.
|
||||
|
||||
## Changelog
|
||||
### v1.0.0
|
||||
- Initial release
|
||||
### 10/07/2018
|
||||
- Added support for *self* posts
|
||||
- Now getting posts is quicker
|
||||
|
|
19
script.py
|
@ -11,7 +11,7 @@ import sys
|
|||
import time
|
||||
from pathlib import Path, PurePath
|
||||
|
||||
from src.downloader import Direct, Gfycat, Imgur
|
||||
from src.downloader import Direct, Gfycat, Imgur, Self
|
||||
from src.parser import LinkDesigner
|
||||
from src.searcher import getPosts
|
||||
from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector,
|
||||
|
@ -451,7 +451,22 @@ def download(submissions):
|
|||
print(exception)
|
||||
FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]})
|
||||
downloadedCount -= 1
|
||||
|
||||
|
||||
elif submissions[i]['postType'] == 'self':
|
||||
print("SELF")
|
||||
try:
|
||||
Self(directory,submissions[i])
|
||||
|
||||
except FileAlreadyExistsError:
|
||||
print("It already exists")
|
||||
downloadedCount -= 1
|
||||
duplicates += 1
|
||||
|
||||
except Exception as exception:
|
||||
print(exception)
|
||||
FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]})
|
||||
downloadedCount -= 1
|
||||
|
||||
else:
|
||||
print("No match found, skipping...")
|
||||
downloadedCount -= 1
|
||||
|
|
|
@ -1,3 +1,4 @@
|
|||
import io
|
||||
import os
|
||||
import sys
|
||||
import urllib.request
|
||||
|
@ -16,7 +17,7 @@ except ModuleNotFoundError:
|
|||
install("imgurpython")
|
||||
from imgurpython import *
|
||||
|
||||
|
||||
VanillaPrint = print
|
||||
print = printToFile
|
||||
|
||||
def dlProgress(count, blockSize, totalSize):
|
||||
|
@ -294,3 +295,45 @@ class Direct:
|
|||
tempDir = directory / (POST['postId']+".tmp")
|
||||
|
||||
getFile(fileDir,tempDir,POST['postURL'])
|
||||
|
||||
class Self:
    def __init__(self,directory,post):
        if not os.path.exists(directory): os.makedirs(directory)

        title = nameCorrector(post['postTitle'])
        print(title+"_"+post['postId']+".md")

        fileDir = title+"_"+post['postId']+".md"
        fileDir = directory / fileDir

        if Path.is_file(fileDir):
            raise FileAlreadyExistsError

        try:
            self.writeToFile(fileDir,post)
        except FileNotFoundError:
            fileDir = post['postId']+".md"
            fileDir = directory / fileDir

            self.writeToFile(fileDir,post)

    @staticmethod
    def writeToFile(directory,post):

        content = ("## ["
                   + post["postTitle"]
                   + "]("
                   + post["postURL"]
                   + ")\n"
                   + post["postContent"]
                   + "\n\n---\n\n"
                   + "submitted by [u/"
                   + post["postSubmitter"]
                   + "](https://www.reddit.com/user/"
                   + post["postSubmitter"]
                   + ")")

        with io.open(directory,"w",encoding="utf-8") as FILE:
            VanillaPrint(content,file=FILE)

        print("Downloaded")
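To make the layout of the generated Markdown file easier to see, here is a small standalone sketch that builds the same string as `writeToFile` without touching the disk. The function name `render_self_post` and the example post values are hypothetical; only the dictionary keys come from the diff above.

```python
def render_self_post(post):
    """Build the Markdown text for a self post, mirroring writeToFile's layout."""
    return (
        "## [" + post["postTitle"] + "](" + post["postURL"] + ")\n"
        + post["postContent"]
        + "\n\n---\n\n"
        + "submitted by [u/" + post["postSubmitter"]
        + "](https://www.reddit.com/user/" + post["postSubmitter"] + ")"
    )


# Made-up example values, for illustration only.
example = {
    "postTitle": "Hello",
    "postURL": "https://www.reddit.com/r/test/comments/abc/hello/",
    "postContent": "Some *body* text.",
    "postSubmitter": "someone",
}
print(render_self_post(example))
```

The title becomes a level-two heading linking back to the post, the body follows unchanged, and a horizontal rule separates it from the submitter line.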
|
||||
|
|
|
@ -308,6 +308,10 @@ def redditSearcher(posts,SINGLE_POST=False):
|
|||
imgurCount = 0
|
||||
global directCount
|
||||
directCount = 0
|
||||
global selfCount
|
||||
selfCount = 0
|
||||
|
||||
allPosts = {}
|
||||
|
||||
postsFile = createLogFile("POSTS")
|
||||
|
||||
|
@ -356,13 +360,15 @@ def redditSearcher(posts,SINGLE_POST=False):
|
|||
printSubmission(submission,subCount,orderCount)
|
||||
subList.append(details)
|
||||
|
||||
postsFile.add({subCount:[details]})
|
||||
allPosts = {**allPosts,**details}
|
||||
|
||||
postsFile.add(allPosts)
|
||||
|
||||
if not len(subList) == 0:
|
||||
print(
|
||||
"\nTotal of {} submissions found!\n"\
|
||||
"{} GFYCATs, {} IMGURs and {} DIRECTs\n"
|
||||
.format(len(subList),gfycatCount,imgurCount,directCount)
|
||||
"{} GFYCATs, {} IMGURs, {} DIRECTs and {} SELF POSTS\n"
|
||||
.format(len(subList),gfycatCount,imgurCount,directCount,selfCount)
|
||||
)
|
||||
return subList
|
||||
else:
|
||||
|
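The hunk above appears to move the log write out of the search loop, which is what the commit message describes as writing posts to file at the end of the search. The general idea can be sketched as follows; `save_posts_once` and its JSON layout are assumptions for illustration and do not reproduce the project's `createLogFile` or `postsFile.add` helpers.

```python
import json


def save_posts_once(found_posts, log_path):
    """Collect post details in memory, then write them to log_path in one pass."""
    all_posts = {}
    for index, details in enumerate(found_posts, start=1):
        # Accumulate in memory; no disk access inside the loop.
        all_posts[index] = details

    # A single write at the end instead of one write per post.
    with open(log_path, "w", encoding="utf-8") as log_file:
        json.dump(all_posts, log_file, indent=4)
    return all_posts
```

Writing once trades a little memory for far fewer file operations, which matters when a search returns many posts.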
@ -372,6 +378,7 @@ def checkIfMatching(submission):
|
|||
global gfycatCount
|
||||
global imgurCount
|
||||
global directCount
|
||||
global selfCount
|
||||
|
||||
try:
|
||||
details = {'postId':submission.id,
|
||||
|
@ -397,13 +404,15 @@ def checkIfMatching(submission):
|
|||
imgurCount += 1
|
||||
return details
|
||||
|
||||
elif isDirectLink(submission.url) is True:
|
||||
elif isDirectLink(submission.url):
|
||||
details['postType'] = 'direct'
|
||||
directCount += 1
|
||||
return details
|
||||
|
||||
elif submission.is_self:
|
||||
details['postType'] = 'self'
|
||||
details['postContent'] = submission.selftext
|
||||
selfCount += 1
|
||||
return details
|
||||
|
||||
def printSubmission(SUB,validNumber,totalNumber):
|
||||
|
|
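The branch order in `checkIfMatching` can be illustrated with the simplified sketch below. It is not the project's implementation: `classify_post`, the host checks and the extension list are all assumptions, since the bodies of `isDirectLink` and the gfycat and imgur checks are not shown in this diff.

```python
from urllib.parse import urlparse

# Hypothetical extension list; the project's isDirectLink may use different rules.
DIRECT_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp4", ".webm")


def classify_post(url, is_self):
    """Return 'gfycat', 'imgur', 'direct', 'self' or None for a submission."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if "gfycat" in host:
        return "gfycat"
    if "imgur" in host:
        return "imgur"
    if parsed.path.lower().endswith(DIRECT_EXTENSIONS):
        return "direct"
    # Self posts are checked last, as in the diff above.
    if is_self:
        return "self"
    return None
```

A post that matches none of the branches yields `None`, which corresponds to the "No match found, skipping..." case in `script.py`.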