Merge pull request #30 from aliparlakci/SelfDownloader

- Added self post download feature
- Made the searching process quicker by writing posts to file at the end of the search
- Added long file bug solution to remaining download classes
- Updated the README file to make it minimal
2024-05-29 00:19:59 +12:00 · 2018-07-10 02:46:37 +03:00 · 2018-07-10 02:46:37 +03:00 · 975246c7f0
parent b6487e4bb1 4155ec1255
commit 975246c7f0
4 changed files with 113 additions and 48 deletions
--- a/README.md
+++ b/README.md
@@ -5,23 +5,28 @@ This program downloads imgur, gfycat and direct image and video links of saved p

 ## Table of Contents

+- [What it can do?](#what-it-can-do)
 - [Requirements](#requirements)
 - [Setting up the script](#setting-up-the-script)
  - [Creating an imgur app](#creating-an-imgur-app)
 - [Program Modes](#program-modes)
-  - [saved mode](#saved-mode)
-  - [submitted mode](#submitted-mode)
-  - [upvoted mode](#upvoted-mode)
-  - [subreddit mode](#subreddit-mode)
-  - [multireddit mode](#multireddit-mode)
-  - [link mode](#link-mode)
-  - [log read mode](#log-read-mode)
 - [Running the script](#running-the-script)
  - [Using the command line arguments](#using-the-command-line-arguments)
  - [Examples](#examples)
 - [FAQ](#faq)
 - [Changelog](#changelog)
-  - [release-1.0.0](#release-100)
+
+## What it can do?
+### It...
+- can get posts from: frontpage, subreddits, multireddits, redditor's submissions, upvoted and saved posts; search results or just plain reddit links
+- sorts posts by hot, top, new and so on
+- downloads imgur albums, gfycat links, [self posts](#i-can-t-open-the-self-posts-) and any link to a direct image
+- skips the existing ones
+- puts post titles in file names
+- puts every post in its subreddit's folder
+- saves a reusable copy of the details of the posts it finds so that they can be downloaded again
+- logs failed ones in a file so that you can try to download them later
+- can be run by double-clicking on Windows (but I don't recommend it)

 ## Requirements
 - Python 3.x*
@@ -49,38 +54,27 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl

 ## Program Modes
 All the program modes are activated with command-line arguments as shown [here](#using-the-command-line-arguments)  
-### saved mode
-In saved mode, the program gets posts from given user's saved posts.
-### submitted mode
-In submitted mode, the program gets posts from given user's submitted posts.
-### upvoted mode
-In submitted mode, the program gets posts from given user's upvoted posts.
-### subreddit mode
-In subreddit mode, the program gets posts from given subreddits* that is sorted by given type and limited by given number.  
-  
-Multiple subreddits can be given
-  
-*You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).*
-### multireddit mode
-In multireddit mode, the program gets posts from given user's given multireddit that is sorted by given type and limited by given number.  
-### link mode
-In link mode, the program gets posts from given reddit link.  
-  
-You may customize the behaviour with `--sort`, `--time`, `--limit`.
-  
-*You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).*
-  
-## log read mode
-Two log files are created each time *script.py* runs.
- **POSTS** Saves all the posts without filtering.
- **FAILED** Keeps track of posts that are tried to be downloaded but failed.
-  
-In log mode, the program takes a log file which created by itself, reads posts and tries downloading them again.
-
-Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur.
+- **saved mode**
+  - Gets posts from given user's saved posts.
+- **submitted mode**
+  - Gets posts from given user's submitted posts.
+- **upvoted mode**
+  - Gets posts from given user's upvoted posts.
+- **subreddit mode**
+  - Gets posts from the given subreddit or subreddits, sorted by the given type and limited by the given number.
+  - You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).
+- **multireddit mode**
+  - Gets posts from the given user's given multireddit, sorted by the given type and limited by the given number.  
+- **link mode**
+  - Gets posts from given reddit link.  
+  - You may customize the behaviour with `--sort`, `--time`, `--limit`.
+  - You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).
+- **log read mode**
+  - Takes a log file that the script created itself (a JSON file), reads the posts in it and tries downloading them again.
+  - Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur.

 ## Running the script
-**WARNING** *DO NOT* let more than *1* instance of script run as it interferes with IMGUR Request Rate.  
+**DO NOT** let more than one instance of the script run as it interferes with IMGUR Request Rate.  
  
 ### Using the command line arguments
 If no arguments are passed program will prompt you for arguments below which means you may start up the script with double-clicking on it (at least on Windows for sure).
@@ -89,7 +83,7 @@ Open up the [terminal](https://www.reddit.com/r/NSFW411/comments/8vtnl8/meta_i_m
  
 Run the script.py file from terminal with command-line arguments. Here is the help page:  
  
-**ATTENTION** Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird.
+Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird.

 ```console
 $ py -3 script.py --help
@@ -166,6 +160,10 @@ py -3 script.py C:\\NEW_FOLDER\\ANOTHER_FOLDER --log UNNAMED_FOLDER\\FAILED.json
 ### I can't startup the script no matter what.
 - Try `python3` or `python` or `py -3` as python have real issues about naming their program

+### I can't open the self posts.
+- Self posts are stored on reddit as Markdown, so the script downloads them as Markdown to preserve their formatting. There is a great Chrome extension [here](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with their styling. Install it and open the files with Chrome.
+
 ## Changelog
-### v1.0.0
- Initial release
+### 10/07/2018
+- Added support for *self* posts
+- Getting posts is now quicker
--- a/script.py
+++ b/script.py
@@ -11,7 +11,7 @@ import sys
 import time
 from pathlib import Path, PurePath

-from src.downloader import Direct, Gfycat, Imgur
+from src.downloader import Direct, Gfycat, Imgur, Self
 from src.parser import LinkDesigner
 from src.searcher import getPosts
 from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector,
@@ -451,7 +451,22 @@ def download(submissions):
                print(exception)
                FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]})
                downloadedCount -= 1
-                
+        
+        elif submissions[i]['postType'] == 'self':
+            print("SELF")
+            try:
+                Self(directory,submissions[i])
+
+            except FileAlreadyExistsError:
+                print("It already exists")
+                downloadedCount -= 1
+                duplicates += 1
+
+            except Exception as exception:
+                print(exception)
+                FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]})
+                downloadedCount -= 1
+
        else:
            print("No match found, skipping...")
            downloadedCount -= 1
--- a/src/downloader.py
+++ b/src/downloader.py
@@ -1,3 +1,4 @@
+import io
 import os
 import sys
 import urllib.request
@@ -16,7 +17,7 @@ except ModuleNotFoundError:
    install("imgurpython")
    from imgurpython import *

-
+VanillaPrint = print
 print = printToFile

 def dlProgress(count, blockSize, totalSize):
@@ -294,3 +295,45 @@ class Direct:
            tempDir = directory / (POST['postId']+".tmp")

            getFile(fileDir,tempDir,POST['postURL'])
+
+class Self:
+    def __init__(self,directory,post):
+        if not os.path.exists(directory): os.makedirs(directory)
+
+        title = nameCorrector(post['postTitle'])
+        print(title+"_"+post['postId']+".md")
+
+        fileDir = title+"_"+post['postId']+".md"
+        fileDir = directory / fileDir
+        
+        if Path.is_file(fileDir):
+            raise FileAlreadyExistsError
+            
+        try:
+            self.writeToFile(fileDir,post)
+        except FileNotFoundError:
+            fileDir = post['postId']+".md"
+            fileDir = directory / fileDir
+
+            self.writeToFile(fileDir,post)
+    
+    @staticmethod
+    def writeToFile(directory,post):
+
+        content = ("## ["
+                   + post["postTitle"]
+                   + "]("
+                   + post["postURL"]
+                   + ")\n"
+                   + post["postContent"]
+                   + "\n\n---\n\n"
+                   + "submitted by [u/"
+                   + post["postSubmitter"]
+                   + "](https://www.reddit.com/user/"
+                   + post["postSubmitter"]
+                   + ")")
+
+        with io.open(directory,"w",encoding="utf-8") as FILE:
+            VanillaPrint(content,file=FILE)
+        
+        print("Downloaded")
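Reviewer's sketch of the `Self` flow above, written as standalone functions so it can be tried outside the repository. The names `render_self_post`, `save_self_post` and `safe_title` are hypothetical; `safe_title` stands in for the output of `nameCorrector`, and the built-in `FileExistsError` stands in for the project's `FileAlreadyExistsError`. Catching `OSError` instead of only `FileNotFoundError` is an assumption made here so that over-long file names are also handled on platforms that report them differently.

```python
import io
from pathlib import Path


def render_self_post(post):
    """Build the Markdown body: linked title, post text, rule, author link."""
    return (
        "## [{title}]({url})\n"
        "{content}\n\n---\n\n"
        "submitted by [u/{author}](https://www.reddit.com/user/{author})"
    ).format(
        title=post["postTitle"],
        url=post["postURL"],
        content=post["postContent"],
        author=post["postSubmitter"],
    )


def save_self_post(directory, post, safe_title):
    """Write the post as <title>_<id>.md, falling back to <id>.md."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    target = directory / "{}_{}.md".format(safe_title, post["postId"])
    if target.is_file():
        # Stand-in for the project's FileAlreadyExistsError.
        raise FileExistsError(str(target))

    body = render_self_post(post)
    try:
        with io.open(target, "w", encoding="utf-8") as handle:
            handle.write(body)
    except OSError:
        # Assumption: the title made the path unusable (for example too long),
        # so retry with the post id alone as the file name.
        target = directory / (post["postId"] + ".md")
        with io.open(target, "w", encoding="utf-8") as handle:
            handle.write(body)
    return target
```

Separating rendering from writing keeps the Markdown layout testable without touching the file system.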
--- a/src/searcher.py
+++ b/src/searcher.py
@@ -308,6 +308,10 @@ def redditSearcher(posts,SINGLE_POST=False):
    imgurCount = 0
    global directCount
    directCount = 0
+    global selfCount
+    selfCount = 0
+
+    allPosts = {}

    postsFile = createLogFile("POSTS")

@@ -356,13 +360,15 @@ def redditSearcher(posts,SINGLE_POST=False):
                printSubmission(submission,subCount,orderCount)
                subList.append(details)

-            postsFile.add({subCount:[details]})
+            allPosts[subCount] = [details]
+        
+        postsFile.add(allPosts)

    if not len(subList) == 0:    
        print(
            "\nTotal of {} submissions found!\n"\
-            "{} GFYCATs, {} IMGURs and {} DIRECTs\n"
-            .format(len(subList),gfycatCount,imgurCount,directCount)
+            "{} GFYCATs, {} IMGURs, {} DIRECTs and {} SELF POSTS\n"
+            .format(len(subList),gfycatCount,imgurCount,directCount,selfCount)
        )
        return subList
    else:
@@ -372,6 +378,7 @@ def checkIfMatching(submission):
    global gfycatCount
    global imgurCount
    global directCount
+    global selfCount

    try:
        details = {'postId':submission.id,
@@ -397,13 +404,15 @@ def checkIfMatching(submission):
            imgurCount += 1
            return details

-    elif isDirectLink(submission.url) is True:
+    elif isDirectLink(submission.url):
        details['postType'] = 'direct'
        directCount += 1
        return details

    elif submission.is_self:
        details['postType'] = 'self'
+        details['postContent'] = submission.selftext
+        selfCount += 1
        return details

 def printSubmission(SUB,validNumber,totalNumber):
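Reviewer's sketch of the "collect during the search, write once at the end" idea from the commit message. When every post's details go into a single dictionary, each post needs its own key. Merging the detail dictionaries into one another would keep only the last post, because they all share the same field names. The helpers `collect_posts` and `write_posts` are hypothetical and use plain `json` rather than the project's `createLogFile`.

```python
import json


def collect_posts(found):
    """Index each post's details by a running counter, one list per key."""
    all_posts = {}
    for count, details in enumerate(found, start=1):
        all_posts[count] = [details]
    return all_posts


def write_posts(path, all_posts):
    """Write every collected post to disk in a single operation."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(all_posts, handle, indent=4)


def merge_details(found):
    """Counter-example: merging flat detail dicts overwrites shared keys."""
    merged = {}
    for details in found:
        merged = {**merged, **details}
    return merged
```

Note that `json.dump` stores the integer counters as string keys, so a reader of the file gets `"1"`, `"2"` and so on.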