
Fixing typos, rephrasing sentences for easy reading. (#384)

Yuri Slobodyanyuk 2021-07-02 19:39:44 +03:00 committed by GitHub
parent b5f8d6b028
commit faa5460f2d
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 69 additions and 68 deletions


@@ -13,28 +13,28 @@
 - Rich search option - allows setting absolute included and excluded directories, set of allowed file extensions
 or excluded items with the `*` wildcard
 - Multiple tools to use:
-- Duplicates - Finds duplicates basing on file name, size, hash, first 1 MB of hash
+- Duplicates - Finds duplicates based on file name, size, hash, hash of just first 1 MB of a file
 - Empty Folders - Finds empty folders with the help of an advanced algorithm
 - Big Files - Finds the provided number of the biggest files in given location
 - Empty Files - Looks for empty files across the drive
 - Temporary Files - Finds temporary files
 - Similar Images - Finds images which are not exactly the same (different resolution, watermarks)
 - Zeroed Files - Finds files which are filled with zeros (usually corrupted)
-- Same Music - Searches for music with same artist, album etc.
+- Same Music - Searches for music with the same artist, album etc.
-- Invalid Symbolic Links - Shows symbolic links which points to non-existent files/directories
+- Invalid Symbolic Links - Shows symbolic links which point to non-existent files/directories
 - Broken Files - Finds files with an invalid extension or that are corrupted
 <!-- The GIF thingy -->
 ![Czkawka](https://user-images.githubusercontent.com/41945903/104711404-9cbb7400-5721-11eb-904d-9677c189f7ab.gif)
 ## How do I use it?
-You can find an instruction on how to use Czkawka [**here**](instructions/Instruction.md).
+You can find the instructions on how to use Czkawka [**here**](instructions/Instruction.md).
 ## Installation
-Installation instruction with download links you can find [**here**](instructions/Installation.md).
+Installation instructions with download links you can find [**here**](instructions/Installation.md).
 ## Compilation
-If you want try to develop Czkawka or just use the latest available feature, you may want to look at the [**compilation instruction**](instructions/Compilation.md).
+If you want to try and develop Czkawka or just use the latest available feature, you may want to look at the [**compilation instruction**](instructions/Compilation.md).
 ## Benchmarks
@@ -42,9 +42,9 @@ Since Czkawka is written in Rust and it aims to be a faster alternative to FSlin
 I tested it on a 256 GB SSD and a i7-4770 CPU.
-I prepared a disk and performed a test without any folder exceptions and with disabled ignoring of hard links which contained 363 215 files, took 221,8 GB and had 62093 duplicate files in 31790 groups which took 4,1 GB.
+I prepared a disk and performed a test without any folder exceptions and with disabled ignoring of hard links which contained 363 215 files, took 221,8 GB and had 62093 duplicate files in 31790 groups which occupied 4,1 GB.
-Minimum file size to check I set to 1 KB on all programs.
+I set the minimal file size to check to 1KB on all programs.
 | App | Executing Time |
 |:---------------------------:|:--------------:|
@@ -67,7 +67,7 @@ I used Mprof for checking memory usage of FSlint and DupeGuru, and Heaptrack for
 In Dupeguru I enabled checking images with different dimensions to match Czkawka behavior.
 Both apps use caching mechanism, so second scan is really fast.
-Similar images which check 10949 files which took 6.6 GB
+Similar images which check 10949 files that occupied 6.6 GB
 | App | Scan time |
 |:---------------------------:|:---------:|
@@ -76,7 +76,7 @@ Similar images which check 10949 files which took 6.6 GB
 | DupeGuru 4.1.1 (First Run) | 539s |
 | DupeGuru 4.1.1 (Second Run) | 1s |
-Similar images which check 349 image files which took 1.7 GB
+Similar images which check 349 image files that occupied 1.7 GB
 | App | Scan time |
 |:---------------------------:|:----------|
@@ -87,7 +87,7 @@ Similar images which check 349 image files which took 1.7 GB
 ## Comparison to other tools
-Bleachbit is a master at finding and removing temporary files, while Czkawka only finds the most basic ones. So this two apps shouldn't be compared directly or be considered as an alternative to the second one.
+Bleachbit is a master at finding and removing temporary files, while Czkawka only finds the most basic ones. So these two apps shouldn't be compared directly or be considered as an alternative to one another.
 | | Czkawka | FSlint | DupeGuru | Bleachbit |
 |:----------------------:|:-----------:|:----------:|:-----------------:|:-----------:|
@@ -118,7 +118,7 @@ Bleachbit is a master at finding and removing temporary files, while Czkawka onl
 ## Contributions
 Contributions to this repository are welcome.
-You can help by creating a:
+You can help by creating:
 - Bug reports - memory leaks, unexpected behavior, crashes
 - Feature proposals - proposal to change/add/delete some features
 - Pull Requests - implementing a new feature yourself or fixing bugs.
@@ -145,7 +145,7 @@ for a possible change of the name of the program, and the opinions were extremel
 ## License
 Code is distributed under MIT license.
-Icon is created by [jannuary](https://github.com/jannuary) and licensed CC-BY-4.0.
+Icon was created by [jannuary](https://github.com/jannuary) and licensed CC-BY-4.0.
 Windows dark theme is used from [AdMin repo](https://github.com/nrhodes91/AdMin) with MIT license.


@@ -1,12 +1,12 @@
-# Compilation Czkawka from sources
+# Compiling Czkawka from sources
 ## Requirements
 Program | Min | What for
 ---------|------|------------------------------------------------------------
-Rust | 1.51 | Czkawka for now, aims to support only the latest stable Rust version
+Rust | 1.51 | Czkawka, for now, aims to support only the latest stable Rust version
 GTK | 3.22 | Only for the `GTK` backend
-If you only want the terminal version without a GUI, just skip all packages with `gtk`.
+If you only want the terminal version without a GUI, just skip all the packages with `gtk` in their names.
 #### Debian / Ubuntu
 ```shell
@@ -22,7 +22,7 @@ sudo yum install gtk3-devel glib2-devel alsa-lib-devel # Latest is optional
 ```
 #### macOS
-You need to install Rust Homebrew and GTK Libraries
+You need to install Rust via Homebrew and GTK Libraries
 ```shell
 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
 brew install rustup
@@ -60,7 +60,7 @@ cargo run --release --bin czkawka_cli
 ## Additional features
-For now, finding broken audio files is temporary disabled by default, because it crash when not found audio libraries on computer.
+For now, finding broken audio files is temporary disabled by default, because it crashes when audio libraries are not found on the computer.
 I'm waiting for ability to disable audio playback feature, so after that I will be able to re-enable by default this feature (https://github.com/RustAudio/rodio/issues/349)
 To enable checking for broken audio files, just add ` --all-features`


@@ -35,13 +35,13 @@ At the end execute it:
 ```
 ### Windows
-By default, all needed libraries are bundled with app inside `windows_czkawka_gui.zip`, but if you compile app or just move `czkawka_gui.exe`, then you need to install the `GTK 3`
+By default, all needed libraries are bundled with the app, inside `windows_czkawka_gui.zip`, but if you compile the app or just move `czkawka_gui.exe`, then you will need to install the `GTK 3`
 runtime from [**here**](https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases).
 ## Installation
 ### Precompiled binaries
 Ready-to-go executables for Linux, Windows and macOS are available [**here**](https://github.com/qarmin/czkawka/releases/).
-If the app does not run when clicking at a launcher, run it through a terminal.
+If the app does not run when clicking the launcher, run it through a terminal.
 You don't need to have any additional libraries for CLI Czkawka.
 ### Nightly Builds
@@ -49,11 +49,11 @@ Artifacts from each commit can be downloaded [**here**](https://github.com/qarmi
 ### Appimage
 Appimage files are available in release page - [**GitHub releases**](https://github.com/qarmin/czkawka/releases/)
-This version is bundled with own theme.
+This version is bundled with its own theme.
 There is also a small problem with not being able to open 2 images at once.
 ### Cargo
-The easiest method to install Czkawka is using the `cargo` command. For compiling it, you need to get all the
+The easiest method to install Czkawka is using the `cargo` command. To compile it, you need to get all the
 requirements from the [compilation section](Compilation.md).
 ```
 cargo install czkawka_gui
@@ -65,7 +65,7 @@ You can update the package with the same command.
 ```
 sudo snap install czkawka
 ```
-By default, Snap can only access to the files in your home directory. You have to allow czkawka to access to all the drives:
+By default, Snap can only access the files in your home directory. You have to allow czkawka access to all the drives:
 ```
 sudo snap connect czkawka:removable-media


@@ -7,20 +7,20 @@
 - [Tools](#tools)
 Czkawka for now contains two independent frontends - the terminal and graphical interface which share the core module.
-Using Rust language without unsafe code, helps to create safe, fast with small resource requirements.
+Using Rust language without unsafe code helps to create safe, fast, and low resources requirements app.
 This code also has good support for multi-threading.
 ## GUI GTK
 <img src="https://user-images.githubusercontent.com/41945903/103002387-14d1b800-452f-11eb-967e-9d5905dd6db5.png" width="800" />
 ### GUI overview
-The GUI are built from different pieces:
+The GUI is built from different pieces:
-- Red - Program settings, contains info about included/excluded directories which user may want to check. Also there is a tab with allowed extensions, which allow user to choose which type of files want to check. Next category is Excluded items, which allow to discard specific path with use of `*` wildcard - `/home/*` means that e.g. `/home/rafal/` will be ignored but no `/home/czkawka/`. The last one is settings tab which allow to save configuration of program, reset it and load it when needed.
+- Red - Program settings, contains info about included/excluded directories which user may want to check. Also, there is a tab with allowed extensions, which allows users to choose which type of files they want to check. Next category is Excluded items, which allows to discard specific path by using `*` wildcard - so `/home/*` means that e.g. `/home/rafal/` will be ignored but not `/home/czkawka/`. The last one is settings tab which allows to save configuration of the program, reset and load it when needed.
-- Green - This allow to choose which tool we want to use.
+- Green - This allows to choose which tool we want to use.
-- Blue - Here are settings to current tool, which we want/need to configure
+- Blue - Here are settings for the current tool, which we want/need to configure
-- Pink - Window in which result of searching are printed
+- Pink - Window in which results of searching are printed
-- Yellow - Box with buttons like `Search`(starts searching with current selected tool), `Hide Text View`(hide text box at the bottom with white overlay), `Symlink`(create symlink to selected file), `Select`(shows options to select specific rows), `Delete`(deletes selected records), `Save`(save to file result of searching) - some buttons are only visible when at least one result is visible.
+- Yellow - Box with buttons like `Search`(starts searching with the currently selected tool), `Hide Text View`(hides text box at the bottom with white overlay), `Symlink`(creates symlink to selected file), `Select`(shows options to select specific rows), `Delete`(deletes selected files), `Save`(save to file the search result) - some buttons are only visible when at least one result is visible.
-- Brown - Small informative field to show informations e.g. about number of found duplicates files
+- Brown - Small informative field to show informations e.g. about number of found duplicate files
 - White - Text window to show possible errors/warnings e.g. when failed to delete folder due no permissions etc.
 There is also an option to see image previews in Similar Images tool.
@@ -30,25 +30,25 @@ There is also an option to see image previews in Similar Images tool.
 ### Action Buttons
 There are several buttons which do different actions:
 - Search - starts searching and shows progress dialog
-- Stop - button in progress dialog, allows to easily stop current task. Sometimes it may take a few seconds until all atomic operations ends and GUI will be able to use again
+- Stop - button in progress dialog, allows to easily stop current task. Sometimes it may take a few seconds until all atomic operations end and GUI will become responsive again
 - Select - allows selecting multiple entries at once
-- Delete - delete entirely all selected entries
+- Delete - deletes entirely all selected entries
-- Symlink - create symlink to selected files(first file is threaten as original and rest will become symlinks)
+- Symlink - creates symlink to selected files(first file is threaten as original and rest will become symlinks)
 - Save - save initial state of results
 - Hamburger(parallel lines) - used to show/hide bottom text panel which shows warnings/errors
 - Add (directories) - adds directories to include or exclude
-- Remove (directories) - remove directories to search or to exclude
+- Remove (directories) - removes directories to search or to exclude
-- Manual Add (directories) - allows to write by hand directories(may be used to write non visible in file manager directories)
+- Manual Add (directories) - allows to input by typing directories (may be used to enter non visible in file manager directories)
 - Save current configuration - saves current GUI configuration to configuration file
-- Load configuration - loads configuration of file and override current GUI config
+- Load configuration - loads configuration of file and overrides current GUI config
-- Reset configuration - reset current GUI configuration to default
+- Reset configuration - resets current GUI configuration to defaults
 ### Opening/Manipulating files
-It is possible to open selected files by double clicking at them.
+It is possible to open selected files by double clicking on them.
 To open multiple file just select desired files with CTRL key pressed and still when clicking this key, double click at selected items with left mouse button.
-To open folder containing selected file, just click twice at it with right mouse button.
+To open folder containing selected file, just click twice on it with right mouse button.
 ## CLI
@@ -60,20 +60,20 @@ To get general info how to use it just try to open czkawka_cli in console `czkaw
 You should see a lot of examples how to use this app.
-If you want to get more detailed info about certain tool, after its name just write at the end `-h` or `--help` to get more details about tool.
+If you want to get more detailed info about certain tool, just add after its name `-h` or `--help` to get more details.
 <img src="https://user-images.githubusercontent.com/41945903/103018151-0a221d80-4545-11eb-97b2-d7d77b49c735.png" width="800" />
-By default all tools only write about results to console, but it is possible with specific arguments to delete some files/arguments or save it to file.
+By default, all tools only write about results to console, but it is possible with specific arguments to delete some files/arguments or save it to file.
 ## Config/Cache files
-For now Czkawka store few config and cache files on disk:
+Currently, Czkawka stores few config and cache files on disk:
 - `czkawka_gui_config.txt` - stores configuration of GUI which may be loaded at startup
 - `cache_similar_image.txt` - stores cache data and hashes which may be used later without needing to compute image hash again - editing this file may cause app crashes.
 - `cache_broken_files.txt` - stores cache data of broken files
-- `cache_duplicates_Blake3.txt` - stores cache data of duplicated files, to not get too big performance hit when saving/loading file, only already fully hashed files bigger than 5MB are stored. Similar files with replaced `Blake3` to e.g. `SHA256` may be shown, when support for new hashes will be introduced in Czkawka.
+- `cache_duplicates_Blake3.txt` - stores cache data of duplicated files, to not suffer too big of a performance hit when saving/loading file, only already fully hashed files bigger than 5MB are stored. Similar files with replaced `Blake3` to e.g. `SHA256` may be shown, when support for new hashes will be introduced in Czkawka.
-Config files are located in this path
+Config files are located in this path:
 Linux - `/home/username/.config/czkawka`
 Mac - `/Users/username/Library/Application Support/pl.Qarmin.Czkawka`
@@ -89,7 +89,8 @@ Windows - `C:\Users\Username\AppData\Local\Qarmin\Czkawka\cache`
 - **Manually adding multiple directories**
 You can manually edit config file `czkawka_gui_config.txt` and add/remove/change directories as you want. After setting required values, configuration must be loaded to Czkawka.
 - **Slow checking of little number similar images**
-If you checked before a big amount of images(several tens of thousands) and them still exists on disk, then information's about it are loaded from cache and save to it, even if you have check now only a few images. You can rename cache file `cache_similar_image.txt`(to be able to use it again) or delete it - cache will regenerate but with lower amount of entries it should load and save a lot of faster.
+If you checked before a large number of images (several tens of thousands) and they are still present on the disk, then the required information about all of them is loaded from and saved to the cache, even if you are working with only few image files. You can rename cache file `cache_similar_image.txt`(to be able to use it again) or delete it - cache will then regenerate but with smaller number of entries and this way it should load and save a lot of faster.
 # Tools
@@ -97,17 +98,17 @@ Windows - `C:\Users\Username\AppData\Local\Qarmin\Czkawka\cache`
 Duplicate Finder allows you to search for files and group them according to a predefined criterion:
-- **By name** - Groups files by name e.g. `/home/john/cats.txt` will be treated like a duplicate of a file named
+- **By name** - Compares and groups files by name e.g. `/home/john/cats.txt` will be treated like a duplicate of a file named
 `/home/lucy/cats.txt`. This is the fastest method, but it is very unreliable and should not be used unless you know
 what you are doing.
-- **By size** - Groups files by their size (in bytes and perfect matches only). It is as fast as the previous mode and
+- **By size** - Compares and groups files by their size (in bytes and perfect matches only). It is as fast as the previous mode and
 usually gives better results with duplicates, but I also do not recommend using it if you do not know what you are doing.
 - **By hash** - A mode containing a check of the hash (cryptographic hash) of a given file which determines with great
 probability whether the files are identical.
-This is the slowest, but almost 100% sure way to check the files.
+This is the slowest, but almost 100% sure way to compare the files for being the same.
 Because the hash is only checked inside groups of files of the same size, it is practically impossible for two different
 files to be considered identical.
@@ -119,16 +120,16 @@ Duplicate Finder allows you to search for files and group them according to a pr
 - PreHash check - Each group of files of identical size is placed in a queue using all processor threads (each action in
 the group is independent of the others). In each such group a small fragment of each file (2KB) is loaded in turn and
 then hashed. All files whose partial hashes are unique within the group are removed from it. Using this step usually
-allows me to reduce the time of searching for duplicates even by half.
+allows me to reduce the time of searching for duplicates almost by half.
 - Checking the hash - After leaving files that have the same beginning in groups, you should now check the whole contents
 of the file to make sure they are identical.
-- **By hashmb** - Works the same way as via hash, only in the last phase it does not check the whole file but only its first
+- **By hashmb** - Works the same way as via hash, only in the last phase it does not calculate the hash of the whole file but only of its first
 megabyte. It is perfect for quick search of possible duplicate files.
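The staged narrowing described in this hunk (group by size, drop files whose 2KB pre-hash is unique, then hash the remaining candidates) can be sketched roughly as below. This is only an illustration under stated assumptions, not Czkawka's code: files are modelled as in-memory byte slices, the standard library's `DefaultHasher` stands in for a real content hash such as Blake3, the stages run sequentially rather than on all processor threads, and the names `Entry`, `hash_prefix`, `split_by` and `find_duplicates` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// (file name, file contents): an in-memory stand-in for a file on disk.
type Entry<'a> = (&'a str, &'a [u8]);

/// Size of the fragment used by the cheap pre-hash stage (2KB, as in the text).
const PREHASH_LEN: usize = 2 * 1024;

/// Hash at most `limit` leading bytes. DefaultHasher is an assumption made to
/// keep the sketch dependency-free; it is not a cryptographic hash.
fn hash_prefix(data: &[u8], limit: usize) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(&data[..data.len().min(limit)]);
    hasher.finish()
}

/// Split a group into sub-groups that share a key and drop every sub-group
/// with a single member, since such a file cannot have a duplicate.
fn split_by<'a, K, F>(group: Vec<Entry<'a>>, key: F) -> Vec<Vec<Entry<'a>>>
where
    K: Hash + Eq,
    F: Fn(&[u8]) -> K,
{
    let mut buckets: HashMap<K, Vec<Entry<'a>>> = HashMap::new();
    for entry in group {
        buckets.entry(key(entry.1)).or_default().push(entry);
    }
    buckets.into_iter().map(|(_, g)| g).filter(|g| g.len() > 1).collect()
}

/// `last_stage_limit` is `usize::MAX` for a full hash, or 1024 * 1024 to mimic
/// the "hashmb" mode that only hashes the first megabyte.
fn find_duplicates<'a>(files: &[Entry<'a>], last_stage_limit: usize) -> Vec<Vec<&'a str>> {
    let mut result = Vec::new();
    // Stage 1: only files of identical size can be duplicates.
    for size_group in split_by(files.to_vec(), |d| d.len()) {
        // Stage 2: the pre-hash removes files that already differ at the start.
        for pre_group in split_by(size_group, |d| hash_prefix(d, PREHASH_LEN)) {
            // Stage 3: hash the remaining candidates up to the chosen limit.
            for full_group in split_by(pre_group, |d| hash_prefix(d, last_stage_limit)) {
                let mut names: Vec<&'a str> = full_group.iter().map(|e| e.0).collect();
                names.sort();
                result.push(names);
            }
        }
    }
    result.sort();
    result
}

fn main() {
    let files = vec![
        ("a.txt", "hello world".as_bytes()),
        ("b.txt", "hello world".as_bytes()),
        ("c.txt", "hello there".as_bytes()),
        ("d.txt", "short".as_bytes()),
    ];
    println!("{:?}", find_duplicates(&files, usize::MAX));
}
```

In this toy input `c.txt` has the same size as `a.txt` and `b.txt` and is only eliminated by the hashing stages, while `d.txt` is already dropped by the size grouping.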
 ### Empty Files
-Searching for empty files is easy and fast, because we only need check the file metadata and its length.
+Searching for empty files is easy and fast, because we only need to check the file metadata and its length.
 ### Empty Directories
 At the beginning, a special entry is created for each directory containing - the parent path (only if it is not a folder
@@ -137,7 +138,7 @@ set to be potentionally empty).
 First, user-defined folders are put into the pool of folders to be checked.
-Each element is checked to see if it is
+Each element is checked to see if it is:
 - folder - this folder is added to the check queue as possible empty - `FolderEmptiness::Maybe`
 - anything else - the given folder is "poisoned" with the `FolderEmptiness::No` flag, indicating that the folder is no longer
 empty. Then each folder directly or indirectly containing the file is also poisoned with the `FolderEmptiness::No` flag.
@@ -150,12 +151,12 @@ but `/cow/ear/stack/` may still be empty.
 Finally, all folders with the flag `FolderEmptiness::Maybe` are defaulted to empty.
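The "poisoning" idea from the Empty Directories section can be illustrated with a small in-memory model. This is a hedged sketch and not the project's implementation: it takes plain lists of folder and file paths instead of walking a disk, the `FolderEmptiness` enum is re-declared locally with only the two flags named in the text, and `find_empty_folders` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// The two flags mentioned in the text, re-declared here for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FolderEmptiness {
    No,
    Maybe,
}

/// `folders` and `files` are lists of absolute paths (an assumption of this
/// sketch; a real scanner would discover them while traversing directories).
fn find_empty_folders(folders: &[&str], files: &[&str]) -> Vec<PathBuf> {
    // Every folder starts out as possibly empty.
    let mut state: HashMap<PathBuf, FolderEmptiness> = folders
        .iter()
        .map(|f| (PathBuf::from(f), FolderEmptiness::Maybe))
        .collect();

    for file in files {
        // `ancestors()` yields the path itself first, so skip it; every
        // remaining ancestor directly or indirectly contains the file.
        for ancestor in Path::new(file).ancestors().skip(1) {
            if let Some(flag) = state.get_mut(ancestor) {
                *flag = FolderEmptiness::No;
            }
        }
    }

    // Whatever was never poisoned is reported as empty.
    let mut empty: Vec<PathBuf> = state
        .into_iter()
        .filter(|(_, flag)| *flag == FolderEmptiness::Maybe)
        .map(|(path, _)| path)
        .collect();
    empty.sort();
    empty
}

fn main() {
    let folders = ["/cow", "/cow/ear", "/cow/ear/stack", "/cow/grass"];
    let files = ["/cow/grass/file.txt"];
    println!("{:?}", find_empty_folders(&folders, &files));
}
```

With this input `/cow` and `/cow/grass` are poisoned by the file, while `/cow/ear` and `/cow/ear/stack` keep the `Maybe` flag and are reported as empty.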
 ### Big Files
-From each file inside the given path its size is read and then after sorting it, e.g. 50 largest files are displayed.
+For each file inside the given path its size is read and then after sorting the list, e.g. 50 largest, files are displayed.
 ### Temporary Files
 Searching for temporary files only involves comparing their extensions with a previously prepared list.
-Currently files with this extensions are considered as temporary files -
+Currently files with these extensions are considered temporary files -
 ```
 ["#", "thumbs.db", ".bak", "~", ".tmp", ".temp", ".ds_store", ".crdownload", ".part", ".cache", ".dmp", ".download", ".partial"]
 ```
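A minimal sketch of such an extension check is shown below, using the list quoted above. Two points are assumptions of the sketch rather than statements from the documentation: the comparison is done as a case-insensitive suffix match on the file name (the list entries are all lowercase), and the names `TEMP_EXTENSIONS` and `is_temporary` are hypothetical.

```rust
/// The list quoted in the documentation above.
const TEMP_EXTENSIONS: &[&str] = &[
    "#", "thumbs.db", ".bak", "~", ".tmp", ".temp", ".ds_store", ".crdownload",
    ".part", ".cache", ".dmp", ".download", ".partial",
];

/// Returns true when the lowercased file name ends with one of the entries.
/// Lowercasing before comparing is an assumption made for this sketch.
fn is_temporary(file_name: &str) -> bool {
    let lower = file_name.to_lowercase();
    TEMP_EXTENSIONS.iter().any(|ext| lower.ends_with(*ext))
}

fn main() {
    for name in ["report.docx~", "Thumbs.db", "movie.mkv.part", "notes.txt"] {
        println!("{name}: {}", is_temporary(name));
    }
}
```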
@@ -163,17 +164,17 @@ Currently files with this extensions are considered as temporary files -
 ### Zeroed Files
 Zeroed files very often are results of e.g. incorrect file downloads.
-Their search consists of 3 parts:
+Their search consists of 3 steps:
 - Collecting a list of all files with a size greater than 0
 - At start, 64 bytes of each file are checked to discard the vast majority of non-zero files without major performance losses.
 - The next step is to check the rest of the file with bigger parts(32KB)
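The three steps above could look roughly like the following sketch. It is written against any `std::io::Read` source so it can be tried on in-memory data; the function name `is_zeroed` and the constant names are hypothetical, and error handling is reduced to propagating I/O errors.

```rust
use std::io::{self, Read};

/// Sizes taken from the text: a 64-byte quick check, then 32KB chunks.
const QUICK_CHECK_LEN: usize = 64;
const CHUNK_LEN: usize = 32 * 1024;

/// Returns Ok(true) only for a non-empty input that consists solely of zero bytes.
fn is_zeroed<R: Read>(mut reader: R) -> io::Result<bool> {
    // Quick check: most ordinary files contain a non-zero byte near the start,
    // so they are rejected after a single small read.
    let mut head = [0u8; QUICK_CHECK_LEN];
    let n = reader.read(&mut head)?;
    if n == 0 || head[..n].iter().any(|&b| b != 0) {
        // Empty inputs are not treated as zeroed files (size must be > 0).
        return Ok(false);
    }

    // Slow path: scan the rest of the input in bigger chunks.
    let mut chunk = vec![0u8; CHUNK_LEN];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            return Ok(true); // reached the end without seeing a non-zero byte
        }
        if chunk[..n].iter().any(|&b| b != 0) {
            return Ok(false);
        }
    }
}

fn main() -> io::Result<()> {
    let zeroed = vec![0u8; 100_000];
    println!("{}", is_zeroed(zeroed.as_slice())?);
    Ok(())
}
```

To check a real file, a `std::fs::File` can be passed in place of the byte slice, since it also implements `Read`.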
### Invalid Symlinks ### Invalid Symlinks
To find invalid symlinks we must to find first a symlnks. To find invalid symlinks we must first find symlnks.
After searching for them you should check at which element it points to and if it does not exist, add this symlinks into the list of invalid symlinks, pointing to a non-existent path. After searching for them you should check at which element it points to and if it does not exist, add this symlinks into the list of invalid symlinks, pointing to a non-existent path.
The second mode is to detect recursive symlink. Unfortunately, this mode does not work and it display when using it, an error of a non-existent target element, but it is implemented by counting the jumps of the symlink and after exceeding a certain number (e.g. 20) it is considered that the given symlink is recursive. The second mode is to detect recursive symlink. Unfortunately, this mode does not work and it displays when using it an error of a non-existent target element, but it is implemented by counting the jumps of the symlink and after exceeding a certain number (e.g. 20) it is considered that the given symlink is recursive.
 ### Same Music
 This is a mode to find identical music files through tags.
@ -184,7 +185,7 @@ First, music files with one of the extensions `[".mp3", ".flac", ".m4a"]` are co
 Then for each music file its tags are read.
-Then, for each selected tag by which we want to search for duplicates, we perform the following steps
+Then, for each selected tag by which we want to search for duplicates, we perform the following steps:
 - For each input file we read the value of the currently checked tag
 - If it is empty, we ignore the file, if it has a value, we throw it into an array whose key is this value
 - After checking all files, arrays containing only one element are deleted
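For a single tag, the grouping steps listed above might be sketched as below. This is illustrative only: `Track` and `group_by_tag` are hypothetical names, and the tags are assumed to be already read into plain strings rather than parsed from real music files.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a music file whose tags were already read.
#[derive(Debug, Clone, PartialEq)]
struct Track {
    path: String,
    artist: String,
}

/// Groups tracks by the value of one tag, chosen by the `tag` accessor.
fn group_by_tag<F>(tracks: &[Track], tag: F) -> Vec<Vec<Track>>
where
    F: Fn(&Track) -> &str,
{
    let mut buckets: HashMap<String, Vec<Track>> = HashMap::new();
    for track in tracks {
        let value = tag(track);
        // Files with an empty tag are ignored.
        if value.is_empty() {
            continue;
        }
        // Otherwise the file goes into the bucket keyed by the tag value.
        buckets.entry(value.to_string()).or_default().push(track.clone());
    }
    // Buckets holding a single file contain no duplicates and are dropped.
    buckets.into_values().filter(|group| group.len() > 1).collect()
}
```

A call such as `group_by_tag(&tracks, |t| t.artist.as_str())` returns only the groups of files that share an artist; the order of the groups is unspecified because it comes from a `HashMap`.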
@ -196,17 +197,17 @@ It is a tool for finding similar images that differ e.g. in watermark, size etc.
 The tool first collects images with specific extensions that can be checked - `[".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".pnm", ".tga", ".ff", ".gif", ".jif", ".jfi", ".ico", ".webp", ".avif"]`.
-Next cached data are loaded from file to prevent hashing twice same file.
+Next cached data is loaded from file to prevent hashing twice the same file.
-Automatically cache which points to non existing data is deleted.
+The cache which points to non existing data is deleted automatically.
 Then a perceptual hash is created for each image which isn't available in cache.
-Cryptographic hash (used for example in ciphers) for similar inputs gives completely different outputs
+Cryptographic hash (used for example in ciphers) for similar inputs gives completely different outputs:
 11110 ==> AAAAAB
 11111 ==> FWNTLW
 01110 ==> TWMQLA
-Perceptual hash at similar inputs, gives similar outputs
+Perceptual hash at similar inputs, gives similar outputs:
 11110 ==> AAAAAB
 11111 ==> AABABB
 01110 ==> AAAACB
@ -214,14 +215,14 @@ Perceptual hash at similar inputs, gives similar outputs
 Computed hash data is then thrown into a special tree that allows to compare hashes using [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance).
-Next this hashes are saved to file, to be able to opens images without needing to hash it more times.
+Next these hashes are saved to file, to be able to open images without needing to hash it more times.
 Finally, each hash is compared with the others and if the distance between them is less than the maximum distance specified by the user, the images are considered similar and thrown from the pool of images to be searched.
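The comparison step can be illustrated with a small Rust sketch. It deliberately swaps the special tree mentioned above for a brute-force scan, so it shows only the distance test and the removal from the pool, not the project's data structure. The names `hamming_distance` and `group_similar` are hypothetical, hashes are assumed to be equal-length byte vectors, and the strict "less than" comparison follows the wording of the text.

```rust
/// Number of differing bits between two equal-length hashes.
fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "hashes must have the same length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Returns groups of indices into `hashes` whose members lie within
/// `max_distance` of the first member of the group.
fn group_similar(hashes: &[Vec<u8>], max_distance: u32) -> Vec<Vec<usize>> {
    let mut pool: Vec<usize> = (0..hashes.len()).collect();
    let mut groups = Vec::new();
    while let Some(base) = pool.first().copied() {
        pool.remove(0);
        // Split the remaining pool into images similar to `base` and the rest.
        let (similar, rest): (Vec<usize>, Vec<usize>) = pool
            .iter()
            .copied()
            .partition(|&i| hamming_distance(&hashes[base], &hashes[i]) < max_distance);
        // Similar images leave the pool and are not searched again.
        pool = rest;
        if !similar.is_empty() {
            let mut group = vec![base];
            group.extend(similar);
            groups.push(group);
        }
    }
    groups
}
```

The brute-force scan is quadratic in the number of images, which is the cost a tree indexed by Hamming distance is meant to avoid.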
 ### Broken Files
-This tool is created to find files which are corrupted or have invalid extension.
+This tool finds files which are corrupted or have an invalid extension.
-At first files from specific group(image,archive,audio) are collected and then this files are opened.
+At first files from specific group (image,archive,audio) are collected and then these files are opened.
-If an error happens when opening this file then it means that this file is corrupted or unsupported.
+If an error happens when opening such file it means that this file is corrupted or unsupported.
-Only some file extensions are supported, because I rely on external crates. Also some false positives may be shown(e.g. https://github.com/image-rs/jpeg-decoder/issues/130) so always open file to check if it is really broken.
+Only some file extensions are supported, because I rely on external crates. Also, some false positives may be shown(e.g. https://github.com/image-rs/jpeg-decoder/issues/130) so always open file to check if it is really broken.