mirror of synced 2024-05-10 07:22:36 +12:00

Spicing up the markdown files (#222)

* Spicing up the README

- Making it more readable
- Better English, easier to read
- Hiding links
- Fixing the absolute hideous tables which were impossible to read in the raw readme

* Fixed some things, not a lot though.
bellrise 2021-01-20 11:16:56 +01:00 committed by GitHub
parent 7fdc8ea3fc
commit 5751d8a723
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 206 additions and 150 deletions

287
README.md

@ -1,37 +1,40 @@
![com github qarmin czkawka](https://user-images.githubusercontent.com/41945903/102616149-66490400-4137-11eb-9cd6-813b2b070834.png) ![com github qarmin czkawka](https://user-images.githubusercontent.com/41945903/102616149-66490400-4137-11eb-9cd6-813b2b070834.png)
**Czkawka** is written in Rust, simple, fast and easy to use app to remove unnecessary files from your computer. **Czkawka** is a simple, fast and easy to use app to remove unnecessary files from your computer.
## Features ## Features
- Written in memory safe Rust - Written in memory safe Rust
- Amazingly fast - due using more or less advanced algorithms and multithreading support - Amazingly fast - due to using more or less advanced algorithms and multithreading
- Free, Open Source without ads - Free, Open Source without ads
- Multiplatform - works on Linux, Windows and macOS - Multiplatform - works on Linux, Windows and macOS
- Cache support - second and further scans should be a lot of faster than first - Cache support - second and further scans should be a lot faster than the first
- CLI frontend, very fast to automate tasks - CLI frontend - for easy automation
- GUI frontend - uses modern GTK 3 and looks similar to FSlint - GUI frontend - uses modern GTK 3 and looks similar to FSlint
- Rich search option - allows setting absolute included and excluded directories, set of allowed file extensions or excluded items with * wildcard - Rich search option - allows setting absolute included and excluded directories, set of allowed file extensions
or excluded items with the `*` wildcard
- Multiple tools to use: - Multiple tools to use:
- Duplicates - Finds duplicates basing on file name, size, hash, first 1 MB of hash - Duplicates - Finds duplicates basing on file name, size, hash, first 1 MB of hash
- Empty Folders - Finds empty folders with the help of advanced algorithm - Empty Folders - Finds empty folders with the help of an advanced algorithm
- Big Files - Finds provided number of the biggest files in given location - Big Files - Finds the provided number of the biggest files in given location
- Empty Files - Looks for empty files across disk - Empty Files - Looks for empty files across the drive
- Temporary Files - Allows finding temporary files - Temporary Files - Finds temporary files
- Similar Images - Finds images which are not exactly the same(different resolution, watermarks) - Similar Images - Finds images which are not exactly the same (different resolution, watermarks)
- Zeroed Files - Find files which are filled with zeros(usually corrupted) - Zeroed Files - Finds files which are filled with zeros (usually corrupted)
- Same Music - Search for music with same artist, album etc. - Same Music - Searches for music with same artist, album etc.
- Invalid Symbolic Links - Shows symbolic links which points to non-existent files/directories - Invalid Symbolic Links - Shows symbolic links which points to non-existent files/directories
- Broken Files - Finds files with invalid extension or corrupted - Broken Files - Finds files with an invalid extension or that are corrupted
<!-- The GIF thingy -->
![Czkawka](https://user-images.githubusercontent.com/41945903/104711404-9cbb7400-5721-11eb-904d-9677c189f7ab.gif) ![Czkawka](https://user-images.githubusercontent.com/41945903/104711404-9cbb7400-5721-11eb-904d-9677c189f7ab.gif)
## Instruction ## How do I use it?
You can find instruction how to use Czkawka [here](instructions/Instruction.md) You can find an instruction on how to use Czkawka [**here**](instructions/Instruction.md).
## Requirements ## Requirements
If you are using Windows or Mac binaries, there is no specific requirements. If you are using Windows or Mac binaries, there is no specific requirements.
Same with Appimage on Linux(except having system 18.04+ or similar). Same with Appimage on Linux (except having system 18.04+ or similar).
But compiled GUI binaries on Linux or compiling it on your own os require to install this packages:
Although, compiled GUI binaries on Linux or compiling it on your own OS requires you to install these packages:
### Ubuntu/Debian ### Ubuntu/Debian
``` ```
sudo apt install cargo libgtk-3-dev sudo apt install cargo libgtk-3-dev
@ -41,108 +44,135 @@ sudo apt install cargo libgtk-3-dev
sudo yum install gtk3-devel glib2-devel sudo yum install gtk3-devel glib2-devel
``` ```
## Usage
# Installation
### Precompiled binaries ### Precompiled binaries
Precompiled binaries are available here - https://github.com/qarmin/czkawka/releases/. Ready-to-go executables are available [**here**](https://github.com/qarmin/czkawka/releases/).
If the app does not run when clicking at a launcher, run it through a terminal. If the app does not run when clicking at a launcher, run it through a terminal.
You don't need to have any additional libraries for CLI Czkawka You don't need to have any additional libraries for CLI Czkawka
#### GUI Requirements
### GUI Requirements
##### Linux ##### Linux
For Czkawka GUI you need to have at least GTK 3.22 and also Alsa installed(for finding broken music files). For Czkawka GUI you are required to have at least `GTK 3.22` and also `Alsa` installed (for finding broken music
It should be installed by default on all the most popular distros. files). It should be installed by default on all the most popular distros.
##### Windows ##### Windows
`czkawka_gui.exe` extracted from zip file `windows_czkawka_gui.zip` needs to have all files inside around, because use them. The `czkawka_gui.exe` which is extracted from the `windows_czkawka_gui.zip` zip file needs to be in the same
If you want to move somewhere else exe binary and open it, you need to install GTK 3 runtime from site https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases folder as the rest of the files. If you want to move and open the executable somewhere else, you need to install the `GTK 3`
##### MacOS runtime from [**here**](https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases).
For now you need to install manually GTK 3 libraries, because are dynamically loaded from OS(Help needed to use static linking).
To install it you need to type this commands in terminal ##### macOS
Currently you need to manually install `GTK 3` libraries, because they are dynamically loaded from the OS (*we need
help in using static linking*). Installation in the terminal:
```shell ```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install gtk+3 brew install gtk+3
``` ```
Next you need to go to place where you downloaded app and add executable bit After that, go to the location where you installed this and add the `executable` permission.
```shell ```shell
chmod +x mac_czkawka_gui chmod +x mac_czkawka_gui
``` ```
At the end you can open this app Execute in the same folder with:
```shell ```shell
./mac_czkawka_gui ./mac_czkawka_gui
``` ```
### Appimage ### Appimage
Appimage files are available in release page - https://github.com/qarmin/czkawka/releases/ Appimage files are available in release page - [**Github releases**](https://github.com/qarmin/czkawka/releases/)
There is a problem with this currently, as it doesn't allow you to open two images/files at once.
For now looks that there is a bug with this format, because it doesn't allow opening two images/files at once.
### Cargo ### Cargo
The easiest method to install Czkawka is to use Cargo command, since it basically compile an entire app, you need to install required packages from `Compilation` section The easiest method to install Czkawka is using the `cargo` command. For compiling it, you need to get all the
requirements from the [compilation section](#Compilation)
``` ```
cargo install czkawka_gui cargo install czkawka_gui
cargo install czkawka_cli cargo install czkawka_cli
``` ```
You can update package by typing same command. You can update the package with the same command.
### Snap ### Snap
Snap also are available, but there is no access to external drives. Snaps also are available, but there is no access to external drives.
``` ```
sudo snap install czkawka sudo snap install czkawka
``` ```
Snap store entry - https://snapcraft.io/czkawka The Snap store entry can be found [**here**](https://snapcraft.io/czkawka).
Edgy builds are build for every commit, but it may be a little unstable(very rarely, because I'm not pushing untested code). Fresh builds are created for every commit, but they may be a little unstable, although that happens very rarely
because I don't push untested code.
<!-- Dunno if the flatpak section should be here, because it just takes valuable
space, but if you want to keep it you do you. I'm only fixing the english
here. -->
### Flatpak ### Flatpak
Maybe someday Maybe someday
### Debian/Ubuntu repository and PPA ### Debian/Ubuntu repository and PPA
Tried to set up it, but for now I have problems described in this issue I tried to set up it, but I'm having problems described in this [**issue**](https://salsa.debian.org/rust-team/debcargo-conf/-/issues/21).
https://salsa.debian.org/rust-team/debcargo-conf/-/issues/21
### AUR - Arch Linux Package (unofficial) ### AUR - Arch Linux Package (unofficial)
Czkawka is also available in Arch Linux's AUR from which it can be easily downloaded and installed on the system. Czkawka is also available in Arch Linux's AUR from which it can be easily installed.
``` ```
yay -Syu czkawka-git yay -Syu czkawka-git
``` ```
This is unofficial package, so new versions will not be always available. *This is unofficial package, so new versions will not be always available.*
### Devel versions
Artifacts from each commit you can also download here - https://github.com/qarmin/czkawka/actions
## Compilation ### Development versions
### Requirements Artifacts from each commit can be downloaded [**here**](https://github.com/qarmin/czkawka/actions)
Rust 1.48 - Czkawka aims to support only the latest stable Rust version
GTK 3.22 - only for GTK backend
If you want to compile CLI frontend, then just skip lines which contains `gtk` word.
#### Debian/Ubuntu <!-- Note the #Compilation link if you're changing this! -->
# Compilation
The compilation section is generally not recommended, because you have multiple better sources
of this app than compiling it yourself.
## Requirements
Program | Min | What for
---------|------|------------------------------------------------------------
Rust | 1.48 | Czkawka aims to support only the latest stable Rust version
GTK | 3.22 | Only for the `GTK` backend
If you only want the terminal version without a GUI, just skip all lines about `gtk`.
#### Debian / Ubuntu
```shell ```shell
sudo apt install -y curl # Needed by Rust update tool sudo apt install -y curl # Needed by Rust update tool
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Download the latest stable Rust curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Download the latest stable Rust
sudo apt install -y libgtk-3-dev libasound2-dev sudo apt install -y libgtk-3-dev libasound2-dev
``` ```
#### Fedora/CentOS/Rocky Linux
#### Fedora / CentOS / Rocky Linux
```shell ```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Download the latest stable Rust curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Download the latest stable Rust
sudo yum install gtk3-devel glib2-devel alsa-lib-devel sudo yum install gtk3-devel glib2-devel alsa-lib-devel
``` ```
#### MacOS
#### macOS
You need to install Homebrew and GTK Libraries You need to install Homebrew and GTK Libraries
```shell ```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install rust gtk+3 brew install rust gtk+3
``` ```
### Windows(Not working yet)
First you need to install Visual C++ components from Visual Studio installer - https://visualstudio.microsoft.com/downloads/
### Windows
*Will be available in the future*
<!-- First you need to install Visual C++ components from Visual Studio installer - https://visualstudio.microsoft.com/downloads/
Next install Rust from site https://rustup.rs/ Next install Rust from site https://rustup.rs/
After that the latest GTK 3 runtime must be installed from https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases After that the latest GTK 3 runtime must be installed from https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases
-->
## Compilation from source
### Compilation from source
- Download the source - Download the source
``` ```
git clone https://github.com/qarmin/czkawka.git git clone https://github.com/qarmin/czkawka.git
@ -152,91 +182,102 @@ cd czkawka
``` ```
cargo run --bin czkawka_gui cargo run --bin czkawka_gui
``` ```
For Linux-to-Windows cross-building instruction look at the CI. For Linux-to-Windows cross-building instruction look at the CI.
![GUI](https://user-images.githubusercontent.com/41945903/103371136-fb9cae80-4ace-11eb-8d72-7b4c8ac44260.png) ![GUI](https://user-images.githubusercontent.com/41945903/103371136-fb9cae80-4ace-11eb-8d72-7b4c8ac44260.png)
- Run CLI(this will print help with a lot of examples) - Run CLI (this will print help with a lot of examples)
``` ```
cargo run --bin czkawka_cli cargo run --bin czkawka_cli
``` ```
![CLI](https://user-images.githubusercontent.com/41945903/93716816-0bbcfd80-fb72-11ea-8d31-4c87cc2abe6d.png) ![CLI](https://user-images.githubusercontent.com/41945903/93716816-0bbcfd80-fb72-11ea-8d31-4c87cc2abe6d.png)
<!-- End of compilation section -->
## Benchmarks ## Benchmarks
Since Czkawka is written in Rust and aims to be a faster alternative to FSlint (written in Python), we need to compare the speed of these tools.
I tested it on SSD Disk 256 GB GoodRam and i7 4770 CPU.
I prepared a directory and performed a test without any folder exceptions(I removed all directories from FSlint and Czkawka from other tabs than Include Directory) which contained 229868 files which took 203,7 GB and 13708 duplicates files in 9117 groups which took 7.90 GB.
Minimum file size to check I set to 1 KB on all programs
| App| Executing Time |
|:----------:|:-------------:|
| FSlint 2.4.7 (Second Run)| 86s |
| Czkawka 1.4.0 (Second Run) | 12s |
| DupeGuru 4.0.4 (Second Run) | 28s |
I used Mprof for checking memory usage FSlint and Dupeguru, for Czkawka I used Heaptrack. Since Czkawka is written in Rust and it aims to be a faster alternative to FSlint (which is written in Python), we need
To not get Dupeguru crash I checked smaller directory with 217986 files and 41883 folders. to compare the speed of these tools.
| App| Idle Ram | Max Operational Ram Usage | Stabilized after search | I tested it on a 256 GB SSD and a i7-4770 CPU.
|:----------:|:-------------:|:-------------:|:-------------:|
| FSlint 2.4.7 | 62 MB | 84 MB | 84 MB |
| Czkawka 1.4.0 | 9 MB | 66 MB | 32 MB |
| DupeGuru 4.0.4 | 80 MB | 210 MB | 155 MB |
Similar Images which check 332 files which takes 1,7 GB I prepared a directory and performed a test without any folder exceptions (I removed all directories from FSlint and
Czkawka from other tabs than Include Directory) which contained 229 868 files, took 203.7 GB and had 13 708 duplicate
files in 9117 groups which took 7.90 GB.
| App| Scan time | Minimum file size to check I set to 1 KB on all programs.
|:----------:|:-------------:|
| Czkawka 1.4.0 | 58s |
| DupeGuru 4.0.4 | 51s |
Similar Images which check 1421 image files which takes 110,1 MB | App | Executing Time |
|:---------------------------:|:--------------:|
| FSlint 2.4.7 (Second Run) | 86s |
| Czkawka 1.4.0 (Second Run) | 12s |
| DupeGuru 4.0.4 (Second Run) | 28s |
| App| Scan time |
|:----------:|:-------------:|
| Czkawka 1.4.0 | 25s |
| DupeGuru 4.0.4 | 92s |
So still is a big room for improvements. I used Mprof for checking memory usage FSlint and DupeGuru, for Czkawka I used Heaptrack.
To not get a crash from DupeGuru I checked a smaller directory with 217 986 files and 41 883 folders.
| App | Idle Ram | Max Operational Ram Usage | Stabilized after search |
|:--------------:|:--------:|:-------------------------:|:-----------------------:|
| FSlint 2.4.7 | 62 MB | 84 MB | 84 MB |
| Czkawka 1.4.0 | 9 MB | 66 MB | 32 MB |
| DupeGuru 4.0.4 | 80 MB | 210 MB | 155 MB |
Similar images which check 332 files which took 1.7 GB
| App | Scan time |
|:--------------:|:----------:|
| Czkawka 1.4.0 | 58s |
| DupeGuru 4.0.4 | 51s |
Similar images which check 1421 image files which took 110.1 MB
| App | Scan time |
|:--------------:|:----------|
| Czkawka 1.4.0 | 25s |
| DupeGuru 4.0.4 | 92s |
<!-- it's a lot of room, not a big room lol -->
So there is still a lot of room for improvement.
## Comparison with other tools ## Comparison with other tools
| | Czkawka | FSlint | DupeGuru | | | Czkawka | FSlint | DupeGuru |
|:----------:|:-------------:|:-----:|:---:| |:----------------------:|:-------:|:----------:|:-----------------:|
| Language | Rust| Python | Python/Objective C | | Language | Rust | Python | Python/Obj-C |
| OS | Linux, Windows, Mac | Linux | Linux, Windows, Mac| | OS | All | Linux only | All |
| Framework | GTK 3 (Gtk-rs)| GTK 2 (PyGTK) | Qt 5 (PyQt)/Cocoa | | Framework | GTK 3 | PyGTK | Qt 5 (PyQt)/Cocoa |
| Ram Usage | Low | Medium | Very High | | Ram Usage | Low | Medium | Very High |
| Duplicate finder | X | X | X | | Duplicate finder | • | • | • |
| Empty files | X | X | | | Empty files | • | • | |
| Empty folders | X | X | | | Empty folders | • | • | |
| Temporary files | X | X | | | Temporary files | • | • | |
| Big files | X | | | | Big files | • | | |
| Similar images | X | | X | | Similar images | • | | • |
| Zeroed Files| X | | | | Zeroed Files | • | | |
| Music duplicates(tags) | X | | X | | Music duplicates(tags) | • | | • |
| Invalid symlinks | X | X | | | Invalid symlinks | • | • | |
| Broken Files | X | | | | Broken Files | • | | |
| Installed packages | | X | | | Installed packages | | • | |
| Invalid names | | X | | | Invalid names | | • | |
| Names conflict | | X | | | Names conflict | | • | |
| Bad ID | | X | | | Bad ID | | • | |
| Non stripped binaries | | X | | | Non stripped binaries | | • | |
| Redundant whitespace | | X | | | Redundant whitespace | | • | |
| Multiple languages(po) | | X | X | | Multiple languages(po) | | • | • |
| Cache support | X | | X | | Cache support | • | | • |
| Project Activity | High | Very Low | High | | Project Activity | High | Very Low | High |
## Contributions ## Contributions
Contributions to this repository are welcome. Contributions to this repository are welcome.
You can help by creating: You can help by creating a:
- Bug report - memory leaks, unexpected behavior, crashes - Bug report - memory leaks, unexpected behavior, crashes
- Feature proposals - proposal to change/add/delete some features - Feature proposals - proposal to change/add/delete some features
- Pull Requests - implementing a new feature yourself or fixing bugs, but you have to pay attention to code quality. If the change is bigger, then it's a good idea to open a new issue to discuss changes. - Pull Requests - implementing a new feature yourself or fixing bugs, but you have to pay attention to code quality.
- Documentation - There is [instruction](instructions/Instruction.md) which you can improve. If the change is bigger, then it's a good idea to open a new issue to discuss changes.
- Documentation - There is an [instruction](instructions/Instruction.md) which you can improve.
The code should be clean and well formatted (Clippy and fmt are required in each PR). The code should be clean and well formatted (Clippy and fmt are required in each PR).
@ -245,20 +286,22 @@ Czkawka is a Polish word which means _hiccup_.
I chose this name because I wanted to hear people speaking other languages pronounce it. I chose this name because I wanted to hear people speaking other languages pronounce it.
This name is not as bad as it seems, because I was also thinking about using words like _żółć_, _gżegżółka_ or _żołądź_, but I gave up on these ideas because they contained Polish characters, which would cause difficulty in searching for the project. This name is not as bad as it seems, because I was also thinking about using words like _żółć_, _gżegżółka_ or _żołądź_,
but I gave up on these ideas because they contained Polish characters, which would cause difficulty in searching for the project.
At the beginning of the program creation, if the response concerning the name was unanimously negative, I prepared myself for a possible change of the name of the program, but the opinions were extremely mixed. At the beginning of the program creation, if the response concerning the name was unanimously negative, I prepared myself
for a possible change of the name of the program, but the opinions were extremely mixed.
## License ## License
Code is distributed under MIT license. Code is distributed under MIT license.
Icon is created by [jannuary](https://github.com/jannuary) and licensed CC-BY-4.0. Icon is created by [jannuary](https://github.com/jannuary) and licensed CC-BY-4.0.
Windows dark theme is used from AdMin repo - https://github.com/nrhodes91/AdMin with MIT license Windows dark theme is used from [AdMin repo](https://github.com/nrhodes91/AdMin) with MIT license
Program is completely free to use. The program is completely free to use.
"Gratis to uczciwa cena" - "Free is a fair price" "Gratis to uczciwa cena" - "Free is a fair price"
## Donations ## Donations
If you are using the app, I would appreciate a donation for its further development - https://github.com/sponsors/qarmin. If you are using the app, I would appreciate a donation for its further development, which can be done [here](https://github.com/sponsors/qarmin).


@ -1,59 +1,72 @@
# Instruction # Instruction
- [Basic information](#basic-informations) - [Tools](#tools)
- [Tools - How works?](#tools---how-works) - [Config / Cache files](#configcache-files)
- [Config/Cache files](#configcache-files)
- [GUI](#gui-gtk) - [GUI](#gui-gtk)
- [CLI](#cli) - [CLI](#cli)
- [Tips and tricks](#tips-and-tricks) - [Tips and tricks](#tips-and-tricks)
## Basic Informations Czkawka for now contains two independent frontends - the terminal and graphical interface which share the core module.
Czkawka for now contains two independent frontends - Console and Graphical interface which share the core module which contains basic and common functions used by each frontend.
Using the Rust language without unsafe code helps to create a safe, fast app with small resource requirements.
This code also has good support for multi-threading.
The code has very good support for multithreading, so the better processor/disk the performance should increase exponentially. # Tools
## Tools - How works?
### Duplicate Finder ### Duplicate Finder
Duplicate Finder allows you to search for files and group them according to a predefined criterion: Duplicate Finder allows you to search for files and group them according to a predefined criterion:
- **By name** - Groups files by name e.g. `/home/rafal/plik.txt` will be treat like duplicate of file `/home/romb/plik.txt`. This is the fastest method, but it is very unreliable and should not be used unless you know what you are doing.
- **By size** - Groups files by its size(in bytes), which must be exactly the same. It is as fast as the previous mode and usually gives much more correct results with duplicates, but I also do not recommend using it if you do not know what you are doing.
- **By hash** - A mode containing a check of the hash (cryptographic hash) of a given file which determines with great probability whether the files are identical.
This is the slowest, but almost 100% sure way to check the files. - **By name** - Groups files by name e.g. `/home/john/cats.txt` will be treated like a duplicate of a file named
`/home/lucy/cats.txt`. This is the fastest method, but it is very unreliable and should not be used unless you know
what you are doing.
Because the hash is only checked inside groups of files of the same size, it is practically impossible for two different files to be considered identical. - **By size** - Groups files by their size (in bytes and perfect matches only). It is as fast as the previous mode and
usually gives better results with duplicates, but I also do not recommend using it if you do not know what you are doing.
It consists of 3 parts: - **By hash** - A mode containing a check of the hash (cryptographic hash) of a given file which determines with great
- Grouping files of identical size - allows you to throw away files of unique size, which are already known to have no duplicates at this stage. probability whether the files are identical.
- PreHash check - Each group of files of identical size is placed in a queue using all processor threads (each action in the group is independent of the others).
In each such group a small fragment of each file (2KB) is loaded in turn and then hashed. All files whose partial hashes are unique within the group are removed from it. Using this step usually allowed me to reduce the time of searching for duplicates even by half. This is the slowest, but almost 100% sure way to check the files.
- Checking the Hash - After leaving files that have the same beginning in groups, you should now check the whole contents of the file to make sure they are identical.
- **By hashmb** - Works the same way as via hash, only in the last phase it does not check the whole file but only its first Megabyte. It is perfect for quick search of possible duplicate files. Because the hash is only checked inside groups of files of the same size, it is practically impossible for two different
files to be considered identical.
It consists of 3 steps:
- Grouping files of identical size - allows you to throw away files of unique size, which are already known to have no
duplicates at this stage.
- PreHash check - Each group of files of identical size is placed in a queue using all processor threads (each action in
the group is independent of the others). In each such group a small fragment of each file (2KB) is loaded in turn and
then hashed. All files whose partial hashes are unique within the group are removed from it. Using this step usually
allows me to reduce the time of searching for duplicates even by half.
- Checking the hash - After leaving files that have the same beginning in groups, you should now check the whole contents
of the file to make sure they are identical.
- **By hashmb** - Works the same way as via hash, only in the last phase it does not check the whole file but only its first
megabyte. It is perfect for quick search of possible duplicate files.
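
The three steps of the **By hash** mode can be sketched roughly as below. This is only an illustration under stated assumptions, not Czkawka's actual code: `FakeFile`, `regroup` and `find_duplicates` are hypothetical names, files are held in memory instead of being read from disk, the standard library's non-cryptographic `DefaultHasher` stands in for the real hash, and the groups are processed on one thread rather than on all processor threads.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory stand-in for a file: a path plus its contents.
pub struct FakeFile {
    pub path: String,
    pub data: Vec<u8>,
}

/// Size of the fragment hashed in the PreHash step (2 KB, as in the text).
const PREHASH_BYTES: usize = 2 * 1024;

/// Stand-in hash from the standard library (assumption: not the real hash).
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Splits one group by a key and keeps only sub-groups with two or more files.
fn regroup<'a, K, F>(group: Vec<&'a FakeFile>, key: F) -> Vec<Vec<&'a FakeFile>>
where
    K: Eq + Hash,
    F: Fn(&FakeFile) -> K,
{
    let mut buckets: HashMap<K, Vec<&'a FakeFile>> = HashMap::new();
    for file in group {
        buckets.entry(key(file)).or_default().push(file);
    }
    buckets.into_values().filter(|g| g.len() > 1).collect()
}

/// Returns sorted groups of paths whose contents hash identically.
pub fn find_duplicates(files: &[FakeFile]) -> Vec<Vec<String>> {
    // Step 1: group by size; files with a unique size are dropped here.
    let by_size = regroup(files.iter().collect(), |f| f.data.len());

    let mut result = Vec::new();
    for size_group in by_size {
        // Step 2: PreHash, hashing only the first 2 KB of every file.
        let by_prehash = regroup(size_group, |f| {
            hash_bytes(&f.data[..f.data.len().min(PREHASH_BYTES)])
        });
        for prehash_group in by_prehash {
            // Step 3: hash the whole contents of the remaining candidates.
            for full_group in regroup(prehash_group, |f| hash_bytes(&f.data)) {
                let mut paths: Vec<String> =
                    full_group.iter().map(|f| f.path.clone()).collect();
                paths.sort();
                result.push(paths);
            }
        }
    }
    result.sort();
    result
}
```

Each step only looks at files that survived the previous one, so the expensive full hash runs on as few files as possible. Limiting step 3 to the first megabyte of each file would give a rough equivalent of the **By hashmb** mode.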
### Empty Files ### Empty Files
Searching for empty files is rather easy, because we only need to read file metadata and check if its length is 0. Searching for empty files is easy and fast, because we only need to check the file metadata and whether its length is 0.
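
A minimal sketch of such a check, using only the standard library. The function name `is_empty_file` is an assumption made for this example and is not taken from Czkawka's source.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when `path` is a regular file with a length of 0 bytes.
/// Only the metadata is read; the file contents are never opened.
pub fn is_empty_file(path: &Path) -> io::Result<bool> {
    let metadata = fs::metadata(path)?;
    Ok(metadata.is_file() && metadata.len() == 0)
}
```

Because only metadata is touched, the cost per file does not depend on the file size.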
### Empty Directories ### Empty Directories
Empty directories are those that do not contain any other files, symbolic links, etc. unless they are other empty directories. At the beginning, a special entry is created for each directory containing - the parent path (only if it is not a folder
directly selected by the user) and a flag to indicate whether the given directory is empty (at the beginning each one is
At the beginning, a special entry is created for each directory containing - the parent path (only if it is not a folder directly selected by the user) and a flag to indicate whether the given directory is empty(at the beginning each one is potentially empty). set to be potentially empty).
First, user-defined folders are put into the pool of folders to be checked. First, user-defined folders are put into the pool of folders to be checked.
Each element is checked to see if it is Each element is checked to see if it is
- folder - this folder is added to the check queue as possible empty - `FolderEmptiness::Maybe` - folder - this folder is added to the check queue as possible empty - `FolderEmptiness::Maybe`
- anything else - the given folder is "poisoned" with the `FolderEmptiness::No` flag, indicating that the folder is no longer empty. Then each folder directly or indirectly containing the file is also poisoned with the `FolderEmptiness::No` flag. - anything else - the given folder is "poisoned" with the `FolderEmptiness::No` flag, indicating that the folder is no longer
empty. Then each folder directly or indirectly containing the file is also poisoned with the `FolderEmptiness::No` flag.
e.g. There is 4 checked folder which may be empty `/krowa/`, `/krowa/ucho/`, `/krowa/ucho/stos/`, `/krowa/ucho/flaga/`. Example: there are 4 checked folders which *may* be empty `/cow/`, `/cow/ear/`, `/cow/ear/stack/`, `/cow/ear/flag/`.
In the last one is found a file, so that means that `/krowa/ucho/flaga/` is not empty and also all parents - `/krowa/ucho/` and `/krowa/`. The last folder contains a file, so that means that `/cow/ear/flag` is not empty and also all its parents - `/cow/ear/` and `/cow/`,
`/krowa/ucho/stos/` still may be empty. but `/cow/ear/stack/` may still be empty.
Finally, all folders with the flag `FolderEmpriness::Maybe` are considered empty Finally, all folders with the flag `FolderEmptiness::Maybe` are defaulted to empty.
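
The "poisoning" idea can be sketched as follows. This is a simplified illustration, not the real implementation: the `FolderEmptiness` flags come from the description above, but `find_empty_folders` is a hypothetical function that takes already collected lists of folder and file paths instead of walking the drive.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Flag kept for every checked folder, as described in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FolderEmptiness {
    Maybe,
    No,
}

/// Returns the folders that are still flagged `Maybe` after all files
/// have poisoned their parent folders. Paths are assumed to be absolute.
pub fn find_empty_folders(folders: &[&str], files: &[&str]) -> Vec<PathBuf> {
    // Every folder starts out as potentially empty.
    let mut state: BTreeMap<PathBuf, FolderEmptiness> = folders
        .iter()
        .map(|f| (PathBuf::from(*f), FolderEmptiness::Maybe))
        .collect();

    // Anything that is not a folder poisons the folder that contains it,
    // directly or indirectly. `skip(1)` skips the file path itself.
    for file in files {
        for ancestor in Path::new(file).ancestors().skip(1) {
            if let Some(flag) = state.get_mut(ancestor) {
                *flag = FolderEmptiness::No;
            }
        }
    }

    // Whatever was never poisoned is treated as empty.
    state
        .into_iter()
        .filter(|(_, flag)| *flag == FolderEmptiness::Maybe)
        .map(|(path, _)| path)
        .collect()
}
```

With the folders `/cow`, `/cow/ear`, `/cow/ear/stack`, `/cow/ear/flag` and a single file inside `/cow/ear/flag`, only `/cow/ear/stack` stays flagged as `Maybe`. A folder that contains nothing but empty folders is never poisoned, so it is reported as empty too.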
### Big Files ### Big Files
From each file inside the given path its size is read and then after sorting it, e.g. 50 largest files are displayed. From each file inside the given path its size is read and then after sorting it, e.g. 50 largest files are displayed.
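
A rough sketch of this step, assuming the paths and sizes have already been collected into a list. `biggest_files` is a hypothetical helper written for this example, and the tie-break by path is an added assumption that only serves to make the order deterministic.

```rust
/// Returns the `count` largest entries, biggest first.
/// Each entry is a (path, size in bytes) pair.
pub fn biggest_files(mut files: Vec<(String, u64)>, count: usize) -> Vec<(String, u64)> {
    // Sort by size in descending order; equal sizes are ordered by path.
    files.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    // Keep only the requested number of entries, e.g. 50.
    files.truncate(count);
    files
}
```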