mirror of synced 2024-04-28 01:22:53 +12:00

Spicing up the markdown files (#222)

* Spicing up the README

- Making it more readable
- Better English, easier to read
- Hiding links
- Fixing the absolute hideous tables which were impossible to read in the raw readme

* Fixed some things, not a lot though.
bellrise 2021-01-20 11:16:56 +01:00 committed by GitHub
parent 7fdc8ea3fc
commit 5751d8a723
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 206 additions and 150 deletions

287
README.md

@@ -1,37 +1,40 @@
![com github qarmin czkawka](https://user-images.githubusercontent.com/41945903/102616149-66490400-4137-11eb-9cd6-813b2b070834.png)
**Czkawka** is written in Rust, simple, fast and easy to use app to remove unnecessary files from your computer.
**Czkawka** is a simple, fast and easy to use app to remove unnecessary files from your computer.
## Features
- Written in memory safe Rust
- Amazingly fast - due using more or less advanced algorithms and multithreading support
- Amazingly fast - due to using more or less advanced algorithms and multithreading
- Free, Open Source without ads
- Multiplatform - works on Linux, Windows and macOS
- Cache support - second and further scans should be a lot of faster than first
- CLI frontend, very fast to automate tasks
- Cache support - second and further scans should be a lot faster than the first
- CLI frontend - for easy automation
- GUI frontend - uses modern GTK 3 and looks similar to FSlint
- Rich search option - allows setting absolute included and excluded directories, set of allowed file extensions or excluded items with * wildcard
- Rich search option - allows setting absolute included and excluded directories, set of allowed file extensions
or excluded items with the `*` wildcard
- Multiple tools to use:
- Duplicates - Finds duplicates basing on file name, size, hash, first 1 MB of hash
- Empty Folders - Finds empty folders with the help of advanced algorithm
- Big Files - Finds provided number of the biggest files in given location
- Empty Files - Looks for empty files across disk
- Temporary Files - Allows finding temporary files
- Similar Images - Finds images which are not exactly the same(different resolution, watermarks)
- Zeroed Files - Find files which are filled with zeros(usually corrupted)
- Same Music - Search for music with same artist, album etc.
- Empty Folders - Finds empty folders with the help of an advanced algorithm
- Big Files - Finds the provided number of the biggest files in given location
- Empty Files - Looks for empty files across the drive
- Temporary Files - Finds temporary files
- Similar Images - Finds images which are not exactly the same (different resolution, watermarks)
- Zeroed Files - Finds files which are filled with zeros (usually corrupted)
- Same Music - Searches for music with same artist, album etc.
- Invalid Symbolic Links - Shows symbolic links which point to non-existent files/directories
- Broken Files - Finds files with invalid extension or corrupted
- Broken Files - Finds files with an invalid extension or that are corrupted
<!-- The GIF thingy -->
![Czkawka](https://user-images.githubusercontent.com/41945903/104711404-9cbb7400-5721-11eb-904d-9677c189f7ab.gif)
## Instruction
You can find instruction how to use Czkawka [here](instructions/Instruction.md)
## How do I use it?
You can find an instruction on how to use Czkawka [**here**](instructions/Instruction.md).
## Requirements
If you are using Windows or Mac binaries, there are no specific requirements.
Same with Appimage on Linux(except having system 18.04+ or similar).
But compiled GUI binaries on Linux or compiling it on your own os require to install this packages:
Same with Appimage on Linux (except having system 18.04+ or similar).
However, running compiled GUI binaries on Linux or compiling the app on your own OS requires you to install these packages:
### Ubuntu/Debian
```
sudo apt install cargo libgtk-3-dev
@@ -41,108 +44,135 @@ sudo apt install cargo libgtk-3-dev
sudo yum install gtk3-devel glib2-devel
```
## Usage
# Installation
### Precompiled binaries
Precompiled binaries are available here - https://github.com/qarmin/czkawka/releases/.
Ready-to-go executables are available [**here**](https://github.com/qarmin/czkawka/releases/).
If the app does not run when clicking at a launcher, run it through a terminal.
You don't need to have any additional libraries for CLI Czkawka
#### GUI Requirements
### GUI Requirements
##### Linux
For Czkawka GUI you need to have at least GTK 3.22 and also Alsa installed(for finding broken music files).
It should be installed by default on all the most popular distros.
For Czkawka GUI you are required to have at least `GTK 3.22` and also `Alsa` installed (for finding broken music
files). It should be installed by default on all the most popular distros.
##### Windows
`czkawka_gui.exe` extracted from zip file `windows_czkawka_gui.zip` needs to have all files inside around, because use them.
If you want to move somewhere else exe binary and open it, you need to install GTK 3 runtime from site https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases
##### MacOS
For now you need to install manually GTK 3 libraries, because are dynamically loaded from OS(Help needed to use static linking).
To install it you need to type this commands in terminal
The `czkawka_gui.exe` which is extracted from the `windows_czkawka_gui.zip` zip file needs to be in the same
folder as the rest of the extracted files. If you want to move and open the executable somewhere else, you need to install the `GTK 3`
runtime from [**here**](https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases).
##### macOS
Currently you need to manually install `GTK 3` libraries, because they are dynamically loaded from the OS (*we need
help in using static linking*). Installation in the terminal:
```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install gtk+3
```
Next you need to go to place where you downloaded app and add executable bit
After that, go to the location where you installed this and add the `executable` permission.
```shell
chmod +x mac_czkawka_gui
```
At the end you can open this app
Execute in the same folder with:
```shell
./mac_czkawka_gui
```
### Appimage
Appimage files are available in release page - https://github.com/qarmin/czkawka/releases/
For now looks that there is a bug with this format, because it doesn't allow opening two images/files at once.
Appimage files are available in release page - [**Github releases**](https://github.com/qarmin/czkawka/releases/)
There is a problem with this currently, as it doesn't allow you to open two images/files at once.
### Cargo
The easiest method to install Czkawka is to use Cargo command, since it basically compile an entire app, you need to install required packages from `Compilation` section
The easiest method to install Czkawka is using the `cargo` command. For compiling it, you need to get all the
requirements from the [compilation section](#Compilation)
```
cargo install czkawka_gui
cargo install czkawka_cli
```
You can update package by typing same command.
You can update the package with the same command.
### Snap
Snap also are available, but there is no access to external drives.
Snaps are also available, but there is no access to external drives.
```
sudo snap install czkawka
```
Snap store entry - https://snapcraft.io/czkawka
The Snap store entry can be found [**here**](https://snapcraft.io/czkawka).
Edgy builds are build for every commit, but it may be a little unstable(very rarely, because I'm not pushing untested code).
Fresh builds are created for every commit, but they may be a little unstable, although that happens very rarely
because I don't push untested code.
<!-- Dunno if the flatpak section should be here, because it just takes valuable
space, but if you want to keep it you do you. I'm only fixing the english
here. -->
### Flatpak
Maybe someday
### Debian/Ubuntu repository and PPA
Tried to set up it, but for now I have problems described in this issue
https://salsa.debian.org/rust-team/debcargo-conf/-/issues/21
I tried to set it up, but I'm having problems described in this [**issue**](https://salsa.debian.org/rust-team/debcargo-conf/-/issues/21).
### AUR - Arch Linux Package (unofficial)
Czkawka is also available in Arch Linux's AUR from which it can be easily downloaded and installed on the system.
Czkawka is also available in Arch Linux's AUR from which it can be easily installed.
```
yay -Syu czkawka-git
```
This is unofficial package, so new versions will not be always available.
*This is an unofficial package, so new versions will not always be available.*
### Devel versions
Artifacts from each commit you can also download here - https://github.com/qarmin/czkawka/actions
## Compilation
### Requirements
Rust 1.48 - Czkawka aims to support only the latest stable Rust version
GTK 3.22 - only for GTK backend
### Development versions
Artifacts from each commit can be downloaded [**here**](https://github.com/qarmin/czkawka/actions)
If you want to compile CLI frontend, then just skip lines which contains `gtk` word.
#### Debian/Ubuntu
<!-- Note the #Compilation link if you're changing this! -->
# Compilation
Compiling the app yourself is generally not recommended, because there are several easier ways
to get it than building it from source.
## Requirements
Program | Min | What for
---------|------|------------------------------------------------------------
Rust | 1.48 | Czkawka aims to support only the latest stable Rust version
GTK | 3.22 | Only for the `GTK` backend
If you only want the terminal version without a GUI, just skip all lines about `gtk`.
#### Debian / Ubuntu
```shell
sudo apt install -y curl # Needed by Rust update tool
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Download the latest stable Rust
sudo apt install -y libgtk-3-dev libasound2-dev
```
#### Fedora/CentOS/Rocky Linux
#### Fedora / CentOS / Rocky Linux
```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Download the latest stable Rust
sudo yum install gtk3-devel glib2-devel alsa-lib-devel
```
#### MacOS
#### macOS
You need to install Homebrew and GTK Libraries
```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install rust gtk+3
```
### Windows(Not working yet)
First you need to install Visual C++ components from Visual Studio installer - https://visualstudio.microsoft.com/downloads/
### Windows
*Will be available in the future*
<!-- First you need to install Visual C++ components from Visual Studio installer - https://visualstudio.microsoft.com/downloads/
Next install Rust from site https://rustup.rs/
After that the latest GTK 3 runtime must be installed from https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases
-->
## Compilation from source
### Compilation from source
- Download the source
```
git clone https://github.com/qarmin/czkawka.git
@@ -152,91 +182,102 @@ cd czkawka
```
cargo run --bin czkawka_gui
```
For Linux-to-Windows cross-building instruction look at the CI.
![GUI](https://user-images.githubusercontent.com/41945903/103371136-fb9cae80-4ace-11eb-8d72-7b4c8ac44260.png)
- Run CLI(this will print help with a lot of examples)
- Run CLI (this will print help with a lot of examples)
```
cargo run --bin czkawka_cli
```
![CLI](https://user-images.githubusercontent.com/41945903/93716816-0bbcfd80-fb72-11ea-8d31-4c87cc2abe6d.png)
<!-- End of compilation section -->
## Benchmarks
Since Czkawka is written in Rust and aims to be a faster alternative to FSlint (written in Python), we need to compare the speed of these tools.
I tested it on SSD Disk 256 GB GoodRam and i7 4770 CPU.
I prepared a directory and performed a test without any folder exceptions(I removed all directories from FSlint and Czkawka from other tabs than Include Directory) which contained 229868 files which took 203,7 GB and 13708 duplicates files in 9117 groups which took 7.90 GB.
Minimum file size to check I set to 1 KB on all programs
| App| Executing Time |
|:----------:|:-------------:|
| FSlint 2.4.7 (Second Run)| 86s |
| Czkawka 1.4.0 (Second Run) | 12s |
| DupeGuru 4.0.4 (Second Run) | 28s |
I used Mprof for checking memory usage FSlint and Dupeguru, for Czkawka I used Heaptrack.
To not get Dupeguru crash I checked smaller directory with 217986 files and 41883 folders.
Since Czkawka is written in Rust and it aims to be a faster alternative to FSlint (which is written in Python), we need
to compare the speed of these tools.
| App| Idle Ram | Max Operational Ram Usage | Stabilized after search |
|:----------:|:-------------:|:-------------:|:-------------:|
| FSlint 2.4.7 | 62 MB | 84 MB | 84 MB |
| Czkawka 1.4.0 | 9 MB | 66 MB | 32 MB |
| DupeGuru 4.0.4 | 80 MB | 210 MB | 155 MB |
I tested it on a 256 GB SSD and an i7-4770 CPU.
Similar Images which check 332 files which takes 1,7 GB
I prepared a directory and performed a test without any folder exceptions (I removed all directories from FSlint and
Czkawka from other tabs than Include Directory) which contained 229 868 files, took 203.7 GB and had 13 708 duplicate
files in 9117 groups which took 7.90 GB.
| App| Scan time |
|:----------:|:-------------:|
| Czkawka 1.4.0 | 58s |
| DupeGuru 4.0.4 | 51s |
I set the minimum file size to check to 1 KB in all programs.
Similar Images which check 1421 image files which takes 110,1 MB
| App | Executing Time |
|:---------------------------:|:--------------:|
| FSlint 2.4.7 (Second Run) | 86s |
| Czkawka 1.4.0 (Second Run) | 12s |
| DupeGuru 4.0.4 (Second Run) | 28s |
| App| Scan time |
|:----------:|:-------------:|
| Czkawka 1.4.0 | 25s |
| DupeGuru 4.0.4 | 92s |
So still is a big room for improvements.
I used Mprof for checking the memory usage of FSlint and DupeGuru; for Czkawka I used Heaptrack.
To not get a crash from DupeGuru I checked a smaller directory with 217 986 files and 41 883 folders.
| App | Idle Ram | Max Operational Ram Usage | Stabilized after search |
|:--------------:|:--------:|:-------------------------:|:-----------------------:|
| FSlint 2.4.7 | 62 MB | 84 MB | 84 MB |
| Czkawka 1.4.0 | 9 MB | 66 MB | 32 MB |
| DupeGuru 4.0.4 | 80 MB | 210 MB | 155 MB |
Similar images check on 332 files which took 1.7 GB
| App | Scan time |
|:--------------:|:----------:|
| Czkawka 1.4.0 | 58s |
| DupeGuru 4.0.4 | 51s |
Similar images check on 1421 image files which took 110.1 MB
| App | Scan time |
|:--------------:|:----------:|
| Czkawka 1.4.0 | 25s |
| DupeGuru 4.0.4 | 92s |
<!-- it's a lot of room, not a big room lol -->
So there is still a lot of room for improvement.
## Comparison with other tools
| | Czkawka | FSlint | DupeGuru |
|:----------:|:-------------:|:-----:|:---:|
| Language | Rust| Python | Python/Objective C |
| OS | Linux, Windows, Mac | Linux | Linux, Windows, Mac|
| Framework | GTK 3 (Gtk-rs)| GTK 2 (PyGTK) | Qt 5 (PyQt)/Cocoa |
| Ram Usage | Low | Medium | Very High |
| Duplicate finder | X | X | X |
| Empty files | X | X | |
| Empty folders | X | X | |
| Temporary files | X | X | |
| Big files | X | | |
| Similar images | X | | X |
| Zeroed Files| X | | |
| Music duplicates(tags) | X | | X |
| Invalid symlinks | X | X | |
| Broken Files | X | | |
| Installed packages | | X | |
| Invalid names | | X | |
| Names conflict | | X | |
| Bad ID | | X | |
| Non stripped binaries | | X | |
| Redundant whitespace | | X | |
| Multiple languages(po) | | X | X |
| Cache support | X | | X |
| Project Activity | High | Very Low | High |
| | Czkawka | FSlint | DupeGuru |
|:----------------------:|:-------:|:----------:|:-----------------:|
| Language | Rust | Python | Python/Obj-C |
| OS | All | Linux only | All |
| Framework | GTK 3 | PyGTK | Qt 5 (PyQt)/Cocoa |
| Ram Usage | Low | Medium | Very High |
| Duplicate finder | • | • | • |
| Empty files | • | • | |
| Empty folders | • | • | |
| Temporary files | • | • | |
| Big files | • | | |
| Similar images | • | | • |
| Zeroed Files | • | | |
| Music duplicates(tags) | • | | • |
| Invalid symlinks | • | • | |
| Broken Files | • | | |
| Installed packages | | • | |
| Invalid names | | • | |
| Names conflict | | • | |
| Bad ID | | • | |
| Non stripped binaries | | • | |
| Redundant whitespace | | • | |
| Multiple languages(po) | | • | • |
| Cache support | • | | • |
| Project Activity | High | Very Low | High |
## Contributions
Contributions to this repository are welcome.
You can help by creating:
You can help by creating a:
- Bug report - memory leaks, unexpected behavior, crashes
- Feature proposals - proposal to change/add/delete some features
- Pull Requests - implementing a new feature yourself or fixing bugs, but you have to pay attention to code quality. If the change is bigger, then it's a good idea to open a new issue to discuss changes.
- Documentation - There is [instruction](instructions/Instruction.md) which you can improve.
- Pull Requests - implementing a new feature yourself or fixing bugs, but you have to pay attention to code quality.
If the change is bigger, then it's a good idea to open a new issue to discuss changes.
- Documentation - There is an [instruction](instructions/Instruction.md) which you can improve.
The code should be clean and well formatted (Clippy and fmt are required in each PR).
@@ -245,20 +286,22 @@ Czkawka is a Polish word which means _hiccup_.
I chose this name because I wanted to hear people speaking other languages pronounce it.
This name is not as bad as it seems, because I was also thinking about using words like _żółć_, _gżegżółka_ or _żołądź_, but I gave up on these ideas because they contained Polish characters, which would cause difficulty in searching for the project.
This name is not as bad as it seems, because I was also thinking about using words like _żółć_, _gżegżółka_ or _żołądź_,
but I gave up on these ideas because they contained Polish characters, which would cause difficulty in searching for the project.
At the beginning of the program creation, if the response concerning the name was unanimously negative, I prepared myself for a possible change of the name of the program, but the opinions were extremely mixed.
At the beginning of the program creation, if the response concerning the name was unanimously negative, I prepared myself
for a possible change of the name of the program, and the opinions were extremely mixed.
## License
Code is distributed under MIT license.
Icon is created by [jannuary](https://github.com/jannuary) and licensed CC-BY-4.0.
Windows dark theme is used from AdMin repo - https://github.com/nrhodes91/AdMin with MIT license
Windows dark theme is used from [AdMin repo](https://github.com/nrhodes91/AdMin) with MIT license
Program is completely free to use.
The program is completely free to use.
"Gratis to uczciwa cena" - "Free is a fair price"
## Donations
If you are using the app, I would appreciate a donation for its further development - https://github.com/sponsors/qarmin.
If you are using the app, I would appreciate a donation for its further development, which can be done [here](https://github.com/sponsors/qarmin).


@@ -1,59 +1,72 @@
# Instruction
- [Basic information](#basic-informations)
- [Tools - How works?](#tools---how-works)
- [Config/Cache files](#configcache-files)
- [Tools](#tools)
- [Config / Cache files](#configcache-files)
- [GUI](#gui-gtk)
- [CLI](#cli)
- [Tips and tricks](#tips-and-tricks)
## Basic Informations
Czkawka for now contains two independent frontends - Console and Graphical interface which share the core module which contains basic and common functions used by each frontend.
Czkawka for now contains two independent frontends - the terminal and graphical interface which share the core module.
Using the Rust language without unsafe code helps to create an app that is safe and fast, with small resource requirements.
This code also has good support for multi-threading.
The code has very good support for multithreading, so performance should improve noticeably with a better processor or disk.
# Tools
## Tools - How works?
### Duplicate Finder
Duplicate Finder allows you to search for files and group them according to a predefined criterion:
- **By name** - Groups files by name e.g. `/home/rafal/plik.txt` will be treat like duplicate of file `/home/romb/plik.txt`. This is the fastest method, but it is very unreliable and should not be used unless you know what you are doing.
- **By size** - Groups files by its size(in bytes), which must be exactly the same. It is as fast as the previous mode and usually gives much more correct results with duplicates, but I also do not recommend using it if you do not know what you are doing.
- **By hash** - A mode containing a check of the hash (cryptographic hash) of a given file which determines with great probability whether the files are identical.
This is the slowest, but almost 100% sure way to check the files.
- **By name** - Groups files by name e.g. `/home/john/cats.txt` will be treated like a duplicate of a file named
`/home/lucy/cats.txt`. This is the fastest method, but it is very unreliable and should not be used unless you know
what you are doing.
Because the hash is only checked inside groups of files of the same size, it is practically impossible for two different files to be considered identical.
- **By size** - Groups files by their size (in bytes and perfect matches only). It is as fast as the previous mode and
usually gives better results with duplicates, but I also do not recommend using it if you do not know what you are doing.
It consists of 3 parts:
- Grouping files of identical size - allows you to throw away files of unique size, which are already known to have no duplicates at this stage.
- PreHash check - Each group of files of identical size is placed in a queue using all processor threads (each action in the group is independent of the others).
In each such group a small fragment of each file (2KB) is loaded in turn and then hashed. All files whose partial hashes are unique within the group are removed from it. Using this step usually allowed me to reduce the time of searching for duplicates even by half.
- Checking the Hash - After leaving files that have the same beginning in groups, you should now check the whole contents of the file to make sure they are identical.
- **By hash** - A mode containing a check of the hash (cryptographic hash) of a given file which determines with great
probability whether the files are identical.
This is the slowest, but almost 100% sure way to check the files.
- **By hashmb** - Works the same way as via hash, only in the last phase it does not check the whole file but only its first Megabyte. It is perfect for quick search of possible duplicate files.
Because the hash is only checked inside groups of files of the same size, it is practically impossible for two different
files to be considered identical.
It consists of 3 steps:
- Grouping files of identical size - allows you to throw away files of unique size, which are already known to have no
duplicates at this stage.
- PreHash check - Each group of files of identical size is placed in a queue using all processor threads (each action in
the group is independent of the others). In each such group a small fragment of each file (2KB) is loaded in turn and
then hashed. All files whose partial hashes are unique within the group are removed from it. Using this step usually
allows me to reduce the time of searching for duplicates even by half.
- Checking the hash - After leaving files that have the same beginning in groups, you should now check the whole contents
of the file to make sure they are identical.
- **By hashmb** - Works the same way as via hash, only in the last phase it does not check the whole file but only its first
megabyte. It is perfect for quick search of possible duplicate files.
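The three steps of the **By hash** mode can be sketched roughly as below. This is only an illustration under loud assumptions, not Czkawka's actual code: `FileEntry`, `regroup` and `find_duplicates` are hypothetical names, files are kept in memory instead of being read from disk, everything runs on a single thread, and the standard library's non-cryptographic `DefaultHasher` stands in for the real hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory stand-in for a file: a name plus its bytes.
pub struct FileEntry {
    pub name: String,
    pub data: Vec<u8>,
}

/// Size of the fragment hashed in the PreHash step (2 KB, as described above).
const PREHASH_BYTES: usize = 2 * 1024;

/// Assumption: DefaultHasher is a placeholder, not the hash Czkawka uses.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Splits `group` by `key` and drops every sub-group with a single member,
/// because a file with a unique key cannot have a duplicate.
fn regroup<'a, K, F>(group: Vec<&'a FileEntry>, key: F) -> Vec<Vec<&'a FileEntry>>
where
    K: Eq + Hash,
    F: Fn(&FileEntry) -> K,
{
    let mut buckets: HashMap<K, Vec<&'a FileEntry>> = HashMap::new();
    for file in group {
        buckets.entry(key(file)).or_default().push(file);
    }
    buckets.into_values().filter(|g| g.len() > 1).collect()
}

/// Returns groups of file names whose contents hash identically.
pub fn find_duplicates(files: &[FileEntry]) -> Vec<Vec<String>> {
    let mut result = Vec::new();
    // Step 1: group by size.
    for size_group in regroup(files.iter().collect(), |f| f.data.len()) {
        // Step 2: PreHash check on the first 2 KB of each file.
        let pre_groups = regroup(size_group, |f| {
            hash_bytes(&f.data[..f.data.len().min(PREHASH_BYTES)])
        });
        for pre_group in pre_groups {
            // Step 3: hash the whole contents of the remaining candidates.
            for full_group in regroup(pre_group, |f| hash_bytes(&f.data)) {
                let mut names: Vec<String> =
                    full_group.iter().map(|f| f.name.clone()).collect();
                names.sort();
                result.push(names);
            }
        }
    }
    result.sort();
    result
}
```

A **By hashmb** variant of this sketch would hash only the first megabyte of each file in step 3 instead of the whole contents.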
### Empty Files
Searching for empty files is rather easy, because we only need to read file metadata and check if its length is 0.
Searching for empty files is easy and fast, because we only need to check the file metadata and confirm that its length is 0.
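A minimal sketch of such a check, using only the Rust standard library. The function name `is_empty_file` is hypothetical and the real implementation may look different:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when `path` is a regular file whose metadata reports 0 bytes.
/// Only the metadata is read, never the file contents.
pub fn is_empty_file(path: &Path) -> io::Result<bool> {
    let metadata = fs::metadata(path)?;
    Ok(metadata.is_file() && metadata.len() == 0)
}
```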
### Empty Directories
Empty directories are those that do not contain any other files, symbolic links, etc. unless they are other empty directories.
At the beginning, a special entry is created for each directory containing - the parent path (only if it is not a folder directly selected by the user) and a flag to indicate whether the given directory is empty(at the beginning each one is potentially empty).
At the beginning, a special entry is created for each directory containing - the parent path (only if it is not a folder
directly selected by the user) and a flag to indicate whether the given directory is empty (at the beginning each one is
set to be potentially empty).
First, user-defined folders are put into the pool of folders to be checked.
Each element is checked to see if it is
- folder - this folder is added to the check queue as possible empty - `FolderEmptiness::Maybe`
- anything else - the given folder is "poisoned" with the `FolderEmptiness::No` flag, indicating that the folder is no longer empty. Then each folder directly or indirectly containing the file is also poisoned with the `FolderEmptiness::No` flag.
- anything else - the given folder is "poisoned" with the `FolderEmptiness::No` flag, indicating that the folder is no longer
empty. Then each folder directly or indirectly containing the file is also poisoned with the `FolderEmptiness::No` flag.
e.g. There is 4 checked folder which may be empty `/krowa/`, `/krowa/ucho/`, `/krowa/ucho/stos/`, `/krowa/ucho/flaga/`.
Example: there are 4 checked folders which *may* be empty `/cow/`, `/cow/ear/`, `/cow/ear/stack/`, `/cow/ear/flag/`.
In the last one is found a file, so that means that `/krowa/ucho/flaga/` is not empty and also all parents - `/krowa/ucho/` and `/krowa/`.
`/krowa/ucho/stos/` still may be empty.
The last folder contains a file, so that means that `/cow/ear/flag` is not empty and also all its parents - `/cow/ear/` and `/cow/`,
but `/cow/ear/stack/` may still be empty.
Finally, all folders with the flag `FolderEmpriness::Maybe` are considered empty
Finally, all folders with the flag `FolderEmptiness::Maybe` are defaulted to empty.
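The "poisoning" idea can be illustrated with a small sketch. It is a simplification under stated assumptions: instead of walking the disk, the hypothetical function `empty_folders` takes a list of known folders and a list of non-folder entries, and the `FolderEmptiness` enum here is redefined for the example rather than copied from Czkawka.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Flag stored for every checked folder (redefined here for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FolderEmptiness {
    No,
    Maybe,
}

/// `folders` are all checked directories, `files` are all non-folder entries.
/// Returns the folders that still carry the `Maybe` flag at the end.
pub fn empty_folders(folders: &[&str], files: &[&str]) -> Vec<PathBuf> {
    // At the beginning every folder is potentially empty.
    let mut state: BTreeMap<PathBuf, FolderEmptiness> = folders
        .iter()
        .map(|f| (PathBuf::from(*f), FolderEmptiness::Maybe))
        .collect();

    for file in files {
        // Poison the folder holding the entry and every ancestor above it.
        let mut current = Path::new(*file).parent();
        while let Some(dir) = current {
            if let Some(flag) = state.get_mut(dir) {
                *flag = FolderEmptiness::No;
            }
            current = dir.parent();
        }
    }

    state
        .into_iter()
        .filter(|(_, flag)| *flag == FolderEmptiness::Maybe)
        .map(|(path, _)| path)
        .collect()
}
```

With the four folders from the example above and a single file inside `/cow/ear/flag`, only `/cow/ear/stack` keeps the `Maybe` flag.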
### Big Files
From each file inside the given path its size is read and then after sorting it, e.g. 50 largest files are displayed.
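As a rough sketch, the sort-and-cut step could look like this. The function name `biggest_files` and the `(name, size)` tuple layout are assumptions made for the example; collecting the sizes from disk is left out.

```rust
/// Sorts `(name, size in bytes)` pairs from largest to smallest and keeps
/// the first `count` entries. Ties are ordered by name to keep the result stable.
pub fn biggest_files(mut files: Vec<(String, u64)>, count: usize) -> Vec<(String, u64)> {
    files.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    files.truncate(count);
    files
}
```

Calling it with `count` set to 50 would correspond to the "50 largest files" case mentioned above.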