czkawka/README.md

# Czkawka
Czkawka is simple, fast and easy to use alternative to Fslint, written in Rust.  
This is my first ever project in Rust so probably a lot of things are not being written in the most optimal way.

## Features
- Written in fast and memory safe Rust
- CLI frontend, very fast and powerful with rich help
- GUI GTK frontend(Still WIP) - use modern GTK 3 and looks similar to FSlint
- GUI Orbtk frontend(Very early WIP) - alternative GUI with reduced functionality
- Saving results to file - allows to easily read entries found by tool
- Rich search option - allows setting absolute included and excluded directories, set of allowed files extensions or excluded items with * wildcard
- Clean Glade file in which UI can be easily modernized
- Multiple tools to use:
  - Duplicates - Finds duplicates basing on its size(fast), hash(accurate)
  - Empty Folders - Finds empty folders with help of advanced algorithm
  - Big Files - Finds provided number of the biggest files in given location
  - Empty Files - Looks for empty files across disk
  - Temporary Files - Allows finding temporary files 

## TODO
- Comments - a lot of things should be described
- Probably extern argument parser in czkawka-cli could be used 
- More unit tests
- Debian package
- Finding files with debug symbols
- Maybe windows support, but this will need some refactoring in code
- Translation support
- GTK Gui
  - Popups
  - Choosing directories(included, excluded)
  - Popup with type of deleted records
  - Run in another thread searching to be able to pause
- Orbtk GUI
  - Basic selecting included and excluded folders
  - Text field to show informations about number of found folders/files
  - Simple buttons to delete 

## Usage and requirements
Rust 1.46 - probably lower also works fine  
GTK 3.24 - for GTK backend

Precompiled binaries are here(may not work in every Linux distro) - https://github.com/qarmin/czkawka/releases/

For now only Linux(and maybe also macOS) is supported

- Install requirements for GTK
```
apt install -y libgtk-3-dev
```

- Download source
```
git clone https://github.com/qarmin/czkawka.git
cd czkawka
```
- Run GTK GUI(Still WIP)
```
cargo run --bin czkawka_gui
```
![GUI GTK](https://user-images.githubusercontent.com/41945903/94106023-d72f9700-fe3a-11ea-821d-48484afd74fb.png)
- Run alternative Orbtk GUI(Still WIP, and currently stopped due https://github.com/intellij-rust/intellij-rust/issues/5943)
```
cargo run --bin czkawka_gui_orbtk
```
![GUI Orbtk](https://user-images.githubusercontent.com/41945903/92405241-7b27fb80-f135-11ea-9fc4-5ebc2b76b011.png)
- Run CLI(this will print help with a lot of examples)
```
cargo run --bin czkawka_cli
```
![CLI](https://user-images.githubusercontent.com/41945903/93716816-0bbcfd80-fb72-11ea-8d31-4c87cc2abe6d.png)

## Speed
Since Czkawka is written in Rust and aims to be a faster alternative for written in Python - FSlint we need to compare speed of this two tools.

I checked my home directory without any folder exceptions(I removed all directories from FSlint advanced tab) which contained 379359 files and 42445 folders and 50301 duplicated files in 29723 groups which took 450,4 MB.

First run reads file entry and save it to cache so this step is mostly limited by disk performance, and with second run cache helps it so searching is a lot of faster.

Duplicate Checker(Version 0.1.0)

| App| Executing Time |
|:----------:|:-------------:|
| Fslint (First Run)| 140s |
| Fslint (Second Run)| 23s |
| Czkawka CLI Release(First Run) | 128s |
| Czkawka CLI Release(Second Run) | 8s |

| App| Idle Ram | Max Operational Ram Usage |
|:----------:|:-------------:|:-------------:|
| Fslint |  |  |
| Czkawka CLI Release |  |
| Czkawka GTK GUI Release |  |


Empty folder finder

| App| Executing Time |
|:----------:|:-------------:|
| Fslint |  |
| Czkawka CLI Release |  |
| Czkawka GTK GUI Release |  |

Differences should be more visible when using slower processor or faster disk.

## How it works?
### Duplicate Finder
The only required parameter for checking duplicates is included folders `-i`. This parameter validates provided folders - which must have absolute path(without ~ and other similar symbols at the beginning), not contains *(wildcard), be dir(not file or symlink), exists. Later same things are done with excluded folders `-e`. 

Next, this included and excluded folders are optimized due to tree structure of file system:
- Folders which contains another folders are combined(separately for included and excluded) - `/home/pulpet` and `/home/pulpet/a` are combined to `/home/pulpet`
- Included folders which are located inside excluded ones are delete - Included folder `/etc/tomcat/` is deleted because excluded folder is `/etc/`
- Non existed directories are being removed
- Excluded path which are outside included path are deleted - Excluded path `/etc/` is removed if included path is `/home/`
If after optimization there is no included folders, then program ends with non zero value(TODO, this should be handled by returning value).

Next with provided by user minimal size of checked size `-s`, program checks recursively or not included folders and checks files by sizes and put files with same sizes to different boxes. 
Next boxes which contains only one element are removed because files inside that means that are not duplicated.

Now if user select this, then provided is checking hash of file, because may happens that files have equal size, but differ in one or more bytes.

There are two available methods to check hash:
- full(default) - check hash of entire file so this method is slow especially with large files and but there is almost no chance that two different files will be treated like they were a duplicates.
- partial - check hash only max first 1MB of file, so it is a lot of more accurate than only checking size of files, but still there is very small chance that files which were identified as duplicates they are not.

At the end if user used `-delete` option, specified files are removed - All Except Oldest/Newest or Only Oldest/Newest
 

## License
Code is distributed under MIT license.
Add basic logic 2020-08-27 06:49:43 +12:00			`# Czkawka`
Fixes some crashes when permissions are denied 2020-09-07 19:06:12 +12:00			`Czkawka is simple, fast and easy to use alternative to Fslint, written in Rust.`
Changed a little describtion 2020-09-27 19:32:44 +13:00			`This is my first ever project in Rust so probably a lot of things are not being written in the most optimal way.`

			`## Features`
			`- Written in fast and memory safe Rust`
			`- CLI frontend, very fast and powerful with rich help`
			`- GUI GTK frontend(Still WIP) - use modern GTK 3 and looks similar to FSlint`
			`- GUI Orbtk frontend(Very early WIP) - alternative GUI with reduced functionality`
			`- Saving results to file - allows to easily read entries found by tool`
			`- Rich search option - allows setting absolute included and excluded directories, set of allowed files extensions or excluded items with * wildcard`
Added probably all functionality to upper notebook(connection of buttons) 2020-09-30 10:00:15 +13:00			`- Clean Glade file in which UI can be easily modernized`
Changed a little describtion 2020-09-27 19:32:44 +13:00			`- Multiple tools to use:`
Improve README 2020-09-27 22:13:06 +13:00			`- Duplicates - Finds duplicates basing on its size(fast), hash(accurate)`
Changed a little describtion 2020-09-27 19:32:44 +13:00			`- Empty Folders - Finds empty folders with help of advanced algorithm`
			`- Big Files - Finds provided number of the biggest files in given location`
			`- Empty Files - Looks for empty files across disk`
Improve README 2020-09-27 22:13:06 +13:00			`- Temporary Files - Allows finding temporary files`
Add basic logic 2020-08-27 06:49:43 +12:00
			`## TODO`
Add some comments 2020-09-02 08:48:20 +12:00			`- Comments - a lot of things should be described`
Add support for version argument e.g. `czkawka_cli --version` 2020-09-23 22:17:19 +12:00			`- Probably extern argument parser in czkawka-cli could be used`
README update 2020-09-19 21:25:58 +12:00			`- More unit tests`
Add support for version argument e.g. `czkawka_cli --version` 2020-09-23 22:17:19 +12:00			`- Debian package`
Better explanation, starting working with GUI 2020-09-04 03:33:43 +12:00			`- Finding files with debug symbols`
Basic changes 2020-08-31 03:18:04 +12:00			`- Maybe windows support, but this will need some refactoring in code`
Maybe translation support 2020-09-04 07:46:22 +12:00			`- Translation support`
Add empty folder support for GUI 2020-09-22 21:24:55 +12:00			`- GTK Gui`
			`- Popups`
Fix some typos in words and remove empty space 2020-09-27 04:16:12 +13:00			`- Choosing directories(included, excluded)`
Add support for version argument e.g. `czkawka_cli --version` 2020-09-23 22:17:19 +12:00			`- Popup with type of deleted records`
Add basic to notebook bar 2020-09-24 04:38:47 +12:00			`- Run in another thread searching to be able to pause`
Respect Clippy and minimal size value 2020-09-23 06:35:37 +12:00			`- Orbtk GUI`
			`- Basic selecting included and excluded folders`
Add support for version argument e.g. `czkawka_cli --version` 2020-09-23 22:17:19 +12:00			`- Text field to show informations about number of found folders/files`
			`- Simple buttons to delete`
Add basic logic 2020-08-27 06:49:43 +12:00
Fixes some crashes when permissions are denied 2020-09-07 19:06:12 +12:00			`## Usage and requirements`
Changed a little describtion 2020-09-27 19:32:44 +13:00			`Rust 1.46 - probably lower also works fine`
			`GTK 3.24 - for GTK backend`
Fixes some crashes when permissions are denied 2020-09-07 19:06:12 +12:00
Changed a little describtion 2020-09-27 19:32:44 +13:00			`Precompiled binaries are here(may not work in every Linux distro) - https://github.com/qarmin/czkawka/releases/`

			`For now only Linux(and maybe also macOS) is supported`
Fixes some crashes when permissions are denied 2020-09-07 19:06:12 +12:00
			`- Install requirements for GTK`
Use different modules for library and GUI and CLI apps 2020-09-02 05:34:39 +12:00			```
			`apt install -y libgtk-3-dev`
			```
Fixes some crashes when permissions are denied 2020-09-07 19:06:12 +12:00
Use different modules for library and GUI and CLI apps 2020-09-02 05:34:39 +12:00			`- Download source`
			```
Add some comments 2020-09-02 08:48:20 +12:00			`git clone https://github.com/qarmin/czkawka.git`
Use different modules for library and GUI and CLI apps 2020-09-02 05:34:39 +12:00			`cd czkawka`
			```
Small info extension 2020-09-06 06:26:10 +12:00			`- Run GTK GUI(Still WIP)`
Use different modules for library and GUI and CLI apps 2020-09-02 05:34:39 +12:00			```
			`cargo run --bin czkawka_gui`
			```
Add basic scroll window to include and exclude directory folders 2020-09-24 21:58:59 +12:00			`![GUI GTK](https://user-images.githubusercontent.com/41945903/94106023-d72f9700-fe3a-11ea-821d-48484afd74fb.png)`
			`- Run alternative Orbtk GUI(Still WIP, and currently stopped due https://github.com/intellij-rust/intellij-rust/issues/5943)`
Small info extension 2020-09-06 06:26:10 +12:00			```
			`cargo run --bin czkawka_gui_orbtk`
			```
Added images 2020-09-08 04:14:02 +12:00			`![GUI Orbtk](https://user-images.githubusercontent.com/41945903/92405241-7b27fb80-f135-11ea-9fc4-5ebc2b76b011.png)`
Changed a little describtion 2020-09-27 19:32:44 +13:00			`- Run CLI(this will print help with a lot of examples)`
Add some comments 2020-09-02 08:48:20 +12:00			```
			`cargo run --bin czkawka_cli`
			```
Update images 2020-09-21 04:53:53 +12:00			`![CLI](https://user-images.githubusercontent.com/41945903/93716816-0bbcfd80-fb72-11ea-8d31-4c87cc2abe6d.png)`
Use different modules for library and GUI and CLI apps 2020-09-02 05:34:39 +12:00
Fixes some crashes when permissions are denied 2020-09-07 19:06:12 +12:00			`## Speed`
			`Since Czkawka is written in Rust and aims to be a faster alternative for written in Python - FSlint we need to compare speed of this two tools.`

			`I checked my home directory without any folder exceptions(I removed all directories from FSlint advanced tab) which contained 379359 files and 42445 folders and 50301 duplicated files in 29723 groups which took 450,4 MB.`

			`First run reads file entry and save it to cache so this step is mostly limited by disk performance, and with second run cache helps it so searching is a lot of faster.`

Add empty folder support for GUI 2020-09-22 21:24:55 +12:00			`Duplicate Checker(Version 0.1.0)`
Started thinking about GTK rs functions 2020-09-08 05:16:09 +12:00
Fixes some crashes when permissions are denied 2020-09-07 19:06:12 +12:00			`\| App\| Executing Time \|`
			`\|:----------:\|:-------------:\|`
			`\| Fslint (First Run)\| 140s \|`
			`\| Fslint (Second Run)\| 23s \|`
			`\| Czkawka CLI Release(First Run) \| 128s \|`
			`\| Czkawka CLI Release(Second Run) \| 8s \|`

Added support for different buttons events for different notebook tabs 2020-09-10 07:32:23 +12:00			`\| App\| Idle Ram \| Max Operational Ram Usage \|`
			`\|:----------:\|:-------------:\|:-------------:\|`
			`\| Fslint \| \| \|`
			`\| Czkawka CLI Release \| \|`
Add support for version argument e.g. `czkawka_cli --version` 2020-09-23 22:17:19 +12:00			`\| Czkawka GTK GUI Release \| \|`
Added support for different buttons events for different notebook tabs 2020-09-10 07:32:23 +12:00

Started thinking about GTK rs functions 2020-09-08 05:16:09 +12:00			`Empty folder finder`

			`\| App\| Executing Time \|`
			`\|:----------:\|:-------------:\|`
			`\| Fslint \| \|`
			`\| Czkawka CLI Release \| \|`
Add support for version argument e.g. `czkawka_cli --version` 2020-09-23 22:17:19 +12:00			`\| Czkawka GTK GUI Release \| \|`
Started thinking about GTK rs functions 2020-09-08 05:16:09 +12:00
Small glade changes 2020-09-08 02:07:29 +12:00			`Differences should be more visible when using slower processor or faster disk.`
Better explanation, starting working with GUI 2020-09-04 03:33:43 +12:00
Changed a little describtion 2020-09-27 19:32:44 +13:00			`## How it works?`
			`### Duplicate Finder`
			The only required parameter for checking duplicates is included folders `-i`. This parameter validates provided folders - which must have absolute path(without ~ and other similar symbols at the beginning), not contains *(wildcard), be dir(not file or symlink), exists. Later same things are done with excluded folders `-e`.

			`Next, this included and excluded folders are optimized due to tree structure of file system:`
			- Folders which contains another folders are combined(separately for included and excluded) - `/home/pulpet` and `/home/pulpet/a` are combined to `/home/pulpet`
			- Included folders which are located inside excluded ones are delete - Included folder `/etc/tomcat/` is deleted because excluded folder is `/etc/`
			`- Non existed directories are being removed`
			- Excluded path which are outside included path are deleted - Excluded path `/etc/` is removed if included path is `/home/`
			`If after optimization there is no included folders, then program ends with non zero value(TODO, this should be handled by returning value).`

Improve README 2020-09-27 22:13:06 +13:00			Next with provided by user minimal size of checked size `-s`, program checks recursively or not included folders and checks files by sizes and put files with same sizes to different boxes.
			`Next boxes which contains only one element are removed because files inside that means that are not duplicated.`
Changed a little describtion 2020-09-27 19:32:44 +13:00
Improve README 2020-09-27 22:13:06 +13:00			`Now if user select this, then provided is checking hash of file, because may happens that files have equal size, but differ in one or more bytes.`

			`There are two available methods to check hash:`
			`- full(default) - check hash of entire file so this method is slow especially with large files and but there is almost no chance that two different files will be treated like they were a duplicates.`
			`- partial - check hash only max first 1MB of file, so it is a lot of more accurate than only checking size of files, but still there is very small chance that files which were identified as duplicates they are not.`

			At the end if user used `-delete` option, specified files are removed - All Except Oldest/Newest or Only Oldest/Newest

Changed a little describtion 2020-09-27 19:32:44 +13:00
Add basic logic 2020-08-27 06:49:43 +12:00			`## License`
Added some more widgets 2020-09-05 09:09:11 +12:00			`Code is distributed under MIT license.`