mirror of synced 2024-05-18 03:13:36 +12:00

Microoptimizations

This commit is contained in:
Rafał Mikrut 2023-12-04 23:27:26 +01:00
parent 51198c2043
commit 4f6fe076a7
9 changed files with 91 additions and 95 deletions

View file

@ -2,6 +2,8 @@
**Czkawka** (_tch•kav•ka_ (IPA: [ˈʧ̑kafka]), "hiccup" in Polish) is a simple, fast and free app to remove unnecessary files from your computer.
**Krokiet** (IPA: [ˈkrɔcɛt], "croquet" in Polish) is the same as above, but uses a Slint frontend.
## Features
- Written in memory-safe Rust
- Amazingly fast - due to using more or less advanced algorithms and multithreading
@ -9,7 +11,7 @@
- Multiplatform - works on Linux, Windows, macOS, FreeBSD and many more
- Cache support - second and further scans should be much faster than the first one
- CLI frontend - for easy automation
- GUI frontend - uses GTK 4 framework and looks similar to FSlint
- GUI frontend - uses GTK 4 or Slint frameworks
- No spying - Czkawka does not have access to the Internet, nor does it collect any user information or statistics
- Multilingual - support multiple languages like Polish, English or Italian
- Multiple tools to use:
@ -36,9 +38,18 @@ Each tool uses different technologies, so you can find instructions for each of
## Benchmarks
Since Czkawka is written in Rust and it aims to be a faster alternative to FSlint or DupeGuru which are written in Python, we need to compare the speed of these tools.
The previous benchmark compared Czkawka mostly with two Python projects, dupeguru and fslint.
Both are written in Python, so it was fairly obvious that Czkawka would be faster thanks to its lower-level functions and faster language.
I tested it on a 256 GB SSD and an i7-4770 CPU.
I tried to use the rmlint GUI, but it would not even start on my computer, so I used Detwinner, fclones-gui and dupeguru instead.
I tested it on a 1024 GB SSD (SATA 3) and an i7-4770 CPU (4 cores/8 threads); the disk contained 1742102 files that took up 850 GB.
The minimum file size was 64 KB; hidden folders were searched, and no folders or files were excluded.
Czkawka 7.0.0
Detwinner 0.4.2
Dupeguru 4.3.1
Fclones-gui 0.2.0
I prepared a disk and performed a test without any folder exceptions and with disabled ignoring of hard links. The disk contained 363 215 files, took 221,8 GB and had 62093 duplicate files in 31790 groups which occupied 4,1 GB.
@ -83,38 +94,40 @@ Similar images which check 349 image files that occupied 1.7 GB
| DupeGuru 4.1.1 (First Run) | 55s |
| DupeGuru 4.1.1 (Second Run) | 1s |
Of course there are multiple tools that offer even better performance, but usually are only specialized in one simple area.
## Comparison to other tools
Bleachbit is a master at finding and removing temporary files, while Czkawka only finds the most basic ones. So these two apps shouldn't be compared directly or be considered as an alternative to one another.
When reading this comparison, remember that even if apps have the same features, they may work differently (e.g. one app may offer more options than another).
| | Czkawka | Krokiet | FSlint | DupeGuru | Bleachbit |
|:------------------------:|:-----------:|:-----------:|:------:|:------------------:|:-----------:|
| Language | Rust | Rust | Python | Python/Obj-C | Python |
| Framework base language | C | Rust | C | C/C++/Obj-C/Swift | C |
| Framework | GTK 4 | Slint | PyGTK2 | Qt 5 (PyQt)/Cocoa | PyGTK3 |
| OS | Lin,Mac,Win | Lin,Mac,Win | Lin | Lin,Mac,Win | Lin,Mac,Win |
| Duplicate finder | ✔ | ✔ | ✔ | ✔ | |
| Empty files | ✔ | ✔ | ✔ | | |
| Empty folders | ✔ | ✔ | ✔ | | |
| Temporary files | ✔ | ✔ | ✔ | | ✔ |
| Big files | ✔ | ✔ | | | |
| Similar images | ✔ | ✔ | | ✔ | |
| Similar videos | ✔ | ✔ | | | |
| Music duplicates(tags) | ✔ | ✔ | | ✔ | |
| Invalid symlinks | ✔ | ✔ | ✔ | | |
| Broken files | ✔ | ✔ | | | |
| Names conflict | ✔ | ✔ | ✔ | | |
| Invalid names/extensions | ✔ | ✔ | ✔ | | |
| Installed packages | | | ✔ | | |
| Bad ID | | | ✔ | | |
| Non stripped binaries | | | ✔ | | |
| Redundant whitespace | | | ✔ | | |
| Overwriting files | | | ✔ | | ✔ |
| Multiple languages | ✔ | | ✔ | ✔ | ✔ |
| Cache support | ✔ | ✔ | | ✔ | |
| In active development | Yes | | No | Yes | Yes |
| | Czkawka | Krokiet | FSlint | DupeGuru | Bleachbit |
|:------------------------:|:-----------:|:-----------:|:------:|:-----------------:|:-----------:|
| Language | Rust | Rust | Python | Python/Obj-C | Python |
| Framework base language | C | Rust | C | C/C++/Obj-C/Swift | C |
| Framework | GTK 4 | Slint | PyGTK2 | Qt 5 (PyQt)/Cocoa | PyGTK3 |
| OS | Lin,Mac,Win | Lin,Mac,Win | Lin | Lin,Mac,Win | Lin,Mac,Win |
| Duplicate finder | ✔ | ✔ | ✔ | ✔ | |
| Empty files | ✔ | ✔ | ✔ | | |
| Empty folders | ✔ | ✔ | ✔ | | |
| Temporary files | ✔ | ✔ | ✔ | | ✔ |
| Big files | ✔ | ✔ | | | |
| Similar images | ✔ | ✔ | | ✔ | |
| Similar videos | ✔ | ✔ | | | |
| Music duplicates(tags) | ✔ | ✔ | | ✔ | |
| Invalid symlinks | ✔ | ✔ | ✔ | | |
| Broken files | ✔ | ✔ | | | |
| Names conflict | ✔ | ✔ | ✔ | | |
| Invalid names/extensions | ✔ | ✔ | ✔ | | |
| Installed packages | | | ✔ | | |
| Bad ID | | | ✔ | | |
| Non stripped binaries | | | ✔ | | |
| Redundant whitespace | | | ✔ | | |
| Overwriting files | | | ✔ | | ✔ |
| Multiple languages | ✔ | | ✔ | ✔ | ✔ |
| Cache support | ✔ | ✔ | | ✔ | |
| In active development | Yes | Yes | No | Yes | Yes |
## Other apps
There are many similar applications to Czkawka on the Internet, which do some things better and some things worse:
@ -123,6 +136,7 @@ There are many similar applications to Czkawka on the Internet, which do some th
- [FSlint](https://github.com/pixelb/fslint) - A little outdated, but still have some tools not available in Czkawka
- [AntiDupl.NET](https://github.com/ermig1979/AntiDupl) - Shows a lot of metadata of compared images
- [Video Duplicate Finder](https://github.com/0x90d/videoduplicatefinder) - Finds similar videos(surprising, isn't it), supports video thumbnails
### CLI
Due to limited time, the biggest emphasis is on the GUI version so if you are looking for really good and feature-packed console apps, then take a look at these:
- [Fclones](https://github.com/pkolaczk/fclones) - One of the fastest tools to find duplicates; it is also written in Rust

View file

@ -675,7 +675,7 @@ pub struct FileToSave {
#[derive(Debug, clap::Args)]
pub struct JsonCompactFileToSave {
#[clap(short, long, value_name = "json-file-name", help = "Saves the results into the compact json file")]
#[clap(short = 'C', long, value_name = "json-file-name", help = "Saves the results into the compact json file")]
pub compact_file_to_save: Option<PathBuf>,
}

View file

@ -14,7 +14,7 @@ use rayon::prelude::*;
use serde::{Deserialize, Serialize};
use crate::common::{check_folder_children, check_if_stop_received, prepare_thread_handler_common, send_info_and_wait_for_ending_all_threads, split_path};
use crate::common_dir_traversal::{common_read_dir, get_lowercase_name, get_modified_time, CheckingMethod, ProgressData, ToolType};
use crate::common_dir_traversal::{common_read_dir, get_modified_time, CheckingMethod, ProgressData, ToolType};
use crate::common_tool::{CommonData, CommonToolData, DeleteMethod};
use crate::common_traits::{DebugPrint, PrintResults};
@ -155,12 +155,7 @@ impl BigFile {
current_folder: &Path,
) {
atomic_counter.fetch_add(1, Ordering::Relaxed);
let Some(file_name_lowercase) = get_lowercase_name(entry_data, warnings) else {
return;
};
if !self.common_data.allowed_extensions.matches_filename(&file_name_lowercase) {
if !self.common_data.allowed_extensions.check_if_entry_ends_with_extension(entry_data) {
return;
}
@ -178,9 +173,9 @@ impl BigFile {
}
let fe: FileEntry = FileEntry {
path: current_file_name.clone(),
size: metadata.len(),
modified_date: get_modified_time(&metadata, warnings, &current_file_name, false),
path: current_file_name,
size: metadata.len(),
};
fe_result.push((fe.size, fe));
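
The reordering in this hunk appears to rely on Rust evaluating struct field initializers in the order they are written: the path is borrowed for `modified_date` first and moved into `path` afterwards, so the `.clone()` is no longer needed. A minimal sketch of that idea follows; `Entry`, `fake_modified_time` and `build_entry` are hypothetical stand-ins, not names from the commit.

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-in for the diff's `FileEntry`; only the field order matters here.
#[derive(Debug, PartialEq)]
struct Entry {
    modified_date: u64,
    path: PathBuf,
    size: u64,
}

// Hypothetical helper standing in for `get_modified_time`: it only borrows the path.
fn fake_modified_time(path: &Path) -> u64 {
    path.as_os_str().len() as u64
}

fn build_entry(current_file_name: PathBuf, size: u64) -> Entry {
    Entry {
        // Evaluated first: borrows `current_file_name`; the borrow ends with the call.
        modified_date: fake_modified_time(&current_file_name),
        // Evaluated second: the PathBuf can now be moved, so no `.clone()` is needed.
        path: current_file_name,
        size,
    }
}
```

Writing `path: current_file_name` before the `modified_date` initializer would not compile, because the later borrow would use a value that has already been moved.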

View file

@ -22,7 +22,7 @@ use crate::common::{
IMAGE_RS_BROKEN_FILES_EXTENSIONS, PDF_FILES_EXTENSIONS, ZIP_FILES_EXTENSIONS,
};
use crate::common_cache::{get_broken_files_cache_file, load_cache_from_file_generalized_by_path, save_cache_to_file_generalized};
use crate::common_dir_traversal::{common_read_dir, get_lowercase_name, get_modified_time, CheckingMethod, ProgressData, ToolType};
use crate::common_dir_traversal::{common_read_dir, get_modified_time, CheckingMethod, ProgressData, ToolType};
use crate::common_tool::{CommonData, CommonToolData, DeleteMethod};
use crate::common_traits::*;
@ -185,13 +185,11 @@ impl BrokenFiles {
fn get_file_entry(&self, atomic_counter: &Arc<AtomicUsize>, entry_data: &DirEntry, warnings: &mut Vec<String>, current_folder: &Path) -> Option<FileEntry> {
atomic_counter.fetch_add(1, Ordering::Relaxed);
let file_name_lowercase = get_lowercase_name(entry_data, warnings)?;
if !self.common_data.allowed_extensions.matches_filename(&file_name_lowercase) {
if !self.common_data.allowed_extensions.check_if_entry_ends_with_extension(entry_data) {
return None;
}
let file_name_lowercase = entry_data.file_name().to_string_lossy().to_lowercase();
let type_of_file = check_extension_availability(&file_name_lowercase);
if !check_if_file_extension_is_allowed(&type_of_file, &self.checked_types) {

View file

@ -529,11 +529,7 @@ fn process_file_in_file_mode(
minimal_file_size: u64,
maximal_file_size: u64,
) {
let Some(file_name_lowercase) = get_lowercase_name(entry_data, warnings) else {
return;
};
if !allowed_extensions.matches_filename(&file_name_lowercase) {
if !allowed_extensions.check_if_entry_ends_with_extension(entry_data) {
return;
}
@ -653,11 +649,7 @@ fn process_symlink_in_symlink_mode(
directories: &Directories,
excluded_items: &ExcludedItems,
) {
let Some(file_name_lowercase) = get_lowercase_name(entry_data, warnings) else {
return;
};
if !allowed_extensions.matches_filename(&file_name_lowercase) {
if !allowed_extensions.check_if_entry_ends_with_extension(entry_data) {
return;
}

View file

@ -1,8 +1,10 @@
use crate::common_messages::Messages;
use std::collections::HashSet;
use std::fs::DirEntry;
#[derive(Debug, Clone, Default)]
pub struct Extensions {
file_extensions: Vec<String>,
file_extensions_hashset: HashSet<String>,
}
impl Extensions {
@ -28,26 +30,24 @@ impl Extensions {
continue;
}
if !extension.starts_with('.') {
extension = format!(".{extension}");
if extension.starts_with('.') {
extension = extension[1..].to_string();
}
if extension[1..].contains('.') {
if extension.contains('.') {
messages.warnings.push(format!("{extension} is not valid extension because contains dot inside"));
continue;
}
if extension[1..].contains(' ') {
if extension.contains(' ') {
messages.warnings.push(format!("{extension} is not valid extension because contains empty space inside"));
continue;
}
if !self.file_extensions.contains(&extension) {
self.file_extensions.push(extension);
}
self.file_extensions_hashset.insert(extension);
}
if self.file_extensions.is_empty() {
if self.file_extensions_hashset.is_empty() {
messages
.messages
.push("No valid extensions were provided, so allowing all extensions by default.".to_string());
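
The normalization in this hunk (strip a leading dot, reject entries that still contain a dot or a space, store the rest in a set) can be sketched in isolation as below. `parse_extensions` is a hypothetical helper; the comma-separated input, the trimming and the lowercasing are assumptions, since the part of the function that prepares each entry is not shown in the diff.

```rust
use std::collections::HashSet;

/// Hypothetical helper: turns a list such as ".jpg, PNG,tar.gz" into a set of
/// bare extensions plus a list of warnings for rejected entries.
fn parse_extensions(input: &str) -> (HashSet<String>, Vec<String>) {
    let mut allowed = HashSet::new();
    let mut warnings = Vec::new();

    for raw in input.split(',') {
        // Assumption: entries are trimmed and lowercased before being stored.
        let trimmed = raw.trim().to_lowercase();
        // Store "jpg" rather than ".jpg", so a lookup can use the text after the last dot.
        let extension = trimmed.strip_prefix('.').unwrap_or(trimmed.as_str()).to_string();
        if extension.is_empty() {
            continue;
        }
        if extension.contains('.') {
            warnings.push(format!("{extension} is not a valid extension because it contains a dot"));
            continue;
        }
        if extension.contains(' ') {
            warnings.push(format!("{extension} is not a valid extension because it contains a space"));
            continue;
        }
        // `insert` ignores duplicates, so no separate `contains` check is needed.
        allowed.insert(extension);
    }
    (allowed, warnings)
}
```

For example, `parse_extensions(".jpg, PNG,jpg,tar.gz,")` keeps `jpg` and `png` and reports one warning for `tar.gz`.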
@ -57,32 +57,36 @@ impl Extensions {
pub fn matches_filename(&self, file_name: &str) -> bool {
// assert_eq!(file_name, file_name.to_lowercase());
if !self.file_extensions.is_empty() && !self.file_extensions.iter().any(|e| file_name.ends_with(e)) {
if !self.file_extensions_hashset.is_empty() && !self.file_extensions_hashset.iter().any(|e| file_name.ends_with(e)) {
return false;
}
true
}
pub fn check_if_entry_ends_with_extension(&self, entry_data: &DirEntry) -> bool {
if self.file_extensions_hashset.is_empty() {
return true;
}
let file_name = entry_data.file_name();
let Some(file_name_str) = file_name.to_str() else { return false };
let Some(extension_idx) = file_name_str.rfind('.') else { return false };
let extension = &file_name_str[extension_idx + 1..];
if extension.chars().all(|c| c.is_ascii_lowercase()) {
self.file_extensions_hashset.contains(extension)
} else {
self.file_extensions_hashset.contains(&extension.to_lowercase())
}
}
pub fn using_custom_extensions(&self) -> bool {
!self.file_extensions.is_empty()
!self.file_extensions_hashset.is_empty()
}
pub fn extend_allowed_extensions(&mut self, file_extensions: &[&str]) {
for extension in file_extensions {
assert!(extension.starts_with('.'));
self.file_extensions.push((*extension).to_string());
let extension_without_dot = extension.trim_start_matches('.');
self.file_extensions_hashset.insert(extension_without_dot.to_string());
}
}
pub fn validate_allowed_extensions(&mut self, file_extensions: &[&str]) {
let mut current_file_extensions = Vec::new();
for extension in file_extensions {
assert!(extension.starts_with('.'));
if self.file_extensions.contains(&(*extension).to_string()) {
current_file_extensions.push((*extension).to_string());
}
}
self.file_extensions = current_file_extensions;
}
}
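
The lookup added as `check_if_entry_ends_with_extension` can be tried out without a real `DirEntry`. The sketch below applies the same steps to a plain `&str` file name; `has_allowed_extension` is a hypothetical free function, and it assumes the set holds lowercase extensions without a leading dot.

```rust
use std::collections::HashSet;

/// Hypothetical free-function version of the lookup, working on a file name string.
fn has_allowed_extension(allowed: &HashSet<String>, file_name: &str) -> bool {
    // An empty set means "no filter": every file is accepted.
    if allowed.is_empty() {
        return true;
    }
    // Files without a dot have no extension and cannot match a non-empty filter.
    let Some(idx) = file_name.rfind('.') else {
        return false;
    };
    // '.' is one byte in UTF-8, so `idx + 1` is always a valid slice boundary.
    let extension = &file_name[idx + 1..];

    if extension.chars().all(|c| c.is_ascii_lowercase()) {
        // Fast path: already lowercase letters, look it up without allocating.
        allowed.contains(extension)
    } else {
        // Slow path: allocate a lowercased copy (also taken for digits, e.g. "mp3").
        allowed.contains(extension.to_lowercase().as_str())
    }
}
```

Unlike the suffix scan in `matches_filename`, which compares the name against every stored extension, this does one hash lookup on the text after the last dot. As a consequence, `archive.tar.gz` is matched by `gz` only.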

View file

@ -180,7 +180,7 @@ impl SameMusic {
if !self.common_data.allowed_extensions.using_custom_extensions() {
self.common_data.allowed_extensions.extend_allowed_extensions(AUDIO_FILES_EXTENSIONS);
} else {
self.common_data.allowed_extensions.validate_allowed_extensions(AUDIO_FILES_EXTENSIONS);
self.common_data.allowed_extensions.extend_allowed_extensions(AUDIO_FILES_EXTENSIONS);
if !self.common_data.allowed_extensions.using_custom_extensions() {
return true;
}
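
Reading only the lines shown in this diff, the removed `validate_allowed_extensions` kept the intersection of the user's extensions and the supported ones, whereas `extend_allowed_extensions` adds the supported ones to the set. The sketch below contrasts the two operations on plain sets; `to_set`, `narrow` and `widen` are hypothetical names, and whether the widening is intended is not stated in the commit.

```rust
use std::collections::HashSet;

/// Hypothetical helper: builds a set of bare extensions from dotted constants.
fn to_set(items: &[&str]) -> HashSet<String> {
    items.iter().map(|s| s.trim_start_matches('.').to_string()).collect()
}

/// Narrowing: keep only the user's extensions that the tool supports.
fn narrow(user: &HashSet<String>, supported: &[&str]) -> HashSet<String> {
    let supported = to_set(supported);
    user.intersection(&supported).cloned().collect()
}

/// Widening: add every supported extension to the user's set.
fn widen(user: &HashSet<String>, supported: &[&str]) -> HashSet<String> {
    let mut result = user.clone();
    result.extend(to_set(supported));
    result
}
```

With a user set of `mp3` and `txt` and a supported list of `.mp3` and `.flac`, `narrow` yields only `mp3`, while `widen` yields `mp3`, `txt` and `flac`.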

View file

@ -24,7 +24,7 @@ use crate::common::{
send_info_and_wait_for_ending_all_threads, HEIC_EXTENSIONS, IMAGE_RS_SIMILAR_IMAGES_EXTENSIONS, RAW_IMAGE_EXTENSIONS,
};
use crate::common_cache::{get_similar_images_cache_file, load_cache_from_file_generalized_by_path, save_cache_to_file_generalized};
use crate::common_dir_traversal::{common_read_dir, get_lowercase_name, get_modified_time, CheckingMethod, ProgressData, ToolType};
use crate::common_dir_traversal::{common_read_dir, get_modified_time, CheckingMethod, ProgressData, ToolType};
use crate::common_tool::{CommonData, CommonToolData, DeleteMethod};
use crate::common_traits::{DebugPrint, PrintResults, ResultEntry};
use crate::flc;
@ -156,7 +156,7 @@ impl SimilarImages {
} else {
self.common_data
.allowed_extensions
.validate_allowed_extensions(&[IMAGE_RS_SIMILAR_IMAGES_EXTENSIONS, RAW_IMAGE_EXTENSIONS, HEIC_EXTENSIONS].concat());
.extend_allowed_extensions(&[IMAGE_RS_SIMILAR_IMAGES_EXTENSIONS, RAW_IMAGE_EXTENSIONS, HEIC_EXTENSIONS].concat());
if !self.common_data.allowed_extensions.using_custom_extensions() {
return true;
}
@ -194,7 +194,6 @@ impl SimilarImages {
let Ok(file_type) = entry_data.file_type() else {
continue;
};
if file_type.is_dir() {
check_folder_children(
&mut dir_result,
@ -226,6 +225,8 @@ impl SimilarImages {
}
}
}
eprintln!("Tested {} files", atomic_counter.load(Ordering::Relaxed));
eprintln!("Images to check {}", self.images_to_check.len());
send_info_and_wait_for_ending_all_threads(&progress_thread_run, progress_thread_handle);
@ -233,11 +234,7 @@ impl SimilarImages {
}
fn add_file_entry(&self, current_folder: &Path, entry_data: &DirEntry, fe_result: &mut Vec<(String, FileEntry)>, warnings: &mut Vec<String>) {
let Some(file_name_lowercase) = get_lowercase_name(entry_data, warnings) else {
return;
};
if !self.common_data.allowed_extensions.matches_filename(&file_name_lowercase) {
if !self.common_data.allowed_extensions.check_if_entry_ends_with_extension(entry_data) {
return;
}

View file

@ -18,7 +18,7 @@ use crate::common::{
check_folder_children, check_if_stop_received, delete_files_custom, prepare_thread_handler_common, send_info_and_wait_for_ending_all_threads, VIDEO_FILES_EXTENSIONS,
};
use crate::common_cache::{get_similar_videos_cache_file, load_cache_from_file_generalized_by_path, save_cache_to_file_generalized};
use crate::common_dir_traversal::{common_read_dir, get_lowercase_name, get_modified_time, CheckingMethod, ProgressData, ToolType};
use crate::common_dir_traversal::{common_read_dir, get_modified_time, CheckingMethod, ProgressData, ToolType};
use crate::common_tool::{CommonData, CommonToolData, DeleteMethod};
use crate::common_traits::{DebugPrint, PrintResults, ResultEntry};
use crate::flc;
@ -135,7 +135,7 @@ impl SimilarVideos {
if !self.common_data.allowed_extensions.using_custom_extensions() {
self.common_data.allowed_extensions.extend_allowed_extensions(VIDEO_FILES_EXTENSIONS);
} else {
self.common_data.allowed_extensions.validate_allowed_extensions(VIDEO_FILES_EXTENSIONS);
self.common_data.allowed_extensions.extend_allowed_extensions(VIDEO_FILES_EXTENSIONS);
if !self.common_data.allowed_extensions.using_custom_extensions() {
return true;
}
@ -213,11 +213,7 @@ impl SimilarVideos {
}
fn add_video_file_entry(&self, entry_data: &DirEntry, fe_result: &mut Vec<(String, FileEntry)>, warnings: &mut Vec<String>, current_folder: &Path) {
let Some(file_name_lowercase) = get_lowercase_name(entry_data, warnings) else {
return;
};
if !self.common_data.allowed_extensions.matches_filename(&file_name_lowercase) {
if !self.common_data.allowed_extensions.check_if_entry_ends_with_extension(entry_data) {
return;
}