A multithreaded duplicate file finder that uses xxHash32
This repository was archived on 2025-09-28. You can view files and clone it, but you cannot change its state, such as by pushing or creating new issues, pull requests, or comments.

xxSherly

A fork of Sherly, using xxHash.

Introduction

Sherly is a multithreaded duplicate file finder for your terminal, written in Java. You can easily find duplicate images, videos, and any other type of data. That can be helpful if you are low on storage or just want to do regular housekeeping.

This fork uses xxHash instead of MD5 for performance reasons (see Speed comparison). Note that xxHash is not a cryptographic hash function, so collisions between different files are more likely. To reduce the risk of false matches, the checksum is composed of the xxHash digest and the file size.
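The idea of combining a digest with the file size can be sketched as follows. This is only an illustration, not xxSherly's actual code: the class `ChecksumSketch`, the method `checksum`, and the key format are hypothetical, and `java.util.zip.CRC32` is swapped in for xxHash32 purely because it ships with the JDK and keeps the example self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical sketch, not taken from the xxSherly sources.
final class ChecksumSketch {

    // Builds a key of the form "<8 hex digits of digest>-<file size in bytes>".
    // ASSUMPTION: CRC32 is a stand-in for xxHash32 (both yield a 32-bit,
    // non-cryptographic digest); the real tool uses xxHash.
    static String checksum(Path file) throws IOException {
        CRC32 digest = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        // Two files only match if both the digest and the size agree.
        return String.format("%08x-%d", digest.getValue(), Files.size(file));
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("dup-a", ".bin");
        Path b = Files.createTempFile("dup-b", ".bin");
        Files.write(a, "same bytes".getBytes(StandardCharsets.UTF_8));
        Files.write(b, "same bytes".getBytes(StandardCharsets.UTF_8));
        // Identical content and size give identical keys.
        System.out.println(checksum(a).equals(checksum(b)));
        Files.delete(a);
        Files.delete(b);
    }
}
```

Appending the size means that two files of different lengths can never be reported as duplicates, even if their 32-bit digests happen to collide.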

Usage

usage: xxSherly.jar [options] folder1 folder2 ...
 -c,--color           enable colored output
 -d,--delete          delete all dups except one, without asking first
 -h,--help            show this help message
 -n,--noinput         skip all user input
 -p,--progress        enable progress indicator
 -t,--threads <arg>   override default thread number (defaults to the
                      number of cores)
 -v,--verbose         more verbose output
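The `-t` option defaults to the number of cores. A minimal sketch of how a worker pool of that size might group files by checksum is shown below. Everything in it is an assumption made for illustration: the names `ScanSketch`, `key`, and `findDuplicates` are hypothetical, CRC32 again stands in for xxHash32, and `Runtime.getRuntime().availableProcessors()` (which reports logical processors) is only one plausible way to pick the default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;
import java.util.zip.CRC32;

// Hypothetical sketch, not taken from the xxSherly sources.
final class ScanSketch {

    // Digest plus file size; CRC32 is a JDK stand-in for xxHash32.
    static String key(Path file) throws IOException {
        CRC32 digest = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return String.format("%08x-%d", digest.getValue(), Files.size(file));
    }

    // Hashes every regular file under the given folders on a fixed-size pool
    // and returns only the groups that contain two or more files.
    static Map<String, List<Path>> findDuplicates(List<Path> folders, int threads)
            throws IOException, InterruptedException {
        List<Path> files = new ArrayList<>();
        for (Path folder : folders) {
            try (Stream<Path> walk = Files.walk(folder)) {
                walk.filter(Files::isRegularFile).forEach(files::add);
            }
        }
        Map<String, List<Path>> byKey = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Path file : files) {
            pool.execute(() -> {
                try {
                    // Synchronized list, because several workers may add to one group.
                    byKey.computeIfAbsent(key(file),
                            k -> Collections.synchronizedList(new ArrayList<>())).add(file);
                } catch (IOException e) {
                    System.err.println("skipping " + file + ": " + e.getMessage());
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        byKey.values().removeIf(group -> group.size() < 2);
        return byKey;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default: one worker per available processor.
        int threads = Runtime.getRuntime().availableProcessors();
        List<Path> folders = new ArrayList<>();
        for (String arg : args) {
            folders.add(Paths.get(arg));
        }
        findDuplicates(folders, threads)
                .forEach((k, group) -> System.out.println(k + " " + group));
    }
}
```

Hashing is mostly I/O-bound on slow disks, so more threads than cores may or may not help; that is presumably why the thread count is left overridable.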

Build

mvn package assembly:single

Supported Platforms

| OS | Working | Version |
|---|---|---|
| Linux | Yes | 1.0 |
| Windows 10/11 | Not yet tested | - |
| macOS | Not yet tested | - |
| BSD | Not yet tested | - |

Speed comparison

I let Sherly v1.1.4 and xxSherly v1.0 find duplicates in my music library (containing .wav files) using the following commands:

time java -jar Bin/sherly.jar -n -f ~/Music/
time java -jar target/xxSherly-1.0-jar-with-dependencies.jar -n -f ~/Music/

The timings are the real (wall-clock) values reported by the Linux tool time.

| Run | Sherly | xxSherly |
|---|---|---|
| 1st run | 4.055s | 2.561s |
| 2nd run | 4.055s | 2.304s |
| 3rd run | 4.066s | 2.549s |
| avg | 4.059s | 2.471s |
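The averages can be rechecked, and the resulting ratio computed, with a few lines of Java. The class `TimingSketch` and the method `mean` are hypothetical names used only for this illustration; the numbers are the three runs listed above.

```java
import java.util.Arrays;

// Hypothetical helper for rechecking the timing table.
final class TimingSketch {

    // Arithmetic mean of the measured run times, in seconds.
    static double mean(double[] seconds) {
        return Arrays.stream(seconds).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double[] sherly = {4.055, 4.055, 4.066};
        double[] xxSherly = {2.561, 2.304, 2.549};
        double sherlyAvg = mean(sherly);
        double xxSherlyAvg = mean(xxSherly);
        System.out.printf("Sherly avg:   %.3f s%n", sherlyAvg);
        System.out.printf("xxSherly avg: %.3f s%n", xxSherlyAvg);
        // Ratio of the two averages, i.e. how many times faster xxSherly was.
        System.out.printf("Ratio:        %.2f%n", sherlyAvg / xxSherlyAvg);
    }
}
```

With these three runs, xxSherly comes out roughly 1.64 times as fast as Sherly, which corresponds to about 39% less wall-clock time on this particular music library.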