PeaZip can be used as a duplicate file finder, searching for binary-identical
files in order to de-duplicate data, either to save disk space or to
improve compression by eliminating redundancy in the input,
using various detection methods:
in the file manager
When browsing a filesystem (not inside a compressed archive) the file
browser can show the checksum/hash
value of each file on demand in the last column, making it possible to identify
binary-identical files, which have the same checksum/hash value.
Clicking the name of the function (in the context menu, "File tools" group)
will display the hash or checksum value for all (or selected) files.
Clicking "Find duplicates" will display the size and the hash or checksum value
only for duplicate files (the same binary-identical content found in
two or more distinct files) and report the number of non-unique files.
In both cases, sorting by the CRC column groups all files (in the
same folder, or matching the same search filter) with identical hash or checksum.
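The grouping described above can be sketched outside PeaZip in a few lines of Python. This is a minimal illustration, not PeaZip's implementation: the function names `file_digest` and `find_duplicates` are hypothetical, SHA-256 is an arbitrary choice of algorithm, and the size pre-filter is an assumption made here to avoid hashing files that cannot have a duplicate.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, algorithm="sha256"):
    """Return sorted lists of paths under root sharing size and digest."""
    # Step 1: group by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files whose size is shared with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path, algorithm)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Calling `find_duplicates("some/folder")` returns one list per set of identical files; an empty result means no duplicates were found under that folder.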
The verification function can be set in the main application's menu
(Checksum/hash setting). A wide selection of algorithms is available, ranging
from simple checksum functions such as Adler32 and the CRC
family (CRC16, CRC24,
CRC32, and CRC64), to hash functions like eDonkey/eMule, MD4, and MD5, and to
cryptographically strong hashes such as Ripemd160, SHA-1, the SHA-2 family
(including SHA384 and SHA512), and Whirlpool512.
When browsing an archive, this on-demand verification is not
available, but (if supported by the archive format) the CRC column will
display data integrity information, e.g. CRC32 in ZIP archives,
so the archive content can be sorted by the CRC column to group identical files
and find duplicates.
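The same idea can be illustrated with Python's standard `zipfile` module, which exposes the CRC32 stored for each ZIP entry. This is only a sketch under stated assumptions: `duplicate_entries` is a hypothetical helper name, and pairing the CRC with the uncompressed size is a choice made here to reduce false matches. Since CRC32 is only 32 bits wide, a match marks likely duplicates rather than proven ones.

```python
import zipfile
from collections import defaultdict


def duplicate_entries(archive):
    """Group ZIP entries by (stored CRC32, uncompressed size).

    `archive` is a path or a file-like object. No data is decompressed:
    both values are read from the archive's directory records.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            groups[(info.CRC, info.file_size)].append(info.filename)
    # Keep only keys shared by two or more entries.
    return {key: names for key, names in groups.items() if len(names) > 1}
```

Entries reported together can then be extracted and compared with a stronger hash if certainty is required.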
Identify similar images in the file manager
When browsing a filesystem, PeaZip can display thumbnails of graphic files:
in the menu, under "Organize", check "Show picture thumbnails", or select a file
browser preset style that shows thumbnails.
While checksum/hash based inspection finds exactly identical
files (and images), thumbnails allow the user to find similar images
(e.g. the same picture or graphic saved in different formats, or with
different color depth or compression settings, or scaled to different
sizes), to help in deciding whether the (pseudo)duplication is acceptable,
and which copy to keep.
multiple checksum and hash functions at once
The "File tools" submenu (context menu) can compute multiple hash and
checksum functions (the same ones available in the file manager) on multiple
files at once, e.g. to compare a group of files and identify redundant ones, or
to check files for corruption when an original checksum or hash value is
available.
Using multiple functions, and especially relying on cryptographically
strong hash functions such as Ripemd, SHA-2 or Whirlpool, can defeat attempts
to forge identical-looking files, since it is computationally feasible
to find a collision (different inputs mapped to the same output) only for simpler
checksum and hash functions.
This way, even a purposely crafted modification of a file would not
go unnoticed by the most sophisticated detection algorithms.
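Computing several functions over the same file can be sketched with Python's standard `hashlib` and `zlib` modules, reading the file once and feeding every chunk to each function. This is an illustration, not PeaZip's code: `multi_digest` is a hypothetical name, and the default algorithm list is an arbitrary selection from those `hashlib` always provides.

```python
import hashlib
import zlib


def multi_digest(path, algorithms=("md5", "sha1", "sha256", "sha512"),
                 chunk_size=1 << 20):
    """Return {name: hex value} for CRC32 plus each requested hash."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC32 over all chunks
            for h in hashers.values():
                h.update(chunk)
    result = {name: h.hexdigest() for name, h in hashers.items()}
    result["crc32"] = format(crc, "08x")
    return result
```

Two files are then treated as identical only if every value matches, so forging a duplicate would require a simultaneous collision in all the functions used.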
Output values of hashes and checksums can be displayed as hexadecimal (HEX,
either LSB or MSB) or encoded as
Alternative: byte-to-byte comparison
The "File tools" submenu can also perform a byte-to-byte comparison between two
files; unlike the checksum/hash method it is not subject to
collisions, and it can report which bytes differ, so it tells not
only whether two files are identical, but also what changes were
made between the two versions.
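A byte-to-byte comparison of this kind can be sketched in Python as follows. The function name `compare_bytes`, its return shape, and the `max_diffs` cap are assumptions made for illustration; they do not describe PeaZip's own report format.

```python
def compare_bytes(path_a, path_b, max_diffs=100, chunk_size=1 << 16):
    """Compare two files byte by byte.

    Returns (identical, diffs), where diffs is a list of
    (offset, byte_in_a, byte_in_b). None stands for a byte that is
    missing because one file is shorter than the other.
    """
    diffs = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both files exhausted
            for i in range(max(len(a), len(b))):
                x = a[i] if i < len(a) else None
                y = b[i] if i < len(b) else None
                if x != y:
                    diffs.append((offset + i, x, y))
                    if len(diffs) >= max_diffs:
                        return False, diffs  # stop early on long reports
            offset += max(len(a), len(b))
    return not diffs, diffs
```

Because every byte is read from both files, this is slower than comparing stored hashes, but a positive result cannot be caused by a collision.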
Topics: find duplicate files, deduplicate files,
compare files, detect redundant data, identical files, checksum, hash,
duplicate finder tool, detect duplicates, similar files, check
differences, file hashing