I've currently got around 2TB of Canon CR2 RAW files, 1TB of MOV/MP4 video, and 600GB of JPGs. I'm pretty sure a lot of that is duplicated, but not in any sort of organised backup sense! The storage is there to do it (4TB, 2TB, 5x1TB) but I want to get everything in the one place first, then get a proper backup in place. Adobe Lightroom has an option to just re-import files from all over the computer and you can tell it to "Ignore Duplicates", however I'm not sure how robust that is... (It also seems to have loads of issues importing video files, so a lot don't get imported!) Does anyone have any suggestions on programs that could help solve this? I'd just hate for a program to decide something is a duplicate and end up deleting the only copy of a file. (Yes, I know, if it's so important I should have had it better organised and backed up haha) Any help is much appreciated!!!
CCleaner has a built-in tool to do this ("Duplicate Finder" under Tools). You can search/compare by different criteria to make sure it is an actual duplicate. Size is normally pretty good I think, but there is an option for "content", which I assume does a bit-for-bit comparison. Just be super careful, as any tool of this nature tends to have the ability to search for and then delete files en masse based on criteria without manual checking, so if you don't know exactly what you/it is doing the consequences can be pretty dire.
Yeah I had a look at that, and it seems to find the duplicates pretty reliably, however the deletion process looks quite tedious. Ideally I'd like to identify duplicates, move just one of each to a new location, then have the option to delete the old duplicates afterwards (some might also need to remain in their original location because programs need to access them). Unless I'm missing something on how you use the program, in which case my bad!!! i.e.:

Scan drive D:\ for duplicates
Duplicates of 0001.jpg found:
- D:\Folder01\0001.jpg
- D:\Folder02\0001.jpg
Copy 0001.jpg to E:\MasterFolder
Do you want to delete duplicate 1, 2, or both?

Something like that, at a scale of 500,000 files... A bunch of tools seem to promise this sort of functionality and have good filtering options, but they almost always look like they're paid or filled with crapware haha
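If none of the GUI tools pan out, the workflow you describe is small enough to script yourself. A rough Python sketch of the idea (not any of the tools mentioned here; folder paths and the `dedupe` name are just placeholders) that hashes every file, copies one of each unique file into a master folder, and returns the leftover duplicates for manual review rather than deleting anything:

```python
import hashlib
import shutil
from pathlib import Path

def file_hash(path: Path, chunk: int = 1 << 20) -> str:
    # Hash in chunks so multi-GB video files aren't loaded into RAM.
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def dedupe(scan_dir: str, master_dir: str) -> dict[str, list[Path]]:
    """Copy one file per unique content hash into master_dir.

    Returns {hash: [duplicate paths that were NOT copied]} so the
    leftovers can be reviewed by hand -- nothing is auto-deleted.
    """
    groups: dict[str, list[Path]] = {}
    for path in Path(scan_dir).rglob("*"):
        if path.is_file():
            groups.setdefault(file_hash(path), []).append(path)

    master = Path(master_dir)
    master.mkdir(parents=True, exist_ok=True)
    leftovers = {}
    for digest, paths in groups.items():
        keeper, *dupes = paths
        # Prefix with part of the hash so two different files both
        # named 0001.jpg can't overwrite each other in the master folder.
        shutil.copy2(keeper, master / f"{digest[:12]}_{keeper.name}")
        leftovers[digest] = dupes
    return leftovers
```

Hashing by content sidesteps the filename problem entirely, and keeping the leftovers list instead of deleting means the worst a bug can do is copy too much.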
On reflection (still looking through some of the data manually via WinDirStat), maybe comparing entire folders first would be better, as that would eliminate a lot of the duplication before filtering down to the per-file level (again, the only programs I've found that do this require a premium version to be purchased).
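The folder-level pass is also scriptable: fingerprint each folder as the hash of its sorted (relative name, content hash) pairs, and two folders with identical contents get identical fingerprints regardless of where they live. A minimal sketch (the `folder_fingerprint` name is mine, not from any tool here):

```python
import hashlib
from pathlib import Path

def file_hash(path: Path, chunk: int = 1 << 20) -> str:
    # Chunked hashing so large files don't need to fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def folder_fingerprint(folder: str) -> str:
    """Hash of the sorted (relative path, content hash) pairs in a folder.

    Two folders match only if they hold the same file names with the
    same contents; the folders' own names/locations don't matter.
    """
    root = Path(folder)
    entries = sorted(
        f"{p.relative_to(root)}:{file_hash(p)}"
        for p in root.rglob("*") if p.is_file()
    )
    return hashlib.sha256("\n".join(entries).encode()).hexdigest()
```

Grouping folders by fingerprint first lets you discard whole duplicate trees in one go, then run the per-file pass only on what's left.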
The contents themselves. I'll have photos called 4135.cr2 more than once, I'd imagine, since the counter loops back around after 9999 photos. (Same goes for the tens of thousands of iPhone photos in .jpg format.)
Is VisiPics still any good? It didn't have crapware last time I used it, but that had to have been 10 years ago.
dupeGuru seems to work well. It can move files rather than just delete them (the main limitation of CCleaner), and it's open source.
It's still good, but it hasn't been updated in an age. Not that it matters much; unless you need one of the planned features, it still does a good job of finding duplicate or similar images.
If it was Linux I would have used:

Code:
find . -type f -print0 | xargs -0 md5sum | sort > /tmp/md5.txt

That would give you a list of MD5s and their corresponding file names from the current directory, so you would end up with a list like:

Code:
001e6a240da9334b29a9acf703ace65e  ./lib/modules/4.4.0-64-generic/kernel/net/netfilter/nf_conntrack_sane.ko
001fb3dec80775b9f72e59745abbb937  ./lib/modules/4.4.0-57-generic/kernel/drivers/watchdog/sbc_epx_c3.ko
001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-45/tools/testing/selftests/powerpc/mm/Makefile
001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-57/tools/testing/selftests/powerpc/mm/Makefile
001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-59/tools/testing/selftests/powerpc/mm/Makefile
001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-62/tools/testing/selftests/powerpc/mm/Makefile
001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-63/tools/testing/selftests/powerpc/mm/Makefile
001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-64/tools/testing/selftests/powerpc/mm/Makefile
001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-65/tools/testing/selftests/powerpc/mm/Makefile
001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-66/tools/testing/selftests/powerpc/mm/Makefile

You could then see that those Makefiles have identical MD5 sums, so the files themselves are identical. It works for binary files too. No idea about Windows though.
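For what it's worth, the same idea works on Windows too: PowerShell has a built-in `Get-FileHash` cmdlet, and Python runs anywhere. A rough cross-platform equivalent of that pipeline (my own sketch, not part of any tool in this thread), producing hash/path lines sorted by hash so identical files land next to each other:

```python
import hashlib
import sys
from pathlib import Path

def md5_listing(root: str) -> list[str]:
    """Return 'md5  path' lines for every file under root, sorted by
    hash -- duplicates end up adjacent, like `md5sum ... | sort`."""
    lines = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            h = hashlib.md5()
            with path.open("rb") as f:
                while block := f.read(1 << 20):
                    h.update(block)
            lines.append(f"{h.hexdigest()}  {path}")
    return sorted(lines)

if __name__ == "__main__":
    # Scan the directory given on the command line, or the current one.
    print("\n".join(md5_listing(sys.argv[1] if len(sys.argv) > 1 else ".")))
```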
DupDetector. Just used it today to work through several folders with tens of thousands of image files. It has options to detect by content regardless of name or size, and you can set it to delete/move manually or automatically based on a number of criteria. It also allows you to visually check. Oh, and it's free. https://www.keronsoft.com/dupdetector.html
Update: Gave dupeGuru a try first. Seems to work fine, but I just want to check one part of the functionality... In the screenshot attached, it doesn't let me do anything with the ones in blue; I'm assuming this is because it's treating these as some sort of "master" file/folder. (In this case I've told it to search for identical folders.)

I suppose what I'd ideally want to do is:
A) Move a copy of the "master" folders over to another drive (perhaps into just one big folder?), then once I can confirm they are there safely...
B) Delete all previous versions on the original drive

That way I'm happy they are off the nearly 10-year-old drives and onto my new one, where I can then back everything up. Worst case, I'll at least be able to delete the duplicates, which is a good start haha!

Any ideas or tips on better ways to deal with this mess are more than welcome! (This is only one of the 8 drives scanned... I know, I'm ashamed of myself )
Not entirely sure mate, sorry. It would appear you can alter some of the search criteria, so maybe worth setting up a "trap" and testing to see if it works?

Also, solved my issue above with copying files either in/out of the file structure, so that's good. The only other issue I've run into, and I don't see any way around it yet: if Folder A is the same size as Folder B but there's a corrupt file in Folder A and not in B, and the tool has you delete B, you won't know until you access Folder A later on, unless you check each file individually!
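That corrupt-file case is exactly why size-only matching is risky: a damaged file can still land on the same byte count. One safeguard, if you end up scripting any of the deletion, is to confirm the pair really is byte-for-byte identical before removing anything. Python's standard library does this in one call; the `safe_to_delete` wrapper name is just mine:

```python
import filecmp

def safe_to_delete(duplicate: str, keeper: str) -> bool:
    """True only if `duplicate` is byte-for-byte identical to `keeper`.

    shallow=False makes filecmp compare actual contents rather than
    just the os.stat() signature, so a corrupt file that merely
    matches on size is caught before anything gets deleted.
    """
    return filecmp.cmp(duplicate, keeper, shallow=False)
```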
Late to the party on this one, but TreeSize Pro is brilliant: https://www.jam-software.com/treesize/ Use it all the time at work to find identical files on a drive
VisiPics does. I'm sure that's what I used a few years back to sort out a folder full of pics, and it picked up duplicates that were at different resolutions, allowing me to pick which one I wanted to keep.