
Storage Finding Duplicate Files - Accurately!

Discussion in 'Tech Support' started by DD_nVidia, 9 Feb 2018.

  1. DD_nVidia

    DD_nVidia Minimodder

    Joined:
    22 Feb 2011
    Posts:
    136
    Likes Received:
    0
    I've currently got around 2TB of CR2 Canon RAW files, 1TB of MOV/MP4 and 600GB of JPGs. I'm pretty sure a lot of that is duplicate files, but not in any sort of organised backup sense!

    The storage is there to do that but I want to get everything in the one place then get a proper backup in place. (4TB, 2TB, 5x1TB)

    Adobe Lightroom has an option to just re-import files from all over the computer and you can tell it to "Ignore Duplicates" however I'm not sure how robust that is... (It also seems to have loads of issues with video files importing so a lot don't get imported!)

    Does anyone have any suggestions on programs that could help solve this issue? I'd just hate to have a program think something is a duplicate to then end up deleting something there is only one of. (Yes, I know, if it's so important I should have had it better organised and backed up, haha)


    Any help is much appreciated!!!
     
  2. wolfticket

    wolfticket Downwind from the bloodhounds

    Joined:
    19 Apr 2008
    Posts:
    3,556
    Likes Received:
    646
    CCleaner has a built-in tool to do this ("Duplicate Finder" under Tools). You can search/compare by different criteria to make sure it is an actual duplicate. Size is normally pretty good I think, but there is an option for "content", which I assume does a bit-for-bit comparison.

    Just be super careful as any tool of this nature tends to have the ability to search for and then to delete files en masse based upon criteria without manual checking, so if you don't know exactly what you are/it is doing the consequences can be pretty dire.
     
    Last edited: 10 Feb 2018
    silk186 likes this.
  3. DD_nVidia

    DD_nVidia Minimodder

    Joined:
    22 Feb 2011
    Posts:
    136
    Likes Received:
    0
    Yeah I had a look at that, and it seems to find the duplicates pretty reliably, however the deletion process looks to be quite tedious. Ideally I'd like to identify duplicates then move just 1 of each to a new location, then have the option to delete the old duplicates afterwards (some might need to also remain in their original location due to programs needing to access them) - Unless I'm missing something on how you use the program, in which case my bad!!!

    i.e.

    Scan Drive D:\ for duplicates
    Duplicates of File 0001.jpg found
    - Location
    - D:\Folder01\0001.jpg
    - D:\Folder02\0001.jpg

    Copy 0001.jpg to E:\MasterFolder

    Do you want to delete duplicate 1, 2 or both?





    Something like that? On a scale of 500,000 files ... :oldconfused:

    A bunch of tools seem to promise this sort of functionality and have good filtering options but they almost always look like they're paid or filled with crapware haha :(
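The "copy one of each to a master folder, decide about the rest later" workflow described above can be roughed out in a few lines of Python. This is only a sketch under assumptions: Python is not mentioned in the thread, `consolidate` and `unique_target` are made-up names, and the duplicate groups are assumed to come from some other scan. It never deletes anything; it copies the first file of each group and reports the original locations so they can be reviewed by hand.

```python
import os
import shutil


def unique_target(master_dir, filename):
    """Pick a name in master_dir that does not overwrite an existing file."""
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(master_dir, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(master_dir, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate


def consolidate(groups, master_dir):
    """Copy one file per duplicate group into master_dir.

    groups is a list of lists of paths with identical content.
    Returns a list of (copied_to, original_paths) pairs. Nothing is deleted;
    the original paths are only reported for manual review.
    """
    os.makedirs(master_dir, exist_ok=True)
    report = []
    for paths in groups:
        keeper = paths[0]
        target = unique_target(master_dir, os.path.basename(keeper))
        shutil.copy2(keeper, target)  # copy2 also tries to keep timestamps
        report.append((target, list(paths)))
    return report
```

The renaming in `unique_target` is there because two different photos can share a name such as 0001.jpg, so a plain copy into one big folder could silently overwrite one of them.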
     
  4. DD_nVidia

    DD_nVidia Minimodder

    Joined:
    22 Feb 2011
    Posts:
    136
    Likes Received:
    0
    On reflection (still looking through some of the data manually via WinDirStat), maybe a way to compare entire folders first would be better, as that would eliminate a lot of the actual duplication before filtering down to the per-file level. (Again, the only programs I've found that do this require premium versions to be purchased.)
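One way to compare entire folders first is to give each folder a fingerprint built from the relative path and content hash of every file inside it, then group folders whose fingerprints match. The Python below is a minimal sketch of that idea, not what any of the tools in this thread do; `file_digest`, `folder_fingerprint` and `identical_folders` are hypothetical names, and two folders only count as identical here if both the file names and the contents match.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def folder_fingerprint(folder):
    """Digest over the relative paths and contents of all files in folder."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, folder).replace(os.sep, "/")
            entries.append((rel, file_digest(path)))
    h = hashlib.sha256()
    for rel, digest in sorted(entries):  # sorted so walk order is irrelevant
        h.update(rel.encode("utf-8", "surrogateescape"))
        h.update(b"\0")
        h.update(digest.encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()


def identical_folders(folders):
    """Return groups (lists) of folders that share a fingerprint."""
    groups = defaultdict(list)
    for folder in folders:
        groups[folder_fingerprint(folder)].append(folder)
    return [g for g in groups.values() if len(g) > 1]
```

Every file still has to be read once, so on terabytes of data this is slow, but it only needs doing once per drive.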
     
  5. faugusztin

    faugusztin I *am* the guy with two left hands

    Joined:
    11 Aug 2008
    Posts:
    6,953
    Likes Received:
    270
    It would be good to define what you consider duplicates. Duplicate files in name, or in contents?
     
  6. DD_nVidia

    DD_nVidia Minimodder

    Joined:
    22 Feb 2011
    Posts:
    136
    Likes Received:
    0
    Contents themselves. I'll have photos called 4135.cr2 more than once I'd imagine since after 9999 photos it would loop back around. (same goes for the tens of thousands of iPhone photos in .jpg format)
     
  7. Cheapskate

    Cheapskate Insane? or just stupid?

    Joined:
    13 May 2007
    Posts:
    12,411
    Likes Received:
    1,968
    Is VisiPics still any good? It didn't have crapware last time I used it, but that had to have been 10 years ago.
     
    Corky42 likes this.
  8. wolfticket

    wolfticket Downwind from the bloodhounds

    Joined:
    19 Apr 2008
    Posts:
    3,556
    Likes Received:
    646
    dupeGuru seems to work well. Can move rather than just delete (the main limitation of CCleaner) and it is open source.
     
    Last edited: 10 Feb 2018
  9. Corky42

    Corky42 Where's walle?

    Joined:
    30 Oct 2012
    Posts:
    9,648
    Likes Received:
    388
    It's still good but hasn't been updated in an age. Not that it matters much: unless you need one of the planned features, it still does a good job of finding duplicate or similar images.
     
  10. faugusztin

    faugusztin I *am* the guy with two left hands

    Joined:
    11 Aug 2008
    Posts:
    6,953
    Likes Received:
    270
    If it was Linux I would have used:
    Code:
    find . -type f -print0 | xargs -0 md5sum | sort > /tmp/md5.txt
    That would give you a list of MD5 sums and the corresponding file names from the current directory, sorted so that identical sums end up next to each other. So you would end up with a list like:
    Code:
    001e6a240da9334b29a9acf703ace65e  ./lib/modules/4.4.0-64-generic/kernel/net/netfilter/nf_conntrack_sane.ko
    001fb3dec80775b9f72e59745abbb937  ./lib/modules/4.4.0-57-generic/kernel/drivers/watchdog/sbc_epx_c3.ko
    001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-45/tools/testing/selftests/powerpc/mm/Makefile
    001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-57/tools/testing/selftests/powerpc/mm/Makefile
    001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-59/tools/testing/selftests/powerpc/mm/Makefile
    001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-62/tools/testing/selftests/powerpc/mm/Makefile
    001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-63/tools/testing/selftests/powerpc/mm/Makefile
    001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-64/tools/testing/selftests/powerpc/mm/Makefile
    001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-65/tools/testing/selftests/powerpc/mm/Makefile
    001fefe14cb85cf06736da345c5efa7c  ./usr/src/linux-headers-4.4.0-66/tools/testing/selftests/powerpc/mm/Makefile
    
    You can then see that those Makefiles all have the same MD5 sum, so (barring a hash collision, which is very unlikely for ordinary files) they are identical. It works for binary files too.

    No idea about Windows though.
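On Windows, the same hash-and-group idea could be done with a short Python script. Python is an assumption here, nobody in the thread uses it. This sketch swaps MD5 for SHA-256 and only hashes files that share a size with at least one other file, which saves a lot of reading on large photo collections. `file_digest` and `find_duplicates` are made-up names.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map digest -> sorted list of paths, for digests shared by 2+ files."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Calling `find_duplicates(r"D:\Photos")` (a placeholder path) would return one entry per set of identical files, regardless of what the files are called.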
     
  11. RedFlames

    RedFlames ...is not a Belgian football team

    Joined:
    23 Apr 2009
    Posts:
    15,401
    Likes Received:
    2,996
  12. yodasarmpit

    yodasarmpit Modder

    Joined:
    27 May 2002
    Posts:
    11,428
    Likes Received:
    237
    DupDetector, just used it today to work through several folders with tens of thousands of image files.

    It has options to detect by content regardless of name or size, you can set to manually delete/move or auto based on a number of criteria.

    It also allows you to visually check.

    Oh and it’s free.

    https://www.keronsoft.com/dupdetector.html
     
  13. DD_nVidia

    DD_nVidia Minimodder

    Joined:
    22 Feb 2011
    Posts:
    136
    Likes Received:
    0
    Update:

    Gave DupeGuru a try first. Seems to work fine, but I'm just wanting to check one of the parts of the functionality...

    In the screenshot attached, it doesn't let me do anything with the ones in blue. I'm assuming this is because it's treating these as some sort of "Master" file/folder. (In this case, I've told it to search for identical folders.)

    I suppose what I'd ideally want to do is:

    A) Move a copy of the "master" folders over to another drive (perhaps into just 1 big folder?) then once I can confirm they are there safely then...
    B) Delete all previous versions on the original drive

    That way I'm happy they are off the nearly 10 year old drives and onto my new one where I can then back it up.

    Worst case I'll at least be able to delete the duplicates which is a good start haha!


    Any ideas or tips on better ways to deal with this mess are more than welcome! (This is only one of the 8 drives scanned ... I know, I'm ashamed of myself :wallbash:)
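The "copy, confirm it arrived safely, then delete" order from steps A and B above can be enforced per file. A minimal Python sketch, assuming Python is available; `move_verified` is a hypothetical helper, not a dupeGuru feature. It refuses to overwrite anything on the new drive and only removes the source after a byte-for-byte comparison succeeds.

```python
import filecmp
import os
import shutil


def move_verified(src, dst_dir):
    """Copy src into dst_dir, verify byte for byte, then delete src.

    Raises if the target already exists or the copy does not match;
    in both cases the source file is left untouched.
    """
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, os.path.basename(src))
    if os.path.exists(dst):
        raise FileExistsError(dst)
    shutil.copy2(src, dst)
    # shallow=False forces a comparison of the actual file contents
    if not filecmp.cmp(src, dst, shallow=False):
        os.remove(dst)
        raise OSError(f"copy of {src} does not match the original")
    os.remove(src)
    return dst
```

For a first pass it may be safer to comment out the final `os.remove(src)` and delete the old copies only after the new drive has been backed up.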


    duplicates.jpg
     
  14. yuusou

    yuusou Multimodder

    Joined:
    5 Nov 2006
    Posts:
    2,852
    Likes Received:
    916
    Does any of this dedup software search for duplicate images at different resolutions?
     
  15. DD_nVidia

    DD_nVidia Minimodder

    Joined:
    22 Feb 2011
    Posts:
    136
    Likes Received:
    0
    Not entirely sure mate, sorry. It would appear you can alter some of the search criteria, so maybe it's worth setting up a "trap" and testing to see if it works?




    Also, solved my issue above for copying files either in / out of the file structure so that's good.

    The only other issue I've run into, and I don't see any way around it yet, is that if Folder A is the same size as Folder B but there is a corrupt file in Folder A and not in B, and it wants you to delete B, you won't know until you access Folder A later on unless you check each individually!
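Matching sizes say nothing about matching contents, so one partial workaround is to compare the two folders file by file before deleting either. The Python sketch below (an assumption, with the made-up name `differing_files`) lists files that are missing on one side or whose bytes differ. It cannot tell which copy is the corrupt one, only that the two are not the same, so the flagged files still need opening by hand.

```python
import filecmp
import os


def differing_files(folder_a, folder_b):
    """Return sorted (relative_path, reason) pairs where A and B disagree."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(folder_a):
        for name in filenames:
            path_a = os.path.join(dirpath, name)
            rel = os.path.relpath(path_a, folder_a)
            path_b = os.path.join(folder_b, rel)
            if not os.path.isfile(path_b):
                problems.append((rel, "missing in B"))
            elif not filecmp.cmp(path_a, path_b, shallow=False):
                # shallow=False compares the bytes, not just size and mtime
                problems.append((rel, "content differs"))
    for dirpath, _dirnames, filenames in os.walk(folder_b):
        for name in filenames:
            path_b = os.path.join(dirpath, name)
            rel = os.path.relpath(path_b, folder_b)
            if not os.path.isfile(os.path.join(folder_a, rel)):
                problems.append((rel, "missing in A"))
    return sorted(problems)
```

An empty result means every file exists on both sides with identical bytes, in which case either folder is as good as the other.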
     
  16. Chairboy

    Chairboy I want something good to die for...

    Joined:
    10 Jun 2004
    Posts:
    1,773
    Likes Received:
    112
  17. Cerberus90

    Cerberus90 Car Spannerer

    Joined:
    23 Apr 2009
    Posts:
    7,666
    Likes Received:
    208
    VisiPics does. I'm sure that's what I used a few years back to sort out a folder full of pics, and it picked up duplicates that were at different resolutions, allowing me to pick which one I wanted to keep.
     
  18. B1GBUD

    B1GBUD ¯\_(ツ)_/¯ Accidentally Funny

    Joined:
    29 May 2008
    Posts:
    3,557
    Likes Received:
    558
