
Storage ReFS 6 Disk RAID 0 Recovery

Discussion in 'Tech Support' started by LordLuciendar, 8 Feb 2016.

  1. LordLuciendar

    LordLuciendar meh.

    Joined:
    16 Sep 2007
    Posts:
    334
    Likes Received:
    5
    So here's the basic situation: I have a 6-disk RAID 0, power was interrupted, and a disk dropped out of the array, failing it. The disk's RAID metadata simply got corrupted, so the controller no longer considered it a member of the array.

    I attempted to re-add the drive, but although Intel RST could see the array on the ICH10R, it did not give me the option to create drives or perform any management tasks. I tried updating to 14.8.0.1042, then rolling back to the last stable release for the ICH10R, which is 10.1.0.1008. When that failed, I suspected that the BIOS of the Intel SRCSAS18E RAID card in this system was overriding the ICH10R RAID BIOS. I suspected this because during reboots the Intel RST RAID interface (Ctrl+I) did not appear, but the SRCSAS18E BIOS (Ctrl+G) was shown.

    I removed the server from the rack, pulled the SRCSAS18E, and voila, the ICH10R RAID BIOS appeared and I was suddenly able to manage the arrays from RST within the operating system. The first thing I did was delete the array (very bad move) and attempt to recreate it using default values.

    As the disk appeared uninitialized, I tried repairing the partition table using Active@ Partition Recovery to write a default GPT. I've since realized this was a bad move, because it overwrote the first few stripes of data. After several attempts at recovery, I realized I might not have used the default 128K stripe size; I might have used 4K or 8K. I might also have used ReFS rather than NTFS to manage the 11TB volume, which Active@'s software does not list as a data recovery option.

    So my question is... where do you think I should go from here?

    I don't know the original stripe size, so I have to run each data recovery option against each possibility (4K, 8K, 128K... and if those fail, 16K, 32K, and 64K), which means deleting and recreating the array each time (or de-striping raw disk images offline, as in the sketch after this list).
    I don't know if the volume is NTFS or ReFS (I would lean towards ReFS) and my current data recovery software isn't aware of ReFS.
    I rebuilt the partition table (GPT) at each stripe size (4k, 8k, 128k), so any remnant of the original partition table is likely overwritten.
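
    For what it's worth, if I image each member disk first, I can test stripe sizes offline instead of rebuilding the array on the controller each pass. A minimal de-striping sketch, assuming raw images named disk0.img through disk5.img in the controller's member order (the names, the order, and a zero data-start offset are all assumptions):

    ```python
    # Reassemble a 6-disk RAID 0 from per-disk images at a candidate stripe
    # size by interleaving one stripe from each member, round-robin.

    DISK_IMAGES = ["disk0.img", "disk1.img", "disk2.img",
                   "disk3.img", "disk4.img", "disk5.img"]  # placeholder names
    STRIPE_SIZE = 128 * 1024  # candidate stripe size in bytes (try 4K, 8K, ...)

    def destripe(images, stripe_size, out_path):
        sources = [open(p, "rb") for p in images]
        try:
            with open(out_path, "wb") as out:
                while True:
                    for src in sources:  # one stripe from each disk per pass
                        stripe = src.read(stripe_size)
                        if not stripe:
                            return  # ran off the end of a member: done
                        out.write(stripe)
        finally:
            for src in sources:
                src.close()

    if __name__ == "__main__":
        destripe(DISK_IMAGES, STRIPE_SIZE, "reassembled_128k.img")
    ```

    With the images in hand, each candidate stripe size (and disk order) is just another pass over read-only files, and the array itself never gets touched again.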

    Options include:
    Run a different recovery package, like EaseUS, that is ReFS-aware?
    Keep running Active@'s SuperScan on each partition configuration?
    Pull the drives and send them to a data recovery service like Kroll Ontrack (the data recovery company that recovered data from the Space Shuttle Columbia in 2003)?

    Note: these drives held about 5TB of data, and almost all of it is replaceable; most of it was movies, music, app installers, etc. There is only about 100GB of pictures, documents, etc. that we need to recover.
     
  2. noizdaemon666

    noizdaemon666 I'm Od, Therefore I Pwn

    Joined:
    15 Jun 2010
    Posts:
    6,096
    Likes Received:
    804
    If the 100GB of data is completely irreplaceable and you 100% need it, I'd get a data recovery company involved. I wouldn't use Kroll as they're expensive for what they do. Yes they're good, but not better than others.
     
  3. LordLuciendar

    LordLuciendar meh.

    Joined:
    16 Sep 2007
    Posts:
    334
    Likes Received:
    5
    Yes, Kroll came back with a $250 fee for a data recovery assessment and $3k-8k for the recovery itself. I only intend to try them as a last-ditch effort.

    I should qualify my scenario. I perform data recovery for clients regularly. I have access to a full suite of forensic tools, but most of my tools are older versions not yet updated for ReFS compatibility.

    Does anyone have any suggestions for a ReFS file recovery tool?

    Such as:
    • UFS Explorer
    • R-Studio Data Recovery
    • Remo Recover
    • ReclaiMe File Recovery
    • LSoft Active Uneraser
    • DiskInternals Uneraser
    • EaseUS Data Recovery Wizard
     
  4. nimbu

    nimbu Multimodder

    Joined:
    28 Nov 2002
    Posts:
    2,596
    Likes Received:
    283
    Can't comment on ReFS recovery, but on a side note: I have used Kroll a few times in the past. The assessment is actually very comprehensive, giving you a list of files that can be recovered and, IIRC, the % chance of corruption.
     
  5. Xlog

    Xlog Minimodder

    Joined:
    16 Dec 2006
    Posts:
    714
    Likes Received:
    80
    First of all, what were you smoking when setting up a 6-disk RAID 0 array (maybe not you? I still want to know what that person was smoking), and why was there irreplaceable data on it in the first place?
    Second of all, any attempts at array recovery should be done in read-only mode.
    I'd say go with Kroll before you do any more damage.

    If you insist on doing it yourself, make copies of the disks first, and if you need to do any writes, do them on those copies.
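
    The idea is just a sector-for-sector image of each member before anything else touches it. A rough sketch of the concept (the device path is a placeholder, and for a flaky disk a dedicated imaging tool that retries and logs bad sectors is the better choice):

    ```python
    # Take a raw, read-only image of one member disk before any recovery
    # attempt. "/dev/sdb" is a placeholder; on Windows a raw device path
    # such as r"\\.\PhysicalDrive1" would be opened instead.

    CHUNK = 4 * 1024 * 1024  # 4MB reads keep memory use flat

    def image_disk(device, out_path):
        copied = 0
        with open(device, "rb") as src, open(out_path, "wb") as dst:
            while True:
                block = src.read(CHUNK)
                if not block:
                    break
                dst.write(block)
                copied += len(block)
        return copied

    if __name__ == "__main__":
        total = image_disk("/dev/sdb", "disk0.img")  # placeholder paths
        print(f"imaged {total} bytes")
    ```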
     
  6. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,132
    Likes Received:
    6,725
    I'm with Xlog. The '0' in 'RAID0' is the percentage likelihood that you'll get any data back in the event of the loss of one or more disks. You need to send the disks to a professional outfit, and even then you're almost certainly not going to get the data back.

    Reasons to use RAID0: Temporary storage of data where speed is paramount.
    Reasons not to use RAID0: You're playing Russian Roulette with your data, only you're using a semiautomatic pistol instead of a revolver.
     
  7. LordLuciendar

    LordLuciendar meh.

    Joined:
    16 Sep 2007
    Posts:
    334
    Likes Received:
    5
    Let me see if I can find a way to articulate this without seeming defensive:

    I am quite familiar with the volatility of data integrity on a RAID 0. A single disk failure in a RAID 0 renders data essentially unrecoverable. Data recovery of the array after a single disk failure can only retrieve files smaller than the stripe size (and then only those that happen to sit entirely within stripes on the surviving disks), so less than 128K with the Intel RST default stripe size.

    With that said, almost every RAID array I have in my house is a RAID 0. Why do I run the risk of data loss, you might ask? Because I did a six-month research project on storage performance efficiency for a client and determined that for most data sets it is far more efficient to configure many arrays/servers with RAID 0 and then ensure that data is always in at least two places at once (via DFS, for example, or a very aggressive backup schedule).
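
    To give a feel for what "always in at least two places" means in practice, here's a rough audit sketch; the paths are placeholders, and DFS replication or the backup jobs do the real work, this just verifies the invariant:

    ```python
    # Walk a primary tree and confirm every file has a byte-identical copy
    # in the replica, comparing SHA-256 hashes. Paths are hypothetical.
    import hashlib
    import os

    def file_hash(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def audit(primary, replica):
        at_risk = []
        for root, _dirs, files in os.walk(primary):
            for name in files:
                src = os.path.join(root, name)
                rel = os.path.relpath(src, primary)
                dst = os.path.join(replica, rel)
                if not os.path.exists(dst) or file_hash(src) != file_hash(dst):
                    at_risk.append(rel)
        return at_risk

    if __name__ == "__main__":
        missing = audit(r"D:\data", r"\\backup\data")  # placeholder paths
        print(f"{len(missing)} file(s) not safely in two places")
    ```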

    Of the 6TB of data on this server, approximately 5.9TB was unimportant, backed up, or duplicated somewhere else. I need to recover data because I had just rebuilt a workstation, eliminating the redundancy of that data for a short period of time, and in that window this power flicker occurred.

    I should also note that my RAID 0 array did not suffer a disk failure. I have had Intel RAID 0 arrays fail before and restored function instantly each time simply by re-adding the disk. The platters in this disk spin and the data is still there; all I need to do is align the stripes so the data reads as though it were one big disk, then use a data recovery program to read it, since the partition table is gone.

    As an update on that part of the scenario, I dropped some money on R-Studio. I was able to detect a significant number of fragments by scanning the array at 128K, but the data was unretrievable. I believe this is because the stripe size was off. I am re-running at 8K, and I will re-run at every stripe size until I get results.
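
    If the volume turns out to be NTFS, one way to shortcut the stripe-size hunt would be a consistency heuristic over de-striped images like the ones from my first post: in a correct reassembly, MFT records ("FILE" signature every 1024 bytes) tend to carry consecutive record numbers. A sketch, with placeholder file names (and in practice you'd scan a sample region rather than the whole 11TB):

    ```python
    # Score a reassembled image by how often an NTFS MFT record is followed
    # by the record with the next record number. The record number lives at
    # offset 0x2C of the FILE record header in NTFS 3.1+.
    import struct

    RECORD = 1024  # standard NTFS MFT record size

    def score_image(path, limit=1 << 30):
        hits, prev, scanned = 0, None, 0
        with open(path, "rb") as f:
            while scanned < limit:
                rec = f.read(RECORD)
                if len(rec) < RECORD:
                    break
                scanned += RECORD
                if rec[:4] == b"FILE":
                    num = struct.unpack_from("<I", rec, 0x2C)[0]
                    if prev is not None and num == prev + 1:
                        hits += 1
                    prev = num
                else:
                    prev = None
        return hits

    if __name__ == "__main__":
        for size_kb in (4, 8, 16, 32, 64, 128):
            img = f"reassembled_{size_kb}k.img"  # placeholder names
            print(f"{size_kb}K stripes: {score_image(img)} consecutive records")
    ```

    A wrong stripe size (or wrong disk order) breaks the run of consecutive record numbers at every stripe boundary, so the candidate with the highest score is the one worth a full scan. A ReFS volume would need a different signature.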
     
  8. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,132
    Likes Received:
    6,725
    Do you, or does the client, have that study available publicly? I'm willing to stick my journo hat on and ask profeshunally: Linux User & Developer would *love* to run a piece about that, especially as it basically flies in the face of generally-accepted best practice.

    Let me know if you and/or your client fancy being in a magazine. Fascinating stuff!

    If not, could you at least share the benefit of RAID0 over JBOD (or a more robust RAID, like RAID6) revealed by the study? What applications are you using that require the extra speed over a network connection?
     
  9. LordLuciendar

    LordLuciendar meh.

    Joined:
    16 Sep 2007
    Posts:
    334
    Likes Received:
    5
    I'd have to rewrite the report, as quite a bit of it is confidential, but having an article published seems like a good reason to do it. The only thing is, the environment is 100% Windows.

    I can provide some information, though. The workload is mostly indexing and file conversion on large data sets (i.e. billions of files and several TB). There were some serious bottlenecks with their data processing, so I ran a study benchmarking JBOD, RAID 0, RAID 5, and RAID 10 arrays. It also tested different controller types (enterprise SAS card, embedded chipset, Windows Dynamic Disks, Storage Spaces), different data access types (native, mounted VHDX, VHDX from within a VM), and different benchmarks (CrystalDiskMark vs a timed file transfer of 1M files, ~10GB).

    The verdict was that the controller doesn't matter: Storage Spaces and dynamic disks perform just as well as an enterprise controller, even under significant load. VHDs have a less than 1% performance hit, mounted or in a VM, and the performance of a RAID 0 outpaces everything by a huge margin. We still use RAID 10 in servers where uptime is a concern, RAID 1 or 10 for OS drives, RAID 10 for arrays that serve publicly accessible information, etc. For data that is manipulated and processed each day, though, it's RAID 0 arrays across the board; for every bit of that data there is a process that ensures it is always in multiple locations at once or, for a few temp files, that backups occur every 8hrs or so.
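
    For anyone curious, the timed-transfer benchmark boiled down to something like the sketch below. The counts and sizes here are scaled-down placeholders (the real run used ~1M files totalling ~10GB), and the directories are hypothetical:

    ```python
    # Create a corpus of many small files, then time a straight copy of the
    # tree, which is what dominates small-file indexing/conversion workloads.
    import os
    import shutil
    import time

    SRC = "bench_src"   # placeholder working directories
    DST = "bench_dst"
    COUNT = 10_000      # scaled down from ~1M for a quick run
    SIZE = 10 * 1024    # 10KB per file

    def make_corpus():
        os.makedirs(SRC, exist_ok=True)
        payload = os.urandom(SIZE)
        for i in range(COUNT):
            with open(os.path.join(SRC, f"f{i:07d}.bin"), "wb") as f:
                f.write(payload)

    def timed_copy():
        start = time.perf_counter()
        shutil.copytree(SRC, DST)  # DST must not exist yet
        return time.perf_counter() - start

    if __name__ == "__main__":
        make_corpus()
        elapsed = timed_copy()
        rate = COUNT * SIZE / elapsed / 1e6
        print(f"copied {COUNT} files in {elapsed:.1f}s ({rate:.1f} MB/s)")
    ```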

    This was before we started working with ReFS (mixed opinion on that one) and data deduplication (which is amazing as long as you're set up for it). I have a server with 7-8TB of data stored on it that takes up less than 500GB on the host drive.
     
  10. LordLuciendar

    LordLuciendar meh.

    Joined:
    16 Sep 2007
    Posts:
    334
    Likes Received:
    5
    I should also mention that we tested 4- and 6-disk arrays, along with a handful of other configurations from 1 to 6 disks. We also tested different cache modes (write-back vs write-through) and a handful of other metrics. The tests were done on SAS and SATA 3Gbps controllers, with WD RE4 2TB and Seagate Cheetah 15k 300GB disks. The RAID card was an Intel SRCSASJV; the onboard controller was an ICH9R backed by a Core 2 Quad-equivalent Xeon and 4GB of RAM. We haven't seen the need to re-run the test with newer or higher-performance hardware (like SSDs).

    The big takeaway from the study was never, ever, ever... use RAID 5/6. If the data is accessed infrequently enough to justify choosing the extra space of RAID 5 over RAID 10, put it on tape or external drives and lock it away someplace safe.
     
  11. Gareth Halfacree

    Gareth Halfacree WIIGII! Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    4 Dec 2007
    Posts:
    17,132
    Likes Received:
    6,725
    Seriously interesting stuff, dude! Windows-only is a bit of a flaw for LU&D, of course, but the meat of the study should be applicable cross-platform - it's only the implementation specifics (like using ReFS and Windows tools for the data duplication) that are Windows-specific. Worst case, I could always try pitching it to PC Pro or similar.

    Don't kill yourself anonymising the study in any sort of rush - especially until you've recovered the data for your client! - but if you and your client want a bit of publicity, let me know. You can email me about it on freelance@halfacree.co.uk.
     
