Windows WHEA'nt getting anywhere resolving this

Discussion in 'Tech Support' started by ModSquid, 22 Jul 2022.

  1. ModSquid

    ModSquid Multimodder

    Joined:
    16 Apr 2011
    Posts:
    2,660
    Likes Received:
    849
    Right. Another day, another issue with this machine. Another grumpy frown that I need to 180 with help from bit-assist™ if possible.

    Out of seemingly nowhere, this box has started BSODing (WHEA Uncorrectable Errors, same each time) and I cannot figure out why. No significant changes to hardware or software that I can think of, no change to overclock, not being put under any particular workload or stress. I thought it was tied to temps, but they seem reasonable at the time of a crash and I've seen it continue operating when warmer, especially over the last few days. Yet every now and then it shuts down one monitor with the BSOD whilst leaving the other one displaying whatever it was doing at the time, i.e. it doesn't shut off. I did post here some time back, but at that point I didn't have as much info to hand and I've only just remembered that thread. I'll close it out if we manage to resolve this one.

    Annoyingly, whenever the crash happens, no error report is created. I have set the option accordingly:

    upload_2022-7-22_8-10-6.png

    ...but it just sits on the crash screen at 0% forever without creating any dump file. I know there's nothing being created because I've used Nirsoft's BlueScreenView to try and retrieve any logs but there are none available. Can't even see them in File Explorer.
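
    As a rough way to double-check that setting outside the dialog, the sketch below reads the dump configuration straight from the registry. It assumes the standard CrashControl key and its usual value names (CrashDumpEnabled, DumpFile, MinidumpDir, AutoReboot); the function names are made up for illustration, and it only reports what is configured, not why a dump fails to write.

```python
import sys

# Meanings of the CrashDumpEnabled registry value.
DUMP_MODES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}

# Assumed location of the crash dump settings under HKEY_LOCAL_MACHINE.
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def describe_dump_mode(value):
    """Translate a CrashDumpEnabled value into a readable label."""
    return DUMP_MODES.get(value, f"unknown ({value})")


def read_crash_control():
    """Read the dump-related registry values (Windows only)."""
    import winreg  # imported here so the file still loads on other platforms

    settings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        for name in ("CrashDumpEnabled", "DumpFile", "MinidumpDir", "AutoReboot"):
            try:
                settings[name], _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                settings[name] = None  # value is not set on this machine
    return settings


if __name__ == "__main__" and sys.platform == "win32":
    config = read_crash_control()
    print("Dump type:", describe_dump_mode(config["CrashDumpEnabled"]))
    print("Dump file:", config["DumpFile"])
    print("Minidump folder:", config["MinidumpDir"])
    print("Auto reboot:", config["AutoReboot"])
```

    If the dump type comes back as "none", or the minidump folder points somewhere unexpected, that would be worth ruling out before looking at hardware.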

    The Windows logs are also about as much use as giving a bikini to a nun, as all they show is that there was a crash and a restart, but not what caused the error:

    8th July Admin Events:

    upload_2022-7-22_8-25-50.png

    Those entries there show reboot without clean shutdown, prev shutdown being unexpected, security centre failing to validate caller and some COM server local activation permission issue. Crash happened between 12.30pm and 3pm from my side but logs seem to narrow that down to just before 2pm.

    Application logs show nothing but corroborate times:

    upload_2022-7-22_8-30-19.png

    Security log shows nothing, Setup log shows nothing unusual on the dates I'm looking at (8th, 13th and 18th):

    upload_2022-7-22_8-31-58.png

    System log shows nothing other than that the machine has shut down (with a timestamp inconsistency between the log entry time and the time the log states the shutdown happened - is a time sync issue the cause of all this?):

    upload_2022-7-22_8-35-32.png

    upload_2022-7-22_8-36-24.png

    Applications and Services logs are mostly empty:

    upload_2022-7-22_8-39-19.png

    The Admin Events log from the 18th shows what I mean by timestamp discrepancies:

    upload_2022-7-22_8-42-36.png

    The log entry of 12:35 says the system shut down at 12:06. It was obviously running at 12:07 though, and allowing for the crash, a pause to see if a minidump generated and then an immediate reboot, it a) wouldn't have been back up by 12:07 and b) wouldn't have taken until 12:35 either (I restarted immediately).
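
    To put a number on that discrepancy, here is a minimal sketch using the two times quoted above. The date is assumed to be 18 July 2022 and the function name is hypothetical; it only measures the gap and says nothing about what causes it.

```python
from datetime import datetime


def gap_minutes(reported_shutdown, entry_logged):
    """Minutes between the shutdown time a log entry reports and the time the entry was written."""
    return (entry_logged - reported_shutdown).total_seconds() / 60


# Times taken from the Admin Events entry described above (date assumed).
reported_shutdown = datetime(2022, 7, 18, 12, 6)
entry_logged = datetime(2022, 7, 18, 12, 35)

print(gap_minutes(reported_shutdown, entry_logged))  # prints 29.0
```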

    I have noticed this occurring several times when I went out and left the machine running with the screens turned off, but then it has also happened when sitting at the desk, so that's probably not a factor. Sometimes I do delay a restart when Windows has annoyingly decided to update itself without asking and wants to reboot whilst I'm in the middle of working - not sure if this has any impact.

    I also get a lot of entries in the logs saying "Disk x has been surprise removed" with x being 4 or 5 usually. I do have a bunch of SSDs and HDDs in the machine, wired up to a power button panel so I can isolate them when I need to boot into different OSs (don't ask - WIP) but without having to go through Disk Management. DM incidentally reports no disk 4 or 5 present, just the uninitialised 4TB backup HDD (Disk 1) that I think is the only one not hooked up to the isolation panel but possibly straight to the board (need to check). The others are all powered off so I wouldn't expect them to show in DM. No idea if this is contributing to the issue.

    I'm out of other ideas, including where to even look for evidence or how to get the minidump to generate to provide a clue. If anyone could point me in a direction that doesn't involve a complete reinstall, I'd be grateful.

    Thanks in advance as always!
     
  2. noizdaemon666

    noizdaemon666 I'm Od, Therefore I Pwn

    Joined:
    15 Jun 2010
    Posts:
    6,099
    Likes Received:
    805
    If it was a normal setup, I'd say test all the hard drives, then test the RAM if they all pass. However, it may be more useful to disconnect the power isolation panel for a while and see if it still BSODs.
    In case you want to go the first route, GSmartControl for HDDs and Memtest86 (boot it in UEFI mode) for RAM.
     
  3. ModSquid

    ModSquid Multimodder

    Thanks noiz - appreciated and I'll do all three in order. You thinking it might be the power panel blipping to the disk? How does that explain the monitors staying on (and displaying the last frame on the second screen) though? I do have Crystal disk installed already - is that similar to GSmart?

    I also checked this out:

    upload_2022-7-22_11-26-24.png

    But apart from the weird French Fry Win Update entry (you what now?), nothing really of note there either.
     
  4. noizdaemon666

    noizdaemon666 I'm Od, Therefore I Pwn

    GSmart lets you run full manufacturer diagnostic tests on the disks. Crystal Disk lets you benchmark or view the SMART data. At least that's what it did the last time I used it :hehe:
    It might be the power setup, yeah. Are all the SATA ports set up for hotswap?
     
  5. ModSquid

    ModSquid Multimodder

    Getting fkg sick of it now - it did its other trick again last night when I went down for dinner. Came back up and the box to all appearances was on, fans spinning, keyboard lit up etc. but the screens were both in standby. As if it had turned the screens off as part of the power plan, but it was completely unresponsive and needed a hard reboot.
    I was actually wrong before when I said the above - I remember now that when I go out and come back, I find it as I've just said here (pretend standby), rather than BSODing. Since it's had to be reset and has lost all my open work again, I'll reboot and check the BIOS for the hotswaps - I thought they were all active; @noizdaemon666 - if they're not, should I enable them all, or should I untick the ones that aren't being used (and does this matter if I have drives connected but turned off at the power panel)? I'll also run those diagnostics later today and see what they throw up - you were right re: Crystal but it reported no issues anyway.

    This morning's logs show plenty of these errors before it apparently went unresponsive last night but no idea what they are:

    upload_2022-7-23_7-25-36.png

    And also the time discrepancy again:

    upload_2022-7-23_7-27-9.png

    It reckons it shut down at 20:20 but was clearly still running at 20:41. I'll leave it running when I pop out later this morning and see what happens. S0d's Law it'll be fine, but I'll run those other tests on it and see what they dig up. I'm also going to have to keep an eye on my thumbs, as either they or the spacebar are now generating double spaces out of the blue...FFS...
     
  6. kim

    kim hardware addict

    Joined:
    10 Jan 2016
    Posts:
    1,318
    Likes Received:
    635
    I'll allow myself to climb on board. Reading through your issues, I have no leads to follow yet, but this is the kind of problem I'd like to understand too, so I hope you don't mind if I join this investigation ...
    I was tempted to tell you to install and run WhoCrashed, to see what it could say about it:

    https://www.resplendence.com/download/whocrashedSetup.exe
    but you said the event log wasn't even giving any further information, so it might be pointless...:idea: Sometimes a crash report mentions very low-level Windows system components, like ntoskrnl.exe or hal.dll, but when the crash originates that low down, it's almost impossible to find the source with certainty.

    WHEA errors should be recorded by WHEA-Logger with an event ID number, which helps to find the source of the problem, like this:

    [​IMG]
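
    The same lookup can be sketched from a script instead of clicking through Event Viewer. This assumes the events are written to the System log by the Microsoft-Windows-WHEA-Logger provider and uses the built-in wevtutil tool; the function names are hypothetical, and an empty result only means nothing was logged under that provider.

```python
import subprocess
import sys

# Assumed provider name for WHEA events in the System log.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_whea_query(max_events=20):
    """Build a wevtutil command that lists the newest WHEA-Logger events."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",  # qe = query events from the System log
        f"/q:{xpath}",               # filter on the provider name
        f"/c:{max_events}",          # cap the number of events returned
        "/rd:true",                  # newest events first
        "/f:text",                   # plain-text output
    ]


def run_whea_query(max_events=20):
    """Run the query and return wevtutil's text output (Windows only)."""
    result = subprocess.run(
        build_whea_query(max_events),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__" and sys.platform == "win32":
    output = run_whea_query()
    print(output if output.strip() else "No WHEA-Logger events found in the System log.")
```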

    I also thought about Antifreeze:
    https://www.majorgeeks.com/files/details/antifreeze.html

    About this point, I have experienced this kind of oddity in the past, crashing with a WHEA error on a PC displaying on 2 screens. What I read is that after a WHEA crash, the system attempts a repair that may fail and leave it frozen or partially functioning :oldconfused:

    I have to stop here but I'll come back to it ASAP.
    Have a nice day
     
    Last edited: 23 Jul 2022
  7. sandys

    sandys Multimodder

    Joined:
    26 Mar 2006
    Posts:
    4,932
    Likes Received:
    727
    Have you run everything at base settings: no OC anywhere, BIOS reset so the memory is clocked to base settings, no XMP etc.?
     
  8. ModSquid

    ModSquid Multimodder

    I haven't yet, actually - I've got a few bits I urgently need to get out today, but will run noiz's tests and then follow your suggestion if I still keep getting the issue. I only say that because it was running fine at existing settings for a while and only recently seems to have started playing up (although the box I built for the kid has the odd WHEA error too and someone (possibly you?) suggested I back off the o/c and they've definitely reduced in number, but reserving judgement on whether they've gone completely. But that's a story for another thread...), so I'll try and eliminate options as I go. Cheers though, sandys - appreciate the additional options. This issue is doubly annoying as it's difficult to reproduce, so I can't check to see if a change has fixed anything.

    The more the merrier! Welcome to my regular corner on random (often self-inflicted and daft) errors...!
    I've followed your screenshot but I don't even show errors in the log, annoyingly:

    upload_2022-7-23_14-46-52.png

    I do however have some form of note in the Operational section, but not sure if these notes reference the errors that for some reason aren't logging in the correct section:

    upload_2022-7-23_14-48-42.png

    This entry seems to persist for every day there are logged occurrences, but not every day has a log. Always the same text as well.

    Ah, that might explain that then.

    Bear with me, chaps, while I try and clear this desk of jobs and then I'll run those tests and report back :thumb:
     