
I don't get this...

Discussion in 'bit-tech Folding Team' started by Votick, 31 May 2009.

  1. Votick

    Votick My CPU's hot but my core runs cold.

    Joined:
    21 May 2009
    Posts:
    2,321
    Likes Received:
    109

    [19:53:45] Folding@home Core Shutdown: UNSTABLE_MACHINE

    ??????
     
  2. Unicorn

    Unicorn Uniform November India

    Joined:
    25 Jul 2006
    Posts:
    12,726
    Likes Received:
    456
    Your first EUE... gratz :( (Early Unit End, btw)

    Unfortunately this happens sometimes. I don't know the specifics of the "exception thrown" error, but EUEs occur if you have an unstable overclock on your hardware, if your hardware is overheating, or if there is an error in your client setup.
     
  3. Paddy4929

    Paddy4929 NangO-Gamer

    Joined:
    24 Apr 2009
    Posts:
    136
    Likes Received:
    3
    Welcome to my world. :D
     
  4. Christopher N. Lew

    Christopher N. Lew Folding in memory of my father

    Joined:
    23 Apr 2009
    Posts:
    1,358
    Likes Received:
    46
    > Run: exception thrown during GuardedRun

    This means there has been an error during the work unit. This does happen; some units are (unknowingly) doomed from the start, or it may be a hardware problem.

    > Folding@home Core Shutdown: UNSTABLE_MACHINE

    This is, rather broadly, the type of error.

    > + Results successfully sent
    > [19:53:50] Thank you for your contribution to Folding@Home.

    This means the work done has been successfully returned to Stanford. You will get the partial points for the work completed.

    Keep an eye on these errors. If you get several, you may have a hardware fault: poor cooling, over-ambitious overclocking, or just a rogue card. There is a third-party application called Fahwatch which will look at your logs and alert you to problems. Worth having for the peace of mind.
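    A rough sketch of the kind of log check described above, assuming the client log uses the same `Folding@home Core Shutdown: <STATUS>` lines quoted in this thread. This is a hypothetical helper written for illustration; it is not how Fahwatch works, and the log file's name and location depend on your client version.

```python
import re
from collections import Counter

# Matches lines such as "[19:53:45] Folding@home Core Shutdown: UNSTABLE_MACHINE".
# The log format is assumed from the excerpts quoted in this thread.
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def count_shutdowns(lines):
    """Tally each core shutdown status found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SHUTDOWN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[19:53:45] Folding@home Core Shutdown: UNSTABLE_MACHINE",
        "[19:53:50] Thank you for your contribution to Folding@Home.",
        "[15:27:43] Folding@home Core Shutdown: UNSTABLE_MACHINE",
    ]
    print(count_shutdowns(sample))  # Counter({'UNSTABLE_MACHINE': 2})
```

    Feed it an open log file, e.g. `count_shutdowns(open(path))`, and watch whether the UNSTABLE_MACHINE count keeps climbing.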
     
    Last edited: 1 Jun 2009
  5. Madness_3d

    Madness_3d Bit-Tech/Asus OC Winner

    Joined:
    26 Apr 2009
    Posts:
    1,040
    Likes Received:
    36
    AHA!!! I've been looking for someone else with this error for a while.


    Code:
    [14:04:29] Completed 20%
    [14:09:25] Completed 21%
    [14:14:21] Completed 22%
    [14:19:16] Completed 23%
    [14:24:08] Completed 24%
    [14:29:13] Completed 25%
    [14:34:09] Completed 26%
    [14:35:53] + Working...
    [14:39:08] Completed 27%
    [14:44:06] Completed 28%
    [14:49:05] Completed 29%
    [14:54:06] Completed 30%
    [14:59:01] Completed 31%
    [15:03:59] Completed 32%
    [15:08:56] Completed 33%
    [15:13:51] Completed 34%
    [15:18:51] Completed 35%
    [15:23:47] Completed 36%
    [15:27:39] SEH code: 3221225477
    [15:27:39] Run: exception thrown during GuardedRun
    [15:27:39] Run: exception thrown in GuardedRun -- Gromacs cannot continue further.
    [15:27:39] Going to send back what have done -- stepsTotalG=8000000
    [15:27:39] Work fraction=0.3675 steps=8000000.
    [15:27:43] logfile size=173128 infoLength=173128 edr=0 trr=23
    [15:27:43] - Writing 173664 bytes of core data to disk...
    [15:27:43] Done: 173152 -> 5491 (compressed to 3.1 percent)
    [15:27:43]   ... Done.
    [15:27:43] 
    [15:27:43] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [15:27:47] CoreStatus = 7A (122)
    [15:27:47] Sending work to server
    [15:27:47] Project: 5911 (Run 5, Clone 89, Gen 1)
    I'm getting the same GuardedRun failure. Every other EUE I've seen hasn't contained this...
    I can only think it's happening to me because of the 185 drivers...
    Thing is, I'm getting through some WUs OK but others are EUEing out,
    so it takes a while to test and be sure.
    @ Votick what hardware are you using & what drivers?
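    For anyone decoding the failure lines above: the SEH code is a Windows status value printed in decimal, and 3221225477 is 0xC0000005 in hex, which Windows defines as STATUS_ACCESS_VIOLATION (an invalid memory access). That on its own doesn't say whether the driver, the hardware, or the WU is at fault. The `Work fraction` line also shows roughly how far the unit got before it died (it matches the last "Completed 36%" line). A minimal sketch, assuming the log line formats shown in the excerpt; the helper names are hypothetical.

```python
import re


def seh_to_hex(code):
    """Format a decimal SEH/NTSTATUS code as the usual 8-digit hex value."""
    return f"0x{code:08X}"


def steps_completed(line):
    """Estimate steps done from a 'Work fraction=... steps=...' log line."""
    match = re.search(r"Work fraction=([\d.]+) steps=(\d+)", line)
    if match is None:
        raise ValueError("not a work-fraction line")
    fraction, total = float(match.group(1)), int(match.group(2))
    return round(fraction * total)


print(seh_to_hex(3221225477))  # 0xC0000005
print(steps_completed("[15:27:39] Work fraction=0.3675 steps=8000000."))  # 2940000
```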
     
  6. Votick

    Votick My CPU's hot but my core runs cold.

    Joined:
    21 May 2009
    Posts:
    2,321
    Likes Received:
    109
    I'm using an 8600GT with the latest drivers from the nVidia website.

    You say it could be cooling? It does hit 80C+ sometimes, so that could be the problem.
    It throws out so much heat that the case can't really cope, so I've left the lid off for the moment.
    With the lid on, after a few hours it hits about 75C. :( It even goes past 80C; 89C is the highest I have seen, using SpeedFan to measure.
     
  7. Madness_3d

    Madness_3d Bit-Tech/Asus OC Winner

    Joined:
    26 Apr 2009
    Posts:
    1,040
    Likes Received:
    36
    I've investigated mine by removing my CPU and gfx overclock; however, I have only seen this problem since I moved to the 185.85 drivers. I have even tried adjusting my RivaTuner auto fan modulation to bring the card to 72C at load, and even then it still throws these errors. I'm gonna try and backdate, but it takes a while to test because it doesn't fail on every WU.
     
  8. JackOfAll

    JackOfAll What's a Dremel?

    Joined:
    23 Apr 2009
    Posts:
    671
    Likes Received:
    6
    I'm coming at this from a Linux viewpoint, having updated the Linux Wine CUDA wrapper with the additional error code enums and new dummy stub methods for CUDA v2.2, which the 185 drivers ship with. Although 2.1 -> 2.2 is supposed to be ABI compatible, and indeed looks to be, I was getting some very strange behaviour with the CUDA SDK examples compiled against v2.1 but using v2.2 at runtime. So much so that, knowing the Stanford code is compiled against an earlier version, I'd advise against using any 185 version driver, regardless of the platform, until Stanford has validated it. I'm not sure that has happened yet. No doubt someone will correct me if I'm wrong, so I'll add the obligatory YMMV.
     
  9. JackOfAll

    JackOfAll What's a Dremel?

    Joined:
    23 Apr 2009
    Posts:
    671
    Likes Received:
    6
    Just noticed your sig. That's quite an aggressive overclock you have there on that 260: 747/1533/1250. Maybe not the problem if you've backed it off. But from my experience with the GTX 260, I'd be very surprised if that doesn't spit out an EUE on 1 out of every 4 WUs when running with the core > 700MHz and shader clock > 1500MHz. Have you bumped the voltage?
     
  10. Unicorn

    Unicorn Uniform November India

    Joined:
    25 Jul 2006
    Posts:
    12,726
    Likes Received:
    456
    You're right, JackOfAll, that is an ambitious overclock to be running F@H on with a 260. Is that air cooled as well? Slack that off a bit, run a few more WUs and see what happens, would be my advice.
     
  11. Christopher N. Lew

    Christopher N. Lew Folding in memory of my father

    Joined:
    23 Apr 2009
    Posts:
    1,358
    Likes Received:
    46
    Only 'could be' cooling. Every card is different but I would not expect trouble at 75C, or even 85C. I'm told that most nVidia cards are setup to start throttling back when they reach 105C, so 30 degrees below that could be considered 'safe'. Just keep watch, see if you are getting frequent UNSTABLE_MACHINEs (above one failure in ten units), if not then your hardware is unlikely to be the cause.
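    The rule of thumb above can be written down as a quick check. A sketch only: the one-in-ten threshold is the suggestion from this post, and the 105C throttle point is the figure quoted here rather than a value taken from an nVidia spec, so treat both as assumptions. The function names are hypothetical.

```python
# Assumed values: the threshold is the "above one failure in ten units" rule of
# thumb from the post; 105C is the throttle point quoted there, not a spec value.
FAILURE_THRESHOLD = 0.10
THROTTLE_C = 105


def suspect_hardware(failed_units, total_units):
    """Return True when UNSTABLE_MACHINE failures exceed one in ten units."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return failed_units / total_units > FAILURE_THRESHOLD


def headroom_c(gpu_temp_c):
    """Degrees Celsius left before the quoted throttle point."""
    return THROTTLE_C - gpu_temp_c


print(suspect_hardware(1, 10))  # False: exactly one in ten is not "above"
print(suspect_hardware(3, 10))  # True
print(headroom_c(75))           # 30
```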
     
  12. Madness_3d

    Madness_3d Bit-Tech/Asus OC Winner

    Joined:
    26 Apr 2009
    Posts:
    1,040
    Likes Received:
    36
    Apologies; to clarify, I'm not folding at those speeds. Those are just the maximum frequencies I've benchmarked my machine at. I have been folding at the stock speeds for my card (640/1363/1150) and have still had these errors. I am, however, considering that it may be a hardware failure, as if I overclock my GTX 260 at all my machine locks to a coloured screen under heavy load (FurMark), and it can also lock up if there is a spike of activity in games like Crysis. It's hooked up to a 650W BeQuiet! Dark Power Pro, so I'm pretty sure it's not power that's the problem. I am thinking of reverting to the 182.50 drivers and seeing if that cures the problem. I'm also gonna have a look-see whether it's a specific WU that my card's tripping up on. I'm also wondering if it's my CPU playing silly buggers with my folding client, as it isn't *technically* supported by my motherboard :worried: It works fine as long as I don't disable Cool'n'Quiet.

    edit: just got an nv4_disp bluescreen while folding :duh: so I'm definitely thinking about reverting drivers, although my WU has apparently survived.
     
    Last edited: 1 Jun 2009
  13. JackOfAll

    JackOfAll What's a Dremel?

    Joined:
    23 Apr 2009
    Posts:
    671
    Likes Received:
    6
    OK, understood. I just saw the sig, those numbers, and assumed you were folding with the card overclocked at those speeds.

    I'd be inclined to ditch the 185 drivers for the earlier version, and perhaps even drop your XXX 640/1363/1150 down to 'standard 260' 576/1242/999 speeds, to see if you get some stability for a WU or two before moving your clocks back to XXX levels.
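    To put numbers on how far each clock set sits above the 'standard 260' clocks, here is a small calculation using only the core/shader/memory figures (MHz) quoted in this thread; the helper name is just for illustration.

```python
# Clock sets quoted in the thread, as (core, shader, memory) in MHz.
REFERENCE = (576, 1242, 999)    # 'standard 260' clocks
FACTORY_OC = (640, 1363, 1150)  # the card's stock XXX clocks
BENCH_MAX = (747, 1533, 1250)   # maximum benchmarked clocks from the sig


def percent_over(clocks, base=REFERENCE):
    """Percentage each clock sits above the matching reference clock, 1 d.p."""
    return tuple(round(100 * (c - b) / b, 1) for c, b in zip(clocks, base))


print(percent_over(FACTORY_OC))  # (11.1, 9.7, 15.1)
print(percent_over(BENCH_MAX))   # (29.7, 23.4, 25.1)
```

    By this measure the benchmark clocks run roughly 30% over reference on the core, which fits the "core > 700MHz" warning earlier in the thread.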
     
  14. Votick

    Votick My CPU's hot but my core runs cold.

    Joined:
    21 May 2009
    Posts:
    2,321
    Likes Received:
    109
    I just got another
    [19:53:45] Folding@home Core Shutdown: UNSTABLE_MACHINE
    On 8600GT #1

    Hmmm :( it's not heat.
     
  15. saspro

    saspro IT monkey

    Joined:
    23 Apr 2009
    Posts:
    9,613
    Likes Received:
    404
    Run the GPU memory test too, for 2000 goes.
    Try running 3DMark06 and the Crysis benchmark.
     
  16. Votick

    Votick My CPU's hot but my core runs cold.

    Joined:
    21 May 2009
    Posts:
    2,321
    Likes Received:
    109
    The 8500GT has errored too :(

    I will try the mem tests :) thanks
     
  17. Votick

    Votick My CPU's hot but my core runs cold.

    Joined:
    21 May 2009
    Posts:
    2,321
    Likes Received:
    109
    Memtest has come back fine.


    Hmm, any other ideas?

    Could it be a dodgy WU?
    The 700+ point ones seem to be the ones that screw up :(
     
  18. saspro

    saspro IT monkey

    Joined:
    23 Apr 2009
    Posts:
    9,613
    Likes Received:
    404
    Run the memtest for 2000 goes. This should take a while to run.

    What's the rest of the kit?

    Have you tried reinstalling the client?
     
  19. Votick

    Votick My CPU's hot but my core runs cold.

    Joined:
    21 May 2009
    Posts:
    2,321
    Likes Received:
    109
    I ran the mem test and left it going; when I came back it had finished and closed down.
    0 errors

    The rest of the kit is pretty basic:
    AMD Athlon 3500+ CPU
    1GB RAM

    I haven't tried reinstalling yet.
    It's doing another 700+ point WU now, so I'm going to see if it dies again.
     
  20. Votick

    Votick My CPU's hot but my core runs cold.

    Joined:
    21 May 2009
    Posts:
    2,321
    Likes Received:
    109
    Nope. OK, client reinstall coming up.
    Another shutdown happened :(
     
