Stability issues

Discussion in 'bit-tech Folding Team' started by mercinarynurse, 2 May 2009.

  1. mercinarynurse

    mercinarynurse What's a Dremel?

    Joined:
    2 May 2009
    Posts:
    13
    Likes Received:
    0
    Hi folks.

    This is my first post since the Custom PC forums shut down, so if this problem seems familiar I can only apologise.

    I am having a few problems with my F@H client. I run the GPU client. Whenever I run it, I get about 2-3 minutes before the client stops. The log says I have an unstable machine. My current machine is a new build I have just completed. The only component that has remained the same is my graphics card. I am currently using the following:

    Gigabyte EP43-S3L mobo
    Intel E5200 dual-core processor (not yet overclocked)
    2 x 2GB OCZ PC 6400 RAM
    XFX ATI Radeon 4950 (no overclock applied)
    Cooler Master 450 W PSU

    I have not had a chance to overclock any components yet; however, my machine has been fully stress tested and has not shown any performance issues whatsoever. Here is a copy of my latest F@H log. Any help would be greatly appreciated.




    Launch directory: C:\Users\Scotty\AppData\Roaming\Folding@home-gpu


    [19:14:36] - Ask before connecting: No
    [19:14:36] - User name: Mercinarynurse (Team 35947)
    [19:14:36] - User ID: 4697D9013C5EB021
    [19:14:36] - Machine ID: 2
    [19:14:36]
    [19:14:36] Loaded queue successfully.
    [19:14:36] Initialization complete
    [19:14:36] - Preparing to get new work unit...
    [19:14:36] + Attempting to get work packet
    [19:14:36] - Connecting to assignment server
    [19:14:37] - Successful: assigned to (171.64.65.103).
    [19:14:37] + News From Folding@Home: GPU folding beta
    [19:14:37] Loaded queue successfully.
    [19:14:40] + Closed connections
    [19:14:40]
    [19:14:40] + Processing work unit
    [19:14:40] Core required: FahCore_11.exe
    [19:14:40] Core found.
    [19:14:40] Working on queue slot 06 [May 2 19:14:40 UTC]
    [19:14:40] + Working ...
    [19:14:40]
    [19:14:40] *------------------------------*
    [19:14:40] Folding@Home GPU Core - Beta
    [19:14:40] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
    [19:14:40]
    [19:14:40] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
    [19:14:40] Build host: amoeba
    [19:14:40] Board Type: AMD
    [19:14:40] Core :
    [19:14:40] Preparing to commence simulation
    [19:14:40] - Looking at optimizations...
    [19:14:40] - Created dyn
    [19:14:40] - Files status OK
    [19:14:40] - Expanded 85744 -> 444252 (decompressed 518.1 percent)
    [19:14:40] Called DecompressByteArray: compressed_data_size=85744 data_size=444252, decompressed_data_size=444252 diff=0
    [19:14:40] - Digital signature verified
    [19:14:40]
    [19:14:40] Project: 4755 (Run 2, Clone 448, Gen 3)
    [19:14:40]
    [19:14:40] Assembly optimizations on if available.
    [19:14:40] Entering M.D.
    [19:14:46] Tpr hash work/wudata_06.tpr: 3902013207 2947917259 312280835 127646471 2055549506
    [19:14:46] Working on 1254 p4755_lam5w_300K_g91
    [19:14:47] Client config found, loading data.
    [19:14:47] Starting GUI Server
    [19:15:05] mdrun_gpu returned
    [19:15:05] NANs detected on GPU
    [19:15:05]
    [19:15:05] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [19:15:08] CoreStatus = 7A (122)
    [19:15:08] Sending work to server
    [19:15:08] Project: 4755 (Run 2, Clone 448, Gen 3)
    [19:15:08] - Error: Could not get length of results file work/wuresults_06.dat
    [19:15:08] - Error: Could not read unit 06 file. Removing from queue.
    [19:15:08] - Preparing to get new work unit...
    [19:15:08] + Attempting to get work packet
    [19:15:08] - Connecting to assignment server
    [19:15:09] - Successful: assigned to (171.64.65.103).
    [19:15:09] + News From Folding@Home: GPU folding beta
    [19:15:09] Loaded queue successfully.
    [19:15:12] + Closed connections
    [19:15:17]
    [19:15:17] + Processing work unit
    [19:15:17] Core required: FahCore_11.exe
    [19:15:17] Core found.
    [19:15:17] Working on queue slot 07 [May 2 19:15:17 UTC]
    [19:15:17] + Working ...
    [19:15:17]
    [19:15:17] *------------------------------*
    [19:15:17] Folding@Home GPU Core - Beta
    [19:15:17] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
    [19:15:17]
    [19:15:17] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
    [19:15:17] Build host: amoeba
    [19:15:17] Board Type: AMD
    [19:15:17] Core :
    [19:15:17] Preparing to commence simulation
    [19:15:17] - Looking at optimizations...
    [19:15:17] - Created dyn
    [19:15:17] - Files status OK
    [19:15:17] - Expanded 85744 -> 444252 (decompressed 518.1 percent)
    [19:15:17] Called DecompressByteArray: compressed_data_size=85744 data_size=444252, decompressed_data_size=444252 diff=0
    [19:15:17] - Digital signature verified
    [19:15:17]
    [19:15:17] Project: 4755 (Run 2, Clone 448, Gen 3)
    [19:15:17]
    [19:15:17] Assembly optimizations on if available.
    [19:15:17] Entering M.D.
    [19:15:24] Tpr hash work/wudata_07.tpr: 3902013207 2947917259 312280835 127646471 2055549506
    [19:15:24] Working on 1254 p4755_lam5w_300K_g91
    [19:15:24] Client config found, loading data.
    [19:15:24] Starting GUI Server
    [19:15:28] mdrun_gpu returned
    [19:15:28] NANs detected on GPU
    [19:15:28]
    [19:15:28] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [19:15:30] CoreStatus = 7A (122)
    [19:15:30] Sending work to server
    [19:15:30] Project: 4755 (Run 2, Clone 448, Gen 3)
    [19:15:30] - Error: Could not get length of results file work/wuresults_07.dat
    [19:15:30] - Error: Could not read unit 07 file. Removing from queue.
    [19:15:30] - Preparing to get new work unit...
    [19:15:30] + Attempting to get work packet
    [19:15:30] - Connecting to assignment server
    [19:15:31] - Successful: assigned to (171.64.65.103).
    [19:15:31] + News From Folding@Home: GPU folding beta
    [19:15:31] Loaded queue successfully.
    [19:15:34] + Closed connections
    [19:15:39]
    [19:15:39] + Processing work unit
    [19:15:39] Core required: FahCore_11.exe
    [19:15:39] Core found.
    [19:15:39] Working on queue slot 08 [May 2 19:15:39 UTC]
    [19:15:39] + Working ...
    [19:15:39]
    [19:15:39] *------------------------------*
    [19:15:39] Folding@Home GPU Core - Beta
    [19:15:39] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
    [19:15:39]
    [19:15:39] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
    [19:15:39] Build host: amoeba
    [19:15:39] Board Type: AMD
    [19:15:39] Core :
    [19:15:39] Preparing to commence simulation
    [19:15:39] - Looking at optimizations...
    [19:15:39] - Created dyn
    [19:15:39] - Files status OK
    [19:15:39] - Expanded 85744 -> 444252 (decompressed 518.1 percent)
    [19:15:39] Called DecompressByteArray: compressed_data_size=85744 data_size=444252, decompressed_data_size=444252 diff=0
    [19:15:39] - Digital signature verified
    [19:15:39]
    [19:15:39] Project: 4755 (Run 2, Clone 448, Gen 3)
    [19:15:39]
    [19:15:39] Assembly optimizations on if available.
    [19:15:39] Entering M.D.
    [19:15:45] Tpr hash work/wudata_08.tpr: 3902013207 2947917259 312280835 127646471 2055549506
    [19:15:45] Working on 1254 p4755_lam5w_300K_g91
    [19:15:46] Client config found, loading data.
    [19:15:46] Starting GUI Server
    [19:16:04] mdrun_gpu returned
    [19:16:04] NANs detected on GPU
    [19:16:04]
    [19:16:04] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [19:16:08] CoreStatus = 7A (122)
    [19:16:08] Sending work to server
    [19:16:08] Project: 4755 (Run 2, Clone 448, Gen 3)
    [19:16:08] - Error: Could not get length of results file work/wuresults_08.dat
    [19:16:08] - Error: Could not read unit 08 file. Removing from queue.
    [19:16:08] - Preparing to get new work unit...
    [19:16:08] + Attempting to get work packet
    [19:16:08] - Connecting to assignment server
    [19:16:09] - Successful: assigned to (171.64.65.103).
    [19:16:09] + News From Folding@Home: GPU folding beta
    [19:16:09] Loaded queue successfully.
    [19:16:11] + Closed connections
    [19:16:16]
    [19:16:16] + Processing work unit
    [19:16:16] Core required: FahCore_11.exe
    [19:16:16] Core found.
    [19:16:16] Working on queue slot 09 [May 2 19:16:16 UTC]
    [19:16:16] + Working ...
    [19:16:16]
    [19:16:16] *------------------------------*
    [19:16:16] Folding@Home GPU Core - Beta
    [19:16:16] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
    [19:16:16]
    [19:16:16] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
    [19:16:16] Build host: amoeba
    [19:16:16] Board Type: AMD
    [19:16:16] Core :
    [19:16:16] Preparing to commence simulation
    [19:16:16] - Looking at optimizations...
    [19:16:16] - Created dyn
    [19:16:16] - Files status OK
    [19:16:16] - Expanded 85744 -> 444252 (decompressed 518.1 percent)
    [19:16:16] Called DecompressByteArray: compressed_data_size=85744 data_size=444252, decompressed_data_size=444252 diff=0
    [19:16:16] - Digital signature verified
    [19:16:16]
    [19:16:16] Project: 4755 (Run 2, Clone 448, Gen 3)
    [19:16:16]
    [19:16:16] Assembly optimizations on if available.
    [19:16:16] Entering M.D.
    [19:16:23] Tpr hash work/wudata_09.tpr: 3902013207 2947917259 312280835 127646471 2055549506
    [19:16:23] Working on 1254 p4755_lam5w_300K_g91
    [19:16:23] Client config found, loading data.
    [19:16:23] Starting GUI Server
    [19:16:41] mdrun_gpu returned
    [19:16:41] NANs detected on GPU
    [19:16:41]
    [19:16:41] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [19:16:45] CoreStatus = 7A (122)
    [19:16:45] Sending work to server
    [19:16:45] Project: 4755 (Run 2, Clone 448, Gen 3)
    [19:16:45] - Error: Could not get length of results file work/wuresults_09.dat
    [19:16:45] - Error: Could not read unit 09 file. Removing from queue.
    [19:16:45] - Preparing to get new work unit...
    [19:16:45] + Attempting to get work packet
    [19:16:45] - Connecting to assignment server
    [19:16:46] - Successful: assigned to (171.64.65.103).
    [19:16:46] + News From Folding@Home: GPU folding beta
    [19:16:46] Loaded queue successfully.
    [19:16:49] + Closed connections
    [19:16:54]
    [19:16:54] + Processing work unit
    [19:16:54] Core required: FahCore_11.exe
    [19:16:54] Core found.
    [19:16:54] Working on queue slot 00 [May 2 19:16:54 UTC]
    [19:16:54] + Working ...
    [19:16:54]
    [19:16:54] *------------------------------*
    [19:16:54] Folding@Home GPU Core - Beta
    [19:16:54] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
    [19:16:54]
    [19:16:54] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
    [19:16:54] Build host: amoeba
    [19:16:54] Board Type: AMD
    [19:16:54] Core :
    [19:16:54] Preparing to commence simulation
    [19:16:54] - Looking at optimizations...
    [19:16:54] - Created dyn
    [19:16:54] - Files status OK
    [19:16:54] - Expanded 85744 -> 444252 (decompressed 518.1 percent)
    [19:16:54] Called DecompressByteArray: compressed_data_size=85744 data_size=444252, decompressed_data_size=444252 diff=0
    [19:16:54] - Digital signature verified
    [19:16:54]
    [19:16:54] Project: 4755 (Run 2, Clone 448, Gen 3)
    [19:16:54]
    [19:16:54] Assembly optimizations on if available.
    [19:16:54] Entering M.D.
    [19:17:00] Tpr hash work/wudata_00.tpr: 3902013207 2947917259 312280835 127646471 2055549506
    [19:17:00] Working on 1254 p4755_lam5w_300K_g91
    [19:17:01] Client config found, loading data.
    [19:17:01] Starting GUI Server
    [19:17:04] mdrun_gpu returned
    [19:17:04] NANs detected on GPU
    [19:17:04]
    [19:17:04] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [19:17:08] CoreStatus = 7A (122)
    [19:17:08] Sending work to server
    [19:17:08] Project: 4755 (Run 2, Clone 448, Gen 3)
    [19:17:08] - Error: Could not get length of results file work/wuresults_00.dat
    [19:17:08] - Error: Could not read unit 00 file. Removing from queue.
    [19:17:08] EUE limit exceeded. Pausing 24 hours.

    Folding@Home Client Shutdown.




    Many thanks
     
  2. Unicorn

    Unicorn Uniform November India

    Joined:
    25 Jul 2006
    Posts:
    12,726
    Likes Received:
    456
    Sorry to hear you're having problems with the client... I see you are using a new 4950 card; if you could tell us which driver version you are using, it might help. I suspect a driver/support issue with the new GPU is the problem here, but the others might be able to fill you in on that better than I can.

    Not that it usually matters, but why is the GPU client machine ID set to 2? Are you also running a CPU client? I hope you get it running stably, because you've got a rig capable of at least 8K PPD there!
     
  3. cgcox1

    cgcox1 Obessed Folder! Me? Surely not!

    Joined:
    24 Apr 2009
    Posts:
    1,187
    Likes Received:
    1
    Looking at the log, I don't think it is a hardware failure issue. The client actually only seems to run for about 3 seconds in the log, the time between 'Starting GUI Server' and the NAN error. So I would imagine it is a configuration or compatibility issue myself. I have not run an ATI card on GPU2, so sorry I can't be of more use.
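
    A rough way to check that timing is to measure the gap between each 'Starting GUI Server' line and the next 'NANs detected on GPU' line. The sketch below is only an illustration in Python, not part of the F@H client: the function name seconds_until_nan is made up for this example, and it assumes the [HH:MM:SS] timestamp format seen in the log above. Applied to the first log in this thread, the gaps work out at between 3 and 18 seconds per work unit.

```python
import re
from datetime import datetime, timedelta

# Matches a FAHlog line such as "[19:14:47] Starting GUI Server".
LINE_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\]\s*(.*)")


def seconds_until_nan(log_text):
    """Seconds from each 'Starting GUI Server' to the next NaN report."""
    durations = []
    started = None
    for raw in log_text.splitlines():
        match = LINE_RE.search(raw)
        if not match:
            continue  # skip lines without a bracketed timestamp
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2).strip()
        if message == "Starting GUI Server":
            started = stamp
        elif message == "NANs detected on GPU" and started is not None:
            elapsed = stamp - started
            if elapsed < timedelta(0):
                elapsed += timedelta(days=1)  # log rolled past midnight
            durations.append(int(elapsed.total_seconds()))
            started = None
    return durations


# Two work units copied from the log earlier in this thread.
sample = """\
[19:14:47] Starting GUI Server
[19:15:05] mdrun_gpu returned
[19:15:05] NANs detected on GPU
[19:17:01] Starting GUI Server
[19:17:04] mdrun_gpu returned
[19:17:04] NANs detected on GPU
"""

print(seconds_until_nan(sample))  # [18, 3]
```

    The log only carries times, not dates, so the sketch treats a negative gap as a run that crossed midnight.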
     
  4. mercinarynurse

    mercinarynurse What's a Dremel?

    Joined:
    2 May 2009
    Posts:
    13
    Likes Received:
    0
    Thanks for replying, guys.

    I was very briefly running the CPU client but I uninstalled it before I ran the GPU client. I also run the client on my PS3.

    As for drivers, I am using the most up-to-date drivers for all my hardware (up to date as of Thursday). This is honestly driving me mad.

    I deliberately picked low-powered but efficient parts so I could have 24/7 folding. :wallbash: I know that Nvidia cards are better for folding but I'm trying to keep some of my old AMD loyalties alive :hehe:
     
  5. Unicorn

    Unicorn Uniform November India

    Joined:
    25 Jul 2006
    Posts:
    12,726
    Likes Received:
    456
    I'm also guilty as charged of that... I may fold on Intel and nVidia hardware but my 2009 competition gaming rig is Phenom/ATI based, as it has been for the past 5 years :rolleyes:
     
  6. mercinarynurse

    mercinarynurse What's a Dremel?

    Joined:
    2 May 2009
    Posts:
    13
    Likes Received:
    0
    I know the feeling. I have kept my AMD loyalties alive through graphics cards for a long time now. Unfortunately my better half won't let me have more than one system at a time and gets grumpy when I spend the pennies on computer components (funny how her 10-to-the-Nth-power quantities of shoes and handbags are acceptable :sigh:), so I have had to compromise by going to Intel for processors and sticking with ATI/AMD for my graphics requirements.

    Back to the problem at hand. I have scoured forums trying to find a solution to this problem but no one seems to know how to fix it. Some people have found that setting XP SP2 compatibility fixed the issue, but unfortunately that hasn't worked for me. :wallbash::wallbash::wallbash::wallbash:
     
  7. mercinarynurse

    mercinarynurse What's a Dremel?

    Joined:
    2 May 2009
    Posts:
    13
    Likes Received:
    0
    OK, this is really annoying me now.

    I have checked all my drivers to ensure they are up to date, and I have set F@H to Windows XP compatibility (I'm using Vista 32-bit edition). I have deleted the work and log files so that the client would start from scratch, and I have stress tested my system again to ensure there are no hardware issues. I cannot think of anything else to do. Here is my latest log:

    [18:00:33] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
    [18:00:33]
    [18:00:33] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
    [18:00:33] Build host: amoeba
    [18:00:33] Board Type: AMD
    [18:00:33] Core :
    [18:00:33] Preparing to commence simulation
    [18:00:33] - Looking at optimizations...
    [18:00:33] - Created dyn
    [18:00:33] - Files status OK
    [18:00:33] - Expanded 68576 -> 357580 (decompressed 521.4 percent)
    [18:00:33] Called DecompressByteArray: compressed_data_size=68576 data_size=357580, decompressed_data_size=357580 diff=0
    [18:00:33] - Digital signature verified
    [18:00:33]
    [18:00:33] Project: 5746 (Run 1, Clone 43, Gen 145)
    [18:00:33]
    [18:00:33] Assembly optimizations on if available.
    [18:00:33] Entering M.D.
    [18:00:39] Tpr hash work/wudata_01.tpr: 2733042196 3334769117 4173826759 3721029696 1899696260
    [18:00:39] Working on Protein
    [18:00:40] Client config found, loading data.
    [18:00:40] Starting GUI Server
    [18:00:57] mdrun_gpu returned
    [18:00:57] NANs detected on GPU
    [18:00:57]
    [18:00:57] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [18:00:59] CoreStatus = 7A (122)
    [18:00:59] Sending work to server
    [18:00:59] Project: 5746 (Run 1, Clone 43, Gen 145)
    [18:00:59] - Error: Could not get length of results file work/wuresults_01.dat
    [18:00:59] - Error: Could not read unit 01 file. Removing from queue.
    [18:00:59] - Preparing to get new work unit...
    [18:00:59] + Attempting to get work packet
    [18:00:59] - Connecting to assignment server
    [18:01:01] - Successful: assigned to (171.64.65.102).
    [18:01:01] + News From Folding@Home: GPU folding beta
    [18:01:01] Loaded queue successfully.
    [18:01:02] - Attempt #1 to get work failed, and no other work to do.
    Waiting before retry.
    [18:01:09] + Attempting to get work packet
    [18:01:09] - Connecting to assignment server
    [18:01:10] - Successful: assigned to (171.64.65.102).
    [18:01:10] + News From Folding@Home: GPU folding beta
    [18:01:10] Loaded queue successfully.
    [18:01:13] + Closed connections
    [18:01:18]
    [18:01:18] + Processing work unit
    [18:01:18] Core required: FahCore_11.exe
    [18:01:18] Core found.
    [18:01:18] Working on queue slot 02 [May 3 18:01:18 UTC]
    [18:01:18] + Working ...
    [18:01:18]
    [18:01:18] *------------------------------*
    [18:01:18] Folding@Home GPU Core - Beta
    [18:01:18] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
    [18:01:18]
    [18:01:18] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
    [18:01:18] Build host: amoeba
    [18:01:18] Board Type: AMD
    [18:01:18] Core :
    [18:01:18] Preparing to commence simulation
    [18:01:18] - Looking at optimizations...
    [18:01:18] - Created dyn
    [18:01:18] - Files status OK
    [18:01:18] - Expanded 70179 -> 360060 (decompressed 513.0 percent)
    [18:01:18] Called DecompressByteArray: compressed_data_size=70179 data_size=360060, decompressed_data_size=360060 diff=0
    [18:01:18] - Digital signature verified
    [18:01:18]
    [18:01:18] Project: 5740 (Run 3, Clone 72, Gen 124)
    [18:01:18]
    [18:01:18] Assembly optimizations on if available.
    [18:01:18] Entering M.D.
    [18:01:24] Tpr hash work/wudata_02.tpr: 1612572180 2495514915 4286034490 3299768619 3555354150
    [18:01:24] Working on Protein
    [18:01:24] Client config found, loading data.
    [18:01:24] Starting GUI Server
    [18:01:41] mdrun_gpu returned
    [18:01:41] NANs detected on GPU
    [18:01:41]
    [18:01:41] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [18:01:44] CoreStatus = 7A (122)
    [18:01:44] Sending work to server
    [18:01:44] Project: 5740 (Run 3, Clone 72, Gen 124)
    [18:01:44] - Error: Could not get length of results file work/wuresults_02.dat
    [18:01:44] - Error: Could not read unit 02 file. Removing from queue.
    [18:01:44] - Preparing to get new work unit...
    [18:01:44] + Attempting to get work packet
    [18:01:44] - Connecting to assignment server
    [18:01:45] - Successful: assigned to (171.64.65.102).
    [18:01:45] + News From Folding@Home: GPU folding beta
    [18:01:45] Loaded queue successfully.
    [18:01:48] + Closed connections
    [18:01:53]
    [18:01:53] + Processing work unit
    [18:01:53] Core required: FahCore_11.exe
    [18:01:53] Core found.
    [18:01:53] Working on queue slot 03 [May 3 18:01:53 UTC]
    [18:01:53] + Working ...
    [18:01:53]
    [18:01:53] *------------------------------*
    [18:01:53] Folding@Home GPU Core - Beta
    [18:01:53] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
    [18:01:53]
    [18:01:53] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
    [18:01:53] Build host: amoeba
    [18:01:53] Board Type: AMD
    [18:01:53] Core :
    [18:01:53] Preparing to commence simulation
    [18:01:53] - Looking at optimizations...
    [18:01:53] - Created dyn
    [18:01:53] - Files status OK
    [18:01:53] - Expanded 70179 -> 360060 (decompressed 513.0 percent)
    [18:01:53] Called DecompressByteArray: compressed_data_size=70179 data_size=360060, decompressed_data_size=360060 diff=0
    [18:01:53] - Digital signature verified
    [18:01:53]
    [18:01:53] Project: 5740 (Run 3, Clone 72, Gen 124)
    [18:01:53]
    [18:01:53] Assembly optimizations on if available.
    [18:01:53] Entering M.D.
    [18:01:59] Tpr hash work/wudata_03.tpr: 1612572180 2495514915 4286034490 3299768619 3555354150
    [18:01:59] Working on Protein
    [18:01:59] Client config found, loading data.
    [18:01:59] Starting GUI Server
    [18:02:17] mdrun_gpu returned
    [18:02:17] NANs detected on GPU
    [18:02:17]
    [18:02:17] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [18:02:21] CoreStatus = 7A (122)
    [18:02:21] Sending work to server
    [18:02:21] Project: 5740 (Run 3, Clone 72, Gen 124)
    [18:02:21] - Error: Could not get length of results file work/wuresults_03.dat
    [18:02:21] - Error: Could not read unit 03 file. Removing from queue.
    [18:02:21] - Preparing to get new work unit...
    [18:02:21] + Attempting to get work packet
    [18:02:21] - Connecting to assignment server
    [18:02:22] - Successful: assigned to (171.64.65.102).
    [18:02:22] + News From Folding@Home: GPU folding beta
    [18:02:22] Loaded queue successfully.
    [18:02:25] + Closed connections
    [18:02:30]
    [18:02:30] + Processing work unit
    [18:02:30] Core required: FahCore_11.exe
    [18:02:30] Core found.
    [18:02:30] Working on queue slot 04 [May 3 18:02:30 UTC]
    [18:02:30] + Working ...
    [18:02:30]
    [18:02:30] *------------------------------*
    [18:02:30] Folding@Home GPU Core - Beta
    [18:02:30] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
    [18:02:30]
    [18:02:30] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
    [18:02:30] Build host: amoeba
    [18:02:30] Board Type: AMD
    [18:02:30] Core :
    [18:02:30] Preparing to commence simulation
    [18:02:30] - Looking at optimizations...
    [18:02:30] - Created dyn
    [18:02:30] - Files status OK
    [18:02:30] - Expanded 70179 -> 360060 (decompressed 513.0 percent)
    [18:02:30] Called DecompressByteArray: compressed_data_size=70179 data_size=360060, decompressed_data_size=360060 diff=0
    [18:02:30] - Digital signature verified
    [18:02:30]
    [18:02:30] Project: 5740 (Run 3, Clone 72, Gen 124)
    [18:02:30]
    [18:02:30] Assembly optimizations on if available.
    [18:02:30] Entering M.D.
    [18:02:36] Tpr hash work/wudata_04.tpr: 1612572180 2495514915 4286034490 3299768619 3555354150
    [18:02:36] Working on Protein
    [18:02:37] Client config found, loading data.
    [18:02:37] Starting GUI Server
    [18:02:54] mdrun_gpu returned
    [18:02:54] NANs detected on GPU
    [18:02:54]
    [18:02:54] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [18:02:56] CoreStatus = 7A (122)
    [18:02:56] Sending work to server
    [18:02:56] Project: 5740 (Run 3, Clone 72, Gen 124)
    [18:02:56] - Error: Could not get length of results file work/wuresults_04.dat
    [18:02:56] - Error: Could not read unit 04 file. Removing from queue.
    [18:02:56] - Preparing to get new work unit...
    [18:02:56] + Attempting to get work packet
    [18:02:56] - Connecting to assignment server
    [18:02:57] - Successful: assigned to (171.64.65.102).
    [18:02:57] + News From Folding@Home: GPU folding beta
    [18:02:57] Loaded queue successfully.
    [18:03:00] + Closed connections
    [18:03:05]
    [18:03:05] + Processing work unit
    [18:03:05] Core required: FahCore_11.exe
    [18:03:05] Core found.
    [18:03:05] Working on queue slot 05 [May 3 18:03:05 UTC]
    [18:03:05] + Working ...
    [18:03:05]
    [18:03:05] *------------------------------*
    [18:03:05] Folding@Home GPU Core - Beta
    [18:03:05] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
    [18:03:05]
    [18:03:05] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
    [18:03:05] Build host: amoeba
    [18:03:05] Board Type: AMD
    [18:03:05] Core :
    [18:03:05] Preparing to commence simulation
    [18:03:05] - Looking at optimizations...
    [18:03:05] - Created dyn
    [18:03:05] - Files status OK
    [18:03:05] - Expanded 70179 -> 360060 (decompressed 513.0 percent)
    [18:03:05] Called DecompressByteArray: compressed_data_size=70179 data_size=360060, decompressed_data_size=360060 diff=0
    [18:03:05] - Digital signature verified
    [18:03:05]
    [18:03:05] Project: 5740 (Run 3, Clone 72, Gen 124)
    [18:03:05]
    [18:03:05] Assembly optimizations on if available.
    [18:03:05] Entering M.D.
    [18:03:11] Tpr hash work/wudata_05.tpr: 1612572180 2495514915 4286034490 3299768619 3555354150
    [18:03:11] Working on Protein
    [18:03:12] Client config found, loading data.
    [18:03:12] Starting GUI Server
    [18:03:29] mdrun_gpu returned
    [18:03:29] NANs detected on GPU
    [18:03:29]
    [18:03:29] Folding@home Core Shutdown: UNSTABLE_MACHINE
    [18:03:31] CoreStatus = 7A (122)
    [18:03:31] Sending work to server
    [18:03:31] Project: 5740 (Run 3, Clone 72, Gen 124)
    [18:03:31] - Error: Could not get length of results file work/wuresults_05.dat
    [18:03:31] - Error: Could not read unit 05 file. Removing from queue.
    [18:03:31] EUE limit exceeded. Pausing 24 hours.

    Folding@Home Client Shutdown.


    Any of you gurus out there have any idea what else could be going wrong?
     
  8. cgcox1

    cgcox1 Obessed Folder! Me? Surely not!

    Joined:
    24 Apr 2009
    Posts:
    1,187
    Likes Received:
    1
    I wonder what 'NANs detected on GPU' means? That seems to be the problem.
     
  9. mercinarynurse

    mercinarynurse What's a Dremel?

    Joined:
    2 May 2009
    Posts:
    13
    Likes Received:
    0
    My thoughts exactly.

    I had a look through various forums to see if anyone knew what it meant. No one seems to know. Other people who have had the same problem have tried different solutions with varied levels of success. I have not been able to get anything to work. The GPU has remained stable throughout and the highest temperature it has recorded was 46, so I'm utterly stumped as to the problem. :sigh:
     
  10. cgcox1

    cgcox1 Obessed Folder! Me? Surely not!

    Joined:
    24 Apr 2009
    Posts:
    1,187
    Likes Received:
    1
    The GPU will remain stable as it's not doing any work, of that I am pretty sure. You get a similar sort of immediate throw-out when you (OK, me) have set up a multiple-GPU rig wrong.

    Have you tried this forum?

    http://foldingforum.org/viewforum.php?f=51
     
  11. mercinarynurse

    mercinarynurse What's a Dremel?

    Joined:
    2 May 2009
    Posts:
    13
    Likes Received:
    0
    Yeah, I had a look on that forum. There have been people with similar problems but no one seems to know what 'NANs on GPU' means.

    I still haven't found a solution, sadly. My current PPD is 0 :miffed: and it doesn't look like it's gonna change anytime soon. Grrrr.
     
  12. JackOfAll

    JackOfAll What's a Dremel?

    Joined:
    23 Apr 2009
    Posts:
    671
    Likes Received:
    6
    It seems that I'll have to do a little investigation too. One half of one of the 4x GTX295s in the "Weapon" keeps hitting EUEs with NANs. I've already run the new GPU memtest program on it. That isn't flagging any errors. Hmmmm.
     
  13. uncle_fungus

    uncle_fungus P/T Folding@home developer

    Joined:
    27 Mar 2009
    Posts:
    176
    Likes Received:
    5
    It means the calculations done by the GPU have produced nonsensical results. NAN stands for Not A Number, and usually means that the simulation has tried to divide by zero.

    This is not usually indicative of a driver issue (the simulation would normally fail instantly), but rather a hardware issue (which may or may not be caused by overheating). Incidentally, most hardware stress testers for GPUs don't test for mathematical accuracy, just for visual accuracy, and the two are not necessarily the same thing.
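
    To make the NaN behaviour concrete, here is a small Python illustration. It runs on ordinary CPU floats rather than on a GPU, and it says nothing about how FahCore_11 itself detects the problem; the helper name has_nan is made up for this example. It shows that an undefined floating-point operation yields NaN, that NaN spreads through every later calculation, and that a simple scan of the results is enough to spot it.

```python
import math

inf = float("inf")

# Under IEEE 754, undefined operations such as inf - inf or 0.0 / 0.0
# produce NaN. Python raises ZeroDivisionError for float division by
# zero instead of returning a value, so inf - inf is used here.
nan_value = inf - inf
print(math.isnan(nan_value))  # True

# Once a NaN appears it contaminates everything computed from it.
energy = 1.5 + nan_value * 2.0
print(math.isnan(energy))  # True

# NaN is the only float value that is not equal to itself.
print(nan_value == nan_value)  # False


def has_nan(values):
    """Return True if any value in the iterable is NaN."""
    return any(math.isnan(v) for v in values)


print(has_nan([0.1, 0.2, energy]))  # True
print(has_nan([0.1, 0.2, 0.3]))     # False
```

    Because a single bad value poisons every result derived from it, one faulty arithmetic operation is enough to invalidate a whole work unit.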
     
  14. JackOfAll

    JackOfAll What's a Dremel?

    Joined:
    23 Apr 2009
    Posts:
    671
    Likes Received:
    6
    I'll have a play in the morning, swapping power for this specific card to another rail, swapping the card to another slot, etc. I don't think my problem is overheating, and the other 7 GPUs are clean. Currently that GPU is running @ 82 degC.

    Code:
    
    [clivem@c7super:~/foldingathome]$ grep NAN GPU[1-8]_GTX295/FAHlog*
    GPU1_GTX295/FAHlog-Prev.txt:[14:24:14] NANs detected on GPU
    GPU1_GTX295/FAHlog-Prev.txt:[16:03:25] NANs detected on GPU
    GPU1_GTX295/FAHlog-Prev.txt:[16:09:43] NANs detected on GPU
    GPU1_GTX295/FAHlog-Prev.txt:[16:13:20] NANs detected on GPU
    GPU1_GTX295/FAHlog-Prev.txt:[16:22:06] NANs detected on GPU
    GPU1_GTX295/FAHlog.txt:[20:04:53] NANs detected on GPU
    GPU1_GTX295/FAHlog.txt:[21:56:24] NANs detected on GPU
    
    [14:24:14] Completed 95%
    [14:24:14] mdrun_gpu returned
    [14:24:14] NANs detected on GPU
    
    [16:03:25] Completed 22%
    [16:03:25] mdrun_gpu returned
    [16:03:25] NANs detected on GPU
    
    [16:08:58] Completed 2%
    [16:09:43] mdrun_gpu returned
    [16:09:43] NANs detected on GPU
    
    [16:13:20] Completed 2%
    [16:13:20] mdrun_gpu returned
    [16:13:20] NANs detected on GPU
    
    [16:22:06] Completed 5%
    [16:22:06] mdrun_gpu returned
    [16:22:06] NANs detected on GPU
    
    [20:04:53] Completed 74%
    [20:04:53] mdrun_gpu returned
    [20:04:53] NANs detected on GPU
    
    [21:56:24] Completed 78%
    [21:56:24] mdrun_gpu returned
    [21:56:24] NANs detected on GPU
    
     
  15. uncle_fungus

    uncle_fungus P/T Folding@home developer

    Joined:
    27 Mar 2009
    Posts:
    176
    Likes Received:
    5
    You could try running the Nvidia memtest on it (see the thread of the same name). It may just be a faulty card.

    If you can grep out the specific PRCGs that failed, I can check the WU database to see if any of them have been returned successfully by anyone else.
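
    One possible way to pull those PRCGs out of a log is sketched below in Python. It is not an official F@H tool, and the function name failed_prcgs is made up for this example. It assumes the log has the layout of the logs posted earlier in this thread, where a 'Project: N (Run N, Clone N, Gen N)' line appears before the 'NANs detected on GPU' line for the same work unit.

```python
import re

# Matches e.g. "Project: 5740 (Run 3, Clone 72, Gen 124)".
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def failed_prcgs(log_text):
    """Return (project, run, clone, gen) tuples for units that hit NaNs.

    Tuples are listed in first-seen order with duplicates removed.
    """
    failed = []
    current = None
    for line in log_text.splitlines():
        match = PRCG_RE.search(line)
        if match:
            # Remember the most recently announced work unit.
            current = tuple(int(group) for group in match.groups())
        elif "NANs detected on GPU" in line and current is not None:
            if current not in failed:
                failed.append(current)
    return failed


# Lines copied from the second log posted in this thread.
sample = """\
[18:00:33] Project: 5746 (Run 1, Clone 43, Gen 145)
[18:00:57] NANs detected on GPU
[18:00:59] Project: 5746 (Run 1, Clone 43, Gen 145)
[18:01:18] Project: 5740 (Run 3, Clone 72, Gen 124)
[18:01:41] NANs detected on GPU
"""

for prcg in failed_prcgs(sample):
    print("Project %d (Run %d, Clone %d, Gen %d)" % prcg)
```

    To run it over a real log, read the file into a string first, for example with open("FAHlog.txt").read(), and pass that to failed_prcgs.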
     
  16. mercinarynurse

    mercinarynurse What's a Dremel?

    Joined:
    2 May 2009
    Posts:
    13
    Likes Received:
    0
    So from what I can see here, it looks like I am going to have to grovel to my Mrs to be allowed to fork out on another new graphics card, as this is the only component that was transferred from my old PC to my new one. Hmmmm, I suspect my GPU won't be folding for some time. DOH!!!!!!!!
     
  17. JackOfAll

    JackOfAll What's a Dremel?

    Joined:
    23 Apr 2009
    Posts:
    671
    Likes Received:
    6
    I already ran the GPU memtest on it for an hour or so earlier - no errors.

    One of my NAN'd WUs was OK on another card, so it's hardware specific, not the WUs. It may just be that the PSU rail supplying that card is a little flaky. Anyway, I'll eliminate that tomorrow and RMA the card if necessary.
     
  18. coolamasta

    coolamasta Folding@Home CC Captain 2010/11/12

    Joined:
    26 Apr 2009
    Posts:
    2,618
    Likes Received:
    110
    I'm getting NAN GPU issues on one of my cards, doesn't happen all the time though, just now and again, bit weird!
     
