I've had an ongoing problem with EUEs that has recently come to a head. In summary, I have an 8600GT (no monitor attached) that kept EUEing and then pausing for 24 hours, alongside an 8800GTX that folded fine (monitor attached) and an 8800GTS in another PC (monitor attached). Every time I rebooted the machine it would sometimes do another WU fine, then fail at the end and refuse to go any further. I figured it might be a flaky card, not enough power, or something else; in any case I didn't have time to fix it, and I had two other GPUs folding anyway.

Now, for good reasons that are irrelevant here, I have to make both PCs, and so all three GPUs, headless, and as soon as I did so this morning all three GPUs started EUEing. So I thought "aha, must be a lack-of-monitor thing", cobbled together three dummy plugs with some 82-ohm resistors and three DVI-VGA adaptors, and restarted both PCs. All three GPUs kicked in and folded like the clappers... and then all three EUEd again.

Anyone have any ideas? If it helps, the PC with the two graphics cards is running Server 2003, and the one with the single card is running Win 7 Enterprise. No SLI is set up. Apologies in advance if I'm doing something obviously dim here, but Google isn't helping.
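(For anyone wanting to copy the plugs: the usual recipe is one resistor per colour channel on the VGA side of the adaptor, i.e.

pin 1 (red)   -> pin 6 (red ground)
pin 2 (green) -> pin 7 (green ground)
pin 3 (blue)  -> pin 8 (blue ground)

Anything in the region of 68-150 ohms seems to do the job; mine happened to be 82 ohm.)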
Have you tried running the memtestg80 utility on the cards to see if they are having memory issues? It's on the Stanford Downloads page under the utilities section.
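If you've not run it before, it takes the GPU index, the amount of memory to test and the number of iterations; from memory the command looks something like this (double-check the readme that comes with it for the exact syntax):

memtestG80 --gpu 0 256 50

Run it once per card, bumping the --gpu index for the second card on the dual-card box.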
Zero errors on all three cards. For info, two of the cards never returned a single EUE until I disconnected the monitors and tried to fold headless or with just the dummy plugs. Also, I've made sure the machine with two cards in it is not using SLI, has the desktop extended to the secondary card, and has PhysX disabled.
What flags are you giving each of the clients? I've never had to span the desktops to get mine to work, and I've rebuilt it twice since I've had it due to various issues with the OS. I simply plug the cards in, install them, give the relevant flag (i.e. -gpu 0) and the relevant nvidia flag, and away it goes. Mine is a 4-GPU config, but the same principle would apply. From what I remember about spanning the screens, you should have only one screen enabled, with the others greyed out, once everything is installed and set up. Make sure the settings stuck after the restart and that you only have one screen enabled and the other disabled. You can check that by right-clicking on the desktop, clicking Properties and going to the last tab; one should be solid and the other greyed out.
The PC with the single 8800GTS 512 is running:

SMP client: machineID 1, max CPU 80%
GPU client: -forcegpu nvidia_g80 -gpu 0 (machineID 2)

The PC with the 8800GTX 768 and 8600GT 512 is running:

SMP client: machineID 1, max CPU 80%
GPU client: -forcegpu nvidia_g80 -gpu 0 (machineID 2)
GPU client: -forcegpu nvidia_g80 -gpu 0 (machineID 3)

Does that look right? Like I said, all was fine with actual monitors plugged in, so I assumed the flags were all correct. Next time I'm physically at the machines I'll check the monitor settings. When I checked this morning I only made sure desktop spanning was switched on, as I'd seen in other posts, but if you say it should be switched off then I shall do so.
Sorry, that second set should be:

-forcegpu nvidia_g80 -gpu 0 (machineID 2)
-forcegpu nvidia_g80 -gpu 1 (machineID 3)
The only thing you're missing that I use is the -local flag on the GPU clients. Mine look like this:

-local -gpu 0
-local -forcegpu nvidia_g80 -gpu 1
-local -forcegpu nvidia_g80 -gpu 2
-local -forcegpu nvidia_g80 -gpu 3

Notice I don't force the GPU on the first graphics card.
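For context, each of my clients lives in its own folder and launches from its own shortcut, so the shortcut targets end up looking something like this (the paths and exe name are just my example; use whatever yours are called):

"C:\FAH\GPU0\Folding@home-Win32-GPU.exe" -local -gpu 0
"C:\FAH\GPU1\Folding@home-Win32-GPU.exe" -local -forcegpu nvidia_g80 -gpu 1

...and so on for the other two cards.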
-local: Instructs the client to read its configuration from the client.cfg file in the local directory rather than, on Windows, from the installation directory specified in the registry. I run it just to be safe; it's basically the Windows equivalent of setting a working directory.

-forcegpu: This one helps with headless (i.e. nothing connected) multi-GPU setups. The first card doesn't need to be forced because it's the one Windows boots with, so it's picked up and working before you even log in. The switch is needed for the others in order to tell the program where the GPU is (the -gpu x switch) and what it is (the -forcegpu switch). It may also help on headless multi-GPU setups where the client doesn't acknowledge the presence of multiple GPUs.

On a slight sidenote, look what you've done to me saspro! I've got so into folding that I've been studying the switches independently! Damn you!
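Edit: to make the -local point concrete, each client instance wants its own folder with its own client.cfg and work directory, something along these lines (paths just an example):

C:\FAH\SMP\   - SMP client exe, client.cfg, work\
C:\FAH\GPU0\  - GPU client exe, client.cfg, work\
C:\FAH\GPU1\  - GPU client exe, client.cfg, work\

Each shortcut should also have its "Start in" field pointing at that client's own folder, otherwise -local has nothing local to read.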
Thanks. Well, I've made all cards -local and only have -forcegpu on the second card on the one PC. I've also disabled the extended desktop on that dual-GPU machine and restarted both to make sure the settings stuck, and they did. Now to see what happens...
It is said that the -local flag is no longer required, but if it makes a difference, then use it. No one has mentioned drivers yet, but version 258.96 WHQL is said to allow folding without monitors and dummy plugs. You'll also find people who say it doesn't run as fast as xxx.xx, but I'll leave you to decide what is worthwhile.
Well, an improvement all round. When I checked this morning, all the GPU and SMP clients were still folding, so thanks. The only thing now is that the second GPU on the dual machine is showing in HFM as RunningAsync, which I gather means work being logged at a time different from the one expected, or something like that. I'll wait a bit and see if it moves on or changes.

Scratch that. It's just changed to Hung, so I'll restart and see if it behaves for the rest of the day.
The dual-GPU PC now appears to be behaving. The one with the single GPU is still throwing EUEs. I'll just keep tweaking settings until I make it work, and if I can't, I'll blat and reinstall the lot and see if that improves things. Thanks again.