I have a WinXP PC with 2 processors and 1 GPU, which is shared via fast user switching (an XP feature). If I boot the PC and log in under my profile, everything works nicely with all the applications. But some time after a profile switch, a CUDA application (the 609 one, BRPcuda32) crashes, complaining "can't allocate XXX bytes of GPU memory". That task then waits 5 minutes, after which CUDA tasks start failing rapidly, within 5s each, with "Unable to initialize CUDA something" and a general error of 1020. The same thing happens if the other profile is loaded first after a reboot: as long as there's no profile switching, everything works OK, and the same crash sequence starts as soon as we switch places at the monitor. This is not only an Einstein@Home problem; SETI@Home sometimes hands out a CUDA WU that behaves the same way, so there must be something going on with CUDA itself.
I also wonder: is there a way to recover a "failed to init" slot and force BOINC to re-run the task after I reboot the system to reset the GPU and its supporting systems? I just don't want failing tasks.
CUDA apps behave weirdly, also can I restart the task received i
Yes, and it's called Windows. When you do a fast user switch in Windows, you're also changing the video driver (just as you would when using Remote Desktop), from the one you installed to one used by Windows, which doesn't know anything about CUDA or OpenGL. This causes BOINC to lose the connection with the GPU, so all your work will error out.
A fix for this will be in the next BOINC (6.12): when the connection to the GPU is lost for whatever reason, all subsequent work for that GPU will pause until BOINC has restarted and knows where the GPU is again.
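To make the described behaviour concrete, here is a toy model in Python. It is only a sketch of the idea (pause instead of fail), not BOINC's actual code; the class `GpuQueue` and its methods are made up for illustration.

```python
class GpuQueue:
    """Toy model (hypothetical, not BOINC code): pause GPU work when the GPU is lost."""

    def __init__(self, tasks):
        self.pending = list(tasks)   # tasks still waiting for the GPU
        self.done = []               # tasks that ran successfully
        self.paused = False          # True once the GPU connection was lost

    def run_next(self, gpu_available: bool) -> str:
        """Try to run the next task and report what happened."""
        if self.paused:
            # Stay paused until restart(), even if the GPU is visible again.
            return "paused"
        if not self.pending:
            return "idle"
        if not gpu_available:
            # Instead of failing the task, keep it queued and pause everything.
            self.paused = True
            return "paused"
        self.done.append(self.pending.pop(0))
        return "ran"

    def restart(self) -> None:
        """Model a client restart, after which the GPU is detected again."""
        self.paused = False
```

The point of the design is that a lost GPU costs you time, not tasks: nothing is reported as an error while the queue is paused.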
Most errors cannot be recovered from. But that's also not necessary, because BOINC has redundancy built in: work is normally sent to two independent computers, both of which must return the same outcome. If one doesn't return a result, or the results differ, the project sends the task out to a third, a fourth, a fifth computer, until one returns the same outcome as any of the others did.
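The redundancy rule can be sketched in a few lines of Python. This is a simplified illustration, assuming outcomes can be compared for exact equality and that a failed task simply reports `None`; the function name `first_agreed_result` is hypothetical, and a real project validator may compare results more loosely.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def first_agreed_result(results: Iterable[Optional[Hashable]],
                        quorum: int = 2) -> Optional[Hashable]:
    """Return the first outcome reported by `quorum` hosts, or None if none agree.

    `results` is the sequence of outcomes in the order hosts return them;
    None stands for a task that errored out (assumption for this sketch).
    """
    counts: Counter = Counter()
    for outcome in results:
        if outcome is None:
            # A failed task contributes nothing; the work goes to another host.
            continue
        counts[outcome] += 1
        if counts[outcome] >= quorum:
            return outcome
    return None


# One host fails, the next two disagree, the fourth matches an earlier one.
print(first_agreed_result([None, "result-A", "result-B", "result-A"]))
```

So a failed task on your machine does not lose the science; it just means another host gets asked.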
Windows does not support switching users AND continuing to crunch on a GPU. It has to do with the drivers in the switched-to session: they are generic, so the GPU tasks all fail. The only way to make this work is to suspend crunching when you switch users and not resume until you come back, or to get each person their own PC. It is a Windows thing, not a BOINC thing. The newer beta versions of BOINC MAY have this fixed (they are working around the problem), but I am not sure.
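If you'd rather script the suspend/resume step than click through the Manager, something like the Python sketch below might do. It assumes the `boinccmd` command-line tool is on your PATH and that your client version understands `--set_gpu_mode` (older clients may not); the helper names are made up for illustration.

```python
import subprocess
from typing import List

VALID_MODES = ("always", "auto", "never")


def gpu_mode_command(mode: str, boinccmd: str = "boinccmd") -> List[str]:
    """Build the boinccmd argument list for a GPU run mode (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Assumption: the installed boinccmd supports --set_gpu_mode.
    return [boinccmd, "--set_gpu_mode", mode]


def set_gpu_mode(mode: str) -> None:
    """Run boinccmd; raises CalledProcessError if the client rejects the call."""
    subprocess.run(gpu_mode_command(mode), check=True)


# Before switching users: set_gpu_mode("never")
# After switching back:   set_gpu_mode("auto")
```

You still have to remember to run it before each switch, so this only saves clicks; it does not fix the driver issue.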
Oh, thanks a lot. We don't need 2 PCs, and we sometimes consider leaving the one in question to someone else, but no way will we do that :) There's too much that a person can't access without one. Good thing that there will be a workaround.