Yesterday I noticed a stalled BRP3 task. Normally they finish within 80 to 85 minutes, but this one had been running for 9:45 h, and the elapsed time kept counting up while the progress stayed stuck at 66.7% for the next 15 minutes. Since the GPU load was 0%, the memory controller load just 7%, and the CPU load <0.01% (Process Explorer), I decided to abort the task. Since then everything has been working fine again.
For me this is the first time. But how often does this happen to other participants? And is there no mechanism that watches the progress and aborts the task at some limit?
It cost me the time in which 7 other tasks could have been crunched successfully. And it was only by luck that I happened to look. It could have gone on much longer, even for several days, with a proportionally bigger loss.
What's your experience in this regard?
Kind regards
Martin
Copyright © 2024 Einstein@Home. All rights reserved.
Stagnant BRP4-file
I had a similar problem once which went away after I moved the PC to a cooler room. In my case, the CUDA Task would lock up only right at the start tho (= at 0% progress), so your problem might be different.
Yes, every BOINC task comes with a measure of its complexity (assigned by the work generator) that will translate (based on benchmark) to a maximum allowed runtime, based on the individual computer's speed. Because the benchmarks are not very reliable, projects tend to set this value very conservatively so that tasks may run VERY long before timing out. But eventually, they will time out.
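As a rough illustration of how such a limit comes about (a minimal sketch; the function and parameter names are my own and the numbers are made up, not taken from a real workunit): the work generator assigns an upper bound on the number of floating-point operations, and dividing that by the host's benchmarked speed gives the maximum allowed runtime.

```python
def max_runtime_seconds(fpops_bound: float, host_flops: float) -> float:
    """Maximum allowed runtime for a task on a given host.

    fpops_bound: upper bound on floating-point operations for the task
                 (hypothetical value, set by the project).
    host_flops:  benchmarked speed of the host in operations per second.
    """
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return fpops_bound / host_flops


# Made-up example: the same bound on a host benchmarked at 1 GFLOPS
# and on one benchmarked at 4 GFLOPS.
bound = 2e15
for flops in (1e9, 4e9):
    hours = max_runtime_seconds(bound, flops) / 3600
    print(f"{flops:.0e} FLOPS -> limit of {hours:.1f} h")
```

A benchmark that underestimates the real speed of the machine inflates the limit accordingly, which is why a stuck task can sit there for a long time before it is killed.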
HBE
RE: I decided to abort this
You could have tried to suspend and resume the task to see whether it then continued normally. Sometimes a restart of the BOINC client or a reboot helps too.
There is. After a runtime ten times the originally estimated time to completion, the task is aborted with an exit code of -177 "Maximum elapsed time exceeded".
Regards,
Gundolf
[edit]Less than a minute difference! ;-)[/edit]
Computers aren't everything in life. (Just kidding)
RE: There is. After a
Note that (AFAIK) for this timeout the client still measures 'runtime' as CPU time. If the CUDA App runs 20x as fast as the CPU App on your system, this means that the task will time out after 200x the normal execution time, in your case after 16 days (if running 24h a day).
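The arithmetic can be sketched as follows. This only follows the reasoning in this post (a limit of 10x the estimate, with the estimate based on the CPU App), not the actual client source; the function names and the default factor are assumptions for illustration.

```python
def wallclock_timeout_minutes(gpu_runtime_min: float,
                              gpu_speedup: float,
                              limit_factor: float = 10.0) -> float:
    """Minutes until the timeout fires for a GPU task.

    gpu_runtime_min: normal runtime of the task on the GPU, in minutes.
    gpu_speedup:     how many times faster the CUDA App is than the CPU App.
    limit_factor:    multiple of the estimated runtime at which the task
                     is aborted (assumed to be 10, as stated above).
    """
    cpu_equivalent_min = gpu_runtime_min * gpu_speedup
    return limit_factor * cpu_equivalent_min


def minutes_to_days(minutes: float) -> float:
    """Convert minutes to days of 24h crunching."""
    return minutes / (24 * 60)


for normal_runtime in (80, 85):
    limit = wallclock_timeout_minutes(normal_runtime, gpu_speedup=20)
    print(f"{normal_runtime} min normal runtime -> "
          f"timeout after {minutes_to_days(limit):.1f} days")
```

With the 80 to 85 minutes reported in the first post this comes out at about 11 to 12 days; the 16 days quoted above would correspond to a normal runtime of roughly 115 minutes. Either way, it is far too long to be of practical help.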
BM
RE: Yesterday I became
I got the same problem, and it could be related to BRP4, because it didn't occur with BRP3 tasks.
Since BRP4 tasks do not give my GPUs enough workload, I've added some new projects with GPU tasks. What happens is that when BOINC switches between the projects, the NVIDIA driver sometimes crashes and the screen goes black for a few seconds before it recovers. Significantly, GPU tasks only disappear into Nirvana (stop making progress) after such a driver crash. The problem can be solved very easily by restarting the BOINC client. The GPU tasks then resume at the last checkpoint.
The reason I mention that this could be related to BRP4 is that I have one machine on which I'm running only Einstein and Milkyway GPU tasks, none of the new GPU projects. Even on this machine the same problem occurs, and it was NOT the case with BRP3 tasks.
NVIDIA Driver version is 275.33 on three machines
BOINC version is 2.12.26
OS is Windows 7 Ultimate x64 on two machines and Vista 32-bit on the third
RE: I got the same problem
Driver crashes at task switch could be related to the BOINC API issue we discussed here.
Driver 275.33 on Windows 7 is certainly in the vulnerable zone for that problem, which will only be fixed when the BRP app (the same app for BRP3 and BRP4) is modified and re-compiled against the new API, as described in AppCoprocessor at 'Cleanup on premature exit'.
Until then the only known workaround is to revert to the 266.xx series video drivers, for those cards which are supported by drivers from that era.
I have reinstalled the 266.xx
I have reinstalled the 266.xx drivers on all machines. There is another problem with that driver. For reasons that I cannot determine, BOINC sometimes doesn't request new BRP4 or CPU tasks, even if there are none left to crunch. The message is: Not reporting or requesting tasks.
After stopping and restarting the BOINC client, it immediately requests new tasks. My guess is that this still occurs when switching between the projects.