Recently the ABP2cuda23 workunits were stopped, and then they started again but with a shorter calculation time, perhaps about one-half of what it was before. They seemed to be calculating fine for a while, but earlier today they started to be reported with "Error while Computing", though only, of course, after pretty much reaching the end of the calculations (on my machine, about 1 hour 12 minutes on average). I first thought there was just one bad work unit, but all of them are now coming up with this error. See work units 78855915, 78848870, or 78845539 for example (there are others as well). I cannot easily read the task output files myself, but maybe some experts out there can spot something quickly.
I also noticed that instead of saying the work units were using 1 CPU and 1 GPU (as they have for months now), they now say 0.29 CPU and 1 GPU. While this is certainly better for computer utilization, has something changed in the code recently that might be causing this error?
Cheers, Richard
Problems with new ABP2cuda23 calculations ?
See this thread on the changes on ABP2.
As for your error, I see that they end with:
Maximum elapsed time exceeded
And then the ABP2 application seems to crash. Not a very nice ending.
It's something the Einstein developers have to look into.
0 0x0018b9f0 SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
1 0x001798b0 SIGPIPE: write on a pipe with no reader
2 0x0017b00f SIGPIPE: write on a pipe with no reader
3 0x0017b23b SIGPIPE: write on a pipe with no reader
4 0x9976781d SIGPIPE: write on a pipe with no reader
5 0x997676a2
Thread 1 crashed with X86 Thread State (32-bit):
eax: 0xffffffe1 ebx: 0x00000003 ecx: 0xb0003afc edx: 0x9973a0fa
edi: 0x00000000 esi: 0x00000000 ebp: 0xb0003b38 esp: 0xb0003afc
ss: 0x0000001f efl: 0x00000206 eip: 0x9973a0fa cs: 0x00000007
ds: 0x0000001f es: 0x0000001f fs: 0x0000001f gs: 0x00000037
Could be related to the new scheduler. We're looking into that.
BM
I think that this is an (unwanted) side effect of the Client killing the application because it ran longer than the Client expected.
BM
Can you reconstruct from your Client messages when you received the task in question?
When we changed the scheduler yesterday there were a few minutes when it ran with wrong plan class settings. You probably got it during that time. This is our fault, sorry for that. Things should be in order again.
BM
I checked the task list, as I had rebooted the computer as part of my diagnostics and lost all the current messages. I got a whole pile of work units (121 tasks in fact), as follows: the first one that gave an error was sent 2 Aug 2010 5:29:18 UTC and the last one I have was sent 2 Aug 2010 12:45:53 UTC, but the vast majority (about 110 of them) were sent between about 12:24 and 12:26.
I've run 26 of them so far, all with the same error. Right now I have the rest of them suspended. Should I abort them and let the client reload?
Regards
Richard
Those messages are stored in the file stdoutdae.txt in your BOINC data directory. That is the name on Windows, but it should be similar on other platforms.
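If the log is long, a small script can pull out just the lines that mention the tasks in question. The following is only a rough sketch: the function name `find_log_lines` and the search term are my own choices for illustration, and the path is a placeholder you would need to point at your own BOINC data directory.

```python
from pathlib import Path


def find_log_lines(log_path, needle):
    """Return the lines of a client log that contain `needle` (case-insensitive)."""
    needle = needle.lower()
    matches = []
    # errors="replace" keeps the scan going even if the log contains odd bytes.
    with Path(log_path).open(encoding="utf-8", errors="replace") as log:
        for line in log:
            if needle in line.lower():
                matches.append(line.rstrip("\n"))
    return matches


if __name__ == "__main__":
    # Placeholder path (assumption): adjust to your own BOINC data directory.
    log_file = Path("stdoutdae.txt")
    if log_file.exists():
        for line in find_log_lines(log_file, "ABP2"):
            print(line)
```

Searching for part of a task name instead of "ABP2" should narrow the output to the moment that task was received.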
If you are adventurous, you could edit your client_state.xml file, but be careful and make backup copies before you try, because errors can trash your whole cache. I found the following in a thread on the SETI forum (Message 1019942):
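As for the backup step itself (this is not part of the quoted SETI message, just a minimal sketch of one way to do it): the helper below copies a file to a timestamped sibling before you touch it. The name `backup_file` and the `.bak` naming scheme are my own assumptions, and the path is a placeholder. It is generally advisable to stop the BOINC client first, since it rewrites client_state.xml while running.

```python
import shutil
import time
from pathlib import Path


def backup_file(path):
    """Copy `path` to a timestamped sibling file and return the backup's path."""
    source = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme, e.g. client_state.xml.20100802-143000.bak
    backup = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, backup)  # copy2 also preserves file timestamps
    return backup


if __name__ == "__main__":
    # Placeholder path (assumption): adjust to your own BOINC data directory.
    state_file = Path("client_state.xml")
    if state_file.exists():
        print("Backup written to", backup_file(state_file))
```

If an edit goes wrong, copying the backup over client_state.xml (again with the client stopped) should restore the previous state.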
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)