No problems here downloading/crunching Alberts. They take less time than the other WUs, so I claim less credit.
BUT the computer I am paired with (well, it has been sent most of the Alberts I have had) hasn't returned its Alberts yet, so I must wait for credit... grr
Normally I get almost instant credit since I've got a 4 day queue: 3 days to cover my ISP going down Friday evening and not being fixed until Monday (happened twice in the last 6 months), and one more day in case their sysadmin needs to overnight a spare part. It looks like the person you're waiting on has a similarly long queue.
It could be worse after all. I've got 5 results waiting on a noob who appears to have quit after returning 6 errors in the last week of December, and a 6th on another noob who only did a single work unit.
2005-12-31 12:45:36.1250 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'.
2005-12-31 12:45:36.1250 [normal]: Started search at lalDebugLevel = 0
2005-12-31 12:45:36.8125 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2005-12-31 12:45:36.8125 [normal]: No usable checkpoint found, starting from beginning.
2005-12-31 12:50:57.9843 [normal]: Fstat file reached MaxFileSizeKB ==> compactifying ... done.
2005-12-31 16:24:34.0937 [normal]: Search finished successfully.
Looks like normal operations to me. That is, I think the "No usable checkpoint found . . ." messages just indicate the first time Albert looked for a checkpoint for that particular WU. Every Albert WU I have looked at has one of these messages. In other words, it is only a problem if a WU gets more than one of these messages.
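For anyone who wants to check their own stderr output the same way, here is a minimal sketch in Python. It uses the log lines quoted above, counts the "No usable checkpoint found" messages, and works out the elapsed time between the first and last timestamp. The function names are my own, and it assumes every line starts with a timestamp in the format shown.

```python
from datetime import datetime

# Log lines copied from the result quoted above.
LOG_LINES = [
    "2005-12-31 12:45:36.1250 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'.",
    "2005-12-31 12:45:36.1250 [normal]: Started search at lalDebugLevel = 0",
    "2005-12-31 12:45:36.8125 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.",
    "2005-12-31 12:45:36.8125 [normal]: No usable checkpoint found, starting from beginning.",
    "2005-12-31 12:50:57.9843 [normal]: Fstat file reached MaxFileSizeKB ==> compactifying ... done.",
    "2005-12-31 16:24:34.0937 [normal]: Search finished successfully.",
]


def parse_timestamp(line):
    """Return the timestamp at the start of a log line (text before ' [')."""
    stamp = line.split(" [", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def count_restarts_from_scratch(lines):
    """Count how often the application started without a usable checkpoint."""
    return sum("No usable checkpoint found" in line for line in lines)


def elapsed(lines):
    """Wall-clock time between the first and the last log line."""
    return parse_timestamp(lines[-1]) - parse_timestamp(lines[0])


if __name__ == "__main__":
    print("restarts from scratch:", count_restarts_from_scratch(LOG_LINES))
    print("elapsed:", elapsed(LOG_LINES))
```

One such message per WU matches the normal case described above; a count above one would be the thing to look into.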
Man, I don't know about that. The last three WU's I've processed have failed on me due to excessive CPU times. And these times are way out in space: 55 hours to completion? And the CPU time indicated at abort is a bunch of jive with respect to actual elapsed time. There's no way I could've processed a WU as long as is indicated at abort time.
An idea for the reduced "initial replication" part, I'm not sure if that is possible without a lot of work though:
Maybe fresh results of those workunits, that have result entries with "Over/No reply" could be delivered preferably to hosts with host.avg_turnaround < 3 days
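To make the idea concrete, here is a rough sketch in Python. It is not how the BOINC scheduler is written; the `Host` class, `order_candidates`, and the assumption that `avg_turnaround` is stored in seconds are all mine, purely for illustration.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400
# Threshold from the suggestion above: hosts that turn work around in < 3 days.
FAST_TURNAROUND_LIMIT = 3 * SECONDS_PER_DAY


@dataclass
class Host:
    name: str
    avg_turnaround: float  # assumed to be in seconds


def order_candidates(hosts, has_no_reply_result):
    """Return hosts in the order they should be offered a fresh result.

    If the workunit already has an "Over/No reply" result, hosts below the
    turnaround limit are moved to the front; otherwise the order is unchanged.
    """
    if not has_no_reply_result:
        return list(hosts)
    # sorted() is stable, so hosts keep their relative order within each group.
    return sorted(hosts, key=lambda h: h.avg_turnaround >= FAST_TURNAROUND_LIMIT)


if __name__ == "__main__":
    hosts = [
        Host("slow", 5 * SECONDS_PER_DAY),
        Host("fast", 1 * SECONDS_PER_DAY),
        Host("slow2", 4 * SECONDS_PER_DAY),
        Host("fast2", 2 * SECONDS_PER_DAY),
    ]
    print([h.name for h in order_candidates(hosts, True)])
```

This only expresses a preference: slow hosts still get work, they are just considered after the fast ones for workunits that have already timed out once.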
Ray,
70-80 hours is way too long for your machine, especially considering the WUs weren't even completed in that time, unless there's some incompatibility with Win98/albert that I don't know about. I'd suspect either thermal throttling or something very CPU-intensive running alongside it. Anything you know of that might qualify?
Regards,
Michael
microcraft
"The arc of history is long, but it bends toward justice" - MLK
@Professor Ray: Your results really do not look good; the messages indicate a problem:
- No heartbeat from core client for 31 sec - exiting
- Corrupted Fstat-file '...': has 2697271 bytes instead of 2700598
This is what I would do in this case :
- exclude the BOINC directory from being scanned by antivirus software
- while BOINC is not running, do a scandisk
- check the message board for known incompatibilities with Win9x
The plain "Maximum CPU time exceeded" error without additional messages might also be caused by an "over-optimized" BOINC client that reports too high a benchmark value. I think the maximum allowed CPU time isn't a constant but is calculated from the benchmark values.
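A small sketch of why an inflated benchmark would matter, assuming the limit is computed roughly as the workunit's floating-point operation bound divided by the host's benchmarked speed (my understanding of the client, not checked against the source). The numbers are made up for illustration and are not taken from any real Albert workunit.

```python
def max_cpu_seconds(fpops_bound, benchmark_flops):
    """Assumed CPU time limit: operation bound divided by benchmarked speed."""
    return fpops_bound / benchmark_flops


if __name__ == "__main__":
    FPOPS_BOUND = 3.6e15         # hypothetical operation bound of a workunit
    HONEST_BENCHMARK = 1.0e9     # hypothetical real speed, in FLOPS
    INFLATED_BENCHMARK = 2.0e9   # same machine, benchmark reported twice as high

    honest_limit = max_cpu_seconds(FPOPS_BOUND, HONEST_BENCHMARK)
    inflated_limit = max_cpu_seconds(FPOPS_BOUND, INFLATED_BENCHMARK)

    print(f"limit with honest benchmark:   {honest_limit / 3600:.0f} h")
    print(f"limit with inflated benchmark: {inflated_limit / 3600:.0f} h")
```

Under that assumption, a benchmark that reads twice as fast as the machine really is halves the allowed CPU time, while the actual crunching speed stays the same.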
Nope, doesn't make any sense.
As is evident from my profile I've accum'd almost 4K credits w/EAH. I'm engaged in three other BOINC science applications, and except for a recent Rosetta hiccup there are no other problems. Rosetta completed the last two WU's w/out issue. Concurrently with BOINC applications, I'm processing UD Agent (Rosetta and/or LigandFit). I'm getting a mean time between UD Agent checkpoints of about 59 minutes with 1 STD being 1:21:00 over a period of 300 checkpoints. This is reasonable performance for UD Agent (and is why I bowed out of WCG processing, i.e., checkpointing for that BOINC application is non-deterministic).
Task switching between BOINC applications occurs about every 3:20:00, and write to disk is every 0:01:00. That should ensure at least one iteration of each application once per CPU wake period.
As far as CPU intensive processing: there's nothing going on. When I desire to launch one of my sims (Falcon4.0 or F1 2002), I wait for UD Stats to show a recent checkpoint, and then I suspend/snooze both BOINC and UD Agent. The rest is just normal IE browsing/Outlook Express.
I'm perceiving a problem with Albert (and this appears to have just started around New Year's).
I'm running default 5.13 BOINC, albeit with an optimized SETI application (that shouldn't affect EAH though). I am OC'd at 112 FSB running PC133 ECC SDRAM async at 4/3. But that hasn't changed either. What HAS changed is Albert.
It could be that my box is dying, i.e., I'm running a slot 1 P3 on a P3V4X, and HD00 (which NEVER spins down because of SpyBot's Tea Timer) is getting long in the tooth at 5 years. The CPU is cooled w/Vantec P35030 dual-fan CPU cooler (shimmed w/Arctic Silver). The P3V4X clock generator has an Arctic Silver shimmed passive (486) heat-sink (as does the Northbridge). If my system is dying, it's dying selectively (only w/respect to EAH).
My latest Albert went out with an error, and no, before Stick probes me, the exit code -1073741819 (0xc0000005) wasn't caused by me using my screensaver. ;)
(I never use screen saver or graphics).
Reason: Access Violation (0xc0000005) at address 0x0045CB31 read attempt to address 0x00000000
I never knew the application could read the top part of my memory. I thought it was in use by Windows. :)
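For anyone wondering how -1073741819 and 0xc0000005 are the same thing: the exit code is a 32-bit value printed as a signed integer, so masking it to 32 bits gives the familiar hex form. A quick check in Python (the function name is just for illustration):

```python
def to_unsigned_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return format(exit_code & 0xFFFFFFFF, "#010x")


if __name__ == "__main__":
    print(to_unsigned_hex(-1073741819))
```

0xC0000005 is the Windows access violation status, and a read attempt at address 0x00000000 is a null pointer dereference inside the application.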
Jord,
Eej, maat! As the other half of the "Graphics Bug" tag-team, I guess that leaves me off the case, too, since it's equally unlikely to be a graphics adaptor driver issue. :-)
Michael
Maybe you should try the Beta application! ;-)
Actually, I happened to find a similar result last week and posted this message on the NEW: WINDOWS TEST APPLICATION FOR EINSTEIN@HOME board.
I have to admit that Jord used the more appropriate venue.
Edited - to improve the humor (maybe).
Wow... 6 in a row?? All with the same error. Anyone?
I stopped BOINC already, restarted it, did a reboot. Or am I getting the bad batch on purpose? ;)
edit: 8 in a row now. Einstein is at No New Work until I figure out what's happening here. No need to blow through the other 8 units.