calculation errors

XuS
Joined: 12 Aug 10
Posts: 2
Credit: 59162
RAC: 0
Topic 195269

Hi!

Einstein@Home has been running smoothly for a week,
but today I got 3 calculation errors.

I have taken a screenshot:

Can I do anything to fix this problem?

BTW, I use Windows 7 64-bit, an i7 920 @ 3.6 GHz and an NVIDIA GTX 260.

Best regards, Daniel

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118661563816
RAC: 19093702

calculation errors

Quote:

Can I do anything to fix this problem?

BTW, I use Windows 7 64-bit, an i7 920 @ 3.6 GHz and an NVIDIA GTX 260.


Hi Daniel,

If you look on the website at the tasks list for your computer, you can jump pages/scroll down until you find the tasks that have errored out. If you then click on the TaskID of any of these tasks, you can get to see what each task returned to the project, including any error messages. You have to scan through a lot of mundane stuff until you find the juicy bits like -

 - exit code 99 (0x63)


and

2010-08-20 21:12:02.5095 (1156) [CRITICAL]: Required frequency-bins [1701288, 1701303] not covered by SFT-interval [1701373, 1702193]
		[Parameters: alpha:0, Dphi_alpha:1.701295e+006, Tsft:1.800000e+003, *Tdot_al:9.999356e-001]
XLAL Error - LocalXLALComputeFaFb (/home/bema/einsteinathome/HierarchicalSearch/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/FDS_isolated/OptimizedCFS/LocalComputeFstat.c:515):

Input domain error

LocalXALComputeFaFb() failed


and even more stuff from the second snippet to the end of the page.
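If you have several failed tasks to look through, a small script can do the scanning for you. This is a minimal sketch, assuming you have copied each task's output text from its task page into a string. The function name `diagnose` and its dictionary keys are hypothetical, not part of BOINC or Einstein@Home, and the pattern is written only from the [CRITICAL] line quoted above, so it assumes other tasks print that line in the same format. It checks for the two signature strings and, when the frequency-bin line is present, tests whether the required bins lie inside the SFT interval.

```python
import re

# Strings that identify this failure mode, taken from the task output quoted above.
SIGNATURE = ("exit code 99", "Input domain error")

# Matches the [CRITICAL] frequency-bin line (assumed to keep this exact format).
BINS_RE = re.compile(
    r"Required frequency-bins \[(\d+), (\d+)\] "
    r"not covered by SFT-interval \[(\d+), (\d+)\]"
)


def diagnose(stderr_text):
    """Return (has_signature, details) for one task's output text.

    details is None when no frequency-bin line is found.
    """
    has_signature = all(marker in stderr_text for marker in SIGNATURE)
    match = BINS_RE.search(stderr_text)
    if match is None:
        return has_signature, None
    req_lo, req_hi, sft_lo, sft_hi = map(int, match.groups())
    details = {
        "required": (req_lo, req_hi),
        "sft": (sft_lo, sft_hi),
        # The required bins must lie entirely inside the SFT interval.
        "covered": sft_lo <= req_lo and req_hi <= sft_hi,
    }
    return has_signature, details


if __name__ == "__main__":
    sample = (
        " - exit code 99 (0x63)\n"
        "2010-08-20 21:12:02.5095 (1156) [CRITICAL]: Required frequency-bins "
        "[1701288, 1701303] not covered by SFT-interval [1701373, 1702193]\n"
        "Input domain error\n"
    )
    # Prints (True, {'required': (1701288, 1701303), 'sft': (1701373, 1702193), 'covered': False})
    print(diagnose(sample))
```

For the quoted task, the required bins end at 1701303, below the start of the SFT interval at 1701373, so `covered` is False, which is just what the error message says. The script only identifies the error signature; it says nothing about whether the cause is hardware or software.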

If you put "exit code 99" and "Input domain error" into a single Google search, the top hit will be a link to the BOINC FAQ Service entry that deals with this particular problem. That FAQ entry links to an old thread here which discusses the problem you seem to be having. Even though the thread is old and refers to the S5R2 and S5R3 runs of several years ago, it is quite likely to be the same issue, since the current apps are just developments of the previous ones.

If you read all about this particular error, you won't find a guaranteed solution, since I don't think the problem was ever fully sorted out. I have seen these sorts of errors myself on my own hosts over the years, and my gut feeling is that the error can be triggered by too aggressive overclocking and/or excess heat. I have always been able to stop them from occurring by backing off the overclock just a bit and/or by giving the CPU heat sink/fan a good clean. I think the problem is particularly temperature sensitive, because these errors tend to crop up on hot days when the house is shut up.

Had any hot days in your locality recently? When did you last check the cleanliness of the fins on your heat sink? :-).

Intel really should be the subject of a class action against the total lack of fitness for purpose of their current stock HSF design. I haven't actually seen the HSF for an i7 but if it's anything like that for a Q8400, it would be really bad for how quickly it can block with fluff and how hard it is to service. I have to clean mine about every six months if I want to stop overheating and it's such a pain since you cannot actually remove the fan to clean the fins on a Q8400 without breaking the HSF/CPU interface and so having to redo the thermal paste and jump through the stupid push-pins dance once again. At least with my older Q6600 HSFs, I can remove just the fan to give them a good clean with a long bristle brush and a vacuum cleaner without actually having to remove the heat sink itself.

Cheers,
Gary.

XuS
Joined: 12 Aug 10
Posts: 2
Credit: 59162
RAC: 0

Thank you for answer. I

Message 99168 in response to message 99167

Thank you for the answer.

I think overclocking is not the problem.
My Intel i7 920 is very easy to overclock, and temperature is absolutely not a problem. I tested my settings with Prime95 without any errors.

I also let BOINC use only 70% of the CPU time, so my CPU temperature is at most 50° and the individual cores reach at most 60°, which is absolutely no problem for this CPU.

Normally, if calculation errors appear, they appear in a single thread of a program. When I tested my CPU settings, especially the CPU voltage, I noticed that undervolting can cause calculation errors because there is not enough power for each CPU core.

I believe that the calculation error is not my fault. As I said, Einstein@Home ran smoothly for a week, and then I suddenly got this Global Correlations S5 work and errors occurred. I will keep watching whether more errors occur, and if they do, I will need to find the cause...

Best regards, Daniel

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 252172489
RAC: 33740

For the last three tasks your

Message 99169 in response to message 99168

For the last three tasks your computer reported as errors, other participants returned successful results (wu1, wu2, wu3). This doesn't look like a general fault in the program or in the setup of these workunits.

BM

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 780576859
RAC: 1208380

RE: I believe that the

Message 99170 in response to message 99168

Quote:

I believe that the calculation error is not my fault. As I said, Einstein@Home ran smoothly for a week, and then I suddenly got this Global Correlations S5 work and errors occurred.

Hi
Not sure about Austria, but in Germany it's been considerably hotter in the last two days than in the week before.

CU
HB

Omur Ozbahceliler
Joined: 14 Aug 10
Posts: 1
Credit: 2551631
RAC: 0

I have those computation

Message 99171 in response to message 99170

I have those computation errors too. I've tried my CPU with and without overclocking. It keeps giving errors on 90% of the tasks. I'm running other projects too, with no problems at all.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 780576859
RAC: 1208380

RE: I have those

Message 99172 in response to message 99171

Quote:
I have those computation errors too. I've tried my CPU with and without overclocking. It keeps giving errors on 90% of the tasks. I'm running other projects too, with no problems at all.

Hi!

You seem to have a different problem: all your CUDA tasks fail with a error message indicting the process could not be be stared. There isn't anything else in the logs, maybe your Messages tab of BOINC Manager is displaying something more useful.

Are you running other projects that also use the NVIDIA graphics card?

CU
HB

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 1

CreateProcess() failed -

Message 99173 in response to message 99171

CreateProcess() failed - Access is denied. (0x5) explains what's happening on your system.

P.S: Bikeman, should I send you a new keyboard? ;-)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118661563816
RAC: 19093702

RE: I think overclocking is

Message 99174 in response to message 99168

Quote:
I think overclocking is not the problem.


OK. Having satisfied yourself about overclocking/overheating, the next step is to consider RAM and then the PSU. From your previous comments, I assume you will certainly know how to check these.

I've been running E@H apps for well over 5 years on a very large number of hosts of various vintages and in situations where the apps were much less mature than they are now. Of course it could still be an app problem. I've personally dealt with lots of compute errors over the years. Just about every single time I was ready to blame the app for new compute errors, I was able to find a hardware issue that, when fixed, caused the errors to disappear.

Overclocking/heat were the major causes, but things like faulty RAM, swollen capacitors, flaky PSUs, etc., have all played their part. I would have done around 20 motherboard repairs for failing caps that were producing erratic behaviour: reboots, BSODs, and lockups of various types, including compute errors. I've also done cap replacements on graphics cards and in PSUs. In your case, with presumably a newish motherboard, I wouldn't suspect bad caps, well, not yet ;-).

Good luck with tracking down whatever is doing it.

Cheers,
Gary.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6591
Credit: 325597527
RAC: 72805

RE: Intel really should be

Message 99175 in response to message 99167

Quote:
Intel really should be the subject of a class action against the total lack of fitness for purpose of their current stock HSF design. I haven't actually seen the HSF for an i7 but if it's anything like that for a Q8400, it would be really bad for how quickly it can block with fluff and how hard it is to service. I have to clean mine about every six months if I want to stop overheating and it's such a pain since you cannot actually remove the fan to clean the fins on a Q8400 without breaking the HSF/CPU interface and so having to redo the thermal paste and jump through the stupid push-pins dance once again. At least with my older Q6600 HSFs, I can remove just the fan to give them a good clean with a long bristle brush and a vacuum cleaner without actually having to remove the heat sink itself.


Hear, hear. Their stock fan/heatsinks are crap for the i7 series, as it's almost exactly the same one as for the Q8400 (slightly different due to the socket). I've replaced all of mine (and some of my friends') with third-party products. My favorite is this one mainly because it clamps the CPU between the cooler base and a supplied brace/frame that attaches on the other side of the mobo using hex bolts/nuts. I get CPU/core temps in the low 40s when running E@H flat chat.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118661563816
RAC: 19093702

RE: My favorite is this one

Message 99176 in response to message 99175

Quote:
My favorite is this one mainly because it clamps the CPU between the cooler base and a supplied brace/frame that attaches on the other side of the mobo using hex bolts/nuts.


For a long time, I've resisted buying aftermarket coolers because, with various cheap and easy mods, I've been able to improve stock coolers enough that it hasn't been necessary. I'm even prepared to put up with Q6600 HSFs because I can clean them quickly. I'm no longer prepared to put up with the Q8400/E6500/E3300 HSF problem because it is so time consuming to deal with.

As I pondered the Intel problem, I decided to experiment with AMD after an absence of many years. I've built 8 machines based on ASRock mobos (mainly N68-S3 UCC - $55) with Phenom II x2 555BE processors ($109). These have unlocked multipliers and a good chance of unlocking the disabled cores, so you can end up with quad-core machines. Of the 8, I've ended up with 7 quads and one tri-core. Even that one runs as a quad, but it is not fully stable that way and can't be overclocked. All the others can, some more than others. As a tri-core, that one overclocks quite nicely.

When I started building these, my frustration with Intel HSFs caused me to look for a budget cooler that would significantly outperform either the AMD or Intel stock HSF. I also wanted something that didn't require mobo removal and that would let me swap between Intel/AMD at will. I decided on this one (locally $26), mainly on the basis of this review. That site is a very good source of reviews on rather diverse hardware like CPU coolers and PSUs, etc. They have a later review of a newer version of this TX3 cooler which they say isn't as good as the earlier model, but I reckon I know the reason for the poorer performance of the newer model, even though there is really no significant hardware design change.

Years ago, I noticed that heatsinks tend not to be truly flat unless they have had special preparation, so I decided that whenever I needed to improve cooling, I would always lap the heatsink. I have a nice sturdy small sheet of plate glass and a series of wet-or-dry papers from 220 grit to 1600 grit. My experience is that most heatsinks are convex, some quite badly so. After 20 minutes of lapping they are all nicely flat. So I reckon the review site probably tested a dud when they did their follow-up test of the newer model. Until you lap them, it's hard to tell how bad they might be. Most of the ones I've bought were not too bad, but I did get a couple of real shockers.

Another problem I have is that I've used existing SFF desktop cases that were available once I decommissioned my Tualatin Celerons. The cases are very nice to work with and are well engineered but they are not tall enough to accommodate a tower design cooler. I built the Phenoms in tower cases so the TX3 coolers work nicely. If I decide to change the cooling for the Intels (without changing the cases) my best option is something like this, which is designed to fit SFF cases. I'm trialling one at the moment (it was also improved by lapping) and it is handling an overclocked x4 Phenom with no problems. The cooling is definitely not as good as with the TX3 but it is significantly better than stock cooling. Locally they cost $34. So the TX3 at $26 is quite good value.

Cheers,
Gary.
