S6Bucket & S6LV1 compute errors

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0
Topic 196209

The following is an extract from Boinc logs regarding the compute errors.

Every single one of the gravitational wave tasks errors out like this in under a second.

I first suspected a corrupted download of the client for this task, so I completed all other tasks and then reset the project.

That got me fresh client files and new work, including the 3 tasks in the log.

Obviously the reset did not cure the problem.

All other tasks run OK.

03/03/2012 13:36:15 | Einstein@Home | Computation for task h1_0433.05_S6GC1__770_S6BucketA_0 finished
03/03/2012 13:36:15 | Einstein@Home | Output file h1_0433.05_S6GC1__770_S6BucketA_0_0 for task h1_0433.05_S6GC1__770_S6BucketA_0 absent
03/03/2012 13:36:15 | Einstein@Home | Starting task h1_0433.05_S6GC1__769_S6BucketA_0 using einstein_S6Bucket version 101
03/03/2012 13:36:16 | Einstein@Home | Computation for task h1_0433.05_S6GC1__769_S6BucketA_0 finished
03/03/2012 13:36:16 | Einstein@Home | Output file h1_0433.05_S6GC1__769_S6BucketA_0_0 for task h1_0433.05_S6GC1__769_S6BucketA_0 absent
03/03/2012 13:36:16 | Einstein@Home | Starting task h1_0433.00_S6GC1__482_S6BucketA_1 using einstein_S6Bucket version 101
03/03/2012 13:36:18 | Einstein@Home | Computation for task h1_0433.00_S6GC1__482_S6BucketA_1 finished
03/03/2012 13:36:18 | Einstein@Home | Output file h1_0433.00_S6GC1__482_S6BucketA_1_0 for task h1_0433.00_S6GC1__482_S6BucketA_1 absent

Anyone got any ideas on this problem? I cannot disable those tasks in the E@H settings, because the tick box is greyed out :-(
P.S.
This is on my 2nd rig. My main rig has some of the same tasks but they are still pending, and it has a very different setup in CPU, GPU and OS.

Regards

Cliff,

Been there, Done that, Still no damm T Shirt.
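
The log extract in the post above can be checked mechanically. The following is a minimal sketch, assuming the event log keeps the `date | project | message` layout shown there and that the date is day/month/year; the function name `summarize` and the sample constant are illustrative, not part of BOINC. It pairs each "Starting task" line with its "Computation ... finished" line and notes whether the output file was reported absent.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log extract: "dd/mm/yyyy hh:mm:ss | project | message"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| ([^|]+?) \| (.*)$")
START_RE = re.compile(r"^Starting task (\S+)")
DONE_RE = re.compile(r"^Computation for task (\S+) finished")
ABSENT_RE = re.compile(r"^Output file \S+ for task (\S+) absent")

SAMPLE_LOG = """\
03/03/2012 13:36:15 | Einstein@Home | Computation for task h1_0433.05_S6GC1__770_S6BucketA_0 finished
03/03/2012 13:36:15 | Einstein@Home | Output file h1_0433.05_S6GC1__770_S6BucketA_0_0 for task h1_0433.05_S6GC1__770_S6BucketA_0 absent
03/03/2012 13:36:15 | Einstein@Home | Starting task h1_0433.05_S6GC1__769_S6BucketA_0 using einstein_S6Bucket version 101
03/03/2012 13:36:16 | Einstein@Home | Computation for task h1_0433.05_S6GC1__769_S6BucketA_0 finished
03/03/2012 13:36:16 | Einstein@Home | Output file h1_0433.05_S6GC1__769_S6BucketA_0_0 for task h1_0433.05_S6GC1__769_S6BucketA_0 absent
03/03/2012 13:36:16 | Einstein@Home | Starting task h1_0433.00_S6GC1__482_S6BucketA_1 using einstein_S6Bucket version 101
03/03/2012 13:36:18 | Einstein@Home | Computation for task h1_0433.00_S6GC1__482_S6BucketA_1 finished
03/03/2012 13:36:18 | Einstein@Home | Output file h1_0433.00_S6GC1__482_S6BucketA_1_0 for task h1_0433.00_S6GC1__482_S6BucketA_1 absent
"""


def summarize(lines, fmt="%d/%m/%Y %H:%M:%S"):
    """Return {task: {"runtime_s": float or None, "output_absent": bool}}."""
    tasks = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a log line
        when = datetime.strptime(m.group(1), fmt)
        message = m.group(3)
        for key, rx in (("start", START_RE), ("done", DONE_RE), ("absent", ABSENT_RE)):
            hit = rx.match(message)
            if hit:
                tasks.setdefault(hit.group(1), {})[key] = when
    report = {}
    for name, info in tasks.items():
        runtime = None  # unknown when the start line is not in the extract
        if "start" in info and "done" in info:
            runtime = (info["done"] - info["start"]).total_seconds()
        report[name] = {"runtime_s": runtime, "output_absent": "absent" in info}
    return report


if __name__ == "__main__":
    for task, info in summarize(SAMPLE_LOG.splitlines()).items():
        print(task, info)
```

A task that finishes within a second or two of starting and has no output file is the pattern described in this thread; a runtime of `None` only means the start line was not part of the extract.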

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

Quote:

Obviously the reset did not cure the problem.

Your cure for Seti was:

Quote:
The cure that time was to reset the project and repair boinc, then get new tasks

Perhaps you haven't done enough before asking for tasks again. ;-)

Claggy

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Claggy,
I guess I wasn't clear: those errors WERE the new tasks after the reset :-(
So I still have a problem with that machine mangling the gravity wave tasks,
and I have no apparent way of preventing them being sent to the machine.

Other than setting no new tasks, but that affects ALL tasks.

I also have no idea why the problem exists or how to cure it.

CPU is an AMD Phenom II X4 965, Family 16, Model 4, Stepping 3.
GPUs are GTS450 and GTS520
Nvidia driver 285.62

The machine processes other GPU & CPU tasks without problem, both E@H and S@H.
It's just that category of tasks that errors out every time.

Oh, and my main rig is actually processing that type of task and they seem OK.
So I'm no longer worried that it's like smallpox and catching :-)

Cheers,

Cliff,

Been there, Done that, Still no damm T Shirt.

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

Quote:
Hi Claggy,
I guess I wasn't clear: those errors WERE the new tasks after the reset :-(


I guess I wasn't clear either: did you do a BOINC repair install this time too?

Claggy

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2961015980
RAC: 695500

Quote:
Quote:
Hi Claggy,
I guess I wasn't clear: those errors WERE the new tasks after the reset :-(

I guess I wasn't clear either: did you do a BOINC repair install this time too?

Claggy


I don't see what good that could do. The only possible 'repair' work it could do on the data directory is to check and, if necessary, reset the access permissions - and I see no sign of this being a permission problem.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Claggy,
No, I had a load of SETI tasks running, and I thought it was an E@H client problem, not a BOINC one.

Cliff,

Been there, Done that, Still no damm T Shirt.

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

All the error ones seem to mention "invalid float operation" which would suggest your CPU is faulty or the app isn't really an SSE2 one. Have you run any diagnostics on it?

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

I've been looking at the CPU temperature. It has been running too hot, so I've added cooling and decreased CPU usage in the local options, and that has reduced the heat being generated.

I've been told elsewhere that those tasks are particularly intensive on both cpu and memory.

As it stands I've stopped task fetch until such time as I can find a way of further reducing the heating problem.

That CPU is a BE one, but it's not being overclocked, so I may try underclocking it at some point.

Regards,

Cliff,

Been there, Done that, Still no damm T Shirt.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 734256658
RAC: 1292755

Hi

This looks like a particularly mysterious problem that affects only some hosts. It could be a bug in the app or in the BOINC code, so I would not worry too much about the computer. No workaround is known at this time. I have notified the devs.

Thanks for the report,
HB

steffen_moeller
Joined: 9 Feb 05
Posts: 78
Credit: 1773655132
RAC: 0

Hello, here is a 64-bit Linux machine showing that FPU exception, entirely analogous to what you described. It is a dual Xeon X5675 with hyperthreading switched off, giving 12 cores in total.

On Linux, the problem disappears when running the same E@H app with the same kernel in an older (2 years) Linux environment with the then-current 64-bit BOINC 6.10.58. When you go back to BOINC 6.10...60, does E@H then run those h1_* jobs for you?

The problem also goes away (on Linux) when running the same 64-bit kernel with a 32-bit userland, i.e. the 32-bit BOINC client (tested with 7.0.15) using 32-bit libraries. Maybe you can investigate that, too. For details on your CPU you may use the fine CPU-Z utility: http://www.cpuid.com/softwares/cpu-z.html

Cheers,

Steffen

vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           X5675  @ 3.07GHz
stepping        : 2
microcode       : 0x15
cpu MHz         : 3068.000
cache size      : 12288 KB
physical id     : 1
siblings        : 6
core id         : 10
cpu cores       : 6
apicid          : 52
initial apicid  : 52
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 6133.55
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
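
Since the failing apps are SSE2 builds, one quick sanity check on Linux is to confirm that the CPU actually advertises `sse2`, as the flags line above does. This is a small sketch, assuming the usual `key : value` layout of `/proc/cpuinfo`; the helper name `parse_cpu_flags` is made up for illustration. It only shows what the CPU reports and cannot tell whether the app or the CPU is at fault.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text.

    Only the first "flags" line is used; on a multi-core machine the
    per-core entries are normally identical.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found (e.g. not a Linux x86 cpuinfo dump)


if __name__ == "__main__":
    # /proc/cpuinfo exists on Linux only; on other systems this raises OSError.
    with open("/proc/cpuinfo") as fh:
        flags = parse_cpu_flags(fh.read())
    for wanted in ("sse", "sse2"):
        print(wanted, "present" if wanted in flags else "missing")
```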

A.M.
Joined: 14 Jun 06
Posts: 15
Credit: 66121829
RAC: 0

(See also: http://einsteinathome.org/node/196210)

I am (still) having problems with Gravitational Wave tasks. All of them error out in under 2 seconds, with the following line in stderr:

C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_S6Bucket_1.01_windows_intelx86__SSE2.exe
 caused a Float Invalid Operation at location 0055406a in module C:\ProgramData\BOINC\projects\
einstein.phys.uwm.edu\einstein_S6Bucket_1.01_windows_intelx86__SSE2.exe.

or

C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_S6LV1_1.10_windows_intelx86__SSE2.exe
 caused a Float Invalid Operation at location 005699c3 in module C:\ProgramData\BOINC\projects\
einstein.phys.uwm.edu\einstein_S6LV1_1.10_windows_intelx86__SSE2.exe.

This affects this host only: http://einsteinathome.org/host/4671431

Is any progress on this problem being made (or attempted)?
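
When comparing reports like the stderr lines quoted above across hosts, it can help to pull out just the exception name and the fault address. A rough sketch, assuming the message always has the form "caused a ... at location ... in module" as in those two excerpts; the function name `find_faults` is hypothetical and nothing here is part of BOINC or the Einstein@Home app.

```python
import re

# Pattern based only on the wording of the two stderr excerpts in this thread.
FAULT_RE = re.compile(
    r"caused a (?P<kind>.+?) at location (?P<addr>[0-9A-Fa-f]+) in module"
)


def find_faults(stderr_text):
    """Return a list of (exception name, fault address as int) tuples."""
    # The message is wrapped over several lines, so collapse all whitespace first.
    flat = " ".join(stderr_text.split())
    return [
        (m.group("kind"), int(m.group("addr"), 16))
        for m in FAULT_RE.finditer(flat)
    ]
```

The same address turning up on every failing task of one app version would point at a fixed spot in the executable, which is the kind of detail worth passing on to the developers.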
