Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1591912355
RAC: 769424

This validate error problem will probably keep Bernd busy when he goes back to work on Monday.

Jacob Klein
Joined: 22 Jun 11
Posts: 45
Credit: 114028547
RAC: 0

I'm also getting a bunch of validate errors on my 2 hosts:
https://einsteinathome.org/host/12256783
https://einsteinathome.org/host/7330531

I'm very interested to know whether these "invalid" results might be due to an OpenCL problem in NVIDIA driver versions 436.02 through 440.97.

We are currently tracking that issue, for some of SETI's OpenCL tasks, here:
https://setiathome.berkeley.edu/forum_thread.php?id=84694
https://setiathome.berkeley.edu/forum_thread.php?id=84780

Thoughts?

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1591912355
RAC: 769424

I don't think so in my case. 

NVIDIA GeForce GTX 1060 3GB (3072MB) driver: 425.31
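For anyone who wants to check a host against the driver range Jacob mentions, here is a minimal sketch. It assumes the range 436.02 through 440.97 is inclusive and that versions always have the `major.minor` form shown in the host details; the function names are made up for illustration and are not part of BOINC or any NVIDIA tool.

```python
def parse_driver_version(text):
    """Turn a version string such as '425.31' into a comparable tuple (425, 31)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


# Range quoted in the thread; treating both ends as inclusive is an assumption.
SUSPECT_LOW = parse_driver_version("436.02")
SUSPECT_HIGH = parse_driver_version("440.97")


def in_suspect_range(version_text):
    """Return True if the driver version falls inside the suspected range."""
    return SUSPECT_LOW <= parse_driver_version(version_text) <= SUSPECT_HIGH


print(in_suspect_range("425.31"))  # False: older than the suspected range
print(in_suspect_range("436.02"))  # True: lower bound of the range
```

Comparing tuples of integers avoids the pitfalls of comparing version strings or floats directly.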

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Jacob Klein wrote:
Thoughts?

I've got a bunch of validate errors with AMD cards too. I don't think it could be solely a problem with the NVIDIA driver here at Einstein.

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18750580286
RAC: 7092828

Probably just the way the task parameters were configured. Something is not quite right with the 2.02 app.

Jacob Klein
Joined: 22 Jun 11
Posts: 45
Credit: 114028547
RAC: 0

Okay. Thanks for replying, fellow crunchers.

Alexander Favorsky
Joined: 18 Jun 16
Posts: 36
Credit: 176523961
RAC: 77849

The same thing for v2.02: only pending results and validate errors, no valids.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117690329191
RAC: 35086821

The reason for the validate error issue, and the fact that it is now fixed, have been announced in the Tech News thread.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117690329191
RAC: 35086821

I've just checked the two hosts that I have running the V2.02 test app.  I had suspended crunching of GW work on them (and allowed some FGRPB1G work) when the validate error problem was discovered.  At that time there were a couple of validate errors and no valid results for GW tasks on each one.

When the small batch of FGRPB1G tasks finished on each, the GW work was restarted and at the present time (and just for the V2.02 app) one host has 8 valid, 41 pending and 1 validate error whilst the other has 9 valid, 46 pending and 0 validate errors.  Everything looks pretty good.  No invalid results on either machine.

It can probably be assumed that batches of validate error results, perhaps based on particular frequency bins, are being fed back into the 'corrected' validator, so it will probably take some time for all these different frequencies to be processed fully.  As far as I can see, the validate error problem is disappearing pretty fast.

Cheers,
Gary.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Three of these today from one host:

Outcome: Computation error
Client state: Compute error
Exit status: 114 (0x00000072) Unknown error code

The target internal file identifier is incorrect. (0x72) - exit code 114 (0x72)
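The decimal and hex values in that report are the same number written two ways. A small sketch of the conversion, assuming the eight-digit zero-padded hex layout seen in the task output (the helper name is made up for illustration):

```python
def format_exit_status(code):
    """Format an exit status as decimal plus zero-padded 32-bit hex."""
    return f"{code} (0x{code:08x})"


print(format_exit_status(114))  # 114 (0x00000072)
print(hex(114))                 # 0x72
```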

https://einsteinathome.org/task/893410246
https://einsteinathome.org/task/893410247
https://einsteinathome.org/task/893441228

Two ran for less than a minute and one ran for more than two hours.
