BRP5 Validate Errors && Error While Computing

The Xorcist
Joined: 16 Aug 11
Posts: 16
Credit: 464281554
RAC: 0

Hi Mikey,

Thanks for responding to my issues; I very much appreciate it.

I don't do any CPU work, so all my cores are available for GPU computing.

While 4 GB of RAM may be on the low side, I believe it's more than enough. If more than 4 GB of RAM is now a minimum requirement, I would have sincerely appreciated being alerted.

Personally, I believe some of these tasks are just broken. There are too many other machines exhibiting the same issues.

From what I understand, unhandled software exceptions are considered bad form regardless of what caused them.

Regards,
Jason

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

Seems the admins have acted and cancelled this bad batch of workunits. All "my" tasks that caused errors are now flagged with "WU cancelled".
That's all well and good; I just wish there were more communication from the project to the "resources" (us) about it. Mushroom management doesn't apply well to volunteer workers :)

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1587675743
RAC: 752814

I hope so. I have wasted 13 hrs of GPU time, which is not a good use of my scant resources.

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

My Linux host is also having validation errors from July 18th to the 23rd. Initially I thought something went wrong with one of my GPUs but these invalids are occurring across all three of my GPUs.

I checked some of the failed tasks, and they appear to have failed on every host that attempted to run them.

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

Quote:
Seems the admins have acted and cancelled this bad batch of workunits


Here's a new PB0028_003 that is failing but has not been cancelled.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250399102
RAC: 34763

Apparently we had a set of "bad" beams, all following the name pattern "PB0028_003?1*". The remaining workunits of these have been canceled, the last of these only a few minutes ago.
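
For anyone who wants to check their own task list against the name pattern quoted above, here is a minimal sketch. It assumes the pattern is meant as a shell-style wildcard ("?" for any single character, "*" for any run of characters), which the post does not confirm. The function names and the sample task names are hypothetical and only there for illustration.

```python
from fnmatch import fnmatchcase

# Pattern as quoted in the post; treating it as a shell-style glob is an assumption.
BAD_BEAM_PATTERN = "PB0028_003?1*"


def is_bad_beam(task_name: str) -> bool:
    """Return True if the task name matches the quoted bad-beam pattern."""
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
    return fnmatchcase(task_name, BAD_BEAM_PATTERN)


def affected_tasks(task_names):
    """Keep only the names that match the pattern, preserving input order."""
    return [name for name in task_names if is_bad_beam(name)]


if __name__ == "__main__":
    # Hypothetical task names, not taken from any real host.
    sample = [
        "PB0028_00311_24_0",
        "PB0028_00341_100_1",
        "PB0028_00312_24_0",
        "PB0027_00311_24_0",
    ]
    for name in affected_tasks(sample):
        print(name)
```

With these made-up names, only the first two would be listed, since the third has a "2" where the pattern requires a "1" and the fourth belongs to a different beam prefix.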

These originally caught our attention through the number of validate errors they produced, though with an unusual delay because of maintenance work on our monitoring system.

We spent a couple of hours investigating what exactly causes these client errors (general access violations), but couldn't exactly nail it down. For some reason, these errors only happen on Windows systems, not on OSX or Linux, although the code being used is the same (and passed valgrind).

BM