RE: Nobody likes some
It would appear so. You do get occasional bad WUs which give repeated errors until the limit of 20 error results is reached and the WU is then automatically abandoned. You seem to be copping rather a lot. Unfortunately, they have to be weeded out manually if they are identified before the limit of 20 failures is reached.
I'll send Bernd an email and ask him to look into it.
Cheers,
Gary.
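As a rough illustration of the error limit described in the post above, here is a minimal Python sketch. It is not the project's server code: the function name, the outcome strings and the counting logic are assumptions made up for illustration, and only the figure of 20 comes from the post.

```python
# Hypothetical sketch of an error limit; not taken from the project's server code.
MAX_ERROR_RESULTS = 20  # the limit quoted in the post above


def workunit_state(outcomes, max_errors=MAX_ERROR_RESULTS):
    """Return 'abandoned' once the error count reaches the limit, else 'active'.

    `outcomes` is an iterable of strings such as "error" or "valid"
    (names assumed for this example).
    """
    errors = 0
    for outcome in outcomes:
        if outcome == "error":
            errors += 1
            if errors >= max_errors:
                return "abandoned"
    return "active"


print(workunit_state(["error"] * 19))  # one short of the limit
print(workunit_state(["error"] * 20))  # limit reached
```

Under these assumptions, a bad WU keeps going out to new hosts until the twentieth error comes back, which is why cancelling it manually saves a lot of wasted crunching.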
Hi, We've identified
Hi,
We've identified a corrupt dataset and cancelled all workunits/tasks analyzing it. They all bear a name like "p2030.20111103.G44.50-01.64.N.b*". We also identified some more stored data that hadn't been sent out yet and removed it from the data pool. Furthermore, we came up with additional tests to prevent such corrupt data from entering our pre-processing chain.
Sorry for the inconvenience!
Oliver
Einstein@Home Project
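For anyone wanting to check their own task list against the name pattern quoted above, a minimal Python sketch follows. It assumes the trailing "*" is meant as an ordinary shell-style wildcard, and the two task names in the example are made up for illustration, not real tasks.

```python
from fnmatch import fnmatchcase

# Name pattern quoted in the post above; "*" is assumed to match any trailing text.
CORRUPT_PATTERN = "p2030.20111103.G44.50-01.64.N.b*"


def from_corrupt_dataset(task_name, pattern=CORRUPT_PATTERN):
    """Return True if a task name matches the cancelled dataset's pattern."""
    return fnmatchcase(task_name, pattern)


# Made-up task names, for illustration only.
examples = [
    "p2030.20111103.G44.50-01.64.N.b0s0g0.00000_123_0",
    "p2030.20111104.G44.50-01.64.N.b0s0g0.00000_123_0",
]
for name in examples:
    print(name, from_corrupt_dataset(name))
```

The first made-up name matches and the second does not, because its date field differs from the one in the pattern.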
Update: I just checked for
Update: I just checked for any other suspicious datasets that had already made it into our work unit/task pool. I found none. Thus the situation should be back to normal now. Occasional validate errors might still occur, of course.
Best,
Oliver
Einstein@Home Project
Thanks for attending to this
Thanks for attending to this promptly.
Here is an example of somebody noticing a canceled WU. I've wondered why there always seems to be one apparently 'good' result in a long list of 'validate errors' when a WU is bad. I guess the answer is obvious. The validator is called when the second result arrives and checks that one first. When it discards the second result, it never gets around to actually checking the contents of the first result. Something like that, I imagine.
Cheers,
Gary.
Some errors on my Intel Core
Some errors on my Intel Core Duo iMac 269029992 and 269030200.
Both have the same error code.
RE: Some errors on my Intel
Both have the same reason why the validator is unhappy:
Validate error [6] (00000010)
- result file has entries that aren't numbers
I have quite a few iMacs (not to mention all my Linux hosts) getting a number of these too. The Devs are aware of this and are trying to work out why this is happening. Some validate errors are the result of hardware problems at the client end. There are too many of these on Linux and OS X to be just due to hardware problems.
Unfortunately, a lot of Dev time is going into the test project so that the new apps (OpenCL and the new GW app) will be ready when needed. Our little problem is probably on the back burner. I'm sure it won't be forgotten and I'm sure we'll all be informed when progress is ultimately made. We just need to be patient.
Cheers,
Gary.
RE: Just like some previous
Thanks, Gary. But what does not being a number mean? Accompanied by the wrong class code? Not recognised because of wrong precision or wrong-endedness? Not representing digits from a particular number base in a particular text encoding?
NG
RE: ... what does not being
There is a separate error message when a value is outside an acceptable range so it's not just a matter of precision.
I would assume that certain fields in the result file simply contained values that were not numeric. I'm sure that I was told that the result file isn't human readable, although I don't think I've actually gone to the trouble to check this for myself. I have no information about what a bad value looks like but I assume that it could be just garbage as a consequence of a bug deep in the app code or in a math library (or wherever) that gets triggered with certain data values being processed. It's obviously not something that can be diagnosed easily.
Cheers,
Gary.
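To make the idea of "values that were not numeric" a bit more concrete, here is a minimal Python sketch. The real result file layout and the validator's actual test are not known here, so the text tokens, the function name and the three categories are assumptions for illustration only.

```python
import math


def classify_entry(token):
    """Classify one text token as 'number', 'not finite' or 'not a number'."""
    try:
        value = float(token)
    except ValueError:
        return "not a number"  # garbage text that cannot be parsed at all
    if not math.isfinite(value):
        return "not finite"    # parses, but is NaN or +/-infinity
    return "number"


# Made-up entries; the real result file layout is not known here.
for token in ["3.25", "-1e-7", "nan", "inf", "12,5", "0x1g"]:
    print(f"{token!r}: {classify_entry(token)}")
```

In this sketch, NaN and infinity are the kind of values a numerical bug can produce (for example, infinity minus infinity), while the last two tokens stand in for plain garbage. Either kind could plausibly make a validator complain about entries that aren't numbers.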
Out of the last 8 Gamma ray
Out of the last 8 Gamma ray WUs on my Linux boxes (1 C7 and 1 Athlon X2 4850e), only one (on the AMD) has been positively validated.
The next one that's just failed is: 268539068
with "Validate error (2:00000010)"
The C7 has another one in the pipe, due to finish tomorrow. 2 Grav wave S6 tasks were successful in the last 5 days.
I'll finish the last Gamma ray on the C7 and the one in progress on the Athlon, and then I'll postpone Gamma ray until an update that's supposed to improve this is available.
Sorry, but work that nearly always comes out with an error doesn't move the project forward, I think.
RE: ... The next one that's
The full error message is
Validate error [6] (00000010)
- result file has entries that aren't numbers
which is quite common amongst these errors.
For anyone particularly troubled by this ongoing problem, turning off FGRP tasks in your preferences would be a sensible course of action. There's no similar issue with GW tasks.
Cheers,
Gary.