I am seeing an unusually high number of EWC (Error While Computing) results across all my machines between 18 July and 20 July. All EWC errors appear within the first few seconds of startup. The Validate Errors have occurred on 3 different hosts, with a mix of AMD/NV GPUs.
I have looked at a number of the failed WUs, and the same errors seem to be occurring for the wingmen as well. Is there a problem with this batch of data?
Gord
BRP5 Validate Errors && Error While Computing
I have also seen a fair number of tasks fail in the same way across my fleet.
I've had a quick look around the top hosts for others with errors and found many machines have had the same problem.
The common factor is task names beginning PB0028_003xx_xxx_x.
It could be a bad batch of data, but I have seen more than one instance where these tasks have gone on to be successfully completed and validated by wingmen.
Gav.
edit: example here
Same here. I have 3 validate errors and 2 error while computing on BRP5 and all are PB0028_003 tasks.
Also on one of my hosts with a Maxwell (GTX750) GPU.
Four "error while computing" outcomes on PB0028_003xx_xxx_x tasks, all with elapsed times between 2 and 3 seconds.
In case it might be of any use, here are links to the page with stderr output for my four:
error 1
error 2
error 3
error 4
Just checked the logs across the whole of my fleet and found that a total of 81 PB0028_003xx_xxx_x tasks have ended in error. Fortunately, 61 of those tasks failed after only a few seconds. The remaining 20 crunched to completion only to end up as validate errors, which is a tad annoying :)
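For anyone wanting to run the same kind of check on their own hosts, here is a minimal sketch of the tally. It assumes you have already collected your task results into (task name, outcome, elapsed seconds) tuples; that record format, the `tally_batch_outcomes` helper, the 10 second cut-off and the sample rows are all hypothetical and only there for illustration. The regular expression is a guess at the PB0028_003xx_xxx_x naming pattern mentioned in this thread.

```python
import re
from collections import Counter

# Assumed shape of the suspect batch names (PB0028_003xx_xxx_x).
BATCH_PATTERN = re.compile(r"^PB0028_003\d{2}_\d+_\d+$")


def tally_batch_outcomes(records, quick_fail_seconds=10.0):
    """Count outcomes for tasks whose names match the suspect batch.

    records: iterable of (task_name, outcome, elapsed_seconds) tuples,
    a hypothetical format built from your own task lists.
    """
    counts = Counter()
    for name, outcome, elapsed in records:
        if not BATCH_PATTERN.match(name):
            continue  # ignore tasks from other batches
        counts[outcome] += 1
        # Separately count errors that died almost immediately.
        if outcome == "error while computing" and elapsed <= quick_fail_seconds:
            counts["quick failures"] += 1
    return counts


# Made-up sample rows, not real results.
sample = [
    ("PB0028_00371_114_0", "error while computing", 2.4),
    ("PB0028_00351_60_1", "validate error", 5400.0),
    ("PB0028_00211_12_0", "error while computing", 3.0),  # different batch
    ("PB0028_00399_7_0", "success", 5300.0),
]

for outcome, count in sorted(tally_batch_outcomes(sample).items()):
    print(f"{outcome}: {count}")
```

Splitting out the quick failures is useful because those cost almost nothing, whereas the validate errors are the ones that burn a full run before being rejected.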
Yes, that is annoying. I surely hope they get this data set figured out soon.
Compute error (Unhandled Exception Detected...)
http://einsteinathome.org/task/446494511
http://einsteinathome.org/task/446434051
http://einsteinathome.org/task/446302306
http://einsteinathome.org/task/446006202
http://einsteinathome.org/task/445118418
Validate Error
http://einsteinathome.org/task/446087449
Regards
Jason
I wonder if we will be told what went wrong.
sry, wrong post...
I realise my machine means very little but the amount of electricity alone wasted on some of these tasks seems a real shame.
What is going on here?
And another Compute error (Unhandled Exception Detected...)
http://einsteinathome.org/task/446602736
Are you leaving a CPU core free to feed the GPU? With only 4 GB of RAM in that machine, not doing so could be at least part of the problem. The CPU usage percentage shown in BOINC Manager is apparently just a figment of someone's imagination compared to what is really being used. If you are not, try it and see whether the errors go away; if they do not, you have at least eliminated one possible cause.