We were warned by the powers that be to expect much more variability ...
RE: We were warned by the
And so it came to pass. First two results are back from 1001562: although they both ran at the same time (maybe ~5 mins stagger in start times) on the same GTX 750 Ti, there's a 20% difference in run time, and a 400% difference in CPU time.
Mind you, even the slower of the two tasks is practically half the runtime of the v1.39 application that was running before - and even better on CPU time.
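The post doesn't quote the actual times, so this is only a reading aid. "20%" and "400%" are read here as how much the larger value exceeds the smaller one, relative to the smaller. Under that assumption, a 400% difference in CPU time means one task used about five times as much CPU as the other. The times in the sketch are hypothetical and are not the real results from 1001562.

```python
def percent_difference(a: float, b: float) -> float:
    """Return how much the larger value exceeds the smaller one, in percent.

    Assumes both values are positive (e.g. run times or CPU times in seconds).
    """
    if a <= 0 or b <= 0:
        raise ValueError("times must be positive")
    fast, slow = min(a, b), max(a, b)
    return (slow - fast) / fast * 100.0


# Hypothetical times in seconds, chosen only to illustrate the arithmetic.
run_a, run_b = 5000.0, 6000.0   # 20% apart
cpu_a, cpu_b = 200.0, 1000.0    # 400% apart, i.e. one is 5x the other
print(f"run time: {percent_difference(run_a, run_b):.0f}%")
print(f"CPU time: {percent_difference(cpu_a, cpu_b):.0f}%")
```

Taking the smaller value as the base keeps the result the same whichever task is passed first.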
I don't get the linux version
I can't get the Linux version 1.47 working. All jobs errored out.
This is with the recent driver 346.47.
http://einsteinathome.org/task/487117512
Same issue here with my
Same issue here with my GTX660Ti host 10698787
7.4.36
Recursion too deep; the stack overflowed.
(0x3e9) - exit code 1001 (0x3e9)
Activated exception handling...
[06:42:27][8388][INFO ] Starting data processing...
[06:42:27][8388][ERROR] No suitable CUDA device available!
[06:42:27][8388][ERROR] Demodulation failed (error: 1001)!
06:42:27 (8388): called boinc_finish(1001)
All beta tasks have failed within the first 2-3 seconds of starting. Using driver 340.52 here. I have set two of my AMD GPU machines to allow the beta app and thus far (only a few minutes) they are running correctly.
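For anyone collecting these failures, a rough triage sketch may help separate the cases reported in this thread. One detail is worth knowing. On Windows, system error code 1001 is ERROR_STACK_OVERFLOW, and its standard message text is exactly "Recursion too deep; the stack overflowed.". Code 1001 (0x3e9 in hex) is also the code the app passes to boinc_finish() in this log. So the first line is most likely just the Windows description of the exit code rather than evidence of a real stack overflow, and the app's own [ERROR] lines are the better signal. The category names and regular expressions below are assumptions made for this sketch and are matched only against the log format quoted above.

```python
import re
from typing import Optional

# Windows system error 1001 is ERROR_STACK_OVERFLOW; its standard message
# text is the "Recursion too deep" line seen at the top of these logs.
WINDOWS_1001_TEXT = "Recursion too deep; the stack overflowed."

FINISH_RE = re.compile(r"called boinc_finish\((\d+)\)")
ERROR_RE = re.compile(r"\[ERROR\]\s*(.+)")  # one match per [ERROR] line


def exit_code(stderr: str) -> Optional[int]:
    """Return the code the app passed to boinc_finish(), if it was logged."""
    m = FINISH_RE.search(stderr)
    return int(m.group(1)) if m else None


def app_errors(stderr: str) -> list[str]:
    """Collect the messages the app itself logged at [ERROR] level."""
    return [m.group(1).strip() for m in ERROR_RE.finditer(stderr)]


def triage(stderr: str) -> str:
    """Very rough classification of a failed task's stderr (hypothetical categories)."""
    if not stderr.strip():
        return "empty stderr"
    if any("No suitable CUDA device" in e for e in app_errors(stderr)):
        return "GPU selection failure"
    if WINDOWS_1001_TEXT in stderr and exit_code(stderr) == 1001:
        return "exit 1001, cause not logged"
    return "unclassified"


log = """Recursion too deep; the stack overflowed.
(0x3e9) - exit code 1001 (0x3e9)
[06:42:27][8388][INFO ] Starting data processing...
[06:42:27][8388][ERROR] No suitable CUDA device available!
[06:42:27][8388][ERROR] Demodulation failed (error: 1001)!
06:42:27 (8388): called boinc_finish(1001)"""

print(exit_code(log), hex(exit_code(log)))  # 1001 0x3e9
print(app_errors(log))
print(triage(log))  # GPU selection failure
print(triage(""))   # empty stderr
```

The empty-stderr case covers reports like the next one, where the exit status is the only clue.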
Until now all wu's ended with
So far, all WUs have ended with exit status 1001 and an empty stderr.
Edit: sorry, I forgot: both PCs are Windows x64 machines with NVIDIA cards running the latest available non-beta drivers. One is Win7 and one is Win10.
Add my GTX 670 host 5744895
Add my GTX 670 host 5744895 to the list which can't run v1.47 Beta, but can run previous v1.39.
Win7/64, BOINC v7.4.36 - it's the first 64-bit machine I've tried the Beta on, but I don't think that's the whole story.
Bernd posted in Technical
Bernd posted in Technical News that he had made a change which should help certain problems, but which would require a project reset on the affected hosts to propagate. As he mentioned the "No suitable CUDA device" error, which is part of our syndrome (though he also referred to "New clients", suggesting he might mean a different problem), I wondered whether this fix was for the problem discussed here, and did a project reset on one of my two affected clients. All this got me was the complete loss of my work in progress, followed by a quick succession of further failures on newly downloaded beta work, all of the recursion too deep/stack overflow/no suitable device type we are discussing here. I quickly disabled acceptance of beta applications to get the machine back to constructive work on non-beta Parkes.
RE: Bernd posted in
Let me do some digging around to figure out exactly what he changed, and whether it applies duct tape to either of our problems ("Recursion too deep; the stack overflowed." or "No suitable CUDA device available!"). The latter was the only one he mentioned, and I've added my comments to the Tech thread.
Won't be until later this afternoon, when current tasks finish.
Hi all Many thanks for
Hi all
Many thanks for testing this, and sorry for the inconveniences.
The problem reported here with the BRP6 CUDA Beta apps is widespread enough that I have deprecated these app versions for the moment; we will look into it in more detail on Monday.
I suspect that our update of the BOINC API library version, compared to the official app, plays some role in this. The part of the BOINC API code that selects the "right" GPU for an app instance to run on has always been a bit tricky and has undergone many changes, so I would not be surprised to find that something broke (again) between this mechanism and the app code of ours that deals with it.
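To make the suspected failure mode concrete, here is a purely illustrative sketch. It assumes the app can learn its assigned device from two places: a device number supplied by the client's init data, and a legacy `--device N` command-line argument. It then checks that number against how many CUDA devices it can actually enumerate. The function name, arguments, and precedence rule are hypothetical and are neither the BOINC API nor the BRP6 app's code. The point is only to show how a disagreement between the two layers could surface as "No suitable CUDA device available!" on a host with a working GPU.

```python
from typing import Optional, Sequence


class NoSuitableDevice(RuntimeError):
    """Raised when no usable GPU index can be determined (hypothetical)."""


def select_device(argv: Sequence[str], init_data_device: Optional[int],
                  visible_devices: int) -> int:
    """Pick a GPU index for this task (illustrative precedence only).

    init_data_device: device number reported by the client, or None/-1 if unset.
    argv:             command-line arguments, possibly containing "--device N".
    visible_devices:  how many CUDA devices the app itself can enumerate.
    """
    device = None
    if init_data_device is not None and init_data_device >= 0:
        device = init_data_device          # prefer the client-supplied number
    elif "--device" in argv:
        i = list(argv).index("--device")   # fall back to the legacy argument
        if i + 1 < len(argv):
            device = int(argv[i + 1])
    # Fail if nothing was assigned, or the index doesn't match what we can see.
    if device is None or not 0 <= device < visible_devices:
        raise NoSuitableDevice("No suitable CUDA device available!")
    return device


print(select_device([], 0, 1))                    # 0
print(select_device(["--device", "1"], None, 2))  # 1
try:
    # Client says device 1, but the app only enumerates one device (index 0).
    select_device(["--device", "1"], None, 1)
except NoSuitableDevice as e:
    print(e)  # No suitable CUDA device available!
```

In this toy model, a change on either side, such as which source takes precedence or how devices are numbered, is enough to reject a perfectly usable GPU.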
Again, thanks for testing; detecting problems like these is the whole point of a Beta test. I have left the OpenCL versions of the Beta apps active for the moment.
Cheers
HB
Deprecating the app makes it
Deprecating the app makes it a touch difficult to carry on testing and isolating the problem ;) :P
Never mind, I've got the Beta files now, and enough experience with 'rebranding' tasks from one app_version to another, to leave me some scope for experimenting later. I'll do my best to only trash one task at a time...
RE: Deprecating the app
I think we have already nailed down the problem pretty well, and I really don't want to flood volunteers over the weekend with failing tasks until their daily quota runs out. Thanks for all the useful input!
Cheers
HB