Binary Radio Pulsar Search (Perseus Arm Survey) "BRP5"

Beyond
Joined: 28 Feb 05
Posts: 121
Credit: 2376996212
RAC: 5725454

RE: Be interesting to do a

Quote:
Be interesting to do a comparison against hosts with ATI and NVIDIA GTX6xx GPUs, if anyone would like to mail over the job log file for an e@h dedicated host.


What I've informally noticed is that NVidia GPUs are taking a larger credit drubbing from BRP5 than ATI/AMD GPUs. Perhaps OpenCL is more efficient on the longer WUs than CUDA?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250626132
RAC: 34473

RE: RE: Be interesting to

Quote:
Quote:
Be interesting to do a comparison against hosts with ATI and NVIDIA GTX6xx GPUs, if anyone would like to mail over the job log file for an e@h dedicated host.

What I've informally noticed is that NVidia GPUs are taking a larger credit drubbing from BRP5 than ATI/AMD GPUs. Perhaps OpenCL is more efficient on the longer WUs than CUDA?

Possibly. Actually, this morning I internally proposed to test the OpenCL App on NVidia. The problem is that on NVidia the OpenCL App produces numbers pretty different from those of ATI cards (and CPU Apps); the results won't "validate". We'll make another attempt at this as soon as time allows, but currently we (developers) all have our plates more than full.

BM

Beyond
Joined: 28 Feb 05
Posts: 121
Credit: 2376996212
RAC: 5725454

RE: RE: RE: Be

Quote:
Quote:
Quote:
Be interesting to do a comparison against hosts with ATI and NVIDIA GTX6xx GPUs, if anyone would like to mail over the job log file for an e@h dedicated host.

What I've informally noticed is that NVidia GPUs are taking a larger credit drubbing from BRP5 than ATI/AMD GPUs. Perhaps OpenCL is more efficient on the longer WUs than CUDA?

Possibly. Actually, this morning I internally proposed to test the OpenCL App on NVidia. The problem is that on NVidia the OpenCL App produces numbers pretty different from those of ATI cards (and CPU Apps); the results won't "validate". We'll make another attempt at this as soon as time allows, but currently we (developers) all have our plates more than full. BM


One would think CUDA would generally be faster than OpenCL on NV cards. It's strange that OpenCL on NV won't validate, although it's been said that NVidia hasn't been putting the resources into OpenCL development that AMD has. Another possibility for a speed improvement might be CUDA 4.2; GPUGrid, at least, received a large performance boost when they migrated their app to 4.2.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250626132
RAC: 34473

RE: RE: RE: RE: Be

Quote:
Quote:
Quote:
Quote:
Be interesting to do a comparison against hosts with ATI and NVIDIA GTX6xx GPUs, if anyone would like to mail over the job log file for an e@h dedicated host.

What I've informally noticed is that NVidia GPUs are taking a larger credit drubbing from BRP5 than ATI/AMD GPUs. Perhaps OpenCL is more efficient on the longer WUs than CUDA?

Possibly. Actually, this morning I internally proposed to test the OpenCL App on NVidia. The problem is that on NVidia the OpenCL App produces numbers pretty different from those of ATI cards (and CPU Apps); the results won't "validate". We'll make another attempt at this as soon as time allows, but currently we (developers) all have our plates more than full. BM

One would think CUDA would generally be faster than OpenCL on NV cards. It's strange that OpenCL on NV won't validate, although it's been said that NVidia hasn't been putting the resources into OpenCL development that AMD has. Another possibility for a speed improvement might be CUDA 4.2; GPUGrid, at least, received a large performance boost when they migrated their app to 4.2.

See here.

BM

Edit: corrected link.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2960166065
RAC: 713613

RE: See here. BM Edit:

Quote:

See here.

BM

Edit: corrected link.


Actually, I think you meant here.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7227841589
RAC: 1102525

Eric_Kaiser wrote:Do you have

Eric_Kaiser wrote:
Do you have an explanation for this?


No, I certainly don't know the internals of your system, nor just how the various bits of hardware and software that decide who wins a resource competition make their decisions. What I do have is considerable evidence that the WUs themselves are closely equivalent in computational requirements (unlike SETI now, and some Einstein work in the past, where WU computational requirements have varied substantially).

Eric_Kaiser wrote:

I can observe that the estimated runtime for unstarted BRP4 WUs varies from 20 minutes up to 120 minutes.

BTW, for BRP5 the current range of estimated runtimes is from 3.5 hrs up to 11 hrs.


This seems downright odd, though perhaps I don't understand your meaning. On my hosts, the estimated runtime for unstarted BRP4 and BRP5 WUs is very nearly the same for all work of a given type on a given host (often identical to the second). The estimate moves up and down over time, probably as completed (or reported?) work comes in over or under estimate, but I never see two unstarted BRP WUs of the same type on the same host showing appreciably different estimated times. I assume here we are talking about the column in the boincmgr task pane titled "Remaining (estimated)" or the equivalent column in the BoincTasks Tasks pane titled "time left".
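
For what it's worth, here is a rough sketch (my assumption, not project-confirmed) of where that single per-type number comes from: the classic BOINC estimate is the work unit's rsc_fpops_est divided by the speed the scheduler credits the app with, scaled by the per-project duration correction factor (DCF), which drifts as completed work comes in over or under estimate. The figures below are invented purely for illustration.

```python
# Sketch of the classic BOINC runtime estimate for an unstarted task.
# All numbers are illustrative, not taken from any real host.
def estimated_runtime_s(rsc_fpops_est: float, app_flops: float, dcf: float) -> float:
    """Estimated seconds = task size in FLOPs / rated app speed * DCF."""
    return rsc_fpops_est / app_flops * dcf

# A 280 TFLOP task on an app rated at 20 GFLOPS with DCF 1.3 shows ~5.1 hours,
# and every unstarted copy of the same type on the same host shows the same value.
print(estimated_runtime_s(280e12, 20e9, 1.3) / 3600)
```

Since DCF is a single per-project scalar, all unstarted tasks of a given type on a host move up or down together, which is consistent with what I see.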

I actually do have a concrete investigative suggestion, in case you are interested either in checking my equivalence assertion on your own hardware, or in investigating the effects yourself. As the fundamental issue is competition for shared resources, the first step is to eliminate nearly all sharing. I suggest:

1. suspend all projects save Einstein.
2. use the web-page Computing preferences parameter "On multiprocessors, use at most X% of the processors" to allow only a single pure CPU job. I think for your 6-core i7-3930K running hyperthreaded a value of 8 (%) in this field would do the job (see the sketch after this list). Note that this limitation does NOT limit in any way the CPU support tasks for the GPU jobs--but it will greatly reduce the competition for memory bandwidth and peripheral bus bandwidth on your motherboard. Probably more importantly, it should cut latency--the elapsed time after your GPU job wants some service until service actually begins.
3. cut back to a single GPU task. GPU tasks don't actually run "in parallel", but rather swap back and forth quite rapidly among however many you allow, each maintaining most of its computing state within the GPU between swaps. If some effect causes one of two active tasks to get the "attention" of the GPU back more quickly than the other, then the unattended one will report a longer elapsed time, despite consuming no more resource.
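
As to the 8% in step 2, here is the arithmetic I have in mind (the exact rounding behaviour is a client implementation detail I'm assuming, not something I've verified):

```python
# What "On multiprocessors, use at most N%" works out to on a 6-core/12-thread
# i7-3930K, assuming the client rounds to the nearest whole CPU and keeps at
# least one -- an assumption, not verified against the BOINC source.
logical_cpus = 12
for pct in (8, 9, 17):
    allowed = max(1, round(logical_cpus * pct / 100))
    print(f"{pct:>3}% of {logical_cpus} logical CPUs -> {allowed} pure CPU task(s)")
```

So 8% leaves room for exactly one pure CPU job, while the GPU support tasks remain unaffected.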

If you run that way for a very few tasks (it will go quicker with BRP4 tasks), I think you'll see that your hardware runs the tasks with very little elapsed-time variation when it is not having to adjudicate so many competing resource requests.

Then, if your curiosity extends to further investment of your time and loss of credit, you could gradually put back elements of your standard configuration, and observe the effects.

I'd understand and not disagree if your curiosity does not extend to this much work and credit loss.

Eric_Kaiser
Joined: 7 Oct 08
Posts: 16
Credit: 25699305
RAC: 0

@archae86 That's a good hint.

@archae86
That's a good hint. I will give it a try.
And yes, I mean the column with the remaining time that is updated during calculation.
And once again, yes, this remaining time varies extremely between WUs of the same type directly after download, while they are waiting to be executed.
But I will change the settings for Einstein and give it a try.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4360
Credit: 3216843843
RAC: 2043498

Last three of BRP5 WUs have

The last three of my BRP5 WUs have errored out with error 1008 (demodulation failed); before that, 10 were crunched through successfully. Here's the link: http://einsteinathome.org/host/7034402/tasks&offset=0&show_names=1&state=0&appid=23.

I restarted BOINC before the last one errored; that obviously did not help. Now I've restarted the host, so I'll see if that helps.

tbret
Joined: 12 Mar 05
Posts: 2115
Credit: 4863891961
RAC: 149728

RE: Here's a graph

Quote:

Here's a graph showing the effect of the BRP5 introduction at 4k/task on the daily credit for my 9 hosts with NVIDIA GPUs.

How much, if any, of this is due to the increase in "Pendings" caused by adding a new run which is slower to "Validate"?

I guess I'm asking if you have a factor in there to adjust for that.

You would expect there to be a temporary dip in RAC until the average age of Pendings for BRP5s reached the same as for BRP4s, wouldn't you?
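
To illustrate what I mean, here is a toy model of that dip (deliberately simplified; it is not the exact RAC formula BOINC uses, and the rates and delays are invented): a host that earns credit at a constant rate but whose validation delay suddenly gets longer.

```python
# Toy model of a RAC dip caused purely by a longer validation delay.
# Simplified exponential average, NOT the exact BOINC RAC formula;
# all rates and delays below are invented for illustration.
import math

HALF_LIFE_DAYS = 7.0                          # documented RAC half-life
DECAY = math.exp(-math.log(2) / HALF_LIFE_DAYS)

daily_earned = 10000.0                        # credit/day actually produced
old_delay, new_delay = 2, 6                   # days until validation (assumed)

rac, pending = daily_earned, []
for day in range(60):
    delay = old_delay if day < 20 else new_delay   # longer lag from day 20 on
    pending.append((day + delay, daily_earned))
    granted_today = sum(c for d, c in pending if d == day)
    pending = [(d, c) for d, c in pending if d > day]
    rac = rac * DECAY + granted_today * (1 - DECAY)
    if day % 5 == 0:
        print(f"day {day:2d}: RAC ~ {rac:7.0f}")
```

The average sags for a few days while the pending pipeline stretches to its new length, then climbs back to the same level, which is the temporary dip I'd expect.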

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

RE: RE: Here's a graph

Quote:
Quote:

Here's a graph showing the effect of the BRP5 introduction at 4k/task on the daily credit for my 9 hosts with NVIDIA GPUs.

How much, if any, of this is due to the increase in "Pendings" caused by adding a new run which is slower to "Validate"?

The graph is based purely on the local job log, so it's effectively "real time" (and therefore isn't affected by pendings). Essentially the calculation is "this is what you get if your work is all validated as correct". It's an accurate guide, as long as the host has a low error rate.
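
If anyone wants to reproduce this from their own job log, here is a minimal sketch of the idea. The file name, the line format (whitespace-separated key/value pairs after the timestamp), the task-name markers and the BRP4 credit value are all assumptions on my part -- adjust them to match your installation; only the 4000 credits/task for BRP5 comes from this thread.

```python
# Minimal sketch: per-day credit from the local BOINC job log, assuming every
# reported task eventually validates. File name, line format, name markers and
# the BRP4 credit figure are assumptions; BRP5 = 4000/task is from this thread.
from collections import defaultdict
from datetime import datetime, timezone

JOB_LOG = "job_log_einstein.phys.uwm.edu.txt"   # in the BOINC data directory
CREDIT = {"BRP5": 4000.0, "BRP4": 500.0}        # nominal credit per task

def classify(task_name):
    # Placeholder rule -- replace with whatever distinguishes your task names.
    for marker in CREDIT:
        if marker in task_name:
            return marker
    return None

daily = defaultdict(float)
with open(JOB_LOG) as fh:
    for line in fh:
        tokens = line.split()
        if len(tokens) < 3:
            continue
        ts = int(tokens[0])                              # completion time (epoch)
        fields = dict(zip(tokens[1::2], tokens[2::2]))   # 'ue', 'ct', 'fe', 'nm', 'et', ...
        app = classify(fields.get("nm", ""))
        if app is None:
            continue
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        daily[day] += CREDIT[app]

for day in sorted(daily):
    print(day, f"{daily[day]:.0f}")
```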
