Binary Radio Pulsar Search (Perseus Arm Survey) "BRP5"

Beyond
Joined: 28 Feb 05
Posts: 121
Credit: 2376996212
RAC: 5725454

RE: Be interesting to do a

Quote:
Be interesting to do a comparison against hosts with ATI and NVIDIA GTX6xx GPUs, if anyone would like to mail over the job log file for an e@h dedicated host.


What I've informally noticed is that NVidia GPUs are taking a larger credit drubbing from BRP5 than ATI/AMD GPUs. Perhaps OpenCL is more efficient on the longer WUs than CUDA?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250626132
RAC: 34473

RE: RE: Be interesting to

Quote:
Quote:
Be interesting to do a comparison against hosts with ATI and NVIDIA GTX6xx GPUs, if anyone would like to mail over the job log file for an e@h dedicated host.

What I've informally noticed is that NVidia GPUs are taking a larger credit drubbing from BRP5 than ATI/AMD GPUs. Perhaps OpenCL is more efficient on the longer WUs than CUDA?

Possibly. Actually, this morning I internally proposed to test the OpenCL App on NVidia. The problem is that on NVidia the OpenCL App produces numbers pretty different from those of ATI cards (and CPU Apps); the results won't "validate". We'll make another attempt at this as soon as time allows, but currently we (developers) all have our plates more than full.

BM

Beyond
Joined: 28 Feb 05
Posts: 121
Credit: 2376996212
RAC: 5725454

RE: RE: RE: Be

Quote:
Quote:
Quote:
Be interesting to do a comparison against hosts with ATI and NVIDIA GTX6xx GPUs, if anyone would like to mail over the job log file for an e@h dedicated host.

What I've informally noticed is that NVidia GPUs are taking a larger credit drubbing from BRP5 than ATI/AMD GPUs. Perhaps OpenCL is more efficient on the longer WUs than CUDA?

Possibly. Actually, this morning I internally proposed to test the OpenCL App on NVidia. The problem is that on NVidia the OpenCL App produces numbers pretty different from those of ATI cards (and CPU Apps); the results won't "validate". We'll make another attempt at this as soon as time allows, but currently we (developers) all have our plates more than full. BM


One would think CUDA would generally be faster than OpenCL on NV cards. It's strange that OpenCL on NV won't validate, although it's been said that NVidia hasn't been putting the resources into OpenCL development that AMD has. Another possibility for a speed improvement might be CUDA 4.2; GPUGrid, at least, received a large performance boost when they migrated their app to 4.2.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250626132
RAC: 34473

RE: RE: RE: RE: Be

Quote:
Quote:
Quote:
Quote:
Be interesting to do a comparison against hosts with ATI and NVIDIA GTX6xx GPUs, if anyone would like to mail over the job log file for an e@h dedicated host.

What I've informally noticed is that NVidia GPUs are taking a larger credit drubbing from BRP5 than ATI/AMD GPUs. Perhaps OpenCL is more efficient on the longer WUs than CUDA?

Possibly. Actually, this morning I internally proposed to test the OpenCL App on NVidia. The problem is that on NVidia the OpenCL App produces numbers pretty different from those of ATI cards (and CPU Apps); the results won't "validate". We'll make another attempt at this as soon as time allows, but currently we (developers) all have our plates more than full. BM

One would think CUDA would generally be faster than OpenCL on NV cards. It's strange that OpenCL on NV won't validate, although it's been said that NVidia hasn't been putting the resources into OpenCL development that AMD has. Another possibility for a speed improvement might be CUDA 4.2; GPUGrid, at least, received a large performance boost when they migrated their app to 4.2.

See here.

BM

Edit: corrected link.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2960166065
RAC: 713613

RE: See here. BM Edit:

Quote:

See here.

BM

Edit: corrected link.


Actually, I think you meant here.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7227841589
RAC: 1102525

Eric_Kaiser wrote:Do you have

Eric_Kaiser wrote:
Do you have an explanation for this?


No, I certainly don't know the internals of your system, nor just how the various bits of hardware and software that decide who wins a resource competition make their decisions. What I do have is considerable evidence that the WUs themselves are closely equivalent in computational requirements (unlike SETI now, and some Einstein work in the past, where WU computational requirements have varied substantially).

Eric_Kaiser wrote:

I can observe that the estimated runtime for unstarted BRP4 WUs varies from 20 minutes up to 120 minutes.

BTW, for BRP5 the current range of estimated runtimes is from 3.5 hrs up to 11 hrs.


This seems downright odd, though perhaps I don't understand your meaning. On my hosts, the estimated runtime for unstarted BRP4 and BRP5 WUs is very nearly the same for all work of a given type on a given host (often identical to the second). The estimate moves up and down over time, probably as completed (or reported?) work comes in over or under estimate, but I never see two unstarted BRP WUs of the same type on the same host showing appreciably different estimated times. I assume here we are talking about the column in the boincmgr task pane titled "Remaining (estimated)" or the equivalent column in the BoincTasks Tasks pane titled "time left".
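
For what it's worth, here is a rough sketch (my assumption, not project-confirmed) of where that single per-type number comes from: the classic BOINC estimate is the work unit's rsc_fpops_est divided by the speed the scheduler credits the app with, scaled by the per-project duration correction factor (DCF), which drifts as completed work comes in over or under estimate. The figures below are invented purely for illustration.

```python
# Sketch of the classic BOINC runtime estimate for an unstarted task.
# All numbers are illustrative, not taken from any real host.
def estimated_runtime_s(rsc_fpops_est: float, app_flops: float, dcf: float) -> float:
    """Estimated seconds = task size in FLOPs / rated app speed * DCF."""
    return rsc_fpops_est / app_flops * dcf

# A 280 TFLOP task on an app rated at 20 GFLOPS with DCF 1.3 shows ~5.1 hours,
# and every unstarted copy of the same type on the same host shows the same value.
print(estimated_runtime_s(280e12, 20e9, 1.3) / 3600)
```

Since DCF is a single per-project scalar, all unstarted tasks of a given type on a host move up or down together, which is consistent with what I see.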

I actually do have a concrete investigative suggestion, in case you are interested either in checking my equivalence assertion on your own hardware, or in investigating the effects yourself. As the fundamental issue is competition for shared resources, the first step is to eliminate nearly all sharing. I suggest:

1. suspend all projects save Einstein.
2. use the web-page Computing preferences parameter "On multiprocessors, use at most X% of the processors" to allow only a single pure CPU job. I think for your 6-core i7-3930K running hyperthreaded a value of 8 (%) in this field would do the job (see the sketch after this list). Note that this limitation does NOT limit in any way the CPU support tasks for the GPU jobs--but it will greatly reduce the competition for memory bandwidth and peripheral bus bandwidth on your motherboard. Probably more importantly, it should cut latency--the elapsed time after your GPU job wants some service until service actually begins.
3. cut back to a single GPU task. GPU tasks don't actually run "in parallel", but rather swap back and forth quite rapidly among however many you allow, each maintaining most of its computing state within the GPU between swaps. If some effect causes one of two active tasks to get the "attention" of the GPU back more quickly than the other, then the unattended one will report a longer elapsed time, despite consuming no more resource.
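
As to the 8% in step 2, here is the arithmetic I have in mind (the exact rounding behaviour is a client implementation detail I'm assuming, not something I've verified):

```python
# What "On multiprocessors, use at most N%" works out to on a 6-core/12-thread
# i7-3930K, assuming the client rounds to the nearest whole CPU and keeps at
# least one -- an assumption, not verified against the BOINC source.
logical_cpus = 12
for pct in (8, 9, 17):
    allowed = max(1, round(logical_cpus * pct / 100))
    print(f"{pct:>3}% of {logical_cpus} logical CPUs -> {allowed} pure CPU task(s)")
```

So 8% leaves room for exactly one pure CPU job, while the GPU support tasks remain unaffected.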

If you run that way for a very few tasks (it will go quicker with BRP4 tasks), I think you'll see that your hardware runs the tasks with very little elapsed-time variation when it is not having to adjudicate so many competing resource requests.

Then, if your curiosity extends to further investment of your time and loss of credit, you could gradually put back elements of your standard configuration, and observe the effects.

I'd understand and not disagree if your curiosity does not extend to this much work and credit loss.

Eric_Kaiser
Joined: 7 Oct 08
Posts: 16
Credit: 25699305
RAC: 0

@archae86 That's a good hint.

@archae86
That's a good hint. I will give it a try.
And yes, I mean the column with the remaining time that is updated during calculation.
And once again, yes, this remaining time varies extremely between WUs of the same type directly after download, while they are waiting to be executed.
But I will change the settings for Einstein and give it a try.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4360
Credit: 3216843843
RAC: 2043498

Last three of BRP5 WUs have

The last three of my BRP5 WUs have errored out with error 1008 (demodulation failed); before that, 10 were crunched through successfully. Here's the link: http://einsteinathome.org/host/7034402/tasks&offset=0&show_names=1&state=0&appid=23.

I restarted BOINC before the last one errored; that obviously did not help. Now I've restarted the host, so I'll see if that helps.

tbret
Joined: 12 Mar 05
Posts: 2115
Credit: 4863891961
RAC: 149728

RE: Here's a graph

Quote:

Here's a graph showing the effect of the BRP5 introduction at 4k/task on the daily credit for my 9 hosts with NVIDIA GPUs.

How much, if any, of this is due to the increase in "Pendings" caused by adding a new run which is slower to "Validate"?

I guess I'm asking if you have a factor in there to adjust for that.

You would expect there to be a temporary dip in RAC until the average age of Pendings for BRP5s reached the same as for BRP4s, wouldn't you?
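
To illustrate what I mean, here is a toy model of that dip (deliberately simplified; it is not the exact RAC formula BOINC uses, and the rates and delays are invented): a host that earns credit at a constant rate but whose validation delay suddenly gets longer.

```python
# Toy model of a RAC dip caused purely by a longer validation delay.
# Simplified exponential average, NOT the exact BOINC RAC formula;
# all rates and delays below are invented for illustration.
import math

HALF_LIFE_DAYS = 7.0                          # documented RAC half-life
DECAY = math.exp(-math.log(2) / HALF_LIFE_DAYS)

daily_earned = 10000.0                        # credit/day actually produced
old_delay, new_delay = 2, 6                   # days until validation (assumed)

rac, pending = daily_earned, []
for day in range(60):
    delay = old_delay if day < 20 else new_delay   # longer lag from day 20 on
    pending.append((day + delay, daily_earned))
    granted_today = sum(c for d, c in pending if d == day)
    pending = [(d, c) for d, c in pending if d > day]
    rac = rac * DECAY + granted_today * (1 - DECAY)
    if day % 5 == 0:
        print(f"day {day:2d}: RAC ~ {rac:7.0f}")
```

The average sags for a few days while the pending pipeline stretches to its new length, then climbs back to the same level, which is the temporary dip I'd expect.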

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

RE: RE: Here's a graph

Quote:
Quote:

Here's a graph showing the effect of the BRP5 introduction at 4k/task on the daily credit for my 9 hosts with NVIDIA GPUs.

How much, if any, of this is due to the increase in "Pendings" caused by adding a new run which is slower to "Validate"?

The graph is based purely on the local job log, so it's effectively "real time" (and therefore isn't affected by pendings). Essentially the calculation is "this is what you get if your work is all validated as correct". It's an accurate guide, as long as the host has a low error rate.
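
If anyone wants to reproduce this from their own job log, here is a minimal sketch of the idea. The file name, the line format (whitespace-separated key/value pairs after the timestamp), the task-name markers and the BRP4 credit value are all assumptions on my part -- adjust them to match your installation; only the 4000 credits/task for BRP5 comes from this thread.

```python
# Minimal sketch: per-day credit from the local BOINC job log, assuming every
# reported task eventually validates. File name, line format, name markers and
# the BRP4 credit figure are assumptions; BRP5 = 4000/task is from this thread.
from collections import defaultdict
from datetime import datetime, timezone

JOB_LOG = "job_log_einstein.phys.uwm.edu.txt"   # in the BOINC data directory
CREDIT = {"BRP5": 4000.0, "BRP4": 500.0}        # nominal credit per task

def classify(task_name):
    # Placeholder rule -- replace with whatever distinguishes your task names.
    for marker in CREDIT:
        if marker in task_name:
            return marker
    return None

daily = defaultdict(float)
with open(JOB_LOG) as fh:
    for line in fh:
        tokens = line.split()
        if len(tokens) < 3:
            continue
        ts = int(tokens[0])                              # completion time (epoch)
        fields = dict(zip(tokens[1::2], tokens[2::2]))   # 'ue', 'ct', 'fe', 'nm', 'et', ...
        app = classify(fields.get("nm", ""))
        if app is None:
            continue
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        daily[day] += CREDIT[app]

for day in sorted(daily):
    print(day, f"{daily[day]:.0f}")
```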
