Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6"

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2958852863
RAC: 713702

My observations of total time per task (both elapsed and CPU time) agree with those of archae86.

But I haven't looked at instantaneous CPU loading (or GPU loading, come to that) during the course of a run. That might be worth doing sometime.

And @Stef: make a note of which task is using high CPU and which one low CPU, so you can match them against total time at the end of the run. I think you might have them the wrong way round.

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0

You were right, the slow one is the CPU hog.
However, I didn't notice that with the BRP6 tasks I ran before; they were running with very low CPU load.
It doesn't matter to me, as I keep one CPU thread free for the GPU tasks anyway.

There is only one question left for me: is it still worth running n tasks in parallel? I guess nobody has tested that yet.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2958852863
RAC: 713702

So, of the two I have running at the moment,

PM0007_01111_94_0 looks to be fast-running, predicted to use very little CPU, and indeed shows around 2% CPU usage in Windows Process Explorer (of an 8-core CPU, so say ~15% of a core, for Linux comparisons)

PM0007_01071_224_0 is slow-running, predicted to use a lot of CPU, and shows 6% CPU / 50% core usage.

Process Explorer only shows me the GPU usage of my Intel HD 4000 GPU; it reports zero for the NV GTX 670. GPU-Z shows the total load on the 670, but doesn't break it down by process.
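
The per-core figures above follow from scaling the whole-machine percentage by the number of cores. Here is a minimal sketch of that arithmetic, assuming Process Explorer reports usage as a share of all 8 cores; the function name is made up for illustration.

```python
def per_core_percent(total_percent: float, logical_cores: int) -> float:
    """Convert a whole-machine CPU percentage into the share of one core."""
    if logical_cores < 1:
        raise ValueError("logical_cores must be at least 1")
    return total_percent * logical_cores


# Figures quoted in the post, on an 8-core CPU
fast_task = per_core_percent(2.0, 8)  # fast-running PM0007_01111_94_0
slow_task = per_core_percent(6.0, 8)  # slow-running PM0007_01071_224_0
print(f"fast: {fast_task:.0f}% of one core, slow: {slow_task:.0f}% of one core")
```

This gives 16% and 48% of a core, in line with the rounded ~15% and 50% quoted above.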

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225254931
RAC: 1041742

Quote:
But I haven't looked at instantaneous CPU loading (or GPU loading, come to that) during the course of a run.


My informal observation is that the CPU loading seems pretty consistent throughout the run of a particular WU (so long as the companion task characteristics remain the same).

I've not been watching GPU loading. I'm running 2X on my 660s and 750s, and 3X on my 970, and have not yet attempted to find the preferred multiple for this application. I don't plan to try until a higher CUDA-level version comes out, or until one seems unlikely to be distributed for a long time.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 729203928
RAC: 1196987

Quote:
Quote:
But I haven't looked at instantaneous CPU loading (or GPU loading, come to that) during the course of a run.

My informal observation is that the CPU loading seems pretty consistent throughout the run of a particular WU (so long as the companion task characteristics remain the same).

In general the CPU load will decrease during the runtime of a sub-workunit (BRP6 workunits consist of a bundle of two sub-units). By how much the load will decrease and how quickly is data-dependent, but the general trend should always be a sawtooth-like curve with two teeth, so to speak (for the two sub-workunits).

Cheers
HB

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2958852863
RAC: 713702

Quote:
Quote:
Quote:
But I haven't looked at instantaneous CPU loading (or GPU loading, come to that) during the course of a run.

My informal observation is that the CPU loading seems pretty consistent throughout the run of a particular WU (so long as the companion task characteristics remain the same).

In general the CPU load will decrease during the runtime of a sub-workunit (BRP6 workunits consist of a bundle of two sub-units). By how much the load will decrease and how quickly is data-dependent, but the general trend should always be a sawtooth-like curve with two teeth, so to speak (for the two sub-workunits).

Cheers
HB


Do you know whether the two sub-workunits will always be of consistent 'chewiness', for want of a better word? If they were, then the transition point will always be in the middle of the run, which will help us with the analysis.

Or maybe, I presume, the transition will be at the 50% point by definition, even if the two halves have different duration.

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3524404597
RAC: 1528338

Is there a similar improvement in performance expected for BRP4G too, since it uses the same application?

-----

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0

So the long v1.50 task finished overnight.

Here is a first summary of running 2 tasks in parallel on a 750Ti:

Using v1.39 the average of 40 workunits was:
20366s runtime and 2103s CPU time.

The long v1.50 task (PM0007_01161_126_1) was:
22643s runtime and 5254s CPU time. (!)

The other six v1.49/v1.50 tasks I've done so far have each taken pretty much the same time, on average:
15942s runtime and 550s CPU time.
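
To put those numbers side by side, here is a rough sketch that compares each new result against the v1.39 average. The tasks-per-day figure is only an estimate: it assumes both concurrent tasks take the quoted runtime and that the GPU crunches around the clock. The helper names are made up for illustration.

```python
# Figures quoted in the post: (runtime in s, CPU time in s), 2 tasks at once on a 750Ti
V139_AVG = (20366, 2103)     # v1.39, average of 40 workunits
V150_LONG = (22643, 5254)    # the long v1.50 task PM0007_01161_126_1
V149_150_AVG = (15942, 550)  # average of the other six v1.49/v1.50 tasks

CONCURRENT_TASKS = 2
SECONDS_PER_DAY = 86400


def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent (negative means less)."""
    return (new - old) / old * 100.0


def tasks_per_day(runtime_s: float, concurrent: int = CONCURRENT_TASKS) -> float:
    """Estimated daily throughput if every concurrent task takes runtime_s."""
    return concurrent * SECONDS_PER_DAY / runtime_s


for label, (runtime, cpu) in [("long v1.50", V150_LONG),
                              ("other v1.49/v1.50", V149_150_AVG)]:
    print(f"{label}: runtime {percent_change(runtime, V139_AVG[0]):+.1f}%, "
          f"CPU time {percent_change(cpu, V139_AVG[1]):+.1f}%, "
          f"~{tasks_per_day(runtime):.1f} tasks/day")
```

For the six normal tasks this works out to roughly 22% less runtime and 74% less CPU time than v1.39, while the long task is slower on both counts.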

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 729203928
RAC: 1196987

Quote:
Is there a similar improvement in performance expected for BRP4G too, since it uses the same application?

That is a very good question. It's using the same application but different search parameters, and to make things more complicated, the BRP4G tasks go out to a very special breed of GPUs (Intel GPUs integrated in the CPU, not dedicated GPUs). That's too many variables for me to make a good guess; we will try this later.

HBE

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 729203928
RAC: 1196987

Quote:

Do you know whether the two sub-workunits will always be of consistent 'chewiness', for want of a better word? If they were, then the transition point will always be in the middle of the run, which will help us with the analysis.

Or maybe, I presume, the transition will be at the 50% point by definition, even if the two halves have different duration.

I think I remember seeing cases where the sub-units had quite a different 'chewiness', so the sub-task switch can happen at points other than 50% of the total runtime.

HB
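
As a small illustration of why the switch need not fall at 50% of the runtime: if the two sub-units take different times, the transition sits at the first sub-unit's share of the total. The durations below are invented purely for the example, and the function name is hypothetical.

```python
def switch_point_fraction(first_subunit_s: float, second_subunit_s: float) -> float:
    """Fraction of total runtime at which the second sub-unit starts."""
    total = first_subunit_s + second_subunit_s
    if total <= 0:
        raise ValueError("durations must sum to a positive value")
    return first_subunit_s / total


# Hypothetical durations in seconds, not measured values
print(switch_point_fraction(8000, 8000))   # equal 'chewiness'
print(switch_point_fraction(6000, 10000))  # first sub-unit is the easier one
```

With equal sub-units the switch lands at 0.5 of the runtime; with the uneven pair it lands at 0.375.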
