Gamma-ray pulsar binary search #1 on GPUs

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Bernd, I'm assuming that if we don't wish to decrease the CPU usage, we just leave out any command line? I'm comfortable with a full core for each work unit and have the headroom for it. Using the command line would slow down the processing time for each one, correct?

 

Zalster

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Bernd Machenschalk wrote:
I'm publishing 1.20 for Windows and Linux (FGRPopencl-Beta-nvidia). This features some automatic measurement and adjustment of the optimal sleep time. Just add "<cmdline> --sleepTimeFFT -20 </cmdline>" to your app_config.xml.

I've done a very short test with this command line, and the CPU utilization dropped from a full thread (shown as ~15% in Windows Task Manager on my 4-core, 8-thread i7) down to 1-2%. But it also had the effect that the GPU utilization dropped from always being above 97% when running x2 down to mostly showing 0%, with spikes up to ~50% every other second.
With these performance numbers I predict (based on very rough eyeballing) that tasks will take multiple hours to complete, compared to ~30 min with a full CPU thread as support.

My conclusion as of now is to continue without the command line in place and let the GPU tasks have a full CPU thread as support.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250652518
RAC: 34411

Zalster wrote:
Bernd, I'm assuming that if we don't wish to decrease the CPU usage, we just leave out any command line? I'm comfortable with a full core for each work unit and have the headroom for it. Using the command line would slow down the processing time for each one, correct?

Yes, the computation code is identical to 1.18, and if you don't pass it any additional command-line arguments, it will behave exactly the same.

In the optimal case computation should not be slowed down by putting the CPU to sleep while the GPU works, but finding the optimal sleep times for a particular setup (GPU, CPU, HT, thread priority, parallel tasks) can be tedious. Thus the feature of "auto-tuning" (negative value to --sleepTimeFFT) was introduced, in the hope that it would be helpful. It did help on the one particular system that I tested it with. If it doesn't on yours, well, sorry for the confusion.

Thanks for testing anyway!

BM

juan BFP
Joined: 18 Nov 11
Posts: 839
Credit: 421443712
RAC: 0

I know each host is different, but on my particular host... https://einsteinathome.org/host/12316949

I tried it and noticed something interesting. Apparently the 1.20 builds are slightly slower than 1.18, with or without sleepTimeFFT. With 1.18 the crunching times for a WU (2 at a time) were 1150-1200; with 1.20 they rise to 1170-1230.

My questions are simple: why are the times basically the same whether or not you pass the parameter? And why, without the parameter, do the times remain a little higher than with the 1.18 builds?


Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250652518
RAC: 34411

I'm promoting 1.20 out of "Beta" status to avoid a work shortage because of "Beta" restrictions.

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250652518
RAC: 34411

FWIW I had better run times when I used "--sleepTimeFFT -1000" rather than -20 (the numerical value is the time in microseconds that gets reserved for the measuring itself).

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250652518
RAC: 34411

juan BFP wrote:
With the 1.18 the crunching times for a WU (2@time) was 1150-1200 for the 1.20 the rise to 1170-1230.

My question are simple, why if you pass the parameter or not the times are basicaly the same? and why without parameter the times remain a little higher than in the 1.18 builds?

With the additional parameters not set, the CPU has one more small conditional (if ...) to process after each kernel launch, i.e. while the kernel is running on the GPU. This is the only difference between 1.18 and 1.20, and it should only matter on a very slow CPU (and fast GPU).

As far as I can see the runtime difference is ~2%. We try to keep our WUs of equal size (at least the ones that get the same credit), but our prediction of the run time for a specific workunit isn't perfect. We usually tolerate a variation of up to 5-10%. Maybe you were just unlucky picking up tasks. How many tasks are these numbers based on?
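For anyone who wants to redo the ~2% estimate, here is a rough sketch that compares the midpoints of the two reported ranges. Taking the midpoint is my assumption, since only the ranges were posted, and the 5% threshold is simply the lower end of the tolerance mentioned above.

```python
def midpoint(low, high):
    """Midpoint of a reported run-time range (an assumption: only ranges were posted)."""
    return (low + high) / 2.0


def relative_increase_pct(old, new):
    """Increase of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Ranges quoted in the thread for 2 tasks at a time.
mid_118 = midpoint(1150, 1200)  # version 1.18
mid_120 = midpoint(1170, 1230)  # version 1.20

increase = relative_increase_pct(mid_118, mid_120)
within_tolerance = increase < 5.0  # lower end of the 5-10% tolerance

print(f"1.18 midpoint: {mid_118}, 1.20 midpoint: {mid_120}")
print(f"increase: {increase:.1f}%  within tolerance: {within_tolerance}")
```

With such a small difference, a handful of tasks is not enough to separate an application effect from ordinary workunit variation.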

The parameter puts the CPU to sleep while the GPU is working (on the FFT). If it is tuned correctly, this shouldn't affect the overall run time at all. If the sleep time is too large, the CPU sleeps too long to take over again once the GPU is done, and the overall run time increases. If it is too small and the CPU wakes up too early, you see little or no effect on the CPU utilization.
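The three cases can be illustrated with a toy model. This is only a sketch under assumptions that are not taken from the application: one kernel of fixed length per step, a CPU that busy-waits for the kernel after waking up, and no wake-up latency. The function names and the numbers are hypothetical.

```python
def simulate_step(kernel_us, sleep_us):
    """Toy model of one kernel launch followed by a CPU sleep.

    Assumption (not from the app): after sleeping, the CPU busy-waits
    until the kernel has finished, then continues immediately.
    Returns (wall time, CPU busy time), both in microseconds.
    """
    wall_us = max(kernel_us, sleep_us)         # the step ends when both are done
    busy_us = max(0.0, kernel_us - sleep_us)   # time spent spinning after waking up
    return wall_us, busy_us


def summarize(kernel_us, sleep_us):
    """Slowdown relative to no sleep, and the fraction of the step the CPU is busy."""
    wall_us, busy_us = simulate_step(kernel_us, sleep_us)
    return {"slowdown": wall_us / kernel_us, "cpu_load": busy_us / wall_us}


KERNEL_US = 1000.0  # hypothetical kernel duration

for sleep_us in (0.0, 100.0, 900.0, 5000.0):
    result = summarize(KERNEL_US, sleep_us)
    print(f"sleep={sleep_us:6.0f} us  slowdown={result['slowdown']:.2f}  "
          f"cpu_load={result['cpu_load']:.2f}")
```

In this model a sleep just below the kernel time gives low CPU load with no slowdown, a sleep that is too short leaves the CPU load almost unchanged, and a sleep that is too long drops the CPU load to zero while the run time grows in proportion.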

BM

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3527690935
RAC: 1465464

In the recent past I have noticed that these tasks take slightly different times to finish when using the same app.
When I look at results of my Fury machine, initially a task took 520 s to finish, later for a few days (weeks?) it was 420-440 s, then since Feb-14 it's 520 s again. All is v1.18.

So the observed difference in run time might be due to such different task flavors (different search params or work sets?) rather than the new application.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1593572333
RAC: 771188

I was very pleased with the improvement in throughput that .18 gave my GTX660 running 2 at a time. I ran the four .19 apps I was sent and saw no change, and since then the .20 apps. Throughput is the same as with .18, but the screen lag is greatly reduced. As an NVIDIA user I eagerly await the CUDA app.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4363
Credit: 3218457129
RAC: 2043628

What is the correct way to remove the cmdline from an application? If I remove it from app_config.xml and restart BOINC, the cmdline is not removed from client_state.xml; the last version of it remains there and is used by the application when BOINC is restarted. I am using Win 7 x64 with BOINC 7.6.22. I also tried going from <app_version></app_version> to <app></app> tags, which do not have <cmdline></cmdline> available, but that did not remove it either.

I had to edit the client_state.xml manually to remove the <cmdline></cmdline> and this is always risky.

Could I have typed <cmdline> </cmdline> with just an empty " " to remove it?
