I'm publishing 1.20 for Windows and Linux (FGRPopencl-Beta-nvidia). This features some automatic measurement and adjustment of the optimal sleep time. Just add "<cmdline> --sleepTimeFFT -20 </cmdline>" to your app_config.xml.
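For reference, a minimal app_config.xml along those lines might look like the sketch below. The plan class is the one named above; the app name hsgamma_FGRPB1G is my assumption for the FGRP GPU search and should be checked against the <app_version> entries in your own client_state.xml.

<app_config>
  <app_version>
    <app_name>hsgamma_FGRPB1G</app_name>
    <plan_class>FGRPopencl-Beta-nvidia</plan_class>
    <cmdline>--sleepTimeFFT -20</cmdline>
  </app_version>
</app_config>

BOINC reads this file from the Einstein@Home project directory; use "Read config files" in the Manager (or restart the client) to apply it.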
Bernd, I'm assuming that if we don't wish to decrease the CPU usage we just leave out any command lines? I'm comfortable with a full core for each work unit and have the headroom for it. Using the command line would slow down the processing time for each one, correct?
I've done a very short test with this command line and the CPU utilization dropped from a full thread (shown as ~15% in Windows Task Manager on my 4-core, 8-thread i7) down to 1-2%. But it also had the effect that the GPU utilization dropped from always being above 97% when running x2 down to mostly showing 0%, with spikes up to ~50% every other second.
With these performance numbers I predict (based on very rough eyeballing) that tasks will take multiple hours to complete compared to ~30 min with a full CPU thread as support.
My conclusion as of now is to continue without the command line in place and let the GPU tasks have a full CPU thread as support.
Yes, the computation code is identical to 1.18, and if you don't pass it any additional command-line arguments, it will behave exactly the same.
In the optimal case computation should not be slowed down by putting the CPU to sleep while the GPU works, but finding the optimal sleep times for a particular setup (GPU, CPU, HT, thread priority, parallel tasks) can be tedious. Thus the "auto-tuning" feature (a negative value passed to --sleepTimeFFT) was introduced, in the hope that it would be helpful. It did help on the one particular system I tested it with. If it doesn't on yours, well, sorry for the confusion.
Thanks for testing anyway!
BM
I know each host is different, but on my particular host... https://einsteinathome.org/host/12316949
I tried it and noticed something interesting. Apparently the 1.20 builds are slower than 1.18 by a small difference, with or without the sleepTimeFFT parameter. With 1.18 the crunching times for a WU (running 2 at a time) were 1150-1200 s; with 1.20 they rise to 1170-1230 s.
My questions are simple: why are the times basically the same whether you pass the parameter or not? And why, without the parameter, do the times remain a little higher than with the 1.18 builds?
I'm promoting 1.20 out of "Beta" status to avoid a work shortage because of "Beta" restrictions.
BM
FWIW I had better run times when I used "--sleepTimeFFT -1000" rather than -20 (the numerical value is the time in microseconds that gets reserved for the measurement itself).
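In app_config.xml terms that would simply be the following line inside the <app_version> block (using the same assumed app name and plan class as in the example above):

<cmdline>--sleepTimeFFT -1000</cmdline>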
BM
With the additional parameters not set, there is one more little conditional (if ...) for the CPU to process after each kernel launch, i.e. while the kernel is running on the GPU. This is the only difference between 1.18 and 1.20, and it should only matter on a very slow CPU (with a fast GPU).
As far as I can see the runtime difference is ~2%. We try to keep our WUs of equal size (at least the ones that get the same credit), but our prediction of the run time for a specific workunit isn't perfect. We usually tolerate a variation of up to 5-10%. Maybe you were just unlucky picking up tasks. How many tasks are these numbers based on?
The parameter puts the CPU to sleep while the GPU is working (on the FFT). If it is tuned correctly, this shouldn't affect the overall run time at all. If the value is too large, the CPU keeps sleeping after the GPU is already done and the overall run time increases. If the parameter is too small and the CPU wakes up too early, you see little or no effect on the CPU utilization.
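As a rough sketch of that trade-off, here is what the pattern looks like in generic OpenCL host code. This is not the actual Einstein@Home source; the function and variable names are invented for illustration, and the context/queue/kernel setup is assumed to happen elsewhere.

/* Sketch only: illustrates the idea behind --sleepTimeFFT.
 * Error checking omitted for brevity. */
#include <CL/cl.h>
#include <unistd.h>

void run_fft_step(cl_command_queue queue, cl_kernel fft_kernel,
                  size_t global_size, long sleep_us)
{
    cl_event done;
    clEnqueueNDRangeKernel(queue, fft_kernel, 1, NULL,
                           &global_size, NULL, 0, NULL, &done);
    clFlush(queue);                    /* make sure the kernel is submitted */

    if (sleep_us > 0)
        usleep((useconds_t)sleep_us);  /* CPU idles instead of spin-waiting */

    /* If sleep_us was too large, real time is lost here after the GPU has
     * already finished; if it was too small, this wait may still spin on
     * the CPU and the utilization barely drops. */
    clWaitForEvents(1, &done);
    clReleaseEvent(done);
}

Presumably the auto-tuning mode (a negative --sleepTimeFFT) tries to pick sleep_us close to the kernel's actual duration instead of using a fixed value.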
BM
In the recent past I have noticed that these tasks take slightly different times to finish when using the same app.
When I look at the results from my Fury machine, initially a task took 520 s to finish; later, for a few days (weeks?), it was 420-440 s; then since Feb 14 it has been 520 s again. All with v1.18.
So the difference in run time observed might be due to such different task flavors (different search params or work sets?) rather than the new application.
I was very pleased with the improvement in throughput that 1.18 gave my GTX 660 running 2 at a time. I ran the four 1.19 apps I was sent and saw no change, and since then the 1.20 apps. Throughput is the same as 1.18, but the screen lag is greatly reduced. As an Nvidia user I eagerly await the CUDA app.
What is the correct way to remove the cmdline from an application? If I remove it from app_config.xml and restart BOINC, the cmdline is not removed from client_state.xml; the last version of it remains there and will be used by the application when BOINC is restarted. I am using Win 7 x64 with BOINC 7.6.22. I also tried going from <app_version></app_version> to <app></app> tags, which do not have <cmdline></cmdline> available, but that did not remove it either.
I had to edit client_state.xml manually to remove the <cmdline></cmdline>, and this is always risky.
Could I have typed <cmdline> </cmdline> with just an empty " " to remove it?
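For what it's worth, the variant being asked about would look like this in app_config.xml (same assumed app name and plan class as in the earlier example); whether an empty element actually clears the cmdline stored in client_state.xml is exactly the open question:

<app_config>
  <app_version>
    <app_name>hsgamma_FGRPB1G</app_name>
    <plan_class>FGRPopencl-Beta-nvidia</plan_class>
    <cmdline> </cmdline>
  </app_version>
</app_config>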