Observations on FGRBP1 1.18 for Windows

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225414931
RAC: 1043822
Topic 204689

A few hours ago, in the Technical News forum, Bernd announced that version 1.18 has been released with beta status.  His announcement and many subsequent observations posted in that thread document an appreciable speedup compared to version 1.17 on a wide variety of hosts.  Very roughly, a typical speedup might be 25%.

I have four hosts currently running Einstein GPU work, three of which have dual GPUs.  I suspended in-process 1.17 work in order to get an early look at 1.18 behavior on all seven GPU cards (Nvidia types 750Ti, 970, 1050, 1060, and 1070).  On all but one, all seems well, with successful completions in appreciably shortened elapsed time.

As it happens, the single-GPU host, which has had some occasional troubles in recent weeks but had run fine for days, ran OK for about 20 minutes with a single 1.18 task.  But very shortly after I added a second task, it failed.  Further, it failed in such a way that subsequent 1.18 and 1.17 tasks failed very quickly.  In other words, the system had somehow gotten into a lethal condition.  A full power-down reboot cleared this condition, and the system resumed apparently normal processing.  But shortly after I got brave and again allowed two 1.18 tasks, it failed again.  Now I can't do further testing, as the project denies the host new work until tomorrow on the grounds that the daily quota of 12 tasks is exceeded.  I understand that failures lower the task limit, but the system has only 27 error returns reported against it today, which I would not have expected to put me out of business.

Anyway, I can't experiment until the end of the task day (which I think is midnight UTC).  In the short term I intend to forbid beta tasks to see if 1.17 still works, and then to restrict the host to a single task, in case running two tasks at once is part of the issue here.

I imagine the tasks which actually ran for a while and then failed may have the most useful failure symptoms.  Here is the end of the stderr log for a few such:

Case 1:

% nf1dots: 31  df1dot: 3.344368011e-015  f1dot_start: -1e-013  f1dot_band: 1e-013
% Filling array of photon pairs
ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:923: clFinish failed. status=-36
ERROR: opencl_ts_2_phase_diff_sorted() returned with error 7343459
07:45:01 (4468): [CRITICAL]: ERROR: MAIN() returned with error '-36'
FPU status flags:  PRECISION
07:45:13 (4468): [normal]: done. calling boinc_finish(28).
07:45:13 (4468): called boinc_finish

Case 2:

% Filling array of photon pairs
Error during OpenCL FFT (error: -36)
ERROR: gen_fft_execute() returned with error 7343372
07:45:01 (4212): [CRITICAL]: ERROR: MAIN() returned with error '5'
FPU status flags:  PRECISION
07:45:13 (4212): [normal]: done. calling boinc_finish(69).
07:45:13 (4212): called boinc_finish
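
For reference, status -36 in both logs is the value the standard OpenCL header (cl.h) assigns to CL_INVALID_COMMAND_QUEUE. The following is only a minimal sketch for pulling such codes out of stderr text and naming them. The function name is hypothetical, the lookup table is a small hand-copied subset of the header, and the pattern assumes only the two message shapes shown above.

```python
import re

# Small hand-copied subset of the error codes defined in the Khronos cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# Matches "status=-36" and "(error: -36)", the two shapes seen in the logs above.
STATUS_RE = re.compile(r"(?:status=|\(error:\s*)(-?\d+)")

def opencl_statuses(stderr_text):
    """Return (code, name) pairs for each OpenCL status found in the text."""
    found = []
    for match in STATUS_RE.finditer(stderr_text):
        code = int(match.group(1))
        found.append((code, OPENCL_ERRORS.get(code, "unknown")))
    return found

sample = (
    "ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:923: clFinish failed. status=-36\n"
    "Error during OpenCL FFT (error: -36)\n"
)
print(opencl_statuses(sample))
```

The code name alone does not say why the command queue became invalid, so it is a label for searching rather than a diagnosis.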

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Very interesting.  On your GTX 970, I see that the run times vary by a factor of 2, both for 1.18 and 1.17.  Do you know what might account for that (maybe running two WU at once)?  I haven't seen that much variation for my GTX 750 Ti's, which are on Win7 64-bit.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225414931
RAC: 1043822

Jim1348 wrote:
On your GTX 970, I see that the run times vary by a factor of 2, both for 1.18 and 1.17.

Well they don't, actually.  The issue is that the host (like two of my other three) has two dissimilar GPUs installed.  In that case BOINC only reports one upstream, generally the one with the higher CUDA capability, if there is a difference.  Often one can figure out which GPU actually ran a given WU by looking around in the stderr file for the string "GTX".  However, the stderr files for the current Gamma Ray Pulsar search often (always?) lack this traditional entry.

Anyway, on that specific host the slow units ran on the GTX 750Ti, and the fast ones on the GTX 970.  Within model, the elapsed time variation from unit to unit is much, much less.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

archae86 wrote:
Anyway, on that specific host the slow units ran on the GTX 750Ti, and the fast ones on the GTX 970.  Within model, the elapsed time variation from unit to unit is much, much less.

OK, that is actually more interesting to me, since I am considering switching work from the GTX 750 Ti's to either a GTX 960 or 970, and that shows the comparison directly.  Thanks.

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

On my HD7970 the "Validate Error" rate is way up with 1.18.

So far it's 10 validate errors vs 21 valids. With 1.17 the validate error / validated ratio was much better.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117678639283
RAC: 35180239

archae86 wrote:
... the slow units ran on the GTX 750Ti ...

What improvement are you seeing on the 750Ti?  Is it equivalent to what you get on the 970?

Thanks.

Cheers,
Gary.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

I've got 0 validate errors on my GTX970 for both 1.17 and 1.18.

A more optimized app will/should put more stress on the hardware, so it might be a good time to check the running conditions and maybe make some adjustments to running parameters. Even a validate error once in a while might "invalidate" an overclock, as the wasted time might be more than the gain from the overclock.
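
The overclock trade-off can be put into rough numbers. This is only a sketch under a simple assumption that is mine and not from the thread: every task that fails validation wastes its full run time. The 5% gain and the invalid shares below are hypothetical.

```python
def net_throughput(speed_gain, invalid_fraction):
    """Valid work per unit time relative to a stock, error-free card.

    speed_gain: fractional speedup from the overclock (0.05 means 5% faster).
    invalid_fraction: share of completed tasks that fail validation.
    Assumes an invalid task wastes its whole run time.
    """
    return (1.0 + speed_gain) * (1.0 - invalid_fraction)

def break_even_invalid_fraction(speed_gain):
    """Invalid share at which the overclock gain is exactly cancelled."""
    return speed_gain / (1.0 + speed_gain)

# Hypothetical numbers, for illustration only.
gain = 0.05
print(round(break_even_invalid_fraction(gain), 4))
print(round(net_throughput(gain, 0.02), 4))   # above 1.0: overclock still pays
print(round(net_throughput(gain, 0.10), 4))   # below 1.0: overclock costs work
```

Under that assumption a 5% overclock stops paying once a little under 5% of tasks fail validation, so even an occasional validate error eats most of the gain.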

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117678639283
RAC: 35180239

Logforme wrote:

On my HD7970 the "Validate Error" rate is way up with 1.18.

So far it's 10 validate errors vs 21 valids ...

After this little scare, I've quickly checked a host that would have done the most 1.18s so far.  It has dual HD7850s.  It has 11 validated 1.18s, a lot more pending, and zero invalids.  So far so good, fingers crossed :-).

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225414931
RAC: 1043822

Gary Roberts wrote:
What improvement are you seeing on the 750Ti?  Is it equivalent to what you get on the 970?

Caveats: The sample sizes are not terribly large, and I clipped off a handful of outliers, believing them likely the result of mixed running (with respect to either multiplicity or application version), but without checking that to be true.  On the plus side, these are formal averages, not eyeball estimates.

The elapsed time of 1.18 as a fraction of 1.17 was considerably more improved for my 970:

GTX 970 0.533

GTX 750Ti 0.642

Mind you, 0.642 is nothing to sneeze at, but I had not realized until you asked the question just how drastic the 970 improvement was.  I've not done the computation for my Pascal cards yet.
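
One way to compute such a ratio of averages with outliers clipped is sketched below. The clipping rule (keep results within 25% of the median) is purely an assumption for illustration and is not necessarily the rule used for the figures above; the elapsed times are made-up values, not data from any host in this thread.

```python
from statistics import mean, median

def clipped_mean(times, tolerance=0.25):
    """Mean of the elapsed times that lie within +/- tolerance of the median."""
    mid = median(times)
    kept = [t for t in times if abs(t - mid) <= tolerance * mid]
    return mean(kept)

def elapsed_ratio(new_times, old_times, tolerance=0.25):
    """Clipped mean elapsed time of the new version as a fraction of the old."""
    return clipped_mean(new_times, tolerance) / clipped_mean(old_times, tolerance)

# Hypothetical elapsed times in seconds.
v117 = [2000, 2040, 1960, 2010, 3900]   # last value mimics a mixed-multiplicity run
v118 = [1200, 1220, 1180, 1210, 650]    # last value mimics an odd short run
print(round(elapsed_ratio(v118, v117), 3))
```

Clipping around the median rather than the mean keeps a single wild result from shifting the window that decides what counts as an outlier.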

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

I have two GTX 960s each fed by a core of an i7-4790 running under Ubuntu 16.10.  Looking at the completion times (not all validated, but none invalid):

1.17 -> 2450 seconds

1.18 -> 1525 seconds

So the ratio is 0.622; very nice.
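
An elapsed-time ratio can be restated as time saved per task or as a throughput gain, which are easy to confuse. A small sketch using only the figures reported in this thread (0.533 and 0.642 from archae86, and the two GTX 960 times above); the function name is hypothetical.

```python
def summarize(ratio):
    """Turn an elapsed-time ratio (new / old) into time saved and throughput gain."""
    time_saved = 1.0 - ratio
    throughput_gain = 1.0 / ratio - 1.0
    return time_saved, throughput_gain

# Ratios reported in this thread; the GTX 960 one is computed from the times above.
reported = {
    "GTX 970": 0.533,
    "GTX 750Ti": 0.642,
    "GTX 960": 1525 / 2450,
}

for card, ratio in reported.items():
    saved, gain = summarize(ratio)
    print(f"{card}: ratio {ratio:.3f}, time saved {saved:.1%}, throughput gain {gain:.1%}")
```

For the GTX 960 figures, a ratio of 0.622 means each task takes about 38% less time, which works out to roughly 61% more tasks completed in the same period.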

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117678639283
RAC: 35180239

archae86 wrote:

The elapsed time of 1.18 as a fraction of 1.17 was considerably more improved for my 970

GTX 970 0.533

GTX 750Ti 0.642

Mind you, 0.642 is nothing to sneeze at, but I had not realized until you asked the question just how drastic the 970 improvement was.

Thanks very much for that!

The 970 value is pretty much in line with what Holmis reported in the Technical News thread.  I guess 970 owners in general will be highly delighted :-).

I have a couple of 750Tis, so it will be good to see them improve on the rather poor performance they showed with the previous version.  I might actually fire up a GTX650 and see if it gets any benefit.

Cheers,
Gary.
