Observations on FGRBP1 1.18 for Windows

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Gary Roberts wrote:
The 970 value is pretty much in line with what Holmis reported in the Technical News thread.  I guess 970 owners in general will be highly delighted :-).

And that was an eyeball estimate based on one task. Keeping up with the eyeballing, I get the feeling that subsequent tasks have taken a bit more time, so it's getting closer to Archae's calculated 53% runtime compared to 1.17.

And I do feel delighted and very happy about the new beta version!

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

RX-480 running x3 (Ubuntu 16.04 and amdgpu-pro-16.50)

Average times from 1864s (1.17) to 1369s (1.18) (over 20 tasks)

That's also about a 27% reduction in runtime, well done Christophe.
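
To spell out the arithmetic, a minimal Python sketch (the two averages are just the figures quoted above):

    # Runtime comparison of FGRBP1 1.17 vs 1.18, using the averages above.
    avg_117 = 1864.0  # average seconds per task under 1.17 (over 20 tasks)
    avg_118 = 1369.0  # average seconds per task under 1.18

    ratio = avg_118 / avg_117          # 1.18 runtime as a fraction of 1.17
    reduction = (1.0 - ratio) * 100.0  # percent less time per task

    print(f"ratio {ratio:.3f}, reduction {reduction:.1f}%")
    # prints: ratio 0.734, reduction 26.6% -- i.e. "about 27%"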

This may tempt me to get the HD7990 out of retirement.

The new baby is off and running! 

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225624931
RAC: 1051220

archae86 wrote:
As it happens, the single-GPU host, which has had some occasional troubles in recent weeks, but had run fine for days, ran OK for about 20 minutes with a single 1.18 task.  But very shortly after I added a second it failed.  Further it failed in such a way that subsequent 1.18 and 1.17 tasks failed very quickly.  In other words the system had somehow gotten into a lethal condition.  A full power-down reboot cleared this condition, and the system resumed apparently normal processing.  But shortly after I got brave and again allowed two 1.18 tasks, it failed again.  Now I can't do further testing as the project denies it new work until tomorrow on the grounds that the daily quota of 12 tasks is exceeded.  I understand that failures lower the task limit, but the system has only 27 error returns reported against it today, which I would not have expected to put me out of business.

So after midnight UTC I was indeed able to download fresh work to this problem host.  On examination I learned I had been overclocking it, to the tune of +150 core clock, +400 memory clock in Afterburner terms.  As I had several indications that the condition of the system had been marginal running previous work, I turned these down to +50/+250.

It has run a couple of complete 1.18 WUs at 1X.  A direct comparison of 1.17 vs 1.18 running 1X on this host shows the 1.18 running a reported 4 degrees C hotter.  If I was right at the edge before, this could easily have pushed me over.

However I had a disturbing incident while running 1.18 at 2X at these slower clocks.  As I had started two WUs together, I intended to pause one after both had got about 15% done, in order to get a better use of GPU by not having them in synch.  Promptly on suspending one WU (so another ready one would start) the screen went black for several seconds.  Afterward I got a pop up announcing a driver restart.  Also the active tasks (and the couple I had on deck but not suspended) promptly errored out.

Now that was really odd.  I think (but can't be sure) that I've lowered the clocks enough for it not to be a simple excess clock speed problem.  Possibly 1.18 is imperfectly compatible with this system in some other way.  Anyway, I intend to build some 1X time for confidence before exploring 2X behavior on this host again.

Meanwhile six other cards on three other hosts are happily running 2X 1.18 tasks at their unchanged clock settings.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

My two GTX 750 Ti's are each fed by a core of an i7-4771 running under Win7 64-bit (373.06 drivers). 

These are minimally-overclocked cards running a single WU each at 1210 MHz, and are cool at 53 C. 

1.17 -> 4350 seconds

1.18 -> 2550 seconds

So the ratio is 0.586

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

Holmis wrote:
A more optimized app will/should put more stress on the hardware, so it might be a good time to check the running conditions and maybe make some adjustments to running parameters. Even a validate error once in a while might "invalidate" an overclock, as the wasted time might be more than the gain from the overclock.

I'm sure you're right: the problem is not with the app (1.17 or 1.18) but with my card. It's very old by now and I guess it's heading for recycling soon.

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1887
Credit: 1411631173
RAC: 1183080

I hope this new version works faster for my cards so I can get back to work here to start my 13th year off full blast.

I tried a few on my 660Ti and 560Ti OC's and they took hours just running X2 on the quads.

That was bad enough that I didn't even try any on my 650Ti's or the 550Ti, and they are all OC or SC.

I miss the good old days with GPU-only BRPs.

n12365
Joined: 4 Mar 16
Posts: 26
Credit: 6491436572
RAC: 0

My 1060 (6GB) is running two at a time with no CPU tasks on an i5-4690 under Windows 10 with 376.33 drivers.

1.17: 2529.29 seconds

1.18: 1610.58 seconds

Ratio: 0.637

walton748
Joined: 1 Mar 10
Posts: 94
Credit: 1512300642
RAC: 3463993

archae86 wrote:

However I had a disturbing incident while running 1.18 at 2X at these slower clocks.  As I had started two WUs together, I intended to pause one after both had got about 15% done, in order to get a better use of GPU by not having them in synch.  Promptly on suspending one WU (so another ready one would start) the screen went black for several seconds.  Afterward I got a pop up announcing a driver restart.  Also the active tasks (and the couple I had on deck but not suspended) promptly errored out.

I observed the very same on this host. It is running Windows 7 with no overclock on the card at all, all 4 processor cores nailed to a 4.2 GHz turbo instead of 4.4 GHz through BIOS and Windows settings, RAM at its XMP setting, and temperatures in the mid-60s degrees C at most, mostly lower. The system sports a relatively new NVidia 1070 by MSI, NVidia driver 376.33.

The system had been up and crunching for a week or so without interruption or any other use. The errored-out WUs were standard 1.17s.

One more observation: I suffered these driver resets by Windows when I shut down BOINC/Einstein a while ago, but with no effect on the Einstein tasks then (well... they weren't being worked on any longer when that happened, I'd say).

I could not reproduce the suspend error right now on my other host, but this one has been restarted recently, and the driver failure/reset on BOINC/Einstein shutdown applied here, too. I just did not pay too much attention then.

One more thing: it looks like validate failures start to occur after a week's uptime or so. This qualifies as an observation for me, as I am used to my systems crunching on for weeks producing only valids, but in reality I have seen it only twice now. For what that is worth. On the other hand, if I am right, the problem builds up so slowly that it is hard to diagnose quickly.

Cheers,

Walton

walton748
Joined: 1 Mar 10
Posts: 94
Credit: 1512300642
RAC: 3463993

I could reproduce the suspend failure on both hosts now. Uptime (= crunching time) is like one hour, so my suspicion probably points in the wrong direction.

<edit>Wow, that really wrecked it - did not recover after the driver reset</edit>

<edit2>I mean, the system as such recovered, but processing Einstein FGRBP1 did not. Sorry for being unspecific.</edit2>

Cheers,

Walton

Shafa
Joined: 31 May 05
Posts: 53
Credit: 627005014
RAC: 0

For 64-bit Linux on AMD, DDR2 or DDR3

Ratio 1.18 / 1.17 is mostly:

nVidia Fermi: 0.62-0.68 (GTX 460, 570, 580, 590)

nVidia Kepler: slightly below 0.6 (GTX 760)

nVidia Quadro (Maxwell, M1000M, laptop): 1.18 N/A, tasks always finish with errors after 17 seconds
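
As a rough way to eyeball the spread of speedups reported in this thread, a small Python sketch; the ratios are just the figures quoted in the posts above (a mid-point is assumed where a range was given, and the Quadro is left out since 1.18 only errors out there):

    # 1.18/1.17 runtime ratios as reported in this thread (lower = faster 1.18).
    reports = {
        "RX-480 x3 (AgentB)":        1369.0 / 1864.0,    # ~0.734
        "GTX 750 Ti (Jim1348)":      2550.0 / 4350.0,    # ~0.586
        "GTX 1060 6GB (n12365)":     1610.58 / 2529.29,  # ~0.637
        "Fermi GTX 460-590 (Shafa)": 0.65,               # quoted range 0.62-0.68
        "Kepler GTX 760 (Shafa)":    0.60,               # "slightly below 0.6"
    }

    for host, ratio in reports.items():
        print(f"{host}: ratio {ratio:.3f}, runtime cut {100 * (1 - ratio):.0f}%")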
