A walk to the AMD side

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7221544931
RAC: 972157

Peter van Kalleveen wrote:
How does this respond to the settings of amount of cpu threads you can specify in the boinc App_config? Did you vary this with the affinity or did you use the same ratio while testing?

I don't run any BOINC work on the machine save for Einstein Gamma-Ray Pulsar GPU units.  So there is no opportunity for BOINC to manage the amount of non-GPU work launched--it is not launching any.

To answer your question more directly, I had to go look to see what I had in the way of an app_config.xml.  The answer is that I have not changed it since May, 2017, when I configured <fraction_done_exact/> for three Einstein applications.  So nothing at all about "settings of amount of cpu threads" there on my machines. 

Peter van Kalleveen
Joined: 15 Jan 19
Posts: 45
Credit: 250329645
RAC: 0

archae86 wrote:

Cores   Average elapsed time
1       31:14
2       30:18
3       30:12
4       30:14
5       30:16
6       30:17

Your experiment with feeding the GPU different amounts of CPU got me thinking.

I had already noticed a bit of a boost from disabling SMT on the CPU, but you lose half of your threads, so overall productivity, besides running BOINC, suffers a decent amount. So I just kept it on after that and gave each GPU WU one CPU thread to crunch with.

Now my Threadripper 2950X has a lot of cores, but it probably has less throughput per thread than the i5 has with a complete core.

Also, the Radeon VII has much higher throughput than the RX 570, so that would mean the GPU is still running underfed.

Since you had a decent boost going from one to two cores, I decided to follow.

I now have 4 threads/2 cores assigned to each WU on the GPU. I run 2 WUs because at 3 the VII would become unstable over time and give more errors or invalids.

I used to be in the upper 6 min or low 7 min for an FGRP task (I also run the GPU underclocked, so I know that's a bit on the long side).

I got it down to 6.17 min, so that's anywhere between half a minute and almost a full minute faster on average.

That's worth the extra cores/threads.

cecht
Joined: 7 Mar 18
Posts: 1533
Credit: 2902258871
RAC: 2177360

On my Linux system with a 2-core, 4-thread Pentium G5600, I have been running two RX 570s in mining BIOS with 0.5 GPU and 0.25 CPU specified in app_config.xml. In accord with past discussion threads, I found no real difference in task times when I increased to 1 CPU per task:

Cores   Average elapsed time
0.25    20:06
1       20:04

At 1 core/task, CPU utilization is ~11% on each core (thread).
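For anyone who has not set these values before, here is a minimal sketch of writing out an app_config.xml with the 0.5 GPU / 0.25 CPU settings mentioned above. The gpu_usage and cpu_usage elements are the ones BOINC documents for app_config.xml. The application name hsgamma_FGRPB1G is an assumption and should be checked against the names in your own client_state.xml, and the helper function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml text for a single GPU application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage 0.5 means two tasks share one GPU.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    # cpu_usage is the CPU budget BOINC reserves per GPU task.
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# ASSUMPTION: "hsgamma_FGRPB1G" is a guessed app name, verify it locally.
print(build_app_config("hsgamma_FGRPB1G", 0.5, 0.25))
```

As far as I understand it, cpu_usage only tells the BOINC client how much CPU to budget when deciding what to launch. It does not cap what the GPU support task actually consumes, which would be consistent with 0.25 and 1 giving the same times here.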

How does no difference with fractional cores jibe with faster times with more than 1 core per task?

I'm going to try 0.33 GPU per task now.

EDIT (addition):
With 3x tasks, times were a little better for a few extra watts:

Cores   Average elapsed time
0.5     29:40

and CPU utilization was about 20%.

With 2x tasks running on 2 CPUs each (so I could run only one GPU), task times were essentially the same as with the other 2x runs:

Cores   Average elapsed time
2       20:07
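
Because the 3x run has longer per-task times but more tasks in flight, the tables above are easier to compare as throughput. A small sketch, using only the elapsed times reported in this post and assuming that the listed time is the average per task while 2 or 3 tasks run concurrently on one GPU (function names are illustrative):

```python
def to_minutes(mmss: str) -> float:
    """Convert an 'mm:ss' elapsed time to minutes."""
    minutes, seconds = mmss.split(":")
    return int(minutes) + int(seconds) / 60


def tasks_per_hour(concurrent: int, elapsed: str) -> float:
    """Tasks completed per hour on one GPU running `concurrent` tasks at once."""
    return concurrent * 60 / to_minutes(elapsed)


# (label, concurrent tasks per GPU, average elapsed time) from the tables above
runs = [
    ("2x, 0.25 CPU", 2, "20:06"),
    ("2x, 1 CPU", 2, "20:04"),
    ("3x, 0.5 CPU", 3, "29:40"),
    ("2x, 2 CPU", 2, "20:07"),
]

for label, concurrent, elapsed in runs:
    print(f"{label}: {tasks_per_hour(concurrent, elapsed):.2f} tasks/hour per GPU")
```

On these numbers the 3x run comes out slightly ahead of every 2x run, by a bit under 2%.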

Ideas are not fixed, nor should they be; we live in model-dependent reality.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3395826540
RAC: 2900097

archae86 wrote:

A set of observations on CPU affinity restrictions:

I recently raised the multiplicity on my second RX 570 from 2X to 3X, and got a nice little productivity improvement, with small enough power consumption increase to make me decide to keep it.  While I was fiddling, I decided to spend a couple of days running restriction (by the CPU affinity controls in Process Lasso) to varying numbers of cores on the 6-core non-hyperthreaded i5-9400F.  I averaged each test condition over several hours.  I continue to run the RX 570 at a -20% power limitation, imposed by MSIAfterburner, and with an Afterburner fan curve which has it reporting 62C GPU temperature most of the time.  This is an XFX brand RX 570 with the BIOS switch on the "mining" position.

The results were simple: restricting to a single core does noticeable harm, but the other five options are surprisingly similar with the (slightly) best result observed with three allowed cores (the same as the number of GPU tasks).  My longstanding observation that restricting the GPU support task to anything less than all available cores always does harm was not borne out.  This does support my long-standing advice that it is better to test than just to invoke "known truths" for these settings.

Cores   Average elapsed time
1       31:14
2       30:18
3       30:12
4       30:14
5       30:16
6       30:17

I can see that the more concurrent tasks a GPU is running, the more dedicated CPU threads you'll need. The crunch time at the end is a period where limited CPU time for GPU tasks can slow things down.

I run 2x tasks with 1 CPU thread open via Process Lasso. I didn't check GPU times like you did, but for the most part GPU utilization stayed pegged with 1 CPU open, with dips during task switching, etc.
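
To put the quoted affinity table in relative terms, here is a rough sketch that expresses each row as a percent change against the single-core case. The times are the ones from the table. The function names are illustrative, and nothing here accounts for run-to-run noise.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' elapsed time to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_vs_baseline(table: dict, baseline_key: int) -> dict:
    """Percent change in elapsed time for each row relative to one chosen row."""
    base = to_seconds(table[baseline_key])
    return {
        cores: 100 * (to_seconds(elapsed) - base) / base
        for cores, elapsed in table.items()
    }


# Allowed cores -> average elapsed time, from the quoted table
elapsed_by_cores = {1: "31:14", 2: "30:18", 3: "30:12", 4: "30:14", 5: "30:16", 6: "30:17"}

for cores, change in percent_vs_baseline(elapsed_by_cores, 1).items():
    print(f"{cores} core(s): {change:+.2f}% vs 1 core")
```

The 2 to 6 core rows all land about 3% faster than one core and within roughly a third of a percent of each other, so the ordering among them may not mean much.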

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7221544931
RAC: 972157

My second RX 570 machine had an anomaly during the night.  When I did my morning productivity logging I noticed a downtick in fleet production.  All three current tasks on the affected machine showed about six hours of run time, the GPU reported temperature was down by tens of degrees from usual, and the wall power meter showed 45 watts instead of 168.

Rebooting put the power levels up part way, but not the full way, so I downloaded and installed the latest AMD driver, selecting "clean install" in case some setting had gone amiss.

After all that, I have over an hour of successful running.  Oddly, the elapsed times, which had been running about 30:10, have now improved to about 28:02.

As rebooting gave me some of the usual Windows indications of update activity, I speculate that perhaps my middle of the night "failure" may have been a consequence of Windows updating activity replacing or altering the state of my AMD driver in a way inconsistent with Einstein success.

mikey
Joined: 22 Jan 05
Posts: 12682
Credit: 1839086599
RAC: 3840

archae86 wrote:

My second RX 570 machine had an anomaly during the night.  When I did my morning productivity logging I noticed a downtick in fleet production.  All three current tasks on the affected machine showed about six hours of run time, the GPU reported temperature was down by tens of degrees from usual, and the wall power meter showed 45 watts instead of 168.

Rebooting put the power levels up part way, but not the full way, so I downloaded and installed the latest AMD driver, selecting "clean install" in case some setting had gone amiss.

After all that, I have over an hour of successful running.  Oddly, the elapsed times, which had been running about 30:10, have now improved to about 28:02.

As rebooting gave me some of the usual Windows indications of update activity, I speculate that perhaps my middle of the night "failure" may have been a consequence of Windows updating activity replacing or altering the state of my AMD driver in a way inconsistent with Einstein success. 

One would think that by now MS KNOWS of the problem and would STOP doing it, but noooo the idiots continue on their merry way like mice following some guy with a flute!!

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7221544931
RAC: 972157

cecht wrote:
Sorry for your down time. Hope you get that new card crunching soon.  When you do, don't forget to flip the dual BIOS switch to the mining position.  As far as I know, XFX are the only 570 cards that offer that golden opportunity for faster tasks and lower power with a pre-loaded mining BIOS.

So today I finally tried to flip the switch on my first RX 570, and I think it is now worse (whereas the second one eventually got noticeably better).  Possibly it was delivered to me in mining position.  Which way is which?

koschi
Joined: 17 Mar 05
Posts: 86
Credit: 1688497555
RAC: 824777

On the Sapphire RX580 the mining or silent BIOS comes with a 122W power limit, while the default BIOS was 170-180'ish... So flipping the switch and booting the machine, it was instantly clear that it worked. Temps were low, clocks were lower, PL was fixed at 122W, but just using 80W under load. Awesome!

cecht
Joined: 7 Mar 18
Posts: 1533
Credit: 2902258871
RAC: 2177360

archae86 wrote:
Possibly it was delivered to me in mining position.  Which way is which?

Mining position is toward the ports (what I call the front of the card, which is at the back of the machine).  It's odd though - you should have seen a big immediate difference, as Koschi described it, whichever way the switch was flipped.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7221544931
RAC: 972157

I currently operate a flotilla of just three Einstein machines, all single-GPU with AMD GPUs (one Radeon VII and two RX 570s).  As of yesterday, I've given up on all three of them as regards running 3X on current Einstein Gamma-Ray Pulsar (1.18) work.

The detailed symptoms vary a bit from machine to machine, and as all three can run successfully (with improved productivity and power efficiency compared to 2X on the same work) for hours or days, I am not dead sure as to cause and effect.

On the two RX 570 machines, the two primary seemingly 3X-related issues of concern are:

1. "Validate error"  -- the kind that error out without comparison when the sanity check done after quorum fulfillment flunks.
2. Sloth Mode -- From one minute to the next, the GPU reported temperature drops by roughly 10C, the CPU consumption by the support task drops by over a factor of ten, and the completion elapsed time for tasks about triples or worse, but the core clock and memory clock rates reported by GPU-Z remain unchanged.

In many months of running, on five different Nvidia card models and two different AMD card models, I'm accustomed to getting zero "validate error" cases (yes, I get very roughly 1% "Completed, marked as invalid", but that is something different), so that alone I find troublesome.

The Sloth Mode behavior is puzzling, and I frankly suspect the clock rate reported is false, as otherwise the low power consumption seems implausible.  As I have sometimes not noticed it for hours, and it requires at least a reboot, and sometimes a driver re-install to escape, Sloth mode is unacceptable to me at any appreciable rate.  I've not gotten so much as three straight days of running 3X on an RX 570 machine without dropping into Sloth Mode.
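
Since the drop into Sloth Mode can go unnoticed for hours, one possible way to catch it sooner is to compare recent task elapsed times against a known-good baseline. The sketch below is only an illustration: the 30:10 baseline is the typical 3X time mentioned earlier in this thread, the 2x threshold is an arbitrary choice, the sample list is made up, and how the elapsed times get collected from BOINC is left out entirely.

```python
def to_seconds(clock: str) -> int:
    """Convert 'mm:ss' or 'h:mm:ss' to seconds."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total


def find_sloth_tasks(elapsed_times, baseline, factor=2.0):
    """Return the elapsed times that exceed `factor` times the baseline."""
    limit = factor * to_seconds(baseline)
    return [t for t in elapsed_times if to_seconds(t) > limit]


# HYPOTHETICAL sample values, not measurements from any machine in this thread
recent = ["30:08", "30:15", "1:31:40", "1:35:02"]
suspects = find_sloth_tasks(recent, baseline="30:10")
if suspects:
    print("Possible Sloth Mode, slow tasks:", suspects)
```

A threshold of 2x sits well below the "about triples or worse" slowdown described above, while staying clear of normal variation.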

On my Nvidia cards running this application, 3X gave too little a productivity boost to leave me wanting to let the third instance have enough CPU to be happy, but the AMD cards all gave a nice little increment of performance.  It is a pity to give it up.

Perhaps others might mention here whether they succeed or fail in running 3X for Einstein GRP work on Polaris (e.g. RX 570) or Radeon VII cards.
