GPU TDP as proxy for comparing task efficiency?

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18747916143
RAC: 7070718

The stats "days" at BoincStats or FreeDC depend on when they pull the exported stats files generated each day by the project. If you go to BoincStats, select the Einstein project in the "Project Stats info" menu on the left, and then click on the "Last Update" column, it will take you to the actual dates and times of the export files. You can determine the same thing for when FreeDC scraped the project stats files.

Based on which stats website you are going to use for your daily totals, use the cumulative kWh counter on the Kill-A-Watt meter and reset it to zero just after the stats site has updated its stats file. Then check the kWh tally the next day, after 24 hours have elapsed. That way you will know exactly how much power was used on the host to generate the 24-hour credit total.
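The bookkeeping that gives you is just a division; a minimal Python sketch, with names of my own choosing, purely illustrative:

def credit_per_kwh(credit_start, credit_end, kwh_on_meter):
    """Daily credit gain (difference between two consecutive stats-site exports)
    divided by the kWh the Kill-A-Watt accumulated over that same 24-hour window."""
    return (credit_end - credit_start) / kwh_on_meter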

 

solling2
Joined: 20 Nov 14
Posts: 219
Credit: 1577601307
RAC: 20609

cecht wrote:
GPU mods (OC, power limit, under V, etc.) - A big bucket of worms. I'll either run default settings for each card or pre-optimize settings for each card and run the evaluations with those. Anybody out there have any thoughts on the best approach for wrestling with these types of variables?

When comparing a given GPU before and after undervolting, I prefer a quick look at the GPU temperature as a proxy. Undervolting results in a temperature drop of about 5 percent with stable crunching times. Most welcome! A cooler GPU can only be drawing less power. The gain seems to come from cutting power consumption spikes, as someone argued on another board. For the actual overall system power draw, of course, archae86's advice is the sound approach.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225464931
RAC: 1046540

Actual production numbers are compromised by a number of vexing effects.

1. Not all tasks take the same time.

2. Not all tasks validate.

3. Delay between completion and validation varies considerably, mostly because of quorum partner return time and success, and occasionally because of Einstein site software status.

4. The common statistics sites update from a bulk-produced file made available at Einstein.  There are time-sampling variations both in production of the file and in the reading and post composition by the web sites.

I've used two specific approaches:

1. During short tuning adjustments, when I want high accuracy and resolution, I populate an Excel spreadsheet with data transcribed from the history tab of a copy of BoincTasks running on the same machine.  That lets me get actual elapsed-time averages for the tasks run at specific conditions of interest (not necessarily on date boundaries).  It also gives me a handy data set to check whether the current Einstein files have within-file variation in work content, which must be controlled for to make accurate comparisons.  I ignore the statistics sites and even my own Einstein account pages completely when working this way.

2. For ongoing health monitoring, I use a much less laborious method.  It still involves a spreadsheet, but for each logging point I enter the date and time, plus my cumulative credit on the machine in question as displayed by the Einstein web page for that machine at my account.  To correct for variation in pending queue depth, I also log the pending task count (as displayed in the recently resumed counter on the Einstein tasks page for that host), and multiply the current credit value per task by the number pending, adding that to my official credit.
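In code terms, the pending correction and the production rate between two logging points look like this (a rough Python sketch; the names are mine, not anything from BOINC itself):

from datetime import datetime

def corrected_credit(official_credit, pending_tasks, credit_per_task):
    """Cumulative credit adjusted for the pending-validation queue, as in method 2."""
    return official_credit + pending_tasks * credit_per_task

def credit_per_day(credit_start, time_start: datetime, credit_end, time_end: datetime):
    """Average production rate between two logging points."""
    elapsed_days = (time_end - time_start).total_seconds() / 86400.0
    return (credit_end - credit_start) / elapsed_days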

I should mention that in general I've not undertaken to control for error caused by invalid results.  On a fleet with a current RAC of between 1 and 2 million, the web page commonly shows between 5 and 25 invalid results.  I think the rate is low enough, and sufficiently unrelated to the experimental conditions imposed on my machines, that ignoring it improves accuracy, as I think much of the variation comes from quorum partner variations.

Regarding power, it is a little late for me to mention it, but there is an advantage to meters which have a reset button (so the measurement can start at a moment of your choosing when the system is stably running the condition of interest) and have appropriate averaging and totalization.  Some do, some don't.

 

cecht
Joined: 7 Mar 18
Posts: 1535
Credit: 2908745421
RAC: 2134148

Thank you all for the excellent suggestions. This should keep me busy for a while!

Ideas are not fixed, nor should they be; we live in model-dependent reality.

cecht
Joined: 7 Mar 18
Posts: 1535
Credit: 2908745421
RAC: 2134148

UPDATE: Well, I haven't gotten a second host, but I did get an AMD RX 570 (4GB) to compare power efficiencies against my RX 460 (2GB). My host (https://einsteinathome.org/host/12642118) only has one PCIe x16 and one PCIe x8 slot, hence those two cards. (For some reason, this host is listed on E@H as having two RX 570 coprocessors.)  In fact, once reality set in, I scrapped most of my original testing plan. I dropped the idea of calculating power efficiency as a function of BOINC credits and instead measured tasks per kWh, which I realized is all that is necessary to compare relative GPU power efficiencies; that is how HAL9000 at Seti@Home does it.

My prior idea was to run all comparisons using the cards' default parameters, but running dual cards at default settings gave me an unstable system, so I had to underclock the cards in the dual configuration. Solling2, you mentioned using undervolting to increase power efficiency, but I found that underclocking has similar benefits, especially when running simultaneous (2x) tasks (details below).

METHODS & RESULTS:
My host has a 4-core CPU, Windows 7, with AMD 18.9.3 Adrenalin drivers. My E@H web preferences were set to run only FGRPB1G tasks. BOINC app_config was set up for a cpu_usage of 0.4 (with all cores available to BOINC), and gpu_usage was set to either 1 (1x = one task at a time) or 0.5 (2x, two concurrent tasks). I did get a power meter to measure kWh, so that part of the plan held up! Tests were run with the LATeah1022L, 1023L, 1034-1039L workunit series, which all had comparable computation times.
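For reference, here is a rough Python sketch that writes an app_config.xml like the one described above. The app name "hsgamma_FGRPB1G" is an assumption for the FGRPB1G GPU app; check client_state.xml in your BOINC data directory for the exact name before trying anything like this:

from pathlib import Path

APP_CONFIG_TEMPLATE = """<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>{gpu_usage}</gpu_usage>
      <cpu_usage>{cpu_usage}</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""

def write_app_config(project_dir, gpu_usage=0.5, cpu_usage=0.4):
    """Drop an app_config.xml into the Einstein project directory
    (e.g. .../projects/einstein.phys.uwm.edu), then re-read config files in BOINC Manager."""
    xml = APP_CONFIG_TEMPLATE.format(gpu_usage=gpu_usage, cpu_usage=cpu_usage)
    Path(project_dir, "app_config.xml").write_text(xml)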

I calculated the number of tasks per 24 hr by using the average task run time, in seconds, sampled from E@H results logged during the run time of the experiment (samples: n=20 for single GPU, n=70-80 for dual GPUs). This way of measuring task production gave accurate comparisons among the various GPU run conditions and time intervals.
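For anyone who wants to reproduce the arithmetic behind the tables below, here is a minimal Python sketch of how the derived numbers fall out of the average task time and the measured (or rated) wattage; the function names are just mine:

def tasks_per_24h(avg_sec_per_task):
    """Effective daily throughput from the average per-task run time."""
    return 86400.0 / avg_sec_per_task

def tasks_per_kwh(avg_sec_per_task, avg_pc_watts):
    """Tasks produced per kWh drawn at the wall over 24 hours."""
    kwh_per_24h = avg_pc_watts * 24.0 / 1000.0
    return tasks_per_24h(avg_sec_per_task) / kwh_per_24h

def tasks_per_tdp_kwh(avg_sec_per_task, tdp_watts):
    """Same metric, but using the card's rated TDP in place of measured wall power."""
    return tasks_per_24h(avg_sec_per_task) / (tdp_watts * 24.0 / 1000.0)

# Example, RX 460 at 1x (first column of the table below):
# tasks_per_24h(1328)          -> ~65.1
# tasks_per_kwh(1328, 177)     -> ~15.3
# tasks_per_tdp_kwh(1328, 75)  -> ~36.1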

The cards were initially run at their default factory settings: RX460 @ 1125MHz; RX570 @ 1286MHz; AMD's Global WattMan settings were also default.

Results of cards in single configuration:

card           RX 460    RX 460    RX 570    RX 570
GPU mods       default   default   default   default
gpu_usage      1         0.5       1         0.5
tasks/24hr     65.1      66.1      130       138
avg sec/task   1328      1308      667       624
avg PC W       177       183       280       291
TDP (W)        75        75        125       125
24hr PC kWh    4.25      4.4       6.73      6.98
tasks/kWh      15.3      15        19.2      19.8
tasks/TDPkWh   36.1      36.7      43.2      46.2

(average PC watts with no BOINC running was 100W with either card)

When comparing 2x vs. 1x tasks, the RX 460 had a modest 1.5% gain in task productivity (tasks/24h), but lost 2% in task efficiency (tasks/kWh) because of 3.5% extra wattage. The RX 570 had a nice 7% increase in tasks/24h, and so gained 3% in tasks/kWh with its 3.7% extra power draw at 2x.

For tasks/kWh card comparisons at 1x, the RX570 was 26% more efficient than the RX460; at 2x it was 32% more efficient.
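The percentages above are just ratios of the table values; a quick sanity-check sketch (the helper name is mine):

def pct_change(new, old):
    """Percent change of 'new' relative to 'old'."""
    return (new / old - 1.0) * 100.0

# pct_change(66.1, 65.1)  -> ~1.5   # RX 460 tasks/24h, 2x vs 1x
# pct_change(19.8, 19.2)  -> ~3     # RX 570 tasks/kWh, 2x vs 1x
# pct_change(19.2, 15.3)  -> ~26    # RX 570 vs RX 460 tasks/kWh at 1x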
 
My original query in this thread was whether cards' TDPs could be used as proxies for comparing task power efficiencies. Using TDP values to calculate a 24-hr TDP kWh, the RX 570 at 1x was 19% more efficient than the RX 460 in tasks/TDPkWh, and at 2x was 26% more. So I can't say TDP is a great proxy, but I suppose it's not too bad in a pinch.

The interesting part came when I tried to run the cards together. At the cards' default settings, my host was pulling 367W. The power supply is rated for 378W output, but one of the cards crashed within about 12 hours, rebooting the system. I assumed the reboot was caused by a power availability issue. To cut the cards' power usage, I tried adjusting settings in WattMan for power limits and maximum clock speeds. I settled on the following "good enough" values, though I doubt they are optimized:
RX460 @ 1104 MHz (-2%) and 0% power limit; RX570 @ 1210 MHz (-6%) and -5% power limit.
(In the Windows task bar, the AMD controller's graphics profile was set to "Optimize performance", but I don't know whether customized settings in AMD WattMan overrode this.)

Results of RX460 & RX570 cards in dual configuration:

GPU mods       default   underclocked   underclocked
gpu_usage      1         1              0.5
tasks/24hr     198       189            207
avg sec/task   871       916            835
avg PC W       367       330            338
24hr PC kWh    8.81      7.93           8.11
tasks/kWh      22.5      23.8           25.5

(average PC watts with no BOINC running was 124W)

Compared to default GPU settings running 1x tasks, underclocking lowered tasks/24h by 5% but increased tasks/kWh by 6% (from a 10% wattage savings). Running 2x tasks, however, increased PC wattage by only 2% and increased tasks/24h by a whopping 10%, giving a tasks/kWh efficiency gain of 7%. So, if I interpret this right, capping the power requirements of these AMD cards while running 2x tasks makes them not just more productive but also more power efficient. Win-win! I didn't take measurements of the individual underclocked cards, but I think the take-home lesson with the combined cards is clear. No? Maybe? Has anybody else noticed this effect of underclocking AMD cards? Do NVIDIA cards respond the same way? Would an RX 460 perform better with 4GB of memory instead of 2GB?

I did run the RX 460 on this host under Windows 10, and it performed practically the same as under Win7. There was a smidgen of a performance increase with Win7, but I'm assuming that was measurement noise. I never could get the 460 to run 2x tasks under Win10.

I haven't run the cards on Linux yet, but I hope to have that comparison in the not-too-terribly-distant future.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225464931
RAC: 1046540

Nice work.  I'll comment on some points you raised.

cecht wrote:
For some reason, this host is listed on E@H as having two RX 570 coprocessors.

I think it is a limitation of the current scheme that hosts with multiple disparate GPUs get reported with the right total number of cards but just one model identification.  For Nvidia cards, I think it reports the one with the highest CUDA "capability level", which is more an indicator of how recent the design is than of relative performance.

Quote:
The interesting part came when I tried to run the cards together. At the cards' default settings, my host was pulling 367W. The power supply is rated for 378W output, but one of the cards crashed within about 12 hours, rebooting the system. I assumed the reboot was caused by a power availability issue.

Going to that high a fraction of rated output is really pushing your luck.  I don't know about AMD GPUs, but my Nvidia GPUs have quite a lot of short-term variation, so they definitely spike power consumption well above the average.  They don't put very big output filter capacitors on power supplies these days, so you can't count on them to average the power consumption over a time period of even one second.  The faster-responding the monitoring I've used, the higher the spikes, and I don't know the limit.  Personally, I'd be reluctant to run a system at higher than about 75% of rated power for the output I'm using, which can be well under the total rated power, or not, depending on the model.
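As a rough worked example of that 75% rule of thumb applied to the numbers above (the 87% PSU efficiency is only an assumption, used to convert the wall reading to DC output):

psu_rated_output_w = 378        # PSU output rating quoted earlier in the thread
wall_draw_w = 367               # measured wall draw at default dual-card settings
assumed_psu_efficiency = 0.87   # assumption, not a measured value

dc_load_w = wall_draw_w * assumed_psu_efficiency   # ~319 W actually delivered by the PSU
budget_w = 0.75 * psu_rated_output_w               # ~284 W "comfortable" continuous load
print(f"DC load ~{dc_load_w:.0f} W vs. ~{budget_w:.0f} W comfortable budget")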

Quote:
Anybody else ever notice this effect of underclocking AMD cards? Do NVIDIA cards respond the same way?

I think on Nvidia cards that undervolting is a highly effective power efficiency improvement--right up to the point it gives errors.  This is just what one would expect from the basic physics of CMOS operation.  However, the user interface for those cards does not give direct control of clock frequency, but instead a sort of nudge.  One can also nudge the power consumption.  Taking the two together, plus parameters measured from current card operation, the card firmware (or maybe the driver) decides moment to moment where to set both the supply voltage and the clocks.

In my personal case, my recent practice has been to push up both the core clock and memory clock overclock request "nudge" parameters until I find failure, then back down a couple of steps.  Then, if I am interested in saving some power, I leave the overclock numbers alone and start lowering the "power limit" slider.

As it happens, I did that just this week.  Time-averaged observation of the core clock, voltage, and power consumption numbers showed clearly that on average the card ran at a lower voltage, lower core clock speed, and higher power efficiency for each step down the power limit that I tried from 90% down through 55%.  As I've been thinking I'm using a bit more of my household power than I like on Einstein computing, I chose the 65% step.  I confess that for this test I used average TDP as reported by GPU-Z, as getting better averages out of my power meter is a bit laborious, and separating the effects of the two different cards would have required a more tedious test plan.
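For anyone wanting to try the same thing on Linux from the command line rather than with the Windows sliders, here is a rough Python sketch using nvidia-smi (not what I actually used for the test above, and the 110 W figure is only an example):

import subprocess

def log_gpu_power(interval_s=5):
    """Periodically print power draw and SM clock so you can watch the time-averaged
    effect of a new power limit (Ctrl-C to stop)."""
    subprocess.run([
        "nvidia-smi",
        "--query-gpu=timestamp,index,power.draw,clocks.sm",
        "--format=csv",
        "-l", str(interval_s),
    ])

def set_power_limit(gpu_index, watts):
    """Set an absolute power cap in watts (needs root); the rough analogue of
    nudging the Windows 'power limit' slider down."""
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)], check=True)

# Example: cap GPU 0 at 110 W, then watch the averages settle.
# set_power_limit(0, 110)
# log_gpu_power()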

This particular experiment was on a host that currently had a GTX 1070 and a GTX 1060 6GB, as the RTX 2080 was swapped out for want of suitable work.

cecht
Joined: 7 Mar 18
Posts: 1535
Credit: 2908745421
RAC: 2134148

archae86 wrote:
I think it is a limitation of the current scheme that hosts with multiple disparate GPUs get reported with the right total number of cards but just one model identification.

But when the GPUs are really disparate, the scheme can identify each card: when I was running the RX 460 with an Nvidia GTX 750 in this host, both cards were listed.  The mystery deepens...

 

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Different vendors are displayed separately, e.g. AMD, Nvidia and Intel GPUs, but with more than one card from the same vendor only one model gets listed, with an "(x)" count telling how many cards from that vendor are present, not which models they are.

cecht
Joined: 7 Mar 18
Posts: 1535
Credit: 2908745421
RAC: 2134148

UPDATE: I got an Ubuntu/Linux system (Lubuntu 18.04, AMDGPU 18.5 All-Open drivers) running on that same host and ran comparisons with Windows 7 for the RX 570 card. As expected, the Linux system performed better, with 5% greater task productivity (#tasks/24hr, calculated as described earlier) and about 3% greater task power efficiency (#tasks/kWh). The Lubuntu task series was LATeah1042L, which had run profiles similar to the series used for the prior Win7 runs. Tasks were run at the BOINC default of 1 concurrent task.

RX 570
OS             Windows 7   Lubuntu 18.04
GPU mods       default     default
tasks/24hr     130         136
avg sec/task   667         637
avg PC W       280         285
24hr PC kWh    6.73        6.83
tasks/kWh      19.2        19.9

Ideas are not fixed, nor should they be; we live in model-dependent reality.
