EINSTEIN: Power/Production Ratio

FalconFly
Joined: 16 Feb 05
Posts: 191
Credit: 15650710
RAC: 0

Ah, okay, I was already wondering...

Guessing blindly, I can only assume that Einstein doesn't benefit much from high-clocked VRAM and VRAM bandwidth (which is essential for gaming performance and boosts all the fast cards in games).

So if the GPU is just turning out lots of GFLOPS, that seems to suffice for Einstein tasks. VRAM bandwidth doesn't seem to be the bottleneck, only the GPU (hence the extremely well performing "smaller" video cards plus the fairly low scaling beyond 2 parallel GPU tasks even on very high-end cards despite tons of surplus VRAM bandwidth).
The only exception seems to be IGPs without their own VRAM and APUs: their bandwidth is severely limited by system RAM by design, and having any CPU cores perform CPU tasks in addition to that means competing for their already limited bandwidth.

I've been sent a nice, current GPU specs list that contains all relevant data (in German).
It seems those cards (when sorted by GFLOPS) are a very good indicator of what to expect, plus it shows their rated peak power consumption.

http://www.pc-erfahrung.de/grafikkarte/vga-grafikrangliste.html

That Webpage Google Translated (English)

I'll take my shot with two HD7790s next month. Rated at 1790 GFLOPS single precision, DP capable (rated at 1/16 of SP performance), while drawing only 85 W peak... seems like a sweet spot to me. I also hope running two midrange cards is indeed more efficient than betting everything on one power-hungry monster (which, if my theory is correct, can't bring its extreme VRAM bandwidth advantage to bear on Einstein while drawing lots of power for its GPU, which is responsible for most of the energy consumption - other projects may differ, however).

Matt Giwer
Joined: 12 Dec 05
Posts: 144
Credit: 6891649
RAC: 0

OK back to our regularly scheduled topic with one last thing.

Years ago, when power came up on SETI, I did a quick back-of-the-envelope estimate.

For only 17 cents a day you can help ET find a friend in this lonely universe.

David Rapalyea
Joined: 3 Jan 13
Posts: 79
Credit: 63886821
RAC: 0

ET

I don't know how to use the 'quote' feature properly. I will experiment with it later. However, I agree with you on just about everything. You also wrote: "I'm not saying my situation in Germany applies to everyone either.. but might give you a different perspective on this issue: I don't know anyone around here who's got an AC in their private home."

I spent time in Frankfurt a few decades ago. [Like most of my fellow countrymen I am monolingual. And of course I lived and worked with English speakers almost exclusively. True: "Ich studiert Deutch drei monad in Folkhoschule, aba sprech kiene Deutch. ie Mein Rifen ist mit luft nicht." The taxi drivers said "Sprechen sie klaar.."] After that I just kept my mouth shut.

Anyway, I do not remember a single hot day. If you want hot go to DisneyLand Florida May to September. I don't know how people live there in the Summer. In point of fact, a lot of them don't; instead they have second homes where I live! I am retired here in the North Georgia mountains at 2,200 feet elevation in an inexpensive college/resort area. Atlanta is two hours South, 1,000 feet lower, usually close to ten degrees hotter and WAY more humid.

This year was warm in the mountains with many days of clear skies and 90F, humidity maybe 30-40%. People who live in the shade often do not use A/C, but I am perched on a small mountain with full Southern exposure. Even though nights drop to 70F I do not open the house up like most people. I like a constant 70F all the time. It's an indulgence. During a hot spell I pay $2 - $3 per day for full A/C. That doubles or triples my electric cost for three or four months. We pay about ten or eleven cents per kWh. I actually spend substantially more than that on my daily beer ration. Incidentally, I cannot find Binding Export or Kaiser Pilsner Private to save my life. More's the pity.

But no one uses resistance heating in the cooler months, though several of my neighbors have heat pumps for the Winter. I moved here from Chicago ten years ago and am amused when the Florida people go back down South to avoid our 'Winters'. Winters? HAH! I remember returning home from a foodie club on the North side of Chicago. I stopped to pay the toll and the electric window would not go up. It was 5F and forty five minutes drive home. Blustery dry snow. Even with full heat and driving slow we were within 10 minutes of taking a motel room.

Germany does not have Summer, and the North Georgia mountains do not have Winter. IMHO.

Arecibo 19 Oct 2012
Just Because The Space Alien Is Green
Does Not Mean You Should Go

David Rapalyea
Joined: 3 Jan 13
Posts: 79
Credit: 63886821
RAC: 0

Hi Matt,

You wrote: "Years ago, when power came up on SETI, I did a quick back-of-the-envelope estimate. For only 17 cents a day you can help ET find a friend in this lonely universe."

Less than a year ago I started looking for ET with my daily PC. I really, really liked the real-time analytical screen saver. My photo ID is a screen shot of a good triplet, for instance. I became enthusiastic and decided to get more serious. At the time I had never heard of GPU processing, so I decided just to throw lots of PC power at the problem. My investigations showed the Intel Core Duo 2.8 GHz was the best cost/production choice, so I got on eBay and got crazy.

How crazy, you might ask? How about TWENTY HP Small Form Factor business computers at $100 each from EBAY, shipping included. I must say it was one hell of a lot of fun rigging them up all over the house with WiFi etc etc. And they were not power hogs. They were sold as 'life cycle cost' units, and only drew about 70 Watts each to RAC at, well, 1,500 cobblestones per day. I remember how proud I was when I hit 250,000.

Jeezus Fracking Kryst. And my GPU initiation was not smooth either. I got a couple of GT 610s and was impressed. So I got a couple of GT 620s and they were a bust. Subsequently I acquired several GT 630's. These actually had a RAC of about six or seven thousand. That got my attention. And I got serious about my studies purchasing a variety of older Nvidia cards. Seeing how well they performed meant I was ready to go for a GPU dedicated machine. And that machine now RACs about 80,000 per day at 225 watts.
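
For anyone who wants to put a number on that jump, here is a rough sketch of credit per day per watt for the two setups. It assumes the 1,500 cobblestones per day and the 70 W both apply to a single HP machine (the post can be read either way), and the function name is made up for illustration.

```python
def credit_per_watt(credit_per_day, watts):
    """Daily credit earned per watt of sustained power draw."""
    return credit_per_day / watts

# Figures quoted in the post. Reading the HP numbers as per-machine
# values is an assumption, not something the post states outright.
hp_sff = credit_per_watt(1_500, 70)
gpu_box = credit_per_watt(80_000, 225)

print(f"HP small form factor: {hp_sff:.1f} Cr/day per W")
print(f"GPU machine:          {gpu_box:.1f} Cr/day per W")
print(f"Ratio:                {gpu_box / hp_sff:.1f}x")
```

Under that reading the GPU machine comes out well over ten times more productive per watt, before counting the purchase price of either setup.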

Now I have one or two dozen HP machines for donation to the local thrift shop. There are more costly hobbies, for sure. Ask me about boats or sports cars....

Arecibo 19 Oct 2012
Just Because The Space Alien Is Green
Does Not Mean You Should Go

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:
I don't know how to use the 'quote' feature properly.


When you write a post there is a link to the left of the editing area about how to use BBCode. Check it out and I'm sure you'll get the hang of it! =)
Just try not to quote more than necessary as it can take a lot of screen space and be a bit confusing.
Here's the same link: Use BBCode tags to format your text

Sorry for being a bit off topic here...

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 577510215
RAC: 196478

@FalconFly: I strongly disagree that Einstein doesn't need memory bandwidth. Just looking at GPU memory controller utilization I get this:

- GTX660Ti: ~40% at GPU-Grid, 60+% at Einstein
- GT640: 60+% at GPU-Grid, 99% at Einstein

Both cards feature slight core OCs to a bit over 1.1 GHz (so they need a bit more bandwidth) and are more bandwidth-constrained than regular cards. And indeed the GTX660Ti is being held back at GPU-Grid by memory bandwidth: the GTX660, which has the same bandwidth but ~30% less raw performance, is only outperformed by ~20%. This is a rather strong limitation for GPU-Grid, where single precision Flops usually translate directly into performance within a generation (CC level, to be more precise).

Einstein uses different code.. but placing more than a 50% higher load on the memory controller is far from "bandwidth doesn't matter".

What probably made you think about this assumption is that the big cards don't seem to do as well at Einstein as they should. Or put another way: actual throughput scales sub-linearly with theoretical Flops. And we can not even get the big GPUs fully loaded at all.

I suppose this scaling is an issue of some serial portions in the code or maybe communication between GPU and CPU. If this is true several medium-sized GPUs can be more efficient than a single big GPU by distributing these problematic parts among them.

And it's pretty normal for GP-GPU apps to draw less power than typical games. The obvious reason is that texture units, raster output units (ROPs) etc. are of no use in general computing and are hence not being used. But it's also difficult to port general algorithms to GPUs, make them scale among so many relatively slow processors (shaders) and extract good performance from them. Games are an "embarrassingly parallel" problem where this works perfectly (not exactly a coincidence, considering what these chips were made for). But as soon as an app struggles to employ all shaders all the time, it will automatically consume less power than typical games. Einstein is not very power-hungry, and may at least partially be limited by memory bandwidth.

Ahhh, now that I'm writing it: guys, Einstein needs GPU memory bandwidth!
It's right there in the numbers DSKAG. The credits and WUs may have changed a bit since then, but I don't think the fundamental algorithms have changed considering the small version increment. What I'm seeing there:

- GTX660 and GTX660Ti are almost tied. Both have the same bandwidth, but the Ti could do ~30% more Flops (if it could, that is..)
- GTX670 is barely faster than these two. It has 33% more bandwidth and the same Flops as the 660Ti. BUT the numbers come from Linux, which at that time seemed to be significantly slower on nVidia, roughly 20%.
- HD7850 is almost 50% faster than HD7790! Both cards have comparable Flops, but the HD7850 has a bit over 50% more bandwidth.

So I'd like to discourage you from buying HD7790s for Einstein. Sure, on paper it will consume less, just about as much as it would crunch less. But 130 W is not the real power consumption of that card, I have no idea why AMD quotes this number. Looking at typical power draw in games it's 96 W vs. 81 W. This makes the HD7850 by far the more efficient one. And they're even priced roughly equally, since their gaming performance is close. Note, however, that this applies to the non-boost version. The AMD boost is still quite power-inefficient. I'd rather get one without it, or turn it off manually and adjust sustained clocks & voltages as I want them.
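
To make the efficiency argument concrete, here is a rough sketch using only the numbers above: the HD7850 taken as 50% faster (the post says "almost", so this is an upper bound) and the 96 W vs. 81 W gaming power draw, which may not match power draw under Einstein. The function name is just for illustration.

```python
def relative_efficiency(speedup, watts_fast, watts_slow):
    """Throughput per watt of the faster card relative to the slower one."""
    return speedup * watts_slow / watts_fast

# HD7850 (96 W) vs. HD7790 (81 W), speedup rounded up to 1.5x as an assumption.
hd7850_vs_hd7790 = relative_efficiency(speedup=1.5, watts_fast=96, watts_slow=81)
print(f"HD7850 efficiency relative to HD7790: {hd7850_vs_hd7790:.2f}x")
```

Any result above 1.0 means the faster card also does more work per watt, which is the point being made here.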

@David: Nooo.. you're destroying my conveniently simple prejudices about Americans! Or, actually, that sentence of mine was written with a bit of provocation intended. Thanks for adding some more perspective. And your 3-month-German is better than my 5-year-Russian from school :D

MrS

Scanning for our furry friends since Jan 2002

FalconFly
Joined: 16 Feb 05
Posts: 191
Credit: 15650710
RAC: 0

I found out today you are correct.

I gave a fast variant of the HD7790 a quick spin today, the preliminary data looks as follows :
(both cards ran 2 WorkUnits)

Binary Radio Pulsar Search, Arecibo

HD7850 (860MHz/1200MHz 256bit GDDR5) ~130W, 1760GFLops : 3800s =45473 Cr/day
HD7790 Turbo (1075MHz/1600MHz 128bit GDDR5) ~90W, ~2025GFlops : 5550s =31135 Cr/day

HD7790 Turbo (90W, 102.4GB/s) vs HD7850 (130W, 153.6GB/s) :
- Energy -31%
- GFlops +15%
- VRAM bandwidth -33%
- Performance -46%

The thing looks a bit different with the other WorkUnit type, though :

Binary Radio Pulsar Search, Perseus
HD7850......... 12500s =46075 Cr/day
HD7790 Turbo... 15700s (prognosis after 46% complete) =36684 Cr/day

- Performance -26%

I only have a very few workunits done so far, so the performance figure might still vary a bit - but the rough dimensions are pretty clear already, despite the HD7790 being clocked quite a notch higher and having higher GFlops potential.
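
For reference, the Cr/day figures above can be reproduced from the runtimes roughly like this. It's only a sketch: the credit per work unit is not stated anywhere in the post, so the 1000 (Arecibo) and 3333 (Perseus) values below are inferred because they match the quoted numbers, and the function name is made up.

```python
SECONDS_PER_DAY = 86_400

def credits_per_day(runtime_s, concurrent_tasks, credit_per_task):
    """Daily credit for a GPU finishing `concurrent_tasks` work units every `runtime_s` seconds."""
    tasks_per_day = SECONDS_PER_DAY / runtime_s * concurrent_tasks
    return tasks_per_day * credit_per_task

# Assumed credit per work unit (inferred from the quoted Cr/day, not stated in the post).
ARECIBO_CREDIT = 1000
PERSEUS_CREDIT = 3333

runs = [
    ("HD7850 Arecibo", 3800, ARECIBO_CREDIT),
    ("HD7790 Turbo Arecibo", 5550, ARECIBO_CREDIT),
    ("HD7850 Perseus", 12500, PERSEUS_CREDIT),
    ("HD7790 Turbo Perseus", 15700, PERSEUS_CREDIT),
]

for name, runtime_s, credit in runs:
    # Both cards ran 2 work units in parallel.
    print(f"{name}: {credits_per_day(runtime_s, 2, credit):.0f} Cr/day")
```

The results land within one credit of the figures quoted above, with the small gap coming from rounding.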

Looks like I was already in the ballpark with the existing HD7850 indeed...
I'll see if I can build a machine holding three HD7850s and the APU - running 8 GPU tasks only on the QuadCore and keeping RAM bandwidth as free as possible for the APU at the same time, maximizing its performance additionally. We'll see where that gets me :)
(I'll take some power measuring before that to see where the actual consumption goes for what I hope to become a 150000Cr/day machine)

FalconFly
Joined: 16 Feb 05
Posts: 191
Credit: 15650710
RAC: 0

After getting some more Workunits through, this is what it seems to look like (also had a calc error in my previous post) :

Binary Radio Pulsar Search, Arecibo
HD7850 (860MHz/1200MHz 256bit GDDR5) ~130W, 1760GFLops : 3800s =45473 Cr/day
HD7790 Turbo (1075MHz/1600MHz 128bit GDDR5) ~90W, ~2025GFlops : 5400s =32000 Cr/day
(both running 2 Workunits parallel)

Reference is HD7790 Turbo (90W, 102.4GB/s) vs HD7850 (130W, 153.6GB/s) :
- Energy -31%
- GFlops +15%
- VRAM bandwidth -33%
- Performance -30%
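
As a cross-check, the bandwidth figures and the percentages in this list can be recomputed from the quoted specs roughly as follows. It's a sketch with made-up helper names; it assumes the quoted memory clocks are the base GDDR5 clocks (4 transfers per pin per clock), and the ~130 W and ~90 W values are the approximate ratings from the post, not measurements.

```python
def gddr5_bandwidth_gbs(memory_clock_mhz, bus_width_bits):
    """Peak bandwidth in GB/s, assuming 4 transfers per pin per memory clock (GDDR5)."""
    transfers_per_s = memory_clock_mhz * 1e6 * 4
    return transfers_per_s * bus_width_bits / 8 / 1e9

def percent_diff(value, reference):
    """Signed difference of `value` relative to `reference`, in percent."""
    return (value / reference - 1) * 100

hd7850_bw = gddr5_bandwidth_gbs(1200, 256)
hd7790_bw = gddr5_bandwidth_gbs(1600, 128)

# HD7790 Turbo relative to HD7850, using the figures quoted in the post.
print(f"Bandwidth:      {hd7790_bw:.1f} vs {hd7850_bw:.1f} GB/s")
print(f"Energy:         {percent_diff(90, 130):+.0f}%")
print(f"GFlops:         {percent_diff(2025, 1760):+.0f}%")
print(f"VRAM bandwidth: {percent_diff(hd7790_bw, hd7850_bw):+.0f}%")
print(f"Performance:    {percent_diff(32000, 45473):+.0f}%")
```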

Binary Radio Pulsar Search, Perseus
HD7850......... 12500s =46075 Cr/day
HD7790 Turbo... 15200s =37890 Cr/day

- Performance -18%

This card is actually holding up relatively well when running Perseus Arm Survey Workunits, otherwise it makes almost no difference when measured vs. Energy Consumption (lower consumption about equals lower performance).
So after all, at least it's not the total loss I feared yesterday, and I'm therefore thinking about keeping it.
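
Put as credit per day per watt, the comparison looks roughly like this. This is a sketch only: the wattages are the approximate ~130 W and ~90 W figures quoted above rather than measured draw, and the dictionary layout and function name are made up for illustration.

```python
# Cr/day and approximate power figures as quoted in this post.
cards = {
    "HD7850": {"watts": 130, "arecibo": 45473, "perseus": 46075},
    "HD7790 Turbo": {"watts": 90, "arecibo": 32000, "perseus": 37890},
}

def per_watt(card, search):
    """Credit per day per watt for one card on one search type."""
    return cards[card][search] / cards[card]["watts"]

for card in cards:
    for search in ("arecibo", "perseus"):
        print(f"{card:13s} {search:8s} {per_watt(card, search):6.1f} Cr/day per W")
```

On Arecibo the two cards end up within a couple of percent of each other, while on Perseus the HD7790 comes out noticeably ahead, provided the real power draw is close to the rated figures.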

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 577510215
RAC: 196478

Nice to see that the card is not as bad at Einstein as feared. I'm pretty sure though that the power comparison won't be as favorable: both GPUs are based on the same GCN architecture manufactured in the same 28 nm process. HD7850 has 8/7 = 14.3% more shaders, which would increase power draw and performance linearly, HD7790 is clocked 25% higher (using your clocks). This alone would make the HD7790 actually draw more power, but we're not finished yet: to reach these clock speeds the smaller card surely has to use a higher voltage, which increases power draw even further. And if they use the same voltage, HD7850 could easily be clocked to 1 GHz as well or have its voltage reduced.
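
The scaling argument above can be sketched as a first-order estimate. This assumes GPU core power scales with shader count times clock times voltage squared (the usual rough model for dynamic power) and ignores memory, leakage and everything else. The 5% voltage bump is purely hypothetical, just to show the direction of the effect, and the function name is made up.

```python
def relative_core_power(shader_ratio, clock_ratio, voltage_ratio=1.0):
    """First-order dynamic power estimate: shaders x clock x voltage squared."""
    return shader_ratio * clock_ratio * voltage_ratio ** 2

shader_ratio = 7 / 8       # HD7790 shaders relative to HD7850 (from the 8/7 above)
clock_ratio = 1075 / 860   # clocks quoted earlier in the thread, 25% higher

same_voltage = relative_core_power(shader_ratio, clock_ratio)
# Hypothetical: HD7790 needs 5% more voltage to reach its clock.
higher_voltage = relative_core_power(shader_ratio, clock_ratio, voltage_ratio=1.05)

print(f"HD7790 vs HD7850 core power, same voltage: {same_voltage:.3f}x")
print(f"HD7790 vs HD7850 core power, +5% voltage:  {higher_voltage:.3f}x")
```

Even at equal voltage the estimate lands above 1.0, which is why the smaller card drawing much less power would be surprising on paper.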

One could then argue that the additional memory controllers and chips on the HD7850 consume some more power - but on the other hand HD7790 uses higher clocked memory, which again counters this at least partly. So just from looking at the similarities and differences between these cards I wouldn't expect much of a performance difference - which is just what's being observed in games.

The only reason I could see the difference in power draw being higher at Einstein is that the smaller card may be bottlenecked by its memory bandwidth so far that its shaders are significantly underutilized compared to the bigger chip. Anyway, actual measurements would be nice :)

MrS

Scanning for our furry friends since Jan 2002

FalconFly
Joined: 16 Feb 05
Posts: 191
Credit: 15650710
RAC: 0

Bear in mind that the GPUs are actually quite different, with the Bonaire GPU being a new design of its own and having much more extensive power saving features built into the GCN 1.1 architecture.

And other sites already confirmed that it indeed draws much less power (consistent with its lower TDP) than the older HD7850 using GCN 1.0 and a different design.

In the end, for Einstein, if the slightly lower performance is acceptable, the Bonaire is a more efficient design than the HD7850, as it churns out more performance/Watt (especially with the Perseus WorkUnits).
Additionally, it requires less cooling effort (Fans take a few Watts as well) - although that won't become a factor until you use more than one card.

For projects with lower VRAM bandwidth requirements (which should be most, as Einstein seems to be among the most demanding in bandwidth from what I've read), it is even more clearly the preferable choice over the HD7850 for these reasons (additional plus : it's much cheaper as well).
