R9 280X vs 7970 and PCIe 2.0 and 3.0

disturber
Joined: 26 Oct 14
Posts: 30
Credit: 57155818
RAC: 0
Topic 197955

I purchased an R9 280X a few weeks ago to replace a GTX 660 Ti. It went into a Haswell motherboard with an i5-4590. I disabled the Intel GPU because it was consuming too much PCI bandwidth and creating about 5% invalid WUs. Running 3 Perseus Arm WUs at a time, this new GPU crunches each in about 125 minutes. I also run 1 CPU job at the same time. The AMD GPU application uses more CPU than the Nvidia one.

The other computer, with the 7970 and an i5-2500K @ 4.3 GHz, ran 3 Perseus Arm WUs in 150 minutes. Since the motherboard is a Z68 Gen3, I replaced the CPU with an i7-3770K I got on sale, setting it up also @ 4.3 GHz. I expected some improvement from the change from PCIe 2.0 to 3.0. Running 3 Perseus Arm WUs now takes only 132 minutes, a saving of roughly 18 minutes each.
It seems that the CPU is not an important part of the computation speed; the remaining timing difference between the two machines I attribute to the memory clock difference between the two video cards, 1400 MHz versus 1500 MHz.

So in summary, this is strong evidence that E@H is PCI-bus bound at PCIe 2.0, and replacing just the CPU to change from PCIe 2.0 to 3.0 made a good comparison, as it kept everything else the same. In retrospect the change may not be worth the price I paid, but I have other uses for the CPU.
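
As a quick back-of-envelope check on those timings, here is a minimal sketch using only the numbers quoted above (nothing else is assumed; plain C-style host code that compiles with nvcc):

/* Back-of-envelope check of the PCIe 2.0 -> 3.0 result above.
   Timings are taken straight from the post. */
#include <cstdio>

int main(void) {
    const double pcie2_min = 150.0;  /* 3 Perseus Arm WUs, i5-2500K, PCIe 2.0 */
    const double pcie3_min = 132.0;  /* same card and clocks, i7-3770K, PCIe 3.0 */

    printf("Speedup:    %.2fx\n", pcie2_min / pcie3_min);        /* ~1.14x */
    printf("Time saved: %.0f min per WU (%.0f%%)\n",
           pcie2_min - pcie3_min,
           100.0 * (pcie2_min - pcie3_min) / pcie2_min);         /* ~12% */
    return 0;
}

So the CPU swap bought roughly a 12% reduction in wall time per WU, well short of the 2x jump in theoretical bus bandwidth, which fits the picture of the bus being one bottleneck among several rather than the only one.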

The impressive part for me was that Windows 7 started, adjusted the kernel for a different CPU, and after a restart everything was working properly.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

R9 280X vs 7970 and PCIe 2.0 and 3.0

I wonder if that would be the case with your GTX 660 Ti. That is, would a Kepler running CUDA be bus-limited as much as a comparable AMD card running OpenCL? I somehow get the impression (don't know why) that AMD/OpenCL takes more bus bandwidth. That may just be because of the projects I am familiar with and the amount of data that they have to move around. Also, the Maxwells tend to have large caches, which could also affect that situation.

Thanks for the info.

mikey
Joined: 22 Jan 05
Posts: 12809
Credit: 1879705749
RAC: 1292524

RE: The impressive part

Quote:

The impressive part for me was that Windows 7 started, adjusted the kernel for a different CPU, and after a restart everything was working properly.

I run Win7 64-bit Ultimate and replaced an older dual-core Intel CPU and motherboard with an AMD 6-core CPU and motherboard, and Windows also took only one reboot to reconfigure itself and work just fine. This is a cruncher-only machine for me, but I was SURE I was going to have to reload Windows to make it work! They must have been working on Windows in the right way for a change!

As for your PCI bus theory, that is interesting: most apps barely utilize all of the bandwidth of a PCIe x8 link, let alone the x16 links in most PCIe slots. It seems like Einstein has been doing good things too!
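
For reference, the theoretical ceilings behind that: PCIe 2.0 carries about 500 MB/s per lane (5 GT/s with 8b/10b encoding) and PCIe 3.0 about 985 MB/s per lane (8 GT/s with 128b/130b encoding). A small sketch tabulating x8 and x16 links from those spec figures (not measured on any of the machines in this thread):

/* Theoretical PCIe link bandwidth per generation and lane count.
   Per-lane figures follow from the spec: 5 GT/s * 8/10 and 8 GT/s * 128/130. */
#include <cstdio>

int main(void) {
    const double gen2_lane = 0.500;   /* GB/s per lane, PCIe 2.0 */
    const double gen3_lane = 0.985;   /* GB/s per lane, PCIe 3.0 */

    for (int lanes = 8; lanes <= 16; lanes *= 2) {
        printf("x%-2d  PCIe 2.0: %5.2f GB/s   PCIe 3.0: %5.2f GB/s\n",
               lanes, lanes * gen2_lane, lanes * gen3_lane);
    }
    return 0;
}

That works out to about 8 GB/s for a PCIe 2.0 x16 slot versus roughly 15.8 GB/s for PCIe 3.0 x16, which is the gap disturber's CPU swap exposed.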

disturber
Joined: 26 Oct 14
Posts: 30
Credit: 57155818
RAC: 0

RE: I wonder if that would

Quote:

I wonder if that would be the case with your GTX 660 Ti. That is, would a Kepler running CUDA be bus-limited as much as a comparable AMD card running OpenCL? I somehow get the impression (don't know why) that AMD/OpenCL takes more bus bandwidth. That may just be because of the projects I am familiar with and the amount of data that they have to move around. Also, the Maxwells tend to have large caches, which could also affect that situation.

Thanks for the info.

I did move the 660 Ti from the i5-2500K with PCIe 2.0 to the Haswell i5-4590, and the times decreased from 75 to 70 minutes running 2 BRP4G WUs. But I was also running the Intel GPU on the i5-4590 at the same time, so without it the times would be even lower. And from my recollection, the Nvidia cards use less CPU as well.

I hope this helps.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

disturber, Yes, that is a

disturber,

Yes, that is a very nice comparison.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 584018126
RAC: 140144

Yes, Einstein needs quite

Yes, Einstein needs quite some PCIe bandwidth, especially on fast cards. It doesn't need that many CPU cycles (the AMDs needing more is the OpenCL tax), but what it also needs is main memory bandwidth. All that data sent over the PCIe bus has to come from somewhere, after all.

This is also the point where using the iGPU can hurt crunching on the main GPU: the iGPU needs lots of main memory bandwidth for itself. It's not using any PCIe bandwidth, though, as it's attached straight to the "ring bus" within the CPU.
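
To see how much PCIe bandwidth a given card actually achieves, a minimal CUDA sketch in the spirit of NVIDIA's bandwidthTest sample can help. This is a rough sketch, not the E@H application's actual transfer pattern; the filename bwtest.cu is hypothetical, and it assumes an NVIDIA card with the CUDA toolkit installed (compile with: nvcc -o bwtest bwtest.cu). Pinned host memory is used so the copy can run at full bus speed:

#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    const size_t bytes = 256u * 1024u * 1024u;   /* 256 MiB test buffer */
    const int    reps  = 10;
    float *host = NULL, *dev = NULL;

    cudaMallocHost((void**)&host, bytes);  /* pinned memory: full PCIe speed */
    cudaMalloc((void**)&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    /* Time 10 host-to-device copies with CUDA events */
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host-to-device: %.2f GB/s\n",
           (double)bytes * reps / (ms / 1000.0) / 1e9);

    cudaFree(dev);
    cudaFreeHost(host);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}

If the measured figure comes in well below the theoretical link rate for your slot, something else (chipset lanes, the iGPU sharing main memory, or an x8 electrical slot) may be eating into it.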

MrS

Scanning for our furry friends since Jan 2002

tbret
Joined: 12 Mar 05
Posts: 2115
Credit: 4870580804
RAC: 192942

I don't have any good numbers

I don't have any good numbers for a comparison, so this is an "impression" rather than an observation.

Very weirdly, when I went from a 6-core Phenom II to an 8-core FX processor that was only marginally faster than the Phenom II, I experienced an increase in the speed of GPU work unit completion.

I went looking for a good answer and never found one so I didn't come and report it objectively.

But the GPU "usage" reported by Precision X did not change.

That makes me think that information up and down the PCIe bus wasn't the problem. Instead, I'm *completely* guessing that it was the on-chip memory controller. It could be the on-chip scheduler, too, I suppose.

I don't really have any idea why, but the effect did happen.

Oh -- the machine runs GPU work units only and I didn't change anything about the configuration. The memory "says" it is running at the same speed, etc., although the multiplier did change.
