CUDA Performance Disparity

Jonathan Jeckell
Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341982144
RAC: 969
Topic 201921

Something weird is going on with CUDA.  I have three machines running NVIDIA cards, and their performance running CUDA apps does not match up with their specs.

I have an i7-3720QM (laptop) running macOS Sierra with a GTX 650M card that benches 1,249.  It averages 14,708 seconds for BRP6, and 4,581 seconds for BRP4G tasks.

An i3-4160 running Windows 10 with a GTX 950 card that benches 5,340 runs BRP6 in about 8,066 seconds, and BRP4G in 2,504 seconds. 

As you would expect, the GTX 950 does the tasks in about half the time, although on paper it's about 4x more powerful by the G3Dmark benchmark.  But this is where it gets weird:

My i7-5820k running Ubuntu Linux has a GTX 960, which benches 5,987, yet it averages *37,611* seconds for BRP6 and *3,465* seconds for BRP4G--slower than the GTX 950 on both, and way slower than even the GTX 650M on BRP6.

These cards are stock, and none of them have been overclocked or tweaked.  The GTX 950 and 960 allegedly have the same memory and everything.  

System RAM can't be an issue.  The Windows box has 16 GB and the 5820k has 32 GB with quad channel in effect. 

I have nouveau blacklisted and turned off in Ubuntu, and have also killed the GUI. 
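
(For what it's worth, that was just the standard modprobe blacklist approach, roughly:

    # /etc/modprobe.d/blacklist-nouveau.conf  (filename is arbitrary)
    blacklist nouveau
    options nouveau modeset=0

followed by sudo update-initramfs -u and a reboot.)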

Does CUDA performance vary this much across various platforms, or what is going on with this?  Any ideas how to speed up that GTX 960? 

mmonnin
mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3426936540
RAC: 3882190

BRP6 is done so there's no use

BRP6 is done, so there's no use comparing with that app anymore. BRP6 WUs are worth 4.4x the points of BRP4G but do not take 4.4x as long.

There are several 4g app versions out there right now. Make sure to compare the same version.

Gaming benchmarks don't always correlate with GPU compute performance.

GPU memory clocks can affect completion times.

Jonathan Jeckell
Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341982144
RAC: 969

I know there are no more BRP6

I know there are no more BRP6 left (actually, I am still running some).  But I have a mountain of data on these, which is useful for comparison.

I also know the comparisons have flaws, not least because of the gaming benchmarks, but also because all three cards are running on completely different operating systems!  So it's also impossible to compare the exact same CUDA app across them.

That said, there is no damned way the GTX 960 should be slower than the GTX 950, or so close to the GTX 650M.  No way.  Something is wrong.  Instead of being about 12% faster, it's almost 60% slower.  All of the clock speeds on the 960 are higher than the 950's in the stock configuration, and it has a third more CUDA cores.

Sebastian M. Bobrecki
Sebastian M. Bo...
Joined: 20 Feb 05
Posts: 63
Credit: 1529603097
RAC: 103

From what I see host with 960

From what I see, the host with the 960 is also doing CPU tasks, while the one with the 950 isn't. Try freeing up some CPU cores to allow the GPU to be fed properly.

WhiteWulfe
Joined: 3 Mar 15
Posts: 31
Credit: 62249506
RAC: 0

One curiosity of mine is how

One curiosity of mine is how many work units are being run at once.  Some would say that for a GTX 980 Ti running at a 1341 MHz core clock, a time of 2,700 or so seconds is bad...  until one realizes that in that time frame my card has crunched four work units, since I run four at a time to keep the GPU's load steady at 92-94%.
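
If you want to try that, the per-host way is an app_config.xml in the Einstein@Home project directory.  A rough sketch, where the app name is only an example (check client_state.xml or the event log for the exact name on your host):

    <app_config>
      <app>
        <name>einsteinbinary_BRP4G</name>  <!-- example name, check client_state.xml -->
        <gpu_versions>
          <gpu_usage>0.25</gpu_usage>      <!-- 0.25 = four tasks per GPU -->
          <cpu_usage>0.25</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

Restart the client or use "Read config files" in BOINC Manager so it picks the file up.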

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

Since you have a single GPU,

Since you have a single GPU, you would want to have a single physical CPU core available for the GPU application. I noticed that you have an Intel 5820K, which is a 6-core CPU with HT turned on. In this case, I would run no more than 10 CPU tasks and leave the remaining 2 threads for the GPU application. I think the setting in BOINC computing preferences would be 84% of the CPUs (10 of the 12 threads). I would suggest starting with running just one task per GPU, with the GPU utilization factor of BRP apps set to 1.00 in the project preferences.
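
If you'd rather set the CPU limit locally than through the website, the same thing can go in global_prefs_override.xml in the BOINC data directory.  A minimal sketch, assuming the standard override file and element names:

    <global_preferences>
      <max_ncpus_pct>84.0</max_ncpus_pct>
    </global_preferences>

Then have the client re-read it, e.g. with boinccmd --read_global_prefs_override.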

mmonnin
mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3426936540
RAC: 3882190

Setting the gpu applications

Setting the GPU application's .exe to a slightly higher priority than the CPU applications' .exes is perfectly fine. I ran 6 GPU threads and 8 CPU threads on a 3770K and there was no slowdown on the GPUs; they took any CPU cycles they needed to feed the GPUs.
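
On Linux the equivalent is renicing the running GPU process, something like this (the process name is just a guess at what the Einstein binary is called; check ps first):

    # raise the GPU app from BOINC's default low priority to normal
    sudo renice -n 0 -p $(pgrep -f einsteinbinary)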

Jonathan Jeckell
Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341982144
RAC: 969

The machine with the 950 is

The machine with the 950 is doing CPU tasks--for other projects.  Until recently it was also using its Intel HD4400 GPU for SETI, which really killed CPU performance.

But even though the 950 machine is using the CPU, it's only doing 4 threads on its 2 cores, while the 960 machine is doing 12 threads on 6 cores.  But it also has quad channel memory and a huge amount of RAM.

Jonathan Jeckell
Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341982144
RAC: 969

Ok, I kind of feel stupid...I

Ok, I kind of feel stupid...I didn't know you could do that.  Mine is only running one GPU task at a time.  When I run top on the command line, it doesn't even look like the app is running all of the time (what I'm seeing is probably just the moments the GPU checks in with the CPU).  The temp on the 960 card is only about 39C, so it's clearly not working very hard.
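
(top only shows the CPU side of the app anyway; the actual GPU load is easier to see with nvidia-smi, e.g.:

    watch -n 1 nvidia-smi     # utilization, temperature, per-process memory
    nvidia-smi dmon -s u      # one line per second of SM/memory utilization

)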

Jonathan Jeckell
Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341982144
RAC: 969

Thanks!  I'll give that a

Thanks!  I'll give that a shot.  As I said above, it doesn't look like the GPU is working very hard at all, so there must be some kind of bottleneck somewhere.  Maybe that's it.

Jonathan Jeckell
Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341982144
RAC: 969

Again, I feel like a moron. 

Again, I feel like a moron.  I did not know you could do that.  Is that the GPU utilization preference in the project preferences?  The one with all of the nasty warnings about how you will end the world if you change the setting?
