Something weird is going on with CUDA. I have three machines running NVIDIA cards, and their performance running CUDA apps does not match up with their specs.
I have an i7-3720QM laptop running macOS Sierra with a GTX 650M that benches 1,249. It averages 14,708 seconds for BRP6 tasks and 4,581 seconds for BRP4G tasks.
An i3-4160 running Windows 10 with a GTX 950 that benches 5,340 does BRP6 in about 8,066 seconds and BRP4G in 2,504 seconds.
As you would expect, the GTX 950 does the tasks in about half the time, although on paper it's about 4x more powerful by the G3Dmark benchmark. But this is where it gets weird:
My i7-5820K running Ubuntu Linux has a GTX 960, which benches 5,987 but averages *37,611* seconds for BRP6 and *3,465* seconds for BRP4G--slower than the GTX 950 on both, and on BRP6 even slower than the GTX 650M. Way slower.
These cards are stock, and none of them have been overclocked or tweaked. The GTX 950 and 960 allegedly have the same memory and everything.
System RAM can't be an issue. The Windows box has 16 GB, and the 5820K has 32 GB running in quad channel.
I have nouveau blacklisted and turned off in Ubuntu, and have also killed the GUI.
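(For reference, the blacklist is just the standard modprobe config -- the file name below is my own choice, and the contents are the usual two lines:

    # /etc/modprobe.d/blacklist-nouveau.conf
    # keep the open-source driver from grabbing the card ahead of the NVIDIA/CUDA driver
    blacklist nouveau
    options nouveau modeset=0

followed by "sudo update-initramfs -u" and a reboot so the change sticks.)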
Does CUDA performance really vary this much across platforms, or is something else going on? Any ideas how to speed up that GTX 960?
BRP6 is done, so there's no use comparing with that app anymore. BRP6 WUs are worth 4.4x the points of BRP4G but don't take 4.4x as long.
There are several BRP4G app versions out there right now. Make sure to compare the same version.
Gaming benchmarks don't always correlate with GPU compute performance.
GPU memory clocks can affect completion times.
I know there are no more BRP6 left (actually, I am still running some). But I have a mountain of data on these, which is useful for comparison.
I also know the comparisons have flaws, not least with the gaming benchmarks, but also because all three cards are running on completely different operating systems! So it's also impossible to compare the exact same CUDA app.
That said, there is no damned way the GTX 960 should be slower than the GTX 950, or so close to the GTX 650M. No way. Something is wrong. Instead of being about 12% faster, it's almost 60% slower. At stock, all of the 960's clock speeds are higher than the 950's, and it has a third more CUDA cores.
From what I can see, the host with the 960 is also doing CPU tasks, while the one with the 950 isn't. Try freeing up some CPU cores to allow the GPU to be fed properly.
One thing I'm curious about is how many work units are being run at once. Some would say that 2,700 or so seconds is bad for a GTX 980 Ti running at a 1341 MHz core clock... until you realize that in that time frame my card has crunched four work units, since I run four at a time to keep the GPU's load steady at 92-94%.
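In case it helps, the usual way to do that in BOINC is an app_config.xml in the project's folder. A minimal sketch -- the app name is a guess, so check client_state.xml or a finished task's stderr for the real one on your host:

    <app_config>
      <app>
        <name>einsteinbinary_BRP4G</name>
        <gpu_versions>
          <gpu_usage>0.25</gpu_usage>
          <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

A gpu_usage of 0.25 means four tasks share one GPU; cpu_usage is just what the scheduler budgets per task. Save it in the Einstein@Home folder under BOINC's projects/ directory and use Options -> Read config files (or restart the client) to pick it up. The project preferences also have a GPU utilization factor that does the same thing server-side.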
Since you have a single GPU, you would want a single physical CPU core available for the GPU application. I noticed that you have an Intel 5820K, which is a 6-core CPU with HT turned on. In this case, I would run no more than 10 CPU tasks and leave the remaining 2 threads for the GPU application; I think the setting in BOINC computing preferences would be 84% of the CPUs. I would suggest starting with just one task per GPU, with the GPU utilization factor of BRP apps set to 1.00 in the project preferences.
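(Quick arithmetic on that percentage, assuming BOINC rounds the core count down: 84% of 12 threads is 10.08, which the client treats as 10 usable threads, leaving 2 free to feed the GPU. 83% would give 9.96 and drop you to 9, so 84% is the lowest setting that still yields 10.)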
Setting the GPU application's .exe to a slightly higher priority than the CPU applications' .exes is perfectly fine. I ran 6 GPU threads and 8 CPU threads on a 3770K and there was no slowdown on the GPUs; they took whatever CPU cycles they needed to keep the GPUs fed.
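On the Linux box the rough equivalent would be renice. A sketch, with the process name being a guess -- check with ps what the BRP GPU binary is actually called on your host and narrow the pattern so it only catches that process:

    sudo renice -n -1 -p $(pgrep -f einsteinbinary)

That nudges the matched processes up one priority notch.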
The machine with the 950 is doing CPU tasks--for other projects. Until recently it was also using its Intel HD4400 GPU for SETI, which really killed CPU performance.
But even though the 950 machine is using the CPU, it's only doing 4 threads on its 2 cores, while the 960 machine is doing 12 threads on 6 cores. Then again, the 960 machine also has quad-channel memory and a huge amount of RAM.
Ok, I kind of feel stupid... I didn't know you could do that. Mine is only running one GPU task at a time. When I run top on the command line, the GPU app doesn't even look like it's running all of the time (it's probably only showing the moments the GPU checks in with the CPU). The temp on the 960 is only about 39C, so it's clearly not working very hard.
Thanks! I'll give that a shot. As I said above, it doesn't look like the GPU is working very hard at all, so there must be some kind of bottleneck somewhere. Maybe that's it.
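I'll also keep an eye on the actual GPU load with nvidia-smi instead of top, since top only shows the CPU side. Something like

    nvidia-smi --query-gpu=utilization.gpu,clocks.sm,temperature.gpu --format=csv -l 5

repeats every 5 seconds and should show whether the card is actually being kept busy.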
Again, I feel like a moron. I did not know you could do that. Is that the GPU utilization preference in the project preferences? The one with all of the nasty warnings about how you will end the world if you change the setting?