I am running an Intel i7 at 3.4 GHz with one core free for the GPU and other threads. I was surprised when I first installed this card, an Asus ENGTX560 DCII: it ran twice as fast as the 550 Ti I have in a Q6600 2.4 GHz quad-core machine, where 1 WU takes 3900 s and 2 WUs take approx. 6720 s.
Overclocked GT 240 with GDDR5 (core clock 750 MHz / shader clock 1500 MHz), best result: 4,035.16 s.
Without overclocking: ~4500 s.
FirePro 3D V4800 (one core free for the GPU): average over 4 WUs: 10620 s per WU (Win 7)
GT 640, Linux, 1 WU =~ 5700 sec.
GT 520, Linux, 1 WU =~ 9600 sec. (retired)
3,021.93 - 3,065.20 s on an NVIDIA GeForce GTX 550 Ti
Average: 50 minutes
I have run the following tests to see what impact freeing up CPUs has on GPU performance. The GTX 560 was running two BRP4 tasks, and all remaining active CPUs were running gamma-ray pulsar searches for all but the last test. Here is the link for the test machine:
http://einsteinathome.org/host/5460148
After each change in the number of free CPUs, the 2 WUs then running were allowed to finish, and the following 4 WUs were averaged to get the run time.
1 free CPU: 3500 s
2 free CPUs: 2898 s
3 free CPUs: 2842 s
4 free CPUs: 2774 s
Last test, with 4 Gravitational Wave S6 Line Veto tasks on the CPU instead of gamma-ray pulsar searches:
4 free CPUs: 2664 s
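To put the gamma-ray rows in proportion, here is a small sketch that expresses each averaged run time as a percentage saving against the 1-free-CPU case. The numbers are the ones posted above; the names `run_times` and `percent_saving` are only illustrative.

```python
# Averaged GPU task run times in seconds, keyed by the number of free CPUs.
# Values are copied from the gamma-ray rows of the list above.
run_times = {1: 3500, 2: 2898, 3: 2842, 4: 2774}


def percent_saving(baseline, seconds):
    """Run time saved relative to the baseline, in percent."""
    return 100.0 * (baseline - seconds) / baseline


baseline = run_times[1]
for free_cpus, seconds in sorted(run_times.items()):
    saving = percent_saving(baseline, seconds)
    print(f"{free_cpus} free: {seconds} s ({saving:.1f}% shorter than with 1 free)")
```

Most of the gain comes from freeing the second CPU; the third and fourth each add only a little on top.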
This suggests that to achieve the best GPU performance, there should be a free CPU for each GPU task. It all makes sense: a BRP4 process has five threads running, and gamma-ray and S6LV processes have 3 threads each. The CPUs are already busy running threads for their respective processes, and everybody waits.
A side effect of freeing CPUs is that the CPU processes speed up, runtimes become very consistent, and the ratio of runtime to CPU time goes down significantly. At the 4/4 ratio, a gamma-ray pulsar search went from approx. 9.5 to 6 hours and S6LV went from approx. 5.5 to 3.5 hours.
It would be interesting to see if this holds true with a GPU capable of running 3 more tasks; mine is memory-constrained to 2 tasks.
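As a rough illustration of the "everybody waits" argument, the sketch below counts how many threads exist in total for each setting, using the per-process thread counts quoted in this post. It assumes 8 logical CPUs (implied by the "4/4 ratio" but not stated outright) and one CPU task per non-free CPU. Many of those threads are probably idle most of the time, so this is only a picture of the contention, not a measurement.

```python
LOGICAL_CPUS = 8           # assumption: 4 free + 4 busy at the "4/4 ratio"
GPU_TASKS = 2              # two GPU tasks run at once on the GTX 560
THREADS_PER_GPU_TASK = 5   # thread count quoted in the post
THREADS_PER_CPU_TASK = 3   # gamma-ray and S6LV, as quoted in the post


def total_threads(free_cpus):
    """Threads from all running tasks when `free_cpus` CPUs are left idle."""
    cpu_tasks = LOGICAL_CPUS - free_cpus
    return GPU_TASKS * THREADS_PER_GPU_TASK + cpu_tasks * THREADS_PER_CPU_TASK


for free_cpus in range(1, 5):
    print(f"{free_cpus} free: {total_threads(free_cpus)} threads on {LOGICAL_CPUS} logical CPUs")
```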
RE: It would be interesting
3 or more tasks? (just guessing)
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
Yes, re-read it several times and still missed it.
RE: 3,021.93 - 3,065.20
In case anyone wondered, along with the CUDA tasks I also run an LHC task and a 2-core T4T task 24/7 on this quad-core with the NVIDIA GeForce GTX 550 Ti.
The laptop I am on right now, which is also a quad-core, runs the same tasks. It has an NVIDIA GeForce 610M (2048MB), driver 28564, and runs the CUDA tasks in an average of 7,800 seconds (2 hrs and 10 mins) each.
It also runs 24/7.
hi!!
I have better results for your list with a stock-clocked standard 560 Ti from Zotac (fed by a not-so-slow C2D E8400 @ 3.6 GHz):
With 1 WU the runtime is ~1900 s
With 2 WUs the runtime is 3094 s (up to 90% GPU load peak, ~35% CPU load)
With 3 WUs the runtime is 3961 s (up to 97% GPU load peak, up to 51% CPU load)
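Since the runtimes above are for WUs running side by side, it may help to convert them into effective time per WU and WUs per day. This is just a sketch of that arithmetic using the three figures posted here; `seconds_per_wu` and `wus_per_day` are illustrative names, and the ~1900 s figure is treated as exactly 1900.

```python
SECONDS_PER_DAY = 86400

# Runtime in seconds for a batch, keyed by how many WUs run at once.
# Values are copied from the 560 Ti results above.
batch_runtimes = {1: 1900, 2: 3094, 3: 3961}


def seconds_per_wu(concurrent, runtime):
    """Effective wall-clock seconds per WU when `concurrent` WUs share the GPU."""
    return runtime / concurrent


def wus_per_day(concurrent, runtime):
    """WUs completed in 24 hours at that rate."""
    return SECONDS_PER_DAY / seconds_per_wu(concurrent, runtime)


for concurrent, runtime in sorted(batch_runtimes.items()):
    print(f"{concurrent} at once: {seconds_per_wu(concurrent, runtime):.0f} s per WU, "
          f"{wus_per_day(concurrent, runtime):.1f} WUs per day")
```

On these numbers each extra concurrent WU still improves throughput, even though the runtime of every single WU goes up.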
And I think your list only means that the 8800 GT is not OpenCL-usable in THIS project (is OpenCL 1.1 needed here?), because I'm running 8xxx and 9xxx cards (and an HD 4850) in OpenCL 1.0 projects (like POEM) and they do fine.
DSKAG Austria Research Team: [LINK]http://www.research.dskag.at[/LINK]