And here is another review/test I read late last night: EVGA GTX 660 Ti Superclocked.
For those who don't know, the new, improved OpenCL app version for Einstein has been released after testing on Albert: http://albert.phys.uwm.edu/forum_thread.php?id=8912&nowrap=true#112191
From my own experience the run time dropped by almost 50% (from roughly 3,800 to 2,100 seconds) compared with the previous version. I'll be updating the table with the new times as they come in, and will look at listing the old times alongside the new ones for comparison's sake.
Let me know your new times!!
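To put those two numbers in throughput terms as well (this is just my own arithmetic on the figures above, nothing official):

# Just arithmetic on the run times quoted above -- nothing project-specific.
old_secs, new_secs = 3800, 2100
print(f"run time cut by {1 - new_secs / old_secs:.0%}")      # ~45%
print(f"tasks per day up by {old_secs / new_secs - 1:.0%}")  # ~81%

In other words, a ~45% cut in run time works out to roughly 1.8x the output per day from the same card.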
HD6950, 2 WUs: ~3,500 s (GPU 790 MHz, memory 1250 MHz)
HD5850, 2 WUs: ~6,085 s (PCIe x8 slot; GPU 765 MHz, memory 1125 MHz)
HD5830, 1 WU: ~2,916 s
AMD A8-3870 APU, 1 WU: ~6,489.6 s
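Since we're all quoting times for different numbers of simultaneous WUs, here is the bit of arithmetic I use to put them on a common footing (the two example lines simply reuse the figures above, and it assumes tasks started together also finish together):

# N tasks finishing together in T seconds is roughly one task every T/N seconds.
def tasks_per_day(run_secs, tasks_in_parallel=1):
    return 86400 / (run_secs / tasks_in_parallel)

print(round(tasks_per_day(3500, 2)))  # HD6950, 2 WUs -> ~49 tasks/day
print(round(tasks_per_day(2916, 1)))  # HD5830, 1 WU  -> ~30 tasks/day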
I just started to crunch on 2 HD5870 GPUs.
The CPU (i7-2600) is doing Docking@home and SETI MB/AstroPulse work at the moment.
Running 1 WU per GPU; I'll post run times and CPU times.
Well, the UPS truck just delivered my EVGA GTX 660 Ti Superclocked and the 8 GB of RAM, but it looks like I have to wait until Monday for the power supply I need before I can install the 660 Ti.
I put the RAM in, and having 12 GB instead of just 4 GB already makes a difference.
So I hope to have tests and some numbers by Monday night.
Run time info for the HD7950 - both machines have 8 GB RAM, app v1.28, Catalyst 12.6, BOINC 7.0.28, Win7 Pro 64-bit.
System 1: E5300, G41 chipset, 1 WU ~1840 secs, CPU ~620 secs
System 2: i5-2500K, H67 chipset, 1 WU ~1145-1160 secs, CPU ~255 secs
Neither system is overclocked. v1.28 is much improved over v1.24! It looks like the CPU/chipset/bandwidth of System 1 severely limits the HD7950, so I will probably move that card to a faster system.
Right now System 1 is dedicated to Einstein and System 2 to Milkyway. Looks like I'm shortchanging Einstein, but I'll fix that.
I'm really impressed with the HD7950s. The double-precision performance in Milkyway is incredible.
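To put a rough number on how much the older platform costs (my own arithmetic on the per-WU times above, assuming the HD7950 itself is the only GPU difference between the two boxes):

# Same card, two hosts: how much throughput the slower platform gives up.
slow_host, fast_host = 1840, 1150  # seconds per WU, from the post above
print(f"fast host: {slow_host / fast_host:.2f}x the throughput")            # ~1.60x
print(f"slow host gives up {1 - fast_host / slow_host:.0%} of the output")  # ~38%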
Yesterday I upgraded my GPU from a GeForce 9600 GT to an EVGA GTX 660 Ti; to be precise, the model number is GV-N66TOC-2GD-EU. This is a factory-overclocked card, where the core runs at 1033 MHz (boost to 1111 MHz) while the memory clock is left at the default speed of 6008 MHz.
This is paired with an i7 on a PCIe 3.0 x16 bus.
Link to the host
Here are some early results running BRP4 v1.25 alongside 8 Einstein CPU tasks:
9600 GT, one WU at a time: ~4430 secs (average over 50 units)
GTX 660 Ti, one WU at a time: ~1700 secs (average over 5 units)
GTX 660 Ti, two WUs at a time: ~2900 secs (average over 35 units)
GTX 660 Ti, three WUs at a time: ~4500 secs (average over 6 units)
Checking GPU-Z while crunching, the sensor page claims the core clock is 1201.9 MHz, and the average load when running 2 at a time is ~81%. I've used Process Lasso to raise the priority of the BRP app to above normal to get a bit more performance; without raising the priority, the load when running 2 was ~70%.
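For anyone who wants to try more than one WU per GPU themselves, an app_config.xml along these lines should do it on a reasonably recent BOINC client. This is only a sketch from memory: the app name below is an assumption (check client_state.xml in the project folder for the exact one), and older clients may not read app_config.xml at all.

# Minimal sketch of an app_config.xml asking BOINC to run two Einstein BRP4
# tasks per GPU.  The <name> value is an assumption -- check client_state.xml
# for the app name your client actually uses.
APP_CONFIG = """\
<app_config>
  <app>
    <name>einsteinbinary_BRP4</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>  <!-- 0.5 = two tasks share one GPU -->
      <cpu_usage>0.2</cpu_usage>  <!-- fraction of a CPU core reserved per task -->
    </gpu_versions>
  </app>
</app_config>
"""

# Save the text above as app_config.xml in the Einstein@Home project directory,
# then have the BOINC client re-read its config files.
print(APP_CONFIG)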
That's interesting, to say the least. I have a dual GTX 560 Ti machine that crunches 6 BRP4 tasks in parallel (3 per GPU). This is a Win7 x64 platform with a Phenom II X6 1090T CPU, 8 GB of DDR3-1600, and a PCIe 2.0 bus. Averaged over hundreds of units (again, 3 at a time), the run times are ~5200 secs, so your run times are only ~13.5% shorter than mine. I wonder if that's fairly indicative of the performance increase expected when going from a GTX 560 Ti to a GTX 660 Ti...
Could you please try 6 WUs at a time?
My time on the 560 Ti is ~7750 secs.
Quote:
I wonder if that's fairly indicative of the performance increase expected when going from a GTX 560 Ti to a GTX 660 Ti...
Might be. In some of the reviews I read before purchasing this card, they claimed that the 192-bit memory bus would slow it down a bit. And if I were to guess, the BRP4 app does a fair amount of memory transfers, both on the card and between the card and the main system.
I've run some tasks over at Albert@home, where a new CUDA app is being tested, and that shaved about 700 secs off the run time and more than halved the CPU time; GPU load with the beta app was over 95% when running 2 at a time. Here's hoping that app gets released over here soon!
I've begun running three at a time for a while and will see how it goes.
Quote:
Could you please try 6 WUs at a time?
I don't think this card will like 6 at a time, but I plan on increasing the number of parallel tasks over the next few days and will report back in due time.