The only obvious Q8400 disadvantage is four cores sharing one RAM interface, as opposed to two. Perhaps the ABP2 ap is far more RAM-access bound than most? Were that true, your Q8400 rig would presumably respond to RAM interface timing tweaking more than you are accustomed to seeing on other aps.
Just musing aloud. As it happens, I have a matched 2-core, 4-core pair of Conroe-class hosts, so once the ABP2 revisions settle down enough to get work on both, I can look for a similar effect there, though they differ from your Wolfdales profoundly enough that a difference would not surprise me. On existing workloads over the last couple of years on multiple SETI and Einstein aps, they have seldom differed appreciably in typical CPU seconds per result--certainly not by the amount you see. I've believed them not to be much RAM-bound in general, and have spent no energy on twisting the tail on the RAM settings.
Early returns are in, and the hypothesis that ABP2 has a higher tendency to get memory-bound than other recent Einstein and SETI aps looks yet more likely.
I have a very close matched pair of hosts with quad-core Conroe (Q6600) and dual-core Conroe (E6600) running on the same model motherboard, same RAM, same clock frequency (stock 2.4 GHz)... Historically, the quad has taken slightly longer on average to do comparable work, but well under 10%.
So far, however, ABP2 running on all four cores is taking about 3600 seconds on the Q6600, vs 2800 on the E6600--a much bigger difference than I am used to seeing.
I also have a Q9450 Penryn-class quad running stock at 2.83 GHz. Historically it enjoys somewhat better than clock-rate advantage over the Q6600, possibly because of architectural advantage, possibly because of considerably higher RAM bandwidth.
Here it is taking about 2600 seconds for ABP2 running 4-up, but a couple of results which ran against GW or SETI work took only 2200. Taken together, these results suggest that even the higher Penryn bandwidth is burdened by the ABP2 RAM demands in a 4-up configuration, but less so than the Conroe.
As I run my RAM dead stock, if my initial guesses are right (which depend, in part, on the notion that unlike GW, ABP2 results are very similar to one another in computation requirement) this all suggests I have an opportunity by twisting the tail on RAM. Sadly, the major opportunity would be on my Q6600 system. But that is my daily driver, host to my serious audio hobby, and all my financial affairs. I'm not ready to go through a "try until fail and back down" sequence on it. So the notion will remain unverified, at least by me.
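To put rough numbers on the gaps described above, here is a quick sketch that uses only the approximate run times quoted in this post. The function name `slowdown_pct` is made up for illustration, and the inputs are rounded observations rather than careful averages.

```python
def slowdown_pct(loaded_s, baseline_s):
    """Percent extra run time of the loaded case relative to the baseline."""
    return (loaded_s / baseline_s - 1.0) * 100.0


# Q6600 running ABP2 on all four cores vs. the E6600 (about 3600 s vs. 2800 s)
q6600_vs_e6600 = slowdown_pct(3600, 2800)

# Q9450 running ABP2 4-up vs. ABP2 alongside GW or SETI work (about 2600 s vs. 2200 s)
q9450_4up_vs_mixed = slowdown_pct(2600, 2200)

print(f"Q6600 vs E6600:        {q6600_vs_e6600:.1f}% longer")
print(f"Q9450 4-up vs mixed:   {q9450_4up_vs_mixed:.1f}% longer")
```

Both figures sit well above the "well under 10%" gap seen historically between the quad and the dual, which is what makes the memory-bound guess look plausible.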
Anybody come across this before...?
Fri Jan 22 09:07:17 2010 Einstein@Home [error] p2030_54170_48472_0076_G51.55+00.04.C_4.dm_291_1: negative FLOPs left -12433123904410.585938
I came across several of these, but I can't trace which work unit they belong to.
Credit for these WUs is seriously too high. I've got them on 4 of my 6 PCs so far.
Core Duo T2250 (1.73 GHz), Vista
S5R6 12-18 credits/core/hour
ABP2 19-24 credits/core/hour
22% faster on average
P3-M 800, Win7
S5R6 5.4-6.4 credits/core/hour
ABP2 9.6 credits/core/hour (single WU)
62% faster
Athlon 64 X2 3800 (2 GHz), Linux
S5R6 17-25 credits/core/hour
ABP2 24 credits/core/hour
14% faster
Core i7-920 (3.85 GHz), Win7
S5R6 29-42 credits/core/hour
ABP2 78 credits/core/hour
120% faster
The spread on my i7 is enormous. If I'm right in thinking ABP2 has a chaotic memory access pattern, the much larger amount of cache available and the massive bandwidth that DDR3-1600 provides are probably responsible.
The wide spread in performance differentials makes picking the proper credit value somewhat difficult, but it should be a value where some computers are faster with S5R6 and others faster with ABP2, instead of one where some are slightly faster with ABP2 and others enormously faster.
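For anyone wanting to redo the arithmetic, here is a small sketch that recomputes the ABP2 gain per host from the credit rates listed above. It assumes the midpoint of each quoted range stands in for the average, which is only a guess at how the percentages were derived. On that assumption the last three hosts land close to the quoted figures, while the first host comes out higher than the quoted 22%, presumably because that figure was taken from actual per-task averages rather than range midpoints.

```python
# (S5R6 range, ABP2 range) in credits/core/hour, as quoted in the post.
# A single ABP2 value is written as a range with equal endpoints.
hosts = {
    "Core Duo T2250": ((12, 18), (19, 24)),
    "P3-M 800": ((5.4, 6.4), (9.6, 9.6)),
    "Athlon 64 X2 3800": ((17, 25), (24, 24)),
    "Core i7-920": ((29, 42), (78, 78)),
}


def midpoint(rng):
    """Midpoint of a (low, high) range; an assumed stand-in for the average."""
    return (rng[0] + rng[1]) / 2.0


def gain_pct(s5r6, abp2):
    """Percent by which the ABP2 credit rate exceeds the S5R6 rate."""
    return (midpoint(abp2) / midpoint(s5r6) - 1.0) * 100.0


gains = {name: gain_pct(s5r6, abp2) for name, (s5r6, abp2) in hosts.items()}

for name, pct in gains.items():
    print(f"{name:20s} ABP2 pays {pct:.1f}% more per core-hour")
```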
But you are comparing CPU and CUDA GPU performances on the i7 here, right? That's not quite fair. The ABP1 WUs are identical for CPU and GPU, and will earn 40 credits no matter if crunched on a CPU or GPU.
All the other examples for different CPUs show that you can't get it exactly equal across all platforms.
I wouldn't call the memory access pattern of ABP2 "chaotic". A significant part of it is FFT, which has a rather regular access pattern, but one still too complex for most CPUs' prefetching logic.
CU
HB
Argh! Did the no GPU setting get reset when ABP2 came out? I could've sworn I had it turned off before, as opposed to just blocking it via app info. I guess this explains why I took a bit of a hit on Collatz the last day or so.
Edit: does the server keep separate quota/failure counts for CPU and GPU WUs? I want to abort all of the latter if I can do so safely.
I don't think so. In fact, I'm not even sure the GPU is considered for calculating the quota at all; I think it's CPU cores * 32 at the moment.
CU
H-B
Indeed, my C2D has a quota of 64 tasks. So the GPU doesn't count.
If you abort and return 16 GPU tasks and allow one CPU task to be completed afterwards, your quota of 32 will drop temporarily to 16 and then be restored to 32 by the one CPU task. Rinse and repeat until you have cleared all GPU tasks :-).
Cheers,
Gary.
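The quota numbers mentioned in this thread can be reproduced with a toy model. This is only a sketch under stated assumptions: a ceiling of CPU cores * 32, one unit lost per aborted or failed task, a doubling on each successful task up to the ceiling, and a floor of 1. These rules are inferred from the figures quoted here, not taken from the server code, and the real scheduler may behave differently. All function names are hypothetical.

```python
def max_quota(cpu_cores, per_core=32):
    """Assumed ceiling: 32 tasks per CPU core; the GPU is not counted."""
    return cpu_cores * per_core


def after_failures(quota, failed_tasks):
    """Assumed rule: each aborted or failed task costs one unit, floor of 1."""
    return max(1, quota - failed_tasks)


def after_success(quota, ceiling):
    """Assumed rule: each valid result doubles the quota, capped at the ceiling."""
    return min(ceiling, quota * 2)


ceiling = max_quota(cpu_cores=1)       # single-core host
quota = after_failures(ceiling, 16)    # abort 16 GPU tasks
print("after aborting 16 tasks:", quota)
quota = after_success(quota, ceiling)  # then let one CPU task finish
print("after one good CPU task:", quota)
print("dual-core ceiling:", max_quota(cpu_cores=2))
```

Under these assumed rules, clearing GPU tasks in batches of no more than half the quota, with one good CPU result in between, never leaves the host short for long.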
So my nuking about 200 WUs at once netted a temporary penalty of roughly a dozen. Not bad, and I can see why no one's noticed it before.
I am finally running ABP2 units on my Linux box, an Opteron 1210 at 1.8 GHz with SuSE Linux 11.1 and BOINC 6.6.41. It took 6,716.13 s. Don't know about credits since it is still pending.
Tullio