Hello,
after finishing some Arecibo WUs I can see that the credits given for them are significantly lower than those given for the GW search:
On one of my computers, a GW-search WU led to about 8.33... credits per 1000 CPU seconds.
The pulsar search only led to about 6 credits per 1000 CPU seconds.
Is there a specific reason for this, or will this be adjusted? I can imagine that many people will opt out of processing Arecibo data because of this.
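To put a number on "significantly lower", here is a minimal sketch that computes the relative shortfall from the two rates quoted above. The function name `relative_shortfall` is made up for illustration, and 8.33 is simply the quoted figure cut off at two decimals.

```python
def relative_shortfall(reference_rate: float, other_rate: float) -> float:
    """Return the fraction by which other_rate falls short of reference_rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return (reference_rate - other_rate) / reference_rate


# Rates quoted in the post, in credits per 1000 CPU seconds.
# 8.33 is an assumption: the post writes "8,33..." and elides the rest.
gw_rate = 8.33
arecibo_rate = 6.0

shortfall = relative_shortfall(gw_rate, arecibo_rate)
print(f"Arecibo pays roughly {shortfall:.0%} less per CPU second on this host")
```

With these figures, the Arecibo WUs come out at a bit more than a quarter below the GW-search rate on this one host; other hosts may differ.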
Regards, Lothar
Copyright © 2025 Einstein@Home. All rights reserved.
Credits for Arecibo vs. GW-search
Thanks for the feedback. The devs are aware of this and will use the statistics of the results returned so far to recalibrate the credits for the Arecibo search. It's impossible to make them 100% equal to the S5R5 search across all platforms (SSE, SSE2...), but I'm sure the credits will be adjusted to be a bit fairer to the Arecibo search sooner rather than later. Stay tuned!
Thx again,
Bikeman
RE: It's impossible to
Hi,
Could it be that the Arecibo application is based on a totally different evaluation model (i.e. integer arithmetic)?
I've two VIA C7 CPUs running, and both finish the Arecibo-WUs much faster than the regular ones. This one for example finishes Einstein-WUs between 210,000 and 320,000 seconds for 150 to 200 credits. The Arecibo-WUs take 160,000 seconds for 250 credits!
CPUs of every other brand take about 10-25% more processing time for Arecibo data, but these little guys need 25-50% less!
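The C7 figures above can be turned into credit rates with a short sketch. The post gives only ranges for the regular WUs and does not say which runtime goes with which credit value, so the code below just computes the two extreme pairings as bounds; the helper name `credits_per_1000s` is made up for illustration.

```python
def credits_per_1000s(credits: float, cpu_seconds: float) -> float:
    """Credits earned per 1000 CPU seconds for a single work unit."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return credits / cpu_seconds * 1000.0


# Regular (S5R5) WUs on the C7: 210,000-320,000 s for 150-200 credits.
# Assumption: the extreme pairings bound the real rate, since the post
# does not state which runtime belongs to which credit value.
s5r5_lowest = credits_per_1000s(150, 320_000)
s5r5_highest = credits_per_1000s(200, 210_000)

# Arecibo WUs on the C7: 160,000 s for 250 credits.
arecibo = credits_per_1000s(250, 160_000)

print(f"S5R5:    {s5r5_lowest:.2f} to {s5r5_highest:.2f} credits per 1000 s")
print(f"Arecibo: {arecibo:.2f} credits per 1000 s")
```

Even against the most favourable S5R5 pairing, the Arecibo rate on this CPU comes out clearly higher, which is the opposite of what the first post in the thread reports for other hosts.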
Since the C7 handles integer arithmetic reasonably well but stalls on floating-point calculations, I would guess that the Arecibo application makes heavy use of integer operations. This would also explain why there is no SSE version of the Arecibo application.
Am I right with this?
Rudi
PS: Of course, the C7 is not "faster" with integer, it's just totally sluggish with floating point, so there's no advantage for these here.
This is indeed
This is indeed remarkable.
No, AFAIK the "Arecibo" app is not less dependent on Floating Point calculations compared to the S5R5 app.
Given the clock rates of the C7, the runtime of the S5R5 app is incredibly long. It's worth noting that the C7 supports SSE2 so the SSE2 optimized app variant gets selected. But.... I'm not sure whether the C7 implementation of SSE(2) is really meant to provide maximum performance or just compatibility.
You could test whether the generic, "unoptimized" x87 FPU code is actually faster than the SSE2 implementation on the C7. Creating a file named CPU_TYPE_0 in the BOINC folder (the one with the projects subfolder) forces the "switcher" to select the standard app variant even if it detects SSE(2) support.
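For anyone who prefers to script this step, the following is a minimal sketch that creates the empty marker file. The helper name `create_cpu_type_marker` and the example path in the comment are assumptions for illustration; the only things taken from the post are the file name CPU_TYPE_0 and the hint that the right folder is the one containing the projects subfolder.

```python
from pathlib import Path


def create_cpu_type_marker(boinc_dir, marker_name: str = "CPU_TYPE_0") -> Path:
    """Create an empty marker file in the BOINC folder and return its path.

    Refuses to act unless the folder contains a 'projects' subfolder, which
    is the hint given in the post for identifying the right directory.
    """
    boinc_dir = Path(boinc_dir)
    if not (boinc_dir / "projects").is_dir():
        raise FileNotFoundError(
            f"{boinc_dir} has no 'projects' subfolder; is this the BOINC folder?"
        )
    marker = boinc_dir / marker_name
    marker.touch(exist_ok=True)  # empty file; only its presence matters
    return marker


# Example call (the path is hypothetical and depends on the installation):
# create_cpu_type_marker("/var/lib/boinc-client")
```

Deleting the file again should presumably restore the normal selection, though the post does not say so explicitly.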
CU
Bikeman
RE: This is indeed
Well this is indeed. Two similar apps with 50% difference in processing time. Go figure...
Since it even supports SSE3, I'd rather play the "compatibility" card.
I'll give that a try as soon as the current WUs are finished.
See you guys
Rudi
RE: You could test whether
Wow. This is next to ridiculous...
The first FPU-only processed result took nearly 80% longer than the one done with SSE2.
So much for "only compatibility". That doesn't explain, though, why the Arecibo app is so much faster on this type of CPU...
I guess I'll have to live with that. There are only a few VIA/Centaur CPUs around here... so there's no reason to burn valuable programming time on them. Maybe the next regular app update will do better, by some x86 miracle.
Thanks all
Rudi
Hi! Thanks for trying
Hi!
Thanks for trying this, it excludes one possible explanation for what you observed.
The next possible explanation would be the size of the L2 cache (it's 128 KB???). The S5R5 app seems to benefit from a lot of cache because it involves some memory-intensive computations (the "pattern matching" part of the computation, while the other part is more floating-point intensive). 128 KB isn't much; even the consumer AMD CPUs that have 512 KB per core seem to suffer a bit compared to Intel CPUs with 1 MB+ of L2 cache per core when it comes to S5R5 performance.
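To settle the "128 KB???" question on a Linux host, the cache sizes can usually be read from sysfs. The sketch below assumes the kernel exposes the usual `level`, `type` and `size` files under `/sys/devices/system/cpu/cpu0/cache/index*`; on systems where that directory is missing it simply returns nothing. The function names are made up for illustration.

```python
from pathlib import Path

_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_cache_size(text: str) -> int:
    """Convert a sysfs size string such as '128K' or '1M' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def read_cache_sizes(cache_dir=Path("/sys/devices/system/cpu/cpu0/cache")) -> dict:
    """Return {'L<level> <type>': size_in_bytes} for the first CPU, if available."""
    sizes = {}
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return sizes  # not Linux, or sysfs cache info not exposed
    for index_dir in sorted(cache_dir.glob("index*")):
        try:
            level = (index_dir / "level").read_text().strip()
            kind = (index_dir / "type").read_text().strip()
            size = parse_cache_size((index_dir / "size").read_text())
        except (OSError, ValueError):
            continue  # skip entries that are unreadable or oddly formatted
        sizes[f"L{level} {kind}"] = size
    return sizes


for name, size in read_cache_sizes().items():
    print(f"{name}: {size // 1024} KB")
```

The L2 entry in the output is the figure to compare against the 512 KB and 1 MB+ per-core values mentioned above.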
It happens that I'm currently looking into making the app less dependent on L2 cache (actually a by-product of my first steps in CUDA-land :-) ), and it would be interesting to test-drive a prototype on your system exactly because it is "cache-challenged". Do you run a 64 or 32 bit Linux?
EDIT: ooops, the C7 is only 32 bit. Ok, if interested in benchmarking a prototype app, send me a PM.
CU
Bikeman