I have two hosts, 11711999 and 12133872, running O1AS20-100I tasks with the same CPU, the same RAM and the same OS. Yet one takes 50% more time to complete a task than the other. The main differences are the motherboard, the GPUs and the number of CPU tasks running. To rule most of that out, I ran a test where I suspended all tasks except a single O1AS20-100I. After 100 minutes one machine showed 10% progress, the other 15%. No throttling or swapping was involved, as far as I can tell. Then I did the same with a single FGRPB1 task, and both ran at the same speed. Is it possible that the slowdown is caused by the data the application is processing, or do I have to look elsewhere?
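The progress figures from that test are consistent with the 50% gap. The sketch below is a rough check. It assumes progress grows linearly with elapsed time, which is an assumption because the reported fraction done need not be perfectly linear. It projects the total runtime for each host:

```python
def projected_runtime(elapsed_min: float, fraction_done: float) -> float:
    """Estimated total runtime in minutes, assuming (as an approximation)
    that progress grows linearly with elapsed time."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_min / fraction_done


# Figures from the single-task test: 10% vs 15% after 100 minutes.
slow = projected_runtime(100, 0.10)  # 1000 minutes
fast = projected_runtime(100, 0.15)  # about 667 minutes
ratio = slow / fast                  # 1.5, i.e. 50% more time
print(f"slow host: {slow:.0f} min, fast host: {fast:.0f} min, ratio {ratio:.2f}")
```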
Copyright © 2024 Einstein@Home. All rights reserved.
Big speed difference with O1AS20-100I tasks
That is a prime example of what we see overall, and we have no clue why. The tasks are designed to have a specific runtime, so every task should take approximately the same time on a given host. With two identical hosts, one expects the runtimes to be roughly the same. Why this is not the case for this science run, we don't know.
We can gather some information about seemingly identical hosts and see if we can come up with a difference.
The stats for your hosts are currently:
There are a lot of possibilities. Your CPU could have a slight flaw that slows it down, or a background process could be forcing different cache behavior on the first host than on the second.
RE: Is it possible that the
Hi floyd, I was wondering which motherboards you are running and whether you were able to determine what the clock and memory speeds were set to.
I'm not sure which Linux you are running, but maybe compare the dmidecode output of both hosts and watch lscpu for a period of time.
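One way to do the comparison is to save the output on each host (for example `lscpu > host_a.txt`) and print only the fields that differ. The Python sketch below assumes plain `Key: Value` lines, which is how lscpu prints them. `parse_kv` and `diff_kv` are hypothetical helper names, and the sample strings are made-up placeholders, not output from these hosts.

```python
from __future__ import annotations


def parse_kv(text: str) -> dict[str, str]:
    """Map each 'Key: Value' line to {key: value}; lines without ':' are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diff_kv(a: str, b: str) -> dict[str, tuple[str | None, str | None]]:
    """Return {key: (value_in_a, value_in_b)} for keys that differ
    or exist on only one side (missing side shown as None)."""
    fa, fb = parse_kv(a), parse_kv(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(fa.keys() | fb.keys())
        if fa.get(key) != fb.get(key)
    }


# Hypothetical placeholder dumps, for illustration only.
host_a = "Model name: Example CPU\nCPU MHz: 3200.000\nL2 cache: 2048K"
host_b = "Model name: Example CPU\nCPU MHz: 1400.000\nL2 cache: 2048K"
print(diff_kv(host_a, host_b))  # {'CPU MHz': ('3200.000', '1400.000')}
```

With real files, read each with `open(...).read()` and pass the two strings to `diff_kv`.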
I got a similar problem
I got a similar problem once (not in this run). The cause was just a dusty cooler on one of the almost identical hosts :)
So one CPU ran under optimal conditions, while the second went into throttling mode and back to normal from time to time due to minor overheating.
In my case, though, it showed a performance difference in all tasks, not just one type. But the difference also varied a lot from one type to another: some showed only a 10-15% slowdown, others a factor of 1.5. It seems to depend on how much a task type heats up the CPU (which depends on the code and optimizations used).
By the way, my CPUs are exactly the same as yours: AMD FX-8320. So I'd say there is clearly some problem with your computer #11711999.
Both of my FX-8320s perform at about the same level as your #12133872 (14-15 hours of runtime), even though they run a completely different OS (Win7 x64) on different motherboards/RAM/GPUs. Host 11711999, by contrast, is ~50% slower than 3 different hosts with the same CPU.
Are you sure no throttling is involved on 11711999? Maybe not continuous throttling but sporadic, e.g. triggered from time to time by the ambient (room) temperature.
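A simple duty-cycle model gives a feel for how much sporadic throttling it would take to cause a given slowdown. It rests on two simplifying assumptions. First, the task's speed scales linearly with clock. Second, the CPU spends a fraction d of the time at relative speed r and the rest at full speed. The runtime factor is then R = 1 / (1 - d(1 - r)). The half-speed throttle level used below is a hypothetical value, not a measured one.

```python
def slowdown(d: float, r: float) -> float:
    """Runtime factor R when the CPU runs at relative speed r for a
    fraction d of the time and at full speed otherwise (clock-bound model)."""
    return 1.0 / (1.0 - d * (1.0 - r))


def throttled_fraction_needed(R: float, r: float) -> float:
    """Fraction of time d that would produce runtime factor R at relative
    speed r. A result above 1 means R cannot be reached at that r."""
    if not 0.0 <= r < 1.0:
        raise ValueError("r must be in [0, 1)")
    return (1.0 - 1.0 / R) / (1.0 - r)


# Hypothetical throttle level: half speed (r = 0.5).
d = throttled_fraction_needed(1.5, 0.5)  # about 2/3 of the time
```

With r = 0.5, a runtime factor of 1.5 requires d = 2/3 under this model, i.e. throttling about two thirds of the time.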
RE: I'm not sure what linux
I also thought you might be able to run perf to look at the % cache hits.
http://www.brendangregg.com/perf.html has a good page and
Performance counter stats for process id '4146':
20343.655246 task-clock (msec) # 0.899 CPUs utilized [100.00%]
5,161 context-switches # 0.254 K/sec [100.00%]
1,354 cpu-migrations # 0.067 K/sec [100.00%]
3 page-faults # 0.000 K/sec
91,135,821,052 cycles # 4.480 GHz [100.00%]
# perf stat -p 4146 -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores sleep 5 ## something a bit more memory cache related
Performance counter stats for process id '4146':
9,882,110,572 L1-dcache-loads [100.00%]
423,238,272 L1-dcache-load-misses # 4.28% of all L1-dcache hits [100.00%]
5,079,037,249 L1-dcache-stores
5.002558811 seconds time elapsed
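The derived figures in such output can be recomputed from the raw counts, which is useful when comparing runs. The Python sketch below assumes the usual `perf stat` text layout: a count, possibly with thousands separators, followed by the event name. The sample lines are copied from the outputs above.

```python
from __future__ import annotations

import re

# A counter line starts with a number (thousands separators and a decimal
# part allowed), followed by the event name, e.g. "5,161 context-switches".
COUNTER_LINE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s+([A-Za-z0-9_.-]+)")


def parse_perf_stat(text: str) -> dict[str, float]:
    """Map event name -> count for every counter line in perf stat output.
    Headers and '#' comment lines are skipped since they don't start with a number."""
    counters: dict[str, float] = {}
    for line in text.splitlines():
        m = COUNTER_LINE.match(line)
        if m:
            counters[m.group(2)] = float(m.group(1).replace(",", ""))
    return counters


# Counter lines copied from the two outputs above.
sample = """
20343.655246 task-clock (msec) # 0.899 CPUs utilized [100.00%]
91,135,821,052 cycles # 4.480 GHz [100.00%]
9,882,110,572 L1-dcache-loads [100.00%]
423,238,272 L1-dcache-load-misses # 4.28% of all L1-dcache hits [100.00%]
"""

c = parse_perf_stat(sample)
# task-clock is in milliseconds, so cycles per ms / 1e6 gives GHz (about 4.48).
ghz = c["cycles"] / (c["task-clock"] * 1e6)
# perf labels this "% of all L1-dcache hits", but it is misses / loads (about 4.28%).
l1_miss_ratio = c["L1-dcache-load-misses"] / c["L1-dcache-loads"]
```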
Thanks for your replies,
Thanks for your replies; they're really appreciated. As requested, I'll post some more information on the hosts. The faster one, let's call it asmodeus, has an ASRock 990FX Extreme3 motherboard (990FX chipset), 4GB of DDR3-1600 RAM and two HD7750 GPUs. The other one, juiblex, is equipped with an ASRock 960GM/U3S3 FX board (760G chipset), the same RAM and a single GTX750Ti. Both processors are clocked at 3200MHz, the stock speed for the FX-8320E. For the RAM, juiblex uses the JEDEC timings of 9-9-9-28, while asmodeus uses the XMP profile with 9-9-9-24. That can't make much difference, and indeed memtest86 shows similar speeds. During my tests I ran
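Converting the timings to nanoseconds shows how small the RAM difference is. This sketch makes two assumptions. The four numbers follow the usual CL-tRCD-tRP-tRAS order. DDR3 makes two transfers per I/O clock, so DDR3-1600 runs an 800 MHz clock with a 1.25 ns cycle.

```python
from __future__ import annotations


def timings_ns(data_rate_mts: int, timings: tuple[int, ...]) -> list[float]:
    """Convert DDR timings given in clock cycles to nanoseconds."""
    clock_mhz = data_rate_mts / 2   # DDR: two transfers per I/O clock
    cycle_ns = 1000.0 / clock_mhz   # 800 MHz -> 1.25 ns per cycle
    return [t * cycle_ns for t in timings]


# CL-tRCD-tRP-tRAS as reported above.
jedec = timings_ns(1600, (9, 9, 9, 28))  # juiblex, JEDEC timings
xmp = timings_ns(1600, (9, 9, 9, 24))    # asmodeus, XMP profile
# jedec -> [11.25, 11.25, 11.25, 35.0], xmp -> [11.25, 11.25, 11.25, 30.0]
```

Only the fourth value (tRAS) differs, 35 ns versus 30 ns. CAS latency is 11.25 ns on both hosts.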
watch 'grep -i "cpu mhz" /proc/cpuinfo | sort | uniq -c'
in a terminal, and there were always enough cores running at full speed. As watch updates every two seconds by default, I can't be absolutely sure there was no sporadic throttling. But it can't have been more than sporadic, or I should have spotted it eventually.
Today I've been digging through the BIOS settings. I've changed everything I could imagine having an effect, but no luck. Now I'm trying OS tools. dmidecode shows a RAM speed of only 333MHz on juiblex, but for now I take this as a false alarm. That perf tool looks interesting, but I'll have to do some reading to use it correctly and understand the results. The first thing I looked at was the cache statistics as in the example, but juiblex actually shows fewer cache misses than asmodeus. Then this catches the eye:
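A loop that samples faster than watch's default two seconds and keeps a histogram may help catch short dips. The Python sketch below is Linux-only and reads the same `cpu MHz` lines that the grep above matches. The 0.2 s interval and the names `cpu_mhz_values` and `sample_mhz` are arbitrary choices for illustration.

```python
import time
from collections import Counter


def cpu_mhz_values(cpuinfo_text: str) -> list:
    """Extract every 'cpu MHz : <value>' entry (one per logical CPU)."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "cpu mhz":
            values.append(float(value))
    return values


def sample_mhz(duration_s: float = 60.0, interval_s: float = 0.2,
               path: str = "/proc/cpuinfo") -> Counter:
    """Count how often each (rounded) per-core MHz reading was seen
    while re-reading `path` every `interval_s` seconds for `duration_s` seconds."""
    seen = Counter()
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        with open(path) as f:
            seen.update(round(v) for v in cpu_mhz_values(f.read()))
        time.sleep(interval_s)
    return seen

# Example: hist = sample_mhz(600) collects ten minutes of readings, then
# for mhz, n in sorted(hist.items()): print(n, "samples at", mhz, "MHz")
```

Idle cores may legitimately report lower clocks. What matters is whether enough cores stay at full speed throughout, which is the same criterion as the watch check.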
Performance counter stats for process id '1210':
29887.244383 task-clock (msec) # 0.996 CPUs utilized [100.00%]
3100 context-switches # 0.104 K/sec [100.00%]
15 cpu-migrations # 0.001 K/sec [100.00%]
4 page-faults # 0.000 K/sec
97922606745 cycles # 3.276 GHz [83.42%]
4317177296 stalled-cycles-frontend # 4.41% frontend cycles idle [83.41%]
27660669045 stalled-cycles-backend # 28.25% backend cycles idle [33.19%]
57113654102 instructions # 0.58 insns per cycle
# 0.48 stalled cycles per insn [49.91%]
8635678139 branches # 288.942 M/sec [66.53%]
70923375 branch-misses # 0.82% of all branches [83.31%]
30.000930685 seconds time elapsed
Performance counter stats for process id '2439':
29999.404385 task-clock (msec) # 1.000 CPUs utilized [100.00%]
2855 context-switches # 0.095 K/sec [100.00%]
36 cpu-migrations # 0.001 K/sec [100.00%]
0 page-faults # 0.000 K/sec
98355818549 cycles # 3.279 GHz [83.33%]
1833446877 stalled-cycles-frontend # 1.86% frontend cycles idle [83.34%]
43593860124 stalled-cycles-backend # 44.32% backend cycles idle [33.33%]
98043871412 instructions # 1.00 insns per cycle
# 0.44 stalled cycles per insn [49.99%]
14472388690 branches # 482.423 M/sec [66.66%]
3986575 branch-misses # 0.03% of all branches [83.29%]
30.001032530 seconds time elapsed
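Reproducing the printed ratios from the raw counts makes it easier to see what differs between the two runs. This Python sketch uses the numbers above. "first" and "second" simply follow the order of the outputs (process ids 1210 and 2439), since the post does not say which host each came from. The bracketed percentages mean perf multiplexed the counters and scaled the counts, so these ratios are estimates.

```python
from dataclasses import dataclass


@dataclass
class PerfRun:
    """Raw counts from one `perf stat` run."""
    cycles: int
    instructions: int
    branches: int
    branch_misses: int
    stalled_frontend: int
    stalled_backend: int

    @property
    def ipc(self) -> float:
        """Instructions retired per cycle."""
        return self.instructions / self.cycles

    @property
    def branch_miss_rate(self) -> float:
        return self.branch_misses / self.branches

    @property
    def backend_idle(self) -> float:
        """Fraction of cycles stalled in the backend."""
        return self.stalled_backend / self.cycles


first = PerfRun(97922606745, 57113654102, 8635678139, 70923375,
                4317177296, 27660669045)   # process 1210
second = PerfRun(98355818549, 98043871412, 14472388690, 3986575,
                 1833446877, 43593860124)  # process 2439

instr_ratio = second.instructions / first.instructions  # about 1.7
```

At almost the same clock (about 3.28 GHz) and CPU time, the second run retired about 1.7 times as many instructions (IPC 1.00 vs 0.58). Its branch-miss rate is also roughly 30 times lower (0.03% vs 0.82%).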
I have a feeling there's something interesting in there but I don't understand it yet. Will continue staring at it tomorrow.
RE: dmidecode shows only
Hmm, I think I would double-check what is reported in dmesg and at bootup as well; that doesn't seem right to me.
Are you running a single DIMM? If multiple, are they in the right slots? I'd definitely reseat them or swap them between the PCs.
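The `Memory Device` records of `dmidecode -t memory` (run as root) show how many DIMMs are installed, which slots they occupy and what speed each reports. The minimal sketch below summarises those records per slot. The field names it looks for (`Locator`, `Size`, and anything ending in `Speed`) vary somewhat between dmidecode versions, so treat them as assumptions to check against your own output.

```python
from __future__ import annotations


def memory_devices(dmidecode_text: str) -> list[dict[str, str]]:
    """Collect the indented 'Key: Value' fields of each 'Memory Device' record."""
    devices: list[dict[str, str]] = []
    current = None
    for line in dmidecode_text.splitlines():
        if line.strip() == "Memory Device":
            current = {}
            devices.append(current)
        elif current is not None and line[:1].isspace() and ":" in line:
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
        elif not line.strip():
            current = None  # a blank line ends a DMI record
    return devices


def slot_summary(devices: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only slot name, size and any speed fields for each device."""
    return [
        {k: v for k, v in d.items()
         if k in ("Locator", "Size") or k.endswith("Speed")}
        for d in devices
    ]

# Example (needs root):
#   import subprocess
#   out = subprocess.run(["dmidecode", "-t", "memory"],
#                        capture_output=True, text=True).stdout
#   for slot in slot_summary(memory_devices(out)): print(slot)
```

Running it on both hosts gives a side-by-side answer to the slot and speed questions.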
RE: dmidecode shows only
What does it show on the other system? If the systems are truly so very similar, why might this tool show such a different answer? In particular, on what basis do you think this is false?