Perhaps, and perhaps not very deep either :-).
The Haswell is an i3-4130 (2 cores / 4 threads) @ 3.4 GHz with 2 free threads.
The Haswell Refresh is a G3258 (2 cores / 2 threads) @ 3.9 GHz with 1 free thread. I had expected that supporting 4 GPU tasks with only 1 free core would carry a penalty (even though the 3.9 GHz is an obvious bonus), such that there might be a detrimental effect on the CPU component of the crunch time.
When I saw the G3258 averaging 618 secs of CPU time and the i3-4130 coming in more than 50% higher at 960 secs, I wondered whether there was something beneficial about Haswell Refresh. Perhaps the benefit comes from 3.9 GHz versus 3.4 GHz, although that seems like too big a difference for such a modest frequency increase. Perhaps part of the difference is not having to use HT.
____________
Cheers,
Gary.
It's been a lot of years since I really dug into chip timings but Gary's comments brought some old memories back from the 80s.
Back then, if you were looking for the absolute best throughput, you had to select your components and clock speeds carefully to avoid forcing the CPU into "wait" states. For the non-hardware guys and gals out there, a "wait" state is what the CPU sits in while waiting for a response from the memory chips. It was needed because the memory chips were so much slower than the CPUs. One wait state was normally one clock cycle long.
Here's an example problem. Your memory can return a command or data in 1.1 CPU clock cycles. That means your CPU has to use 2 wait states, or cycles, to grab the data, because the data is not ready during the first one. So you end up wasting almost a full clock cycle every time, slowing completion times, sometimes dramatically. In this case, slowing your CPU slightly, so it could grab data from memory after one wait state instead of two, would cut your wasted clock cycles in half.
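To put that arithmetic in concrete terms, here is a toy model. The part latencies and clock speeds below are made up for illustration, not taken from any real datasheet; the only point is that wait states are memory latency rounded up to whole CPU cycles, so a faster clock can cost you an extra wait state.

```python
import math

def wait_states(mem_latency_ns, cpu_freq_mhz):
    """Toy model: wait states = memory latency rounded up to whole CPU cycles.

    The CPU can only sample the bus on a cycle boundary, so a response
    that takes 1.1 cycles still costs 2 whole cycles of waiting.
    """
    cycle_ns = 1000.0 / cpu_freq_mhz      # length of one CPU cycle in ns
    return math.ceil(mem_latency_ns / cycle_ns)

# A hypothetical 120 ns memory part behind CPUs of different clocks:
for mhz in (8, 10, 12):
    print(mhz, "MHz ->", wait_states(120, mhz), "wait states")
# 8 MHz (125 ns cycle) squeaks by with 1 wait state; at 10 or 12 MHz
# the same memory forces 2, so the faster clock wastes more cycles.
```

Note how the 8 MHz CPU, despite being slower, spends only one cycle waiting, which is exactly Phil's "slow the CPU slightly" trade-off.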
I'm not sure whether all that applies to modern computers, but here's my question. All other things being equal for the sake of debate, could the difference between Gary's machines be that the slightly higher clock speed on one of them is forcing the CPU into more wait states, thereby causing longer completion times?
Phil
I thought I was wrong once, but I was mistaken.
Do modern computers use wait states?
An interesting question, Phil. Although I'm sure the answer is a resounding "no" to the last part.
Wait states, or "no-op" operations, are alive and well in modern chips. Intel, IBM, Oracle and probably others try to keep their "fat" CPU cores busy by letting them run several program threads independently. So if one thread is waiting for data from the caches or main memory (or worse: even slower I/O), the CPU simply makes more execution resources available to the other thread(s). Intel calls it Hyper-Threading, and it yields nice throughput gains under some conditions (and none under others).
In fact, the problem of memory being slower than CPUs has only got worse since the 80s. Random main-memory access times of really fast systems can reach ~30 ns, with typical values being 40 - 60 ns. Mediocre systems sit at 80 - 100 ns, with some even approaching 200 ns. Those numbers have not really improved while CPUs advanced from 1 GHz (1 ns per cycle) to 4 GHz (0.25 ns per cycle). That is a lot of clock cycles, so the chip designers and compiler writers put great effort into prefetching and caching things, to avoid sudden, unexpected trips to main memory.
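To see how those nanosecond figures turn into cycle counts, a quick back-of-the-envelope sketch (using the 60 ns "typical" latency quoted above; the conversion is just latency times clock frequency):

```python
def latency_cycles(mem_latency_ns, cpu_freq_ghz):
    # One cycle lasts (1 / freq) ns, so latency in cycles = ns * GHz.
    return mem_latency_ns * cpu_freq_ghz

# 60 ns of main-memory latency, seen by CPUs of two generations:
print(latency_cycles(60, 1.0))  # 60.0 cycles at 1 GHz
print(latency_cycles(60, 4.0))  # 240.0 cycles at 4 GHz
```

Same memory, same nanoseconds, but the 4 GHz chip burns four times as many cycles per miss.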
Typically we're talking about >100 CPU cycles of latency here, which is why single wait states don't matter any more. Hyper-Threading does matter in Gary's example, though; see my answer in that thread.
MrS
Scanning for our furry friends since Jan 2002
Thanks for the explanation of more modern computers, Extra. That makes perfect sense to me.
Phil
I thought I was wrong once, but I was mistaken.