I think all PI-MMX's are going to be hard pressed to meet anything but the lowest template-frequency WU's at this point, based on what my K6's have done, even taking into account the stronger FPU's in them. Assuming the project team can roughly halve the runtime once they optimize, that should open them up to a wider range of frequencies, but if the deadline stays at two weeks, EAH will be a very tight-deadline project for them.
RE: I think all PI-MMX's
The expected optimization may halve the runtime per workunit, but only for (at least) SSE-capable CPUs, so anything below a P III or Athlon XP won't benefit from the SSE codepaths. However, the current Windows app runs quite slowly on those clients compared to the Linux app, so an increase of roughly 30% might be possible for those older CPUs under Windows as well.
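To make the "codepath" point concrete, here is a minimal sketch (not the actual Einstein@Home source; the function names and the feature test are made up for illustration) of how an app can carry a scalar fallback next to an SSE routine and pick one at runtime, so a P III gets the fast path while a P II or K6 still runs:

/* Hypothetical sketch of runtime codepath selection -- NOT the actual
 * Einstein@Home code.  A scalar fallback and an SSE version of a made-up
 * hot-loop routine are built, and one of them is picked once at startup.
 * The feature test uses GCC's __builtin_cpu_supports(), a newer convenience
 * than a 2007-era app would have had, but the principle is the same. */
#include <stdio.h>

static void hot_loop_scalar(const float *in, float *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] * in[i];          /* stand-in for the real work */
}

#if defined(__SSE__)
/* In a real multi-codepath build this routine would live in its own source
 * file compiled with -msse, so the rest of the program stays runnable on
 * non-SSE CPUs. */
#include <xmmintrin.h>
static void hot_loop_sse(const float *in, float *out, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {         /* 4 floats per 128-bit register */
        __m128 v = _mm_loadu_ps(in + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(v, v));
    }
    for (; i < n; i++)                   /* scalar tail */
        out[i] = in[i] * in[i];
}
#endif

typedef void (*hot_loop_fn)(const float *, float *, int);

static hot_loop_fn pick_hot_loop(void)
{
#if defined(__SSE__)
    if (__builtin_cpu_supports("sse"))
        return hot_loop_sse;
#endif
    return hot_loop_scalar;              /* PI-MMX / K6 class CPUs end up here */
}

int main(void)
{
    float in[8] = {1, 2, 3, 4, 5, 6, 7, 8}, out[8];
    pick_hot_loop()(in, out, 8);
    printf("out[7] = %f\n", out[7]);
    return 0;
}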
CU
BRM
RE: RE: I think all
Well, I don't know where you got that from. When Akos worked over the S5R1 apps he didn't leave the old-timers (i.e. non-SSE CPUs) out; performance improved by a factor of 2 for them as well. I haven't seen anything said about not doing anything for them this time around, so I would expect comparable gains, all other things being equal.
Alinator
RE: Well I don't know
Akos already helped to improve the C source code of the current app's hot loop, so it's not completely un-optimized ;-). Any further improvements (short of using SSE(n) instructions) would have to come from hand-coding the algorithm in assembly language, and modern compilers are not so bad that you can expect a speedup by a factor of 2 while staying with the same instruction set (a small made-up example of what I mean is at the end of this post).
I've taken a look at the compiler output and it's actually not all that bad, except for one thing that will be corrected soon and will hopefully bring performance parity between Windows and Linux.
Maybe Akos can do magic again; I just think it's unfair to expect that a factor of 2 can be achieved with every iteration of optimization.
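Just to illustrate the C-level tweaks I mean (again, an invented inner loop, not the app's actual hot loop): hoisting an invariant divide and splitting the accumulator are the kind of source-level changes that typically buy tens of percent, whereas a factor of 2 usually needs SIMD or hand-written assembly.

/* Illustrative only -- an invented inner loop, not the app's actual code. */

/* Naive version: one divide per iteration, one serial dependency chain. */
static double power_sum_naive(const float *x, int n, double norm)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += (double)x[i] * x[i] / norm;
    return sum;
}

/* Tweaked version: the divide is hoisted out of the loop and two
 * independent partial sums break the add dependency chain.  Results can
 * differ from the naive version in the last bits due to reassociation. */
static double power_sum_tweaked(const float *x, int n, double norm)
{
    const double inv_norm = 1.0 / norm;
    double s0 = 0.0, s1 = 0.0;
    int i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += (double)x[i]     * x[i];
        s1 += (double)x[i + 1] * x[i + 1];
    }
    if (i < n)
        s0 += (double)x[i] * x[i];
    return (s0 + s1) * inv_norm;
}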
CU
BRM