Sorry. You might agree that it's pretty hard to please everyone.
It would help, I guess, to find out on which systems the 1.05 runs faster and on which it's slower, so I can implement an automatic selection independently of the manual "Beta" one.
Yes, but in former times we had a website which was able to search by app selection and count pendings, valids, invalids and errors. That would help a lot to find bad WUs, apps and hosts!
Sorry, but other postings here are discussing the "look and feel" of the new website, while the basic tools from the old site are still not working about 8 weeks after the site change.
Greetings from the North
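Bernd Machenschalk wrote: It would help, I guess, to find out on which systems the 1.05 runs faster and on which it's slower, so I can implement an automatic selection independently of the manual "Beta" one.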
I wonder, the issue seems to be with the happy Linux users, like it happened before? I don't have an overview of it all, just judging by AgentB's and Gary's systems...
Couldn't you apply the fix that was needed earlier to create 1.04 for Linux? If that works - famous last words - FGRPB1 could go into production again and not rely on beta testing at all anymore. Yes, I know, it seems like some work for less than a month left with cuda55 / Parkes PMPS XT crunching. However, at the same time, it wouldn't rely on people reading here to get the benefits of more efficient crunching if it could just go into production again.
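Bernd Machenschalk wrote: Sorry. You might agree that it's pretty hard to please everyone.
I agree, then again I might disagree just to prove you are correct. Just to be both inconsistent and consistent.
Bernd Machenschalk wrote: It would help, I guess, to find out on which systems the 1.05 runs faster and on which it's slower, so I can implement an automatic selection independently of the manual "Beta" one.
OK, I have another one that is not faster: this i7-860, host 4918234.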
Application   N     Average  Median    Minimum   Maximum   StdDeviation
FGRPB1v1.00   1346  28098.1  28052.7   22377.86  43664.31  2144.33
FGRPOLDv1.01  46    28193.2  27672.8   23603.95  32379.21  2160.72
FGRPSSEv1.03  76    33262.2  33352.5   29386.59  35967.80  1536.24
FGRPSSEv1.04  33    29007.7  29040.81  24787.41  33561.67  2502
FGRPSSEv1.05  29    30696.9  30500.81  29145.24  33279.06  1099.18
Progress only comes when things change, and I quite like it when things change. You always go slower on the bends. I think it's amazing that E@H works, and keeps on working.
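For anyone who wants to compare these averages quickly, here is a minimal Python sketch that expresses each version's average run time as a percent change against FGRPB1v1.00. The numbers are copied from the table above; treating them as seconds and using 1.00 as the baseline are assumptions, and `relative_change` is just an illustrative helper name.

```python
# Average run times copied from the table for host 4918234.
# Assumption: the values are in seconds.
avg_runtime = {
    "FGRPB1v1.00": 28098.1,
    "FGRPOLDv1.01": 28193.2,
    "FGRPSSEv1.03": 33262.2,
    "FGRPSSEv1.04": 29007.7,
    "FGRPSSEv1.05": 30696.9,
}


def relative_change(runtimes, baseline):
    """Return the percent change of every entry relative to the baseline entry."""
    base = runtimes[baseline]
    return {name: 100.0 * (t - base) / base for name, t in runtimes.items()}


# Assumption: 1.00 is a sensible baseline because it has by far the most samples.
changes = relative_change(avg_runtime, "FGRPB1v1.00")
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

With these numbers, 1.05 comes out roughly 9% slower than 1.00 on this host, and 1.04 roughly 3% slower. The small sample counts for the newer versions mean this is only a rough indication.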
Jasper_7 wrote: I wonder, the issue seems to be with the happy Linux users, like it happened before?
I can confirm from the DB that indeed almost all hosts that don't benefit from 1.05 are running Linux. So for the time being I made 1.05 the default again for all non-Linux platforms.
This indeed looks like a compiler issue again, but if it's comparable to the previous issue, it must be the other way 'round - the 1.05 Linux app version was definitely built with the same compiler version as the 1.04.
BM
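As a rough illustration of what an automatic selection could look like, the sketch below picks, for each operating system, the app version with the lowest median run time. This is not how the project's scheduler works; the function name `fastest_version_per_os` and the sample records are made up purely for illustration and are not real measurements.

```python
from collections import defaultdict
from statistics import median


def fastest_version_per_os(records):
    """Pick the app version with the lowest median run time for each OS.

    records: iterable of (os_name, app_version, runtime_s) tuples.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for os_name, version, runtime_s in records:
        grouped[os_name][version].append(runtime_s)
    return {
        os_name: min(by_version, key=lambda v: median(by_version[v]))
        for os_name, by_version in grouped.items()
    }


# Hypothetical records, invented for this example only.
sample = [
    ("Linux", "1.04", 29000), ("Linux", "1.04", 29100),
    ("Linux", "1.05", 30500), ("Linux", "1.05", 30700),
    ("Windows", "1.04", 29000), ("Windows", "1.05", 27000),
]

choice = fastest_version_per_os(sample)
print(choice)
```

Using the median rather than the mean keeps a few unusually slow tasks from skewing the choice. A real selection would probably also need a minimum sample count per version before trusting the result.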
I had a look at my results. Computer is 64-bit Linux, SSE2, 4 tasks running on 4 physical cores (after finding that 4+4=4.5 for GW tasks):
1.00: ~17500s (2 tasks still visible)
1.01: 16800s (1 task still visible)
1.03: 22300s (1 task still visible)
1.04: 0 tasks still visible
1.05: ~20500s
And my recollection is these tasks used to take ~4.5h, consistent with the surviving 1.00 and 1.01 examples. So that 2% increase in time at 1.03 seems a bit larger than 2%, but 1.05 may bring the time partway back down. Tasks 1.03 and above are labelled FGRPSSE -- should I expect this to say SSE2 instead?
I see that 1.05 has relatively high needs for memory bandwidth. On a two-socket system based on Intel quad-core CPUs with a relatively low clock (2.27GHz), running 8 tasks in parallel (4 tasks on each socket, so no HT usage), it uses more than 10GB/s of memory bandwidth per socket:
        |  READ | WRITE |
-------------------------
SKT 0     10.85    1.91
SKT 1     10.33    1.69
-------------------------
*         21.18    3.60
So this system has a relatively low clock and triple-channel DDR3 memory, so it performs quite well, but I think that a more modern ~4GHz CPU with only two memory channels, even with DDR4, may be starved for memory bandwidth.
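A back-of-the-envelope check of that concern can be sketched in Python. The measured totals come from the table above and are taken to be GB/s. Everything about the "modern" system is an assumption for illustration: 8 tasks on a 4.0GHz CPU, dual-channel DDR4-2400, and per-task bandwidth demand scaling linearly with clock speed. The peak figure is the theoretical one (channels x transfers per second x 8 bytes), which real systems do not fully reach.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s for 64-bit (8-byte) channels."""
    return channels * mega_transfers_per_s * bus_bytes / 1000.0


# Measured on the two-socket 2.27GHz system: total read + write, 8 tasks.
measured_total_gbs = 21.18 + 3.60
per_task_gbs = measured_total_gbs / 8

# Assumptions (not from the measurements): a hypothetical 8-task run on a
# 4.0GHz CPU with dual-channel DDR4-2400, demand scaling linearly with clock.
clock_scale = 4.0 / 2.27
demand_gbs = per_task_gbs * clock_scale * 8
peak_gbs = peak_bandwidth_gbs(channels=2, mega_transfers_per_s=2400)

print(f"per task now:     {per_task_gbs:.2f} GB/s")
print(f"estimated demand: {demand_gbs:.1f} GB/s")
print(f"theoretical peak: {peak_gbs:.1f} GB/s")
```

Under these assumptions the estimated demand (about 44 GB/s) is above the theoretical dual-channel peak (38.4 GB/s), which would be consistent with the suspicion above. Faster memory or fewer concurrent tasks would change the picture.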
Darren Peets wrote: I had a look at my results.
You should disable beta testing for that host. That way, you will not get the 1.05 version anymore, but hopefully, and likely, 1.04, which is better for Linux (for now).
1.04: ~15800s (15 tasks)
OK, yes, there may be an issue with 1.05 on this platform.
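To put the 1.04 versus 1.05 difference in terms of throughput, the following sketch converts the run times quoted in this thread into tasks per day. It assumes 4 tasks run concurrently, as described for this host, and that the ~15800 s and ~20500 s figures are representative; `tasks_per_day` is an illustrative helper, not part of BOINC.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(concurrent_tasks, runtime_s):
    """Tasks completed per day when `concurrent_tasks` run side by side."""
    return concurrent_tasks * SECONDS_PER_DAY / runtime_s


# Approximate run times (seconds) reported for this host.
runtime_s = {"1.04": 15800, "1.05": 20500}

# Assumption: 4 tasks in parallel, one per physical core.
throughput = {version: tasks_per_day(4, t) for version, t in runtime_s.items()}
for version, rate in throughput.items():
    print(f"{version}: {rate:.1f} tasks/day")
```

Under those assumptions the host would finish about 21.9 tasks per day with 1.04 versus about 16.9 with 1.05.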
A quick update: Bernd isn't available this week and we're preparing the next GW analysis run with high priority. We're going to look into the compiler/optimization issue on Linux again when he returns, so please bear with us.
Thanks,
Oliver
Einstein@Home Project
Sebastian M. Bobrecki wrote: I see that 1.05 has relatively high needs for memory bandwidth.
That makes sense: the look-up table has to be used intensively, otherwise it wouldn't help speed things up (normally). And we know from Bernd's post that it requires about 100 MB of RAM (per instance), i.e. a lot more than any current CPU cache offers, so those values must be fetched from main memory.
MrS
Scanning for our furry friends since Jan 2002