Hrrrmmmfff! FWIW - RR_7F refuses to comment/predict ( same either way, if 167/169 are in or out of the set ) on this 18/61.9 ~ 1/4 of a cycle.
Cheers, Mike.
RE: Hrrrmmmfff! FWIW -
That might be partly because that is a vastly wrong estimate of the cycle length. Should be something very close to 90. I posted a messy but pretty accurate cycle length estimate function for S5R4 a while back. I need to go away till dawn, but I'll find it for you then, if interested.
RE: That might be partly
Yeah, RR is broken for R4. Maybe I could fling in a 7g version, for fiddles and giggles, with a better cycle estimation. I'd leave it in HTML, which will work reliably across platforms ( as the Java thing died on the vine somewhat on that issue ).
That'd be this post?
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
I switched over my Pentium 4
I switched over my Pentium 4 2.4GHz system the other day. It has returned a result processed entirely with 6.05 that was substantially faster than previous results under 6.04.
I haven't seen problems on my AMD Athlon64 3700+ system either, although there doesn't seem to be as much of a performance increase on the AMD; I did switch frequency templates, which could be obscuring any gains. Watching my wingman, I don't think I've hit a minimum yet, but performance seems to be flat, ranging from 38,797 to 40,050 seconds... (11 hours = 39,600 seconds).
RE: RE: I posted a messy
Yes.
It is almost dawn here, but you found it first. For the case at hand that one estimates a cycle length of 88.5, which is close enough not to be obviously wrong, which can't be said of the 61.9 estimate.
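For what it's worth, the size of the error matters for the "~ 1/4 of a cycle" reading in the opening post. A quick worked check ( plain arithmetic, nothing from RR itself ):

    // How much of a cycle do 18 consecutive sequence numbers cover
    // under each cycle-length estimate?
    const span = 18;            // sequence numbers spanned (from the opening post)
    const oldEstimate = 61.9;   // the estimate RR_7F is currently using
    const newEstimate = 88.5;   // the S5R4 estimate for the case at hand

    console.log((span / oldEstimate).toFixed(2));  // 0.29 - the "~ 1/4 of a cycle"
    console.log((span / newEstimate).toFixed(2));  // 0.20 - only a fifth of a cycle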
I've had a very odd episode on my Q6600 WinXP host running 6.05. I actually doubt it is caused by the 6.05 ap, but I report it as it is a major anomaly, and I wish to encourage anyone else seeing such a thing to report it, just in case someone here can diagnose it and help avoid a repetition.
It appears that, for a period of something approaching half a day, Einstein tasks computed useful results far more slowly than normal, reporting far longer CPU times than the recent trend.
Some possible factors:
....
RE: I've had a very odd
I believe I also have a couple of examples of much the same phenomenon, but with the 6.02 Linux app and not the new 6.05 Windows app. However, the new Windows app is the same codebase as the Linux app so the slowdown may have the same origin in both cases.
Recently I've built a group of six Q6600s, purely for crunching purposes. They are all overclocked but otherwise operating under the same general conditions: stock cooling, stock Vcore, and approximately 2950 - 3000 MHz. The motherboards are Gigabyte GA-G31M-S2L. As they are much faster than anything else in the fleet, I've been paying reasonably close attention. Five of the machines are running Linux and one is running Windows XP.
Each machine keeps a cache of about three days, and the average estimated crunch time for tasks in the cache is normally in the 6 to 7 hour range. The Windows machine was more in the 7 to 8 hour range until the 6.05 app appeared on the scene; it's now pretty much identical to the Linux machines. A couple of times, a couple of weeks ago, I noticed that the estimated times of cached tasks would suddenly jump by around 50% (i.e. into the 9 to 10 hour range) and then gradually reduce back to normal. This was caused by a single abnormally long-running task disturbing the estimate (by increasing the DCF in one hit). As various people in the past had talked about "outlier" results like this, I didn't pay much attention initially.
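For anyone unfamiliar with the mechanism, the asymmetry is the key: one long task raises the DCF immediately, and subsequent normal tasks only ease it back down gradually, so a single outlier inflates the estimate for everything in the cache at once. A sketch of that style of update ( the smoothing constant and the 23,500s estimate below are assumptions for illustration, not the BOINC client's actual code ):

    // Sketch of an asymmetric DCF-style update (illustrative only).
    function updateDcf(dcf, actualSecs, estimatedSecs) {
        const ratio = actualSecs / estimatedSecs;
        if (ratio > dcf) {
            return ratio;                    // one long task raises the DCF "in one hit"
        }
        const alpha = 0.1;                   // assumed smoothing constant
        return dcf + alpha * (ratio - dcf);  // normal tasks only ease it back slowly
    }

    let dcf = 1.0;
    dcf = updateDcf(dcf, 39356, 23500);  // outlier from the table below -> ~1.67
    dcf = updateDcf(dcf, 23426, 23500);  // next normal task -> ~1.61, still inflated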
It happened again a few days ago, and the evidence is still there to see (well, the important bits anyway). The interesting thing is that my host only suffered the problem for one task, but my wingman not only suffered the problem on exactly the same task (highlighted in red) but also on other tasks, some of which were not part of my cache. Below is a list of tasks, shared with a single wingman, showing the development of the problem. My host is the first one listed (CPU_1) and the other (CPU_2) is my wingman's. Dashes in the column for my host mean that my host didn't share that particular task with my wingman.
freq     seq#  CPU_1  CPU_2
1164.00  1121  23687  26734
1164.00  1120  24296  26693
1164.00  1119  23426  26729
1164.00  1118  23173  34669
1164.00  1117  39356  39721
1164.00  1116  22386  39744
1164.00  1115  -----  39753
1164.00  1114  -----  39784
1164.00  1113  -----  39900
1164.00  1166  -----  42514
1164.00  1165  -----  42520
1164.00  1164  -----  42556
In my case, the disturbance lasted for one task only and has not recurred (another 90+ tasks crunched). For my wingman, the problem continues on. His times have not reverted to what they were.
Quote:
Some possible factors:
....
Since my machine does nothing but crunch, I can't see that this slowdown has anything to do with other jobs tying up the CPU; there is absolutely nothing but BOINC running on the machine. In any case, if something were hogging CPU cycles, wouldn't you expect that to show up in wall clock time rather than CPU time? I could imagine a small effect on CPU time, but that should be far smaller than the effect on wall clock time.
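To make that reasoning concrete, here is a small sketch ( Node.js, purely illustrative and nothing to do with the BOINC client ) of why contention stretches wall clock time while barely moving a process's own CPU time:

    // Contrast wall-clock time with this process's own CPU time (Node.js).
    function spin(ms) {
        const end = Date.now() + ms;
        while (Date.now() < end) { /* busy-wait: burns CPU */ }
    }

    const wallStart = process.hrtime.bigint();  // wall clock, nanoseconds
    const cpuStart = process.cpuUsage();        // user+system CPU, microseconds

    spin(500);  // our own work: counts in both measures

    const wallMs = Number(process.hrtime.bigint() - wallStart) / 1e6;
    const cpu = process.cpuUsage(cpuStart);
    const cpuMs = (cpu.user + cpu.system) / 1000;

    // Uncontended, wallMs and cpuMs come out nearly equal. If other processes
    // hog the cores, wallMs grows but cpuMs stays close to the work done here.
    console.log(`wall: ${wallMs.toFixed(0)} ms, cpu: ${cpuMs.toFixed(0)} ms`);

A slowdown that shows up in reported CPU time, as these do, therefore points at something other than simple competition for the processor.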
Cheers,
Gary.
RE: Yeah, RR is broken for
Mike, if you do modify RR for the new cycle period formula and want some data to test it with, I've rejigged my data gathering script for R4 and have quite a bit of data for my six new Q6600 hosts, already saved and growing each day. I'll upload the files to your site if you wish.
I've been running the script and saving the files at home so I won't be able to upload the data until tonight. The files are exactly the same format as they were for R3, complete with leading zeroes on sequence numbers, etc :-).
The six hosts do cover quite a range of frequencies.
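Assuming the files are whitespace-separated columns along the lines of the table above ( the actual R3/R4 layout isn't reproduced in this thread, so treat the field order and the filename as placeholders ), a reader for them is only a few lines:

    // Sketch of a reader for the per-host result files. Field order is
    // assumed from the freq / seq# / CPU-seconds columns shown earlier.
    const fs = require('fs');

    function readResults(path) {
        return fs.readFileSync(path, 'utf8')
            .split('\n')
            .filter(line => line.trim().length > 0)
            .map(line => {
                const [freq, seq, cpuSecs] = line.trim().split(/\s+/);
                return {
                    freq: parseFloat(freq),    // template frequency, e.g. 1164.00
                    seq: parseInt(seq, 10),    // leading zeroes are harmless in base 10
                    cpuSecs: parseInt(cpuSecs, 10),
                };
            });
    }

    // const rows = readResults('host1_r4.dat');  // hypothetical filename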
Cheers,
Gary.
RE: Mike, if you do modify
Thank you Gary, let's go with that plan. Please upload and I will give it a thrash. However please PM me with your password etc, as I forgot what I allocated to you for the subdomain, doh! I'll print it out this time! :-)
I think I could readily produce a version with an altered formula. Moreover, I may/will include an interface on the page to allow full twiddling of the parameters in the formula ( should the hard-coded defaults not be desired ). I'll also see if it is sensible to emit some fitness metric ( of the curve fit ) so that any given fiddlings ( combinations of parameters ) can be compared. This could allow some manual mucking about to get a reasonable pragmatic/empirical fit. [ The overhead of going to three-, four- and n-way least squares parameter fits has so far been daunting for moi. ]
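For concreteness, the fitness metric could be as simple as an RMS residual ( a sketch only - the model form, field names and data shapes here are assumptions, not RR's actual code ):

    // Sketch of a curve-fit fitness metric: RMS residual between observed
    // CPU times and a parameterized model of cost vs sequence number.
    function rmsResidual(rows, params, model) {
        // rows: [{ seq, cpuSecs }]; model: (seq, params) -> predicted seconds
        const sumSq = rows.reduce((acc, r) => {
            const err = r.cpuSecs - model(r.seq, params);
            return acc + err * err;
        }, 0);
        return Math.sqrt(sumSq / rows.length);
    }

    // Twiddle, re-evaluate, keep whichever parameter set scores lower:
    // const fitA = rmsResidual(rows, paramsA, cycleModel);
    // const fitB = rmsResidual(rows, paramsB, cycleModel);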
I'll post/bump to an earlier RR thread to continue this line.
Cheers, Mike.
( edit ) Moi idiota - I had already written code for twiddling the parameters via the interface! OK, Mike, acquaint yourself with the project again. I now discover one purpose of design artifacts - to transmit information to one's future self. :-)
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
RE: The interesting thing
Thanks for reporting. There seem to be some differences in detail between your case and mine. I had formed the intention of checking whether my quorum partner(s) suffered on the same work units, but had failed to do so; you have motivated me to look. Rather to my surprise, a single partner ran a great many consecutive results in this range, which makes it very clear that he suffered no comparable problem. In my case the partner runs WinXP Pro SP2 on a Q9300 quad.
I'll follow your convention and report my troubled host as CPU1, and the quorum partner as CPU2:
Some observations:
1. In your case the quorum partner's trouble seems to have started one WU before your own (I added accusatory color) in addition to continuing after your own.
2. In my case I don't see any obvious trouble with my quorum partner at all.
3. My partner is running the _1 variant of the 6.04 Windows ap under 64-bit WinXP Pro SP2 with client 6.2.14, while I am running the 6.05 Windows ap under 32-bit WinXP Pro SP3 with client 5.10.45.
I'd love to believe the "bad WU" hypothesis, but if true, it seems to vary in its effect with something in the execution environment.
RE: I think I could readily
I've not worked on this subject since posting the cycle estimate. What stopped me dead in my tracks was a perception that frequency had an important influence on CPU time separate from the cycle effect. It seemed unlikely I would ever see enough samples from a broad enough range of frequencies to make an empirical fit, so I just quit. Also, at the time it was not obvious to me that new aps wanting performance comparison were in prospect.
However, in the important case that someone changes an operating condition (e.g. science ap, RAM timings, hyperthreading enablement, or ...) within the same immediate frequency range, my concern is unwarranted. For that case your notional ap could provide a dramatically improved means of performance comparison, with a greatly reduced workload for whoever attempts the comparison (whether a newly arrived user or one of the regulars here).
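Concretely, such a comparison might normalize each task's CPU time by the fitted cycle model's predicted relative cost at its sequence number before averaging, so tasks from different parts of the cycle become comparable ( a sketch under assumed names and fields - not how RR currently works ):

    // Sketch: cycle-aware before/after comparison. `cycleModel(seq, params)`
    // is an assumed fitted curve of relative cost vs sequence number.
    function normalizedMean(rows, params, cycleModel) {
        const total = rows.reduce(
            (acc, r) => acc + r.cpuSecs / cycleModel(r.seq, params), 0);
        return total / rows.length;
    }

    // const ratio = normalizedMean(afterRows, params, cycleModel) /
    //               normalizedMean(beforeRows, params, cycleModel);
    // ratio < 1 would suggest the change (new ap, RAM timings, ...) helped.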
I apologize profusely for the untoward complexity of my representation for the cycle length estimate: I don't imagine it will pose you any coding difficulty at all, but I rather suspect it means that a better form eluded my grasp. On the other hand, it is a vastly better fit than using any of the proposed S5R3 ones.
RE: I've had a very odd
I've found another (very recent) example of a dramatic slowdown, but on a different one of my Q6600s this time. Once again it was for a single task only, and subsequent tasks have reverted to the usual speed. This is the quorum in question. The slowed crunch time (38,581 secs) is fairly similar to what I saw in the previous example. Interestingly enough, my wingman's time (30,973 secs) shows little variation from the normal run of times for that host.
Cheers,
Gary.