RE: What's your actual slowdown, %age wise? I'm seeing less than, but close to, double on Intels.
That is the right way to pose the question, for the moment. If "anyone is going to check" it will be by this sort of comparison.
Of course it would be better if the comparison were to two known points in the variability. Since we don't know that variability for S5R4, second best is to compare to a known S5R3 point, I'd suggest the average of several points near the cycle minimum (need to average because of the microfluctuations--forget the favored name here).
My (one) contribution to this comparison is from a WinXP Q6600 which was running 4.46.
Since June 1 I have looked at somewhat over 1000 results returned, I believe under conditions like the present (mainly the current 2.8 GHz clock rate). S5R3 tasks near the minima have averaged about 15250 CPU seconds reported, and the trend maximum (ignoring about four outliers) looks like about 19000.
My single S5R4 from this host took 30740 CPU seconds with a frequency of 129.1 and a sequence number of 8. On S5R3 this low frequency would have had a very short cycle (first guess is under four sequence numbers) and sequence 8 would actually have been a lowish one for CPU time. I rather think this relationship will have changed a lot on S5R4.
So to toss out a first number on my own suggested method, my initial S5R4 result took 2.02 times as much CPU time as did typical results near the cycle minimum on S5R3.
Enter that one in the Windows ap running on Conroe column, for the case that the previous S5R3 ap was the 4.46 Power Ap (4.36 would have been the same).
[edited to make explicit the previous ap condition in the summary result]
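RE: So to toss out a first number on my own suggested method, my initial S5R4 result took 2.02 times as much CPU time as did typical results near the cycle minimum on S5R3.
Enter that one in the Windows ap running on Conroe column, for the case that the previous S5R3 ap was the 4.46 Power Ap (4.36 would have been the same).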
To add one other, less interesting host, I have a Windows98SE 930 MHz Coppermine for which 4.36 S5R3 results near the minimum have averaged about 85000 CPU seconds. The single result on 6.04 S5R4 so far took 164015 CPU seconds for frequency 183.10 sequence 15, or 1.93 times longer than the old minimum.
I believe that for both these hosts I have enough data randomly enough distributed across cyclic variation that a comparison to the old average is reasonably apt:
Conroe single S5R4 result took 1.87 times the recent S5R3 4.46 average
Coppermine single S5R4 result took 1.71 times the recent S5R3 4.36 average.
This difference may well be a difference in where in the cycle the S5R4 result happened to lie, but it also could be an architecture-dependent speed change.
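The comparison to the old cycle minimum is just a ratio of CPU times. A minimal sketch in Python, using only the CPU-second figures quoted above (the function name and host labels are made up for illustration):

```python
def slowdown(s5r4_cpu_s, s5r3_cpu_s):
    """Return how many times longer the S5R4 result took than the S5R3 reference."""
    return s5r4_cpu_s / s5r3_cpu_s

# (single S5R4 result, average S5R3 result near the cycle minimum), in CPU seconds
hosts = {
    "WinXP Q6600": (30740, 15250),
    "Win98SE 930 MHz Coppermine": (164015, 85000),
}

for name, (s5r4, s5r3_min) in hosts.items():
    print(f"{name}: {slowdown(s5r4, s5r3_min):.2f} times the S5R3 near-minimum time")
```

With one S5R4 result per host, these ratios say nothing yet about where in the S5R4 cycle those results fell.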
RE: What's your actual slowdown, %age wise? I'm seeing less than, but close to, double on Intels.
On my WinXP 64 system, the average of the last 5 S5R3 WUs comes out to ~26500s, with a low of 24140 and a high of 29400...Power app in use.
Same System (SSE app verified as running), 6 S5R4 WUs average out to 46200s, with a low of 42115 and a high of 49150.
An increase of just under 20000s/WU or slightly less than 75% longer per WU.
With credit/WU on this data pak being ~193 vs the average 237 in S5R3, I'm seeing a credit reduction from ~33.5 cr/hr/cpu with S5R3 down to ~14.9 cr/hr/cpu with S5R4. That is a 55.5% drop in cr/hr/cpu. I have noted that the average credit/WU is expected to be raised a little.
Seti Classic Final Total: 11446 WU.
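For anyone wanting to redo the credit arithmetic on their own host, here is a rough sketch in Python. It uses only the rounded averages quoted in the post above (237 credits and ~26500 s for S5R3, ~193 credits and ~46200 s for S5R4), so it lands near, but not exactly on, the 33.5 and 14.9 cr/hr/cpu figures, which were presumably worked out from the individual results. The function name is invented for illustration.

```python
def credit_per_cpu_hour(credit_per_wu, cpu_seconds_per_wu):
    """Credits earned per CPU hour for one work unit."""
    return credit_per_wu / (cpu_seconds_per_wu / 3600.0)

s5r3_rate = credit_per_cpu_hour(237, 26500)   # S5R3 averages from the post
s5r4_rate = credit_per_cpu_hour(193, 46200)   # S5R4 averages from the post

time_increase = 46200 / 26500 - 1             # fraction longer per WU
rate_drop = 1 - s5r4_rate / s5r3_rate         # fractional drop in cr/hr/cpu

print(f"S5R3: {s5r3_rate:.1f} cr/hr/cpu, S5R4: {s5r4_rate:.1f} cr/hr/cpu")
print(f"time per WU up {time_increase:.0%}, credit rate down {rate_drop:.0%}")
```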
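RE: I've completed a couple of S5R4 WUs now and verified they were run on the SSE .exe for Windows. My systems are all AMDs and seem to be awfully slow for an SSE app.
I'm wondering if we're encountering the AMD penalty again. Can anyone check?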
I'll look at the binary for "AMD" being in it in just a minute... I'm typing this from the P4, which doesn't have a hex editor on it...
There is indeed an "AuthenticAMD" string in the "1" executable... :(
Just started to run S5R4s as the availability of S5R3s dried up.
I followed instructions, made the D/Ls and started crunching. The estimates on 2 rigs were 7 hours and ~5 hours per WU. My projections of the time they will really take are 54 hours (for the one estimated at 7 hours) and 48 hours.
This seems awfully slow for an SSE client, and both rigs (a dual P3 Coppermine and a dual P4 Prestonia Xeon) run SSE and SSE/SSE2.
Where do I look to see if these rigs are running the CPU's SSE instructions?
At this time I have not completed any S5R4 WUs, so if the WU details show this I will have to wait for the result to hit the servers.
Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!
RE: Where do I look to see if these rigs are running the CPU's SSE instructions?
As your hosts are Windows hosts you can check the application name actually running in Task Manager.
Ignore the einstein_S5R4_6.04_windows_intelx86.exe task--that is the dispatcher which chooses which real ap to run. Look for:
einstein_S5R4_6.04_windows_intelx86_1.exe or
einstein_S5R4_6.04_windows_intelx86_0.exe
If you are running _1 it has decided you can use the more advanced instructions and is doing so.
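If you would rather paste a list of process names into a script than eyeball Task Manager, the naming rule above can be sketched like this in Python. The function and its return strings are hypothetical, and treating _0 as the fallback build is an assumption; the post only states what _1 means.

```python
import re

# Matches the three S5R4 6.04 Windows executable names mentioned above.
PATTERN = re.compile(r"^einstein_S5R4_6\.04_windows_intelx86(?:_(\d))?\.exe$")

def classify(process_name):
    """Classify a process name according to the naming rule in the post."""
    match = PATTERN.match(process_name)
    if match is None:
        return "not an S5R4 6.04 process"
    variant = match.group(1)
    if variant is None:
        return "dispatcher (ignore)"
    if variant == "1":
        return "worker using the more advanced instructions"
    # Assumption: any other suffix, e.g. _0, is the fallback build.
    return "worker, presumably the fallback build"

for name in ["einstein_S5R4_6.04_windows_intelx86.exe",
             "einstein_S5R4_6.04_windows_intelx86_1.exe",
             "einstein_S5R4_6.04_windows_intelx86_0.exe"]:
    print(name, "->", classify(name))
```

A name truncated by the process list will not match any of the three patterns, so the full image name is needed.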
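RE: The estimates on 2 rigs were 7 hours and ~5 hours per WU. My projections of the time they will really take are 54 hours (for the one estimated at 7 hours) and 48 hours.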
Yes, for whatever reason, the initial estimates for time are way low (around 10 to 20% of actual for me). There are explanations floating about, but I've not been able to get my brain around them.
The "low ball" estimates can cause your machine to flood with more work than you can reasonably accomplish ... until the machine develops some history with the version and gets the estimated time better. I ended up manually deleting a number of WU's that would clearly not finish on time. (If you know your way around the option files, I understand that you can speed up that process of time estimate stabilization... but don't know the details.)
If you are moving from an S5R3 Power app like version 4.36 you may find that the actual times increase by about 80% (from 5 Hrs to about 9 Hrs for most of my machines). Credit claims are a bit (maybe a lot?) flakey, but these get sorted out at the server.
Stan
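As a quick check of the "10 to 20% of actual" figure against the numbers quoted at the top of this post (7 h estimated vs 54 h projected, ~5 h vs 48 h), here is a small Python sketch. The rig labels are taken from the earlier post, the variable names are made up, and the 54 h and 48 h values are the poster's projections, not finished results.

```python
# (initial estimate in hours, projected actual time in hours)
rigs = {
    "dual P3 Coppermine": (7, 54),
    "dual P4 Prestonia Xeon": (5, 48),
}

for name, (estimate_h, actual_h) in rigs.items():
    fraction = estimate_h / actual_h
    print(f"{name}: estimate is {fraction:.0%} of projected actual")
```

Both rigs come out at roughly a tenth to an eighth of the projected time, which is in line with the range reported here.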
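RE: If you know your way around the option files, I understand that you can speed up that process of time estimate stabilization...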
Actually it is lots easier than that. The process adapts only gradually when the new result gets done too quickly; however, when the new result takes much longer than expected, the working value gets bumped all the way up right away.
So... if you are a long queue person, and you see some S5R4 joining your queue with remaining S5R3 coming before, and S5R4 stuff is getting over-fetched, you can just pause all the S5R3 work in your queue and let one S5R4 result finish. Immediately the estimate on your computer, which governs amount of fetching, will get bumped up. The copy on the web site won't follow until the result is reported, but that does not affect work fetch.
If you have a really big queue, you might wish to repeat this process after half a dozen or so subsequent S5R3 results (or you may just prefer to dial down your queue length by a factor of six until the dust clouds clear).
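The asymmetry described above (jump up at once, come down slowly) can be illustrated with a toy model in Python. This is only a sketch of the behaviour as described in this post, not the client's actual code: the function name, the idea of a single correction factor multiplying the raw estimate, and the 0.1 downward step are all assumptions made for illustration.

```python
def update_factor(current, needed, down_rate=0.1):
    """Toy model of the working estimate correction.

    needed is actual time / raw estimate for the result that just finished.
    down_rate is a made-up value for how fast the factor relaxes downward.
    """
    if needed > current:
        return needed                                   # bumped all the way up right away
    return current + down_rate * (needed - current)     # adapts only gradually

factor = 1.0
factor_after_s5r4 = update_factor(factor, 54 / 7)       # S5R4: 54 h against a 7 h estimate
factor_after_s5r3 = update_factor(factor_after_s5r4, 1.0)  # an S5R3 result that ran as estimated

print(f"after one S5R4 result: {factor_after_s5r4:.2f}")
print(f"after a following S5R3 result: {factor_after_s5r3:.2f}")
```

In this model each later S5R3 result pulls the factor back down a little, which is why repeating the trick after a batch of S5R3 results can make sense with a big queue.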
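RE: If you have a really big queue, you might wish to repeat this process after half a dozen or so subsequent S5R3 results (or you may just prefer to dial down your queue length by a factor of six until the dust clouds clear).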
Yes! :)
I did "close the barn door" (cut back the queue length), but only after "many of the horses had escaped!" (At 64 WU downloaded/day on several quad machines, a goodly pile of overcommitment quickly accumulated! Situation was aggravated since I cut the new app_info in shortly before 2400UTC!)
I brought S5R4 exe's into the App_info file while there were still a few days worth of S5R3's waiting to be worked. (Once I had a "clean" App_info installed and working properly, I wanted to do the rest in "one swell foop.") Keeping all machines "humping" 24X7 was my priority ... I didn't want one to run dry (or even "light") while I was buried in my pillow.
I considered scrapping the whole cache on each machine, but instead elected to spend the hours necessary to estimate the items that could not be done in time and manually abort each of them.
A bit of warning from project organizers before conversion would have avoided a bunch of confusion (and a bunch of work) for me. OTOH, maybe it caught them by surprise, too. I was not happy to hear (after the fact) criticism of longer cache lengths, though.
Regardless, I survived to crunch "again another day" so all is well.
Stan
Thanks for the advice.
Looks like my estimates were not too far out.
My dual P3 (Win2K Pro) is just coming up to complete its first 2 S5R4 WUs at 51-52 hours. Fortunately I set the local machine queue to 1 day. But despite this I have over 8 days in the cache (based on the 50+ hour WU x 2).
I cannot see in Task Manager whether these are using SSE as the complete file descriptor is not given. It stops at einstein_S5R4_6 - so I cannot use the data Archae86 gave.
The other rig (Win XP Pro) does show I am using SSE, and the crunch times (4 at a time) are about 40 hours. In this case my 1 day cache, for the local machine, resulted in me still having about 7 days worth to crunch.
At least all of them are within the given deadline for results return.
Once I am well established crunching the S5R4s then my RAC will drop to below 700, I guess?