Thank you, that sure sheds a little light on the issue. If that work could be saved, it would make the whole project more efficient.
It would only make ATLAS's contribution to it more efficient. The fact that there are pending credits in itself doesn't cause inefficiency or affect the speed of the project in any way.
But still saving that time (possibly a few hours per timeslice) would be really nice :)
I looked at some of archae86's hosts and did see some "anonymous" hosts of Opteron 1212, Opteron 275, and Xeon X5355 flavors all running Linux, which are likely cluster nodes. They might not be, but it's a reasoned guess... :-)
Anyway, a few of those had timed out, but others showed "client detached". For those that were detached, as he had mentioned, there was sometimes a significant delay between the detach and the issuance of another replication.
I still think there are multiple things that can be done here. The discussion about sending aborts and having the nodes report in that they've been aborted is a good one. I also believe that getting the 6.06 app out to the Windows user base will also help. The faster the data is crunched, the more quickly hosts can become available for a new data set.
Once the 6.06 app is out and becomes the stock application, I think a reduction in the deadline back to 14 days is also appropriate and will also help reduce the level of pending credit...
As always, IMO, YMMV, etc, etc, etc...
I don't know if setting the deadline tighter is such a good idea. Given my fleet, that's the last thing I want to see! ;-)
All that does is reduce the host population which is effectively running the project. It either cuts some hosts off from participation completely, or, in the case of hosts which run a lot of projects, makes their contribution more spotty due to accumulating debt issues when EAH comes up for a run slot. IOW, the host grabs a task which 'hogs' the machine to beat the tight deadline given its situation, then disappears from the project for an extended period to pay back the debt to the other projects.
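The "hog then disappear" effect described here can be sketched with a toy model. This is not real BOINC scheduler code; the duty cycle, project share, and task size below are all made-up numbers for illustration.

```python
# Toy model of the deadline-pressure effect (not real BOINC scheduler
# code; the duty cycle, project share, and task size are made up).

def hours_available(deadline_days, duty_cycle, project_share):
    """CPU-hours one project gets from a host before the deadline
    under the host's normal round-robin share."""
    return deadline_days * 24 * duty_cycle * project_share

def must_hog(task_hours, deadline_days, duty_cycle, project_share):
    """True if the host can only meet the deadline by suspending its
    other projects (the 'hog the machine' behavior)."""
    return task_hours > hours_available(deadline_days, duty_cycle, project_share)

# A host on 12 h/day splitting time across 4 equal-share projects,
# holding a 10-hour task:
print(must_hog(10, 14, 0.5, 0.25))  # 14-day deadline: 42 h available -> False
print(must_hog(10, 3, 0.5, 0.25))   # 3-day deadline: only 9 h available -> True
```

In the second case the host has to borrow time from the other projects to finish, which is roughly the debt it then pays back by staying away from this project for a while.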
IMHO, if there is no time pressure to get the work back quickly (like MW for example) then the deadline should be set as loose as possible in keeping with the science goals of the project and the ability of the backend to track all the outstanding work without grinding to a halt.
I've said it many times before, who cares how long a task stays pending as long as you get credit for it ultimately? Over the course of weeks, months, years, and/or decades, it makes virtually zero difference to any of the metrics you can measure (including RAC) for a host. :-)
I just cleared a bunch of pendings which had been sitting around for a month or so, so my RAC was dropping some while I was waiting for wingmen to catch up. Now it's back where it usually is, and the world didn't stop turning in the meantime! :-D
Alinator
I don't know if setting the deadline tighter is such a good idea. Given my fleet, that's the last thing I want to see! ;-)
Is it possible that the fact that you have multiple slower hosts adds some bias to your opinion? That seems to be what you're stating...
Ever since the deadline was increased from 14 days to 18 days, the number of complaints about EDF has dropped off significantly; they've perhaps even been nearly eliminated.
One of the conditions I stated a long time ago, when I asked for the deadline increase, was that once the Windows app had SSE, the deadline could be brought back down to 14 days. I didn't bring the issue up in S5R3 or early in S5R4 because there was still a significant difference between the Windows and Linux apps. That difference has been greatly reduced now, so I think now is the appropriate time to consider bringing the deadline back to where it was.
To further explain why, I believe that at the time I requested the increase my AMD system was taking 12-15 hours per task. I'm now down to 8-9 hours per task.
Bernd said that deadlines can easily be changed during the run. If changing it back to 14 causes a problem, it can be brought back up.
While we're suggesting multiple classes of host, why not do a fast/slow host split as well? A fast host could be given a 1-week deadline, and a slow host something like 3 or 4 weeks, so that people still crunching their P2-400s, or with EDFobia and 50 projects running, would be able to complete work normally. Defaults could be set automatically via initial benchmarks and reported runtimes/duty cycles, with user overrides available for the obsessive-compulsive.
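The fast/slow split could be sketched along these lines. Everything here is hypothetical: the task size, the classification threshold, and the two class deadlines are invented defaults, not anything a project actually uses.

```python
# Hypothetical sketch of a fast/slow deadline split. The task size,
# threshold, and class deadlines are invented defaults.

TASK_FLOPS = 40e12          # illustrative amount of work per task
FAST_DAYS, SLOW_DAYS = 7, 28

def deadline_class(bench_flops, duty_cycle, override=None):
    """Pick a deadline (in days) from a host's benchmark speed and
    reported duty cycle, with a user override for those who insist."""
    if override is not None:
        return override                  # user-chosen class wins
    wall_hours = TASK_FLOPS / (bench_flops * duty_cycle) / 3600
    # Fast class only if a task fits comfortably (4x margin) inside
    # the short deadline; otherwise fall back to the long one.
    return FAST_DAYS if wall_hours <= FAST_DAYS * 24 / 4 else SLOW_DAYS

print(deadline_class(2e9, 0.8))      # modern, mostly-on host -> 7
print(deadline_class(0.3e9, 0.5))    # P2-class, part-time host -> 28
print(deadline_class(2e9, 0.8, 28))  # override respected -> 28
```

The 4x margin is the knob that decides how conservative the automatic classification is; a looser margin pushes more borderline hosts into the slow class.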
While a good thought, it sets up a scenario where someone will complain because their system got classed one way when they feel it should be the other, and they got short-changed because they missed a deadline and the 3rd host reported in before they got their replication back...
It is really rare to see "Einstein is hogging my CPU" posts now, so 14 days should be enough once the switching app becomes the stock app, just as 18 days is enough now, particularly in light of the sheer number of P4 machines connected to this project that will benefit from SSE2...
IMHO, if there is no time pressure to get the work back quickly (like MW for example) then the deadline should be set as loose as possible in keeping with the science goals of the project and the ability of the backend to track all the outstanding work without grinding to a halt.
I've said it many times before, who cares how long a task stays pending as long as you get credit for it ultimately? Over the course of weeks, months, years, and/or decades, it makes virtually zero difference to any of the metrics you can measure (including RAC) for a host. :-)
^
Even though I don't have hosts that even get near the deadline (usually they're < 1 day average turnaround time), I completely agree with you.
I don't really care about pending credits (or credits anyway); however, the very existence and length of this thread indicates that many people DO care and do not feel comfortable with a large backlog of pending credits (for whatever reasons).
CU
Bikeman
Yep. It's a balancing act between the complaints about "hogging the CPU" (i.e. EDFobia) and about not getting the gratification of seeing credits go up in a "timely fashion". Deadlines should be reasonable: not too long, not too short. I don't think that going back to 14 days, along with having most everyone using SSE2, is going to be detrimental. It will take 4 days off the wait time for an absent host. It might increase the pressure on hosts with larger caches, but performance has grown by at least 30% over the course of S5R4, and 30% of 18 days is 5.4 days. In other words, the 4-day reduction of the deadline is less than the amount of improvement, so it still errs on the side of favoring the slower / less-allocated hosts.
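The deadline-versus-speedup arithmetic, spelled out. (Strictly speaking, a 30% throughput gain shrinks runtimes by about 23%, not 30%, but the margin holds on either reading.)

```python
# The deadline-vs-speedup comparison, spelled out. (Strictly, a 30%
# throughput gain cuts runtimes by ~23%, but the margin still holds.)
old_deadline, new_deadline = 18, 14
cut = old_deadline - new_deadline   # 4 days removed from the deadline
headroom = 0.30 * old_deadline      # 5.4 "days" worth of speed gained
print(cut < headroom)               # True: the cut is less than the gain
# Even on the stricter runtime view: work that filled 18 days now takes
print(round(old_deadline / 1.30, 1))  # ~13.8 days, still inside 14
```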
BUT if the Project has a lot of older PCs crunching for it, your idea could eliminate some of them. That is also a balancing act for the Project. Make it a 1-day deadline, and only the best of the best PCs, attached to no other projects, could crunch here. That would eliminate most of the people here. As we go through the years, more and more of the older PCs will have to be dropped as unable to keep up, but that point will always be contentious. Some people really do believe in the Project's goals and contribute because of that. Others just crunch because they can.
One thing the Project could do (it may be a BOINC thing, though) is pair PCs up a little better. Pair up a PC that returns units in less than, say, 7 days with another that is doing the same, and those that take more than 7 days with others doing the same. As I said, this may be a BOINC thing, not a Project thing. That would help solve the problem of those with fast PCs complaining, and those with slower PCs would already understand. A little education on the Project's part would go a long way toward getting people used to the idea.
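The pairing idea could be sketched as a simple bucketing of hosts by average turnaround time. This is purely illustrative; real matchmaking would have to live in the project's scheduler, and the 7-day threshold is just the number suggested above.

```python
# Minimal sketch of the pairing idea: bucket hosts by average
# turnaround, then issue a workunit's replicas within one bucket.
# Purely illustrative; a real scheduler would do this server-side.

def bucket(avg_turnaround_days, threshold=7.0):
    """Classify a host as 'fast' or 'slow' by its average turnaround."""
    return "fast" if avg_turnaround_days < threshold else "slow"

def group_hosts(hosts):
    """hosts: {name: avg_turnaround_days} -> {'fast': [...], 'slow': [...]}"""
    groups = {"fast": [], "slow": []}
    for name, days in hosts.items():
        groups[bucket(days)].append(name)
    return groups

groups = group_hosts({"a": 0.8, "b": 2.5, "c": 9.0, "d": 12.0})
print(groups)  # {'fast': ['a', 'b'], 'slow': ['c', 'd']}
# Both replicas of a workunit would then be drawn from the same group,
# so fast hosts stop waiting weeks for a slow wingman.
```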