Would it be possible to divide computers/users into a few groups according to computer speed or the average time until results are usually ready, and assign the same data packages within each group?
This way some of the fastest computers wouldn't have to wait for some 300 MHz Pentium to verify the work.
Of course this would mean that the slowest machines would have to wait long for the credit, but my philosophy for that is basically "if they waited days for the packet to be calculated, they might as well wait days for the credit ;)"
Faster "feedback" can't be a bad thing (if it isn't a nightmare to implement).
Possible way to lower pending credit times
> Would it be possible to divide computers/users into a few groups according to
> computer speed or the average time until results are usually ready, and
> assign the same data packages within each group?
>
> This way some of the fastest computers wouldn't have to wait for some 300 MHz
> Pentium to verify the work.
>
> Of course this would mean that the slowest machines would have to wait long
> for the credit, but my philosophy for that is basically "if they waited days
> for the packet to be calculated, they might as well wait days for the credit
> ;)"
>
> Faster "feedback" can't be a bad thing (if it isn't a nightmare to implement)
>
>
What's difficult with your suggestion is that you also have to take into account how much of the time the machine is actually on. Sometimes a PIII 800 MHz can crunch as many WUs as a brand-new P4 that is not running 24/7...
Some work in this direction
Some work in this direction is being done. The average turnaround time for WUs is now being tracked (incorrectly in some cases; not certain if this has been fixed). In some future version of the server, replacements will be preferentially downloaded to machines with a low average turnaround time.
BOINC WIKI
When you request work you
When you request work, you will get WUs from a data set that you already have on your computer, if there are WUs left for that particular data set. If not, you will get WUs from a new, randomly selected data set. Every data set contains 150 WUs that are each sent out to 4 different hosts.
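To make that rule concrete, here is a minimal Python sketch of the assignment logic as described above. The function name pick_workunit and the data-set names are made up for illustration; this is not the actual Einstein@Home scheduler code.

    import random

    RESULTS_PER_WU = 4       # each WU is sent out to 4 different hosts
    WUS_PER_DATA_SET = 150   # every data set contains 150 WUs

    def pick_workunit(host_data_sets, open_wus):
        """Prefer a data set the host already has; otherwise pick a random new
        one. open_wus maps data-set name -> list of WUs still needing hosts."""
        local = [name for name in host_data_sets if open_wus.get(name)]
        if local:
            name = local[0]                        # no new download needed
        else:
            name = random.choice([n for n, wus in open_wus.items() if wus])
            host_data_sets.append(name)            # host must download this set
        return name, open_wus[name].pop()

    # Example: the host already holds data set "l1_0001", which still has work left.
    sets = {"l1_0001": ["wu_42", "wu_43"], "l1_0002": ["wu_99"]}
    print(pick_workunit(["l1_0001"], sets))        # -> ('l1_0001', 'wu_43')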
IMHO what should be done is to divide hosts into different groups. Each data set would then only be downloaded by hosts in one particular group. So when a host requests new work and there is no more work available for the data sets that the host already has, the server will evaluate (or re-evaluate) the host and decide which group it belongs to. A new data set is then downloaded according to the group the host now belongs to.
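A rough sketch of that re-grouping rule, assuming throughput (WUs/day) as the dividing measure; the group names and thresholds below are invented for illustration only.

    # Hypothetical group boundaries in WUs/day; the post gives no concrete numbers.
    GROUP_THRESHOLDS = [("fast", 4.0), ("medium", 2.0), ("slow", 0.5)]

    def reevaluate_group(wus_per_day):
        """Place a host in a group according to its measured throughput."""
        for name, minimum in GROUP_THRESHOLDS:
            if wus_per_day >= minimum:
                return name
        return "slowest"

    def next_data_set(host, remaining_wus, sets_by_group):
        """Only when the host's own data sets are exhausted is it re-evaluated
        and handed a data set reserved for its (possibly new) group."""
        if any(remaining_wus.get(ds, 0) > 0 for ds in host["data_sets"]):
            return None                                 # still has local work
        host["group"] = reevaluate_group(host["wus_per_day"])
        for ds in sets_by_group.get(host["group"], []):
            if remaining_wus.get(ds, 0) > 0:
                host["data_sets"].append(ds)
                return ds                               # host downloads this set
        return None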
So the big question is on what grounds hosts should be divided. If the project is to divide hosts into groups, then it has to be beneficial to the project to do so.
There are also several, sometimes conflicting, factors to take into account when thinking about this:
The scientists want results: The whole point in doing this is to get results to the scientists. Now how long are they willing to wait for that?
Data sets are big: Because the data sets are so big, it is preferable that hosts have to download new ones as seldom as possible.
The size of the database: To keep the database as small as possible, it is preferable to keep the time from when a WU is sent out to the first host until the WU is validated and all sent-out copies of that particular WU are accounted for as short as possible. That way it can be removed from the database as quickly as possible, thereby reducing the size of the database. A side effect of this is that users get credit as fast as possible. Note that it's the particular WU, not the whole data set, that should be done in the shortest time possible.
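For illustration, the condition under which a particular WU could be purged from the database (and credit granted) might look like the check below; the field names are hypothetical.

    def can_purge(workunit_results):
        """A WU can leave the database once every copy that was sent out
        has been returned and validated."""
        return all(r["returned"] and r["validated"] for r in workunit_results)

    # Example: 4 copies sent out, one still outstanding -> keep the WU around.
    copies = [{"returned": True, "validated": True}] * 3 + \
             [{"returned": False, "validated": False}]
    print(can_purge(copies))   # False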
Each data set will have a set number of hosts download that particular data set, depending on which group is downloading it, so the work flows at the ideal speed for that group. The number of hosts in each group must be big enough to keep the work flowing and to reach the ideal number of hosts for the data set within a reasonable time. If the ideal number of hosts for a data set from a group is 30, then hosts 1-30 to request a new data set get data set 1, hosts 31-60 get data set 2, hosts 61-90 get data set 3, and so on.
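That counting scheme can be written down directly; the figure of 30 hosts per data set is just the example number from this post and would vary per group.

    HOSTS_PER_SET = 30   # example figure from the post; would vary per group

    def data_set_for_request(request_number, hosts_per_set=HOSTS_PER_SET):
        """The first 30 hosts asking for a new data set get set 1, the next 30
        get set 2, and so on (request_number and data sets are 1-based)."""
        return (request_number - 1) // hosts_per_set + 1

    assert data_set_for_request(1) == 1
    assert data_set_for_request(30) == 1
    assert data_set_for_request(31) == 2
    assert data_set_for_request(61) == 3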
So what are the dividing factors and special groups to consider? Add more so we can discuss them.
New hosts: The information on new hosts in the database is limited, so having them crunch their own data sets could be a good idea. The number of hosts per data set should be high, so that the data set would be finished quickly and the hosts moved to more appropriate groups quickly.
Modem or no modem: Since the large size of the data sets is a problem especially for modem users, it could be appropriate to let them have their own data sets and allocate a small number of hosts to each data set so it will last for 1 to 2 months, if that is OK with the scientists. Since many people have access to a faster internet connection at school or work, it would be a good idea to have an easy way to download a new data set, put it on a USB memory stick, MP3 player or CD-RW, and move it to the computer at home.
Speed: There are many ways to measure speed, but does it matter how fast a computer is if only a fraction of that speed is used for the project? From my point of view there are two speed measures that matter: average turnaround time and throughput of WUs.
If we had 6 hosts each crunching 4 WUs/day, they would crunch through a data set in 25 days.
If we had 8 hosts each crunching 3 WUs/day, they would crunch through a data set in 25 days.
If we had 12 hosts each crunching 2 WUs/day, they would crunch through a data set in 25 days.
If we had 24 hosts each crunching 1 WU/day, they would crunch through a data set in 25 days.
If we had 168 hosts each crunching 1/7 WU/day, they would crunch through a data set in 25 days.
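These numbers all follow from a data set being 150 WUs with 4 copies each, i.e. 600 results, so any group delivering 24 results/day finishes in 25 days. A quick check in Python:

    RESULTS_PER_SET = 150 * 4   # 150 WUs, each sent to 4 hosts = 600 results

    def days_to_finish(hosts, wus_per_host_per_day):
        """Days for a group to crunch one data set at the given per-host rate."""
        return RESULTS_PER_SET / (hosts * wus_per_host_per_day)

    for hosts, rate in [(6, 4), (8, 3), (12, 2), (24, 1), (168, 1 / 7)]:
        print(f"{hosts:3d} hosts at {rate:.3f} WUs/day ->"
              f" {days_to_finish(hosts, rate):.0f} days")
    # every combination above works out to 25 days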
If we grouped them like this, we would minimize the number of data sets each host would have to download. If we also grouped them by average turnaround time, the size of the database would be minimized too.
When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.