Bernd said: To make up for the faster Apps we increased the size of the workunits. The "long" ones will be roughly five times as long as the "long" ones from S4, the "short" ones will be roughly twice as long as their S4 counterparts.
I ask you to reconsider and make smaller WUs again!
Problems of large WUs
1) Download time
On dialup, a 16 MB file takes a long time to download.
Dialup links are not stable; they time out and drop the connection often.
At each connection hiccup a retry is made, and the retry restarts the download from the beginning.
So, to download a 16 MB WU, I end up with 64 MB of network traffic (or more) because of the retries.
2) Crunch time
Even using Akos' optimized app, it now takes much more than 1 or 2 hours to crunch a single WU. Well, at least on my Athlon XP 1600+.
The longer a WU takes to crunch, the greater the probability that a power loss occurs while crunching it and ruins *all* the work already done, by corrupting the file system or damaging the HD.
3) Disk space
The disk space needed to keep 2 WUs is big! I cannot download only 1 WU at a time, because of the download time, so I need to have 1 WU crunching and another ready to crunch. Otherwise the CPU cools down between WUs, and those temperature changes can crack the silicon and render the whole PC inoperative.
I suppose I am not the only one connected via dialup and crunching with a CPU slower than 3000 MHz. IMHO that describes a large percentage of crunchers, probably more than half! This way, I will be forced to quit!
However, you could also make the WU size user-selectable in the Einstein@Home preferences, to satisfy everyone's wishes.
Thanks
Copyright © 2024 Einstein@Home. All rights reserved.
Increased WU size -:(
No question that S5 makes EAH a harder project for us crunchers running "old timers", however: (comments inline)
Agreed, the large "raw" data file is not a joy to have to DL for dialup hosts, but keep in mind that you generally run more than one result off any given data pack so it's not like every DL needs to get a new one.
I agree the length of the crunch is far greater than before generally, but is unavoidable due to the higher sensitivity of the S5 run, and other backend factors not strictly related to sheer crunching concerns.
However, your concerns over loss of computation time are not totally valid. A power failure or other system crash generally only results in losing the calculation since the last checkpoint, not the whole thing.
Your point about a complete hard disk failure is well taken from a data-loss and work-loss POV, but that's true *every* time you run your machine. In my case, for example, I don't relish the thought of one of my K6-2/500's losing a drive 175 hours into a 225-hour S5 whopper, but that's the risk you take when you *choose* to participate. Fortunately, HD failures like that are infrequent enough, even running 24/7/365, that I'll take that chance. The other data I have onboard is much more important to me; that's why I do backups, so my risk of loss is relatively low, including for EAH.
As I said before, the only time you carry more than one data file is when there are no more results to crunch from the existing one, and the project requests that BOINC delete the old one when it's through with it. This was true for S4 as well. FWIW, I haven't noticed a really significant increase in disk space used since S5 began.
Your point about reducing thermal cycling of computer components to extend their life expectancy is valid; that's one of the reasons I run DC as well. However, manufacturers test their components over the full range from cold start to full load, and simulate that over their expected "normal" useful life (3 to 5 years).
Your other comments have merit, but the project team has to consider their requirements and resources as well as those of the volunteers.
In my case, EAH is now a "tight" deadline project for my slugs, but that isn't going to stop me from letting them plow through as many results as they can.
Regarding your suggestion about user selection of WU length: the problem with that is it leads to "cherry picking" by the more credit-oriented participants, and that leads to a whole set of problems in and of itself. One thing is clear: there is a fair number of them running EAH, based on some of the posts I've read lately, particularly around the request to cease running Akos' prototype optimized apps.
Alinator
RE: On Dialup this takes a
Hm, I think BOINC supports suspend/resume downloads, right? I have 56k dialup too and I'm downloading Einstein WUs in 2 parts (maybe 8 MB today and the rest tomorrow).
Regards, bara
Carlos, I really don't
Carlos,
I really don't understand any of your points.
I have previously d/l'ed using dial-up, including these large 16MB files. When the connection is lost, it does not have to start from the beginning.
What has the size of the WU got to do with losing power? If you corrupt the file system and/or HD, the completed WU will also be lost (most likely).
Your point about the storage of just 2 WUs and the cracking of silicon is surely just a joke.
FD.
RE: RE: On Dialup this
I think that's the case; they implemented it some time last year, when they were having trouble with some of the mirror sites, but I can't swear to it.
Alinator
Hi, You seem to
Hi,
You seem to misunderstand the way the project works so perhaps I can explain a few things to help clear up the misconceptions you seem to have.
It is certainly true that, on a dialup link, it could take quite a while to download the 16MB large files. Under S4 the equivalent files were only about 4.5MB to about 7.5MB if I remember correctly. I didn't take that much notice of the precise size.
At a rough guess an S5 data file might last 5 times longer than an S4 one - the 5 times ratio mentioned by Bernd. So in the time you get to work on one 16MB large file for S5 you might have had to download 5 x 6MB S4 files. I strongly suspect that you will actually be downloading less in total under S5 than you were for S4.
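That comparison can be put in numbers. This is just a back-of-the-envelope check using the rough estimates already quoted in this thread (16 MB S5 files, roughly 6 MB S4 files, a 5x lifetime ratio), not measured values:

```python
# Download-volume comparison over one S5 data file's lifetime; all
# figures are this thread's rough estimates, not measured values.
s5_file_mb = 16        # one S5 data file
s4_file_mb = 6         # a typical S4 data file (4.5 to 7.5 MB range)
lifetime_ratio = 5     # one S5 file lasts about as long as 5 S4 files

s4_total_mb = lifetime_ratio * s4_file_mb   # S4 data over the same span
print(s5_file_mb, s4_total_mb)              # 16 vs 30: S5 downloads less
```

So over the same span of work, the single 16 MB S5 download replaces roughly 30 MB of S4 downloads.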
As far as interrupted downloads are concerned, I was under the impression that BOINC could restart an interrupted download from the point of interruption without having to start at zero. I don't have dialup so I don't know for sure. However, on a broadband connection, I'm sure I've seen downloads restart at the point of interruption when a lost network connection is restored.
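The resume behaviour, if present, amounts to asking the server for only the missing bytes. Here is a small sketch of that idea; the HTTP Range header is how resumable transfers generally work, but whether the BOINC client does exactly this internally is an assumption, and the file sizes are made up:

```python
# Sketch of resuming an interrupted download: request only the tail
# that is still missing, rather than restarting from byte zero.
# Whether BOINC does exactly this internally is an assumption.

def range_header(bytes_on_disk: int) -> dict:
    """Ask the server for everything from the first missing byte on."""
    return {"Range": f"bytes={bytes_on_disk}-"}

def resume_download(full_payload: bytes, bytes_on_disk: int) -> bytes:
    """Simulate a resumed transfer: only the missing tail is re-sent."""
    header = range_header(bytes_on_disk)
    start = int(header["Range"].split("=")[1].rstrip("-"))
    return full_payload[start:]

data_file = b"x" * 16_000_000     # a 16 MB data file
already_have = 10_000_000         # 10 MB arrived before the line dropped
tail = resume_download(data_file, already_have)
print(len(tail))                  # 6_000_000: only 6 MB left, not 16 MB
```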
I would guess that an AMD XP1600+ could easily take 18-20 hours to crunch a long result. Let's call it 20 hours. Let's say it gave 176 credits. Imagine you could slice that long result into 10 short ones that each took 2 hours to crunch and returned 17.6 credits. How could this be a better outcome? At the end of a 20 hour period you have contributed exactly the same amount of science and received exactly the same number of credits. What benefit is there in calling it 10 results rather than 1 result? In a very short time they've all been removed from the online database anyway. Any thought that you are somehow contributing more to the project is completely an illusion.
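The slicing argument above is just arithmetic, using the illustrative numbers from this post (20 hours, 176 credits):

```python
# One 20-hour result at 176 credits versus the same work cut into
# 10 two-hour slices; the figures are this post's illustrative guesses.
long_hours, long_credits = 20.0, 176.0
n_slices = 10
short_hours = long_hours / n_slices        # 2.0 hours each
short_credits = long_credits / n_slices    # 17.6 credits each

total_hours = n_slices * short_hours
total_credits = n_slices * short_credits
print(total_hours, total_credits)          # 20.0 176.0: nothing changes
```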
A power loss simply does not ruin the work already done. As a result is being crunched, checkpoint files are written regularly. If a power loss occurs and then BOINC is restarted, crunching will resume from the saved checkpoint and virtually nothing is lost.
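A minimal sketch of that checkpoint/restart behaviour, to make the mechanism concrete. The JSON file format, the 100-step interval, and the simulated "power loss" are made up for illustration; real BOINC applications use their own checkpoint formats and the BOINC client API:

```python
# Checkpoint/restart sketch: a crash only loses the work done since
# the last checkpoint. File format and interval are invented here.
import json
import os
import tempfile

CHECKPOINT = os.path.join(tempfile.gettempdir(), "wu_checkpoint.json")

def load_checkpoint():
    """Resume from the last saved step, or start from zero."""
    try:
        with open(CHECKPOINT) as f:
            return json.load(f)["step"]
    except (FileNotFoundError, ValueError, KeyError):
        return 0

def save_checkpoint(step):
    with open(CHECKPOINT, "w") as f:
        json.dump({"step": step}, f)

def crunch(total_steps, crash_at=None):
    """Do `total_steps` units of work, checkpointing every 100 steps."""
    step = load_checkpoint()
    while step < total_steps:
        step += 1                                 # one unit of real work
        if step % 100 == 0:
            save_checkpoint(step)                 # periodic checkpoint
        if crash_at is not None and step == crash_at:
            raise RuntimeError("power loss")      # simulated failure
    save_checkpoint(step)
    return step

if os.path.exists(CHECKPOINT):
    os.remove(CHECKPOINT)                         # start a fresh WU
try:
    crunch(1000, crash_at=550)                    # "power loss" mid-crunch
except RuntimeError:
    pass
print(load_checkpoint())                          # 500: under 100 steps lost
print(crunch(1000))                               # resumes from 500, finishes
```

After the simulated power loss at step 550, the restart picks up from the step-500 checkpoint, so fewer than 100 steps are redone rather than the whole result.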
Yes, a power loss could cause disk corruption or physical damage. But can you explain how long results compared to short results would make the risk of disk corruption any different?
Do you realise that once you have downloaded a single large data file (16 MB) you can get literally 100's of results from that one download? Let's say you set a 5 day cache and that causes a download of 6 fresh results. Very little extra data is downloaded - you would hardly notice it, even on dialup. What is downloaded is a set of very short "instructions" on how to "slice off" those 6 extra results from your large data file. The physical data for those new results is already on your computer. So there is virtually no time loss in getting new results when needed.
Only when that large data file is completely "used up" will you need to download a fresh large data file. At that point the original one is deleted, freeing up the space it occupied. And yes it might take a while to get that new file. However if you have a cache of say 1 day you will have 24 hours warning of the need to download a new large data file.
Sure, there is a temperature change when a CPU goes from 100% load to idle or vice versa. I've never heard of that causing silicon to crack :). If it could then you had better not fire up your computer in the morning or shut it down at night because there is a bigger temperature change at those times :).
I don't understand how being on dialup with a less-than-3.0 GHz CPU would cause you to quit. When you were crunching S4 at an accelerated rate, weren't you putting your dialup link under much greater strain, because you would have been downloading 4.5 MB to 7.5 MB data files at a much higher rate than you are now?
I have a lot of experience with slower machines, seeing as I own quite a few in the range of 400 to 1000MHz, mainly PIIIs. They are all doing just fine on the S5 data. Sure a few of them take up to 60 hours to crunch a long S5 result and get their 176 credits. I've just looked at a Dell PIII-450 that finished its last S4 result on June 17 and returned its first S5 on June 19. It has been crunching for exactly three months without a single break and has a total credit of 12,100 and a RAC of 134. What's wrong with that? It has not crashed once and has had no downtime in the whole three months. Because the S5 optimisations are not as good as what Akos achieved with S4, its RAC will fall a bit in the future, but so what!! It's still doing the science!
Cheers,
Gary.
Large units yes, but when it
Large units yes, but when it is downloaded you get several WU's to run off the same file, these are small downloads.
All work with Akos' optimized apps has ended for now; everyone has to switch back to the stock app. There are several other threads on that.
1 to 2 hours of crunch time is not all that bad, and when you get the larger ones they should take ~20 hours. Even with download problems, that is a long time between downloads.
Just try and hang in there, you may get a better connection next time.
Try the Pizza@Home project, good crunching.
I have one machine on dial-up
I have one machine on dial-up and think that those responding seem to be missing a couple of points for such set-ups...
1) While all are correct regarding S5 downloads having a lot of 'bang-for-the-buck' in the number of workunits to crunch, this is irrelevant if one cannot keep the line tied up long enough to complete the download. Though it is true that the download will restart at the point where the disconnection happened, this may still result in unnecessary idle time for some machines depending on set-up, number of projects, etc. (i.e., a machine may be connected so infrequently that all work is completed before the new download is finished--especially since upload files are only about 120-150k or so). Simply put, the longer workunits may make EAH a very inefficient project for some dial-up users.
2) There seems to be some confusion regarding the nature of 'risk' (a statistical term) as it pertains to long vs. short workunits for dial-up users. It is true that the risk of failure at any single point in time is probably identical across both types of workunits. However, the cumulative risk (which is the much more relevant issue regarding failure--see numerous publications on hazard modeling, failure-time models, etc.) is obviously greater for the longer workunits in general, and may be more so for dial-up users. Given the infrequent connections of some dial-up users, and that such users are generally not covered by LAN backup procedures (usually resorting to some form of manual backup), their risk of loss is quite likely greater for all workunits, and may increase at a faster rate than for non-dial-up users as workunit length increases (e.g., an exponential vs. a linear increase in risk for the two different groups of users).
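The cumulative-risk point can be illustrated numerically. With a constant per-hour failure probability p, the chance of at least one failure during a result is 1 - (1 - p)^hours; the value p = 0.001 below is purely an illustrative guess, not a measured failure rate:

```python
# Cumulative failure risk grows with crunch time even when the per-hour
# risk is constant. p = 0.001 is an illustrative guess, not measured.
p = 0.001

def failure_risk(hours: float) -> float:
    """Probability of at least one failure during `hours` of crunching."""
    return 1.0 - (1.0 - p) ** hours

short_wu, long_wu = 2, 20   # hours: roughly the short vs long scale here
print(round(failure_risk(short_wu), 4))   # 0.002
print(round(failure_risk(long_wu), 4))    # 0.0198, nearly 10x the risk
```

Without checkpointing, the expected amount of lost work per failure also grows with the length of the result, which is why the two effects compound for long workunits.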
Agreed, but that's why you
Agreed, but that's why you need the human, "Central Scrutinizer" to weigh all the factors involved in deciding how best to allocate your resources.
There is no "one size fits all" solution.
Alinator
Solution for problem 1
Solution for problem 1, Scott:
Just set your cache high enough ...
Solution for problem 2:
there is no solution because there is no problem :)
If your disk fails, you have much bigger problems than a lost workunit:
your really important data, for example, and buying and installing a new disk, etc.
greets
RE: RE: Bernd said: To
First, the optimized applications aren't allowed now. You should stop using them if you haven't already.
I guess because I'm on DSL and have a good size hard drive, I don't see any real problems with download time or storage of work units. However, the crunch time is definitely a factor. I've gone from crunching 1 work unit in about 1 hour with the old optimized applications (maybe it was 2 hours with the standard application, I don't remember) to crunching 1 work unit in about 17 hours with the standard application. This is kind of ridiculous. I can't even return one unit a day now since I run SETI as well and it always has work available. OK, maybe the S5 application is faster than the S4 application, but when the 5x longer work units take around 10x longer to crunch, it seems out of proportion. I'd definitely like to see shorter work units, but somehow I doubt that's going to happen. :(