There used to be a time when I could face challenges like this in my sleep. Unfortunately, I find I no longer have the interest, or perhaps, the incentive, to deal with the issues I am having.
I've researched as best I can, and at this point BOINC has become the instrument of destruction of at least one machine and 3 disk drives. Or maybe it's just μSoft's windoze. Despite trying to tweak BOINC's parameters, my machines continue to have disk drives over-heating. On the upside, I went from a 500 GB hard drive to a TB drive. On the downside, I can't afford to do this every 6-8 months. (POGS and EINSTEIN)@Home appear to be the worst offenders, with the former having the highest disregard for CPU power constraints.
In all fairness, the first disk drive I had to replace on my new (at the time) dual processor 2.8 GHz HP machine died from multiple causes. Post-mortem autopsy revealed a major thrombosis in the ventilation system. Cigarette smoke and an abundance of cat hair had all but defeated the 520-1020's ability to circulate air. After a 10 hour operation, a disk transplant, and a system resurrection courtesy of the Church of the Holy Restoration (with current backup credentials), the machine's ability to perform was restored. Furthermore, acute respiratory distress in this author's own body led to a 4 day hospital stay at a local VA hospital, and the successful cessation of a more-than 40 year habit of smoking. It has now been 2 years and 5 months since my last cigarette or nicotine-bearing product use of any kind. Also have been off the Skittles maintenance program for going on 8 months...
The two cats are still in good health, probably better health, for the nicotine removal from the environment.
But I digress.
This machine is now beginning to suffer again, both from an overheating problem not related to ventilation problems, and being narrowed down to the consecutive use of BOINC while watching epic movies such as The Hobbit - I, II, & III, and The Lord of the Rings trilogy, in marathon viewing sessions. I'm currently in my second go-through of the latter, having watched first in the production order, and then, realizing that The Hobbit came first, decided that LotR needed a second go-round.
Again, I digress...
One final complaint that leads to my decision to stop BOINCking: As my machine approaches critical heat, the physical framework of this HP All-in-1 machine starts to "phase-out". By this, I mean that the parts no longer fit together as well as they used to, and this terribly annoying 60 Hz hum begins to compete with the loudest fluorescent lighting within earshot. In all but the loudest of battle scenes (Battle of the 5 Armies, or the Overthrow of Mordor battle in Return of the King), the buzzing all but forces commitment to the flight deck at the VA.
I also recognize this last issue as a harbinger -- a death throe, if you will -- of yet another Tbyte disk drive. And yet the hum goes away when I put BOINC to sleep...
So unless someone can tell me how to make BOINC behave itself (or rather, how the programs within it can be made to obey BOINC, the great and powerful), or where there is a good free or very inexpensive MORPG (preferably based on Tolkien) site, I must away and bid y'all a fond farewell...
Yes, that was a shameless begging. Come to think of it, this whole monologue is a shameless display of talentless humor (but again, I digress).
I may not participate much socially with SETI.USA, but I have always felt a kinship and sense of belonging, as well as pleasure knowing that even I might contribute to the scientific disciplines...
So, any ideas on how to keep things from going to the 100% CPU usage with POGS or EINSTEIN??
-_=G.g
Copyright © 2024 Einstein@Home. All rights reserved.
With regrets, I believe I must resign
I'm probably not the best person to get a response from, PsiberMan.
I'm sorry :\ And to hear you might be leaving :(
Hope things turn out so that you're happier staying. :)
Please wait here. Further instructions could pile up at any time. Thank you.
I could suggest that with only a Pentium G620 2.60 GHz 2-core CPU and just 4 GB of RAM, you stop trying to play heavyweight games or movies and crunch at the same time. It's one or the other, or get a more powerful machine.
But I digress. BOINC does not in itself destroy disk drives; it is what the users do with them that does that.
But I digress again. In BOINC Manager, go to Options, Computing preferences, and set CPU use to 50% of the CPUs. That is essential with only a two-core CPU.
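For anyone who prefers a file to the Manager dialog, here is a minimal sketch of the same 50% setting written as a local preferences override. It assumes the BOINC client reads a `global_prefs_override.xml` file from its data directory and honours the `max_ncpus_pct` and `cpu_usage_limit` elements. The function name `build_prefs_override` is made up for this example, and the snippet only builds the text; saving it into the right directory and telling the client to re-read its local preferences is left to you.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(max_ncpus_pct: float, cpu_usage_limit: float = 100.0) -> str:
    """Return XML text for a BOINC local preferences override (hypothetical helper).

    max_ncpus_pct   -- percentage of CPU cores BOINC may use (50 = one core of two)
    cpu_usage_limit -- percentage of CPU time BOINC may use on those cores
    """
    for name, value in (("max_ncpus_pct", max_ncpus_pct),
                        ("cpu_usage_limit", cpu_usage_limit)):
        if not 0 < value <= 100:
            raise ValueError(f"{name} must be in (0, 100], got {value}")

    root = ET.Element("global_preferences")
    ET.SubElement(root, "max_ncpus_pct").text = f"{max_ncpus_pct:.1f}"
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_usage_limit:.1f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # One of two cores, no extra time throttling.
    print(build_prefs_override(50))
```

Lowering `cpu_usage_limit` as well makes BOINC pause and resume tasks in short bursts, which is a blunter way to cut heat than giving up a core.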
Waiting for Godot & salvation :-)
Why do doctors have to practice?
You'd think they'd have got it right by now
I have used a Pentium PII Deschutes at 400 MHz to run SETI@home, Einstein@home and BBC Experiment from 2004 to 2008, when I bought a SUN workstation which is still running Linux BOINC alongside two HP PCs, one Windows and one Linux. But I never had overheating problems, perhaps because I have no cats. The Windows 10 PC is the only one to have a graphics board, a GeForce GTX 750 Ti OC, working happily on SETI and Einstein. The PC UPS has only 300 W, with no 6 pin connector, but so far has given no problem.
Tullio
Tullio, I think you mean a PSU not a UPS :-) It also depends which kind of 750 Ti OC you have. The slimline low profile 3/4 length version (I have 4) does not need a six pin power plug. The other full length cards with two prongs do need one. In all of them the recommended PSU is a minimum of 400W. I run mine on an aftermarket Antec 430W.
I would suggest that your PSU is probably working flat out, thus reducing its MTBF. But if it works for you then fair enough.
Thanks Chris, you are right. I also have a UPS which protects the SUN workstation, my main Linux box, and its monitor. I have two 500 W PSUs on my shelves, just in case. But I loathe these minitower PCs and their cramped space. I used to maintain Onyx UNIX boxes designed by Scott McNealy and they were far easier to open and work inside. Ciao.
Tullio
Minitowers became popular for two reasons. First-wave PCs were literally desktop machines with the monitor perched on top. Then as technology moved on, they could get the same computing power in a small under-the-desk tower, leaving more desk space.
But of course the mini towers are hard to upgrade, sometimes with only 2 memory slots and not enough space for a double width, 2 slot, full length aftermarket graphics card. But it's horses for courses. They were designed for general 9-5 office use and specced accordingly. Most, as you say, come with 300W PSUs, which are fine for that use. They weren't designed for 24/7 crunching stressing all the components.
But we digress from the OP's thread. I have advised him that he is simply trying to do too much with what he has, and it isn't the fault of Boinc or Einstein, but there we are. And actually his question would be better in number crunching.
RE: There used to be a time
Please don't be troubled by any thoughts of 'letting the project down'. I'm sure your contribution is appreciated but you must do what is best for you. Congratulations on your decision to give up smoking and for having the ability to stick to your decision. I hope that the bad effects continue to diminish over time.
More likely, it's untamed heat combined with disk drives that aren't as robust as they used to be. You mention 6-8 months. Surely a drive failing that quickly would be replaced under warranty? I haven't needed to buy a new drive for at least 8-10 years so I'm unfamiliar with current day warranties. Can you not get drives that have 3yr or 5yr warranties these days? Sure, you pay a bit more but isn't it worth it?
I'm not quite sure how a project has "disregard for CPU power constraints". As has already been mentioned, you can limit BOINC to a subset of the CPU cores. That would certainly reduce power and heat. On the other hand, the project apps just do as they are told - start running now or stop running now.
The projects you mention need extensive calculations to be done, which causes the CPU to ramp up to 100% load. The CPU will draw whatever power it needs to do this. Running at 100% load doesn't damage anything. It's failing to remove the heat that does the damage. There are lots of potential ways to improve the removal of heat and some are quite simple to implement. If you would care to describe/provide a link to the type of case you are using and what cooling fans are included, it should be possible to improve the cooling.
With regard to disk drives, the majority of heat comes from spinning the platters at high speed and not from actually reading and writing to them. I can't really speak for other projects but I know from observation that Einstein apps don't interact with the disk all that often, according to the disk activity LEDs. I very much doubt that a project using the disk occasionally will have any significant effect on the life of the disk.
On top of the base heat load from spinning, a much bigger potential problem is the buildup of internal heat from poor case ventilation. I would guess that this is probably the most significant cause in your situation.
A third aspect, from anecdotal evidence only, is that modern high capacity disks fail prematurely much more frequently, even when just running 'office' type loads rather than BOINC loads. For example, I run my fleet in an industrial warehouse without air-con where the ambient is in the range 28C to 36C (winter/summer). Next door to me is an office with air-con and a year round average ambient of 23C-24C and running office type apps - basically low load to idle a lot of the time. The machines there (about 10-15) are quite recent and there have been 3 HDD failures during normal use in the last 6-12 months (disks around 1-4 years old). Over the same period, I've had none in normal use and one or two that got trashed during mains power failures/power spikes. I have a lot more disks in use and these first saw service between 2000-2004. These old disks are a lot heavier than modern ones. I'm guessing they have better bearings/lubrication and more robust internal mechanics. They run quite hot 24/7 (because of the high ambient) but rarely fail.
Can you specify more clearly what you mean by this? You seem to imply that progressively, your machine is running hotter and hotter over time. If so, it is likely to be a 'heat disposal' problem. There are a few things to consider apart from just blocked ventilation. Are any fans running slower than they should be? Do you regularly clean the fins of heat sinks? Have you tried replacing the thermal interface material between CPU and heat sink? As an experiment, can you open the case and direct a room fan at the internals? If temperatures decrease dramatically, you have a ventilation problem.
One point you should consider is this. If the machine was at one point operating satisfactorily, but now is not, then what has changed? It won't be the project apps or a combination of them with other things you use the machine for. A CPU can only run at full load and other uses may lower the overall load by preventing project apps from using all the CPU cycles. If the machine is running hotter now than it did previously and BOINC was always running, the problem isn't just BOINC.
I don't know what sort of BIOS options are available to you - probably not much since it's a 'big name' manufacturer. If you have any ability to change operating frequency and voltages, you could achieve a cooler running environment by lowering the CPU multiplier and/or the base clock frequency, and possibly the core voltage. Heat production increases quite considerably with both frequency and voltage. If you can run at a lower frequency you may be able to have a lower Vcore. I build all my own machines so I have no real knowledge of what (if any) BIOS options there are to tweak in pre-built machines.
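To put rough numbers on "increases quite considerably", here is a small sketch using the usual first-order rule of thumb that dynamic CPU power scales with frequency times voltage squared (P ∝ f·V²). That rule is an assumption brought in for illustration: it ignores leakage and everything else in the box, and the function name `relative_dynamic_power` is invented for this example.

```python
def relative_dynamic_power(freq_scale: float, volt_scale: float) -> float:
    """Estimate dynamic power relative to stock settings.

    Assumes the first-order approximation P ~ f * V^2, so a result of 0.8
    means roughly 80% of the stock dynamic power (and heat).
    freq_scale and volt_scale are ratios to stock, e.g. 0.9 = 10% lower.
    """
    if freq_scale <= 0 or volt_scale <= 0:
        raise ValueError("scale factors must be positive")
    return freq_scale * volt_scale ** 2


if __name__ == "__main__":
    # Example: 10% lower clock combined with 5% lower core voltage.
    ratio = relative_dynamic_power(0.90, 0.95)
    print(f"Estimated dynamic power: {ratio:.0%} of stock")
```

Because voltage enters squared, a small drop in Vcore saves proportionally more heat than the same percentage drop in clock speed, which is why the two are worth lowering together.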
What do you call "critical heat", how do you measure it and what parts don't "fit together"?
Are you really sure the noise is from a disk drive? Could it be fan noise from a fan that is running out of lubrication? You should have your case open and really check closely exactly what is humming/vibrating and it could easily be a fan. Increasing temperature (and noise) over time points to a fan problem. You have fans in the PSU, on the CPU and probably elsewhere in the case. You should closely check them all. With your hand (and with care) you can usually sense where the heat is being generated. Is the PSU case unusually hot? Is the CPU heat sink very hot? Is the CPU heat sink cool but the CPU temperature is hot? That latter situation points to the thermal interface material. It can 'dry out' over time. Are all the legs that hold the CPU heat sink firmly in place nice and tight? If one is a bit loose you may have a hot CPU.
Disks don't slow down at all if you stop BOINC. If the noise is coming from a disk you should still hear it. Fans may slow down if they have thermal speed control. The symptoms you describe seem to indicate a noisy fan at high speed when the heat is really on.
Cheers,
Gary.
Just one observation on the heating question: my SUN WS front panel is like a car radiator, allowing a good airflow with the back panel fan extracting air. It has been working 24/7 since January 2008, with disk problems only deriving from software installation of Linux OS versions and, more frequently, from SuSE Linux updates, which frequently end in trouble.
My 2014 HP PC running Windows 10, having also a GTX 750 TI graphic board, has only a side panel opening for ventilation. Its main disk, a 2 TB Seagate, has already failed.
Tullio
The only hard disk failure I've had since starting BOINC was on my DVR PC, which also throws large multi-GB video files around all day. Other than that, CPU heat issues do arise, and on the systems where that's a problem I use the third-party utility TThrottle. It will scale BOINC computation on the fly to maintain a user-specified temperature.
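To illustrate the general idea of scaling computation to hold a temperature, here is a toy proportional controller. It is not how TThrottle actually works (I have not seen its internals). The function name, the gain and the clamping limits are all made-up values for the sketch, and reading a real sensor or applying the result to BOINC is left out entirely.

```python
def next_cpu_limit(current_limit: float, temp_c: float, target_c: float,
                   gain: float = 2.0, min_limit: float = 10.0,
                   max_limit: float = 100.0) -> float:
    """Return a new CPU usage limit (percent) from one temperature reading.

    Hypothetical proportional rule: for every degree above the target the
    limit drops by `gain` percent; for every degree below it rises by the
    same amount. The result is clamped to [min_limit, max_limit].
    """
    error = temp_c - target_c              # positive when running too hot
    proposed = current_limit - gain * error
    return max(min_limit, min(max_limit, proposed))


if __name__ == "__main__":
    limit = 100.0
    # Made-up temperature readings in degrees C, target is 70 C.
    for reading in (82.0, 78.0, 73.0, 70.0, 66.0):
        limit = next_cpu_limit(limit, reading, target_c=70.0)
        print(f"temp={reading:.0f}C -> CPU limit {limit:.0f}%")
```

A real utility would need to smooth the readings and react slowly enough to avoid oscillating, but the shape of the loop is the same: measure, compare with the target, nudge the limit.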
Well, the OP has not responded back to suggestions of help, and his account shows zero tasks and the machine hasn't logged in since 2nd May. So perhaps he just wanted to chat for a bit.