Progress stops

alanb1951
Joined: 28 Nov 16
Posts: 23
Credit: 741674433
RAC: 252245

PappyGus wrote:
Matt White wrote:
This allows access to processor scheduling and virtual memory. Processor scheduling is normally set to programs. You may change this setting to Background services.

I went ahead and changed that setting. Since then, including going to 100% CPU time and 75% CPU usage, E@H hasn't had any issues. SETI, on the other hand, is having issues (see my last post).

Regarding the SETI tasks - there's a known issue with the nvidia 436 series drivers and [some/all?] Arecibo tasks with a high angle range. If you've had some other tasks complete successfully, that could be your problem (the task you cited has an angle range of 1.377536, which is quite high!)

I suggest you pop over to the SETI@Home Number Crunching forum and have a look at the thread "whole serie of data blocks failing with SoG", where this is discussed and where there is some advice about backing out to older drivers.

Good luck...

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

PappyGus wrote:
Matt White wrote:
This allows access to processor scheduling and virtual memory. Processor scheduling is normally set to programs. You may change this setting to Background services.

I went ahead and changed that setting. Since then, including going to 100% CPU time and 75% CPU usage, E@H hasn't had any issues. SETI, on the other hand, is having issues (see my last post).

It is likely that settings which work for one project/application may not work well for another. This can be true within the same project. Some apps don't play nice together.

I would consider reducing the CPU usage to 50% and seeing if that mitigates the issue with the SETI jobs. Doing so will limit the number of CPU tasks to the number of physical cores (as well as reducing power consumption), while allowing overhead for the GPU. If that works, I would gradually advance the usage until you start seeing issues, then drop back to the previous setting. You may find the sweet spot is different for each application running. I know I mentioned this in my second comment, but the issue seems to be similar to behavior I saw with my server.
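To make the arithmetic concrete, here is a rough Python sketch of the two ideas above: how many CPU tasks a given percentage allows, and the "step up until it breaks, then drop back" search. It assumes a hyperthreaded CPU with two logical CPUs per physical core and assumes the task count is rounded down with a minimum of one. `is_stable` is a hypothetical callback standing in for "run at this setting for a while and see whether anything stalls"; nothing here talks to BOINC itself.

```python
import math


def cpu_tasks_allowed(logical_cpus: int, usage_pct: float) -> int:
    """Approximate number of CPU tasks allowed at a given CPU percentage.

    Assumption: the count is rounded down, with a minimum of one task.
    """
    return max(1, math.floor(logical_cpus * usage_pct / 100))


def find_sweet_spot(is_stable, start_pct=50, step_pct=10, max_pct=100):
    """Raise the setting step by step and return the last stable value.

    is_stable is a hypothetical callback: it takes a percentage and
    returns True if the box ran without stalled or failed tasks.
    Returns None if even the starting value was not stable.
    """
    last_good = None
    pct = start_pct
    while pct <= max_pct:
        if not is_stable(pct):
            break  # first bad setting: stop and keep the previous one
        last_good = pct
        pct += step_pct
    return last_good


# 4 physical cores with hyperthreading = 8 logical CPUs
print(cpu_tasks_allowed(8, 50))  # one task per physical core
```

The step size and starting point are arbitrary; in practice each step means letting the machine run long enough for a problem to show up.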

BOINC pushes a box much harder than most other day-to-day applications, which is one reason I'm watchful about power consumption. Most power supplies have a maximum rating, which is not always a sustained rating. A power supply being pushed to the limit, especially if it isn't designed for a 100% duty cycle at rated power, can inject weird issues into a box: issues which may show up on one task but be absent in others.

At 60% usage, my server draws about 360-380 watts from the AC main, depending on what tasks it is running. The supply itself is rated for 460 watts. At dead stop, with just the OS idling, it draws about 150 watts. My UPS has an AC wattmeter built in so getting these figures is easy for me.
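As a quick worked example, the figures above can be turned into a fraction of the supply's rating. This is only a sketch: it compares the wall-side (AC) reading directly against the 460 W rating, whereas a supply's rating normally refers to its output, so the real load on the supply is somewhat lower than these percentages suggest. The function name is made up for illustration.

```python
def load_fraction(measured_watts: float, rated_watts: float) -> float:
    """Measured draw as a fraction of the supply's rated power.

    Assumption: measured_watts is the AC reading at the wall, so this
    overstates the load on the supply's output by the conversion losses.
    """
    return measured_watts / rated_watts


RATED_WATTS = 460  # rating quoted in the post

# idle, and the low and high ends of the range seen at 60% usage
for watts in (150, 360, 380):
    print(f"{watts} W -> {load_fraction(watts, RATED_WATTS):.0%} of rating")
```

At 360 to 380 W the box sits at roughly 78% to 83% of the rating by this measure, which leaves some margin but not a lot if the rating is a peak figure rather than a sustained one.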

In my case, I attributed the weird behavior to power consumption. Your case might be different, but it might be worth investigating.

Clear skies,
Matt
Joseph Stateson
Joined: 7 May 07
Posts: 174
Credit: 3092215038
RAC: 808929

I have a water-cooled system and ran into similar problems. Temps reported by the water sensor are always low and never near what the CPU reports. The CPU-reported temps are sampled and, depending on how they are sampled, may be misleading. GPU-Z shows CPU temp, and one can select "highest" to see what it got up to. There could be a lot of throttling going on that is not obvious. CPU-Z also shows frequency, and if you see it changing a lot, that is an obvious sign of throttling.

I have an open-rack mining setup, and when I touched the underside of the liquid-cooled CPU sockets (a pair of Xeons) it was extremely hot, far above the reported temps. I also observed a lot of fluctuation in frequency (the multiplier changes are more obvious) and in the instantaneous temps reported by TThrottle. This was with about 20 WCG tasks running and a few GPU ones. I used the Windows processor power setting (a percentage) to bring the frequency down to about 97%, and that solved the temp problem. I did not have to disable hyperthreading. Linux also has a tool for Intel CPUs.
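For anyone who would rather log clock readings than watch them, the "frequency changing a lot" check can be sketched in a few lines of Python. The readings are assumed to be taken under steady load (clocks also move around at idle for ordinary power-saving reasons), and the 10% threshold is an arbitrary assumption, not a figure from CPU-Z or any other tool.

```python
def frequency_spread(samples_mhz):
    """Return (max - min) / max for a list of clock readings in MHz."""
    highest = max(samples_mhz)
    return (highest - min(samples_mhz)) / highest


def looks_throttled(samples_mhz, threshold=0.10):
    """Flag a set of readings whose spread exceeds the threshold.

    The threshold is an illustrative guess; tune it for your own box.
    """
    return frequency_spread(samples_mhz) > threshold


steady = [3400, 3400, 3390]          # barely moves under load
jumpy = [3400, 2600, 3300, 2400]     # large swings under load
print(looks_throttled(steady), looks_throttled(jumpy))
```

The sample values are made up; the readings themselves would come from whatever monitoring tool you already use.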

PappyGus
Joined: 8 Jun 19
Posts: 6
Credit: 25527594
RAC: 0

Well, turns out the SETI issues are due to the driver. SETI started giving out alternate tasks, so that problem is gone. I've been running 75% CPUs and 100% CPU time for a few days and not a single problem. I don't know if the nVidia driver issue carried over somehow, but since I haven't changed the driver, and I'm not having issues with E@H any longer, it looks like having had the setting at 100% of the CPUs and 50%-75% time was the issue for me. I'm staying with 100% time and 75% CPUs.
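For anyone who prefers to set this locally rather than through the website or the Manager, the same two settings can be written to BOINC's global_prefs_override.xml. Below is a small Python sketch that builds that fragment; as far as I know `max_ncpus_pct` is the "% of the CPUs" setting and `cpu_usage_limit` is the "% of CPU time" setting, but check the BOINC documentation for your client version. The function name is made up, and the sketch only prints the XML rather than writing it into the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_override(max_ncpus_pct: float, cpu_usage_limit: float) -> str:
    """Build a minimal global_prefs_override.xml fragment as a string.

    max_ncpus_pct   -> percentage of the CPUs BOINC may use
    cpu_usage_limit -> percentage of CPU time BOINC may use
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "max_ncpus_pct").text = f"{max_ncpus_pct:.1f}"
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_usage_limit:.1f}"
    return ET.tostring(root, encoding="unicode")


# 75% of the CPUs at 100% CPU time, the combination that worked here
print(build_override(75, 100))
```

After saving the file, the client has to be told to re-read local preferences (or be restarted) before the values take effect.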

A big Thank You to Gary Roberts and Matt White. Also, a Thank You to ALANB1951 for the info on SETI's issue with the latest nVidia driver.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118425868500
RAC: 25861856

Good to hear that things seem to be working correctly now.

Thanks for reporting back and letting us know!

Cheers,
Gary.

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

PappyGus wrote:

Well, turns out the SETI issues are due to the driver. SETI started giving out alternate tasks, so that problem is gone. I've been running 75% CPUs and 100% CPU time for a few days and not a single problem. I don't know if the nVidia driver issue carried over somehow, but since I haven't changed the driver, and I'm not having issues with E@H any longer, it looks like having had the setting at 100% of the CPUs and 50%-75% time was the issue for me. I'm staying with 100% time and 75% CPUs.

A big Thank You to Gary Roberts and Matt White. Also, a Thank You to ALANB1951 for the info on SETI's issue with the latest nVidia driver.

Outstanding! Glad to hear you are up and running!

Clear skies,
Matt
Stephen M.
Joined: 19 Sep 19
Posts: 1
Credit: 115415377
RAC: 0

I was having a similar problem on an iMac (2017, 4.2 GHz Intel Core i7), and this discussion helped tremendously! It was very helpful to me to learn that keeping CPU time at 100% while using a smaller percentage of the CPUs would reduce thermal cycling on the CPUs. Obvious when I thought about it, I just hadn’t thought about it. It’s good to know that by changing these settings I won’t be physically stressing the processors and boards so much, and can keep my power supply well within its limits. Also, the “hung” tasks then began to show progress again.

My computer's power supply has a peak power rating of 217 W, so maybe I’ll play with the utilization until I reach about 150W.  That “feels” like a good compromise.

Thanks again to all for the insightful comments!!
