CPU-starved Meerkat WUs

marmot
Joined: 28 Nov 14
Posts: 21
Credit: 27289040
RAC: 0
Topic 228917

One of my machines was running SRBase LLR WUs on all CPU threads and 3x Meerkat on the RX 550, with each WU completing in about 15k sec.

I started running SiDock Curie long, and now the Meerkat WUs are starved for CPU and estimates are up to 2 days to complete.

 

I used a task manager called Process Hacker to inspect the wrapper and its internal threads, and this thread:

einsteinbinary_BRP7_0.12_windows_x86_64_BRP7-opencl-ati.exe+0x1500 

is set to priority IDLE.

From my many years of looking at these internal thread behaviors, this is not a typical priority setting for a BOINC GPU wrapper.  The internal threads for GPU WUs are usually all set to NORMAL with one set to BELOW NORMAL, or all set to NORMAL with the wrapper set to BELOW NORMAL.

I checked a CPU-heavy Einstein WU (on this laptop) and all internal threads are set to NORMAL, including the titular thread at +0x1500:

"hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl-intel_gpu.exe+0x1500, Normal"

while the wrapper is set to BELOW NORMAL:

"hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl-intel_gpu.exe, Below Normal"

 

So I used Process Hacker to change the einsteinbinary_BRP7_0.12_windows_x86_64_BRP7-opencl-ati.exe+0x1500 thread to NORMAL priority, and it's cranking away.

4 WUs at once completed in 19400 sec (more efficient than 3 WUs at once), and I'm testing 5, which will use all available GPU RAM; they are heading towards an apparent 24000 sec runtime.
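To make the efficiency comparison concrete, here is a small sketch of the arithmetic using only the runtimes quoted above. The helper name `seconds_per_wu` is made up for illustration, and the 5x figure is the projected runtime, not a finished measurement. Dividing each batch runtime by the number of concurrent tasks gives roughly 5000, 4850 and 4800 sec per WU for 3x, 4x and 5x.

```python
# Effective wall-clock seconds per work unit when N tasks share the GPU.
# The runtimes are the ones quoted in the post; the 5x value is a projection.

def seconds_per_wu(concurrent_tasks: int, batch_runtime_sec: float) -> float:
    """Seconds of wall-clock time per completed WU for a batch run concurrently."""
    return batch_runtime_sec / concurrent_tasks

runs = {3: 15_000, 4: 19_400, 5: 24_000}

for n, runtime in runs.items():
    per_wu = seconds_per_wu(n, runtime)
    per_day = 86_400 / per_wu  # completed WUs per 24 hours at this rate
    print(f"{n}x: {per_wu:.0f} s per WU, about {per_day:.1f} WUs per day")
```

The gain from 4x to 5x is small on these numbers, so it may not be worth the extra GPU RAM pressure.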

 

I *REALLY* do not want to manually adjust 4 or 5 new WU internal priorities every 5 to 7 hours (or every hour as their run times diverge) but I do want to keep running Meerkat.

 

Changing the internal thread priority to NORMAL would not have an adverse effect on BOINC users and most users should see a minimum 10% decrease in run times.  It's the priority that is usually set for these GPU app internal threads.  The wrapper should remain BELOW NORMAL.

The maximum CPU usage of the WUs whose priorities I adjusted is 0.5% of the 6 available CPU threads.

 

Please, please, no one tell me to leave an entire CPU thread idle for these WU's.  
It's unnecessary if they have their process priorities correctly set and even when I did idle a CPU thread the WU with the IDLE internal thread was showing a 33% longer runtime than the one with the properly set internal priority.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48036393707
RAC: 35284032

You can set process priority for the various task types in your cc_config.xml file. 
 

https://boinc.berkeley.edu/wiki/Client_configuration
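As a rough illustration of what such a cc_config.xml could look like, the sketch below builds the file contents in Python and prints them. It assumes the `<process_priority_special>` option lives inside `<options>` and that the levels are 0 (lowest, default), 1 (below normal), 2 (normal), 3 (above normal), 4 (high), 5 (real-time), which is how I read the linked page; please check it before relying on these numbers. The function name `build_cc_config` is made up for this example, and nothing is written to the BOINC data directory.

```python
import xml.etree.ElementTree as ET

# Priority levels as I understand them from the BOINC client configuration page
# (assumption, verify there): 0 lowest (default), 1 below normal, 2 normal,
# 3 above normal, 4 high, 5 real-time.
NORMAL = 2

def build_cc_config(process_priority_special: int) -> str:
    """Return cc_config.xml text that sets only process_priority_special."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "process_priority_special").text = str(process_priority_special)
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

xml_text = build_cc_config(NORMAL)
print(xml_text)
```

The printed text would be merged by hand into an existing cc_config.xml; this setting is global, so it applies to every project's GPU and wrapper apps, not just one.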

_________________________________________________________________________

marmot
Joined: 28 Nov 14
Posts: 21
Credit: 27289040
RAC: 0

Ian&Steve C. wrote:

You can set process priority for the various task types in your cc_config.xml file. 
 

https://boinc.berkeley.edu/wiki/Client_configuration

 

This is not a permanent solution.  This is a beta app with version 0.12 so there was bound to be a problem found.

It's unreasonable to expect all users to make a cc_config.xml change to correct an issue that could affect any of our computers, depending on some unknown CPU project WU load.

However, if I find some time later today, I'll try and test it.

Maybe the internal thread will inherit the higher priority but maybe not (it does not).

Currently it did NOT inherit the Below Normal priority that its wrapper has.

Instead it's set to Idle.

 

EDIT 1:

<process_priority_special>N</process_priority_special>

is going to change the priority of all GPU wrappers and apps.

MilkyWay@Home is running on the other GPU and works fine.

Eh, I'll still test it and report back. 

 

EDIT 2:

It takes a client restart (rereading the config files has no effect), and the Meerkat wrapper now comes into RAM set to Normal priority; however, the internal process is still sitting at Idle.  It does not inherit the parent wrapper priority.

This is something that will need dev hands to change.

 

Thanks, Ian, for the attempt.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48036393707
RAC: 35284032

I'm not sure I understand your comments about permanency. once you set this in the cc_config, the change is "permanent" until you unset it. it has nothing to do with an app being beta or any specific app version, and it's a global setting for all projects. Also as I understand it, BOINC is not capable of setting priority higher than Normal unless you run BOINC with elevated privileges.

the BRP7 app is also not beta. it may be new with a version less than 1.0, but it is not declared as beta and does not require the beta flag to be set to get this application.

I'm not sure what you mean by "wrapper" in the context of the apps in question. none of these use the BOINC wrapper. they are standalone executables. maybe you mean wrapper in a different context, but in the context of BOINC, there exists a BOINC wrapper for the purposes of making non-BOINC applications compatible with BOINC WrapperApp and that's what the documentation is referring to.

 

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 5020
Credit: 18921155358
RAC: 6511705

To get finer control over process priorities in Windows, you can use the Microsoft Process Explorer to change the priority at the application level.

Or you can use external utilities like Process Lasso which I used for elevating specific gpu applications for years and works wonderfully.

Running these utilities allows normal default BOINC application priorities across all projects but then allows you to focus and elevate/change the process priority for specific applications.

For Linux I used a utility called schedtool to change specific application priorities which can do the same thing as Process Lasso.
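For the Linux side, here is a minimal Python sketch of the same idea: changing the nice value of a running process by PID. This is not schedtool and not what Process Lasso does internally; it only uses the standard `os.setpriority` call, and the helper name `renice` is made up for illustration. It targets a whole process, so it does not address a single mis-prioritised thread inside a Windows app.

```python
import os

def renice(pid: int, niceness: int) -> int:
    """Set the nice value of `pid` (0 means the calling process) and return the value now in effect."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)

if __name__ == "__main__":
    current = os.getpriority(os.PRIO_PROCESS, 0)
    # Lower our own priority by one step (higher nice = lower priority).
    # Going the other way, to a lower nice value, usually needs root.
    target = min(current + 1, 19)
    print(f"nice {current} -> {renice(0, target)}")
```

In practice you would pass the PID of the science app instead of 0, and run it with enough privileges if you want to raise priority.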

So pick your poison . . .  you can have finer control of BOINC applications if desired.

 

marmot
Joined: 28 Nov 14
Posts: 21
Credit: 27289040
RAC: 0

Ian&Steve C. wrote:

I'm not sure I understand your comments about permanency. once you set this in the cc_config, the change is "permanent" until you unset it. it has nothing to do with an app being beta or any specific app version, and it's a global setting for all projects. Also as I understand it, BOINC is not capable of setting priority higher than Normal unless you run BOINC with elevated privileges.

the BRP7 app is also not beta. it may be new with a version less than 1.0, but it is not declared as beta and does not require the beta flag to be set to get this application.

I'm not sure what you mean by "wrapper" in the context of the apps in question. none of these use the BOINC wrapper. they are standalone executables. maybe you mean wrapper in a different context, but in the context of BOINC, there exists a BOINC wrapper for the purposes of making non-BOINC applications compatible with BOINC WrapperApp and that's what the documentation is referring to.

 

Yes, the term Wrapper has multiple meanings in this discussion.

Process Hacker can see inside your BRP7 main app which is "wrapping" multiple threads.  

Process Hacker also sees an unloaded link library called resourcepolicyclient.dll, which, I assume, has set the run time priorities of the rest of the internal threads.

The internal thread of your main app, einsteinbinary_BRP7_0.12_windows_x86_64_BRP7-opencl-ati.exe+0x1500, is set to IDLE priority and is starving for CPU.

I've seen these work units reporting 138 DAYS till completion.

I'll open up a CPU thread for the 4x running BRP7 WUs while I'm not at home and then manually adjust the internal thread priority while home.

EVEN with the open CPU these WU's run 35% slower if I leave that internal thread priority at IDLE.

You can improve project throughput on ALL WU's if you correct this problem.

Please have a coder for the project look at my response.  

Thanks.

(Standard programming protocol is to never name an app out of beta with a version less than 1.00.  But that's cultural and a distraction from my main point.)

marmot
Joined: 28 Nov 14
Posts: 21
Credit: 27289040
RAC: 0

Keith Myers wrote:

Or you can use external utilities like Process Lasso which I used for elevating specific gpu applications for years and works wonderfully.

 

Thanks Keith,  I'll download Process Lasso to see if it has more capabilities than Process Hacker.

Process Hacker does a fine job of keeping a database of the apps whose priorities I've manually set, then setting those apps back to my preferred priorities as they appear in RAM.  It also gives me an option to manage the internal threads, but its design doesn't include a database to automatically manipulate internal threads when they appear.

 

EDIT: We've had this conversation before or my déjà vu sense is misfiring. 

Maybe I got busy and never got around to trying Process Lasso.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48036400373
RAC: 35284258

marmot wrote:

I've seen these work units reporting 138 DAYS till completion.

but does it ACTUALLY take that long? the BOINC estimate is only an estimate. depending on what's going on with the DCF in BOINC, this value can be wildly inaccurate.

I'm not sure the Einstein devs will be able to do anything about this since BOINC handles the process priority. but feel free to contact Bernd via PM. he usually doesn't browse outside of the Technical News forum and likely won't see this post here.

maybe it's some kind of idiosyncrasy in Windows. my Linux version does not have different process priority/niceness between the parent and child threads.

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 5020
Credit: 18921155358
RAC: 6511705

The BOINC estimation for time remaining depends on a few basic things.

First the application has to have produced 11 valid results before the APR value can be accurately produced.

Average Processing Rate is used to produce the time remaining statistic.

Second, each application template that generates tasks for a project has an rsc_fpops_est value, which the researcher enters as their best guess of how many floating-point operations the tasks will need to generate a result.

Researchers often guess very wrong, orders of magnitude wrong, and underestimate or overestimate that value, which can lead to outlandish estimates of how long a calculation will take on the running hardware.
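A simplified way to see how a bad guess blows up the estimate: treat time remaining as the researcher's operation estimate divided by the measured processing rate. This is only a sketch of the idea, not the client's actual code, and the numbers below (a 50 GFLOPS APR, a task that really needs 2.5e14 operations, a guess that is 100x too high) are invented for illustration.

```python
def estimated_runtime_sec(fpops_estimate: float, flops_rate: float) -> float:
    """Simplified estimate: operations the task is claimed to need / operations per second."""
    return fpops_estimate / flops_rate

# Hypothetical figures, chosen only to show the effect of a bad estimate.
apr_flops = 50.0 * 1e9          # average processing rate of 50 GFLOPS
true_fpops = 2.5e14             # what the task really needs
bad_guess = true_fpops * 100    # researcher overestimates by two orders of magnitude

realistic = estimated_runtime_sec(true_fpops, apr_flops)
inflated = estimated_runtime_sec(bad_guess, apr_flops)

print(f"realistic estimate: {realistic:.0f} s")
print(f"inflated estimate:  {inflated / 86_400:.1f} days")
```

The task itself runs no slower in the second case; only the displayed time remaining is off, which is why a huge estimate alone is not a reason to abort.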

Then you have projects like GPUGrid that use a combo application that runs on both the CPU and GPU, and BOINC has no mechanism for understanding what category of device it belongs under.  This also produces wildly overestimated times remaining, months to complete, for standard 5-day deadline tasks.

The unfortunate side effect is that users often abort tasks prematurely and unnecessarily. They should not do that; just let the tasks run to completion and finish within the deadline requirement.

 
