Gravitational Wave Engineering run on LIGO O1 Open Data

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117577326551

RAC: 35179160

18 Apr 2019 8:44:43 UTC

Message 170784 in response to message 170781

DanNeely wrote:

Setting cpu/gpu usage to 0 threw an error message when I tried reading the config file. Going the other direction and setting hardware requirements well in excess of what my boxes have worked great though:
<app>
    <name>einstein_O1OD1E</name>
    <gpu_versions>
        <gpu_usage>99</gpu_usage>
        <cpu_usage>99</cpu_usage>
    </gpu_versions>
</app>

I guess that makes perfect sense if you think about it :-). Telling BOINC that a particular app requires much more hardware than you have - so don't even try these tasks - sounds like the logical solution :-).

Cheers,
Gary.
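
For anyone wanting to repeat Dan's experiment: the snippet quoted above shows only the <app> clause. Below is a minimal sketch of a complete file, assuming the usual app_config.xml layout in which every clause sits inside an <app_config> root element and the file is placed in the project's folder under the BOINC data directory. The values are the ones Dan posted; nothing else here has been tested.

```xml
<!-- app_config.xml: sketch only, values taken from Dan's post -->
<app_config>
    <app>
        <name>einstein_O1OD1E</name>
        <gpu_versions>
            <!-- claim far more GPUs and CPUs per task than the host has -->
            <gpu_usage>99</gpu_usage>
            <cpu_usage>99</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

The client should pick the file up after a restart or after choosing Options → Read config files in BOINC Manager. These are client-side settings, so they may only change how the client runs tasks it already has, not what the server decides to send.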

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3562358667

RAC: 0

19 Apr 2019 4:17:08 UTC

Message 170801 in response to message 170781

DanNeely wrote:

Gary Roberts wrote:
Richie wrote:
DanNeely wrote:
Is there a way to opt out of the GPU tasks from the engineering run until they're able to perform better while still running CPU work from it?

'ON' for CPU and 'OFF' for all GPUs (AMD, Nvidia, Intel).

I suspect Dan would just want to exclude O1OD1E GPU tasks and not FGRPB1G tasks as well. Your suggestion excludes all types of GPU crunching. Off the top of my head (I've never tried it) a possible way would be to use the app_config.xml mechanism and use both the name and plan class tags to identify just the GPU version. Perhaps setting the cpu_usage and gpu_usage (or maybe the max_concurrent) for that combination to zero might effectively exclude those tasks without affecting anything else. It would be worth experimenting.

Setting cpu/gpu usage to 0 threw an error message when I tried reading the config file. Going the other direction and setting hardware requirements well in excess of what my boxes have worked great though:

<app>
    <name>einstein_O1OD1E</name>
    <gpu_versions>
        <gpu_usage>99</gpu_usage>
        <cpu_usage>99</cpu_usage>
    </gpu_versions>
</app>

After ~12 hours on each of two systems I'm reasonably confident this is working as expected: I'm getting a mix of O1OD1E CPU tasks and Fermi GPU tasks on both, and nothing I don't want.

And apparently I'm not as clever as I thought; the server was just toying with me. Both of my boxes recently got several tasks marked as needing 99 CPUs and 99 GPUs. It's late, so I'm not going to screw around and see if BOINC will attempt to run them until sometime tomorrow. But it looks like I need a plan C of some sort.

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3562358667

RAC: 0

19 Apr 2019 10:50:51 UTC

Message 170806 in response to message 170801

Well, the tasks won't run, at least.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117577326551

RAC: 35179160

19 Apr 2019 23:14:57 UTC

Message 170815 in response to message 170806

DanNeely wrote:

... it looks like I need a plan C of some sort.

The documentation implies that you can use an <app_version> clause to replace an <app> clause. It says "overrides" but because the <app> clause itself is shown as optional, I suspect you would use one or the other rather than expecting the second to override the first. Maybe you'll need to try both ways.

<max_concurrent> is an option in an <app> clause but it's not shown at all for <app_version>. It might be just an oversight so perhaps something like the following might do what you want. Note that xxxx represents the type of GPU you have, ati or nvidia. If a <max_concurrent> of zero is accepted, the client might know not to request work for that plan_class. Maybe you'll get a better idea by checking what actually gets installed in the state file.

<app_version>
    <app_name>einstein_O1OD1E</app_name>
    <plan_class>GW-opencl-xxxx-V1</plan_class>
    <max_concurrent>0</max_concurrent>
    <avg_ncpus>99</avg_ncpus>
    <ngpus>99</ngpus>
</app_version>

The other things that might give some clues are the contents of a sched_request and sched_reply as a result of particular settings used in app_config.xml. It could also be worthwhile looking at the scheduler logs on the website to see the decision making process the scheduler went through in response to a particular request.

Cheers,
Gary.

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3562358667

RAC: 0

20 Apr 2019 13:50:19 UTC

Message 170823

I just tried adding the nvidia version of that <app_version>, loaded the config file, aborted a block of existing nvidia GW tasks, and had a fresh batch of them downloaded afterward.

Other than when an error occurs and a URL is listed in the event log, I'm not sure how to see a scheduler request/reply.
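
On seeing the scheduler request and reply: the client normally keeps its most recent exchange with each project as two files in the BOINC data directory, named sched_request_<project>.xml and sched_reply_<project>.xml, where the project part is derived from the project URL. For more detail in the event log, here is a sketch of a cc_config.xml with two logging flags switched on. Both flag names come from the BOINC client configuration documentation; whether their output is enough to explain the scheduler's decisions in this case is an assumption.

```xml
<!-- cc_config.xml: logging sketch, not a tested fix -->
<cc_config>
    <log_flags>
        <!-- log details of each scheduler request and reply -->
        <sched_op_debug>1</sched_op_debug>
        <!-- log the client's work fetch decisions -->
        <work_fetch_debug>1</work_fetch_debug>
    </log_flags>
</cc_config>
```

If a cc_config.xml already exists, the two flags belong in its existing <log_flags> section rather than in a second file.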

Zalster

Joined: 26 Nov 13

Posts: 3117

Credit: 4050672230

RAC: 0

20 Apr 2019 14:49:07 UTC

Message 170824 in response to message 170815

Gary, what about an exclude_gpu in the cc_config? Not sure if you need the device num or not. Also don't know if you need the plan class, but it can't hurt to try.

<cc_config>
    <options>
        <exclude_gpu>
            <url>http://einstein.phys.uwm.edu/</url>
            <device_num>0</device_num>
            <app_name>einstein_O1OD1E</app_name>
            <plan_class>GW-opencl-xxxx-V1</plan_class>
        </exclude_gpu>
    </options>
</cc_config>

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3562358667

RAC: 0

20 Apr 2019 15:05:25 UTC

Message 170825 in response to message 170824

Zalster wrote:

Gary, what about an exclude_gpu in the cc_config? Not sure if you need the device num or not. Also don't know if you need the plan class, but it can't hurt to try.

<cc_config>
    <options>
        <exclude_gpu>
            <url>http://einstein.phys.uwm.edu/</url>
            <device_num>0</device_num>
            <app_name>einstein_O1OD1E</app_name>
            <plan_class>GW-opencl-xxxx-V1</plan_class>
        </exclude_gpu>
    </options>
</cc_config>

Your attempt to limit the exclusion to GW GPU tasks didn't work. It also showed the GPU as missing on my Fermi tasks, failed over to a backup project, and at some point in the process began aborting the Fermi GPU tasks (I managed to stop BOINC and revert the change before it took out more than 50 or 60 of them).
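
One possible reason the exclusion hit the Fermi tasks too: going by the BOINC client configuration documentation, <exclude_gpu> accepts <url>, <device_num>, <type> and <app>, and lists no <app_name> or <plan_class> element. If unrecognized elements are simply ignored (an assumption, not checked against the client source), the block Zalster posted would reduce to "exclude device 0 for the whole project". Here is a sketch that uses only the documented element names, untested against this engineering run:

```xml
<!-- cc_config.xml: sketch using documented exclude_gpu elements only -->
<cc_config>
    <options>
        <exclude_gpu>
            <url>http://einstein.phys.uwm.edu/</url>
            <!-- device_num and type omitted: documented as "all GPUs" -->
            <!-- app short name; a plan class cannot be given here -->
            <app>einstein_O1OD1E</app>
        </exclude_gpu>
    </options>
</cc_config>
```

Even if this parses cleanly, it is a client-side setting, so the server may still send GW GPU tasks that then have no device to run on. It would be safer to try it with work fetch suspended first.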

Betreger

Joined: 25 Feb 05

Posts: 992

Credit: 1589355725

RAC: 763933

20 Apr 2019 20:39:37 UTC

Message 170832

A couple of observations.

I would much rather run gravity waves than binary pulsars even though the RAC is taking a huge hit. If I wanted credits I would not run Einstein at all but something stupid like Collatz.

Since the GPU app arrived, most work stalls after completing with a "waiting to acquire lock" message. The tasks eventually clear and validate. I don't know if this is a feature or a bug.

I hope my modest efforts with a GTX1060 are helping to develop a more efficient app. The current thing is a real hog.

Richie

Joined: 7 Mar 14

Posts: 656

Credit: 1702989778

RAC: 0

20 Apr 2019 21:03:27 UTC

Message 170833 in response to message 170832

Betreger wrote:

Since the GPU app arrived, most work stalls after completing with a "waiting to acquire lock" message. The tasks eventually clear and validate. I don't know if this is a feature or a bug.

Hi! That was a bug in v0.12, which is now deprecated; the current version is v0.13. I haven't seen a single v0.12 task that ran properly. They couldn't validate, but some v0.13 tasks run on Nvidia GPUs are validating. So are AMD tasks if they were run 1x or maybe 2x, but I think none run at 4x has validated yet. There are plenty of validation inconclusives among them instead.

scole of TSBT

Joined: 2 Mar 05

Posts: 10

Credit: 720822167

RAC: 1089903

20 Apr 2019 22:50:16 UTC

Message 170834

EDIT: I didn't realize that was a month old post :)

zombie67 [MM] wrote:

How to get the tasks for Gravitational Wave Engineering run on LIGO O1 Open Data? There is no way to select that app in the project preferences. Or am I missing it?

Do you have "Run test applications" selected?
