Can anyone explain this result?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3000758672
RAC: 694626

Stranger and stranger.

Would it be possible for you to download and run

http://boinc.berkeley.edu/dl/clinfo.zip

From memory, I think the best way is to run it from an administrative command prompt and redirect the output to a text file.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 0

Quote:
Quote:
The BRP4G application should be intelligent enough to figure out what it is running on, and whether that piece of hardware is even capable of doing what it wants to do.

Yes, that seems to be exactly what it's doing.


Or not, not when you see [Einstein@Home] [coproc] ATI instance 1: confirming 0.500000 instance for p2030.20131122.G177.14+00.50.S.b3s0g0.00000_3792_0

That still implies it's running on GPU 1, which isn't capable of running the task.

I wonder... @Darrell, what have you set for the GPU Utilization factor for BRP tasks in http://einstein.phys.uwm.edu/prefs.php?subset=project and what are the contents of your app_config.xml file?

@Heinz, in case you think it's done on the CPU, I don't think so. Despite the CPU being OpenCL capable, I don't think you can run a GPU app indiscriminately on a CPU. It would also not show as having been done on the Cypress GPU.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3000758672
RAC: 694626

Quote:
Quote:
Quote:
The BRP4G application should be intelligent enough to figure out what it is running on, and whether that piece of hardware is even capable of doing what it wants to do.

Yes, that seems to be exactly what it's doing.

Or not, not when you see [Einstein@Home] [coproc] ATI instance 1: confirming 0.500000 instance for p2030.20131122.G177.14+00.50.S.b3s0g0.00000_3792_0

That still implies it's running on GPU 1, which isn't capable of running the task.


No, that merely implies that BOINC thinks the task is running on GPU 1 (which it shouldn't, being excluded).

I wouldn't call BOINC a reliable witness as to what is actually happening behind the scenes, in this instance.

It might be interesting to see exactly what BOINC had directed the application to do, by examining init_data.xml from the slot directory - but I'm not sure even that would be definitive, because I suspect the application is capable of overriding an impossible directive (using its own internal OpenCL capability check).

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 0

Quote:
No, that merely implies that BOINC thinks the task is running on GPU 1 (which it shouldn't, being excluded).


Darrell only added the exclusion after that task ran.

In this post,

Darrell wrote:
Because at the time this task was running Boinc Manager showed that the task was running on device 1. I have since added an exclusion to the cc_config.xml to ensure that it only runs on device 0.

Quote:
I wouldn't call BOINC a reliable witness as to what is actually happening behind the scenes, in this instance.


As the managing program that decides which task runs with which application on which piece of hardware, it should be a reliable witness.
And if it isn't, then that capability should be added PDQ.

For if you can't trust BOINC anymore to manage what task runs at what time, on what hardware, using how much memory and which application, then there's no use for BOINC really. We can do it all stand-alone.

Oh, PS I am not (yet) writing anything about this to David.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3000758672
RAC: 694626

Ah, yes. I was looking at

Quote:
16-Apr-2014 17:14:00 [Einstein@Home] Config: excluded GPU. Type: ATI. App: einsteinbinary_BRP5. Device: 1
16-Apr-2014 17:14:00 [Einstein@Home] Config: excluded GPU. Type: ATI. App: hsgamma_FGRP3. Device: 1


but the task p2030.20131122.G177.14+00.50.S.b3s0g0.00000_3792_0 would have been Binary Radio Pulsar Search (Arecibo, GPU) v1.39 (BRP4G-opencl-ati), or einsteinbinary_BRP4G for short. When did we lose "using [short_name]" from the event log?
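The check above (is the app that actually ran covered by an exclusion?) can be sketched in a few lines of Python. This is only an illustration: it assumes the event log always prints exclusions in the "Config: excluded GPU. Type: ... App: ... Device: ..." form quoted above, and the helper names `parse_exclusions` and `is_excluded` are made up for this example, not part of BOINC.

```python
import re

# Matches the "Config: excluded GPU" event-log lines quoted in this thread.
# Assumption: every exclusion names an app and a device in exactly this form.
EXCLUSION_RE = re.compile(
    r"Config: excluded GPU\. Type: (?P<type>\w+)\. "
    r"App: (?P<app>\w+)\. Device: (?P<device>\d+)"
)


def parse_exclusions(log_text):
    """Return a set of (gpu_type, app_name, device) tuples found in the log."""
    exclusions = set()
    for line in log_text.splitlines():
        match = EXCLUSION_RE.search(line)
        if match:
            exclusions.add((match["type"], match["app"], int(match["device"])))
    return exclusions


def is_excluded(exclusions, gpu_type, app_name, device):
    """True if this exact type/app/device combination was excluded."""
    return (gpu_type, app_name, device) in exclusions


LOG = """\
16-Apr-2014 17:14:00 [Einstein@Home] Config: excluded GPU. Type: ATI. App: einsteinbinary_BRP5. Device: 1
16-Apr-2014 17:14:00 [Einstein@Home] Config: excluded GPU. Type: ATI. App: hsgamma_FGRP3. Device: 1
"""

exclusions = parse_exclusions(LOG)
print(is_excluded(exclusions, "ATI", "einsteinbinary_BRP5", 1))   # True
print(is_excluded(exclusions, "ATI", "einsteinbinary_BRP4G", 1))  # False
```

With only those two log lines, einsteinbinary_BRP4G on device 1 is not excluded, which matches what the event log showed at the time.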

I still think that BOINC can be relied on to report on what instructions it has given, but I don't think it is (or was ever meant to be) an analytic tool for detecting that the instructions it has given are being accurately obeyed.

@Darrell,

I'd still be interested in seeing BOINC's directive from init_data.xml, but don't post the whole thing - there's a surprising amount of private data in there. The bit that would be useful is the section below:

Quote:
remove this line
p2030.20131123.G181.70-02.90.S.b1s0g0.00000_805
p2030.20131123.G181.70-02.90.S.b1s0g0.00000_805_0
boinc_0
0
4740
0.000000
0.000000
0
028174059.637716
11861.023874
6442924.803000
7878.988357
0.500000
60.000000
0.000000
1.000000
intel_gpu
0
0
1.000000
0.500000

17500000000000.000000
350000000000000.000000
260000000.000000
20000000.000000
1399408935.000000
0


- or even just the bit I've picked out in blue.
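For anyone who would rather not hand-edit the file before posting, here is a rough Python sketch that pulls out only the GPU assignment fields and leaves everything else behind. The tag names in `GPU_TAGS` are my assumption of what init_data.xml uses (the tags were stripped from the quote above), so check them against your own file; the `authenticator` element in the sample is a made-up stand-in for private data.

```python
import re

# Assumed tag names for the GPU assignment fields in init_data.xml.
# Verify against your own file before relying on them.
GPU_TAGS = ("gpu_type", "gpu_device_num", "gpu_opencl_dev_index",
            "gpu_usage", "ncpus")


def extract_gpu_fields(xml_text, tags=GPU_TAGS):
    """Return {tag: value} for the requested tags only; all else is ignored."""
    fields = {}
    for tag in tags:
        pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
        match = re.search(pattern, xml_text, re.DOTALL)
        if match:
            fields[tag] = match.group(1).strip()
    return fields


# Sample built from the values quoted above, plus one hypothetical private field.
SAMPLE = """\
<app_init_data>
<authenticator>hypothetical_private_value</authenticator>
<gpu_type>intel_gpu</gpu_type>
<gpu_device_num>0</gpu_device_num>
<gpu_opencl_dev_index>0</gpu_opencl_dev_index>
<gpu_usage>1.000000</gpu_usage>
<ncpus>0.500000</ncpus>
</app_init_data>
"""

for tag, value in extract_gpu_fields(SAMPLE).items():
    print(tag, "=", value)
```

To use it on a real task, read the file from the slot directory first, for example `open("slots/0/init_data.xml").read()` relative to the BOINC data directory, and pass the text to `extract_gpu_fields`.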

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 797641542
RAC: 1230955

Quote:

@Heinz, in case you think it's done on the CPU, I don't think so. Despite the CPU being OpenCL capable, I don't think you can run a GPU app indiscriminately on a CPU. It would also not show as having been done on the Cypress GPU.

No, in fact the BRP code does not run on OpenCL CPUs at all (it would fail with an error message).

I suspected that there might be a problem because including or excluding CPUs in the list of OpenCL-enabled devices might prevent the client and the science app from picking the right GPU.

From the perspective of the science app, there isn't that much you can do wrong, actually: the science app calls a BOINC API function to get the correct OpenCL platform and device, and the API function does this by parsing information that the core client wrote to the init_data.xml file in the task's slot directory. As you can see from the example snippet above, the device is basically encoded by its position in a list. If OpenCL CPUs share the same list as OpenCL GPUs and can appear at lower indices... we have a problem. Device number n would have a different meaning to the client and to older APIs. I need to inspect the source code when I have time to see how it works.
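To make the suspected failure mode concrete, here is a toy Python sketch. It is not BOINC code and does not claim to show how the client or API actually enumerate devices (that still needs checking in the source); the device names and both helper functions are invented for illustration. It only shows how an index computed against a GPU-only list points at the wrong device when read against a list that also contains a CPU at a lower position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    kind: str  # "CPU" or "GPU"


def combined_index(devices, name):
    """Position of the device in the full list (CPUs and GPUs together)."""
    return [d.name for d in devices].index(name)


def gpu_only_index(devices, name):
    """Position of the device when only GPUs are counted."""
    gpus = [d.name for d in devices if d.kind == "GPU"]
    return gpus.index(name)


# Hypothetical enumeration order: the OpenCL CPU happens to come first.
DEVICES = [
    Device("cpu", "CPU"),
    Device("gpu_a", "GPU"),
    Device("gpu_b", "GPU"),
]

# One side writes the index of gpu_b counting GPUs only...
written_index = gpu_only_index(DEVICES, "gpu_b")
# ...and the other side reads that index against the combined list.
picked = DEVICES[written_index].name

print(written_index, picked)  # 1 gpu_a
```

If both sides count the same way, the mismatch disappears, which is why it matters whether older API versions and the client agree on whether CPUs are in the list.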

The BRP app from Einstein@Home will use the values returned by this API function and will log failures to use the device selected by the BOINC API.

HB

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3000758672
RAC: 694626

OK, I've re-enabled my OpenCL/CPU testbed to match this problem as closely as I can here (no AMD devices, sorry):

Quote:
23/04/2014 16:43:44 | | Starting BOINC client version 7.3.15 for windows_x86_64
23/04/2014 16:43:44 | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 4600 (driver version 10.18.10.3496, device version OpenCL 1.2, 1298MB, 1298MB available, 184 GFLOPS peak)
23/04/2014 16:43:44 | | OpenCL CPU: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 3.0.1.10878, device version OpenCL 1.2 (Build 76413))
23/04/2014 16:53:32 | Einstein@Home | [coproc] Assigning intel_gpu instance 0 to p2030.20131123.G200.03-00.49.N.b3s0g0.00000_3419_1
23/04/2014 16:53:32 | Einstein@Home | [cpu_sched] Starting task p2030.20131123.G200.03-00.49.N.b3s0g0.00000_3419_1 using einsteinbinary_BRP4 version 134 (opencl-intel_gpu) in slot 0
23/04/2014 16:54:32 | Einstein@Home | [coproc] intel_gpu instance 0; 1.000000 pending for p2030.20131123.G200.03-00.49.N.b3s0g0.00000_3419_1
23/04/2014 16:54:32 | Einstein@Home | [coproc] intel_gpu instance 0: confirming 1.000000 instance for p2030.20131123.G200.03-00.49.N.b3s0g0.00000_3419_1

The init_data.xml in slot 0 contains:

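<gpu_type>intel_gpu</gpu_type>
<gpu_device_num>0</gpu_device_num>
<gpu_opencl_dev_index>0</gpu_opencl_dev_index>
<gpu_usage>1.000000</gpu_usage>
<ncpus>1.000000</ncpus>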

- index 0 throughout, even though clinfo detects the GPU second:

Quote:

Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2
Platform Name: Intel(R) OpenCL
Platform Vendor: Intel(R) Corporation
Platform Extensions: cl_intel_dx9_media_sharing cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics

Platform Name: Intel(R) OpenCL
Number of devices: 2

Device Type: CL_DEVICE_TYPE_CPU
Device ID: 32902
...

Device Type: CL_DEVICE_TYPE_GPU
Device ID: 32902
...


Now that's set up, let me know if there are any useful tests I can run on this platform.

Darrell
Joined: 11 Nov 04
Posts: 32
Credit: 15397991
RAC: 0

Quote:

I'm sure we should flag this for all the developers involved.
But also, if Darrell could de-exclude the GPU for this project, then wait until he sees something similar happen again, stop BOINC and run everything through the client simulator (http://boinc.berkeley.edu/trac/wiki/ClientSim) and point out the run he had, one of us could point that to the developers as well.

Apropos, BOINC 7.2.28 had OpenCL for CPUs already. So the present recommended 7.2.42 does so as well.

I can do this at the weekend; I have college classes on Tuesday, Wednesday, and Thursday, after working eight hours.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 0

This weekend is soon enough. :)

Darrell
Joined: 11 Nov 04
Posts: 32
Credit: 15397991
RAC: 0

Quote:

I wonder... @Darrell, what have you set for the GPU Utilization factor for BRP tasks in http://einstein.phys.uwm.edu/prefs.php?subset=project and what are the contents of your app_config.xml file?

@Heinz, in case you think it's done on the CPU, I don't think so. Despite the CPU being OpenCL capable, I don't think you can run a GPU app indiscriminately on a CPU. It would also not show as having been done on the Cypress GPU.

Utilization factor set at the default value in the preferences.

Contents of the Einstein@Home app_config.xml:
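
<app_config>
  <app>
    <name>einsteinbinary_BRP4</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einsteinbinary_BRP4G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einsteinbinary_BRP5</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>hsgamma_FGRP3</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einstein_S6CasA</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
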
Note: The BRP4G and S6CasA app exclusions were added to the client config XML this week, after those apps had completed tasks on the HD3000. I just thought it was strange that most OpenCL tasks fail immediately when trying to run on the HD3000, as you would expect, and then suddenly along come a couple that don't.
