MacBook Pro AMD GPU, Error while computing

Tullus
Joined: 18 Aug 13
Posts: 9
Credit: 1263090
RAC: 0
Topic 197276

Hi, this might be BOINC related; feel free to pass it on. Sorry if this has been posted before.

I just updated to BOINC 7.2.28 and decided to give the GPU another go. Last time I tried, the discrete GPU stayed active even while BOINC was suspended, draining the battery. That now seems to be fixed!

Unfortunately, all I get is "Error while computing", as you can see here:
http://einsteinathome.org/account/tasks&offset=0&show_names=1&state=0&appid=25
Mostly this happens immediately, but one task did run last night:
http://einsteinathome.org/task/410900799
When I woke up it was still running. Then I went away from the computer, the GPU started again, and I got a kernel panic and had to restart. As you can see from the error logs, all tasks seem to end with:

[ERROR] Error in OpenCL context: [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue (gld returned: 10015).
[ERROR] Error in OpenCL context: [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30
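The number at the end of the second line is the raw OpenCL status code, and the bracketed `CL_INVALID_VALUE` is its symbolic name. A minimal Python sketch for decoding such lines; the table lists only a few codes from the OpenCL 1.x header `cl.h`, and `decode_cl_error` is a hypothetical helper, not part of BOINC or the Einstein@Home app:

```python
import re

# A few status codes from the OpenCL 1.x header cl.h (not an exhaustive list).
CL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -35: "CL_INVALID_QUEUE_PROPERTIES",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# A colon followed by an integer at the very end of the line, e.g. "...queue: -30".
TRAILING_CODE = re.compile(r":\s*(-?\d+)$")

def decode_cl_error(line):
    """Return (code, name) for a trailing OpenCL status code, or None if there is none."""
    match = TRAILING_CODE.search(line.strip())
    if match is None:
        return None
    code = int(match.group(1))
    return code, CL_ERRORS.get(code, "unknown OpenCL status")

line = ("[ERROR] Error in OpenCL context: [CL_INVALID_VALUE] : OpenCL Error : "
        "clCreateCommandQueue failed: Device failed to create queue: -30")
print(decode_cl_error(line))  # (-30, 'CL_INVALID_VALUE')
```

The first log line ends in `(gld returned: 10015).`, which is not a plain OpenCL status code, so the sketch deliberately returns `None` for it.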


I think this is caused by the graphics switching not working correctly. Is this supposed to work? Here is the start of my BOINC log:

18-Nov-2013 08:01:07 [---] cc_config.xml not found - using defaults
18-Nov-2013 08:01:07 [---] Starting BOINC client version 7.2.28 for x86_64-apple-darwin
18-Nov-2013 08:01:07 [---] log flags: file_xfer, sched_ops, task
18-Nov-2013 08:01:07 [---] Libraries: libcurl/7.26.0 OpenSSL/1.0.1e zlib/1.2.5 c-ares/1.9.1
18-Nov-2013 08:01:07 [---] Data directory: /Library/Application Support/BOINC Data
18-Nov-2013 08:01:07 [---] OpenCL: AMD/ATI GPU 0: ATI Radeon HD 6750M (driver version 1.0, device version OpenCL 1.1, 512MB, 512MB available, 72 GFLOPS peak)
18-Nov-2013 08:01:07 [---] OpenCL CPU: Intel(R) Core(TM) i7-2675QM CPU @ 2.20GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.1)
18-Nov-2013 08:01:07 [---] Host name: coffe.local
18-Nov-2013 08:01:07 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2675QM CPU @ 2.20GHz [x86 Family 6 Model 42 Stepping 7]
18-Nov-2013 08:01:07 [---] Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 xAPIC POPCNT AES PCID XSAVE OSXSAVE TSCTMR AVX1.0
18-Nov-2013 08:01:07 [---] OS: Mac OS X 10.7.4 (Darwin 11.4.2)
18-Nov-2013 08:01:07 [---] Memory: 8.00 GB physical, 143.58 GB virtual
18-Nov-2013 08:01:07 [---] Disk: 464.96 GB total, 143.34 GB free
18-Nov-2013 08:01:07 [---] Local time is UTC +1 hours
18-Nov-2013 08:01:07 [---] VirtualBox version: 4.2.8

Tullus

Hmm, okay. I was able to force the discrete GPU to stay active by opening VLC. With that it seems to work; I just got my first task validated:
http://einsteinathome.org/workunit/178627636

Is the run time as expected?

Quote:
411023695 9563905 18 Nov 2013 17:29:22 UTC 22 Nov 2013 7:32:00 UTC Completed and validated 53,882.82 3,213.18 15.01 1,000.00 Binary Radio Pulsar Search (Arecibo, GPU) v1.39 (BRP4G-opencl-ati-lion)


That's 15 hours instead of the initially estimated 2 hours.
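A quick check of the numbers in the quoted row. The run time of 53,882.82 s is the figure behind the "15 hours" above; treating the next column (3,213.18) as CPU time in seconds, as on BOINC task pages, is an assumption:

```python
run_time_s = 53_882.82  # run time from the quoted task row (the post's "15 hours")
cpu_time_s = 3_213.18   # assumed to be the CPU-time column
estimate_h = 2.0        # initial estimate mentioned in the post

run_time_h = run_time_s / 3600
print(f"run time: {run_time_h:.2f} h")                         # run time: 14.97 h
print(f"vs. estimate: {run_time_h / estimate_h:.1f}x longer")  # vs. estimate: 7.5x longer
print(f"CPU time / run time: {cpu_time_s / run_time_s:.1%}")   # CPU time / run time: 6.0%
```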

Does anyone else have a MacBook Pro with two graphics cards? Is automatic switching working for you?

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 987
Credit: 25171438
RAC: 0

Hi Tullus,

> I think this is caused by the graphics switching not working correctly. Is this supposed to work?

No, I don't see how that could work. If you (or the OS) switch from the dedicated to the internal GPU while our app is running, I'd expect it to fail with the error you see. The question for me is why the dedicated GPU gets switched away while it's actively being used for computing. That's not what I'd expect.

BTW, which internal GPU does your MacBook come with? BOINC doesn't recognise it.

Thanks,
Oliver

PS: a general recommendation: keep your system up-to-date. 10.7.5 was released over a year ago. You are still running 10.7.4.

Einstein@Home Project

Tullus

The integrated GPU is an Intel HD Graphics 3000 (512 MB); the discrete one is the AMD.

As for why this is happening, I have no idea, but it seems to work fine if I force the discrete GPU on before allowing BOINC to use the GPU. If the integrated GPU is active and BOINC then starts using the GPU, the task crashes immediately. Note that I am not manually forcing the integrated GPU to be active. I am using gfxCardStatus to monitor which GPU is active, and when Einstein starts, the GPU does indeed switch to the discrete one, as it should.

Note: I am running a few other CPU-only projects at the same time ("use at most 90% of the processors", so 7 of the 8 virtual cores are in use). This should be fine, right?

I am a bit hesitant to test this too much, since the kernel panic freaked me out a little (so the above might just be coincidence with some bad WUs, or maybe the moon was in the wrong place...). I am pretty convinced the kernel panic was caused by BOINC/Einstein. But if you want me to try again with some debug flags or similar, let me know.

Quote:
PS: a general recommendation: keep your system up-to-date. 10.7.5 was released over a year ago. You are still running 10.7.4.


Hmm, actually I am running OS X Lion 10.7.5 (11G63), so I'm not sure why BOINC reports it as 10.7.4. The system is always up to date.

PS: Is 10.9 Mavericks considered stable (in the BOINC sense)? I have seen some posts about NVIDIA driver problems, but nothing about AMD.

Oliver Behnke

Quote:
If the integrated GPU is active, and then boinc starts using the GPU, the task crashes immediately. [...] when einstein starts, the gpu does indeed switch to the discrete as it should.


Hm, so you're saying the GPU switch is triggered as expected, but it still confuses the app and makes it error out with the error above? Or does that error only occur when the GPU is switched back to the internal one while the app is still running?

Quote:

Note: I am running a few other CPU only projects at the same time ("use at most 90% of the processors", resulting in 7 out of 8 virtual cores being utilized). This should be fine right?


Yes, that's OK, but it could also explain why your GPU performance appears to be so bad: the GPU needs to be "fed" by a CPU core. You might want to reduce the CPU percentage even further (e.g. to 75%, which frees up one physical core) to see whether this improves GPU performance.
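The arithmetic behind the 90% and 75% settings, as a small Python sketch. It assumes BOINC applies the percentage to the logical CPU count and rounds down, keeping at least one CPU; that matches the 7-of-8 figure reported in this thread, but `usable_cpus` is a hypothetical helper, not BOINC's actual code. The i7-2675QM has 4 physical cores with 2 hardware threads each, which gives the 8 CPUs in the BOINC log:

```python
def usable_cpus(logical_cpus: int, max_ncpus_pct: int) -> int:
    """Assumed rule: percentage of logical CPUs, rounded down, but at least one."""
    return max(1, logical_cpus * max_ncpus_pct // 100)

LOGICAL_CPUS = 8      # "Processor: 8" in the BOINC log
THREADS_PER_CORE = 2  # i7-2675QM: 4 physical cores with Hyper-Threading

for pct in (90, 75):
    busy = usable_cpus(LOGICAL_CPUS, pct)
    free_threads = LOGICAL_CPUS - busy
    print(f"{pct}%: {busy} of {LOGICAL_CPUS} logical CPUs busy, "
          f"{free_threads} free (= {free_threads / THREADS_PER_CORE:g} physical core)")
# 90%: 7 of 8 logical CPUs busy, 1 free (= 0.5 physical core)
# 75%: 6 of 8 logical CPUs busy, 2 free (= 1 physical core)
```

At 90% only a single hardware thread is left over, which shares its physical core with a busy CPU task; 75% leaves a whole core's worth of threads for feeding the GPU.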

Quote:

Is 10.9 maverick considered stable (in the boinc sense)? I have seen some posts about NVIDIA driver problems, but nothing about AMD.


I use it on all my systems without problems. OpenCL, the technology we use for AMD GPUs, is provided by Apple, so there are no external dependencies like there are for NVIDIA GPUs (CUDA). Since you're probably using Time Machine, it should be fairly safe to upgrade to 10.9 (and it's free!). You can always revert to 10.7 if issues arise...

Tullus

Quote:
Quote:
If the integrated GPU is active, and then boinc starts using the GPU, the task crashes immediately. [...] when einstein starts, the gpu does indeed switch to the discrete as it should.

Hm, so you say the GPU switch is triggered as expected but that still confuses the app and causes it to error out, throwing the error above? Or does the error above only occur in situations where the GPU is switched back to internal while the app is still running?

I just had another go at this: with the integrated GPU active, I activated "Use GPU always". Here is my console log, showing gfxCardStatus's view of the GPU state:

22.11.13 15:46:33,205 [0x0-0x2a02a].com.codykrieger.gfxCardStatus: 2013-11-22 15:46:33.204 gfxCardStatus[466/0x7fff7639f960] [lvl=2] _displayReconfigurationCallback() A non-built-in display reconfiguration callback has been triggered.
22.11.13 15:46:33,302 Finder: kCGErrorIllegalArgument: CGSGetDisplayAliasList: Invalid display 0x745e4205
22.11.13 15:46:33,303 Finder: kCGErrorIllegalArgument: CGSGetDisplayAliasList: Invalid display 0x745e4205
22.11.13 15:46:33,707 [0x0-0x2a02a].com.codykrieger.gfxCardStatus: 2013-11-22 15:46:33.706 gfxCardStatus[466/0x103c82000] [lvl=2] ___displayReconfigurationCallback_block_invoke_0() Notification: GPU changed. Integrated? 0
22.11.13 15:46:33,726 [0x0-0x2a02a].com.codykrieger.gfxCardStatus: 2013-11-22 15:46:33.725 gfxCardStatus[466/0x103c82000] [lvl=2] -[GSMenuController updateMenu] Using dynamic switching?: 1
22.11.13 15:46:33,726 [0x0-0x2a02a].com.codykrieger.gfxCardStatus: 2013-11-22 15:46:33.725 gfxCardStatus[466/0x103c82000] [lvl=2] -[GSMenuController updateMenu] Using old-style switching policy?: 0
22.11.13 15:46:33,726 [0x0-0x2a02a].com.codykrieger.gfxCardStatus: 2013-11-22 15:46:33.725 gfxCardStatus[466/0x103c82000] [lvl=2] -[GSMenuController updateMenu] AMD Radeon HD 6750M in use. Bummer! Less battery life for you.
22.11.13 15:46:35,195 [0x0-0x2a02a].com.codykrieger.gfxCardStatus: 2013-11-22 15:46:35.191 gfxCardStatus[466/0x7fff7639f960] [lvl=2] _displayReconfigurationCallback() A non-built-in display reconfiguration callback has been triggered.
22.11.13 15:46:35,274 Finder: kCGErrorIllegalArgument: CGSGetDisplayAliasList: Invalid display 0x745e4205
22.11.13 15:46:35,274 Finder: kCGErrorIllegalArgument: CGSGetDisplayAliasList: Invalid display 0x745e4205
22.11.13 15:46:35,693 [0x0-0x2a02a].com.codykrieger.gfxCardStatus: 2013-11-22 15:46:35.693 gfxCardStatus[466/0x103c82000] [lvl=2] ___displayReconfigurationCallback_block_invoke_0() Notification: GPU changed. Integrated? 1
22.11.13 15:46:35,705 [0x0-0x2a02a].com.codykrieger.gfxCardStatus: 2013-11-22 15:46:35.705 gfxCardStatus[466/0x103c82000] [lvl=2] -[GSMenuController updateMenu] Using dynamic switching?: 1
22.11.13 15:46:35,706 [0x0-0x2a02a].com.codykrieger.gfxCardStatus: 2013-11-22 15:46:35.705 gfxCardStatus[466/0x103c82000] [lvl=2] -[GSMenuController updateMenu] Using old-style switching policy?: 0
22.11.13 15:46:35,706 [0x0-0x2a02a].com.codykrieger.gfxCardStatus: 2013-11-22 15:46:35.705 gfxCardStatus[466/0x103c82000] [lvl=2] -[GSMenuController updateMenu] Intel HD Graphics 3000
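The actual switches in this log are the two "GPU changed. Integrated? 0/1" notifications. A minimal Python sketch that pulls just those events out of console lines; the regex is written against the lines above and is an assumption about gfxCardStatus's output format in general, and `gpu_switches` is a hypothetical helper:

```python
import re

# Console timestamp ("15:46:33,707") plus the 0/1 flag of a gfxCardStatus switch notification.
SWITCH = re.compile(
    r"^\S+ (\d{2}:\d{2}:\d{2},\d{3}) .*Notification: GPU changed\. Integrated\? ([01])"
)

def gpu_switches(console_lines):
    """Yield (timestamp, 'integrated' or 'discrete') for each switch notification."""
    for line in console_lines:
        match = SWITCH.search(line)
        if match:
            yield match.group(1), "integrated" if match.group(2) == "1" else "discrete"
```

Fed the console log above, it should yield `("15:46:33,707", "discrete")` followed by `("15:46:35,693", "integrated")`; the Finder and `updateMenu` lines are skipped.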

The task crashes immediately (http://einsteinathome.org/task/411044341). For the record, here is its output:

7.2.28

process exited with code 216 (0xd8, -40)

[15:46:32][88219][INFO ] Application startup - thank you for supporting Einstein@Home!
[15:46:32][88219][INFO ] Starting data processing...
[15:46:32][88219][INFO ] Using OpenCL platform provided by: Apple
[15:46:32][88219][INFO ] Using OpenCL device "ATI Radeon HD 6750M" by: AMD
[15:46:33][88219][ERROR] Error in OpenCL context: [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue (gld returned: 10015).
[15:46:33][88219][ERROR] Error in OpenCL context: [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30
[15:46:33][88219][ERROR] Couldn't create OpenCL command queue (error: -30)!
[15:46:33][88219][INFO ] OpenCL shutdown complete!
[15:46:33][88219][ERROR] Demodulation failed (error: 2008)!
15:46:33 (88219): called boinc_finish

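The exit code is consistent with the app's own error number: 2008 mod 256 = 216 = 0xd8, and 0xd8 read as a signed byte is -40, which are the three values BOINC prints. A minimal sketch, assuming the app passes 2008 to `boinc_finish()` and that this becomes the process exit status, of which POSIX hands only the low 8 bits to the parent:

```python
def exit_status(requested: int) -> int:
    """POSIX passes only the low 8 bits of exit()'s argument to the parent."""
    return requested & 0xFF

def as_signed_byte(status: int) -> int:
    """Reinterpret an 8-bit status as a two's-complement signed char."""
    return status - 256 if status >= 128 else status

# Assumption: the app exits with its internal error number 2008.
status = exit_status(2008)
print(status, hex(status), as_signed_byte(status))  # 216 0xd8 -40
```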

So the sequence of events seems to be:

1) [15:46:32][88219][INFO ] Starting data processing...
2) [15:46:32][88219][INFO ] Using OpenCL device "ATI Radeon HD 6750M" by: AMD
3) 15:46:33,726 AMD Radeon HD 6750M in use
4) [15:46:33][88219][ERROR] Demodulation failed (error: 2008)!
5) 15:46:35,706 Intel HD Graphics 3000

That seems to be the expected order: the computer switches back to the integrated GPU 2 seconds after the task has failed. The only thing I can think of is that the Einstein app tries to use the GPU "too soon", i.e. before it has had time to start up properly (which doesn't make much sense, but who knows). There is of course another caveat: I'm not sure how quickly gfxCardStatus updates the GPU status, so the order above might be wrong, as you suggest. (The above shows the integrated GPU being switched back on 2 seconds after the task crashed, but it may actually have switched at the moment the task crashed.)
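One more caveat about the list above: the app log stamps whole seconds while the console log has milliseconds, so an app event stamped 15:46:33 could have happened anywhere between 15:46:33.000 and 15:46:34.000. A small Python sketch that classifies a console event relative to an app event, assuming both logs come from the same system clock and ignoring gfxCardStatus's (unknown) notification delay; the helper names are hypothetical:

```python
from datetime import datetime, timedelta

def app_window(ts):
    """App log stamps like '15:46:33' cover a whole second: [t, t + 1 s)."""
    start = datetime.strptime(ts, "%H:%M:%S")
    return start, start + timedelta(seconds=1)

def console_time(ts):
    """Console stamps like '15:46:33,726' have millisecond resolution."""
    return datetime.strptime(ts, "%H:%M:%S,%f")

def order(app_ts, console_ts):
    """Where does the console event fall relative to the app event?"""
    start, end = app_window(app_ts)
    t = console_time(console_ts)
    if t < start:
        return "before"
    if t >= end:
        return "after"
    return "same second (order unknown)"

print(order("15:46:32", "15:46:33,726"))  # item 2 vs. item 3: after
print(order("15:46:33", "15:46:33,726"))  # item 4 vs. item 3: same second (order unknown)
print(order("15:46:33", "15:46:35,706"))  # item 4 vs. item 5: after
```

Under those assumptions, only two conclusions are firm: the discrete GPU was reported active after the app had already picked the AMD device at 15:46:32, and the switch back came after the failure. Whether the failure (4) came before or after the discrete GPU was reported active (3) cannot be read from these logs.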

And just to repeat myself: this does not seem to happen if I first open an app that requires the discrete GPU. In that case the discrete GPU stays active the whole time and everything is fine.

I have some programming experience and have even dipped my toe into OpenCL programming, so if there's something you want me to test, don't hesitate to ask.

Thanks for looking into this.

Oliver Behnke

Quote:
The task crashes immediately, found here: http://einsteinathome.org/task/411044341, for the record:


Just to be precise, this is not "crashing" but proper error handling, with an orderly shutdown ;-)

Quote:
Only thing I can think of is that the einstein app tries to use the GPU "too soon", i.e. before it has time to start up properly (which doesn't make too much sense, but who knows)


Yep, that's exactly what I have in mind. It seems the OpenCL framework isn't correctly synchronised with the dynamic switching. If a client tries to acquire an OpenCL context, the framework should block that call until the device switch has taken place. However, ...

... thinking about it, it could also be related to BOINC's GPU enumeration during startup and how that interacts with dynamic GPU switching. I was wondering whether OpenCL reports both devices even though only one is active at any given time. I'm curious: do you see a GPU switch when BOINC starts up and enumerates the GPUs? In your example above, that would be at 18-Nov-2013 08:01:07.
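If the framework really doesn't block until the switch has completed, one app-side mitigation to experiment with would be to treat the queue-creation failure as transient and retry with a growing delay. The Python sketch only shows the retry logic: `create_queue` stands in for a hypothetical wrapper around the real OpenCL call that returns a `(queue, status)` pair, and the attempt count and delays are arbitrary assumptions. Nothing in this thread says the Einstein@Home app does this, or that a retry would succeed.

```python
import time

CL_INVALID_VALUE = -30  # status returned by clCreateCommandQueue in the logs above

def create_with_retry(create_queue, attempts=5, first_delay=0.5, sleep=time.sleep):
    """Call create_queue() until it reports success (status 0) or attempts run out.

    Only CL_INVALID_VALUE is treated as transient (an assumption); any other
    status fails immediately.
    """
    delay = first_delay
    status = None
    for attempt in range(attempts):
        queue, status = create_queue()
        if status == 0:
            return queue
        if status != CL_INVALID_VALUE or attempt == attempts - 1:
            break
        sleep(delay)  # give the OS time to finish the GPU switch
        delay *= 2    # exponential backoff
    raise RuntimeError(f"Couldn't create OpenCL command queue (error: {status})")
```

Passing `sleep` as a parameter keeps the function testable without real delays; a stub that fails twice with -30 and then succeeds returns its queue after sleeping 0.5 s and 1.0 s.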

Oliver

Tullus

When BOINC starts, it momentarily switches to the discrete GPU, which would explain why it can find the AMD.

Oliver Behnke

Well, it doesn't have to, because Apple's OpenCL framework lets you detect all GPUs and query their details without switching to them. This means that BOINC's device enumeration should be correct regardless of which GPU is active. So ignore my previous comment along those lines.

Regarding the fact that BOINC briefly causes the dedicated GPU to be used: IIRC, that's because of a workaround that lets BOINC determine the GPU memory on AMD devices, which OpenCL sometimes reported incorrectly. That workaround uses OpenGL to query the device, which in turn causes a switch to the dedicated GPU.

Tullus

Well, I am officially out of ideas. I will try to find another OpenCL BOINC project and let you know how it goes.

Tullus

Okay, I tried PrimeGrid's PPS (Sieve) v1.39 (openclPPSsieveMAC) and got the same problem there. So at least this isn't specific to Einstein; it looks BOINC related. Here is one task:
http://www.primegrid.com/result.php?resultid=503368467
