Hi, this might be boinc related, feel free to pass it on. Sorry if this has been posted before.
Just updated to boinc 7.2.28 and decided to give the GPU another go. Last time I tried, the discrete GPU stayed active even while boinc was suspended, draining the battery. Now this seems to be fixed!
Unfortunately I get only "Error while computing", as you can see here:
http://einsteinathome.org/account/tasks&offset=0&show_names=1&state=0&appid=25
Mostly this happens immediately but I had one task run last night:
http://einsteinathome.org/task/410900799
I woke up and it was still running. Then I went away from the computer, the GPU started again, and I got a kernel panic and had to restart. As you can see from the error logs, all tasks seem to end with:
[ERROR] Error in OpenCL context: [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue (gld returned: 10015).
[ERROR] Error in OpenCL context: [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30
I think this is caused by the graphics switching not working correctly. Is this supposed to work? Here is the start of my boinc log:
18-Nov-2013 08:01:07 [---] cc_config.xml not found - using defaults
18-Nov-2013 08:01:07 [---] Starting BOINC client version 7.2.28 for x86_64-apple-darwin
18-Nov-2013 08:01:07 [---] log flags: file_xfer, sched_ops, task
18-Nov-2013 08:01:07 [---] Libraries: libcurl/7.26.0 OpenSSL/1.0.1e zlib/1.2.5 c-ares/1.9.1
18-Nov-2013 08:01:07 [---] Data directory: /Library/Application Support/BOINC Data
18-Nov-2013 08:01:07 [---] OpenCL: AMD/ATI GPU 0: ATI Radeon HD 6750M (driver version 1.0, device version OpenCL 1.1, 512MB, 512MB available, 72 GFLOPS peak)
18-Nov-2013 08:01:07 [---] OpenCL CPU: Intel(R) Core(TM) i7-2675QM CPU @ 2.20GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.1)
18-Nov-2013 08:01:07 [---] Host name: coffe.local
18-Nov-2013 08:01:07 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2675QM CPU @ 2.20GHz [x86 Family 6 Model 42 Stepping 7]
18-Nov-2013 08:01:07 [---] Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 xAPIC POPCNT AES PCID XSAVE OSXSAVE TSCTMR AVX1.0
18-Nov-2013 08:01:07 [---] OS: Mac OS X 10.7.4 (Darwin 11.4.2)
18-Nov-2013 08:01:07 [---] Memory: 8.00 GB physical, 143.58 GB virtual
18-Nov-2013 08:01:07 [---] Disk: 464.96 GB total, 143.34 GB free
18-Nov-2013 08:01:07 [---] Local time is UTC +1 hours
18-Nov-2013 08:01:07 [---] VirtualBox version: 4.2.8
Macbook pro AMD GPU, Error while computing
Hmm, okay. I was able to force the discrete GPU to stay active by opening VLC. With that it seems to work; I just got my first task validated:
http://einsteinathome.org/workunit/178627636
Is the run time as expected?
That's 15 hours instead of the initially estimated 2 hours.
Anyone else have a macbook pro with 2 graphics cards? Is the auto-switching working for you?
Hi Tullus,
> I think this is caused by the graphics switching not working correctly. Is this supposed to work?
No, I don't see how that could work. If you (or the OS) switch from the dedicated to the internal GPU while our app is running, I'd expect it to fail with the error you see. For me the question is why the dedicated GPU is being switched away while it is actively used for computing. That's not what I'd expect.
BTW, what kind of internal GPU does your MacBook come with (not recognised by BOINC)?
Thanks,
Oliver
PS: a general recommendation: keep your system up-to-date. 10.7.5 was released over a year ago. You are still running 10.7.4.
Einstein@Home Project
The integrated graphics is an Intel HD Graphics 3000 512 MB, while the discrete is the AMD.
As for why this is happening I have no idea, but it seems to work fine if I force the discrete GPU before allowing boinc to use the GPU. If the integrated GPU is active, and then boinc starts using the GPU, the task crashes immediately. Note that I am not doing any manual intervention along the lines of forcing the integrated GPU to be active. I am using gfxCardStatus to monitor which GPU is active, and when einstein starts, the gpu does indeed switch to the discrete as it should.
Note: I am running a few other CPU only projects at the same time ("use at most 90% of the processors", resulting in 7 out of 8 virtual cores being utilized). This should be fine right?
I am a bit hesitant about testing this too much (so the above might just be random coincidence with some bad WUs or maybe the moon was in the wrong place...), since the kernel panic freaked me out a little. I am pretty convinced the kernel panic was caused by boinc/einstein. But if you want me to try again with some debug flags or something let me know.
Hmm, yes, I am running OS X Lion 10.7.5 (11G63), so not sure why boinc reports it as 10.7.4. The system is always up to date.
PS. Is 10.9 Mavericks considered stable (in the boinc sense)? I have seen some posts about NVIDIA driver problems, but nothing about AMD.
> If the integrated GPU is active, and then boinc starts using the GPU, the task crashes immediately.
Hm, so you say the GPU switch is triggered as expected but that still confuses the app and causes it to error out, throwing the error above? Or does the error above only occur in situations where the GPU is switched back to internal while the app is still running?
> Note: I am running a few other CPU only projects at the same time [...] This should be fine right?
Yes, that's ok, and it could also explain why your GPU performance appears to be so bad. The GPU needs to be "fed" by a CPU core. You might want to reduce the number of CPUs even further (e.g. 75%, to free up one physical core) to see whether this improves the GPU performance.
> PS. Is 10.9 Mavericks considered stable (in the boinc sense)?
I use it on all my systems without problems. OpenCL, the technology we use for AMD GPUs, is provided by Apple, so there are no external dependencies as there are for NVIDIA GPUs (CUDA). Since you're probably using Time Machine it should be fairly safe to upgrade to 10.9 (and it's free!). You may always revert to 10.7 if issues arise...
Einstein@Home Project
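For reference, the arithmetic behind the "use at most N% of the processors" suggestion can be sketched as follows. This is only an illustration: the helper name `usable_cpus` is made up, and the rounding rule (round down, keep at least one CPU) is an assumption chosen because it matches the 7 out of 8 reported for 90%; it is not taken from the BOINC source.

```python
import math

def usable_cpus(logical_cpus: int, percent: float) -> int:
    """Logical CPUs BOINC may use under a 'use at most N%' setting.

    Assumption: the client rounds down and always keeps at least one CPU.
    """
    return max(1, math.floor(logical_cpus * percent / 100))

# The i7-2675QM has 4 physical cores with 2 hardware threads each,
# which BOINC reports as 8 processors.
LOGICAL_CPUS = 8

for percent in (100, 90, 75):
    busy = usable_cpus(LOGICAL_CPUS, percent)
    free = LOGICAL_CPUS - busy
    print(f"{percent:>3}% -> {busy} of {LOGICAL_CPUS} logical CPUs busy, {free} free")
```

Under that assumption 90% leaves a single hardware thread idle, while 75% leaves two, i.e. the equivalent of one physical core for feeding the GPU.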
I just had another go at this. With the integrated GPU active, I activated "Use GPU always". Here is my console log showing gfxCardStatus's view of the GPU state:
The task crashes immediately, found here: http://einsteinathome.org/task/411044341, for the record:
process exited with code 216 (0xd8, -40)
[15:46:32][88219][INFO ] Application startup - thank you for supporting Einstein@Home!
[15:46:32][88219][INFO ] Starting data processing...
[15:46:32][88219][INFO ] Using OpenCL platform provided by: Apple
[15:46:32][88219][INFO ] Using OpenCL device "ATI Radeon HD 6750M" by: AMD
[15:46:33][88219][ERROR] Error in OpenCL context: [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue (gld returned: 10015).
[15:46:33][88219][ERROR] Error in OpenCL context: [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30
[15:46:33][88219][ERROR] Couldn't create OpenCL command queue (error: -30)!
[15:46:33][88219][INFO ] OpenCL shutdown complete!
[15:46:33][88219][ERROR] Demodulation failed (error: 2008)!
15:46:33 (88219): called boinc_finish
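To line the task output up against other logs (such as the GPU state changes), it can help to pull the timestamp of the first error out of it. The following is a minimal sketch that assumes the `[time][pid][level] message` layout seen above; the function names are made up for illustration. It also shows how the three forms of the exit code relate: 216 is 0xd8, and read as a signed 8-bit value it is -40.

```python
import re

# Two lines copied from the task output above.
LOG = '''\
[15:46:32][88219][INFO ] Using OpenCL device "ATI Radeon HD 6750M" by: AMD
[15:46:33][88219][ERROR] Error in OpenCL context: [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30
'''

# [HH:MM:SS][pid][LEVEL] message  (the level field may be padded with spaces)
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[(\d+)\]\[(\w+)\s*\]\s*(.*)$")

def parse_line(line):
    """Return a dict for an app log line, or None if the line has another layout."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    time, pid, level, message = match.groups()
    return {"time": time, "pid": int(pid), "level": level, "message": message}

def first_error(log):
    """Return the first ERROR entry in the log, or None if there is none."""
    for line in log.splitlines():
        entry = parse_line(line)
        if entry is not None and entry["level"] == "ERROR":
            return entry
    return None

def as_signed_byte(code):
    """Reinterpret an exit status in 0..255 as a signed 8-bit value."""
    return code - 256 if code >= 128 else code

error = first_error(LOG)
print("first error at", error["time"])
print("exit code:", 216, hex(216), as_signed_byte(216))
```

Lines in a different layout, such as the final `called boinc_finish` line, are simply skipped by `parse_line`.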
So the sequence of events seems to be:
Which seems to be the expected order: the computer switches back to the integrated GPU 2 seconds after the task has failed. The only thing I can think of is that the einstein app tries to use the GPU "too soon", i.e. before it has time to start up properly (which doesn't make too much sense, but who knows). There is of course another issue: I am not sure how quickly gfxCardStatus updates the GPU status, so the above order might be wrong, as you indicate (i.e. the above shows the integrated GPU being switched back on 2 seconds after the task crashed, but it might have switched at the moment the task crashed).
And just to repeat myself, this does not seem to happen if I open some app that requires the discrete GPU, in which case the discrete GPU is active all the time and everything is fine.
I have some programming experience, and I have even dipped my toe into OpenCL programming, so if you have something you want me to test don't hesitate to ask.
Thanks for looking into this.
> The task crashes immediately
Just to be precise, this is not "crashing" but proper error handling, with an orderly shutdown ;-)
> [...] the einstein app tries to use the GPU "too soon", i.e. before it has time to start up properly
Yep, that's exactly what I have in mind. It seems the OpenCL framework isn't correctly synchronised with the dynamic switching. If a client tries to acquire an OpenCL context, it should block that call until the device switching has taken place. However, ...
... thinking about it, it could also be related to BOINC's GPU enumeration during startup and how that relates to dynamic GPU switching. I was just wondering whether OpenCL reports both devices, despite the fact that only one is active at a given time. I'm curious, do you see a GPU switch at the time when BOINC starts up and enumerates the GPUs? In your example above that would be at 18-Nov-2013 08:01:07, for instance.
Oliver
Einstein@Home Project
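If the framework really does hand out the device before the switch has finished, one thing a tester could try on the client side is to retry the failing call with a short, growing delay instead of giving up on the first error. The sketch below only illustrates that idea. It is not what the Einstein@Home app does, whether it would help here is untested, and `make_flaky_queue_factory` is a made-up stand-in for the real `clCreateCommandQueue` call so the example runs without a GPU.

```python
import time

def retry(create, attempts=5, delay=0.5, sleep=time.sleep):
    """Call create() until it succeeds or the attempts run out.

    The delay doubles after each failure; the last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return create()
        except RuntimeError as err:
            last_error = err
            if attempt < attempts - 1:
                sleep(delay)
                delay *= 2
    raise last_error

def make_flaky_queue_factory(failures):
    """Hypothetical stand-in for queue creation: fails `failures` times, then succeeds."""
    state = {"left": failures}
    def create_queue():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("clCreateCommandQueue failed: -30")
        return "queue"
    return create_queue

# Record the waits instead of actually sleeping, so the demo finishes instantly.
waits = []
result = retry(make_flaky_queue_factory(2), sleep=waits.append)
print(result, "after waiting", waits)
```

If the queue can be created after one or two retries, that would point towards a timing problem around the switch; if it never succeeds, the cause is more likely elsewhere.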
When starting boinc, it momentarily switches to the discrete GPU, which would explain why it can find the AMD.
Well, it doesn't have to because Apple's OpenCL framework allows you to detect all GPUs and query their details, without switching to them. This means that the device enumeration done by BOINC should be correct, regardless of which GPU is activated. So ignore my previous comment in that direction.
Regarding the fact that BOINC briefly causes the dedicated GPU to be used: IIRC, that's because of a workaround that allows BOINC to determine the GPU memory on AMD devices, which was sometimes reported incorrectly by OpenCL. That workaround uses OpenGL to query the device, which in turn causes a switch to the dedicated GPU.
Einstein@Home Project
Well, I am officially out of ideas. I will try to find some other OpenCL boinc project and let you know how it goes.
Okay, tried Primegrid: PPS (Sieve) v1.39 (openclPPSsieveMAC), same problem over there. So at least this is boinc related and not specific to Einstein. Here is one task:
http://www.primegrid.com/result.php?resultid=503368467