BRP4G Cuda 55 Beta v1.56 for Mac OS X Postpones: out of CPU/GPU Memory

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341974019
RAC: 902
Topic 202371

This just started today, and I was wondering whether the problem is on my end or whether others are seeing it too.

My GPU tasks (BRP4G-Beta-cuda55-Lion v1.56 for Mac OS X) have been stopping with the error "Postponed: Out of CPU/GPU memory".

Only a few percent of my main memory is in use, so it's not on the CPU side.  But looking at the GPU, the processor seems to come to a screeching halt while the GPU memory is still maxed out on my NVIDIA GTX 960.

I'm using CUDA 8.0.46, and the only changes I made recently were updating to Mac OS Sierra 10.12.1 and the NVIDIA GPU driver 367.15.10.15f01 last night.  I also see that v1.56 of the BRP4G app is pretty new, but I've been running it for a while without problems--I'm pretty sure anyway.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3390416540
RAC: 2827009

What % of your memory is BOINC allowed to use? And how much memory do you have?

Is there a reason to upgrade your drivers?

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341974019
RAC: 902

I have it set to use 85% of memory when in use, and 95% when not in use.  The main memory only has 7% memory pressure.  I've got 32GB of RAM and it's barely getting used.  So I don't think that's what it's referring to.

However, the GPU memory and processor are normally pegged when BOINC is running a GPU task.  The difference here is that the GPU processor drops off when this happens, while the memory still shows it's maxed.  If I suspend GPU tasks or quit BOINC, both drop off.

It hasn't happened today, but it happened multiple times yesterday.  This is really weird.

I upgraded the NVIDIA drivers because there were new ones out that were supposed to work better with the new version of Mac OS Sierra, which also came out.  But there was no new version of CUDA.  I'm thinking that the new NVIDIA web driver is the culprit.  But I haven't been able to reproduce it today.  Transient glitches piss me off.

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341974019
RAC: 902

I was also having some random machine restarts that I suspected were caused by the old graphics driver.  That's the other reason I upgraded drivers.  (It's a Hackintosh, so there could be all kinds of issues that I have yet to find on this end).

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341974019
RAC: 902

The plot thickens.  I also updated a genuine Apple MacBook Pro with an i7-3720QM CPU and NVIDIA GTX 650M GPU to Mac OS Sierra 10.12.1 from 10.12.0 and it also had a BRP4G glitch.  It is using the stock Apple graphics drivers and has had CUDA 8.0.46 on it for a few weeks, so that hasn't changed.  The only common denominator, as far as I can see (if indeed these are related issues) was the OS upgrade.

Unlike the first problem I reported, BOINC did not report a pause due to memory, but the BRP4G work unit had been running for over 19.5 hours (they normally take 1.5 hours).

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341974019
RAC: 902

I've now lost a whole batch of these work units to the same error as this one: https://einsteinathome.org/task/585113336

[14:57:16][2629][ERROR] Failed to enable CUDA thread yielding for device #0 (error: 2)! Sorry, will try to occupy one CPU core...
[14:57:16][2629][ERROR] Couldn't acquire CUDA context of device #0 (error: 2)!
[14:57:16][2629][ERROR] Demodulation failed (error: 1002)!
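
For anyone who wants to pull the error codes out of a whole batch of failed tasks, here is a minimal Python sketch. It assumes every stderr line follows the `[time][pid][LEVEL] message (error: N)!` layout seen in the three lines above; that layout is inferred from this one task only, and the function names are made up for illustration. For what it's worth, code 2 is the out-of-memory code in both the CUDA driver API (CUDA_ERROR_OUT_OF_MEMORY) and the runtime API (cudaErrorMemoryAllocation), which would fit the "Out of CPU/GPU memory" postponement, but whether the app prints the raw CUDA code here is a guess on my part.

```python
import re

# Assumed line layout, inferred only from the three lines quoted above:
#   [HH:MM:SS][pid][LEVEL] message ... (error: N)!
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[(\d+)\]\[([A-Z]+)\]\s+(.*)$")
CODE_RE = re.compile(r"\(error:\s*(\d+)\)")


def parse_log_line(line):
    """Return a dict describing one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp, pid, level, message = match.groups()
    code = CODE_RE.search(message)
    return {
        "time": timestamp,
        "pid": int(pid),
        "level": level,
        "message": message,
        "code": int(code.group(1)) if code else None,
    }


def error_codes(text):
    """Collect the numeric codes from every ERROR line in a block of log text."""
    parsed = (parse_log_line(line) for line in text.splitlines())
    return [
        entry["code"]
        for entry in parsed
        if entry and entry["level"] == "ERROR" and entry["code"] is not None
    ]


SAMPLE = "\n".join([
    "[14:57:16][2629][ERROR] Failed to enable CUDA thread yielding for device #0 (error: 2)! Sorry, will try to occupy one CPU core...",
    "[14:57:16][2629][ERROR] Couldn't acquire CUDA context of device #0 (error: 2)!",
    "[14:57:16][2629][ERROR] Demodulation failed (error: 1002)!",
])

print(error_codes(SAMPLE))  # prints [2, 2, 1002]
```

Running this over the stderr text of each failed task would show quickly whether the whole batch died with the same pair of codes.
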

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4338
Credit: 3202801763
RAC: 1946516

I get the same error on my Windows 7 machine on a GTX 970 when the GPU driver crashes and restarts. I'm running 3 tasks at a time. I have 2 GTX 970s in the same machine, and it is always the same GPU that seems to crash.

Have you checked if your computer has driver restarts? I have no idea how Apple is handling this kind of situation but just in case.

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341974019
RAC: 902

Mine only runs one at a time, and I'm not sure if the driver restarts or not.  As I mentioned earlier, this is a Hackintosh (I put together the hardware and hacked Mac OS Sierra onto it).  The machine completely restarts on me for some reason I haven't figured out yet.

But what I do see is that my GPU memory is maxed out and the GPU drops to idle when this happens.

It's really pissing me off because it's been going twice as fast as it was under Linux, which wasn't using the whole GPU potential.

My MacBook Pro with an NVIDIA GTX 650M is now mysteriously taking twice as long to do GPU tasks.  I blame something in Mac OS Sierra 10.12.1.  This all started when I updated both machines.  It looks like the same thing is happening there, but without the error.  I also updated to NVIDIA's latest web driver the same night on the Hackintosh, but the MacBook Pro doesn't use those--it just uses CUDA.

I wonder if I should try to submit a bug report to NVIDIA and Apple.

Incidentally, which version of CUDA are you using?

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4338
Credit: 3202801763
RAC: 1946516

The CUDA version is 8.0 and driver version is 361.91. Do Macs have something like the Windows Event Viewer where driver restarts and other problems are logged? I have no experience with Macs.

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341974019
RAC: 902

Yeah, they have something very similar to the Windows Task Manager where you can view CPU, memory usage and such.  But I am also using a tool called iStat Menus that lets you put some of that info on your menu bar.  So I can watch CPU, disk, and network load and temps of various subsystems (for example). 

You can click on the CPU graph and it has a handy pulldown list of all kinds of other stuff, including GPU memory and processor load.  So normally I see a nice bar graph showing both the processor and GPU memory slamming away at a work unit, but the processor drops off while the memory remains maxed out when this error happens.

Earlier today I saw that there is a new version of CUDA out.  I was using 8.0.46 on both of these machines, and now 8.0.52 is out.  The timing makes me wonder if they figured out that Mac OS Sierra 10.12.1 has a glitch and pushed this out to fix it.

Keep your fingers crossed for me tonight.  Hope that fixes it.

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341974019
RAC: 902

Just spotted the part in your comment about the logs.  Yes, the Console app is a GUI way to review your logs.  Otherwise you can read them from the command line, among other ways.
