This just started today and I was wondering if there is a problem on my end or if others are experiencing the same problem.
My GPU tasks (BRP4G-Beta-cuda55-Lion v1.56 for Mac OS X) have been stopping with the error "Postponed: Out of CPU/GPU memory".
My main memory is only at a few percent usage, so the problem isn't on the CPU side. On the GPU, however, the processor seems to come to a screeching halt while the GPU memory stays maxed out on my NVIDIA GTX 960.
I'm using CUDA 8.0.46, and the only changes I made recently were updating to Mac OS Sierra 10.12.1 and to NVIDIA GPU driver 367.15.10.15f01 last night. I also see that v1.56 of the BRP4G app is pretty new, but I've been running it for a while without problems, or at least I'm pretty sure I have.
Copyright © 2024 Einstein@Home. All rights reserved.
What % of your memory is BOINC allowed to use? And how much memory do you have?
Is there a reason to upgrade your drivers?
I have it set to use 85% of memory when the computer is in use and 95% when it's not in use. The main memory is only at 7% memory pressure; I've got 32GB of RAM and it's barely getting used. So I don't think that's what the error is referring to.
However, the GPU memory and processor are normally pegged whenever BOINC is running a GPU task. The difference here is that the GPU processor drops off when this happens, while the memory still shows as maxed out. If I suspend GPU tasks or quit BOINC, both drop off.
It hasn't happened today, but it happened multiple times yesterday. This is really weird.
I upgraded the NVIDIA drivers because new ones had come out that were supposed to work better with the new version of Mac OS Sierra, which had also just been released. But there was no new version of CUDA. I'm thinking the new NVIDIA web driver is the culprit, but I haven't been able to reproduce the problem today. Transient glitches piss me off.
I was also having some random machine restarts that I suspected were caused by the old graphics driver. That's the other reason I upgraded the drivers. (It's a Hackintosh, so there could be all kinds of issues on this end that I have yet to find.)
The plot thickens. I also updated a genuine Apple MacBook Pro with an i7-3720QM CPU and an NVIDIA GTX 650M GPU from Mac OS Sierra 10.12.0 to 10.12.1, and it also had a BRP4G glitch. It uses the stock Apple graphics drivers and has had CUDA 8.0.46 on it for a few weeks, so that hasn't changed. The only common denominator, as far as I can see (if these are indeed related issues), was the OS upgrade.
Unlike the first problem I reported, BOINC did not report a pause due to memory, but the BRP4G work unit had been running for over 19.5 hours (they normally take 1.5 hours).
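A task running 19.5 hours against a usual 1.5 is about 13 times its normal runtime, which is easy to flag automatically. Here is a minimal Python sketch, assuming you can get each task's elapsed time from somewhere (for example, by reading it off BOINC Manager); the `Task` class, the `find_stalled` function, and the 3x threshold are all hypothetical, not part of BOINC.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical record: a task name and its elapsed wall-clock time in hours.
    name: str
    elapsed_hours: float


def find_stalled(tasks, typical_hours=1.5, factor=3.0):
    """Return (name, ratio) for tasks running longer than factor * typical_hours.

    typical_hours defaults to the ~1.5 h BRP4G runtime mentioned in the thread;
    factor is an arbitrary assumption, tune it to taste.
    """
    limit = typical_hours * factor
    stalled = []
    for t in tasks:
        if t.elapsed_hours > limit:
            # Ratio of actual to typical runtime, rounded for readability.
            stalled.append((t.name, round(t.elapsed_hours / typical_hours, 1)))
    return stalled


if __name__ == "__main__":
    tasks = [Task("normal_wu", 1.4), Task("stuck_wu", 19.5)]
    print(find_stalled(tasks))  # [('stuck_wu', 13.0)]
```

A fixed multiple of the usual runtime keeps the check simple; it won't catch a task that is merely slow (like the MacBook Pro taking twice as long) unless you lower the factor.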
Now I've lost a whole batch of these work units to the same error as this one: https://einsteinathome.org/task/585113336
I get the same error on my Windows 7 machine with a GTX 970 when the GPU driver crashes and restarts. I'm running 3 tasks at a time. I have 2 GTX 970s in the same machine, and it always seems to be the same GPU that crashes.
Have you checked whether your computer has had driver restarts? I have no idea how Apple handles this kind of situation, but it's worth checking just in case.
Mine only runs one task at a time, and I'm not sure whether the driver restarts or not. As I mentioned earlier, this is a Hackintosh (I put together the hardware and hacked Mac OS Sierra onto it). The machine completely restarts on me for some reason I haven't figured out yet.
But what I do see is that my GPU memory is maxed out and the GPU drops to idle when this happens.
It's really pissing me off because it's been running twice as fast as it did under Linux, which wasn't using the GPU's full potential.
My MacBook Pro with an NVIDIA GTX 650M is now mysteriously taking twice as long to do GPU tasks. I blame something in Mac OS Sierra 10.12.1, since this all started when I updated both machines. It looks like the same thing is happening there, but without the error. I also updated to NVIDIA's latest web driver the same night on the Hackintosh, but the MacBook Pro doesn't use those; it just uses CUDA.
I wonder if I should try to submit a bug report to NVIDIA and Apple.
Incidentally, which version of CUDA are you using?
The CUDA version is 8.0 and the driver version is 361.91. Do Macs have something like the Windows Event Viewer, where driver restarts and other problems are logged? I have no experience with Macs.
Yeah, they have something very similar to the Windows Task Manager where you can view CPU and memory usage and such. But I'm also using a tool called iStat Menus that lets you put some of that info on your menu bar, so I can watch, for example, CPU, disk, and network load and the temperatures of various subsystems.
You can click on the CPU graph and it has a handy pulldown list of all kinds of other stuff, including GPU memory and processor load. So normally I see a nice bar graph showing both the processor and GPU memory slamming away at a work unit, but the processor drops off while the memory remains maxed out when this error happens.
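The failure signature described here (GPU processor at idle while GPU memory stays maxed, as opposed to both dropping when tasks are suspended) can be expressed as a simple check over periodic samples. This is a minimal Python sketch, assuming you already have `(gpu_util_pct, gpu_mem_pct)` readings from some monitor; how you collect them is left open, and the `stall_windows` function and its thresholds are hypothetical guesses, not values from any tool.

```python
def stall_windows(samples, idle_pct=5.0, full_mem_pct=95.0, min_len=3):
    """Return (start, end) index pairs where the GPU sits idle with memory full.

    samples: list of (gpu_util_pct, gpu_mem_pct) tuples, one per polling interval.
    A window is reported only if the pattern lasts at least min_len samples.
    Thresholds are assumptions; adjust them for your monitor's readings.
    """
    windows = []
    start = None
    for i, (util, mem) in enumerate(samples):
        stalled = util <= idle_pct and mem >= full_mem_pct
        if stalled and start is None:
            start = i  # pattern begins
        elif not stalled and start is not None:
            if i - start >= min_len:
                windows.append((start, i - 1))
            start = None  # pattern ends
    # Close a window that is still open at the last sample.
    if start is not None and len(samples) - start >= min_len:
        windows.append((start, len(samples) - 1))
    return windows


if __name__ == "__main__":
    # Busy, then idle with memory pinned, then memory released (e.g. after suspending).
    readings = [(98, 99), (97, 99), (2, 99), (1, 99), (0, 99), (0, 99), (0, 10)]
    print(stall_windows(readings))  # [(2, 5)]
```

Requiring both conditions at once is what separates a hung task from a normal suspend, where utilization and memory fall together and nothing is flagged.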
Earlier today I saw that there is a new version of CUDA out. I was using 8.0.46 on both of these machines, and now 8.0.52 is out. The timing makes me wonder whether they figured out that Mac OS Sierra 10.12.1 has a glitch and pushed this out to fix it.
Keep your fingers crossed for me tonight. Hope that fixes it.
Just spotted the part in your comment about the logs. Yes, the Console app is a GUI way to review your logs. Otherwise, you can use the command line, among other ways.
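For the command-line route, macOS Sierra introduced unified logging and the `log` tool, whose `log show` subcommand accepts `--last` and `--predicate`. Below is a minimal Python sketch that builds such a query for recent kernel messages. The keyword "GPU" is only a guess at what a driver restart might log (not a documented message string), and `build_log_query` / `run_log_query` are hypothetical helper names.

```python
import platform
import subprocess


def build_log_query(keyword="GPU", last="1d"):
    """Build the argv for a `log show` query of recent kernel messages.

    keyword: text to search for in each message (an assumption; adjust as needed).
    last: how far back to look, e.g. "2h" or "1d".
    """
    predicate = f'process == "kernel" AND eventMessage CONTAINS "{keyword}"'
    return ["log", "show", "--last", last, "--predicate", predicate]


def run_log_query(keyword="GPU", last="1d"):
    """Run the query and return its text output (macOS 10.12+ only)."""
    if platform.system() != "Darwin":
        raise RuntimeError("the `log` tool is only available on macOS")
    result = subprocess.run(
        build_log_query(keyword, last),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command so it can also be pasted into Terminal directly.
    print(" ".join(build_log_query()))
```

Keeping the argv construction separate from execution means you can inspect or tweak the predicate before running it, and the same string works directly in Terminal.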