Computation Error with GW Search

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48110902079
RAC: 33496841

I thought we went over this already. That issue doesn’t seem to be in play anymore, or at least it is not implemented in the way you’re describing. 
 

I’ve personally watched my 3GB NVIDIA card happily use 2GB of GPU memory (66%) and my 4GB NVIDIA cards happily use 3.2GB of memory (80%) on these Einstein OpenCL tasks. That’s much more than 25%, and on single tasks. 
 

I think the limit applies only to the size of a single buffer, which is worked around by nearly all implementations (including SETI and Einstein) by running multiple buffers, effectively lifting that limit. 
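
Roughly, that multi-buffer pattern in OpenCL (C) looks like the sketch below. To be clear, I'm not claiming this is how the Einstein apps are actually written; the function and variable names are mine and purely illustrative.

/* Sketch: split one large dataset across several cl_mem buffers, each no
 * larger than CL_DEVICE_MAX_MEM_ALLOC_SIZE, so the per-allocation limit
 * never bites even though total usage goes far beyond 25% of VRAM.
 * Context/device setup omitted; names are illustrative only. */
#include <CL/cl.h>
#include <stdlib.h>

int alloc_chunked(cl_context ctx, cl_device_id dev, size_t total_bytes,
                  cl_mem **bufs_out, size_t *nbufs_out)
{
    cl_ulong max_alloc = 0;
    clGetDeviceInfo(dev, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);

    size_t chunk = (size_t)max_alloc;                 /* stay under the reported limit */
    size_t nbufs = (total_bytes + chunk - 1) / chunk; /* number of buffers needed */
    cl_mem *bufs = malloc(nbufs * sizeof(cl_mem));
    if (!bufs) return -1;

    for (size_t i = 0; i < nbufs; i++) {
        size_t this_chunk = (i + 1 == nbufs) ? total_bytes - i * chunk : chunk;
        cl_int err;
        bufs[i] = clCreateBuffer(ctx, CL_MEM_READ_WRITE, this_chunk, NULL, &err);
        if (err != CL_SUCCESS) {                      /* roll back on failure */
            while (i--) clReleaseMemObject(bufs[i]);
            free(bufs);
            return -1;
        }
    }
    *bufs_out = bufs;
    *nbufs_out = nbufs;
    return 0;
}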

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 5023
Credit: 18938164469
RAC: 6456162

But the limit is there.  It is defined in the host details whenever OpenCL is probed for the card.

Whether the limit can be managed by some clever code in your application is unknown to me, as I am not a developer.

I only state the 25% limit is real.  I don't know how the project app developers get around the issue.
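
(For reference, that number in the host details comes from a standard OpenCL device query. A minimal standalone sketch of the probe, in C, might look like the following; on NVIDIA the maximum single allocation is typically reported as about 25% of the global memory size. This is just an illustration, not BOINC's actual detection code.)

/* Sketch: print the global memory size and the maximum single allocation
 * that OpenCL reports for the first GPU on the first platform. */
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id plat;
    cl_device_id dev;
    cl_ulong global_mem = 0, max_alloc = 0;

    if (clGetPlatformIDs(1, &plat, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL GPU found\n");
        return 1;
    }
    clGetDeviceInfo(dev, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);
    printf("global mem: %llu MB, max single alloc: %llu MB\n",
           (unsigned long long)(global_mem >> 20),
           (unsigned long long)(max_alloc >> 20));
    return 0;
}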

 

Mad_Max
Joined: 2 Jan 10
Posts: 156
Credit: 2231287216
RAC: 615534

If you follow your own link from the previous post (https://forums.developer.nvidia.com/t/why-is-cl-device-max-mem-alloc-size-never-larger-than-25-of-cl-device-global-mem-size-only-on-nvidia/47745) to the end, you will find this (posted in 2017):

Nvidia driver in practice seems to successfully allocate single memory chunks (for OpenCL) far beyond result of CL_DEVICE_MAX_MEM_ALLOC_SIZE value, and everything works well (but of course such allocations are inappropriate for production code)

 

And this (posted by an NVIDIA development team representative):

Developers can try to allocate more memory than CL_DEVICE_MAX_MEM_ALLOC_SIZE, but the successful allocation is not guaranteed (this is same for any allocation call). The developers should check for error returned by clCreateBuffer and use the allocation only if the call returns CL_SUCCESS

So it looks like this limit is reported but not actually enforced; allocations beyond this value are just "not guaranteed". In practice, everything usually works as if no such limit existed.
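
In code terms, the advice just means treating the reported value as advisory and checking the error code returned by clCreateBuffer. A minimal sketch (my own illustrative code, not taken from any project app):

/* Sketch: try a single allocation larger than the reported
 * CL_DEVICE_MAX_MEM_ALLOC_SIZE and keep it only if clCreateBuffer
 * actually returns CL_SUCCESS, as the NVIDIA representative suggests. */
#include <CL/cl.h>
#include <stdio.h>

cl_mem try_big_alloc(cl_context ctx, size_t bytes)
{
    cl_int err = CL_SUCCESS;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "allocation of %zu bytes failed (error %d)\n",
                bytes, (int)err);
        return NULL;   /* caller should fall back to smaller buffers */
    }
    return buf;        /* allocation succeeded despite exceeding the reported limit */
}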

Strange decisions from NV... and they are not going to fix it. Probably another trick to push software devs away from the open industry standard (OpenCL) toward their own proprietary one (CUDA), since there is no such "fake" limit in CUDA.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48110902079
RAC: 33496841

Keith Myers wrote:

But the limit is there.  It is defined in the host details whenever OpenCL is probed for the card.

Whether the limit can be managed by some clever code writing in your application is an unknown to me as I am not a developer.

I only state the 25% limit is real.  I don't know how the project app developers get around the issue.

 

All I can say is that this limit doesn’t seem to affect Einstein at all with the way the apps here are coded, so there’s really no point in mentioning it. 
 

The Gamma-ray tasks use only a small amount of GPU RAM, and the Gravitational Wave tasks must be using multiple buffers, since they can use 100% of the GPU memory. That is actually the cause of all the issues here with GW tasks: the GPU RAM filling up past 100%, not 25%. 

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 5023
Credit: 18938164469
RAC: 6456162

OK, I will try to refrain from posting from long-learned muscle memory. Not an issue, so it should never be mentioned in the future.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48110902079
RAC: 33496841

But it’s demonstrably not applicable here. It’s an interesting nugget of knowledge, but why mention it if it’s not applicable to Einstein? It won’t help anyone solve any problems here, since it causes no problems here. It’s really information for the devs, so they know their limits when writing their software, and the devs here have either knowingly or unknowingly coded the apps in such a way as not to be limited by this. How else could we be seeing successful executions using >25% if it were a hard limit? 
 

It probably hasn’t been an issue for NVIDIA GPUs for a LONG time, since the amount of GPU RAM on most cards has grown. Maybe if you had a GPU with less than 1GB of RAM you might run into this 25% single-buffer limit, but how many people are running GPUs with that little VRAM anymore? 
 

It’s kind of like the whole PCIe bandwidth thing that gets brought up every now and then. Times have changed; maybe it was a problem in the past, but there’s no use bringing it up anymore if it’s not a problem now. (GPUGRID is the only project I’ve found that is noticeably impacted by PCIe bandwidth.) 

_________________________________________________________________________

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 389685255
RAC: 443781

I do see the VRAM allocation (via nvidia-smi) ramp up in fairly small steps when an OpenCL GW task starts.  This morning I've had several consecutive "large" (i.e. >3 GB) tasks run on my GTX 1060 (6 GB).  It takes about 20 seconds from the start of the task to top out.   In just one example, probing nvidia-smi at roughly 2-second intervals, the VRAM usage (MB) went: 8, 317, 439, 691, 819, 997, 1253, 1381, 1571, 1715, 1963, 3031, 3295.  Mostly steps of a few hundred MB, except for the next-to-last one.  This suggests, to me, that the app IS managing multiple buffers and thus avoiding the 25% OpenCL constraint.  It also sheds light on why low-RAM GPUs don't crash immediately when trying to run a WU that (eventually) needs more VRAM: it isn't all requested in one chunk.
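
For anyone wanting to reproduce that kind of trace, it's just nvidia-smi probed every couple of seconds (e.g. nvidia-smi --query-gpu=memory.used --format=csv -l 2). A rough equivalent in C against NVML, the library nvidia-smi is built on (link with -lnvidia-ml), might look like this sketch:

/* Sketch: poll GPU 0's used memory every 2 seconds via NVML,
 * similar to the nvidia-smi polling described above. */
#include <nvml.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    nvmlDevice_t dev;
    if (nvmlInit() != NVML_SUCCESS ||
        nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed\n");
        return 1;
    }
    for (;;) {                      /* stop with Ctrl-C */
        nvmlMemory_t mem;
        if (nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS)
            printf("%llu MB used\n", (unsigned long long)(mem.used >> 20));
        sleep(2);
    }
    return 0;
}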

 

Mad_Max
Joined: 2 Jan 10
Posts: 156
Credit: 2231287216
RAC: 615534

It's not just requested in steps - the CPU part of the code needs to prepare the data first, before loading it into GPU RAM.

Actual GPU computation starts only after GPU RAM usage reaches its maximum (i.e. all the needed data has been prepared on the CPU and loaded into GPU RAM).
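
A rough sketch of that staged pattern in OpenCL C (hypothetical names, assuming the context, queue, kernel and buffers already exist; this is not the actual app code):

/* Sketch of the prepare-then-upload pattern described above: the host
 * prepares each block on the CPU, uploads it with clEnqueueWriteBuffer
 * (so VRAM usage ramps up step by step), and the kernel is launched
 * only once all the data is resident in GPU RAM. */
#include <CL/cl.h>
#include <stddef.h>

/* stand-in for whatever CPU-side preparation the real app does */
static void prepare_block(float *dst, size_t n, size_t index)
{
    for (size_t j = 0; j < n; j++)
        dst[j] = (float)(index + j);   /* dummy data */
}

int upload_then_compute(cl_command_queue q, cl_kernel k,
                        cl_mem *bufs, size_t nbufs,
                        float *scratch, size_t floats_per_buf)
{
    for (size_t i = 0; i < nbufs; i++) {
        prepare_block(scratch, floats_per_buf, i);       /* CPU work first */
        cl_int err = clEnqueueWriteBuffer(q, bufs[i], CL_TRUE, 0,
                                          floats_per_buf * sizeof(float),
                                          scratch, 0, NULL, NULL);
        if (err != CL_SUCCESS) return -1;
    }
    /* only now, with everything loaded, start the GPU computation */
    size_t gsize = floats_per_buf;
    if (clEnqueueNDRangeKernel(q, k, 1, NULL, &gsize, NULL,
                               0, NULL, NULL) != CL_SUCCESS)
        return -1;
    return clFinish(q) == CL_SUCCESS ? 0 : -1;
}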
