I have started getting a bunch of Computation Errors on some 2.09 G/W GPU work units on one of my rigs. I previously had a similar issue on another rig that had a GTX-760 GPU card with 2 GB of memory. I had to opt out of running the G/W work units on that rig due to the small memory size. Now I seem to be having the same issue on another rig that has a GTX 1050 Ti card with 4 GB of memory. Some work units run fine and some fail with a Computation Error. Am I correct in guessing that 4 GB of memory is now too small for some G/W GPU work units?
I have my app_config file set to run 2 work units concurrently on the GPU card with 1 CPU core per work unit. If I change this to 1 work unit per GPU card, still with 1 CPU core per work unit, would it help?
Thanks in advance for any assistance.
Copyright © 2024 Einstein@Home. All rights reserved.
Ron Kosinski wrote: I have
A lot of those tasks require 4 GB of memory, and that's why it's crashing. Your GTX 760 only has 2 GB of onboard memory, and you can't extend it with system memory. I suggest you put that PC in a different venue (i.e. default, home, work or school) and let it run different tasks.
mikey wrote: The tasks
Hi Mikey,
I already have the G/W work units stopped for the box with the GTX-760 card.
I am having a problem with the G/W work units on the box with the GTX-1050 Ti card. I did change my app_config file to run only one G/W work unit per card. Let's see if that solves the crashing.
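For anyone making the same change, here is a minimal sketch of generating an app_config.xml that limits a GPU app to one task per card. It assumes the standard BOINC layout (`app_config` / `app` / `gpu_versions` with `gpu_usage` and `cpu_usage`). The app name below is a hypothetical placeholder and must be replaced with the short name of the application your client actually reports. The helper `build_app_config` is only an illustration, not a BOINC tool.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_per_task: float = 1.0) -> str:
    """Return app_config.xml text that runs `tasks_per_gpu` tasks on each GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    gpu = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of a GPU each task claims:
    # 1 means one task per card, 0.5 means two tasks per card.
    ET.SubElement(gpu, "gpu_usage").text = f"{1.0 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:g}"

    # Pretty-print where the running Python supports it (3.9 and later).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# HYPOTHETICAL_GW_APP_NAME is a placeholder, not a real application name.
print(build_app_config("HYPOTHETICAL_GW_APP_NAME", tasks_per_gpu=1))
```

The output would be saved as app_config.xml in the project's folder inside the BOINC data directory, after which the client needs to re-read its config files (or be restarted) for the change to apply.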
Ron Kosinski wrote:.... If I
I'll let you answer that for yourself :-).
All you need to do is pick any of your recently failed tasks on the website and click on the TaskID link. Scroll down through all the stderr output you find there looking for the word "error". Here is the first occurrence with irrelevant stuff truncated:-
XLAL Error - XLALComputeECLFFT ... : Processing FFT failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
As the frequency term in the task name gets larger, this is likely to become a more common problem for people with older and more basic GPUs that (unfortunately) don't have enough memory to run multiple concurrent tasks. As the above example shows, if you check the stderr output as soon as you see tasks failing, it's pretty easy to diagnose this particular cause of task failures for yourself.
Cheers,
Gary.
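The check Gary describes can also be done on a saved copy of the stderr output. This is a rough sketch, assuming you have pasted the text from the task page into a string or file. The function name `find_error_lines` and the placeholder log lines are made up for illustration; only the XLAL line is taken from this thread, with its truncation left as-is.

```python
import re


def find_error_lines(stderr_text: str, keyword: str = "error"):
    """Return (line_number, line) pairs for lines containing `keyword`, ignoring case."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [
        (number, line.strip())
        for number, line in enumerate(stderr_text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Sample text: the middle line is the one quoted above, the others are placeholders.
sample = """\
(hypothetical surrounding log line)
XLAL Error - XLALComputeECLFFT ... : Processing FFT failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
(hypothetical surrounding log line)
"""

hits = find_error_lines(sample)
for number, line in hits:
    print(number, line)

# CL_MEM_OBJECT_ALLOCATION_FAILURE is the OpenCL code for a failed memory allocation,
# which here points at the GPU running out of memory.
out_of_memory = any("CL_MEM_OBJECT_ALLOCATION_FAILURE" in line for _, line in hits)
print("GPU memory allocation failure:", out_of_memory)
```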
Gary Roberts wrote: All you
Gary, thanks for the info on what to look for in the output file. Switching back to running one 2.09 w/u per GPU card has solved the problem. I forgot I had recently changed the app_config file to run two w/u per card. Trying to run two w/u on a 4 GB card is the same as trying to run one w/u on a 2 GB card. It just doesn't work sometimes! :-)
Mikey, Gary, again, thank you for the help!
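The "two w/u on a 4 GB card is like one w/u on a 2 GB card" reasoning comes down to simple division. The sketch below assumes a per-task memory figure purely for illustration. The thread gives no exact number, the real requirement varies from task to task, and not all of a card's memory is necessarily free for tasks.

```python
def max_concurrent_tasks(card_memory_gb: float, per_task_memory_gb: float) -> int:
    """Return how many tasks of a given size fit in the card's memory at once."""
    if per_task_memory_gb <= 0:
        raise ValueError("per_task_memory_gb must be positive")
    return int(card_memory_gb // per_task_memory_gb)


# ASSUMPTION: 2.5 GB per task is an illustrative value, not a measured one.
ASSUMED_TASK_GB = 2.5

for card_gb in (2.0, 4.0):
    fits = max_concurrent_tasks(card_gb, ASSUMED_TASK_GB)
    print(f"{card_gb:g} GB card: {fits} task(s) of {ASSUMED_TASK_GB:g} GB fit")
```

Under that assumption a 4 GB card has room for one such task but not two, and a 2 GB card has room for none, which matches the behaviour described in this thread.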