too many GPU workunits received

Achim
Joined: 11 Feb 05
Posts: 3
Credit: 25695903
RAC: 0
Topic 198371

Hello,

GPU crunching is new to me - I have been running it on an Nvidia NV 310 (Debian jessie) since December.
One WU shows an expected runtime of 15 hours and indeed takes 15 hours, so I can compute about 7 WUs in 14 days (at 8 hours per day). But the computer receives far too many WUs (> 50), until it hits the memory limit - there is no chance of computing them all in time.

I have seen that the computer downloads a WU, computes for a few seconds, then stops and requests a new WU, until the memory is full.

Can I set a limit anywhere?
Can you do anything about this?

Thank you
Achim

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 196961446
RAC: 202093

too many GPU workunits received

Hi,

What do you mean by "computes for a few seconds, then stops"? Is there a reason given for why the computation stops? There usually is one in the status column of the Advanced View or in the Event Log (Meldungen).

From the server side it looks like your host is requesting work every minute, as if it had forgotten about the work it received a minute before. The server is overestimating the actual runtime a bit, but that is not a problem.

We need to find out why the tasks stop after a few seconds; then you will receive the right number of tasks.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118579180905
RAC: 17636017

You have your computers

You have your computers hidden, so those who might try to help can't really see what you are seeing. If you want the problem solved quickly, you should 'unhide' your computers (just temporarily, if you wish) until it is sorted out. The other way to do this is to tell us the hostID of the host that is having problems. That way, we can look at your list of tasks for possible errors, or at any information in the returned results that might help us understand the problem.

If you keep them hidden, we will need to ask, and you will need to spell out detailed answers to lots of questions. I see from your account page that you have supported a number of projects, and for quite a long time. Here are some questions, first up, to help us understand things.

1. When did you first start trying to crunch GPU tasks here?

2. Have you tried to crunch GPU tasks at Seti?

3. What settings do you use for the two values that control the size of your work cache?

4. Have you tried reducing these settings to see if that will stop new tasks from downloading?

5. Are you having similar problems with CPU tasks or is it just GPU tasks?

6. Do you have any successfully completed and returned GPU tasks?

7. Do any of the tasks that "stop" actually get a computation error when they "stop"?

8. If they do, have you looked (by clicking on the task ID on the website) at the information that is returned with the failed task?

9. Does each task that "stops" show high priority mode just before it stops?

10. What is the status of such tasks that are still visible in BOINC Manager?

11. What version of BOINC are you running?

If you see tasks running in "high priority", you should reduce your work cache settings to low values and "suspend" (through BOINC Manager) all recent tasks until the small number left are no longer running in high priority. I've seen something like this (tasks running for a short while and then suspending) with an older version of BOINC some years ago, so perhaps that is the problem here.
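
For reference, the two cache values I mean are the ones BOINC Manager shows as "Store at least X days of work" and "Store up to an additional X days of work". If you prefer to set them outside the Manager, they can also go into a global_prefs_override.xml in the BOINC data directory - the following is only a sketch assuming the standard override format, with small example values, so adjust to taste:

<global_preferences>
    <work_buf_min_days>0.1</work_buf_min_days>
    <work_buf_additional_days>0.05</work_buf_additional_days>
</global_preferences>

After saving the file, have the client re-read its preference files or simply restart BOINC so the smaller cache takes effect.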

Unless you tell us lots of details, we can only guess.

Cheers,
Gary.

Achim
Joined: 11 Feb 05
Posts: 3
Credit: 25695903
RAC: 0

Hi, thanks for your

Hi,

thanks for your answers. I hope I did the "unhide" correctly. The hostid is 1700699.

Seti is enabled for GPU, but so far there has been no GPU WU from Seti.

Some of the log output:
[08:08:52][2336][INFO ] Seed for random number generator is 1084507204.
[08:08:53][2336][ERROR] Error creating CUDA FFT plan (error code: 2)
[08:08:53][2336][ERROR] Demodulation failed (error: 1011)!
[08:08:53][2336][WARN ] Sorry, at the moment your system doesn't have enough free CPU/GPU memory to run this task!
------> Returning control to BOINC, delaying next attempt for at least 15 minutes...

There is no such problem with my CPU tasks.
Yes, there are successfully completed GPU tasks.
GPU tasks always run in "high priority". I have just changed this: 4 WUs are active now (I suspended the others).

Achim

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 196961446
RAC: 202093

The GPU only has 511MB VRAM

The GPU only has 511 MB of VRAM, which is mainly used by some other program, so the BRP6 application needs to wait until enough memory is available.

This is what happens:
- BRP6 is started, but there is not enough VRAM available, so it tells the client to retry the task in 15 minutes
- the client tries to run another GPU task, which is not available, so it asks the server for a new one
- it gets another BRP6 task, which runs into the same VRAM problem as the previous ones
- this happened every minute until the disk space filled up

The problem is that the client shouldn't ask for more work while there are tasks in temporary-exit status. That would mean the GPU sits idle for a while, but it can't be used anyway until enough VRAM is available.
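
In the meantime, you could also try telling the client to fetch only enough work to keep the devices busy, via a cc_config.xml in the BOINC data directory. This is just a sketch, and I haven't checked whether the client version shipped with jessie already honours the option:

<cc_config>
    <options>
        <fetch_minimal_work>1</fetch_minimal_work>
    </options>
</cc_config>

With that set, the client requests only one task per device instance instead of filling the whole work cache.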

Can you please try to install a more recent version of BOINC? Like 7.6.22 (currently in Debian testing)?

Edit: I checked the source code, and the 7.6.x client has a fix for exactly this case. So you should upgrade to 7.6, then abort most of the remaining tasks and see whether new tasks are downloaded.

Achim
Joined: 11 Feb 05
Posts: 3
Credit: 25695903
RAC: 0

Hi Christian, many thanks

Hi Christian,

many thanks for this explanatory answer.
I use, and will keep using, the stable version from Debian, so I will have to wait for boinc to be backported from stretch. Nice to see that a problem I have is already solved upstream.

Thank you
Achim
