too many GPU workunits received

Achim
Joined: 11 Feb 05
Posts: 3
Credit: 25695903
RAC: 0
Topic 198371

Hello,

GPU crunching is new to me - I have been running it on an Nvidia NV 310 (Debian jessie) since December.
One WU shows an expected runtime of 15 hours and indeed takes 15 hours, so I can compute about 7 WUs in 14 days (at 8 hours per day). But the computer receives far too many WUs (> 50), until it hits the memory limit - there is no chance of computing them all in time.

I have seen that the computer downloads a WU, computes for a few seconds, then stops and requests a new WU, until the memory is full.

Can I set a limit anywhere?
Can you do anything about this?

Thank you
Achim

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 196961446
RAC: 202093

too many GPU workunits received

Hi,

What do you mean by "computes for a few seconds, then stops"? Is there a reason given for why the computation stops? There usually is one in the status column of the Advanced View or in the Event Log (Meldungen).

From the server side it looks like your host is requesting work every minute, as if it had forgotten about the work it received a minute before. The server is overestimating the actual runtime a bit, but that is not a problem.

We need to find out why the tasks stop after a few seconds; then you will receive the right number of tasks.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118579180905
RAC: 17636017

You have your computers

You have your computers hidden, so those who might try to help can't really see what you are seeing. If you want the problem solved quickly, you should 'unhide' your computers (just temporarily, if you wish) until it is sorted out. The other way to do this is to tell us the hostID of the host that is having problems. That way, we can look at your list of tasks for possible errors, or at any information in the returned results that might help us understand the problem.

If you keep them hidden, we will need to ask, and you will need to spell out detailed answers to lots of questions. I see from your account page that you have supported a number of projects, and for quite a long time. Here are some questions, first up, to help us understand things.

1. When did you first start trying to crunch GPU tasks here?

2. Have you tried to crunch GPU tasks at Seti?

3. What settings do you use for the two values that control the size of your work cache?

4. Have you tried reducing these settings to see if that will stop new tasks from downloading?

5. Are you having similar problems with CPU tasks or is it just GPU tasks?

6. Do you have any successfully completed and returned GPU tasks?

7. Do any of the tasks that "stop" actually get a computation error when they "stop"?

8. If they do, have you looked (by clicking on the task ID on the website) at the information that is returned with the failed task?

9. Does each task that "stops" show high priority mode just before it stops?

10. What is the status of such tasks that are still visible in BOINC Manager?

11. What version of BOINC are you running?

If you see tasks running in "high priority", you should reduce your work cache settings to low values and "suspend" (through BOINC Manager) all recent tasks until the small number left are no longer running in high priority. I've seen something like this (tasks running for a short while and then suspending) with an older version of BOINC some years ago, so perhaps that is the problem here.
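
For reference, the two cache values I mean are the ones BOINC Manager shows as "Store at least X days of work" and "Store up to an additional X days of work". If you prefer to set them outside the Manager, they can also go into a global_prefs_override.xml in the BOINC data directory - the following is only a sketch assuming the standard override format, with small example values, so adjust to taste:

<global_preferences>
    <work_buf_min_days>0.1</work_buf_min_days>
    <work_buf_additional_days>0.05</work_buf_additional_days>
</global_preferences>

After saving the file, have the client re-read its preference files or simply restart BOINC so the smaller cache takes effect.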

Unless you tell us lots of details, we can only guess.

Cheers,
Gary.

Achim
Joined: 11 Feb 05
Posts: 3
Credit: 25695903
RAC: 0

Hi, thanks for your

Hi,

thanks for your answers. I hope I did the "unhide" correctly. The hostid is 1700699.

Seti is enabled for GPU, but so far there has been no GPU WU from Seti.

Some of the log output:
[08:08:52][2336][INFO ] Seed for random number generator is 1084507204.
[08:08:53][2336][ERROR] Error creating CUDA FFT plan (error code: 2)
[08:08:53][2336][ERROR] Demodulation failed (error: 1011)!
[08:08:53][2336][WARN ] Sorry, at the moment your system doesn't have enough free CPU/GPU memory to run this task!
------> Returning control to BOINC, delaying next attempt for at least 15 minutes...

There is no such problem with my CPU tasks.
Yes, there are successfully completed GPU tasks.
GPU tasks always run in "high priority". I have just changed this: 4 WUs are active now (I suspended the others).

Achim

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 196961446
RAC: 202093

The GPU only has 511MB VRAM

The GPU only has 511 MB of VRAM, which is mainly used by some other program, so the BRP6 application needs to wait until enough memory is available.

This is what happens:
- BRP6 is started, but there is not enough VRAM available, so it tells the client to retry the task in 15 minutes
- the client tries to run another GPU task, which is not available, so it asks the server for a new one
- it gets another BRP6 task, which runs into the same VRAM problem as the previous ones
- this happened every minute until the disk space filled up

The problem is that the client shouldn't ask for more work while there are tasks in temporary-exit status. That would mean the GPU sits idle for a while, but it can't be used anyway until enough VRAM is available.
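
In the meantime, you could also try telling the client to fetch only enough work to keep the devices busy, via a cc_config.xml in the BOINC data directory. This is just a sketch, and I haven't checked whether the client version shipped with jessie already honours the option:

<cc_config>
    <options>
        <fetch_minimal_work>1</fetch_minimal_work>
    </options>
</cc_config>

With that set, the client requests only one task per device instance instead of filling the whole work cache.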

Can you please try to install a more recent version of BOINC? Like 7.6.22 (currently in Debian testing)?

Edit: I checked the source code, and the 7.6.x client has a fix for exactly this case. So you should upgrade to 7.6, then abort most of the remaining tasks and see whether new tasks are downloaded.

Achim
Joined: 11 Feb 05
Posts: 3
Credit: 25695903
RAC: 0

Hi Christian, many thanks

Hi Christian,

many thanks for this explanatory answer.
I use, and will keep using, the stable version from Debian, so I will have to wait for boinc to be backported from stretch. Nice to see that a problem I have is already solved upstream.

Thank you
Achim
