I'm experiencing the same phenomenon recently with my PC - this is only on Einstein GPU work units, not with other BOINC projects that also utilise the GPU.
I have a WU that has now been running for 47 hours on the GPU at 67% complete. Estimated time to completion has steadily increased over that time from 7 to 10 hours, and still it chugs away, seemingly no end in sight!
Processor: AMD Phenom II X4 (quad core)
GPU: AMD Radeon HD 7850, 2GB
System RAM: 8GB
OS: Windows 7
Computing preferences set to:
100% of processors
100% of CPU time
Looking at the activity of the GPU itself, it cycles between roughly 30-50% load for a few seconds and what appears to be zero load for the next few seconds. The GPU has an auto-throttle function on GPU and RAM clock speed, and as a result it rarely enters full-power mode. Perhaps this is part of the problem?
I have just updated my Intel 4000 to OpenCL 1.2 and got my first WUs. Unfortunately they are all coming up with computation errors.
Can anyone help, please?
Mike
Host 7591557 shows the HD 4000 hardware, but makes no mention of OpenCL 1.2.
What, exactly, did you install?
Thanks for the reply, Richard.
I upgraded to the latest driver, 10.18.10.3345, because I had been told in another thread that I needed to upgrade from OpenCL 1.1 to OpenCL 1.2 to get GPU WUs.
However, I have just seen postings on another thread which say that I should downgrade to version 9.18.10.3071.
Mike
I've just noticed this strange behaviour on my laptop, too (running on an i7 with a Radeon 6850M GPU). My desktop box (Phenom X4 + Radeon 5770) seems to work correctly, though.
Why do I have to keep one core free for GPU WUs to run and not just sit around?
Exactly what "strange behaviour" are you referring to? Your message references a message posted by Ageless back in September last year and he certainly wasn't reporting any 'strange behaviour' :-).
Quote:
Why do I have to keep one core free for GPU WUs to run and not just sit around?
You don't have to - it's not compulsory - but with AMD GPUs, if you don't, GPU performance can suffer, especially if you wish to run more than one GPU task concurrently.
But I rather suspect from the phrase you used - "to run and not just sit around" - that you are talking about something different. I took a look at your list of tasks for that machine and you have just a single GPU task, which you received on Feb 7. Has that task not even started crunching? If so, it may be that you don't have enough free GPU memory available - unless there's some other reason why it can't run (the task is suspended, for example).
Try restarting BOINC to see if that makes any difference. In a reply here, copy and paste the BOINC startup messages from the event log - roughly the first 30-40 lines. It's quite likely that the reason it can't run will be listed there.
Also, on your "correctly working" machine, are you aware that it last made contact back on Feb 1 and that its cache of work has now expired? If your machine isn't running any more, it would be appreciated if you could abort and return the remaining tasks before shutting down, rather than just letting them time out. That way they can be reissued immediately rather than having to wait for the deadline.
Cheers,
Gary.
Quote:
I've just noticed this strange behaviour on my laptop, too (running on an i7 with a Radeon 6850M GPU). My desktop box (Phenom X4 + Radeon 5770) seems to work correctly, though.
Why do I have to keep one core free for GPU WUs to run and not just sit around?
To expand on Gary's thoughts: when a GPU crunches, it needs to load and offload data, and the only way that happens is through the CPU. If the CPU is busy crunching its own work, the GPU has to wait for the data to flow and is not crunching optimally. GPUs can normally do about 10 times as much work as a CPU in the same amount of time. Your AMD 5770 has 800 tiny shaders built into it; if you think of each shader as a teeny tiny CPU core, you can see how fast a GPU can process data and why keeping it fed at full speed is a good thing.
The default is to ALWAYS leave one core free, but some projects are better than others at optimizing their workunits for the different GPUs. Go into the BOINC Manager and look under the Tasks tab. It will show the running units, and on the unit that is using your GPU it will say something like:
"running(0.314 CPUs + 1 ATI GPU(device 1))"
This means that my PC is crunching a GPU unit and is using just over 30% of a CPU core to keep it fed. In this case I AM keeping a CPU core free, as trying to adjust a CPU core to only use 60+% for a CPU project is not within BOINC's settings right now. What I posted above also shows that I am running multiple GPUs in that machine, as a machine with only one GPU in it will not have the "(device 1)" part at the end.
The way to tell if you need to leave a CPU core free is to take a look at that CPU number: if it is 0.1 or more, leave a CPU core free; if it is 0.01 or less, the GPU can pretty much handle everything on its own with little to no CPU assistance and you can safely crunch CPU units using all your cores.
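For anyone who would rather pull that number out of the Tasks tab text programmatically than eyeball it, here is a small sketch (Python, purely illustrative; the status string is just the example quoted above, and the function name is made up):

    import re

    def cpu_fraction(status_line):
        # Pull the "0.314" out of e.g. "running(0.314 CPUs + 1 ATI GPU(device 1))"
        match = re.search(r"\(([\d.]+)\s*CPUs", status_line)
        return float(match.group(1)) if match else None

    print(cpu_fraction("running(0.314 CPUs + 1 ATI GPU(device 1))"))  # prints 0.314

Note that the replies below dispute using that figure as a hard threshold, so treat it as information rather than a rule.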
Quote:
...it will say something like:
"running(0.314 CPUs + 1 ATI GPU(device 1))"
This means that my PC is crunching a GPU unit and is using just over 30% of a CPU core to keep it fed. ...
That statement is wrong; the line saying "running(0.314 CPUs + 1 ATI GPU(device 1))" is telling you how BOINC has been instructed to schedule the resources. In this example it tells BOINC (and in turn BOINC tells the user) to account for 0.314 of a CPU and 1 full ATI GPU for this task. With this info, the same type of info about the other tasks in the queue, and a knowledge of how many CPU cores and GPUs are available, BOINC can make a decision on how many tasks to start.
Every GPU task will always use as much of the CPU as it needs, nothing more, nothing less. Once a task is running, it's up to the operating system to give each running program the time it needs, given its priority setting among other things. The problem with stalling or extremely slow-running GPU tasks when not "reserving" a CPU core is a driver and OS thing: simply put, the CPU part of the program does not respond fast enough to keep the GPU loaded if all cores are already running flat out on other tasks.
Quote:
The default is to ALWAYS leave one core free...
No, that's not correct. The default is to use all cores unless instructed otherwise.
As Holmis points out, BOINC is advised of the recommended resource requirements for crunching a GPU task. BOINC notes the recommendation but will not leave a core free unless the total CPU resource recommendation for all running GPU tasks adds up to at least 1.0 CPUs.
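A rough sketch (Python, purely illustrative - this is not BOINC's source code, and the CPU fractions below are made-up examples) of the accounting described above: whole cores are only left free when the GPU tasks' CPU recommendations add up to at least 1.0.

    import math

    def cpu_tasks_to_run(n_cores, gpu_cpu_fractions):
        # A whole core is reserved only when the recommendations sum to >= 1.0 CPUs
        reserved = math.floor(sum(gpu_cpu_fractions))
        return max(0, n_cores - reserved)

    print(cpu_tasks_to_run(4, [0.314]))          # sum 0.314 -> nothing reserved -> 4 CPU tasks
    print(cpu_tasks_to_run(4, [0.5, 0.5]))       # sum 1.0   -> one core reserved -> 3 CPU tasks
    print(cpu_tasks_to_run(4, [0.2, 0.2, 0.2]))  # sum 0.6   -> nothing reserved -> 4 CPU tasks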
The recommendation is designed to be a good starting point but it may not suit all the different possible hardware combinations. I have several hosts with single Kepler-series (NVIDIA) GPUs which run 2 or 3 concurrent tasks on the GPU and CPU tasks on all cores. If I stop a CPU task (i.e. free up a CPU core through preferences) it makes negligible difference to the GPU crunching speed. It does slightly speed up the remaining CPU tasks, but there is an overall loss of output because of the loss of the extra core.
The only way to achieve the best efficiency for a particular hardware combination is to experiment. If a system seems to be running GPU tasks inefficiently, freeing up a CPU core (through preferences) and measuring the result is a good experiment to try. There will be an optimal configuration of the number of concurrent GPU tasks and the number of free CPU cores which will vary with different hardware and is not easy to predict without experiment.
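To make the experiment concrete, here is a small sketch (Python; the function name and all run times are invented placeholders, not measured Einstein@Home figures) of the kind of comparison worth doing: total tasks per day with all cores crunching versus one core left free.

    # Hypothetical run times (hours per task); replace with your own measurements.
    def tasks_per_day(gpu_tasks_running, gpu_hours_per_task, cpu_cores_crunching, cpu_hours_per_task):
        gpu_output = gpu_tasks_running * 24.0 / gpu_hours_per_task
        cpu_output = cpu_cores_crunching * 24.0 / cpu_hours_per_task
        return gpu_output + cpu_output

    # All 4 cores on CPU work vs. 3 cores with one freed for the GPU (example numbers only):
    all_cores = tasks_per_day(2, 4.0, 4, 12.0)   # GPU tasks a little slower when every core is busy
    one_free  = tasks_per_day(2, 3.8, 3, 11.5)   # GPU and CPU tasks speed up slightly, but one core is idle
    print(all_cores, one_free)                   # compare total daily output before deciding

In practice you would weight GPU and CPU tasks by credit or whatever measure of output you care about; the point is simply to measure both configurations rather than guess.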
Quote:
... as trying to adjust a CPU core to only use 60+% for a CPU project is not within BOINC's settings right now.
It's the OS, not BOINC, that allocates CPU cycles to tasks. If some fraction of a CPU core (less than a full core) is recommended for supporting GPU tasks, BOINC will still start a CPU task on that core and it's then up to the OS to allocate to that task whatever cycles are available. Whether or not this is an efficient way to crunch is a different matter. The only way to find the most efficient configuration is to experiment.
Quote:
The way to tell if you need to leave a CPU core free is to take a look at that CPU number: if it is 0.1 or more, leave a CPU core free; if it is 0.01 or less, the GPU can pretty much handle everything on its own with little to no CPU assistance and you can safely crunch CPU units using all your cores.
It's quite wrong to make a blanket statement like this. There are no such hard and fast rules here at Einstein. Where do you get these numbers from?
Cheers,
Gary.