BOINC and DCF with GPU tasks

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

RE: The AMD loses on each

Quote:
The AMD loses on each individual unit, but together it overwhelms the Intel, in RAC, with its number of cores.


This is also project dependent. If you were to take those same two chips and put them on a project like PrimeGrid, which takes advantage of the AVX and/or AVX2 instruction sets, the Intel would bury the AMD even with its extra cores. AMD hasn't managed to make good use of AVX: the FX chips lack AVX2 entirely and run AVX code far less efficiently, so they gain little or nothing from it.

A few projects take advantage of AVX. Of course, this also makes the chip run hotter.
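If you want to check which of these instruction sets your own CPU reports, here is a minimal Python sketch. It assumes a Linux host, where the kernel lists CPU feature flags in /proc/cpuinfo; the function names are just illustrative. A flag being present only means the chip accepts those instructions, not that it runs them quickly.

```python
"""Sketch: check whether a Linux CPU advertises AVX and AVX2.

Assumes a Linux host, where the kernel lists CPU feature flags on the
"flags" lines of /proc/cpuinfo. Flag support says nothing about how
fast a given chip executes those instructions.
"""
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect every feature flag named on a 'flags' line."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def avx_support(cpuinfo_text: str) -> dict[str, bool]:
    """Report whether the avx and avx2 flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {"avx": "avx" in flags, "avx2": "avx2" in flags}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(avx_support(path.read_text()))
    else:
        print("/proc/cpuinfo not found (not a Linux host?)")
```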

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118618150724
RAC: 18063643

RE: It's a HD7750

Quote:
It's a HD7750 ...


Sorry, I was lazy. BOINC lists it as a "7700 series" so that's what I should have called it rather than assuming it was a 7770.

Quote:
... and I'm running two BRP4G or four BRP5.


This is why the estimates are getting more screwed up than they need to be. BOINC sees 35+ ksecs for a BRP5 and doesn't take into account that you get 4 done in that time, so it wants to push the BRP5 estimate to almost 10 hours because that is all it sees. On the other hand, BOINC sees a mere 7.8 ksecs for a BRP4G (and you get only two in that time). Because BOINC keeps a single DCF for the whole project rather than one per app, the two series keep pulling that one factor in opposite directions. If you ran equal numbers of the two series, you would have better estimates for each one. They wouldn't be perfect, but there'd be a lot less variation.
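To illustrate how one long-running BRP5 result can drag the estimates for every other Einstein task along with it, here is a toy Python model. It is loosely modeled on the client's asymmetric behaviour (raise the DCF at once when a task runs long, lower it slowly when tasks run short), but the 10% thresholds, the 0.1 decay step and the raw estimates are assumptions, and the numbers are round illustrative values, not measurements from this host.

```python
"""Toy model of a single, project-wide DCF fed a mix of two apps.

Loosely modeled on the BOINC client's asymmetric rule (raise the DCF
at once when a task runs long, lower it slowly when tasks run short).
The 10% thresholds, the 0.1 decay step and all the numbers below are
assumptions for illustration, not values taken from BOINC or this host.
"""


def update_dcf(dcf: float, elapsed: float, raw_estimate: float) -> float:
    """Return the project's DCF after one task finishes."""
    ratio = elapsed / raw_estimate  # the DCF that would have been exact
    if ratio > dcf * 1.1:           # ran long: jump straight up
        return ratio
    if ratio < dcf * 0.9:           # ran short: decay 10% of the way down
        return 0.9 * dcf + 0.1 * ratio
    return dcf                      # within 10%: leave it alone


def simulate(tasks, dcf: float = 1.0):
    """Feed (app, elapsed_s, raw_estimate_s) tuples through update_dcf."""
    history = []
    for app, elapsed, raw in tasks:
        dcf = update_dcf(dcf, elapsed, raw)
        history.append((app, round(dcf, 3)))
    return history


if __name__ == "__main__":
    # Hypothetical raw estimates: 5,000 s for BRP4G and 10,000 s for BRP5.
    brp4g = ("BRP4G", 7_800, 5_000)
    brp5 = ("BRP5", 35_000, 10_000)
    for app, dcf in simulate([brp4g, brp4g, brp5, brp4g, brp4g]):
        print(f"{app:6} -> DCF {dcf}")
```

In this toy run, a single BRP5 result lifts the shared DCF from 1.56 to 3.5, and the following BRP4G results only pull it back to about 3.31 and then 3.13, so BRP4G estimates stay roughly double their true length for a long while.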

I'm actually quite surprised that you can fit 4 BRP5 tasks into 1GB of GPU memory. Do you actually get an improvement in output? I have a 1GB 7770 running 3x, and the improvement over running 2x was quite small. I've never tried running 4x on any 1GB GPU. In the early days, these tasks were supposed to need around 300MB each.
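To put rough numbers on the memory question and on output per task, here is a small Python sketch. The 300MB per task is the early-days figure mentioned above and the 1024MB card size is an assumption; real usage depends on app version and driver. Comparing seconds_per_result() at 3x and 4x, using elapsed times you actually measure, is one way to see whether the extra task pays off.

```python
"""Sketch: will N GPU tasks fit in video memory, and do they pay off?

Assumptions: ~300 MB per BRP5 task (the early-days figure) and a card
with 1024 MB usable; real usage varies by app version and driver.
Elapsed times passed in are whatever you measure on your own host.
"""


def fits_in_vram(n_tasks: int, mb_per_task: float = 300, vram_mb: float = 1024) -> bool:
    """True if n_tasks at mb_per_task each fit in vram_mb."""
    return n_tasks * mb_per_task <= vram_mb


def seconds_per_result(elapsed_s: float, concurrent: int) -> float:
    """Effective time per completed task when `concurrent` tasks run side by side."""
    return elapsed_s / concurrent


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "tasks fit:", fits_in_vram(n))
    # 35+ ksecs for a batch of 4 BRP5, as quoted above
    print("s per BRP5 at 4x:", seconds_per_result(35_000, 4))
```

With the 300MB figure, three tasks fit (900MB) but a fourth would not (1200MB), which is why 4x on a 1GB card is surprising; 35 ksecs for four BRP5 works out to 8750 s per result.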

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118618150724
RAC: 18063643

In this earlier post I

In this earlier post I said:

Quote:

The FX-6300 crunches 3 FGRP4 CPU tasks concurrently with 4 BRP5 GPU tasks. It has a RAC of around 60K. CPU tasks take more than 40ksecs and GPU tasks take over 19ksecs (for 4).

Really, the numbers speak for themselves. I know that I could somewhat improve the GPU crunch times, particularly for the FX-6300 host, if I further reduced the number of CPU tasks. For the i3, the improvement is marginal at best. I really want to participate in the FGRP search so I choose to sacrifice a little on BRP5 crunch times to get more FGRP4 done. In order to quantify the effect on the FX-6300, I've just reduced the CPU tasks to 2 so that there is now a free core for each GPU task. I'll let it run this way for a day or two to get some precise numbers. I might get lucky and the RAC might actually improve a little :-).


There are now a number of results under the new crunching arrangements for the FX-6300. Here is the comparison.

With 3 FGRP4 CPU tasks concurrently with 4 BRP5 GPU tasks, CPU tasks take more than 40ksecs and GPU tasks take over 19ksecs (3 CPU cores free for GPU support).

With 2 FGRP4 CPU tasks concurrently with 4 BRP5 GPU tasks, CPU tasks take around 36ksecs and GPU tasks take around 19ksecs (4 CPU cores free for GPU support).

There is a definite 1+ hour improvement in the CPU crunch time (but a net loss of CPU task output, since only 2 tasks run instead of 3) and essentially no change in the GPU crunch time (maybe 100 secs at most). Overall, this represents a loss in performance, since both searches are important to me. I was obviously wrong to suppose that I might get a bit of an improvement in the GPU crunch time.
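The trade-off is easier to see as tasks per day. This Python sketch uses the rounded times from the post ("more than 40ksecs" taken as 40,000 s, "around 36ksecs" as 36,000 s, and 19,000 s for each batch of 4 BRP5), so the first configuration's CPU figure is a slight overestimate.

```python
"""Sketch: daily output of the two FX-6300 configurations above.

Uses the rounded times from the post: 40,000 s per FGRP4 task with 3
running (actually "more than" that), 36,000 s with 2 running, and
19,000 s per batch of 4 BRP5 GPU tasks in both cases.
"""

DAY = 86_400  # seconds in a day


def tasks_per_day(concurrent: int, seconds_each: float) -> float:
    """Tasks finished per day when `concurrent` tasks each take `seconds_each`."""
    return concurrent * DAY / seconds_each


configs = {
    "3 CPU + 4 GPU": {"cpu": tasks_per_day(3, 40_000), "gpu": tasks_per_day(4, 19_000)},
    "2 CPU + 4 GPU": {"cpu": tasks_per_day(2, 36_000), "gpu": tasks_per_day(4, 19_000)},
}

for name, out in configs.items():
    print(f"{name}: {out['cpu']:.2f} FGRP4/day, {out['gpu']:.2f} BRP5/day")
```

That comes to roughly 6.5 versus 4.8 FGRP4 tasks per day, with BRP5 output unchanged at about 18.2 per day, so freeing the extra core costs well over a full CPU task per day for no measurable GPU gain.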

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118618150724
RAC: 18063643

RE: The CPU load of GPU

Quote:
The CPU load of GPU tasks is also something to keep in mind. It's significantly higher on my FX-6300 than on the Intel, running the same type of tasks on the same GPU.


Yes, and this just highlights how poor the performance is compared with Intel. When I bought my FX-6300 I was hoping it would outperform an i3. Not only does it use much more power but it just can't keep up :-(.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118618150724
RAC: 18063643

RE: ... Running 5 times as

Quote:
... Running 5 times as many units at one time makes it a no brainer compared to the i3 dual core cpu when used in this way.


I'm puzzled at how you come up with the 5 times figure?? With an FX-6300, you have 6 cores. With an i3 you have 4 (virtual). So how do you get 5 times as many units?

Quote:
... The AMD loses on each individual unit, but together it overwhelms the Intel, in RAC, with its number of cores.


You are mistaken if you think this. A Haswell i3 virtual core can process a FGRP4 task in around half the time taken by the 6300. So in the time taken to get 6 FGRP4 tasks from the 6300, you actually get more than that from the i3. I can't really see this as "overwhelms the Intel ..." :-). When you also consider the extra power needed for the 6300, how can you really say that?
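The core-count arithmetic can be written out as a short Python sketch. It takes the figures from this exchange as assumptions: 6 cores on the FX-6300, 4 virtual cores on the i3, an i3 virtual core finishing a FGRP4 task in about half the FX-6300's time, and optionally one core per host reserved for GPU support.

```python
"""Sketch: FGRP4 output of an FX-6300 vs a Haswell i3 in the same time.

Assumptions from the thread: the FX-6300 has 6 cores, the i3 has 4
virtual (hyper-threaded) cores, and an i3 virtual core finishes a
FGRP4 task in about half the FX-6300's time. Optionally, one core per
host is reserved for GPU support.
"""


def tasks_in_one_fx_task_time(cores: int, speedup: float, reserved: int = 0) -> float:
    """Tasks completed while the FX-6300 finishes one task per crunching core.

    `speedup` is how many tasks one core completes in one FX-6300 task time.
    """
    return (cores - reserved) * speedup


fx = tasks_in_one_fx_task_time(6, 1.0)
i3 = tasks_in_one_fx_task_time(4, 2.0)
print(f"all cores:      FX-6300 {fx:g}, i3 {i3:g}")

fx_r = tasks_in_one_fx_task_time(6, 1.0, reserved=1)
i3_r = tasks_in_one_fx_task_time(4, 2.0, reserved=1)
print(f"1 core for GPU: FX-6300 {fx_r:g}, i3 {i3_r:g}")
```

Under these assumptions the i3 finishes 8 tasks to the FX-6300's 6, and still 6 to 5 if each host gives up one core to feed a GPU.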

Cheers,
Gary.

mikey
Joined: 22 Jan 05
Posts: 12820
Credit: 1881884328
RAC: 1095756

RE: RE: ... Running 5

Quote:
Quote:
... Running 5 times as many units at one time makes it a no brainer compared to the i3 dual core cpu when used in this way.

I'm puzzled at how you come up with the 5 times figure?? With an FX-6300, you have 6 cores. With an i3 you have 4 (virtual). So how do you get 5 times as many units?

I wasn't thinking of the Intel having HT, and I was leaving one core free for the GPU, so the dual-core Intel would have 1 CPU core while the AMD would have 5 CPU cores. Hence 5 to 1.

Quote:
Quote:
... The AMD loses on each individual unit, but together it overwhelms the Intel, in RAC, with its number of cores.

You are mistaken if you think this. A Haswell i3 virtual core can process a FGRP4 task in around half the time taken by the 6300. So in the time taken to get 6 FGRP4 tasks from the 6300, you actually get more than that from the i3. I can't really see this as "overwhelms the Intel ..." :-). When you also consider the extra power needed for the 6300, how can you really say that?

I don't factor power into my calculations, as I always oversize the power supplies in my machines, either for future upgrades or for reuse in future machines. I only buy 750 or 850 watt PSUs (Gold-rated ones lately), so I have plenty of reserve power available. As for the time, I again forgot about the HT aspect of the Intel.

floyd
Joined: 12 Sep 11
Posts: 133
Credit: 186610495
RAC: 0

RE: RE: ... and I'm

Quote:
Quote:
... and I'm running two BRP4G or four BRP5.

This is why the estimates are getting more screwed than they need to be.


Well, actually no. I really run two BRP4G OR four BRP5, an exclusive or, except for my little experiment, during which I tried to ...

Quote:
If you ran equal numbers of the two series, you would have better estimates for each one. They wouldn't be perfect but there'd be a lot less variation.


... do exactly that, and all the numbers I mentioned are from the time before any BRP5 task finished. Correct estimates for two BRP5 would have required a DCF of 2.2, which is still too far from the 1.4 needed for the same number of BRP4G. Running BRP5 one at a time could have worked DCF-wise, but that's far too inefficient, so I decided to run BRP5 exclusively, now four at a time. Interestingly enough, BRP4G run times didn't change a bit during the transition, when BOINC ran a mix of WUs. In my opinion, this indicates that two BRP5 tasks don't cause much more load than one BRP4G.
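Why no single DCF could work for that mix can be shown with the two values given here. Every raw estimate for the project gets multiplied by the same DCF, so the app whose "right" DCF differs is always off by the ratio of the two. A minimal Python sketch, using only the 2.2 and 1.4 figures from the post:

```python
"""Sketch: why one project-wide DCF can't fit both apps at once.

Uses the DCF values from the post: 2.2 would give correct estimates
for two BRP5 running together, 1.4 for two BRP4G. BOINC multiplies
every raw estimate for the project by the same DCF.
"""

NEEDED = {"BRP5 x2": 2.2, "BRP4G x2": 1.4}


def estimate_error(shared_dcf: float) -> dict[str, float]:
    """Estimated/actual run time per app under one shared DCF (1.0 = exact)."""
    # estimate = raw * shared_dcf, actual = raw * needed, so the ratio is shared/needed
    return {app: shared_dcf / needed for app, needed in NEEDED.items()}


for dcf in (1.4, 2.2):
    errors = estimate_error(dcf)
    print(dcf, {app: round(e, 2) for app, e in errors.items()})
```

At DCF 1.4, BRP5 estimates come out at about 64% of the real run time; at 2.2, BRP4G estimates run about 57% too long.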

Quote:
I'm actually quite surprised that you can fit 4 BRP5 tasks into 1GB of RAM. Do you actually get an improvement in output?


There certainly is an improvement when I run a fourth GPU task on the Intel machine. On the AMD, the difference is marginal: slightly better with 4 tasks, I'd think, but not clearly enough to be sure. I'll switch over to 3 for a while and see what happens.
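One way to switch between 4x and 3x on the client side is an app_config.xml in the project's folder, where gpu_usage of 1/N lets BOINC schedule N tasks per GPU. This Python sketch generates one. The app name "einsteinbinary_BRP5" is a hypothetical placeholder (check the real name in your client_state.xml), and the cpu_usage of 1.0 is just an example value for the CPU fraction BOINC budgets per GPU task.

```python
"""Sketch: write a BOINC app_config.xml that runs N tasks per GPU.

Assumptions: the app name passed in is a placeholder that must match
the real app name in client_state.xml. gpu_usage = 1/N lets the client
schedule N tasks on one GPU; cpu_usage is the CPU fraction BOINC
budgets for each of those tasks.
"""
import xml.etree.ElementTree as ET


def app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float = 1.0) -> str:
    """Return app_config.xml text for one app's GPU version."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical app name; switch from 4 to 3 tasks per GPU.
    print(app_config("einsteinbinary_BRP5", tasks_per_gpu=3))
```

The client only picks up a changed app_config.xml when it rereads its config files, so restarting BOINC after editing it is the safe option.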
