GPU is supposed to complete tasks much faster than CPU.
Indeed the GPU version is just fantastic when it comes to speed: 10 min or so as opposed to 10 hours or so with CPU. However, if you have good GPUs but a rather slow or interrupted internet connection, then you are almost lost. That is because there's significantly more to upload than just the small result file of candidates that we had for FGRPB tasks. Hm, if no improvement is in sight, I'll have to convince my internet provider of something ... :-)
...Indeed the GPU version is just fantastic when it comes to speed: 10 min or so as opposed to 10 hours or so with CPU...
My experience, running one task at a time on an RX 580, is that they take 13 to 20 minutes to complete. And the CPU takes 8 to 12 h per task, but it runs 8 tasks in parallel - while also using one thread to support the GPU task. So it's maybe a 1:5 relation in throughput. Not great compared to other GPU applications; it seems that this type of work is difficult to code efficiently for GPU - difficult to break down and parallelize into smaller work units.
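For what it's worth, here is the rough arithmetic behind that 1:5 figure, as a small Python sketch (the task times are just the midpoints of the ranges above, so treat it as an estimate):

    # Rough daily throughput from the task times quoted above.
    gpu_task_min = (13 + 20) / 2                # ~16.5 min per GPU task at 1x
    gpu_per_day = 24 * 60 / gpu_task_min        # ~87 GPU tasks per day

    cpu_task_h = (8 + 12) / 2                   # ~10 h per CPU task
    cpu_per_day = 8 * 24 / cpu_task_h           # ~19 tasks per day from 8 in parallel

    print(round(gpu_per_day / cpu_per_day, 1))  # -> 4.5, i.e. roughly 1:5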
Substantial gains can be had at higher task multiplicities for GW v2.02 GPU tasks. My RX 570s running 4x concurrent tasks average 6 minutes per task, with a range of 5.3 min to 8 min (from 40 recent validated tasks). These times were with no CPU tasks running, so that may contribute somewhat to faster times. With 888 valid GPU tasks, I've had no invalids at 4x multiplicity.
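For anyone wanting to try this, the multiplicity is set with an app_config.xml file in the Einstein@Home project directory, then loaded via Options -> Read config files in BOINC Manager. A minimal sketch for 4x is below; note that the app name is my assumption, so check client_state.xml for the exact name on your host:

    <app_config>
      <app>
        <name>einstein_O2MD1</name>     <!-- assumed GW app name; verify in client_state.xml -->
        <gpu_versions>
          <gpu_usage>0.25</gpu_usage>   <!-- 1/4 of a GPU per task, i.e. 4 concurrent tasks -->
          <cpu_usage>1.0</cpu_usage>    <!-- reserve one CPU thread per GPU task -->
        </gpu_versions>
      </app>
    </app_config>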
Ideas are not fixed, nor should they be; we live in model-dependent reality.
My RX 570s running 4x concurrent tasks average 6 minutes per task, with a range of 5.3 min to 8 min (from 40 recent validated tasks).
You are quite correct and Rolf seems to be ignoring the very real benefit from concurrent GPU tasks. 4x seems to be about the optimum for RX 570s. To overcome the problem of variable crunch times (tasks seem to have variable 'work content', which causes this), I've decided to use the project 'properties' button in BOINC Manager to record the number of successfully completed tasks at the same time each day. It's early days but I'm tending to see a fairly constant number of completed tasks per 24hr period.
I recently built a couple of systems based on a Ryzen 5 2600 CPU (6C/12T) with a single RX 570 GPU in each and running GPU tasks only. For the one running at 4x, the first full day produced 252 tasks completed and a further 254 on the second full day. Based on those figures, my average crunch time would be just under 6 mins - so very much in agreement with what you saw for yours.
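That average is simply the length of the day divided by the completed count; a trivial sketch:

    # Effective crunch time per result, from one full day's output at 4x.
    tasks_per_day = 252
    avg_min = 24 * 60 / tasks_per_day   # minutes between completed results
    print(round(avg_min, 2))            # -> 5.71, i.e. "just under 6 mins"

Of course each individual task occupies the GPU about four times that long when four run at once; it's the interval between completions that comes out just under 6 minutes.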
To determine the 'benefit' from running tasks on a GPU, as opposed to just using CPU cores or threads, it would be more appropriate to not mix the two measurements. Each device should be tested separately and configured for best performance of that device. An RX 570 can produce 250 results per day. Rolf's 8C/16T machine would probably be able to crunch around 20 CPU tasks per day using full cores and perhaps that could be increased to 30 or so if he used all threads. Those numbers are only guesses - they would need to be measured properly. Even so, it's fairly apparent that there is a larger benefit than what Rolf is suggesting, particularly if you took into account the extra heat and power use that would arise from running the full 16 threads. My new machines, with the CPU cores just servicing the GPU, are running nice and cool :-).
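Putting those numbers side by side (a sketch only - as I said, the CPU figures are guesses that would need proper measurement):

    # Rough GPU vs CPU daily throughput, keeping the devices separate.
    gpu_per_day = 250                  # measured: one RX 570 at 4x
    cpu_full_cores_per_day = 20        # guess: 8 full cores
    cpu_all_threads_per_day = 30       # guess: all 16 threads
    print(round(gpu_per_day / cpu_all_threads_per_day, 1))  # -> 8.3x even in the best CPU case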
To my way of thinking, now that we seem to have a reliable GPU app, using a decent GPU certainly seems to be the best way to go. I'm also trialing some i3-9100F based systems (4C/4T), again with an RX 570. These are not quite as good as the Ryzen. Now that the test machine with that CPU has GW tasks again, I'll be able to get some daily production figures in a few days. There's quite a few FGRPB1G to get rid of first :-).
I've now had 3 full days for both of the new Ryzen 2600 / RX 570 hosts. The one running at 4x has produced 252, 254, 255 completed results for the 3 successive days. I've had the other host running at 5x. Everything works OK and initially I thought I was going to see a small further increase. However, the 3 values for daily output are 240, 243, 243 respectively. So 4x really does look like the sweet spot for that set of hardware.
I've now put that host back to 3x. I want to gather data over a longer period to be sure of the advantage of running at 4x. Both machines have lots of pendings but neither has any errors or invalid results so far.
Gary and Cecht, thanks for the information. You have given me a good idea of what to upgrade my GTX 1060 to. I have a host on another project which would be very happy to have a second GTX 1060.
Looks like there's v2.03 (beta test) available for Nvidia now.
By the way, have you people had great success running GW tasks 3x or even 4x with AMD RX 570/580 in a Windows environment?
I have read that this configuration runs well under Linux. I tried 3x yesterday with three of my Windows systems and they don't work well even at that. There were some successful validations but invalids clearly started accumulating. 2x had worked perfectly. There's no problem with heat and I'm not running any CPU tasks, but so far it seems 2x is the maximum that works well for my systems. Maybe newer system platforms would handle 3x or 4x better. I would be interested to see what kind of systems people run if they haven't had validation problems with that configuration in Windows.
> By the way, have you people had great success running GW tasks 3x or even 4x with AMD RX 570/580 in a Windows environment?
With a Radeon VII under Windows I can run one or two WUs successfully at the same time, but very slowly. Running three or more WUs (up to six are possible, with decreasing runtime), I only get "validation inconclusive", which usually ends in "invalid"...
There were some successful validations but invalids clearly started accumulating.
Thanks for raising this point. I ran some Vega tests in W10 recently. With GW app 2.02 it ended up with well above 50% invalids, mostly at x4. I recall the dev recommending x1, but then the GPU load is so low that the output is too low compared to FGRP.
I don't know what's to blame here: is it Windows as always, is it the driver as often, is it my fondness for undervolting, is there an issue with the hardware, or is there some knob in the app that could still be adjusted, so that we can hope for an updated version?
Seeing that it is not beta anymore, I'm not happy: it still uses only 50% of the AMD RX 580 on macOS, which means zero improvement in efficiency.
So it takes 1.5 to 2 hours to finish a WU and then I get 1000 credits, while the FGRP tasks finish in 15 minutes and give 3400 credits. That means I get roughly 20 times the credit rate with FGRP compared to the GW app; something is terribly off here.
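Sketching out the credit rates:

    # Credits per hour for each app on this host.
    gw_rate = 1000 / 1.5        # ~667 credits/h at the faster (1.5 h) end for GW
    fgrp_rate = 3400 / 0.25     # 13600 credits/h for FGRP (3400 credits per 15 min)
    print(round(fgrp_rate / gw_rate))   # -> 20, the gap described above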
In addition, the scheduler originally estimated 8 minutes per task, so it downloaded 600 at once... in my opinion this is still beta and has a way to go before being made public. Especially since CPU crunching is more efficient at this point, I don't see the point in running GW tasks on the GPU at all. Unless it's just a bad credit system and the workunits for the GPU are in reality much larger than the CPU tasks; but then please balance the credits given.