Running multiple WUs on Ryzen 1800X paired with RX 570?

wolfman1360
Joined: 17 Feb 17
Posts: 19
Credit: 33664141
RAC: 0
Topic 219534

Hello!

I'm attempting to optimize this host to be as productive as humanly (or not so humanly) possible.

https://einsteinathome.org/host/12775393

Is it worth it to run 2 (or more) GPU tasks at once? Can I get away with running it with a single core (or thread) leaving 15 free for CPU work? What would be the correct way to go about this if so?

Right now I have BOINC set to run at 93% CPU usage. Average runtime on the GPU appears to be 11-12 minutes.

I've been reading quite a lot on these forums, and the advice about running multiple WUs seems to apply mainly to Nvidia cards. I have a GTX 1080, but I have also heard that AMD cards on this project give superior output.

Any help appreciated!

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117667979345
RAC: 35160918

wolfman1360 wrote:
... Is it worth it to run 2 (or more) GPU tasks at once?

For the machine you point to and for the tasks you currently run, running 2 concurrent tasks should give around 7-10% performance improvement.  I would think that would be pretty close to the optimal condition.  You will probably use slightly more power and produce a little extra heat.

wolfman1360 wrote:
Can I get away with running it with a single core (or thread) leaving 15 free for CPU work?

That machine has 8 cores / 16 threads.  If you run 15 CPU tasks in addition to 2 GPU tasks, you may find that the GPU tasks run slower so that you are worse off.  The only way to know would be to make a series of measurements and check for yourself.  A lot would depend on how compute-intensive the CPU tasks are and it could be that running fewer CPU tasks actually increases overall output.  You really need to test this if you intend to push things that hard.

wolfman1360 wrote:
What would be the correct way to go about this if so?

You have two choices.  The first is a project setting called GPU utilization factor which makes it easy to run concurrent GPU tasks.  This should automatically 'reserve' a CPU thread for each GPU task.  You can further reduce CPU usage with the % of cores BOINC is allowed to use.

The second choice is to consider the BOINC-supplied mechanism for project-level configuration.  It's a little more complicated and requires that you set up and customise a configuration file called app_config.xml.  Please check the documentation.
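
To give a rough idea only, here's a minimal sketch of what that file might contain for running FGRPB1G tasks 2x with a full CPU thread budgeted for each.  I'm assuming the application name is hsgamma_FGRPB1G - check the name your own client actually reports (in the event log or in client_state.xml) before copying anything - and the file goes in the Einstein project folder under your BOINC data directory:

<app_config>
   <app>
      <!-- assumed app name; verify against what your client reports -->
      <name>hsgamma_FGRPB1G</name>
      <gpu_versions>
         <!-- 0.5 GPUs per task means 2 concurrent tasks per GPU -->
         <gpu_usage>0.5</gpu_usage>
         <!-- budget one full CPU thread per running GPU task -->
         <cpu_usage>1.0</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

After saving it, use Options -> Read config files in BOINC Manager (or restart the client) for the change to take effect.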

wolfman1360 wrote:
I've been reading quite a lot on these forums, and the advice about running multiple WUs seems to apply mainly to Nvidia cards. I have a GTX 1080, but I have also heard that AMD cards on this project give superior output.

As you frame them, both those points are not really fully correct :-).

  • Running multiple concurrent GPU tasks is more to do with how powerful the GPU is rather than its brand.
  • AMD cards don't necessarily produce superior output.  It tends to be more likely that you could find an AMD card that will give you a better output for total cost - which includes both capital and running costs.  This is something that can quite easily change over time and with different projects/applications so you need to be paying attention :-).

Also, I'm not sure why you mention a GTX 1080 when the link you provided shows that the GPU is an RX 570.  The card you actually have is a pretty decent performer at the moment.

Cheers,
Gary.

wolfman1360
Joined: 17 Feb 17
Posts: 19
Credit: 33664141
RAC: 0

I'm going to attempt to answer as best I can here.

Gary Roberts wrote:
wolfman1360 wrote:
... Is it worth it to run 2 (or more) GPU tasks at once?
For the machine you point to and for the tasks you currently run, running 2 concurrent tasks should give around 7-10% performance improvement.  I would think that would be pretty close to the optimal condition.  You will probably use slightly more power and produce a little extra heat. 

This works for me, though on further inspection I noticed I had only FGRP selected. I have since changed this.

Gary Roberts wrote:
wolfman1360 wrote:
Can I get away with running it with a single core (or thread) leaving 15 free for CPU work?
That machine has 8 cores / 16 threads.  If you run 15 CPU tasks in addition to 2 GPU tasks, you may find that the GPU tasks run slower so that you are worse off.  The only way to know would be to make a series of measurements and check for yourself.  A lot would depend on how compute-intensive the CPU tasks are and it could be that running fewer CPU tasks actually increases overall output.  You really need to test this if you intend to push things that hard.

I expected this. I'm still getting used to different projects. For instance, Collatz, while still utilizing 100% of the GPU, uses far less GPU memory and seems to produce more heat - right now the GPU is right around 70°C, and Collatz can make this creep up to 80°C. I guess my question is: given that the GPU is already fully utilized with this particular subset of apps (GRP), does running 2 at once actually increase productivity? So far, you seem to be right on the money with 2 running, and it looks like overall productivity will increase by around 10%, though I will keep an eye out for any errors or invalids.

Gary Roberts wrote:
wolfman1360 wrote:
What would be the correct way to go about this if so?

You have two choices.  The first is a project setting called GPU utilization factor which makes it easy to run concurrent GPU tasks.  This should automatically 'reserve' a CPU thread for each GPU task.  You can further reduce CPU usage with the % of cores BOINC is allowed to use.

The second choice is to consider the BOINC-supplied mechanism for project-level configuration.  It's a little more complicated and requires that you set up and customise a configuration file called app_config.xml.  Please check the documentation.

I was reading about that earlier, too, and I think I have it figured out. I need to read more into the application names to play with this - though I don't think I'm going to get too invested so long as my CPU isn't bottlenecking the GPU by having too much going on at once. Does AMD handle this any differently than Intel?

Gary Roberts wrote:
wolfman1360 wrote:
I've been reading quite a lot on these forums, and the advice about running multiple WUs seems to apply mainly to Nvidia cards. I have a GTX 1080, but I have also heard that AMD cards on this project give superior output.

As you frame them, both those points are not really fully correct :-).

  • Running multiple concurrent GPU tasks is more to do with how powerful the GPU is rather than its brand.
  • AMD cards don't necessarily produce superior output.  It tends to be more likely that you could find an AMD card that will give you a better output for total cost - which includes both capital and running costs.  This is something that can quite easily change over time and with different projects/applications so you need to be paying attention :-).

Also, I'm not sure why you mention a GTX 1080 when the link you provided shows that the GPU is an RX 570.  The card you actually have is a pretty decent performer at the moment.

My apologies. I don't have the 1080 attached here currently - though I do plan on doing just that in the very near future. I do know some projects seem to be, forgive the perhaps improper wording, better optimized for specific parts of a GPU. I'm not sure how much better I can explain that. For instance, Einstein seems to utilize the GPU's memory more than other projects do. Is that making any sort of sense? I'm probably overcomplicating things over here.

I'll be doing a bit more investigation into this, as I am assuming each machine is different. Minus the CPU threads query, my ultimate question is what applications are recommended, with these two GPUs in particular, to run multiples of? Right now, running 2 at once, I'm seeing usage fluctuate quite drastically from 60-100% - and thus power draw from 61 to 130 Watts. However, memory usage on the card seems to be holding right around 1,850-1,900 MB, which is a bit more than when running one, and the CPU is 81% used, since I had assumed 2 GPU WUs meant I had to manually set processor usage in BOINC. Based on this, should I try increasing to 3 or go back to 1? Clock speeds are very stable too.

Edit: Right now BOINC shows it is crunching 2 WUs at once, though claiming 1 CPU and 1 AMD GPU for each. However, when I set project preferences to a GPU utilization factor of 0.5, it now claims, on newly downloaded tasks that haven't started yet, 1 CPU and 0.5 AMD GPUs. I'm going to tell BOINC not to fetch any more work and let it run through this pile so I can get a fresh start tomorrow and really make sure this is working as expected.

On a side note, however, browsing these forums, I think I'll be here a while. I like places with active communities, active developers and administrators and enthusiasm for what they do, so sorry if I'm asking questions that are clearly answered elsewhere and I just haven't found them yet. In the meantime, I'll thoroughly enjoy reading the current discussions and findings.

thanks again

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117667979345
RAC: 35160918

wolfman1360 wrote:
... I noticed I had only FGRP selected. I have since changed this.

The O2AS GPU tasks are test tasks and (hopefully) you shouldn't get any unless you have also changed settings to allow beta test tasks as well.  I would strongly advise against doing that.  The two searches are unlikely to play well together and there are unresolved problems with the test app. Until you have a much better understanding of the full implications and until the new app is working much more reliably, you should really stay out of the GW GPU app.

wolfman1360 wrote:
... right now the GPU is right around 70°C, and Collatz can make this creep up to 80°C. I guess my question is: given that the GPU is already fully utilized with this particular subset of apps (GRP), does running 2 at once actually increase productivity? So far, you seem to be right on the money with 2 running, and it looks like overall productivity will increase by around 10%, though I will keep an eye out for any errors or invalids.

All I can do is tell you exactly what my experience has taught me.  That 7-10% figure is based on the behaviour of a large number of hosts running 2x on various models of AMD GPUs.  My RX 570s all show close to 10% improvement.

When you first posted, you didn't mention anything about other projects.  Because you weren't running any Einstein CPU tasks (only GPU) I wondered at the time if your intention was perhaps to use Einstein for the GPU stuff and other projects for all the CPU threads.  It didn't occur to me that you might be trying to run other projects on the GPU as well.  It's difficult to give reliable advice if the full picture isn't provided.

I have no experience with running other GPU projects along with Einstein.  Around 2010, I started running Milkyway only on a group of machines with HD 4850 GPUs.  I've never tried sharing a GPU between projects.  When Milkyway stopped providing tasks that could run on a HD 4850, I stopped running Milkyway.  My gut feeling is that it can be a source of anguish trying to get different science applications sharing the one set of hardware.  For that reason, I tend to stick to the project I'm most interested in.  If I want to try a new project, I'll tend to dedicate a machine solely to it until I learn how it behaves.

wolfman1360 wrote:
... I do know some projects seem to be, forgive the perhaps improper wording, better optimized for specific parts of a GPU. I'm not sure how much better I can explain that. For instance, Einstein seems to utilize the GPU's memory more than other projects do. Is that making any sort of sense? I'm probably overcomplicating things over here.

Why do you worry about these things?  You can't really change the behaviour of any specific science app.  You would hope that the programmer is sufficiently skilled to have written code that best makes use of the hardware.  All you can do is tweak your settings to optimise the output of the given code.  You just need to decide what you want to support - hopefully based on the importance (as you see it) of the project's potential contribution to science.  Once you decide that, you could always suggest improvements to the code if you think there are glaring deficiencies.

wolfman1360 wrote:
... my ultimate question is what applications are recommended, with these two GPUs in particular, to run multiples of?

Nothing is "recommended" - there are going to be two separate science objectives and it's up to you to choose what you want to support.  However, based on what your various comments suggest, you seem to want efficient, well behaved, maximised output.  Unless you want to help with testing and have no problem with seeing inefficient use of resources and perhaps lots of problems with valid results, the only Einstein GPU app that 'fits your bill' is FGRPB1G.  It's suitable for both GPUs you mention at 2x.  For the GTX 1080, you may get a further small improvement at 3x or even 4x - I have no experience - but you would need to verify that for yourself through experiment.  nVidia GPUs really need a full CPU thread for each concurrent GPU task.  The CPU support needed for AMD GPUs is significantly less and you may have no problem with just one thread supporting both tasks but once again, you need to test for your particular setup.

There are various ways of thinking about optimal performance.  For me, I want to support Einstein and I'd like to maximise output, but not at the expense of excess power use and heat.  GPUs can potentially achieve that, so I don't run any CPU tasks on hosts that have a crunching GPU - virtually all my hosts these days.  This neatly side-steps any issue around having adequate CPU support for whatever number of GPU tasks is being crunched.  I've essentially doubled my output for the same sort of power input.

My ultimate aim is to run the GW GPU app when it's ready for 'production'.  I've done a limited amount of testing - with a high fraction of invalid results so far - so all my hosts are now back running FGRPB1G until there is something new to test.  It might be quite a while before there is a version that is both efficient and gives reliable results.  However, in the end, the first detection of the continuous GW emissions that must be there is the 'holy grail' this project is searching for, and to be a small part of that is what motivates me.  Reliability is far more important to me than efficiency so it won't bother me if the new app doesn't make fully efficient use of the GPU.

Already it's more 'efficient' to crunch an O2AS task on a GPU rather than on a CPU.  I'm just waiting to see what's needed for it to become reliable.  It's just the 'nature of the beast' that the search algorithm will have fluctuations in how intensively the hardware is used.  Some parts of each computational 'loop' are more easily 'parallelized' than others and probably some things can't be parallelized at all.  I'm sure the Devs are doing their best to achieve the best use possible.

wolfman1360 wrote:
... I had assumed 2 GPU WUs meant I had to manually set processor usage in BOINC. Based on this, should I try increasing to 3 or go back to 1?

If you are using the GPU utilization factor method to run multiple GPU tasks, that automatically 'reserves' (or budgets for) a full CPU thread to 'support' each running GPU task.  You don't need to do this separately by using the setting for % of cores BOINC is allowed to use.  So, as an example, if you had a 16 thread processor and a GPU running FGRPB1G tasks 2x, BOINC would 'see' just 14 threads (16-2) available for CPU tasks with 2 threads 'reserved' for GPU support.

If you also set the % of cores BOINC is allowed to use to less than 100%, you may further reduce the number of CPU tasks that BOINC is allowed to run.  For example, if you set 50%, I believe the 14 from above would be reduced to 7.  It gets more murky if the % doesn't give an integer number of cores.  I could easily be wrong (since I don't remember ever properly testing this) but if you were to set the 93% you mentioned previously with 2 GPU tasks running, the calculation would give 14x0.93=13.02.  It may be that this would be rounded down to 13 as the maximum number of CPU tasks that could be running but I have this sneaking suspicion that BOINC would allow the 14th CPU task just because it can see a tiny fraction of a core not budgeted for and therefore available to be used for that 14th task.  I could easily be wrong.
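
For what it's worth, that percentage is just the setting the Manager shows as 'Use at most X% of the CPUs'.  If you set it locally rather than through the web preferences, it ends up as a single element in global_prefs_override.xml in the BOINC data directory.  As a sketch only (the real file will normally hold your other local preferences too, and I'm quoting the element name from memory), the relevant part looks something like:

<global_preferences>
   <!-- let BOINC use at most 93% of the processor threads -->
   <max_ncpus_pct>93.0</max_ncpus_pct>
</global_preferences>

However it gets set, the interaction with the GPU task 'reservations' described above should be the same.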

wolfman1360 wrote:
Edit: Right now BOINC shows it is crunching 2 WUs at once, though claiming 1 CPU and 1 AMD GPU for each. However, when I set project preferences to a GPU utilization factor of 0.5, it now claims, on newly downloaded tasks that haven't started yet, 1 CPU and 0.5 AMD GPUs. I'm going to tell BOINC not to fetch any more work and let it run through this pile so I can get a fresh start tomorrow and really make sure this is working as expected.

I think you're just describing standard behaviour.  By default your first tasks were 'branded' as requiring 1 CPU + 1 GPU.  Once issued, that 'branding' doesn't change but the crunching behaviour easily can.  If you change the GPU utilization factor to 0.5, all new tasks will come with the new branding but the branding of the 'old' tasks won't be changed.  Your BOINC client will respect the new branding (for all tasks including the old ones) just as soon as it is advised of the change by the downloading of new tasks.  The downloading of new work is the only way the client is advised of the change.  Updating the project (without getting new work) will not do it.

Cheers,
Gary.

wolfman1360
Joined: 17 Feb 17
Posts: 19
Credit: 33664141
RAC: 0

Hi Gary,

Thank you for all of that - it is very helpful. I should add that, despite my account being years old, I have been a holdout over at WCG for 2.5 years and am only now really extensively exploring other avenues that interest me, especially on the GPU side.

I have made sure beta testing is disabled - and have now set Einstein as the one and only project running on the GPU. So far it seems to be stable running 2 concurrently at right around 20 minutes for 2 workunits.

I'm not sure why I worry about these things, to be honest. I am not much for numbers, however anything to do with space (I am a huge Star Trek fan) has always fascinated me. So this and Milkyway have caught my attention quite a lot. Combine that with the great community and I'm hooked, since having discussion to go along with the science and tasks only adds to the enjoyment, and you really do feel like a part of a community.

I do have an AMD HD6450 that runs a GRP WU in about 9 hours. Is this about what I should expect?

I do not have a problem with inefficient WUs or running tests that may not be valid; however, I'd like to make sure I'm running the stable ones optimally and without issue so I can properly help beta test when I feel up to it. I'll also be doing a lot of reading to see what folks tend to do during these betas.

Regarding productivity and optimal performance: a few years ago, when I mainly had good CPUs but no GPUs worth mentioning, I was driven away from projects that do both CPU and GPU crunching by a few negative forum posts regarding my mostly (back then, at least) CPU-based hardware. I took them far too much to heart and regret it.

The 1080 is a very new acquisition for me. Despite, at times, still feeling like a small fish in a huge pond, I decided I'd start leveraging that GPU power for things that are important to me. Science is science and any way I can help it progress and move forward is important.

Thank you for your very well written explanations and clearing some things up.

Right now this machine is crunching specifically FGRPB1 on the GPU and, for right now, WCG on the CPU and all seems to be going well with no issues.

thank you again.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117667979345
RAC: 35160918

wolfman1360 wrote:
... So far it seems to be stable running 2 concurrently at right around 20 minutes for 2 workunits.

That's the sort of figure that I see on mine as well.  Looks good!

wolfman1360 wrote:
I do have an AMD HD6450 that runs a GRP WU in about 9 hours. Is this about what I should expect?

That's an old, low-end GPU that is not really suitable for crunching these days.  That time seems very slow but it's probably what you would expect.  I have nothing to compare it with.  My oldest GPU is an HD 7770 from around 2011, still running current tasks successfully.  It does tasks singly in about 36 mins.  You might be able to pick one up extremely cheaply.  You'd improve your output enormously with one of those in place of the 6450 :-)

wolfman1360 wrote:
The 1080 is a very new acquisition for me. Despite, at times, still feeling like a small fish in a huge pond, I decided I'd start leveraging that GPU power for things that are important to me. Science is science and any way I can help it progress and move forward is important.

Good on you for not being discouraged and for following your passion.  I can really empathize with that attitude.  I was always impressed by the sheer dedication and foresight of all those scientists involved in the conceptualization, the design, the initial construction and ultimate upgrading and refinement of the LIGO observatories.  I didn't really understand (at the start) just how mind-blowingly awesome the whole deal was.

It really sank in with the detection of the first BH-BH merger event.  Then there was the NS-NS merger a couple of years later and all the other flow-on benefits that came from that GW detection.  The source location could be pinpointed in the sky, allowing other EM observatories to gather huge amounts of data from the afterglow.  Those events spurred me to dig deeper for more information about what questions now had better answers, so I've become even more inspired to contribute to the eventual detection of continuous GW emissions - something that Einstein@Home is well placed to be a part of.  If anyone wants to get a 15 min tour of what the study of the NS-NS merger is delivering, take a look at this video.  It's written and presented by an Australian, so I'm a bit biased :-).

wolfman1360 wrote:
Thank you for your very well written explanations and clearing some things up.
Right now this machine is crunching specifically FGRPB1 on the GPU and, for right now, WCG on the CPU and all seems to be going well with no issues.

You're most welcome and it's good to know you're satisfied with the outcome.

Cheers,
Gary.

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1887
Credit: 1411174515
RAC: 1187855

https://einsteinathome.org/host/12786992/tasks/4/0?page=34

Are these Ryzens really that fast?

(mine isn't) but that makes my 660Ti SC look like a snail 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117667979345
RAC: 35160918

It's not the Ryzen CPU.  It's the Vega 10 GPU that's doing the fast crunching.  So yes, your 660Ti will be looking very much like a snail in comparison.  You could probably get 6+ times your current output for something like an RX 570 replacing your 660Ti.

Be aware that it's the FGRPB1G app that does so well on modern AMD GPUs.  Things might be a bit different when the GW GPU app eventually gets sorted out.  It's likely that a fast CPU will be necessary to get the most out of a modern GPU.  Older hosts (like both yours and mine) will probably be rather dismal performers :-).

Cheers,
Gary.

wolfman1360
Joined: 17 Feb 17
Posts: 19
Credit: 33664141
RAC: 0

So far, very happy with the performance of my RX 570 on this project.

How does one go about searching for specific GPUs and/or CPUs on this project? The Corsair H100i in my FX-8350 machine finally decided to bite the dust. I was planning on replacing the Radeon 6450 in there, but since I have little room in my place as it is, I'm very much debating getting a smaller machine that will end up doing 10-plus times the crunching at the same cost, if not less. I'm looking at maybe another RX 570 (or maybe, if I'm feeling generous, an RX 5700 or 5700 XT) or a GTX 1660 or 1660 Ti.

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1887
Credit: 1411174515
RAC: 1187855

https://www.amd.com/en/support/kb/release-notes/rn-rad-win-19-9-2

AMD tends to email them to me all the time since they don't know my one Ryzen is not a fast one and my other AMDs are almost as old as this project.

mikey
Joined: 22 Jan 05
Posts: 12692
Credit: 1839096036
RAC: 3706

MAGIC Quantum Mechanic wrote:

https://www.amd.com/en/support/kb/release-notes/rn-rad-win-19-9-2

AMD tends to email them to me all the time since they don't know my one Ryzen is not a fast one and my other AMDs are almost as old as this project.

I know this is a different brand but still something to be careful of...

"Just an FYI - if using multiple GPU’s the newest nvidia game ready drivers seem to have optimised my system for gaming. This meant only one of my gpu’s was performing at full power. The other two became significantly slower - by nearly a factor of 10. I swapped my driver for the creator ready option and all went back to normal."
