Does anyone know what a good app_config.xml would be for an 8GB 5500 XT? I WAS doing 3 concurrent GW tasks, but now that VRAM usage has gone way up on that task, even running only 2 tasks uses 6.5GB+ according to Afterburner. I also have it set to use 1 CPU thread per task; is that efficient?
in FGRPB1G I seem to be able to run 4 concurrent tasks pretty well.
BTW, this card is only PCIe x8...
My understanding of how cpu_usage works is rudimentary, so I'll let you know what I empirically learned. In app_config.xml, cpu_usage doesn't determine how many threads are used per task, at least not like how we think about gpu_usage; all CPU threads/cores can be recruited to run whatever the app needs.
In my 4-thread system, when running an earlier version of the GW GPU app, I could run my two RX 570s with gpu_usage at 0.25 (8 concurrent tasks), but had to lower cpu_usage to 0.4 (or was it 0.3?) to have the boinc-client run all eight tasks, otherwise it would limit it to 6 or 7 - I can't recall. Also, the task scheduler looks at cpu_usage and the lower it is, the more tasks it will assume your system is capable of running and thus download into your queue. In that respect it functions somewhat like the boinc Computer Preferences function of 'days of work'.
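For anyone unfamiliar with the file format, the setup described above (two GPUs at gpu_usage 0.25 and cpu_usage 0.4) would look something like this in app_config.xml, placed in the project's directory. The app name shown is an example; check your client's event log or client_state.xml for the actual name the project uses:

```xml
<app_config>
  <app>
    <name>einstein_O2MD1</name>  <!-- example name; verify against your client logs -->
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>  <!-- 4 tasks per GPU, so 8 across two GPUs -->
      <cpu_usage>0.4</cpu_usage>   <!-- budget figure only; the app uses what it needs -->
    </gpu_versions>
  </app>
</app_config>
```

After editing the file, use "Options -> Read config files" in the BOINC Manager (or restart the client) for the change to take effect.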
When you say that the gamma ray tasks run pretty well with 4 concurrent, do you mean that their normalized task time is lower than with 3 concurrent (run time/4 < run time/3)? If so, that's great! If it's about the same time (or longer), then consider the additional power used per task completed.
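The comparison above is just arithmetic, but it's easy to get backwards, so here is a quick sketch with made-up placeholder run times (substitute your own measurements):

```python
# Compare normalized task time: is 4 concurrent actually faster per task
# than 3 concurrent? The run times below are invented examples only.

def time_per_task(wall_clock_run_time: float, concurrency: int) -> float:
    """Effective time to finish one task when `concurrency` tasks run
    together and each takes `wall_clock_run_time` seconds wall-clock."""
    return wall_clock_run_time / concurrency

t3 = time_per_task(1800, 3)  # 3 tasks, each ~30 min -> 600 s per task
t4 = time_per_task(2200, 4)  # 4 tasks, each ~37 min -> 550 s per task

if t4 < t3:
    print("4x gives more throughput than 3x")
else:
    print("stick with 3x, or weigh the extra power draw per task")
```

If the per-task time barely improves, the extra concurrency is mostly costing you power and VRAM headroom for no gain.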
For FGRPB1G tasks, PCIe channels don't much matter. It will run at 1x, and do pretty much the same from 2x-16x, as I recall from posts from folks using risers.
Ideas are not fixed, nor should they be; we live in model-dependent reality.
I'm a bit puzzled by your comments. I had a look at the computers linked to the account you are posting under - and there are quite a few. The problem is that the most recently active host in the list last made contact with the project in Jan 2017. Is your machine with the 5500XT attached to a different account?
I was looking to see what other hardware you might have so as to understand the PCIe 8x comment.
I know nothing about a 5500XT but if you have a good, fast CPU with sufficient cores, you may well be able to continue running it at 3x. I have a 6C/12T Ryzen 5 2600 driving a 4GB RX 570 that has been running GW tasks 3x (it used to run 4x) quite happily. With the change in task type (a different pulsar source of GW being analysed) I've needed to reduce to 2x in order not to have some tasks error out due to lack of VRAM. In experiments, I have seen 3 VelaJr1 tasks able to run but not consistently so. Everything is fine at 2x - except for a loss of overall output. If you have 8GB, I would have thought 3x should be OK. The only way to know for sure is to experiment.
Here is a brief summary of what to expect with the two different GPU searches, FGRPB1G (gamma-ray pulsar search) and O2MDF (multi-directed search for continuous GW using O2 LIGO data targeting 3 nearby pulsars). There are bound to be others reading so I'll try to cater for them as well. The comments relate to AMD GPUs. It may be quite different for nvidia.
The FGRPB1G search uses the GPU almost completely so that (for AMD GPUs) very little CPU support is needed. Running a second concurrent task can increase throughput by perhaps 7-10% or thereabouts. For more than 2, the further gains are minimal and the risk of 'glitches' causing task failures or crashes seems to increase. This tends to make it 'not worth the effort'. The app is quite mature and reliable and perfect for those wanting 'set and forget' behaviour.
The O2MDF search uses an immature app that doesn't run all parts of the calculation on the GPU. Significant sections of the code are difficult to parallelize - so the CPU is heavily involved for those bits. This may improve over time. Memory requirements have been steadily increasing as the 'frequency' component being analysed has been increasing. This is in addition to sudden changes when the pulsar being analysed changes. The indications are that there probably won't be larger requirements than currently exist.
Because of the 'under-utilization' of the GPU, CPU strength is very important and (with the right CPU) there is considerable scope for improving throughput by running extra concurrent tasks - as many as the available memory will allow. The only way to really know is to perform experiments and be prepared to change when either pulsar type or frequency being analysed changes significantly.
With regard to the use of app_config.xml to run concurrent tasks: the most important fact to fully understand is that the cpu_usage parameter you specify in the file does not in any way constrain what the app will actually use. The app will 'consume' whatever it needs, irrespective of the value you set. The figures you set are only used by the BOINC client for budgeting purposes. In other words, if the app already has sufficient CPU support available, increasing the value further will not make it run faster. What it will do is potentially prevent the client from running more CPU-only tasks than it otherwise would be allowed to. If you under-specify, the client may take advantage of that and run more CPU threads, which would tend to cripple GPU performance by denying it the support it needs.
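The budgeting behaviour described above can be sketched roughly as follows. This is an illustration of the idea, not actual BOINC scheduler code: the client totals up the declared cpu_usage of candidate GPU tasks and stops starting new ones once that budget would exceed the available CPU threads:

```python
# Rough illustration (NOT BOINC source code) of cpu_usage as a budget:
# the declared value limits how many tasks the client will schedule,
# but never limits what a running task actually consumes.

def max_concurrent_gpu_tasks(ncpus: int, cpu_usage: float, gpu_slots: int) -> int:
    """How many GPU tasks the client will run at once, given the
    cpu_usage declared in app_config.xml and the number of GPU task
    slots (GPUs divided by gpu_usage)."""
    budget_limit = int(ncpus / cpu_usage)  # tasks the CPU budget allows
    return min(gpu_slots, budget_limit)

# 4-thread CPU, two GPUs at gpu_usage 0.25 -> 8 GPU slots:
print(max_concurrent_gpu_tasks(4, 1.0, 8))  # cpu_usage 1.0 caps it at 4
print(max_concurrent_gpu_tasks(4, 0.4, 8))  # budget allows ~10, so GPU slots (8) are the limit
```

This matches the empirical report earlier in the thread: lowering cpu_usage from 1.0 to ~0.4 was needed before the client would actually run all eight GPU tasks on a 4-thread machine.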
So, before anyone can give you proper advice, you need to specify (or allow to be viewed) the exact hardware you propose using. You also need to be very clear about your desired mix of both CPU tasks and GPU tasks - and what different projects you intend to support. It can be quite a complex situation, perhaps requiring a fair bit of 'suck it and see', unfortunately :-).
EDIT: I focused on the message content rather than the thread title so hadn't noticed the CPU details. That CPU would be fine for FGRPB1G but I suspect the old architecture and lower frequency may have a significant impact on O2MDF GPU task performance. I found that problem for similar generation quad core CPUs, even for low GPU task concurrency. Admittedly, that was in the early days of this GPU app and the bug fixing changes that have occurred may have had some impact on that.
Cheers,
Gary.
Yes, I'm currently running via the Gridcoin pool; my machine is here: https://einsteinathome.org/host/12808945
The RX 5500 XT only runs at PCIe x8 rather than the full x16 the slot supports; that's just how they designed it...
I seem to be able to run 4 concurrent FGRPB1G tasks just fine, but I've had to drop down to 2 concurrent O2MDF tasks because even with only two tasks running, VRAM usage sometimes goes over 7GB. Before these new "VelaJr" tasks I could run 3 concurrent, as each task would use about 1.5-1.7GB of VRAM.
I try to give every GPU task a thread in O2MDF (I also run WCG, and it uses 10 of the 12 threads), but I'm not sure if it needs more? With FGRPB1G I have it set to only 0.4 CPU in app_config and that seems to be enough.
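Putting the settings described in this post into one file, the app_config.xml would look roughly like the sketch below. The FGRPB1G app name (hsgamma_FGRPB1G) and the GW app name are assumptions here; verify both against the names your client's event log reports before using them:

```xml
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>   <!-- assumed name; check your event log -->
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>  <!-- 4 concurrent tasks on one GPU -->
      <cpu_usage>0.4</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einstein_O2MD1</name>    <!-- assumed name; check your event log -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>   <!-- 2 concurrent tasks, to stay under 8GB VRAM -->
      <cpu_usage>1.0</cpu_usage>   <!-- budget one full thread per GW task -->
    </gpu_versions>
  </app>
</app_config>
```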