Curious: on my Linux host with an AMD GPU I am seeing a run time / CPU time of 1081/525 s, but on another member's Windows PC it is 1346/1061 s. The WUs are both "Gravitational Wave search O2 Multi-Directional v1.10 () x86_64-pc-linux-gnu". The Windows PC is using an AMD Radeon (TM) R9 390 Series (8192MB) while I am running an AMD Radeon (TM) RX 480 Graphics (8097MB). I suppose the CPU time difference of 525 vs 1061 could be attributed to the GPUs.
I think it could be a 'Linux vs Windows' thing. I'm running 3x with an RX 580 on another host. Total run times fluctuate somewhat (and occasionally there are some strange black sheep among them), but the CPU time / run time factor is very consistently 0.7 for that RX 580 + Windows host. I checked my two R9 390 + Windows hosts and the same factor for them is consistently 0.8. None of those systems is currently set up for dual-boot; it would've been nice to find out what the CPU times would be under Linux. Hmm, I have a faint memory that something similar about the CPU time of a GPU application on Linux vs. Windows has been discussed on this forum before.
Edit: I see you have a Ryzen CPU in that host. Perhaps different types of CPUs and systems affect the CPU times in general with this new GW GPU app. I don't remember how it was with the previous GPU applications.
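For anyone who wants to check this on their own hosts, here is a minimal Python sketch (my own illustration, not part of the original posts) of the CPU time / run time factor calculation, using the two examples quoted in the first post:

# Minimal sketch of the cpu time / run time factor discussed above.
# The (run time, cpu time) pairs, in seconds, are the Linux RX 480 and
# Windows R9 390 examples from the first post.
samples = {
    "RX 480 + Linux":   (1081, 525),
    "R9 390 + Windows": (1346, 1061),
}

for host, (run_time, cpu_time) in samples.items():
    print(f"{host}: cpu/run factor = {cpu_time / run_time:.2f}")

# Prints ~0.49 for the Linux host and ~0.79 for the Windows host, in line
# with the 0.7-0.8 factors reported above for the Windows machines.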
Wow, these GPU O2MD1 tasks run really fast compared to their CPU counterparts: 190 seconds compared to 25K seconds, roughly a 130-fold speedup.
Edit: Also, it looks like the app got updated from 1.09 to 1.10.
In this message in the O2AS discussion thread, I posted crunch-time data for the V1.09 GPU app running on 4 different CPU/GPU combinations at task multiplicities up to 4x. The results showed a significant improvement in output (i.e. a reduction in secs/task) in all cases at the higher multiplicities. With the advent of the O2MD1 search using GPUs, I'm keen to get similar information for the new V1.10 app. It's a different type of search (directed at specific targets rather than covering the whole sky), so performance is likely to be quite different.
I decided to use 2 hosts from the previous tests, the Q6600/RX 460 and the i5-3470/RX 570, which were the 1st and 4th in the previous list. Both hosts got work for frequencies around the 215 Hz mark, well above the low-end values reported by others earlier on.
I found these tasks crunched very quickly. There are already enough returned results to give some idea of expected crunch times, so I'll give details here using the same format (columns, abbreviations, etc.) as previously. Each concurrent GPU task had the support of a full CPU core.
CPU / GPU (Cores / Threads / GHz)     Tsks  Multi    Pnd  Val  Inc  Inv  Err  Productivity values (secs/task)
===================================   ====  =======  ===  ===  ===  ===  ===  ===============================
Q6600 / RX 460   (4C / 4T / 2.4 GHz)    20  1,2,3     20    0    0    0    0  1300s, 975s, 712s
i5-3470 / RX 570 (4C / 4T / 3.2 GHz)    28  1,2,3,4   28    0    0    0    0  586s, 380s, 330s, 312s
Only small numbers of tasks were crunched at the lowest multiplicities, just enough to get a basic value for the crunch time. The bulk of the results were at 3x for the RX 460 and 4x for the RX 570. The crunch times became rather more variable for the 570 at 4x, so I didn't try 4x for the 460 or anything higher than 4x for the 570. There was good consistency in the times for both hosts up to 3x.
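For reference, multiplicity like this is normally set with an app_config.xml in the project directory. A minimal sketch for 3x with a full CPU core reserved per task follows; the app name einstein_O2MD1 is an assumption here, so verify it against what your own client reports:

<app_config>
  <app>
    <!-- assumed app name; check your client's app list -->
    <name>einstein_O2MD1</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage> <!-- 3 concurrent tasks per GPU -->
      <cpu_usage>1.0</cpu_usage>  <!-- one full CPU core per task -->
    </gpu_versions>
  </app>
</app_config>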
The CPU time component of each task was surprisingly constant irrespective of multiplicity. I guess that suggests a fairly constant amount of CPU work per task, which shows up as a relatively uniform time if there's always a full core available; the slower CPU simply takes longer to provide that constant amount of work. Here is a small table showing the typical elapsed time and CPU time values for both hosts at the multiplicities used.
GPU Type  Multi  Elapsed (s)  CPU (s)  Tsks
========  =====  ===========  =======  ====
RX 460      1x          1300      496     1
RX 460      2x          1950      509     4
RX 460      3x          2135      452    15
RX 570      1x           586      278     1
RX 570      2x           760      278     4
RX 570      3x           990      286     3
RX 570      4x          1246      310    20
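To make the arithmetic behind both tables explicit, here is a small Python sketch (again my own illustration, not from the posts) that derives the secs/task productivity figures and the cpu/run factor from the rows above:

# Per-task output and cpu/elapsed factor from the table rows.
# Each tuple is (gpu, multiplicity, elapsed secs, cpu secs).
rows = [
    ("RX 460", 1, 1300, 496), ("RX 460", 2, 1950, 509),
    ("RX 460", 3, 2135, 452), ("RX 570", 1,  586, 278),
    ("RX 570", 2,  760, 278), ("RX 570", 3,  990, 286),
    ("RX 570", 4, 1246, 310),
]

for gpu, multi, elapsed, cpu in rows:
    per_task = elapsed / multi  # secs/task, as in the productivity column
    factor = cpu / elapsed      # the cpu time / run time factor
    print(f"{gpu} {multi}x: {per_task:5.0f} s/task, factor {factor:.2f}")

# The RX 460 at 3x gives 2135/3 = 712 s/task, matching the first table,
# and the factors come out around 0.2-0.5, well below the 0.7-0.8 seen
# on the Windows hosts earlier in the thread.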
All results so far are pending. Since it may be a while before any validations are performed, I've switched the hosts back to FGRPB1G until it becomes clear that validation is OK. I don't see much point in crunching more until we see how validation goes.
Cheers,
Gary.
Zalster wrote: Wow, these GPU O2MD1 tasks run really fast compared to their CPU counterparts. 190 seconds compared to 25K seconds.
Looks like that particular GPU can finish its duty cycle before the CPU gets its own workload done... run times are smaller than the CPU times. That's computational speed metal!
For the v1.10 app on my two RX 570s, running at 3X, I have 10 valids, with 280 pending and no errors or invalids yet. That's looking hopeful!
Ideas are not fixed, nor should they be; we live in model-dependent reality.
cecht wrote: For the v1.10 app on my two RX 570s, running at 3X, I have 10 valids, with 280 pending and no errors or invalids yet.
Yes, except for: https://einsteinathome.org/goto/comment/173777
robl wrote: Yes, except for: https://einsteinathome.org/goto/comment/173777
Ahh, the joy of beta testing. :/
Ideas are not fixed, nor should they be; we live in model-dependent reality.
cecht wrote: Ahh, the joy of beta testing. :/
After I did some initial testing yesterday, I got quite a few more tasks but then decided not to run them. I was tempted to think all would be well, but ... so back to FGRPB1G they went for the overnight run. As I survey the scene this morning, I'm sure glad I was cautious :-).
Unfortunately, a new app with more sensitivity sounds like longer crunch times ... I guess the dramatic speed increase we were seeing may just have been too good to be true :-).
Cheers,
Gary.
Our internal tests showed a runtime increase of about 20% (both CPU and GPU). We thought this to be justified.
BM