for the new O3ASBu tasks, the recalc step is such a short relative portion of the runtime that I'm not overcommitting nearly as much as I was with O3ASHF, and CPU use isn't really slowing them down since such a large part of the task runs in the GPU-only stage
so for 2x tasks, running MPS at 70%
for 3x tasks, running MPS at 40%
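For context on how those percentages get applied: CUDA MPS caps each client process at a fraction of the GPU's SMs via the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable, so the app (or the BOINC client that spawns it) has to be launched with it set. A minimal sketch; the launch command is hypothetical and left commented out:

```python
import os

# CUDA MPS limits a client's share of the GPU's SMs via this variable;
# per the settings above, 70% pairs with 2x tasks and 40% with 3x.
env = dict(os.environ, CUDA_MPS_ACTIVE_THREAD_PERCENTAGE="70")

# Hypothetical launch of the BOINC client under that environment, e.g.:
# import subprocess
# subprocess.run(["/usr/bin/boinc", "--dir", "/var/lib/boinc"], env=env)

print(env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"])  # -> 70
```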
using the v1.08/v1.15 CPU app, my recalc step only takes about 1-2 minutes on my EPYC Rome systems, allowing CPU load from other projects to stay around 75%. adding only 1-2 minutes to a task that runs for ~2700s is hardly any impact: less than 5% of the overall runtime for me, compared with O3ASHF tasks where the CPU recalc would take at least 50% of the runtime.
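That "less than 5%" is easy to sanity-check; a quick back-of-envelope sketch using the round numbers above (roughly 2 minutes of recalc on a roughly 2700 s task):

```python
# Fraction of a task's runtime spent in the CPU recalc step,
# using the approximate figures quoted above.
recalc_s = 120    # ~2 minutes of CPU recalc
runtime_s = 2700  # total task runtime

print(f"recalc overhead: {recalc_s / runtime_s:.1%}")  # -> recalc overhead: 4.4%
```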
I'm kind of surprised that it's taking 5 mins on that Threadripper system, even if it's only a Zen1+ chip. are you running a bunch of CPU work on it too?
I have just switched my Windows box (Ryzen 5700G) over to brp7/meerKat. Who knows, I might even make it past 100,000 RAC.
Ian&Steve C. wrote: for the new O3ASBu tasks, they have such a short relative time in the recalc step [...]
Thanks for the MPS info.
Yep, those systems are running full-time CPU work as well. That gives me an idea for a test to see how bad the bottleneck actually is. Let me see what happens when I stop CPU work, close BOINC to clear RAM, and then reopen with only GPU work. More out of curiosity than anything else.
EDIT/UPDATE: It took ~80 seconds when no other CPU work was active and RAM was cleared.
It took ~107 seconds when 50% of the cores were doing CPU work.
It took ~132 seconds when 75% of the cores were doing CPU work.
I understand the clock slows when more cores are loaded, but it's still a pretty severe bottleneck at high/full CPU usage. Now to find the "magic" CPU usage percentage that maximizes both core count and speed...
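A quick sketch of the arithmetic behind those three measurements (the timings are the rough ones above), turning them into slowdown factors relative to the idle-CPU baseline:

```python
# Recalc-step slowdown relative to the idle-CPU baseline,
# from the rough timings measured above.
baseline_s = 80  # recalc time with no other CPU work
loaded = {"50% cores busy": 107, "75% cores busy": 132}

for label, t in loaded.items():
    # e.g. 107 s is about a 1.34x slowdown, 132 s about 1.65x
    print(f"{label}: {t / baseline_s:.2f}x baseline (+{t - baseline_s} s)")
```

So even half-loading the cores costs roughly a third more recalc time on this box, and three-quarters load costs about two-thirds more.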
75% is pretty good :)
but again, it's not a huge loss anyway with the new tasks. you're only losing a tiny fraction of overall speed, with the CPU recalc portion extending by a couple of minutes when the overall runtime is so long anyway.
Ian&Steve C. wrote: 75% is pretty good :) [...]
For sure. I was also thinking in the context of all the other work on the CPU. It's a relatively insignificant improvement for this work, but it could be a significant improvement for strictly CPU work. Always something to explore!
Tom M wrote: I have just switched my Windows box (Ryzen 5700G) [...]
I have now run a few OS3GW tasks at 1x and more brp7 at 2x and 3x.
It looks like the highest production is brp7/meerKat at 2x.
===edit===
I have a GTX 1060 6GB lying around. So I will be installing it in the Windows box to see what kind of numbers I can get.
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
I am beginning to think that, for my Linux Titan V system, 40 percent MPS with OS3GW v1.08/v1.15 at 5x (5 tasks at once) is my best producer.
I suspect that if I still had an RTX 3080 Ti I would get similar results at either 5x or 4x.
Respectfully,
Tom M
Tom M wrote: I am beginning to think that for my Linux Titan V system [...]
Do you have times per task? I saw negligible difference between 3x and 4x on our Titan V (3x was slightly faster but not significantly).
looks like his effective per task time is about 1250s?
it's like 3% faster than my average times with me running 2x (70% MPS), so not really a big difference.
Boca Raton Community HS wrote: Do you have times per task? [...]
12-19-2024 21:39
Bumped v1.14 to 3x, still 40% MPS.
Was running about 2711s/task on 2x. About 1.27M/gpu.
1355s 1x equiv.
12-20-2024 7:08am
v1.14
3x is running about 3836s -> 1279s 1x equiv.
About 64.4 minutes
Switching to v1.15 at 3x.
12-20-2024 17:50
Slowest task listed is 4,022s -> 1340s 1x equiv.
About 67 minutes.
v1.15 at 3x, so what does 4x look like?
About 1.28M/gpu
12-21-2024 10:11
v1.15, 4x
5120s -> 1x equiv. 1280s
About 1.35M/gpu
The #'s I have for 5x are varying widely. I want to see what tomorrow brings.
For instance:
12-22-2024 10:22
Bumped OS3GW to 40%/v1.15/5x
12-22-2024 16:57
v1.15
5x
6938s / 5 = 1387.6s 1x equiv. highest pending
6683s / 5 = 1336.6s 1x equiv. valid
6301s / 5 = 1260s 1x equiv. pending
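Tom's 1x-equivalent arithmetic generalizes nicely into a throughput comparison; a small sketch using the rough batch times from the log above (at Nx, roughly N tasks finish per batch wall-clock time, so tasks per day is what actually measures production):

```python
# Effective per-task time and daily throughput per GPU at each multiplicity,
# using rough wall-clock times from the log above (n tasks -> seconds).
runs = {2: 2711, 3: 3836, 4: 5120, 5: 6683}

for n, t in runs.items():
    per_task = t / n         # 1x-equivalent seconds per task
    per_day = n * 86400 / t  # tasks completed per GPU per day
    print(f"{n}x: {per_task:6.1f} s/task, {per_day:.1f} tasks/day")
```

On these numbers 3x and 4x land within a fraction of a task per day of each other, while 5x falls back toward the 2x rate.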
So unless the average settles down into the 6300s range, no, 5x is not more productive. :(
I still need to go back and either re-analyze more of my Brp7/meerKat numbers or run some more tasks.
Respectfully
Ian&Steve C. wrote: looks like his effective per task time is about 1250s? [...]
So if that number is at 4x it would be a big difference? ;)
Tom M wrote: 12-22-2024 [...]
17:43
Dropped CPU load from 97% to 85%. Maybe more free threads will speed v1.15 up by tomorrow.
12-23-2024 12:45 pm
v1.15, 5x
cpu@85%
grabbed 6415s -> 1283s 1x equiv
lowest! 2052s
lower 4004s -> 800s 1x equiv
high 6938s -> 1387s 1x equiv
Oh, well.
==============
A longer run confirms more of the same, without any of the "lows".
So dropping back to 4x. Apparently I can get more total production at 4x.
I have previously switched the profile to start feeding brp7/meerKat into the cache. Once it starts processing those I will set the MPS to 70% to go with 2x.
Respectfully,
===edit===
Assuming the iffy CPU loads were not screwing up my testing.