We run 4x on the RTX A4500 GPUs. I cannot say it was much different for us when running 3x. I have not tried 5x, but GPU utilization looks like it "stays" at 100%. Of course, there are always the "convergence" moments where two (or three) work units coincidentally move from the GPU to the CPU at the same time, but even with an effort to offset them, this will still happen.
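For anyone who wants to reproduce a 4x setup: the per-GPU task count is normally set through BOINC's app_config.xml in the project's directory. Below is a minimal sketch; the <name> value is my assumption for the GW app's short name, so check it against the app names in your client_state.xml before using it.

```
<app_config>
    <app>
        <name>einstein_O3AS</name>  <!-- assumed short name; verify in client_state.xml -->
        <gpu_versions>
            <gpu_usage>0.25</gpu_usage>  <!-- 0.25 GPU per task = 4 tasks per GPU -->
            <cpu_usage>1.0</cpu_usage>   <!-- reserve one CPU thread per task -->
        </gpu_versions>
    </app>
</app_config>
```

After editing the file, the BOINC manager's "read config files" option applies it without restarting the client.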
Weber462 wrote: When I ran 5x on my RDNA 2 cards, I ran into reboots. I don't know if it is a coincidence or not. I went down to 4x and it runs great.
I suspect that I may have the same problem once I try 5x. 4x took 80-odd percent, so the question is whether the 5th task will leave enough elbow room for everything to work. It will be several days before I hold my breath and jump in for that 5th task.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
I was looking around, and the RTX 3090 has 24 GB of memory. What do you want to bet it could run at least 8 tasks per GPU?
Ditto for any other video card with more than 12 GB.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Hi, just want to throw in my 2 cents... I was planning to make a more detailed post with the findings on my GPU performance, but since the discussion is happening now... I'm still testing, but I'm running 6 concurrent GW tasks on my RX 7800 XT (13168644). It's doing great, but I'm reaching the limit of what the Ryzen 5 3600 can do, haha.
There might be the VRAM to run 8, but based on what I have seen, other factors such as actual GPU utilization would max out well before you got there.
So far, my system's jump from 3x to 4x processing time has not been linear. I am guessing that, since I have been running 99% GPU utilization ever since 2x, reported utilization is not the limiting factor.
===edit===
I propose that total production will go up the more tasks per GPU a system runs, even if GPU utilization is already at 99%. This supposition rests on the non-trivial amount of processing each CPU thread has to do while its GPU work is essentially suspended.
While I am sure there is an upper limit to the increase in total production of a large-VRAM GPU, I don't believe we have enough empirical evidence to know what it is yet.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
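Tom M's supposition above can be put into a rough back-of-envelope model. Suppose each work unit alternates a GPU phase of length g with a CPU-only phase of length c, during which its GPU slot sits idle unless another task fills it; the numbers below are invented for illustration, not measurements. Note that the driver's utilization counter is coarse (it reports the fraction of the sample period in which any kernel was running), so a near-99% reading does not rule out gaps of this kind.

```python
# Toy throughput model: one GPU shared by n concurrent tasks.
# Each work unit needs g seconds of GPU time and c seconds of
# CPU-only time per cycle (illustrative values, not measurements).
# The GPU runs one task's GPU phase at a time; CPU phases overlap.

def tasks_per_hour(n, g=240.0, c=720.0):
    per_task_rate = 1.0 / (g + c)   # rate of a single task with no overlap
    gpu_cap = 1.0 / g               # ceiling once the GPU is saturated
    return 3600.0 * min(n * per_task_rate, gpu_cap)

for n in range(1, 7):
    print(f"{n}x: {tasks_per_hour(n):.2f} tasks/hour")
```

With these invented numbers the gains stop at 4x; on real work units the phase lengths vary, which is exactly the upper limit the proposed test should locate.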
Tom M wrote: While I am sure there is an upper limit to the increase in total production of a large-VRAM GPU, I don't believe we have enough empirical evidence to know what it is yet.
Fair enough. Let's test.
Here is what I will do. We run this app on our "twin" (identical) 24-core Threadripper systems, each with a single RTX A4500 GPU. Right now, they are running these work units at 4x. Starting today, and through next week, I will change the number of work units running simultaneously. Since both systems are absolutely identical, I can cover twice as much "ground". Keep in mind, our results on these systems will NOT be the exact results someone else would obtain on a different system.
Test systems:
CPU: AMD Ryzen Threadripper PRO 5965WX (128 MB cache, 24 cores, 48 threads, 3.8 GHz to 4.5 GHz)
RAM: 64 GB (8x8 GB) DDR4-3200 RDIMM ECC
GPU: NVIDIA RTX A4500
Storage: 2 TB M.2 PCIe NVMe SSD, Class 40
We have WCG running on these systems as well; that work has been steady and I want to keep it running. If there is a lapse in WCG work, we will void those results, since clock speeds would rise with idle cores. This CPU is fast and usually runs at around 4.1 GHz across fully loaded cores.
We will collect data for ~7 hours each day and then determine mean times. I will not be able to actually watch the application running and record times, so we will use the times shown in BOINC. I will disconnect from the internet so these work units just accumulate over the ~7 hours.
I will compile the results in a spreadsheet and share them as I collect them.
If anyone wants to add any test ideas, just let me know.
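Since the times will be read out of BOINC rather than observed live, one low-effort way to get mean elapsed times is to parse the client's per-project job log (for this project, job_log_einstein.phys.uwm.edu.txt in the BOINC data directory), which appends one line per completed task. A sketch, assuming the standard job-log line format:

```python
# Mean/median elapsed time per completed task from a BOINC job log.
# Assumed line format (standard BOINC job log):
#   <unix_time> ue <est> ct <cpu_time> fe <flops> nm <task_name> et <elapsed>
import statistics

LOG = "job_log_einstein.phys.uwm.edu.txt"  # adjust path for your data directory

def elapsed_times(path):
    times = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if "et" in fields:
                # elapsed time is the value following the "et" tag
                times.append(float(fields[fields.index("et") + 1]))
    return times

times = elapsed_times(LOG)
print(f"{len(times)} tasks, mean {statistics.mean(times):.0f} s, "
      f"median {statistics.median(times):.0f} s")
```

Filtering the lines by timestamp would separate each day's ~7-hour run before averaging.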
Thank you.
My system is running a mix of Universe@Home and a small number of WCG CPU tasks. My GPU is at 4x, with production apparently rising. I will be trying 5x next.
I am assuming a stepwise progression of, in your case, 5, 6, 7, 8, etc.
Your results should be very interesting! At least to me.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Tom M wrote: I am assuming a stepwise progression of, in your case, 5, 6, 7, 8, etc.
Correct. Right now, one system is running 1x and the other 2x. We all know this is not ideal, but it gives us a baseline to work from. I will increase by one work unit every day until I crash the system (20 GB of VRAM on the A4500). I will not be running these tests over the weekends.
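For reference against the app_config sketch earlier in the thread, stepping from 1x up through 8x just means setting gpu_usage to 1/n (BOINC runs floor(1/gpu_usage) tasks per GPU):

```python
# gpu_usage value for n concurrent tasks per GPU
for n in range(1, 9):
    print(f"{n}x -> gpu_usage {1.0 / n:.3f}")
```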
Keith Myers wrote: Question for the devs. They supply the files for some unknown reason.
Just a quick reply to this question: yes, as already suspected here, this is essential for splitting a workunit in two. It works a bit like this: take the values from the command line, then add the options in said files to the command line, one by one for the two halves, so that the command lines for the two halves differ in one option, which applies an offset in search frequency.
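To make those mechanics concrete, here is a sketch of the scheme as described: one base command line, plus one extra option per half taken from the supplied files, so the two halves differ only in the frequency-offset option. The app name, option names, and values are all illustrative assumptions, not the application's real flags.

```python
# Sketch of the workunit-splitting scheme described above.
# "einstein_gw_app", "--Freq", and all values are illustrative only.

base_cmd = ["einstein_gw_app", "--inputFile", "data.sft", "--FreqBand", "0.05"]

# One extra option per half, as read from the supplied files:
half_options = [["--Freq", "101.50"], ["--Freq", "101.55"]]

for i, extra in enumerate(half_options, start=1):
    cmd = base_cmd + extra  # the two halves differ only in this option
    print(f"half {i}:", " ".join(cmd))
```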