How does this respond to the setting for the number of CPU threads you can specify in the BOINC app_config? Did you vary this with the affinity, or did you use the same ratio while testing?
I don't run any BOINC work on the machine save for Einstein Gamma-Ray Pulsar GPU units. So there is no opportunity for BOINC to manage the amount of non-GPU work launched--it is not launching any.
To answer your question more directly, I had to go look to see what I had in the way of an app_config.xml. The answer is that I have not changed it since May, 2017, when I configured <fraction_done_exact/> for three Einstein applications. So nothing at all about "settings of amount of cpu threads" there on my machines.
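For reference, that sort of app_config.xml entry looks roughly like the sketch below. The application name here is illustrative, not confirmed from the post (the exact names can be read out of the project's client_state.xml), and one such `<app>` block would exist per application:

```xml
<app_config>
  <app>
    <!-- name is illustrative; real names appear in client_state.xml -->
    <name>hsgamma_FGRPB1G</name>
    <!-- report progress exactly as the app computes it -->
    <fraction_done_exact/>
  </app>
</app_config>
```

Note that nothing in this fragment touches CPU thread counts, consistent with the answer above.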
Your experiment with feeding the GPU different amounts of CPU got me thinking.
I had already noticed a bit of a boost from disabling SMT on the CPU, but you lose half of your threads, so overall productivity beyond just BOINC suffers a decent amount. So I just kept SMT on after that and gave one CPU thread per GPU WU to crunch.
Now my Threadripper 2950X has a lot of cores, but it probably has less throughput per thread than the i5 gets from a complete core.
Also, the Radeon VII has much higher throughput than the RX 570, so the GPU would still be running underfed.
Since you had a decent boost going from one to two cores, I decided to follow suit.
I now have 4 threads/2 cores assigned to each WU on the GPU. I run 2 WUs because at 3 the VII would get unstable over longer runs and give more errors or invalids.
I used to be in the upper-6-minute or low-7-minute range for an FGRP task (I also run the GPU underclocked, so I know it's a bit on the long side).
I got it down to 6.17 min, so that's anywhere between half a minute and almost a full minute faster on average. That's worth the extra cores/threads.
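To put a half-minute-to-a-minute gain in throughput terms, here is a back-of-the-envelope sketch. The numbers assume 2 tasks running in parallel and read "6.17 min" as decimal minutes; both are my assumptions, not stated in the post:

```python
# Rough throughput arithmetic for the improvement described above.
def tasks_per_day(task_minutes: float, concurrency: int) -> float:
    """Completed tasks per day when `concurrency` tasks run in parallel,
    each taking `task_minutes` of wall time."""
    return concurrency * 24 * 60 / task_minutes

before = tasks_per_day(7.0, 2)   # low-7-minute starting point
after = tasks_per_day(6.17, 2)   # improved average
print(round(before), round(after))  # 411 467
```

So even a 50-second-per-task improvement is worth roughly 55 extra tasks a day at 2X under these assumptions.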
On my Linux system with a 2-core, 4-thread Pentium G5600, I have been running two RX 570s with the mining BIOS and with 0.5 GPU and 0.25 CPU specified in app_config.xml. In accord with past discussion threads, I found no real difference in task times when I increased to 1 CPU per task:
Cores Average elapsed time
0.25 20:06
1 20:04
At 1 core/task, CPU utilization is ~11% on each core (thread).
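For context, the 0.5 GPU / 0.25 CPU allocation described above is set per application in app_config.xml, along the lines of this sketch (the application name is my assumption; check client_state.xml for the real one):

```xml
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <!-- 0.5 GPU per task => two tasks share each GPU -->
      <gpu_usage>0.5</gpu_usage>
      <!-- CPU budget used by the scheduler; not a hard throttle -->
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Worth noting: `<cpu_usage>` only tells the BOINC scheduler how much CPU to reserve when planning work; it does not actually limit the support task, which is one plausible reason fractional values show no timing difference here.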
How does no difference with fractional cores gibe with faster times with >1 cores per task?
I'm going to try 0.33 GPU per task now.
Edited to add:
With 3x tasks, times were a little better for a few extra watts:
Cores Average elapsed time
0.5 29:40
and CPU utilization was about 20%
With 2x tasks running at 2 CPUs each (I could run only one GPU), task times were essentially the same as with other 2x runs:
Cores Average elapsed time
2 20:07
Ideas are not fixed, nor should they be; we live in model-dependent reality.
A set of observations on CPU affinity restrictions:
I recently raised the multiplicity on my second RX 570 from 2X to 3X and got a nice little productivity improvement, with a small enough power consumption increase to make me decide to keep it. While I was fiddling, I decided to spend a couple of days restricting the GPU support tasks (via the CPU affinity controls in Process Lasso) to varying numbers of cores on the 6-core, non-hyperthreaded i5-9400F. I averaged each test condition over several hours. I continue to run the RX 570 at a -20% power limit, imposed by MSI Afterburner, with an Afterburner fan curve that has it reporting 62C GPU temperature most of the time. This is an XFX-brand RX 570 with the BIOS switch in the "mining" position.
The results were simple: restricting to a single core does noticeable harm, but the other five options are surprisingly similar with the (slightly) best result observed with three allowed cores (the same as the number of GPU tasks). My longstanding observation that restricting the GPU support task to anything less than all available cores always does harm was not borne out. This does support my long-standing advice that it is better to test than just to invoke "known truths" for these settings.
Cores Average elapsed time
1 31:14
2 30:18
3 30:12
4 30:14
5 30:16
6 30:17
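For anyone who wants to reproduce this kind of core restriction on Linux, the standard library can do what Process Lasso did here on Windows. This is a sketch of the idea only; `os.sched_setaffinity` is Linux-specific:

```python
import os

def restrict_to_cores(pid: int, cores: set) -> set:
    """Pin a process to the given CPU set and return the resulting mask.
    Linux-only; Process Lasso provides the equivalent control on Windows."""
    os.sched_setaffinity(pid, cores)
    return os.sched_getaffinity(pid)

# Allow up to three cores (matching the 3X multiplicity above),
# clipped to whatever this machine actually has. pid 0 = this process.
available = sorted(os.sched_getaffinity(0))
target = set(available[:3])
print(restrict_to_cores(0, target) == target)  # True
```

Pinning the GPU support process this way is the same experiment as above: vary the size of `target` and compare average elapsed times.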
I can see that the more concurrent tasks a GPU is running, the more CPU threads you'll need dedicated. The crunch time at the end of a task is a period where limited CPU time for GPU tasks can slow things down.
I run 2x tasks with 1 CPU thread kept open via Process Lasso. I didn't check GPU times like you did, but for the most part GPU utilization stayed pegged with 1 CPU open, with dips during task switching, etc.
My second RX 570 machine had an anomaly during the night. When I did my morning productivity logging I noticed a downtick in fleet production. All three current tasks on the affected machine showed about six hours of run time, the GPU reported temperature was down by tens of degrees from usual, and the wall power meter showed 45 watts instead of 168.
Rebooting put the power levels up part way, but not the full way, so I downloaded and installed the latest AMD driver, selecting "clean install" in case some setting had gone amiss.
After all that, I have over an hour of successful running. Oddly, the elapsed times, which had been running about 30:10, have now improved to about 28:02.
As rebooting gave me some of the usual Windows indications of update activity, I speculate that perhaps my middle of the night "failure" may have been a consequence of Windows updating activity replacing or altering the state of my AMD driver in a way inconsistent with Einstein success.
One would think that by now MS KNOWS of the problem and would STOP doing it, but noooo the idiots continue on their merry way like mice following some guy with a flute!!
Sorry for your down time. Hope you get that new card crunching soon. When you do, don't forget to flip the dual BIOS switch to the mining position. As far as I know, XFX are the only 570 cards that offer that golden opportunity for faster tasks and lower power with a pre-loaded mining BIOS.
So today I finally tried to flip the switch on my first RX 570, and I think it is now worse (whereas the second one eventually got noticeably better). Possibly it was delivered to me in mining position. Which way is which?
On the Sapphire RX580 the mining or silent BIOS comes with a 122W power limit, while the default BIOS was 170-180'ish... So flipping the switch and booting the machine, it was instantly clear that it worked. Temps were low, clocks were lower, PL was fixed at 122W, but just using 80W under load. Awesome!
Possibly it was delivered to me in mining position. Which way is which?
Mining position is toward the ports (what I call the front of the card, which is at the back of the machine). It's odd though - you should have seen a big immediate difference, as Koschi described it, whichever way the switch was flipped.
Ideas are not fixed, nor should they be; we live in model-dependent reality.
I currently operate a flotilla of just three Einstein machines, all single-GPU with AMD cards (one Radeon VII and two RX 570s). As of yesterday, I've given up on all three of them as regards running 3X on current Einstein Gamma-Ray Pulsar (1.18) work.
The detailed symptoms vary a bit from machine to machine, and as all three can run successfully (with improved productivity and power efficiency compared to 2X on the same work) for hours or days, I am not dead sure as to cause and effect.
On the two RX 570 machines, the two primary seemingly 3X-related issues of concern are:
1. "Validate error" -- the kind that error out without comparison when the sanity check done after quorum fulfillment flunks.
2. Sloth Mode -- From one minute to the next, the GPU reported temperature drops by roughly 10C, the CPU consumption by the support task drops by over a factor of ten, and the completion elapsed time for tasks roughly triples or worse, yet the core clock and memory clock rates reported by GPU-Z remain unchanged.
In many months of running, on five different Nvidia card models and two different AMD card models, I'm accustomed to getting zero "validate error" cases (yes, I get very roughly 1% "Completed, marked as invalid", but that is something different), so I find that alone troublesome.
The Sloth Mode behavior is puzzling, and I frankly suspect the clock rate reported is false, as otherwise the low power consumption seems implausible. As I have sometimes not noticed it for hours, and it requires at least a reboot, and sometimes a driver re-install to escape, Sloth mode is unacceptable to me at any appreciable rate. I've not gotten so much as three straight days of running 3X on an RX 570 machine without dropping into Sloth Mode.
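A crude way to catch a Sloth Mode transition automatically would be to watch for the combined signature described above: a sharp drop in reported GPU temperature together with a collapse in support-task CPU use. A minimal sketch over hypothetical one-minute samples; the threshold values are guesses for illustration, not measurements:

```python
def sloth_mode_suspected(prev, curr,
                         temp_drop_c=8.0, cpu_drop_factor=5.0):
    """prev/curr are (gpu_temp_c, support_task_cpu_pct) samples taken
    about a minute apart. Returns True when temperature falls sharply
    while CPU use collapses -- the Sloth Mode signature described above.
    The thresholds are illustrative guesses, not measured values."""
    temp_fell = prev[0] - curr[0] >= temp_drop_c
    cpu_collapsed = curr[1] > 0 and prev[1] / curr[1] >= cpu_drop_factor
    return temp_fell and cpu_collapsed

print(sloth_mode_suspected((62.0, 11.0), (51.0, 0.9)))   # True: Sloth Mode
print(sloth_mode_suspected((62.0, 11.0), (61.0, 10.5)))  # False: normal running
```

Fed from whatever sensor tool is at hand (GPU-Z logging, for instance), a check like this could raise an alert hours sooner than a morning productivity review.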
On my Nvidia cards running this application, 3X gave too little a productivity boost to leave me wanting to let the third instance have enough CPU to be happy, but the AMD cards all gave a nice little increment of performance. It is a pity to give it up.
Perhaps others might mention here whether they succeed or fail in running 3X for Einstein GRP work on Polaris (e.g. RX 570) or Radeon VII cards.