Mikey, your spreadsheet needs a bit of updating to current project states: who runs what, who is still active, etc.
Dido wrote: I was wondering, ...
Yes, I do. The EPYC systems make great multi-GPU systems.
_________________________________________________________________________
Does anybody know why I can't merge some of my VMs from the "Computers" tab on the Web? I manually assigned different external IP addresses to some of them upon creation. Could that be the reason?
EDIT: I'm also seeing some weird GPU load dips from 98% to 55% on multiple GPUs in the load profiling software that the data center runs on the entire mainframe. I can confirm the CPU cores are at low utilization, so that's not the problem. Any ideas what could be causing that? Is there a debugging feature available for BOINC apps?
The app in question is "Gamma-ray pulsar binary search #1 on GPUs v1.24 () x86_64-pc-linux-gnu"
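In case it helps anyone reproduce the dips independently of the data center's profiler, here is a minimal sketch of how the per-GPU load could be sampled from inside a VM. It assumes an NVIDIA driver plus the nvidia-ml-py package; the 90% threshold, 1-second interval, and 10-minute duration are arbitrary choices of mine, not anything the profiler uses:

    # Minimal GPU-load sampler: logs whenever any GPU's utilization
    # drops below a threshold. Requires: pip install nvidia-ml-py
    import time
    import pynvml

    THRESHOLD = 90   # percent; anything below this is logged as a "dip" (my choice)
    INTERVAL = 1.0   # seconds between samples
    DURATION = 600   # total sampling time in seconds

    pynvml.nvmlInit()
    try:
        handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
                   for i in range(pynvml.nvmlDeviceGetCount())]
        end = time.time() + DURATION
        while time.time() < end:
            for i, h in enumerate(handles):
                util = pynvml.nvmlDeviceGetUtilizationRates(h)  # .gpu is a percentage
                if util.gpu < THRESHOLD:
                    print(f"{time.strftime('%H:%M:%S')} GPU{i}: {util.gpu}%")
            time.sleep(INTERVAL)
    finally:
        pynvml.nvmlShutdown()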
Keith Myers wrote: Mikey, ...
It's not mine, it's the official Grid Coin White List page, but yes, I agree: WCG is no longer run by IBM, for one, and Minecraft is no longer active, for another.
I'm sure someone can explain this better than me; if I'm wrong, please correct me. You mentioned in another post that the GPUs are FP32-capable. As far as I know, when a card can't do FP64, the FP64 portion of FGRPB1G tasks (the Gamma-ray app you mentioned) is done on the CPU, so the GPU load drops. But since you also mentioned the CPU cores are at low utilization, I don't really know what it could be.
Until last year I had a notebook with Nvidia MX-150 dedicated graphics crunching FGRPB1G tasks; the last 10% of each task took longer, and the GPU load dropped while the CPU load increased. But the MX-150 is FP64-capable, so I'm not sure what's happening.
EDIT: The Tesla T4 is also FP64-capable, like the MX-150.
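One thing worth keeping in mind: "FP64-capable" doesn't mean fast at FP64. On this class of NVIDIA GPU the double-precision rate is only a small fraction of the single-precision rate (spec sheets commonly quote roughly 1/32 for the T4 and MX-150; I'm taking both the ratio and the peak figure below from public spec sheets as assumptions, not measurements), so even if the FP64 stage stays on the GPU it can look like a stall in a utilization graph. Back-of-the-envelope:

    # Rough arithmetic only; the peak figure and the ratio are assumed
    # from public spec sheets, not measured on this hardware.
    t4_fp32_tflops = 8.1     # Tesla T4 peak FP32 throughput (assumed)
    fp64_ratio = 1 / 32      # typical FP64:FP32 rate for this GPU class (assumed)

    t4_fp64_tflops = t4_fp32_tflops * fp64_ratio
    print(f"T4 FP64 peak ~ {t4_fp64_tflops:.2f} TFLOPS")  # ~0.25 TFLOPS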
Dido,
Glad you got it running. On the regular leaderboard you would need all those systems running under a single computer ID to compete for the top individual system.
You should be able to compete for the top user however.
Your nodes will use considerably more power under load than when idling. Who is paying for it?
I note you are showing 46 distinct systems. Not 1,000 :)
May this activity not cause you harm.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
I thought I got it running, but no. If I deploy around 20 VMs concurrently, everything seems to work just fine, but as I increase the number of active VMs, everything becomes unstable. On some VMs the BOINC client randomly crashes; on others I see inexplicable performance degradation and chaotic fluctuations in GPU load, while CPU utilization never exceeds 40% per thread. I've looked extensively at the performance and load profiling data from the mainframe and I can't explain why these issues occur. This is a typical case of demonic possession. Perhaps I should hire an exorcist. WTF.
To answer your question about who is paying for the electricity - nobody. The entire data center has an independent power grid. The source of that power is renewable energy. When you see big companies brag about being "carbon neutral", this is what it means.
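If anyone wants to suggest what to look for, here is roughly the kind of host-side logging I can run to line the crashes and GPU dips up on one timeline against the VM count. It's only a sketch, assuming psutil and nvidia-ml-py are available on the host; the CSV layout is just my choice:

    # Host-side sampler: one CSV row per second with average CPU use,
    # the busiest core, and each GPU's utilization, for later correlation
    # with the number of active VMs. Requires: pip install psutil nvidia-ml-py
    import csv
    import time

    import psutil
    import pynvml

    pynvml.nvmlInit()
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
               for i in range(pynvml.nvmlDeviceGetCount())]

    with open("host_load.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time", "cpu_avg", "cpu_max_core"]
                        + [f"gpu{i}" for i in range(len(handles))])
        try:
            while True:
                # cpu_percent blocks for the interval, so it also paces the loop
                per_core = psutil.cpu_percent(interval=1.0, percpu=True)
                gpus = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
                writer.writerow([time.strftime("%H:%M:%S"),
                                 round(sum(per_core) / len(per_core), 1),
                                 max(per_core)] + gpus)
        except KeyboardInterrupt:
            pass
    pynvml.nvmlShutdown()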
Dido wrote: I thought I got ...
I have no clue, but your process of figuring it out should help the programmers once you have to hand it back to them.
WOO HOO!!
Quote:
To answer your question about who is paying for the electricity - nobody. The entire data center has an independent power grid. The source of that power is renewable energy. When you see big companies brag about being "carbon neutral", this is what it means.
Did he ever demonstrate the ability to place a system in the top 5 or 10 at e@h?
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Tom M wrote: Did he ever ...
Each host was set up as a VM with a single GPU. Nvidia T4s aren't that powerful and probably wouldn't even make the top 50.
But I'm not sure if he ever got it working. He said before that he had a lot of weird issues when all systems were under load.
_________________________________________________________________________