Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519431407
RAC: 15512

Ian&Steve C. wrote:

what about those two tasks from just yesterday that errored out from not enough memory?

https://einsteinathome.org/host/12735373/tasks/6/0

it works fine... until it doesn't. that's why you shouldn't use cards with less than 4GB right now. unless you're fine with trashing work every now and then and letting someone else process it?

I guess it depends on whether the Einstein servers want the most work done or care about network and disk bandwidth sending things out twice.

I've switched all mine to Gamma.  Four 3GB cards and one 4GB (but slower) card.

Can't the server work out what the RAM requirement is and send out only little tasks to smaller cards?

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

Two errors on May 14, 5 on May 16. In the May 16 ones, all wingmen errored out.

On May 14, only one out of 5 wingmen completed a task. Is there something wrong with the tasks?

Tullio

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3985
Credit: 47426792642
RAC: 60743687

Peter Hucker wrote:

Can't the server work out what the RAM requirement is and send out only little tasks to smaller cards?

it tries to. but as I showed in one of my previous posts, the scheduler has a bug: it isn't always estimating the GPU RAM required properly. it doesn't seem to ever think a task needs more than about 1800MB, even when the task does need more, so those tasks get sent to any card with at least 1800MB of memory. that's the first problem.

the second problem is that they are looking at global (total) memory on the GPU, and not how much is actually available. BOINC records both values. by looking at this global value, these 1800MB tasks will be sent to 2GB GPUs, but if that GPU is driving a desktop environment (which is likely the case), then ~300MB of the GPU is not available and the 1800MB tasks will still fail.

 

they need to switch to looking at available memory, and fix the bug that's underestimating the RAM required. I posted this info in the technical news thread; they either haven't seen it or haven't had the time/resources to get to it. the best thing to do for now, if you're aware of the problem, is to just not crunch GW tasks on GPUs with <4GB VRAM, as you have done.
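To make the distinction concrete, here is a minimal sketch of the check being described. It is not actual BOINC scheduler code; the struct, function name, and figures are illustrative only (using the ~1800MB and ~300MB numbers mentioned above), and the point is simply that the comparison should use available rather than total VRAM:

// minimal sketch, not BOINC code: decide whether a GW task fits on a GPU
// by comparing against *available* VRAM rather than total VRAM
#include <cstdint>
#include <cstdio>

struct GpuInfo {
    uint64_t total_mem;      // global (total) VRAM the card reports
    uint64_t available_mem;  // VRAM actually free right now (BOINC reports both)
};

// hypothetical helper: true if the task's real VRAM need fits in free memory
bool fits_on_gpu(const GpuInfo& gpu, uint64_t task_vram_bytes) {
    // checking total_mem here would wrongly accept a 2GB card that is
    // already spending ~300MB driving a desktop
    return gpu.available_mem >= task_vram_bytes;
}

int main() {
    GpuInfo card{2048ULL << 20, (2048ULL - 300ULL) << 20};  // 2GB card, ~300MB in use
    uint64_t need = 1800ULL << 20;                          // task really needs ~1800MB
    std::printf("task fits: %s\n", fits_on_gpu(card, need) ? "yes" : "no");
    return 0;
}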

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3985
Credit: 47426792642
RAC: 60743687

tullio wrote:

Two errors on May 14, 5 on May 16. In these, all wingmen errored out.

Tullio

no they didn't. all the GPUs without enough memory, including yours, are the ones that errored them out. the two from May 14th each have a successful completion from a host with RTX 2070 Super GPUs, which have a sufficient amount of GPU memory for the task; the resend is just waiting to go to another GPU with enough memory.

the tasks from today, May 16th, have also only been sent to GPUs without enough memory so far, and are waiting to be sent to a host that can actually process them.

they will succeed once sent to a proper host. The WU isn't "bad", as you're trying to imply.

 

the apparently large number of hosts with 2-3GB GPUs has got to be the main reason so many resends are necessary and why so many tasks are still waiting for validation. the proper fix should be applied server side at the project, but until that is done, users with 2-3GB GPUs can help the situation by just removing them from GW for now.
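For anyone wanting to do that on a mixed-GPU host without pulling the bigger cards off GW, BOINC's cc_config.xml supports per-device exclusions; a sketch is below. The <app> short name is an assumption (check the event log or client_state.xml for the exact name), the <url> should match the project URL your client shows, and device_num 0 is just an example. The simpler alternative is deselecting the GW search in the Einstein@Home project preferences on the website.

<cc_config>
  <options>
    <!-- keep GPU 0 (a 2-3GB card) off the GW app only; other apps still use it -->
    <exclude_gpu>
      <url>https://einsteinathome.org/</url>  <!-- use the project URL your client shows -->
      <device_num>0</device_num>              <!-- BOINC device number of the small card -->
      <app>einstein_O2MDF</app>               <!-- assumed short app name; verify locally -->
    </exclude_gpu>
  </options>
</cc_config>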

_________________________________________________________________________

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519431407
RAC: 15512

Ian&Steve C. wrote:
they need to switch to looking at available memory, and fix the bug that's underestimating the RAM required. I posted this info in the technical news thread; they either haven't seen it or haven't had the time/resources to get to it. the best thing to do for now, if you're aware of the problem, is to just not crunch GW tasks on GPUs with <4GB VRAM, as you have done.

I just had a quick go at running GW on one of my new machines, and the RAM wasn't the problem in this case (I was lucky enough to get small ones).

CPU: Intel Xeon X5650 (x2)
GPU: AMD Radeon R9 280X (x2)

For the purposes of the test, I paused all CPU WUs.

Running 1 GW task per GPU
GPU RAM is at 1.7GB used per card out of 3GB (neither is used to display the screen)
One CPU core is maxed out per card/WU
Each card is running at only 25%!

What are the ratios of speeds of your cards (on your GW machine) compared to mine, and your CPU compared to mine?  Because it seems like I'd need a much more powerful CPU, but if I had better cards like you do, I'd need a CPU that hasn't been invented yet!

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3985
Credit: 47426792642
RAC: 60743687

Peter Hucker wrote:

I just had a quick go at running GW on one of my new machines, and the RAM wasn't the problem in this case (I was lucky enough to get small ones).

CPU: Intel Xeon X5650 (x2)
GPU: AMD Radeon R9 280X (x2)

For the purposes of the test, I paused all CPU WUs.

Running 1 GW task per GPU
GPU RAM is at 1.7GB used per card out of 3GB (neither is used to display the screen)
One CPU core is maxed out per card/WU
Each card is running at only 25%!

What are the ratios of speeds of your cards (on your GW machine) compared to mine, and your CPU compared to mine?  Because it seems like I'd need a much more powerful CPU, but if I had better cards like you do, I'd need a CPU that hasn't been invented yet!

I see you're running Windows; where are you getting the 25% value from? is that GPU utilization?

but after inspecting the GW_nvidia binary a little, I think your issue is the lack of AVX instruction support on your CPU. Your CPUs are old, yes. but it looks like the CPU portion of the GW app uses AVX for some of its calculations, and your CPU doesn't have AVX, so it's probably falling back to a slower method, which makes the GPU wait around longer and not be used as much. My CPUs do support AVX. my gaming system's 9700K supports AVX2.

you should drop that card into your i5-8600k (it supports AVX and AVX2) system and see if the GPU utilization is better
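For anyone who wants to confirm what their CPU exposes before moving hardware around: on Linux, "grep avx /proc/cpuinfo" will list the flag if it's present, and on Windows a tool like CPU-Z shows it. A tiny compiler-based check is sketched below (GCC/Clang builtins only, so treat it as illustrative rather than the app's own detection code):

// sketch: report AVX / AVX2 support using GCC/Clang builtins (not portable to MSVC)
#include <cstdio>

int main() {
    __builtin_cpu_init();  // initialise CPU feature detection before querying
    std::printf("AVX:  %s\n", __builtin_cpu_supports("avx")  ? "yes" : "no");
    std::printf("AVX2: %s\n", __builtin_cpu_supports("avx2") ? "yes" : "no");
    return 0;
}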

_________________________________________________________________________

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519431407
RAC: 15512

Ian&Steve C. wrote:
I see you're running Windows; where are you getting the 25% value from? is that GPU utilization?

The GPU utilization readings agree between Windows Task Manager (compute 1 graph), MSI Afterburner (a fan speed and overclocking tool I use with the AMD cards), and GPU-Z.  Windows Task Manager shows a full CPU core taken.  If I run two GW tasks simultaneously on a card (providing enough RAM is available on the card), two CPU cores are taken fully, and the GPU usage increases to 50%.
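For reference, running two tasks per card like this is normally set up with a BOINC app_config.xml in the Einstein@Home project directory. A sketch is below; the <name> value is an assumption (the exact short app name appears in the client's event log and client_state.xml), and gpu_usage 0.5 only tells BOINC that two tasks may share a GPU, it does not reduce how much VRAM each task needs:

<app_config>
  <app>
    <name>einstein_O2MDF</name>       <!-- assumed short app name; verify locally -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>      <!-- 0.5 lets two tasks share one GPU -->
      <cpu_usage>1.0</cpu_usage>      <!-- each task still pins a full CPU core -->
    </gpu_versions>
  </app>
</app_config>

BOINC picks the file up after "Options -> Read config files" in the manager, or after a client restart.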

Ian&Steve C. wrote:

but after inspecting the GW_nvidia binary a little, I think your issue is the lack of AVX instruction support on your CPU. Your CPUs are old, yes. but it looks like the CPU portion of the GW app uses AVX for some of its calculations, and your CPU doesn't have AVX, so it's probably falling back to a slower method, which makes the GPU wait around longer and not be used as much. My CPUs do support AVX. my gaming system's 9700K supports AVX2.

you should drop that card into your i5-8600k (it supports AVX and AVX2) system and see if the GPU utilization is better

Not easy to do.  My i5 is the main computer in the living room and I don't want it taken to bits or making noise.

But it does have a weaker card in it, the Radeon RX 560, and it only gets that up to 70%, so I doubt it would do well with the bigger cards.  Maybe even better CPUs have more extensions it likes, or have a faster AVX part?

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3985
Credit: 47426792642
RAC: 60743687

I only suggested it as a test; I was just curious whether you'd see greater GPU utilization on the same GPU with a more powerful CPU pushing it. you don't necessarily have to leave it that way. but it's your hardware. there may be more factors involved anyway, since the AMD cards use a different app, and I don't have any AMD systems to be able to inspect the AMD/ATI GW linux binary.

_________________________________________________________________________

TBar
Joined: 3 Apr 20
Posts: 24
Credit: 891961726
RAC: 0

I have compared the GW app on both an older Core2 Quad and a newer i7-8700 using an NV 970 in Ubuntu. The app is around 25% faster on the 8700, and it doesn't use a full CPU core on the newer CPUs running AMD GPUs. There isn't much difference with the GR app, but there is a large difference with the GW app between older and newer CPUs. A recent GW test on a 'new' AMD 570 on an i7-6700 showed CPU usage around 45% and GPU usage around 60-70% with higher spikes. All this is under Ubuntu.

Peter Hucker wrote:

Ian&Steve C. wrote:
I see you're running Windows; where are you getting the 25% value from? is that GPU utilization?

The GPU utilization readings agree between Windows Task Manager (compute 1 graph), MSI Afterburner (a fan speed and overclocking tool I use with the AMD cards), and GPU-Z.  Windows Task Manager shows a full CPU core taken.  If I run two GW tasks simultaneously on a card (providing enough RAM is available on the card), two CPU cores are taken fully, and the GPU usage increases to 50%.

Ian&Steve C. wrote:

but after inspecting the GW_nvidia binary a little, I think your issue is the lack of AVX instruction support on your CPU. Your CPUs are old, yes. but it looks like the CPU portion of the GW app uses AVX for some of its calculations, and your CPU doesn't have AVX, so it's probably falling back to a slower method, which makes the GPU wait around longer and not be used as much. My CPUs do support AVX. my gaming system's 9700K supports AVX2.

you should drop that card into your i5-8600k (it supports AVX and AVX2) system and see if the GPU utilization is better

Not easy to do.  My i5 is the main computer in the living room and I don't want it taken to bits or making noise.

But it does have a weaker card in it, the Radeon RX 560, and it only gets that up to 70%, so I doubt it would do well with the bigger cards.  Maybe even better CPUs have more extensions it likes, or have a faster AVX part?

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519431407
RAC: 15512

Ian&Steve C. wrote:

I only suggested it as a test; I was just curious whether you'd see greater GPU utilization on the same GPU with a more powerful CPU pushing it. you don't necessarily have to leave it that way. but it's your hardware. there may be more factors involved anyway, since the AMD cards use a different app, and I don't have any AMD systems to be able to inspect the AMD/ATI GW linux binary.

I could.  I had one of those cards in there before; it's just a lot of hassle to get into it.  Five of my machines are easily accessible, that one is not.  Oh well, as far as I'm concerned, I'm happy as long as the cards are being fully utilised.  At the moment that's Gamma and Milkyway.  If I ever couldn't find a project I liked that fully used them, I'd build a more modern computer to run them.  I'll let people like yourself with more modern chips do the gravity and I'll take care of the gamma.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
