When the WU reaches 90% it stops. I know why, but when will this be fixed? I feel like I'm wasting around 33% of my, let's call it, GPU time. My GPU crunches for 15-20 minutes and then sits idle for almost 7-8 minutes while the task uses the CPU. Why can't I start a new GPU WU while the last WU is finishing on the CPU? Or when will this be fixed?
Copyright © 2024 Einstein@Home. All rights reserved.
The easy answer is to start a 2nd task when the first gets to the 90% spot, where it switches to CPU crunching to finish up. You can set this on the website: go below where you pick which kinds of tasks you want to run and set it to 50%. Once the 2nd task starts, suspend it until the first task is no longer doing any work on the GPU, then unsuspend it. In practice you may have to suspend every Einstein GPU task except the 1st and then resume them once the 1st task gets far enough along for you. Just be sure your GPU has at least 8 GB of RAM on it so you don't get out-of-memory errors for the tasks.
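To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes the midpoints of the timings from the original post (17.5 minutes on the GPU, 7.5 minutes on the CPU) and an idealized GPU that serves only one task's GPU phase at a time. Real tasks sharing a GPU will slow each other down, so treat the 2x figure as an upper bound. The function names are made up for illustration.

```python
def gpu_busy_fraction_1x(gpu_min, cpu_min):
    """Fraction of wall-clock time the GPU is busy when running one task at a time."""
    return gpu_min / (gpu_min + cpu_min)


def ideal_staggered(gpu_min, cpu_min, slots=2):
    """Idealized staggering: GPU phases run back to back, while each task's
    CPU phase overlaps the GPU phase of a task in another slot.

    Returns (cycle length in minutes, GPU busy fraction).
    """
    # One slot needs gpu_min + cpu_min per task; the GPU needs slots * gpu_min
    # to serve every slot once. Whichever is longer sets the cycle length.
    cycle = max(gpu_min + cpu_min, slots * gpu_min)
    busy = slots * gpu_min / cycle
    return cycle, busy


# Assumed timings: midpoints of "15-20 minutes" and "7-8 minutes".
gpu_phase, cpu_phase = 17.5, 7.5

busy_1x = gpu_busy_fraction_1x(gpu_phase, cpu_phase)
cycle_2x, busy_2x = ideal_staggered(gpu_phase, cpu_phase)

print(f"1x: GPU busy {busy_1x:.0%} of the time")
print(f"2x staggered (ideal): GPU busy {busy_2x:.0%}, 2 tasks every {cycle_2x} min")
```

With these assumed timings the GPU sits idle about 30% of the time at 1x, which is close to the 33% estimated in the first post.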
Should I do it manually? It looks quite time-consuming and impossible for most of the tasks.
Unless your tasks all have exactly the same run time, you will either need to get manually involved occasionally or just let them ride it out. If you are so intent on using 100% of the GPU 100% of the time, then yes, you will have to monitor it constantly and intervene when necessary.
The reason your tasks use both the GPU and the CPU is that finishing a task requires double-precision work, which the CPU handles better than the GPU does. It will not be "fixed" by the task's developers. Just monitor your tasks and start a second task when the first one finishes using the GPU.
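If you would rather script the suspend/resume step than click through the BOINC Manager, something along these lines could work. It is only a sketch: it builds `boinccmd --task <project_url> <task_name> suspend|resume` command lines, and the project URL and task name in the example are placeholders (look up the real ones with `boinccmd --get_tasks`). The 0.9 threshold comes from the 90% figure mentioned in this thread, not from any documented constant.

```python
import subprocess

# Assumption: the GPU phase ends at roughly 90% progress, as reported in this thread.
HANDOVER_FRACTION = 0.9


def task_command(project_url, task_name, operation):
    """Build a boinccmd command line that suspends or resumes one task."""
    if operation not in ("suspend", "resume"):
        raise ValueError(f"unsupported operation: {operation}")
    return ["boinccmd", "--task", project_url, task_name, operation]


def next_operation(first_task_fraction_done):
    """Decide what to do with the waiting task, given the first task's progress."""
    if first_task_fraction_done >= HANDOVER_FRACTION:
        return "resume"
    return "suspend"


def run(command):
    """Execute the command; requires boinccmd on PATH and a running BOINC client."""
    subprocess.run(command, check=True)


# Hypothetical values, for illustration only.
example = task_command("https://project.example/", "waiting_task_name",
                       next_operation(0.93))
print(" ".join(example))
```

Nothing is executed until you call `run(...)`, so you can check the printed command first.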
Proud member of the Old Farts Association
Exactly. It would be very time consuming on your part.
There is no easy fix. Even when I tried running 2x and suspending the tasks until they were offset significantly, they still ended up in the 90% stage together most of the time.
It appears I get my best production at 1x.
Tom M
A Proud member of the O.F.A. (Old Farts Association).
Interesting. With GW tasks I offset them once and didn't have to intervene again. However, since I run 2 GW tasks but only 1 MeerKAT task at a time, if the application switches task types and then goes back to GW, 2 tasks of course start at the same time again. So in my experience the solution is to run all task types at 2x, run just one type, or manually offset them again.
With my AMD W7600 I get the highest RAC with the BRP7 (MeerKAT) tasks running one at a time, but the scheduler prefers to give that machine GW tasks, so I go with it. GW tasks profit a lot from being offset, as they require more CPU crunching. If your goal is the highest possible RAC, you might want to test which tasks run best on your GPU and then stick to them.
In one sense it will never be fixed in the way that you mean. In another fashion it already has been!
By that I mean : the re-examination of the GPU data by the CPU is inevitable ( various reasons including double precision ) given the relatively poor implementation of IEEE standards for floating point on the commonest GPUs that E@H contributors have ( on 'consumer' or 'gaming' cards ). That lack of standards compliance is just not going to yield sensible science if not accounted for in the search strategy ie. the validity of the entire search is at risk otherwise.
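As a toy illustration of why precision matters (this is not E@H code, just a generic floating-point demo), the sketch below adds 0.1 a million times, once with every intermediate result rounded to single precision and once in ordinary double precision. The helper names are made up, and the single-precision rounding is emulated with Python's `struct` module.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float32(value, count):
    """Accumulate `value` `count` times, rounding to float32 after every addition."""
    total = 0.0
    v = to_float32(value)
    for _ in range(count):
        total = to_float32(total + v)
    return total


def sum_float64(value, count):
    """Accumulate `value` `count` times in double precision."""
    total = 0.0
    for _ in range(count):
        total += value
    return total


n = 1_000_000
expected = 100_000.0  # 0.1 * 1,000,000 in exact arithmetic

err32 = abs(sum_float32(0.1, n) - expected)
err64 = abs(sum_float64(0.1, n) - expected)

print(f"single-precision accumulated error: {err32:.6g}")
print(f"double-precision accumulated error: {err64:.6g}")
```

The single-precision error comes out many orders of magnitude larger than the double-precision one, which gives a flavour of why a long chain of single-precision GPU arithmetic may need rechecking.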
{ Aside : we don't want to get a reputation for misleading work! }
However the search is still faster overall ( in general ) than doing the initial search ( fast Fourier transforms ) via CPU followed by a toplist candidate filtering scheme, again on CPU. An FFT of the size typical for E@H ( ~2^22 points ) simply runs at awesome speed on the parallel architecture that GPUs offer. In this sense we have already converged on the best solution - or close to it - for the commonest host hardware combinations that E@H encounters.
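For a feel of the numbers, assuming the FFT length quoted above means 2^22 points: a radix-2 FFT needs on the order of N log2 N operations, versus N^2 for a naive discrete Fourier transform. The sketch below ignores constant factors and memory traffic, so it is an order-of-magnitude estimate and not a benchmark of any E@H application.

```python
import math


def fft_ops(n):
    """Rough operation count for an FFT of n points: n * log2(n)."""
    return n * math.log2(n)


def naive_dft_ops(n):
    """Rough operation count for a direct DFT of n points: n * n."""
    return n * n


n_points = 2 ** 22  # assumed FFT length

print(f"points:         {n_points:,}")
print(f"FFT ops:        {fft_ops(n_points):,.0f}")
print(f"naive DFT ops:  {naive_dft_ops(n_points):,}")
print(f"ratio:          {naive_dft_ops(n_points) / fft_ops(n_points):,.0f}x")
```

Most of those FFT operations can be done independently of one another within each stage, which is the kind of work a GPU's many cores are good at.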
So that's the balance that has been struck between speed on the one hand versus reliable answers on the other. But please do try the other suggestions made here, they may help.
{ Now in a perfect world we could all afford DGX-A100 systems that carry eight Nvidia A100 Tesla cards @ $200K USD ..... drool :-) }
{ The currently unobtainable 'unicorn solution' for this is a coherent search over a year long data set. There is not enough computing power on the planet for that! }
Cheers, Mike.
( edit ) The sensitivity of the numerical analysis depends upon the methodology of the search, as does the computational cost. With regard to searching for continuous GWs, the raison d'etre of E@H, see the full gore of that, say, here and here. To date we have not conclusively discovered a continuous GW, but have set bounds on the parameters of any that might exist, see here for example. Note that one is accustomed to thinking of noise as a fraction of the signal strength, but for continuous GW detection the reverse is true. The expected signal is a mild drift back & forth upon a noisy ( non-target ) background and this is responsible for much subtlety in the signal processing. If only there could be a system-on-chip solution for this quandary!
( edit ) It occurs to me that it would be useful to know to what degree, if any, the candidate data has to remain on the GPU while the CPU is doing its follow-on thing. Put another way : what's the exact detail of the handover of the tasks from GPU to CPU ? Does anyone know this ?
( edit ) Silly me. Take, say, an 'All-Sky Gravitational Wave search on O3 v1.06 (GW-opencl-nvidia)' work unit stderr output. It looks to me like the results from the GPU stage are written to a temporary file ( if windows, found in the <*PUT_YOUR_BOINC_DISK_HERE*>:\ProgramData\BOINC\projects\einstein.phys.uwm.edu directory ) which is then taken up by the CPU for candidate filtering. So that would imply that once the initial candidate list is formed by the GPU it is indeed free for other things.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
So would the Tesla GPUs work here at Einstein?
If I had that system I wouldn't care, it's 7nm technology ! ;-)
Seriously : if OpenCL-compliant drivers emerge then they might, but I can't find any reference to that in NVIDIA's documents. At least it is IEEE compliant for FP64. Anyway if we all had one then I'm sure that E@H devs would oblige with 8 x 7936 = 63,488 CUDA cores per system to play with.
Cheers, Mike.
Why wouldn't they? They've shown up here before. I've temporarily rented some hosts like this before. They work fine as long as you have drivers installed.
I feel like those show up every once in a while.
Also, I will call your DGX-A100 system and raise you with the DGX-H100 system. What's a couple more hundred thousand dollars? Almost 3x the FP32 computational power of the A100. Absolutely insane. Also, you will need a small power plant to use one of these.