Very long delay on Nvidia GPU workloads

masitcgitw
Joined: 22 Jul 23
Posts: 2
Credit: 1224794
RAC: 0
Topic 229841

When using my laptop, I get a very long delay on tasks labeled ".9 CPU 1 Nvidia GPU." They process up to 99.000% and then just sit there for anywhere from 5 to 25 minutes. The laptop has a 4070 and an Intel iGPU. The iGPU workloads go through fine. In fact, one time both started at almost the same time, and the Nvidia task got to 99.000% while the iGPU was still at around 35%. The Nvidia task then sat there long enough that the iGPU finished its task and started another one before the Nvidia one finally finished.

To begin with I had Milkyway@home running at the same time, with a "16 CPU" task going, and there were 4 "1 CPU" Einstein tasks going. I thought maybe, since the CPU has 20 threads, there just weren't any resources left for the Nvidia task. I told Einstein not to download any new tasks and suspended Milkyway. The 4 "1 CPU" tasks that had already been downloaded for Einstein started, giving a total of 8 "1 CPU" tasks going, and the Nvidia task still sat at 99.000% for more than 5 minutes before I had to get up and do something else.

Eventually they do finish, but it's literally just wasting between 5 and 25 minutes for each of these tasks. I do not see the same thing happen on my 2019 MacBook Pro with an AMD Radeon Pro 5500M. Tasks there process much more slowly, but at a pretty steady rate all the way through to completion. Any suggestions?

GWGeorge007
Joined: 8 Jan 18
Posts: 3117
Credit: 5008726749
RAC: 1570133

Hello MASITCGITW,

You haven't told us which tasks you are running, and we can only infer that the "16 CPU" task is your Milkyway CPU task.

If you are using both your iGPU and your 4070 to run GPU tasks, that may be why you are waiting so long for the tasks to complete.

It would help us a lot if you were to un-hide your computer so we could actually see what is going on.

George

Proud member of the Old Farts Association

mikey
Joined: 22 Jan 05
Posts: 12776
Credit: 1861024624
RAC: 1443392

masitcgitw wrote:

When using my laptop, I get a very long delay on tasks which are labeled as ".9 CPU 1 Nvidia GPU." It processes up to the point of being at 99.000% and then just sits there for anywhere from 5 to 25 minutes. The laptop has a 4070 and an intel iGPU. The iGPU workloads go through fine. In fact one time, they started at almost the same time, and the Nvidia task got to 99.000% while the iGPU was still at around 35%, then the Nvidia task sat there long enough that the iGPU finished its task, and started another one before the Nvidia one finally finished. To begin with I had Milkyway@home running at the same time, with a "16 CPU" task going, and there were 4 "1 CPU" einstein tasks going. I thought maybe since the CPU has 20 threads, there just wasn't any resources for the Nvidia task. I told einstein not to download any new tasks, and suspended milkyway. The 4 "1CPU" tasks that had already been downloaded for einstein started, giving a total of 8 "1 CPU" tasks going, and the Nvidia task still sat there at 99.000% for more than 5 minutes before I had to get up and do something else. Eventually they do finish, but it's literally just wasting between 5 and 25 minutes for each of these tasks. I do not see the same thing happen on my 2019 MacBook Pro with an AMD Radeon Pro 5500M. They process through much slower, but at a pretty steady rate all the way through to completion. Any suggestions? 

The new All Sky tasks are very computationally intense, hence the 5k credits for each one, and take longer than the other tasks here at Einstein. And yes, there is some CPU work going on at the beginning and end of each GPU task.

masitcgitw
Joined: 22 Jul 23
Posts: 2
Credit: 1224794
RAC: 0

Sorry, computers are now unhidden. Didn't think about that.

Yes, the "16 CPU" workloads were for Milkyway. After doing some more testing, it looks to me like the counter is just inaccurate/very optimistic. I closed out of Steam/Epic, closed all background apps, and paused all tasks except one of the ".9 CPU 1 Nvidia GPU" All-Sky tasks. I ran two tasks that way and both performed almost the same.

The first task took 13:12 to get to 99.000%, the second one took 12:44. Both times the CPU usage stayed at a very steady 15% for the first 99%, and then dropped to 14% after getting to 99.000%. Both times the GPU usage mainly stayed between 93% and 100%, with very brief dips to 60-65%, until reaching that 99.000% point, at which time it dropped to 0%. The first time it estimated 25 seconds remaining at that point, the second time 22 seconds. Both times the estimate just went down 1 or 2 seconds and then back up to 22 or 25 seconds. The first task actually took just over 5 more minutes to get that last 1%, and the second took 4:45.

The CPU is obviously still working on stuff: even though usage only dropped by 1% to 14%, it sits down at 2% with very brief increases to 3% when all tasks are suspended. It makes perfect sense that if it takes between 4 and 5 minutes for the CPU to complete after the GPU portion is done, when nothing else is running and CPU usage is only at 14% or 15%, it would take the much longer times I was seeing when all the other tasks are running and CPU usage is pegged at 100% non-stop. It was just confusing me, as I wouldn't think of that as being 99.000% complete when that part was actually only 70-75% of the time needed to complete the task. And the estimated time remaining showing as 25 or 22 seconds once reaching that point makes me wonder what massive single core performance must be needed to reach that time. Lol.

TL;DR, sorry for the confusion, it appears as if it is actually still doing work and not just sitting there doing nothing as it first appeared to me. Thanks guys!!!
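For anyone who wants to check the "70-75%" figure, here is a rough sketch of the arithmetic using the two timings reported above. The function names are made up for illustration, and "just over 5 more minutes" is rounded to exactly 5:00, so treat the result as approximate.

```python
def to_seconds(mmss: str) -> int:
    """Convert a "minutes:seconds" string such as "13:12" to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def gpu_stage_share(gpu_stage: str, followup: str) -> float:
    """Fraction of the total run time spent before the 99.000% mark."""
    gpu = to_seconds(gpu_stage)
    cpu = to_seconds(followup)
    return gpu / (gpu + cpu)


# (time to reach 99.000%, time for the last 1%) for the two test runs.
# The first follow-up time is an assumption: "just over 5 minutes" -> 5:00.
runs = [("13:12", "5:00"), ("12:44", "4:45")]

for gpu_stage, followup in runs:
    share = gpu_stage_share(gpu_stage, followup)
    print(f"{gpu_stage} + {followup}: GPU stage is {share:.1%} of the run")
```

Both runs come out at roughly 72-73%, which matches the 70-75% estimate: the part shown as 99.000% is a bit under three quarters of the wall-clock time.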

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118328989120
RAC: 25366200

masitcgitw wrote:
... I wouldn't think of that as being 99.000% complete when that part was actually only 70-75% of the time needed to complete the task. And the estimated time remaining showing as 25 or 22 seconds once reaching that point makes me wonder what massive single core performance must be needed to reach that time. Lol.

Thank you for deciding to support Einstein@Home.

Unfortunately, since you joined very recently, you wouldn't be aware of a particular long-time characteristic of many E@H CPU and GPU searches.  The calculations are usually done in two separate stages.  The first stage is the longest (~90% or 99% of the time) but these values are just guesses based on previous performance over the years.

The primary stage is to extract a list of potential candidate signals.  The secondary stage is to re-evaluate those candidates (possibly in double precision) to create a 'toplist' of the most promising candidates.  For GPU searches, the GPU may be used for both, but in the current GW search, the announcement thread did mention that this time the list of candidates was going to be much longer and that the processing of candidates would be entirely on the CPU, at least initially.  Unfortunately the estimate must have been left at the usual 99% for the point where the second stage would kick in, despite the warning.  People with experience do understand that the 'followup stage' can be lengthy and will have no visible signs of progress.   I'm sorry you got caught with that :-).

For future reference, the announcement threads for new searches are always worth reading, since 'unusual' features or behaviour tend to get mentioned :-).

Cheers,
Gary.
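To illustrate the two-stage behaviour described above, here is a toy model of how a progress counter could end up parked at 99% while the CPU follow-up runs. This is only a sketch under stated assumptions, not the actual Einstein@Home or BOINC code: the function `displayed_progress`, its parameters, and the fixed 0.99 weight are all hypothetical, and the stage durations are simply the first test run reported earlier in the thread (792 s on the GPU, about 300 s of follow-up).

```python
def displayed_progress(elapsed: float, stage1_time: float, stage2_time: float,
                       stage1_weight: float = 0.99) -> float:
    """Toy model (hypothetical, not the real application logic).

    Progress rises linearly to stage1_weight during the first stage,
    then stays there until the second stage has finished.
    """
    if elapsed < stage1_time:
        return stage1_weight * elapsed / stage1_time
    if elapsed < stage1_time + stage2_time:
        return stage1_weight  # follow-up stage: no visible progress
    return 1.0


STAGE1 = 792.0  # seconds to reach 99.000% (13:12, from the first test run)
STAGE2 = 300.0  # seconds of CPU follow-up (assumed: "just over 5 minutes")

for elapsed in (396.0, 792.0, 900.0, 1092.0):
    shown = displayed_progress(elapsed, STAGE1, STAGE2)
    actual = elapsed / (STAGE1 + STAGE2)
    print(f"t={elapsed:6.0f}s  shown={shown:7.3%}  share of run time={actual:6.1%}")
```

In this model the counter reaches 99% when only about 73% of the run time has passed and then does not move again until the task completes, which is the pattern reported in the first post.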
