I could isolate the problem: it's an issue in the thread management where a specific local memory synchronization barrier isn't reached by all threads, so the process waits forever.
I tried a few ways to get around the problem, but without any luck. Sure, removing the barrier gets things moving on Turing (and Volta), but the computations would likely be incorrect, so we need to find the actual root cause.
An educated guess makes me think that NVIDIA's Independent Thread Scheduling (introduced with Volta) might be the culprit here, so that's what I'll look into next. Not sure how that applies to OpenCL, though...
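To illustrate the failure mode described above (a barrier that not every thread reaches), here is a minimal CPU-side analogy in Python. It is only a sketch, not the project's OpenCL kernel: the function name `run_work_group`, the `skip_barrier_ids` parameter, and the timeout are hypothetical, and the timeout exists solely so the demonstration terminates instead of waiting forever as the GPU kernel does.

```python
import threading


def run_work_group(num_threads, skip_barrier_ids, timeout=1.0):
    """Simulate a work-group in which some threads never reach the barrier.

    Hypothetical helper for illustration only. Threads listed in
    skip_barrier_ids take a branch that bypasses the barrier, so the
    remaining threads can never be released by it.
    """
    barrier = threading.Barrier(num_threads)
    results = {}

    def work_item(tid):
        if tid in skip_barrier_ids:
            # Divergent branch: this thread returns without synchronizing.
            results[tid] = "skipped barrier"
            return
        try:
            # On the GPU this wait would have no timeout and would hang.
            barrier.wait(timeout=timeout)
            results[tid] = "passed barrier"
        except threading.BrokenBarrierError:
            # Raised when the wait times out or the barrier is broken.
            results[tid] = "stuck at barrier"

    threads = [
        threading.Thread(target=work_item, args=(tid,))
        for tid in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_work_group(4, set()))          # every thread reaches the barrier
    print(run_work_group(4, {3}, timeout=0.2))  # thread 3 never arrives
```

When every thread arrives, the barrier releases them all. As soon as one thread takes a path around it, the others can only wait, which matches the symptom reported here. Whether Independent Thread Scheduling is really what causes a thread to miss the barrier in the actual kernel is still the open question.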
Thanks for that detail. I'll dig out a similar test case then to avoid any red herrings...
Update:
That's something to work with! Off debugging...
Oliver
Einstein@Home Project
Weekend update:
Hang in there, we're on it...
Oliver
Einstein@Home Project
Thanks for the Update
Supporting BOINC, a great concept !
Hope I have the "right" thread for my question/problem:
FGRPopencl1K-nvidia ... LATeah2103L VERY long elapsed times ( around 2 hours !).
WU 388601504
computer 12761622
Should I cancel/abort these ?
I am no "expert". Do you need more information?
THANKS
UPDATE:
Task ID 827356700 says after over 2 hours "new" and zero time ?
Used to run under 10 minutes.
If you are running on a Turing or Volta GPU, I'd say yes. You can then opt out of this application until we have fixed the problem on those GPU architectures.
Oliver
Einstein@Home Project
OK - tried to read this thread, but couldn't really grasp which tasks are meant.
THANKS.
The workunits currently being generated (LATeah10xxL) should work also on Turing and Volta cards. We'll continue to investigate.
BM
We'll try to figure out a way to prevent the scheduler from sending older tasks to such cards.
BM
So is this the new plan of attack for Turing/Volta? Prevent sending incompatible tasks vice fixing the application itself?
Keith Myers wrote: So is this the new plan of attack for Turing/Volta?
Actually we will do both. The limiting factor is manpower, and producing and sending compatible workunits is the most efficient thing we can do right now.
The reason for this problem appears to be a new feature of Turing/Volta ("independent thread scheduling"), which we have very limited control over in OpenCL. We might again intensify our efforts to develop a CUDA version, which would likely give us more performance on NVIDIA cards and solve this problem as well. But we need more time for this than we currently have.
BM