O3 almost not using any GPU power on a Mac

[AF>Le_Pommier] Jerome_C2005
Joined: 1 May 10
Posts: 38
Credit: 111591916
RAC: 0

My personal conclusion is that it cannot be doing anything useful from a BOINC/project/science perspective, so what is the point?

Or could it actually still "do something useful"?

[AF>Le_Pommier] Jerome_C2005

Since I'm never short of ideas, and following a suggestion on the AF forum, I decided to run a new test measuring the GPU temperature and the GPU power consumption while an O3 task runs on the ATI GPU of my iMac.

And it's interesting: I switched from a PrimeGrid task (I suspended the project) to one O3 GW-opencl-ati-2 task, and a few minutes later I set up an app_config to run 3 in parallel (because each one uses at most 1/3 of a CPU thread and I don't want to use more than one thread).
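For anyone wanting to reproduce the "3 in parallel" setup, here is a minimal sketch of the kind of app_config.xml involved, written as a small Python script that generates the file contents. The app name `einstein_O3AS` is an assumption inferred from the executable name, and the 0.33 shares simply mean "one third of a GPU and one third of a CPU thread per task"; they are not tuned values.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int) -> str:
    """Return app_config.xml text that lets `tasks_per_gpu` tasks share one GPU."""
    # Each task claims an equal share of the GPU and of one CPU thread.
    share = round(1.0 / tasks_per_gpu, 2)

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name, check your project
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(share)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(share)

    # Pretty-print when the running Python supports it (3.9 and later).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config("einstein_O3AS", 3))
```

The output would go into a file named app_config.xml in the project's folder, after which the client has to re-read its config files. Calling the function with 1 instead of 3 gives the single-task setup.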

As before, GPU usage seems to drop to nothing.

But the temperature first falls and then rises again, to a lower level than with the PrimeGrid task, but "not that low".

The same happens with the power draw in watts: a sudden drop, then a "somewhat lower" level, but not "nothing".

I decided to let it run overnight and maybe tomorrow, and I'll see what happens to these tasks on my project account page.

mikey
Joined: 22 Jan 05
Posts: 12676
Credit: 1839076411
RAC: 3989

[AF>Le_Pommier] Jerome_C2005 wrote:

..

[AF>Le_Pommier] Jerome_C2005

?

(I see nothing in your answer)

mikey

[AF>Le_Pommier] Jerome_C2005 wrote:

?

(I see nothing in your answer)

It was in the wrong thread, so I deleted it.

[AF>Le_Pommier] Jerome_C2005

Correct me if I'm wrong, but these 3 tasks (1, 2, 3) ran for 15 hours and ended with a success status, yet they actually crash at some point (about 1 hour after I started them):

-- signal handler called: signal 15
SIGILL: illegal instruction
Stack trace (7 frames):
0   einstein_O3AS_1.07_x86_64-apple-dar 0x000000010e741a40 boinc_catch_signal + 224
1   ???                                 0x0000000000000000 0x0 + 0
2   libdispatch.dylib                   0x00007ff816a2d923 _dispatch_dispose + 90
3   OpenCL                              0x00007ffa337d53ed clCreateCommandQueueWithPropertiesAPPLE + 4372
4   OpenCL                              0x00007ffa337bdfe0 clFlush + 65804
5   OpenCL                              0x00007ffa337ada0f clReleaseCommandQueue + 25
6   einstein_O3AS_1.07_x86_64-apple-dar 0x000000010e5cb224 XLALOpenCLDestroyStream + 68

Crashed executable name: einstein_O3AS_1.07_x86_64-apple-darwin__GW-opencl-ati-2
Machine type Intel x86-64h Haswell (64-bit executable)
System version: Macintosh OS 14.1.2 build 23B92
Mon Jan 29 22:36:30 2024

xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
atos cannot load symbols for the file einstein_O3AS_1.07_x86_64-apple-darwin__GW-opencl-ati-2 for architecture x86_64.
0 einstein_O3AS_1.07_x86_64-apple-dar 0x000000010e75532e

Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x0100002f rbx: 0x7ff7b1a648b0 rcx: 0x7ff7b1a64828 rdx: 0x2800001513
rdi: 0x7ff7b1a648b0 rsi: 0x200000003 rbp: 0x7ff7b1a64890 rsp: 0x7ff7b1a64828
r8: 0xe1300000000 r9: 0x80300000000 r10: 0x80300000103 r11: 0x00000206
r12: 0x00000000 r13: 0x200000003 r14: 0x00001470 r15: 0x80300000000
rip: 0x7ff816b96a6e rfl: 0x00000206

and they continue to "crunch" all that time. A few hours after the previous "event", it's as if the task were restarted in debug mode or something?

Exiting...
putenv 'LAL_DEBUG_LEVEL=3'
2024-01-30 00:20:10.5628 (4701) [normal]: This program is published under the GNU General Public License, version 2
2024-01-30 00:20:10.5636 (4701) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2024-01-30 00:20:10.5636 (4701) [normal]: This Einstein@home App was built at: Nov  9 2023 13:28:22

2024-01-30 00:20:10.5637 (4701) [normal]: Start of BOINC application 'einstein_O3AS_1.07_x86_64-apple-darwin__GW-opencl-ati-2'.

and then at 6 am it finishes:

2024-01-30 06:22:47.7088 (4701) [normal]: Finished main analysis.
2024-01-30 06:22:47.7089 (4701) [normal]: Recalculating statistics for the final toplist(s)...
2024-01-30 06:28:16.0005 (4701) [normal]: Finished recalculating toplist statistics.
2024-01-30 06:28:16.0016 (4701) [normal]: Finished in 0.00 s with peak RAM usage: 1564.0 MB on CPU 'Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz', peak VRAM usage: 1665.3 MB on GPU Device: 'AMD Radeon Pro 5700 XT Compute Engine ( Platform: Apple )' with backend: 'OpenCL'.
2024-01-30 06:28:16.0018 (4701) [debug]: Writing output ... Closing temp output file '../../projects/einstein.phys.uwm.edu/h1_1205.80_O3aC01Cl1In0__O3ASHF1b_1206.00Hz_22274_1_0.tmp' ... renaming temp output file '../../projects/einstein.phys.uwm.edu/h1_1205.80_O3aC01Cl1In0__O3ASHF1b_1206.00Hz_22274_1_0.tmp' to '../../projects/einstein.phys.uwm.edu/h1_1205.80_O3aC01Cl1In0__O3ASHF1b_1206.00Hz_22274_1_0' ... done.
2024-01-30 06:28:16.3887 (4701) [normal]: Restarted from checkpoint 455

but not really, because several hours later it finishes the same "main analysis" again, this time "for real":

2024-01-30 13:50:53.0023 (4701) [normal]: Finished main analysis.
2024-01-30 13:50:53.0038 (4701) [normal]: Recalculating statistics for the final toplist(s)...
2024-01-30 13:57:02.0763 (4701) [normal]: Finished recalculating toplist statistics.
2024-01-30 13:57:02.0772 (4701) [normal]: Finished in 26925.55 s with peak RAM usage: 1564.0 MB on CPU 'Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz', peak VRAM usage: 1665.4 MB on GPU Device: 'AMD Radeon Pro 5700 XT Compute Engine ( Platform: Apple )' with backend: 'OpenCL'.

The 3 tasks show exactly the same weird behaviour... are they really doing anything useful?

I have 3 more tasks that have been running for 6 hours now; I'll let them finish too, to see if they look similar.
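To put numbers on those phases, here is a rough sketch that computes the elapsed time between log events. The lines are copied from the stderr excerpts above, with the long messages abbreviated by "..."; the parsing assumes every line starts with a timestamp followed by " (".

```python
from datetime import datetime

# Log lines from the excerpts above; long messages are abbreviated with "...".
LOG_LINES = [
    "2024-01-30 00:20:10.5637 (4701) [normal]: Start of BOINC application ...",
    "2024-01-30 06:22:47.7088 (4701) [normal]: Finished main analysis.",
    "2024-01-30 06:28:16.3887 (4701) [normal]: Restarted from checkpoint 455",
    "2024-01-30 13:50:53.0023 (4701) [normal]: Finished main analysis.",
    "2024-01-30 13:57:02.0772 (4701) [normal]: Finished in 26925.55 s ...",
]


def parse_timestamp(line: str) -> datetime:
    """Extract the leading timestamp (everything before ' (') from a log line."""
    stamp = line.split(" (", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def seconds_between(first_line: str, second_line: str) -> float:
    """Elapsed seconds from the first log line to the second."""
    delta = parse_timestamp(second_line) - parse_timestamp(first_line)
    return delta.total_seconds()


if __name__ == "__main__":
    first_phase = seconds_between(LOG_LINES[0], LOG_LINES[1])
    second_phase = seconds_between(LOG_LINES[2], LOG_LINES[4])
    print(f"start -> first 'Finished main analysis': {first_phase / 3600:.2f} h")
    print(f"checkpoint restart -> final 'Finished': {second_phase / 3600:.2f} h")
```

The second interval comes out at roughly 26926 s, which lines up with the "Finished in 26925.55 s" figure in the log. That suggests the reported time covers only the part after the checkpoint restart, though that is my reading of the log rather than something the app documents.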

GWGeorge007
Joined: 8 Jan 18
Posts: 3060
Credit: 4961624353
RAC: 1397076

[AF>Le_Pommier] Jerome_C2005 wrote:

Correct me if I'm wrong, but these 3 tasks (1, 2, 3) ran for 15 hours and ended with a success status, yet they actually crash at some point (about 1 hour after I started them)

and they continue to "crunch" all that time. A few hours after the previous "event", it's as if the task were restarted in debug mode or something?

The 3 tasks show exactly the same weird behaviour... are they really doing anything useful?

I have 3 more tasks that have been running for 6 hours now; I'll let them finish too, to see if they look similar.

Quote: "...they end in success status..."

Your All-Sky tasks are not ending successfully, you are getting "0" credit for completion.

Of the 9 projects you have selected, you are receiving credit from at least 3, none of which is Einstein. Your project preferences should be set up according to priority. Also, take into consideration how much GPU utilization each project's tasks need. Though your AMD Radeon Pro 5700 XT has 16 GB of VRAM, you could be stretching your GPU's limits past the point where Einstein's All-Sky tasks can also complete successfully with credit.

Suggestion: Suspend all projects but Einstein in your BOINC Manager and then see what happens to your All-Sky tasks. If they do complete successfully with credit, and in much less time, then you have some work to do.

If they still do not complete successfully, then you may have a problem running them on your GPU.

HTH

George

Proud member of the Old Farts Association

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117464210605
RAC: 35507883

GWGeorge007 wrote:
Your All-Sky tasks are not ending successfully, you are getting "0" credit for completion.

George, you should look more carefully. The status of each task says "waiting for validation" and not "invalid".

The tasks were successfully completed, albeit at an excruciatingly slow pace, but a second result is needed before validation can proceed.

I have much older, legacy AMD GPUs (RX 570 4GB) completing these tasks in 25 - 30 mins per task running at x2 (2 tasks finish in just over 50 mins on average). My best guess is that there is something wrong with his OpenCL installation. I know nothing about Apple hardware, but hasn't Apple ditched OpenCL in favor of something called Metal?
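As a back-of-the-envelope comparison, the sketch below sets the figures quoted in this thread side by side: about 50 minutes for a batch of 2 on the RX 570, against the reported 15 hours for a batch of 3 on the iMac. Both inputs are rounded, so treat the ratio as an order of magnitude only.

```python
def minutes_per_task(batch_minutes: float, tasks_in_batch: int) -> float:
    """Effective wall-clock minutes per task when a batch runs concurrently."""
    return batch_minutes / tasks_in_batch


# RX 570: 2 concurrent tasks finish in roughly 50 minutes (rounded from "just over 50").
rx570 = minutes_per_task(50, 2)

# iMac (Radeon Pro 5700 XT): 3 concurrent tasks reported as running for about 15 hours.
imac = minutes_per_task(15 * 60, 3)

slowdown = imac / rx570

if __name__ == "__main__":
    print(f"RX 570: {rx570:.0f} min/task")
    print(f"iMac:   {imac:.0f} min/task")
    print(f"iMac is about {slowdown:.0f}x slower per task")
```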

Cheers,
Gary.

Gary Roberts

[AF>Le_Pommier] Jerome_C2005 wrote:

...

Exiting...
putenv 'LAL_DEBUG_LEVEL=3'
2024-01-30 00:20:10.5628 (4701) [normal]: This program is published ...

I don't know what might have caused the crash, but somehow things seem to have been reset, because the "putenv" line is exactly what you see at the very beginning of a normal task.

[AF>Le_Pommier] Jerome_C2005 wrote:

...

and then at 6am it finishes

2024-01-30 06:22:47.7088 (4701) [normal]: Finished main analysis.

Because the "upgrade" to the O3AS app split the full task into two separate 'half-tasks', you should always see two of these messages, one for each 'half-task' as it completes. Unfortunately, this is normal and doesn't give any clue as to why the processing is so abysmally slow. Does your OpenCL installation come with the clinfo utility? Can you run that to see what it says?
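If it helps, here is a small sketch for checking whether clinfo is available and capturing what it prints. It only assumes that an executable named `clinfo` may be on the PATH; whether it ships with a given Mac or has to be installed separately is not something I can confirm. The helper name `run_tool` is just for illustration.

```python
import shutil
import subprocess
from typing import Optional


def run_tool(name: str = "clinfo", timeout: int = 30) -> Optional[str]:
    """Run a command-line tool and return its stdout, or None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path], capture_output=True, text=True, timeout=timeout)
    return result.stdout


if __name__ == "__main__":
    output = run_tool()
    if output is None:
        print("clinfo was not found on PATH")
    else:
        # Print everything; the platform and device sections are the interesting part.
        print(output)
```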

You should really stop trying to run at x3 until you can see single tasks completing successfully in a reasonable time. Probably a very small part of your current problem stems from the fact that you stated earlier that there would be no problem running 3 tasks using the support of just one CPU thread. Since each task has substantial portions of 'CPU only' running (the end part of each half-task), things will slow down quite a lot if there is only 1 CPU available for the job each time those sections are encountered.

Cheers,
Gary.

[AF>Le_Pommier] Jerome_C2005

Thanks for your comments! I'm actually just diving into the PrimeGrid February Tour de Primes, so I'm afraid new tests will have to wait... I'll just let the last 3 O3 tasks finish, since they are almost complete. After such an abysmally and excruciatingly long processing time it would be a shame to cancel them, right? :D

But I will try "one single task with nothing else running" to see how it goes... in the future.
