app version refers to missing GPU type

Eugene Stemple

Joined: 9 Feb 11

Posts: 67

Credit: 405335435

RAC: 409929

21 Aug 2023 1:41:49 UTC

Topic 229977

It started a couple of days ago... on routine work fetches I'm getting these lines in the event log:

20-Aug-2023 16:44:06 [Einstein@Home] Sending scheduler request: To fetch work.
20-Aug-2023 16:44:06 [Einstein@Home] Requesting new tasks for CPU and NVIDIA GPU
20-Aug-2023 16:44:15 [---] app version refers to missing GPU type ibo,GBT,long) is not available for your type of computer.
20-Aug-2023 16:44:15 [Einstein@Home] Scheduler request completed: got 2 new tasks
20-Aug-2023 16:44:15 [Einstein@Home] App version uses non-existent ibo,GBT,long) is not available for your type of computer. GPU

... bold text added for emphasis ...

It appears that (Arecibo,GBT,long) got mangled somehow to come through as just ...ibo,GBT,long) and probably more text is missing, but it's not clear to me whether it's just me or whether it is an E@H server thing. I can't seem to find out which app it is referring to. If anybody else is seeing something like this then it's not just me... OTOH, if it IS JUST ME... is a project reset the best way to recover?

I did think of deleting "the app" and letting BOINC reload it. The only reference to (Arecibo,GBT,long) that I can find is in the client_state.xml file, where it is shown as the "user friendly name" for the BRP4G app. Would it make sense to delete that app and see if BOINC recovers? Or just do a project reset to cover all bases? Meanwhile, FGRP5 and O3AS work is continuing normally.

GWGeorge007

Joined: 8 Jan 18

Posts: 3176

Credit: 5156926723

RAC: 3841749

21 Aug 2023 3:21:10 UTC

Message 216103

I'm not quite sure what has happened, but I'll try to offer some explanations as to what I think it could be.

First, I'd recommend that you upgrade your client from 7.14 to at least 7.18. If I'm correct, the outdated client may have something to do with your missing GPU because it is no longer recognizing your computer, therefore can not see your GPU. In addition, BOINC is no longer using HTTP for addressing your computer now, it only sees HTTPS which was not in operation at the time you had 7.14 installed.

Second, I'd recommend that you reduce your project selections to one and see if it does recognize it again. Yes, you do have 6GB in your 1060 GPU, but you are using only 4GB of it. Some of these projects take over 4GB now and therefore won't recognize your GPU.

Try using one at a time until you get one that works.

If this does or does not work, we'd appreciate it if you get back to us and tell us the good or bad news.

George

Proud member of the Old Farts Association

mikey

Joined: 22 Jan 05

Posts: 12886

Credit: 1884401265

RAC: 108697

21 Aug 2023 17:13:11 UTC

Message 216119

A reset will delete every Einstein task you have on your PC and force you to get all new ones.

Keith Myers

Joined: 11 Feb 11

Posts: 5055

Credit: 19164388660

RAC: 5361195

21 Aug 2023 18:23:10 UTC

Message 216120 in response to message 216103

Quote:

Yes, you do have 6GB in your 1060 GPU, but you are using only 4GB of it. Some of these projects take over 4GB now and therefore won't recognize your GPU.

This is incorrect. The display of only 4GB of memory for the 1060 GPU is just a flaw in older BOINC versions. It was remedied in later versions, which use 64-bit calls to probe memory. The OP really should upgrade their client.

GPU applications will use ALL of the card's installed memory, regardless of what BOINC reports, if the application is well designed.

Eugene Stemple

Joined: 9 Feb 11

Posts: 67

Credit: 405335435

RAC: 409929

22 Aug 2023 7:34:06 UTC

Message 216140

See below for some more recent diagnostic searching...

But first, responding to some of the suggestions/issues in the responses.

[gwgeorge & keith] suggest upgrading the client (to something later than my 7.14.2). I am intentionally holding at version 7.14.2 for two reasons: (1) using "project_max_concurrent" in the app_config.xml file fails catastrophically in later versions (see other threads about work units downloading endlessly when that parameter is used); (2) the later versions do NOT have "shutdown connected client" and "exit BOINC manager" in the FILE pull-down menu. I find both of those functions very useful for shutting down, and resuming, BOINC gracefully in my setup, with two instances of boinc and boincmgr running different projects.

[gwgeorge] The https: configuration is set up in the project global_prefs.xml file and, as far as I know, does not depend on the client version. Anyway, that part of the server link is working properly. And, to clarify, E@H is not failing to detect the GPU. It is running O3AS (opencl) tasks normally.

[mikey] Yes, I know all the bad things a project "reset" would do. I would do an NNT and drain the cache before going down that path. But if nothing else helps then that is always an option.

[keith] Following up on your 4GB reporting limit in older clients... I'm finding all kinds of <gpu_ram> parameters reported in the client_state.xml file. 7.864G for the FGRPB1G app down to 2.004G for the O3MDF app. And in <coproc_cuda> parameters <available_ram> is 4.167G while <coproc_opencl> shows <global_mem_size> as 6.359G. Never looked at that stuff before and I have no idea where those numbers come from.

Some additional file scanning gave some interesting (relevant?) information. These 2 lines from client_state.xml.

<name>einsteinbinary_BRP4G</name>
<user_friendly_name>Binary Radio Pulsar Search (Arecibo,GBT,long)</user_friendly_name>

and these lines from sched_reply_einstein.phys.uwm.edu.xml.

<coproc>
<type>ibo,GBT,long) is not available for your type of computer.</type>
<count>647500445489094944987862487032585421213412207048870776038754297507971394031017770238076521610590413666285928503412500294475737461178726716849130298534562777624215719772160.000000</count>
</coproc>

SORRY about that exceedingly long line. It's what was in the sched_reply file !!!

Something is terribly wrong here. As I understand it, a sched_request goes up to the server and it responds with a sched_reply. There is nothing like ...(Arecibo,GBT,long)... in the sched_request, so where does that mangled reply come from? And what's with that ~200 digit "count" in the reply? That's a lot of coprocessors...<grin>!
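For anyone who wants to check their own scheduler reply for the same symptom, here is a minimal Python sketch that scans the file text for <coproc> blocks and flags any whose <type> or <count> looks implausible. It uses a plain regular expression instead of an XML parser, on the assumption that a corrupted reply may not parse cleanly. The function name, the 64-device limit and the "letters, digits and underscores only" rule are assumptions made for this example, not BOINC's own validation rules, and the "NVIDIA" block is only an illustrative healthy entry.

```python
import re

# Hypothetical sanity limits for this sketch; these are NOT BOINC's own rules.
MAX_PLAUSIBLE_COUNT = 64.0
PLAUSIBLE_TYPE = re.compile(r"^[A-Za-z0-9_]+$")

# Matches a <coproc> block that contains a <type> followed by a <count>.
COPROC_BLOCK = re.compile(
    r"<coproc>\s*<type>(.*?)</type>\s*<count>(.*?)</count>\s*</coproc>",
    re.DOTALL,
)


def suspicious_coprocs(xml_text):
    """Return (type, count_text) pairs whose values look corrupt."""
    flagged = []
    for gpu_type, raw_count in COPROC_BLOCK.findall(xml_text):
        gpu_type = gpu_type.strip()
        try:
            count = float(raw_count)
        except ValueError:
            count = float("inf")  # an unparsable count is treated as suspicious
        type_ok = PLAUSIBLE_TYPE.match(gpu_type) is not None
        count_ok = 0 < count <= MAX_PLAUSIBLE_COUNT
        if not (type_ok and count_ok):
            flagged.append((gpu_type, raw_count.strip()))
    return flagged


# First block: copied from the sched_reply excerpt above.
# Second block: an illustrative healthy entry (assumed, not taken from the thread).
sample_reply = """
<coproc>
<type>ibo,GBT,long) is not available for your type of computer.</type>
<count>647500445489094944987862487032585421213412207048870776038754297507971394031017770238076521610590413666285928503412500294475737461178726716849130298534562777624215719772160.000000</count>
</coproc>
<coproc>
<type>NVIDIA</type>
<count>1.000000</count>
</coproc>
"""

for gpu_type, count_text in suspicious_coprocs(sample_reply):
    print(f"suspicious coproc: type={gpu_type!r} count has {len(count_text)} characters")
```

To run it against a real file, read the contents of the sched_reply file into a string and pass that string to suspicious_coprocs().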

I've set NNT with the expectation that a project reset may be the best/only recovery. This error condition does not occur on every work request. As best as I can deduce, it is only when the server is trying to send me a BRP4G task, which does not happen on every work request.

Aren't computers fun...?

Glenn Carver

Joined: 25 Apr 18

Posts: 3

Credit: 36891371

RAC: 4035

22 Aug 2023 8:10:22 UTC

Message 216143

I was about to post exactly the same issue. I have been seeing this problem on a new machine I just attached to E@H which is failing on the hsgamma_FGRP5 task.

The problem is there's garbage in the <coproc> tag in the client_state.xml file for this app which shouldn't be there:

<app_version>

<app_name>hsgamma_FGRP5</app_name>
<version_num>108</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<avg_ncpus>1.000000</avg_ncpus>
<flops>1000000000.000000</flops>
<plan_class>FGRPSSE</plan_class>
...........
<coproc>
<type>ibo,GBT,long) is not available for your type of computer.</type>
<count>647500445489094944987862487032585421213412207048870776038754297507971394031017770238076521610590413666285928503412500294475737461178726716849130298534562777624215719772160.000000</count>
</coproc> etc

Notice that hsgamma is defined in an <app_version> block.

The text in the <type> tag exactly matches the error message I see in the system logs and boincmgr.

If I look on another machine I have which is successfully running the hsgamma app, then I do NOT have the <coproc> block for this app_version.

So it looks as if the project is sending out a malformed app description, or, something very weird happened on my machine (but now I know it's not just me!)

I will run down the existing tasks and try a project reset to see if that cures it. However, that may not explain why it happened, which I am curious about as I work with CPDN.
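The comparison between the failing machine and the working one can be sketched in Python as follows: extract each <app_version> block from the text of client_state.xml, record whether it carries a <coproc> block, and report the app versions where the two machines disagree. The helper names (field, coproc_usage, differing) are hypothetical, and both sample strings are cut down to the tags shown in this post, so treat this as an illustration under those assumptions and not as a complete client_state.xml reader.

```python
import re

APP_VERSION = re.compile(r"<app_version>(.*?)</app_version>", re.DOTALL)


def field(block, tag):
    """Return the stripped text of the first <tag>...</tag> in block, or None."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", block, re.DOTALL)
    return m.group(1).strip() if m else None


def coproc_usage(state_text):
    """Map (app_name, version_num, plan_class) to its coproc type, or None."""
    usage = {}
    for block in APP_VERSION.findall(state_text):
        key = (
            field(block, "app_name"),
            field(block, "version_num"),
            field(block, "plan_class"),
        )
        coproc = re.search(r"<coproc>(.*?)</coproc>", block, re.DOTALL)
        usage[key] = field(coproc.group(1), "type") if coproc else None
    return usage


def differing(state_a, state_b):
    """App versions present on both machines whose coproc type differs."""
    a, b = coproc_usage(state_a), coproc_usage(state_b)
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}


# Trimmed-down samples: the failing machine carries the stray <coproc> block,
# the working machine does not.
failing = """
<app_version>
<app_name>hsgamma_FGRP5</app_name>
<version_num>108</version_num>
<plan_class>FGRPSSE</plan_class>
<coproc>
<type>ibo,GBT,long) is not available for your type of computer.</type>
<count>1.000000</count>
</coproc>
</app_version>
"""
working = """
<app_version>
<app_name>hsgamma_FGRP5</app_name>
<version_num>108</version_num>
<plan_class>FGRPSSE</plan_class>
</app_version>
"""

for key, (on_a, on_b) in differing(failing, working).items():
    print(key, "failing machine:", on_a, "| working machine:", on_b)
```

The <count> in the failing sample is shortened to keep the example readable; the real value is the very long number shown above.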

Glenn Carver

Joined: 25 Apr 18

Posts: 3

Credit: 36891371

RAC: 4035

22 Aug 2023 11:16:28 UTC

Message 216145

Detaching/attaching the project didn't solve the problem.

I cleared running tasks. I then removed the project; checked that all instances of hsgamma had gone from client_state.xml; reattached and watched the log.

And again I see:

Tue 22 Aug 2023 10:06:31 BST | Einstein@Home | Master file download succeeded
Tue 22 Aug 2023 10:06:36 BST | Einstein@Home | Sending scheduler request: Project initialization.
Tue 22 Aug 2023 10:06:36 BST | Einstein@Home | Requesting new tasks for CPU and NVIDIA GPU
Tue 22 Aug 2023 10:06:39 BST |  | app version refers to missing GPU type ibo,GBT,long) is not available for your type of computer.
Tue 22 Aug 2023 10:06:39 BST |  | 
Tue 22 Aug 2023 10:06:39 BST | Einstein@Home | Scheduler request completed: got 2 new tasks
Tue 22 Aug 2023 10:06:39 BST | Einstein@Home | Project requested delay of 60 seconds
Tue 22 Aug 2023 10:06:39 BST | Einstein@Home | [error] App version uses non-existent ibo,GBT,long) is not available for your type of computer. GPU
Tue 22 Aug 2023 10:06:39 BST | Einstein@Home | 
Tue 22 Aug 2023 10:06:39 BST | Einstein@Home | [error] Missing coprocessor for task Ter5_1_dns_cfbf00052_segment_5_dms_200_40000_52_3500000_1; aborting
Tue 22 Aug 2023 10:06:39 BST | Einstein@Home | [error] Missing coprocessor for task LATeah1090F_1208.0_4653420_0.0_2; aborting

It appears E@H is responsible.

Maybe it's related to specific hardware? In this case the machine is a 5900x + 1650 card. I have another machine 12400 + 1650 which doesn't have this issue.

Can someone at E@H investigate this? It appears to be adding a corrupt <coproc> XML block. I think I've done all I can here.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4349

Credit: 253166852

RAC: 41398

22 Aug 2023 13:37:00 UTC

Message 216152

I made some changes to the server code last week, in particular to how coproc usage is communicated to the client, in order to get the Apple M GPU app version delivered and working. Likely something went wrong there.

1. Can you find out and report when you started getting this error, as precisely as possible?

2. Does this happen on Macs only?

3. Does this actually hinder work fetch, or is it just a strange error message?

Thanks a lot for reporting!

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4349

Credit: 253166852

RAC: 41398

22 Aug 2023 13:50:12 UTC

Message 216153

I just found a flaw in the code (uninitialized variable) and fixed it. Does the problem persist?
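As a rough illustration of why an uninitialized variable can produce an absurd <count>, the Python sketch below reinterprets eight arbitrary bytes as an IEEE 754 double, which is roughly what happens when a program prints a floating-point field it never set. The byte string is only a stand-in for whatever happened to be in memory, and the function name is hypothetical. This is not the actual server code, and the thread does not say which variable was involved.

```python
import math
import struct


def bytes_as_double(raw8):
    """Interpret eight raw bytes as a little-endian IEEE 754 double."""
    (value,) = struct.unpack("<d", raw8)
    return value


# Stand-in for leftover memory contents: eight ASCII characters of ordinary text.
leftover = b"computer"
value = bytes_as_double(leftover)

# Printing with fixed-point formatting, as the <count> field does,
# turns a huge double into a very long run of digits.
digits = f"{value:f}"
print(f"finite: {math.isfinite(value)}, printed length: {len(digits)} characters")
```

Leftover text bytes read this way tend to decode as finite but astronomically large numbers, similar in kind to the count reported earlier in the thread.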

Glenn Carver

Joined: 25 Apr 18

Posts: 3

Credit: 36891371

RAC: 4035

22 Aug 2023 17:03:41 UTC

Message 216161

That's fixed it. I reattached to E@H; none of the previous errors now appear in the logs, and hsgamma tasks are running normally.

Thanks for the quick response. Appreciated.

Wedge009

Joined: 5 Mar 05

Posts: 138

Credit: 17884425479

RAC: 7150570

25 Aug 2023 6:37:49 UTC

Message 216228

I was going nuts, thinking there was a problem on my end. Two machines were suffering this problem over the past week or so. But things seem to be stabilising. Thanks for the information.

Soli Deo Gloria
