"Exited with zero status but no 'finished' file"

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

I noticed the 3100M was not listed there, but I went through the driver select, picked the NVS 3100M, and it gave me that driver.

I'll post a question about it in the NVIDIA forum.

As for disabling the GPU tasks, I need to do it on just this machine. I have a few others that work just fine (this is the only laptop); the others are a 220 and a 240, and they seem to work fine in UNIX and Windows. BOINC also seems to be the only thing that has trouble with the GPU.

Thanks for your help!

Joe

Dagorath
Joined: 22 Apr 06
Posts: 146
Credit: 226423
RAC: 0

You can specify one of three locations (work, home or school) for each of your computers. Go to your list of computers in your account and click Details. Scroll to the bottom of the screen and set your 2 other machines to the same location. Set your laptop to a different location. Then go into your Einstein prefs and, for the location of your 2 other machines, make sure CUDA tasks are selected. For the location of your laptop, deselect CUDA.

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

Quote:

You can specify one of three locations (work, home or school) for each of your computers. Go to your list of computers in your account and click Details. Scroll to the bottom of the screen and set your 2 other machines to the same location. Set your laptop to a different location. Then go into your Einstein prefs and, for the location of your 2 other machines, make sure CUDA tasks are selected. For the location of your laptop, deselect CUDA.


That looks like it will work just fine.

Hopefully the next version of the NVIDIA drivers will solve this.

Joe

robertmiles
Joined: 8 Oct 09
Posts: 127
Credit: 29734209
RAC: 14607

Quote:

Usually this message is preceded by "No heartbeat from core client" in the stderr_out of that task. There are a couple of reasons for this: the Core Client being busy with other things than communicating with the App (e.g. waiting for a slow DNS), or the time ticking differently between App and Client, as happens occasionally when the time on your machine is adjusted, e.g. to synchronize with an external timeserver.

Revisiting and rewriting or possibly replacing the 'heartbeat' mechanism is a long-standing item on the todo list of BOINC developers, but AFAIK hasn't been done yet. Making the Core Client talk to the App in a separate thread might also help, but AFAIK this also hasn't been addressed yet.

As long as BOINC sticks to the current implementation, the only thing you can do is to find out what prevents the Client from answering and stop that, e.g. network issues.

BM

An idea to consider: Does the current implementation allow this? If one heartbeat test fails, do not terminate immediately. Instead, start a second one, and only terminate if the second one also fails.
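
To make the idea concrete, here is a minimal sketch in Python of an app-side check that tolerates one missed heartbeat and only gives up on the second consecutive miss. This is not BOINC's actual code: the class name `HeartbeatMonitor`, the 30-second timeout and the two-miss limit are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical app-side heartbeat check; NOT the real BOINC implementation."""

    timeout: float = 30.0   # assumed: seconds of silence that count as one miss
    max_misses: int = 2     # assumed: consecutive misses tolerated before exiting
    last_beat: float = 0.0  # timestamp of the last heartbeat (or last miss)
    misses: int = 0         # consecutive misses seen so far

    def beat(self, now: float) -> None:
        """Record a heartbeat from the client and clear the miss counter."""
        self.last_beat = now
        self.misses = 0

    def should_exit(self, now: float) -> bool:
        """Return True only after max_misses consecutive timeout windows pass."""
        if now - self.last_beat > self.timeout:
            self.misses += 1
            # Start a fresh window, i.e. the "second test" suggested above.
            self.last_beat = now
        return self.misses >= self.max_misses
```

With these assumed values, a single stall (for example a clock adjustment or a slow DNS lookup) would cost one miss, and any heartbeat arriving in the next window would reset the counter, so the app would only exit after two silent windows in a row.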

Dagorath
Joined: 22 Apr 06
Posts: 146
Credit: 226423
RAC: 0

Quote:
Quote:

Usually this message is preceded by "No heartbeat from core client" in the stderr_out of that task. There are a couple of reasons for this: the Core Client being busy with other things than communicating with the App (e.g. waiting for a slow DNS), or the time ticking differently between App and Client, as happens occasionally when the time on your machine is adjusted, e.g. to synchronize with an external timeserver.

Revisiting and rewriting or possibly replacing the 'heartbeat' mechanism is a long-standing item on the todo list of BOINC developers, but AFAIK hasn't been done yet. Making the Core Client talk to the App in a separate thread might also help, but AFAIK this also hasn't been addressed yet.

As long as BOINC sticks to the current implementation, the only thing you can do is to find out what prevents the Client from answering and stop that, e.g. network issues.

BM

An idea to consider: Does the current implementation allow this? If one heartbeat test fails, do not terminate immediately. Instead, start a second one, and only terminate if the second one also fails.

That wouldn't be a bad idea, but remember that the application merely terminates itself and later BOINC restarts it. BOINC doesn't terminate the task and give it Compute Error status until the application terminates itself 99 times. I know that's not quite what you're recommending but...

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

I just want to report that I solved this problem by uninstalling every nVidia driver and updating to the 280.13 drivers. Well, at least the tasks are completing and reporting without error; I'm still waiting for one to validate.

The problem, I believe, is the same one we discussed in the Linux nVidia 280.13 thread.

The short version is that the nVidia and Canonical installers put things in different places, so installing with one and upgrading with the other leaves old dynamic libraries around, and I was getting bitten by version mismatches. The proper procedure for switching sources of the driver is to uninstall the old one before switching.
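
For anyone who wants to check for the same kind of leftovers, here is a rough Python sketch that groups versioned nVidia/CUDA shared libraries by their version suffix and reports when more than one version is present. The directory list and the file name pattern are my assumptions, not something either installer documents; adjust them for your distribution.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern: libcuda.so.280.13, libnvidia-glcore.so.270.41.06, ...
# Plain soname links such as libcuda.so.1 are deliberately ignored.
VERSIONED = re.compile(r"^lib(?:cuda|nvidia-[\w-]+)\.so\.(\d+\.\d+(?:\.\d+)?)$")


def group_by_version(paths):
    """Map driver version string -> list of library paths carrying that version."""
    found = defaultdict(list)
    for p in paths:
        match = VERSIONED.match(Path(p).name)
        if match:
            found[match.group(1)].append(str(p))
    return dict(found)


def scan(dirs):
    """Collect candidate libraries from the directories that exist."""
    paths = []
    for d in dirs:
        root = Path(d)
        if root.is_dir():
            paths.extend(root.glob("lib*.so.*"))
    return group_by_version(paths)


if __name__ == "__main__":
    # Hypothetical search locations; the real ones depend on the installer used.
    versions = scan(["/usr/lib", "/usr/lib64", "/usr/lib/nvidia-current"])
    if len(versions) > 1:
        print("Mixed driver library versions found:")
        for version, libs in sorted(versions.items()):
            print(f"  {version}: {len(libs)} file(s)")
    else:
        print("No mixed versions found in the scanned directories.")
```

More than one version in the report would suggest that an older install was not fully removed, which is the situation described above.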

Also, Dagorath's suggestion of using a different location (work, school...) and disabling the GPU in one of them worked well: it kept my laptop running CPU-only tasks while the other systems on my account continued unaffected.

Joe
