All tasks end with exit code 69 (0x45) Netbios session limit exceeded

CeSinge
Joined: 13 Apr 10
Posts: 2
Credit: 173935785
RAC: 0
Topic 215697

Hello,

2860 Einstein@Home 7/7/18 14:29:38 [task] Process for LATeah1007L_180.0_0_0.0_17484320_0 exited, exit code 69, task state 1 

2861 Einstein@Home 7/7/18 14:29:38 [task] task_state=EXITED for LATeah1007L_180.0_0_0.0_17484320_0 from handle_exited_app

2862 Einstein@Home 7/7/18 14:29:38 [task] result state=COMPUTE_ERROR for LATeah1007L_180.0_0_0.0_17484320_0 from CS::report_result_error

2863 Einstein@Home 7/7/18 14:29:38 [task] Process for LATeah1007L_180.0_0_0.0_17484320_0 exited

2864 Einstein@Home 7/7/18 14:29:38 [task] exit code 69 (0x45): The network BIOS session limit was exceeded. (0x45)

2865 7/7/18 14:29:38 [statefile] set dirty: ACTIVE_TASK_SET::poll

2866 Einstein@Home 7/7/18 14:29:38 Computation for task LATeah1007L_180.0_0_0.0_17484320_0 finished

2867 Einstein@Home 7/7/18 14:29:38 Output file LATeah1007L_180.0_0_0.0_17484320_0_0 for task LATeah1007L_180.0_0_0.0_17484320_0 absent

2868 Einstein@Home 7/7/18 14:29:38 Output file LATeah1007L_180.0_0_0.0_17484320_0_1 for task LATeah1007L_180.0_0_0.0_17484320_0 absent

2869 Einstein@Home 7/7/18 14:29:38 [task] result state=COMPUTE_ERROR for LATeah1007L_180.0_0_0.0_17484320_0 from CS::app_finished

Sorry for the format mess: pasted from BoincTasks; I can't find the Boinc file corresponding to this.

So I did some Googling to try to get at least some indication of the issue. Because I see NETBIOS, I suspect that it has to do with connections to my NAS. The fact is, I run everything BOINC from my NAS (via a drive mapping, so that all BOINC data is in S:\).

The question then is: why does the Einstein job require so many NetBIOS sessions (more than whatever the default is), or connections to the NAS, presumably in parallel? It ran fine in the past; I don't know when it stopped working, as I only rarely check it these days. Note that Einstein is the only project I run that uses my GPUs (2x GTX1070).

Is there a way to increase the number of NetBIOS sessions? Or to have Einstein use fewer of them?

Thank you.

Francois

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Hi Cesinge!

I believe that the NetBIOS/Network BIOS message is a red herring and comes from something trying to interpret the fault code as a Windows error code.
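
To illustrate where the misleading text comes from: the numeric exit code is simply looked up in the Windows system error table, where 69 happens to be the NetBIOS session message. A minimal Python sketch of that lookup follows. The one-entry table and the `describe_exit_code` helper are hypothetical illustrations, not BOINC code.

```python
# Hypothetical helper, not part of BOINC: a one-entry excerpt of the
# Windows system error table, keyed by numeric code.
WINDOWS_SYSTEM_ERRORS = {
    69: "The network BIOS session limit was exceeded.",
}


def describe_exit_code(code: int) -> str:
    """Render an exit code in the same style as the client log line."""
    text = WINDOWS_SYSTEM_ERRORS.get(code, "Unknown error code")
    return f"exit code {code} (0x{code:x}): {text}"


print(describe_exit_code(69))
```

The message says nothing about what the application was doing when it exited with 69; it only reflects which table the number was looked up in.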

I went and had a look at your failed tasks for host 12629348 and picked a few task results at random: #1, #2, #3, #4, and #5.
All show the same error in the stderr output:

Couldn't create OpenCL context (error: 999)!
initialize_ocl returned error [2007]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
02:37:42 (7728): [CRITICAL]: ERROR: MAIN() returned with error '5'

I would start by checking the graphics card driver: download the latest one from Nvidia and install it by choosing "Advanced" and clean install.

CeSinge
Joined: 13 Apr 10
Posts: 2
Credit: 173935785
RAC: 0

Sorry, had not much time to deal with it, then had to wait to confirm that jobs were running. Indeed, upgrading to the latest GeForce drivers seems to do the trick.

I'm not sure where you got your detailed information, however? Server-side only? Now, that wouldn't have helped much anyway...

Thank you for this!

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

CeSinge wrote:
Sorry, had not much time to deal with it, then had to wait to confirm that jobs were running. Indeed, upgrading to the latest GeForce drivers seems to do the trick.

Good to hear that things seem to be working now!

Quote:
I'm not sure where you got your detailed information, however? Server-side only? Now, that wouldn't have helped much anyway...

The info is available locally while the task is running, and after it finishes until it's reported. But you can't read it through BOINC Manager; you have to navigate to the BOINC data directory and look either in the appropriate slot directory or in client_state.xml.
It's much easier to find the info on the webpage after the task is reported.
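
As a rough illustration of the local route, the Python sketch below collects the tail of each stderr.txt found in the slot directories. It assumes the usual layout of a `slots` folder with numbered subfolders under the BOINC data directory. The `tail_slot_stderr` helper is a made-up name, not part of BOINC.

```python
from pathlib import Path


def tail_slot_stderr(data_dir: str, lines: int = 10) -> dict[str, list[str]]:
    """Return the last `lines` lines of every slots/<n>/stderr.txt.

    Assumption: `data_dir` is the BOINC data directory and each running
    task writes its stderr to a file named stderr.txt inside its slot.
    """
    tails = {}
    for path in sorted(Path(data_dir).glob("slots/*/stderr.txt")):
        # errors="replace" keeps the scan going if a file has odd bytes.
        text = path.read_text(errors="replace")
        tails[path.parent.name] = text.splitlines()[-lines:]
    return tails
```

Pointing it at the data directory (S:\ in the setup described above) would give one entry per slot that currently has a stderr.txt, keyed by the slot number.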

Quote:
Thank you for this!

You're welcome!

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1886
Credit: 1408247898
RAC: 1157615

After running GPUs on this one host for 3 years, it decided today to do this:

https://einsteinathome.org/task/1288209418

It fails over and over. I tried rebooting and updating the driver, but it made no difference.

I see the message

The network BIOS session limit was exceeded.

and the error at the end, but I have no idea why it happened all of a sudden, since this was the one host I usually never had to check because it isn't a video card.

It was always running pretty well for an AMD Ryzen 3 2300U:

https://einsteinathome.org/host/12769534

I usually never ask any questions, but my headache doesn't want me to search for the answer.

So I just suspended this one, since all these GPU errors are not something I like to have here.

I might have to fire up one of the GPUs that I used to have running here to make up for this for now.

- Samson

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3953
Credit: 46834192642
RAC: 64304588

MAGIC Quantum Mechanic wrote:

After running GPUs on this one host for 3 years, it decided today to do this:

https://einsteinathome.org/task/1288209418

It fails over and over. I tried rebooting and updating the driver, but it made no difference.

I see the message

The network BIOS session limit was exceeded.

and the error at the end, but I have no idea why it happened all of a sudden, since this was the one host I usually never had to check because it isn't a video card.

It was always running pretty well for an AMD Ryzen 3 2300U:

https://einsteinathome.org/host/12769534

I usually never ask any questions, but my headache doesn't want me to search for the answer.

So I just suspended this one, since all these GPU errors are not something I like to have here.

I might have to fire up one of the GPUs that I used to have running here to make up for this for now.

- Samson

Turn off beta tasks. All of your errors are with the v1.28 beta app. All of your previous successes are with the 1.22 standard app.

_________________________________________________________________________

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1886
Credit: 1408247898
RAC: 1157615

Ian&Steve C. wrote:

Turn off beta tasks. All of your errors are with the v1.28 beta app. All of your previous successes are with the 1.22 standard app.

 

Not sure how that happened, but you are right of course, so back to work. I just got up and my headache is finally almost gone.

Thanks Steve

maeax
Joined: 6 Aug 12
Posts: 21
Credit: 1423633213
RAC: 2509070

Exit status: 69 (0x00000045) Unknown error code

Gamma-ray pulsar binary search #1 on GPU's 1.28 (FGRPopencl2-ati)

AMD Radeon Pro WX 3200 - Driver 30.0.21020.2 from 22/05/24.

Version changed to 1.22. Running now.

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1886
Credit: 1408247898
RAC: 1157615

Well now I am having that problem again and it isn't because of running the wrong version.

https://einsteinathome.org/task/1437788090

It is doing this again, and quite a lot this time:

The network BIOS session limit was exceeded.


MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1886
Credit: 1408247898
RAC: 1157615

Well I figured it out myself and it had nothing to do with what version of GPU tasks I had running as I was first told here.

It had to do with my GeForce 660Ti SC being an "OC'd" GPU, or in this case "super-clocked".

I decided to just figure it out since many times I got Valids but started getting more Invalids.

So I went to the EVGA program I used to control the fan speed and used it to change the settings of the Clock Speed from 1337 to 979 and left the voltage the same.

And now it has run over 40 Valids in a row with no more problems.

Only difference is the lower Clock Speed adds about 500 seconds to the run time.

I could try turning up the Clock Speed a little but I will just let it run these Valids instead.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117589723149
RAC: 35198815

MAGIC Quantum Mechanic wrote:
Well I figured it out myself and it had nothing to do with what version of GPU tasks I had running as I was first told here.

It's not cool to malign someone else for your incorrect interpretation of the problem. I'm responding just to correct the notion that you were misled.

I guess you chose this thread to report your problems because of the "network BIOS session limit was exceeded" error message, as reported by the OP back in July, 2018.  If you had read that report and the reply provided by Holmis at the time, you would have found,

Holmis wrote:
I believe that the NetBIOS/Network BIOS message is a red herring and comes from something trying to interpret the fault code as a Windows error code.

and you would have seen that the problem was graphics driver related.

This same type of 'Windows misinformation' seems to crop up quite regularly.  You really need to examine the stderr.txt stuff on the website to find better information.

You first used this thread for your 9 May 2022 problem report which had the very same Windows misinterpretation of an app error code.  It was a totally different problem this time since you were immediately (and correctly) advised that you were using a beta app that was not appropriate for your type of GPU.

Fast forward to 8 March 2023 and you again have a bogus Windows message.  Once again it was not related to the problem.  The task you linked at the time no longer exists so I've chosen this one which does.  I looked at a few around the same date and they all show something similar, with the same bogus Network BIOS message.  At the top of the 'Stderr Output' you will see the "exit code 69" which Windows tries to interpret.  There's lots of things which give the '69' code.  I seem to recall this code being referred to as "unspecified error" under Linux.

You need to look down towards the bottom of the output to get a better interpretation.  I'm not a programmer so this is just a guess as to what actually happened.  The message is:-

Error during OpenCL FFT (error: -36)

followed by:-

ERROR: gen_fft_execute() returned with error -282502848

which indicates that a particular routine crashed whilst trying to perform an FFT.

If you keep following, you see that the science app's main routine 'main()' returned an error code of '5' when handing things back to BOINC.  In turn, BOINC called the boinc_finish() routine and passed the value '69' which is exactly what Windows misinterpreted.  It seems to me that '69' is just a 'catch-all' code which just means "something crashed" :-).
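
As a small illustration of reading the bottom of the output rather than trusting the exit code text, the Python sketch below keeps only the lines of a stderr dump that mention an error. The `find_error_lines` name and the simple "contains the word error" rule are assumptions made for this example; real output may need a different filter.

```python
def find_error_lines(stderr_text: str) -> list[str]:
    """Return the stripped lines of a stderr dump that mention an error."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if "error" in line.lower()
    ]


# The two messages quoted above, plus one harmless line from the
# earlier stderr excerpt in this thread for contrast.
sample = """\
Error during OpenCL FFT (error: -36)
OCL queue null
ERROR: gen_fft_execute() returned with error -282502848
"""
for line in find_error_lines(sample):
    print(line)
```

Run on the sample, this prints the two error lines and skips the harmless one, which is usually enough to see which routine failed first.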

When you use old hardware that is probably well past its 'use by' date, you need to expect unspecified, hardware related errors.  Sometimes it's power quality from old PSUs, particularly from capacitor degradation.  Sometimes it may be that doping elements in silicon chips do suffer from increased diffusion due to elevated temperatures over the 'continuous use' lifetime of the device.  Reducing frequency may buy you a bit more lifetime but eventually the crashing may return.

MAGIC Quantum Mechanic wrote:
It had to do with my GeForce 660Ti SC being an "OC'd" GPU, or in this case "super-clocked".

Not necessarily, since the 'headroom' that any particular GPU has is just a 'luck of the draw' type of thing.  Since manufacturers tend to select chips with higher headroom to use in their "super-clocked" variants, it's probably possible for a device using the default frequency to still have a lower headroom.  I guess a lot depends on how carefully the manufacturer selects the 'better' chips for the SC cards.

MAGIC Quantum Mechanic wrote:
I decided to just figure it out since many times I got Valids but started getting more Invalids.

The only person that can sort out hardware issues like this is the person with physical access to the hardware.  Please don't suggest that you've been given wrong advice when you didn't work out that it's a totally different problem this time.

Cheers,
Gary.
