Hello,
Einstein@Home 7/7/18 14:29:38 [task] Process for LATeah1007L_180.0_0_0.0_17484320_0 exited, exit code 69, task state 1
Einstein@Home 7/7/18 14:29:38 [task] task_state=EXITED for LATeah1007L_180.0_0_0.0_17484320_0 from handle_exited_app
Einstein@Home 7/7/18 14:29:38 [task] result state=COMPUTE_ERROR for LATeah1007L_180.0_0_0.0_17484320_0 from CS::report_result_error
Einstein@Home 7/7/18 14:29:38 [task] Process for LATeah1007L_180.0_0_0.0_17484320_0 exited
Einstein@Home 7/7/18 14:29:38 [task] exit code 69 (0x45): The network BIOS session limit was exceeded. (0x45)
7/7/18 14:29:38 [statefile] set dirty: ACTIVE_TASK_SET::poll
Einstein@Home 7/7/18 14:29:38 Computation for task LATeah1007L_180.0_0_0.0_17484320_0 finished
Einstein@Home 7/7/18 14:29:38 Output file LATeah1007L_180.0_0_0.0_17484320_0_0 for task LATeah1007L_180.0_0_0.0_17484320_0 absent
Einstein@Home 7/7/18 14:29:38 Output file LATeah1007L_180.0_0_0.0_17484320_0_1 for task LATeah1007L_180.0_0_0.0_17484320_0 absent
Einstein@Home 7/7/18 14:29:38 [task] result state=COMPUTE_ERROR for LATeah1007L_180.0_0_0.0_17484320_0 from CS::app_finished
Sorry for the format mess: this was pasted from BoincTasks; I can't find the BOINC log file that corresponds to it.
So I did some Googling around to get at least some indication of the issue. Because I see NetBIOS, I suspect it has to do with connections to my NAS. Fact is, I run everything BOINC-related from my NAS (via a drive mapping, so all BOINC data is in S:\).
The question, then, is why the Einstein jobs require so many NetBIOS sessions, more than whatever the default limit is; that is, connections to the NAS, presumably in parallel. It ran fine in the past; I don't know when it stopped working, as I only rarely check it these days. Note that Einstein is the only project I run that uses my GPUs (2x GTX 1070).
Is there a way to increase these NetBIOS sessions? Or to have Einstein use fewer of them?
Thank you.
Francois
Hi CeSinge!
I believe that the NetBIOS/Network BIOS message is a red herring; it comes from something trying to interpret the app's exit code as if it were a Windows error code.
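To see that mistranslation happen, here is a minimal sketch (assuming Windows and Python; ctypes.FormatError() is a standard ctypes helper that looks a number up in Windows' own system error table):

import ctypes

# 69 here is the science app's exit status, NOT a real Win32 error code,
# but anything that treats it as one gets the NetBIOS text back.
exit_code = 69
print(ctypes.FormatError(exit_code))
# -> The network BIOS session limit was exceeded.

So the message describes Windows error 69, not anything the app actually did with your NAS.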
I went and had a look at your failed tasks for host 12629348 and picked a few task results at random: #1, #2, #3, #4, and #5.
All show the same error in the stderr output.
I would start by checking the graphics card driver: download the latest from Nvidia and install it, choosing "Advanced" and then a clean install.
Sorry, I hadn't much time to deal with it, and then had to wait to confirm that jobs were running. Indeed, upgrading to the latest GeForce drivers seems to have done the trick.
I'm not sure where you got your detailed information, though. Server-side only? Not that it would have helped much anyway...
Thank you for this!
Good to hear that things seem to be working now!
The info is available locally while the task is running, and after it finishes until it's reported. But you can't read it through BOINC Manager; you have to navigate to the BOINC data directory and either look in the appropriate slot directory or in client_state.xml.
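If you'd rather pull those stderr blocks out programmatically, here is a minimal sketch (assuming Python, a standard client_state.xml layout, and the S:\ mapping mentioned above; adjust the path to your own data directory). It deliberately does a plain text scan rather than strict XML parsing, because the embedded stderr text isn't guaranteed to be well-formed XML:

import re

STATE_FILE = r"S:\client_state.xml"  # assumption: BOINC data dir mapped to S:\

with open(STATE_FILE, encoding="utf-8", errors="replace") as f:
    state = f.read()

# Each <result> in client_state.xml carries its captured stderr in a
# <stderr_txt> element; print every non-empty one.
for block in re.findall(r"<stderr_txt>(.*?)</stderr_txt>", state, re.DOTALL):
    text = block.strip()
    if text:
        print(text)
        print("-" * 60)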
It's much easier to find the info on the webpage after the task is reported.
You're welcome!
After running GPUs on this one host for 3 years, it decided today to do this:
https://einsteinathome.org/task/1288209418
Over and over, and I tried rebooting and updating the driver, but it made no difference.
I see the same output, and the error at the end, but I have no idea why it just happened all of a sudden, since this was the one I usually never had to check because it isn't a video card.
It always ran pretty well for an AMD Ryzen 3 2300U:
https://einsteinathome.org/host/12769534
and I usually never ask any questions, but my headache doesn't want me to search for the answer.
So I just suspended this one, since all these GPU errors are not something I like to have here.
I might have to fire up one of the GPUs I used to have running here to make up for this for now.
- Samson
Turn off beta tasks. All of your errors are with the v1.28 beta app; all of your previous successes are with the 1.22 standard app.
Ian&Steve C. wrote: "Turn off beta tasks."
Not sure how that happened, but you are right of course, so it's back to work. I just got up and my headache is finally almost gone.
Thanks Steve
Endstatus:69 (0x00000045) Unknown error code
Gamma-ray pulsar binary search #1 on GPUs 1.28 (FGRPopencl2-ati)
AMD Radeon Pro WX 3200 - Driver 30.0.21020.2 from 22/05/24.
Version changed to 1.22. Running now.
Well, now I am having that problem again, and it isn't because of running the wrong version.
https://einsteinathome.org/task/1437788090
is again doing this, and quite a lot this time.
Well, I figured it out myself, and it had nothing to do with which version of GPU tasks I had running, as I was first told here.
It had to do with my GeForce 660Ti SC being an "OC'd" GPU, or in this case "super-clocked".
I decided to just figure it out, since I was getting many Valids but had started getting more and more Invalids.
So I went to the EVGA program I use to control the fan speed and used it to change the clock speed from 1337 to 979, leaving the voltage the same.
And now it has run over 40 Valids in a row with no more problems.
The only difference is that the lower clock speed adds about 500 seconds to the run time.
I could try turning the clock speed up a little, but I will just let it keep running Valids instead.
It's not cool to malign someone else for your incorrect interpretation of the problem. I'm responding just to correct the notion that you were misled.
I guess you chose this thread to report your problems because of the "network BIOS session limit was exceeded" error message, as reported by the OP back in July 2018. If you had read that report and the reply provided by Holmis at the time, you would have found his explanation that the message is a red herring, and you would have seen that the problem was graphics-driver related.
This same type of 'Windows misinformation' seems to crop up quite regularly. You really need to examine the stderr output on the website to find better information.
You first used this thread for your 9 May 2022 problem report, which had the very same Windows misinterpretation of an app error code. It was a totally different problem that time, since you were immediately (and correctly) advised that you were using a beta app that was not appropriate for your type of GPU.
Fast forward to 8 March 2023 and you again have a bogus Windows message. Once again, it was not related to the problem. The task you linked at the time no longer exists, so I've chosen this one, which does. I looked at a few from around the same date and they all show something similar, with the same bogus Network BIOS message. At the top of the 'Stderr Output' you will see the "exit code 69" which Windows tries to interpret. There are lots of things which give the '69' code. I seem to recall this code being referred to as "unspecified error" under Linux.
You need to look down towards the bottom of the output to get a better interpretation. I'm not a programmer, so this is just a guess as to what actually happened. The message is:

Error during OpenCL FFT (error: -36)

followed by:

ERROR: gen_fft_execute() returned with error -282502848

which indicates that a particular routine crashed whilst trying to perform an FFT. If you keep following, you see that the science app's main routine, main(), returned an error code of '5' when handing things back to BOINC. In turn, BOINC called the boinc_finish() routine and passed the value '69', which is exactly what Windows misinterpreted. It seems to me that '69' is just a 'catch-all' code which just means "something crashed" :-).
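For what it's worth, that -36 is a standard OpenCL status code, and a small lookup is enough to decode it. A minimal sketch (assuming Python; the values below come from the standard OpenCL headers, not from the FGRP app itself):

# A few common OpenCL status codes, taken from the standard cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -44: "CL_INVALID_PROGRAM",
}

def describe(code: int) -> str:
    return OPENCL_ERRORS.get(code, f"unknown OpenCL status {code}")

print(describe(-36))  # -> CL_INVALID_COMMAND_QUEUE

CL_INVALID_COMMAND_QUEUE is often what you see when the command queue has died mid-run, e.g. after a device hang or driver reset, which would fit what follows.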
When you use old hardware that is probably well past its 'use by' date, you need to expect unspecified, hardware-related errors. Sometimes it's power quality from old PSUs, particularly from capacitor degradation. Sometimes the dopant elements in silicon chips suffer increased diffusion from elevated temperatures over the 'continuous use' lifetime of the device. Reducing the frequency may buy you a bit more lifetime, but eventually the crashing may return.
As for the idea that the card being "super-clocked" was itself the cause: not necessarily, since the 'headroom' that any particular GPU has is just a 'luck of the draw' type of thing. Because manufacturers tend to select chips with higher headroom for their "super-clocked" variants, it's quite possible for a device running at the default frequency to still have lower headroom. I guess a lot depends on how carefully the manufacturer selects the 'better' chips for the SC cards.
The only person who can sort out hardware issues like this is the person with physical access to the hardware. Please don't suggest that you've been given wrong advice when you didn't work out that it's a totally different problem this time.
Cheers,
Gary.