Validate error - What this really means!

Beercules48
Joined: 18 Jun 23
Posts: 7
Credit: 54420989
RAC: 112439

Hello,

I'm getting a lot of "validate error" messages on my second machine. I read somewhere, in passing, that this may somehow be caused by remotely accessing the BOINC client on that machine. That doesn't really make sense to me, but I do access the BOINC client on that machine from a BOINC Manager session on my main rig, and the timing of the errors seems to fit. But why would the simple act of accessing the client remotely through the built-in mechanism ruin the validity of the tasks? Am I missing something?

Machine that throws the errors:

Computer 13153390

CPU type: AuthenticAMD AMD Ryzen 5 3600 6-Core Processor [Family 23 Model 113 Stepping 0]

Number of processors: 12

Coprocessors: AMD AMD Radeon Pro W5700 (8176MB)

Operating system: Microsoft Windows 10 Core x64 Edition, (10.00.19045.00)

BOINC client version: 7.22.2

Memory: 16292.49 MiB

Cache: 512 KiB

Swap space: 18724.49 MiB

Total disk space: 237.83 GiB

Free disk space: 181.43 GiB

It is extremely convenient for me to access that machine remotely, so any input is welcome!

Thanks!

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18725734321
RAC: 6594857

With your computers hidden on your account, we can't examine the task errors to find the cause.

Remote monitoring should not have any impact on the local machine's production UNLESS it ties up resources so severely that it pulls them away from the running tasks.

Possible causes: the remote session taking all the CPU resources, or the monitoring consuming all memory or stealing enough I/O that task result files can't be read or written.

Beercules48
Joined: 18 Jun 23
Posts: 7
Credit: 54420989
RAC: 112439

I wasn't aware I had the setting turned on that hides my computers, apologies!

The hint about the CPU, or the machine in general, being overwhelmed is something I can look into. But I made sure to allow only 5 tasks to run on the CPU, so that one entire core (or seven threads) remains free. This should leave enough breathing room...

Here is the observed behaviour in more detail:

I click "Update" in the Projects tab remotely and it sends/reports the task. The task is immediately marked as "validate error".

I close the remote session, click "Update" in the Projects tab on the local machine, it reports, and all is well. I am stumped. (This was a different task, obviously, because the earlier one had already been sent away.)

I doubt I/O is the problem: the machine runs on an SSD, and 16 GB of RAM HAS to be enough even these days...

Thanks for the quick reply, I appreciate that ✌

Beercules48
Joined: 18 Jun 23
Posts: 7
Credit: 54420989
RAC: 112439

Okay, I think I got it sorted. Hopefully. As suggested, it wasn't the remote connection; that was indeed a weird coincidence that threw me off the correct path. It was a driver version issue.

It would be nice if the validator could provide some feedback as to why the tasks didn't pass the sanity check, because I was not able to find that anywhere. That could have spared me this public humiliation xD

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18725734321
RAC: 6594857

You can always find the reason for a task error in the result file from each task. Click on the task ID link, go to the end of the output, and look for the error, failure, or warning message.

Task 1513165427

Quote:

[14:24:23][11600][ERROR] Input file on command line ../../projects/einstein.phys.uwm.edu/p2030.20161217.G203.11+01.02.S.b6s0g0.00100_680.bin4 doesn't agree with input file  from checkpoint header.

Task 1513573229

Quote:

[14:46:25][11944][ERROR] Header checkpoint file Ter5_1_dns_cfbf00052_segment_5_dms_200_93.cpt contains inconsistent information about total number of templates to work with. (0 != 50000).
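
For anyone who prefers to search the output rather than scroll through it, here is a minimal sketch that picks out such lines. It assumes the task's stderr text has been copied from the task page into a string and that the lines follow the `[time][pid][LEVEL] message` layout quoted in this thread. The function name `find_problems` is hypothetical, and the `WARNING` level name is an assumption: only `ERROR` and `INFO` actually appear in the logs shown here.

```python
import re

# Layout seen in the quoted logs: [HH:MM:SS][pid][LEVEL] message
# The level may be padded with a space inside the brackets, e.g. "[INFO ]".
LOG_LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[(\d+)\]\[([A-Z]+)\s*\]\s*(.*)$")


def find_problems(stderr_text, levels=("ERROR", "WARNING")):
    """Return (time, pid, level, message) tuples for lines at the given levels.

    "WARNING" is an assumed level name; adjust it to whatever the
    application actually prints.
    """
    hits = []
    for line in stderr_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group(3) in levels:
            hits.append(
                (match.group(1), int(match.group(2)), match.group(3), match.group(4))
            )
    return hits


# Sample text assembled from lines quoted in this thread.
sample = """[01:44:25][2092][INFO ] Checkpoint committed!
[14:46:25][11944][ERROR] Header checkpoint file Ter5_1_dns_cfbf00052_segment_5_dms_200_93.cpt contains inconsistent information about total number of templates to work with. (0 != 50000).
01:46:08 (2092): called boinc_finish(0)"""

for time, pid, level, message in find_problems(sample):
    print(f"{time} pid={pid} {level}: {message}")
```

Lines that do not match the bracketed layout, such as the final `called boinc_finish(0)` line, are simply skipped. An empty result only means the application logged nothing at those levels; it does not explain why the validator rejected a result.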

I don't know anything about Windows, but I seem to remember threads about problems when using Remote Desktop:

Questions and problems : Boinc stop using GPU when receive a Remote Desktop Connection(RDP)

GPUs : Remote access stops GPU calculations

That might have something to do with all the aborted tasks, which I assume were caused by your manual Update through your remote session.

Giorgio Cannella
Joined: 2 May 19
Posts: 4
Credit: 658477
RAC: 0

Good morning,

Since September 4th, 2023, Einstein@Home projects on my PC show the report "Calculation error (1 CPU + 1 Intel GPU)".

This did not happen before.

Other projects in BOINC Manager on my PC run without any problem.

I read a moderator/administrator message from September 7th, 2011, saying:

"My current suspicion is that these validate errors do happen 'preferably' on 64Bit machines, either Linux or recent Mac OS versions."

My BOINC Manager version is 7.22.2 (x64).

What can I do to solve this problem?

I thank you in advance.

Giorgio

 

Scrooge McDuck
Joined: 2 May 07
Posts: 1053
Credit: 17896984
RAC: 10940

Giorgio Cannella wrote:
What can I do to solve this problem?

Hello Giorgio,

You may want to publish information about the computers you use with BOINC/Einstein. This is done in the Einstein preferences within your account: Account --> Preferences --> Privacy --> "Should Einstein@Home show your computers on its website?" (direct link to your Einstein preferences).

The experienced crunchers here can then check your computer type, GPU type, your successful and failing tasks, the corresponding log files, etc. That makes it easier to give you useful advice.

Scrooge

Giorgio Cannella
Joined: 2 May 19
Posts: 4
Credit: 658477
RAC: 0

Hello Scrooge,

I did what you asked.

Let me know.

Thanks.

Giorgio

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18725734321
RAC: 6594857

Either you did not do as requested or the website is very slow in applying the change.

Your computers were still hidden when I checked just now.

Beercules48
Joined: 18 Jun 23
Posts: 7
Credit: 54420989
RAC: 112439

The output file I have access to does not show any ERRORs, only INFOs.

"

[01:44:25][2092][INFO ] Checkpoint committed!
[01:45:32][2092][INFO ] Checkpoint committed!
[01:46:08][2092][INFO ] OpenCL shutdown complete!
[01:46:08][2092][INFO ] Statistics: count dirty SumSpec pages 67364 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 1926214
[01:46:08][2092][INFO ] Data processing finished successfully!
01:46:08 (2092): called boinc_finish(0)

</stderr_txt>
]]>"

 

This is an excerpt of the output file of a task marked as "validate error". I checked the whole thing. No mention of an error.

Source: https://einsteinathome.org/task/1519993958

I have downloaded BoincTasks after I read that it solved the issue for another user, so we'll see. I'm still curious where you found that ERROR message, because I couldn't. And I checked thoroughly, or so I thought...

Thanks again for the reply, it is appreciated! While that boosted my confidence in the BOINC community, the same cannot be said for the internal structure of BOINC as I experience it xD
