No "Validation inconclusive" filter on Tasks page

Mike
Joined: 26 Dec 20
Posts: 45
Credit: 5985121202
RAC: 9433252

I see a couple of 'validation inconclusive' results for BRP-7 Meerkat jobs. The task summary at the top shows 'Checked, but no consensus yet', and near the last few lines of the stderr log there is 'Statistics: count dirty SumSpec pages xxxxxxx....'

This is on a Ryzen CPU box with an AMD Radeon 7900 XT, running Ubuntu 22.04.4.

I was wondering what's the difference in this error vs. just the 'invalid result' error?

--------  here's the log ----

TASK 1616371136

Name:Ter5_1_dns_cfbf00010_segment_6_dms_200_40000_170_6950000_1

Workunit ID:808091280

Created:24 May 2024 12:19:54 UTC

Sent:24 May 2024 20:31:00 UTC

Report deadline:7 Jun 2024 20:31:00 UTC

Received:25 May 2024 3:11:32 UTC

Server state:Over

Outcome:Success

Client state:Done

Exit status:0 (0x00000000)

Computer:13167060

Run time (sec):526.18

CPU time (sec):143.69

Peak working set size (MB):431.71

Peak swap size (MB):6814.56

Peak disk usage (MB):0.05

Validation state:Checked, but no consensus yet

Granted credit:0

Application:Binary Radio Pulsar Search (MeerKAT) v0.17 (BRP7-opencl-ati)
x86_64-pc-linux-gnu


Stderr output

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<stderr_txt>
[19:58:36][147371][INFO ] Application startup - thank you for supporting Einstein@Home!
[19:58:36][147371][INFO ] Starting data processing...
[19:58:36][147371][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[19:58:36][147371][INFO ] Using OpenCL device "gfx1100" by: Advanced Micro Devices, Inc.
[19:58:39][147371][INFO ] Number of generated templates to be used: 50000
[19:58:39][147371][INFO ] Checkpoint file unavailable: Ter5_1_dns_cfbf00010_segment_6_dms_200_170.cpt (No such file or directory).
------> Starting from scratch...
[19:58:39][147371][INFO ] Header contents:
------> Original WAPP file: /atlas/data/TRAPUM_GC/Terzan5/epoch1/30min_segments/dedispersed_files/cfbf00010/Ter5_1_dns_cfbf00010_segment_6_dms_200_DM247.00
------> Sample time in microseconds: 153.121
------> Observation time in seconds: 2568.9524
------> Time stamp (MJD): 59097.776226693291
------> Number of samples/record: 0
------> Center freq in MHz: 857.5673828
------> Channel band in MHz: 3.34375
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 174804.62
------> DEC (J2000): -244640.799999
------> Galactic l: 0
------> Galactic b: 0
------> Name: J1748-2446M
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 16777216
------> Trial dispersion measure: 247 cm^-3 pc
------> Scale factor: 1.04225
[19:58:39][147371][INFO ] Seed for random number generator is 1085814639.
[19:58:41][147371][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 3.24424e-09
------> thr1 = 19.5464
------> thr2 = 22.7124
------> thr4 = 27.844
------> thr8 = 36.3869
------> thr16 = 50.9469
[19:59:06][147371][INFO ] Checkpoint committed!
[19:59:36][147371][INFO ] Checkpoint committed!
[20:00:06][147371][INFO ] Checkpoint committed!
[20:00:36][147371][INFO ] Checkpoint committed!
[20:01:06][147371][INFO ] Checkpoint committed!
[20:01:36][147371][INFO ] Checkpoint committed!
[20:02:06][147371][INFO ] Checkpoint committed!
[20:02:36][147371][INFO ] Checkpoint committed!
[20:03:06][147371][INFO ] Checkpoint committed!
[20:03:36][147371][INFO ] Checkpoint committed!
[20:04:06][147371][INFO ] Checkpoint committed!
[20:04:36][147371][INFO ] Checkpoint committed!
[20:05:06][147371][INFO ] Checkpoint committed!
[20:05:36][147371][INFO ] Checkpoint committed!
[20:06:07][147371][INFO ] Checkpoint committed!
[20:06:37][147371][INFO ] Checkpoint committed!
[20:07:07][147371][INFO ] Checkpoint committed!
[20:07:21][147371][INFO ] OpenCL shutdown complete!
[20:07:21][147371][INFO ] Statistics: count dirty SumSpec pages 267803 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 1926214
[20:07:21][147371][INFO ] Data processing finished successfully!
20:07:21 (147371): called boinc_finish(0)

</stderr_txt>
]]>

---------------

I also have some BRP-7 WUs from as far back as May 13 still 'waiting for validation'. I wonder if that is due to project server issues or, probably more likely, local power issues going on here? My local utility replaced the power feed line to my house in early May and the power went up and down a few times, unfortunately not always with clean shutdowns of my hardware. Of course that's always dicey.

- Mike

 

Mike
Joined: 26 Dec 20
Posts: 45
Credit: 5985121202
RAC: 9433252

Looking a little closer at one other Intel CPU box with an AMD GPU, I see a couple of other 'validation inconclusive' results from around May 25th. At this point I'd even be inclined to blame it on solar flares and my satellite comms. LOL. I see the almanac shows it was raining and cool here. Beats me.

-Mike

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117791491907
RAC: 34682873

Mike wrote:
I see a couple of 'validation inconclusive'  results for BRP-7 Meerkat jobs.

So do many other participants as well, from time to time.

Mike wrote:
I was wondering what's the difference in this error vs. just the 'invalid result' error?

First of all, an inconclusive result is NOT an error.  It is just that a comparison of the two results gives an agreement that is not quite close enough.  The reasons for this include things like 'noisy' data, OS differences, app version differences, math library/driver differences, GPU model differences, validation tolerances too strict, etc.

To break the impasse, a third result is sent out and a further comparison made when the result is returned.  If two of the three now agree closely enough, the third will be declared invalid.  Chances are that all 3 were error free anyway.  A common cause of a 'good' result being declared invalid is when two results from Windows machines get compared to a single result from a Linux machine and the Windows results win.  I have seen this quite a few times.
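The quorum logic described above can be sketched roughly as follows. This is only an illustration and not the project's actual validator: the function names `agree` and `validate`, the relative tolerance of 1e-5, and the idea of a result being a plain list of numbers are all assumptions made for the example.

```python
from itertools import combinations


def agree(a, b, rel_tol=1e-5):
    """Hypothetical check: two results agree if all values match within rel_tol."""
    return len(a) == len(b) and all(
        abs(x - y) <= rel_tol * max(abs(x), abs(y), 1e-30) for x, y in zip(a, b)
    )


def validate(results, rel_tol=1e-5):
    """Map each host to 'valid', 'invalid' or 'inconclusive'.

    results: dict of host name -> list of numbers (assumed result format).
    """
    hosts = list(results)
    for h1, h2 in combinations(hosts, 2):
        if agree(results[h1], results[h2], rel_tol):
            # First agreeing pair sets the reference; anything not
            # matching it closely enough is marked invalid.
            reference = results[h1]
            return {
                h: "valid" if agree(results[h], reference, rel_tol) else "invalid"
                for h in hosts
            }
    # No pair agrees closely enough yet, so another result is needed.
    return {h: "inconclusive" for h in hosts}


# Two results that differ slightly too much: no consensus yet.
two = {"host_a": [19.5464, 22.7124], "host_b": [19.5500, 22.7124]}
print(validate(two))

# A third result that matches host_a breaks the impasse.
three = dict(two, host_c=[19.5464, 22.71241])
print(validate(three))
```

In this toy run host_b ends up 'invalid' even though nothing was wrong with its computation; it simply lost the comparison.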

If you follow your inconclusives until the 3rd result is returned, you will find that some win and some lose.  If there is something really wrong with a result that points to something you need to fix, it will be reported as  a 'validate error' and not an inconclusive.

Cheers,
Gary.

Mike
Joined: 26 Dec 20
Posts: 45
Credit: 5985121202
RAC: 9433252

Gary,

Thanks for that. I was hoping 'inconclusive' meant it's still a bit up in the air, as it were, and still had a chance at another shot at validation, so not 'invalid' yet. Your observation about results from Windows machines sometimes winning the contest over those from Linux boxes is interesting.

Great info. Cheers!

 

-Mike

 
