I see a couple of 'validation inconclusive' results for BRP-7 MeerKAT jobs. The task summary at the top shows 'Checked, but no consensus yet', and near the last few lines of the log there is 'Statistics: count dirty SumSpec pages xxxxxxx....'
This is on a Ryzen CPU box with an AMD Radeon 7900 XT, running Ubuntu 22.04.4.
I was wondering what the difference is between this error and a plain 'invalid result' error.
Application:Binary Radio Pulsar Search (MeerKAT) v0.17 (BRP7-opencl-ati)
x86_64-pc-linux-gnu
Stderr output
<core_client_version>7.20.5</core_client_version>
<![CDATA[
<stderr_txt>
[19:58:36][147371][INFO ] Application startup - thank you for supporting Einstein@Home!
[19:58:36][147371][INFO ] Starting data processing...
[19:58:36][147371][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[19:58:36][147371][INFO ] Using OpenCL device "gfx1100" by: Advanced Micro Devices, Inc.
[19:58:39][147371][INFO ] Number of generated templates to be used: 50000
[19:58:39][147371][INFO ] Checkpoint file unavailable: Ter5_1_dns_cfbf00010_segment_6_dms_200_170.cpt (No such file or directory).
------> Starting from scratch...
[19:58:39][147371][INFO ] Header contents:
------> Original WAPP file: /atlas/data/TRAPUM_GC/Terzan5/epoch1/30min_segments/dedispersed_files/cfbf00010/Ter5_1_dns_cfbf00010_segment_6_dms_200_DM247.00
------> Sample time in microseconds: 153.121
------> Observation time in seconds: 2568.9524
------> Time stamp (MJD): 59097.776226693291
------> Number of samples/record: 0
------> Center freq in MHz: 857.5673828
------> Channel band in MHz: 3.34375
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 174804.62
------> DEC (J2000): -244640.799999
------> Galactic l: 0
------> Galactic b: 0
------> Name: J1748-2446M
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 16777216
------> Trial dispersion measure: 247 cm^-3 pc
------> Scale factor: 1.04225
[19:58:39][147371][INFO ] Seed for random number generator is 1085814639.
[19:58:41][147371][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 3.24424e-09
------> thr1 = 19.5464
------> thr2 = 22.7124
------> thr4 = 27.844
------> thr8 = 36.3869
------> thr16 = 50.9469
[19:59:06][147371][INFO ] Checkpoint committed!
[19:59:36][147371][INFO ] Checkpoint committed!
[20:00:06][147371][INFO ] Checkpoint committed!
[20:00:36][147371][INFO ] Checkpoint committed!
[20:01:06][147371][INFO ] Checkpoint committed!
[20:01:36][147371][INFO ] Checkpoint committed!
[20:02:06][147371][INFO ] Checkpoint committed!
[20:02:36][147371][INFO ] Checkpoint committed!
[20:03:06][147371][INFO ] Checkpoint committed!
[20:03:36][147371][INFO ] Checkpoint committed!
[20:04:06][147371][INFO ] Checkpoint committed!
[20:04:36][147371][INFO ] Checkpoint committed!
[20:05:06][147371][INFO ] Checkpoint committed!
[20:05:36][147371][INFO ] Checkpoint committed!
[20:06:07][147371][INFO ] Checkpoint committed!
[20:06:37][147371][INFO ] Checkpoint committed!
[20:07:07][147371][INFO ] Checkpoint committed!
[20:07:21][147371][INFO ] OpenCL shutdown complete!
[20:07:21][147371][INFO ] Statistics: count dirty SumSpec pages 267803 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 1926214
[20:07:21][147371][INFO ] Data processing finished successfully!
20:07:21 (147371): called boinc_finish(0)
</stderr_txt>
]]>
---------------
-------- here's the log ----
TASK 1616371136
Name:Ter5_1_dns_cfbf00010_segment_6_dms_200_40000_170_6950000_1
Workunit ID:808091280
Created:24 May 2024 12:19:54 UTC
Sent:24 May 2024 20:31:00 UTC
Report deadline:7 Jun 2024 20:31:00 UTC
Received:25 May 2024 3:11:32 UTC
Server state:Over
Outcome:Success
Client state:Done
Exit status:0 (0x00000000)
Computer:13167060
Run time (sec):526.18
CPU time (sec):143.69
Peak working set size (MB):431.71
Peak swap size (MB):6814.56
Peak disk usage (MB):0.05
Validation state:Checked, but no consensus yet
Granted credit:0
I also have some BRP-7 WUs from as far back as May 13 still 'waiting for validation'. I wonder if that is due to project server issues or, probably more likely, the local power issues going on here. My local utility replaced the power feed line to my house in early May, and the power went up and down a few times, unfortunately not always with clean shutdowns of my hardware. Of course that's always dicey.
- Mike
Looking a little closer at one other Intel CPU box with an AMD GPU, I see a couple of other 'validation inconclusive' results from around May 25th. At this point I'd even be inclined to blame it on solar flares and my satellite comms. LOL. I see the almanac shows it was raining and cool here. Beats me.
-Mike
Mike wrote:
I see a couple of 'validation inconclusive' results for BRP-7 MeerKAT jobs.
So do many other participants as well, from time to time.
First of all, an inconclusive result is NOT an error. It is just that a comparison of the two results gives an agreement that is not quite close enough. The reasons for this include things like 'noisy' data, OS differences, app version differences, math library/driver differences, GPU model differences, validation tolerances too strict, etc.
To break the impasse, a third result is sent out and a further comparison made when the result is returned. If two of the three now agree closely enough, the third will be declared invalid. Chances are that all 3 were error free anyway. A common cause of a 'good' result being declared invalid is when two results from Windows machines get compared to a single result from a Linux machine and the Windows results win. I have seen this quite a few times.
If you follow your inconclusives until the 3rd result is returned, you will find that some win and some lose. If there is something really wrong with a result that points to something you need to fix, it will be reported as a 'validate error' and not an inconclusive.
Cheers,
Gary.
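The tie-breaking process Gary describes can be sketched roughly as follows. This is only an illustration, not the project's actual validator: the function names `agree_closely` and `check_quorum`, the single-number "result", and the tolerance value are all assumptions made up for the example, since the real comparison rules and tolerances are not given in this thread.

```python
import math
from itertools import combinations

# Hypothetical tolerance; the real validation tolerances are not stated in the thread.
REL_TOL = 1e-3


def agree_closely(a, b, rel_tol=REL_TOL):
    """Assumed stand-in for the validator's comparison of two results."""
    return math.isclose(a, b, rel_tol=rel_tol)


def check_quorum(results):
    """Return per-host verdicts, or None while the workunit is inconclusive.

    results maps a (made-up) host label to the value that host reported.
    """
    names = list(results)
    for a, b in combinations(names, 2):
        if agree_closely(results[a], results[b]):
            # Two results agree: they win, and any other result is judged
            # against the first member of the agreeing pair.
            verdicts = {}
            for name in names:
                ok = name in (a, b) or agree_closely(results[name], results[a])
                verdicts[name] = "valid" if ok else "invalid"
            return verdicts
    # No pair agrees closely enough: another result has to be sent out.
    return None


# Two results that are close, but not close enough: inconclusive.
two = {"linux_amd": 20.10, "windows_a": 20.00}
print(check_quorum(two))  # None

# A third result arrives and agrees with one of the first two.
three = dict(two, windows_b=20.01)
print(check_quorum(three))
# {'linux_amd': 'invalid', 'windows_a': 'valid', 'windows_b': 'valid'}
```

In this toy run the Linux result loses even though nothing went wrong on that host, which is the situation described above: all three results may be error free, but only two can agree within the tolerance.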
Gary,
Thanks for that. I was hoping 'inconclusive' meant it's still a bit up in the air, as it were, and still had a chance at getting another shot at validation, so not 'invalid' yet. Your observation that results from Windows machines sometimes win the contest a bit more often than those from some of the Linux boxes is interesting.
Great info. Cheers!
-Mike