There is an existing thread in Technical News regarding FGRP4. Both Holmis and I have contributed observations and criticisms there.
But to keep things tidy, arguably the News forums should be for news, and observations and comments should go either in Cruncher's Corner or here in Problems and Bug Reports.
So let me try to start a thread here for both observations and criticisms regarding FGRP4.
For a start, I'll quote my own post in Technical News made a few hours ago:
I got a batch of FGRP4 tasks on this laptop.
My first one ran to completion successfully--need the quorum partner to run to see whether things went right.
Observations:
1. the initial completion time estimate was far too low--by something like a factor of 6.
2. the credit awarded (as seen here and elsewhere) of 2.58 seems very low in relation to the CPU work required.
3. I'm very happy to see CPU work available in small enough doses of computation required to be suitable either for lower output machines which run 24/7, or for somewhat higher output machines which run intermittently (as does my laptop).
To that I'll add an additional observation regarding progress reporting:
4. It appears that the current FGRP4 application reports progress (as observed in the Progress column of BOINC Manager) in very coarse increments. The only three progress reports I have seen are at 0.000%, 32.333%, and 65.666%.
As completion times are short compared to some recent Einstein CPU apps, the gap between reports may not represent an unusually long stretch of actual computation. But from experience we may predict that some users will interpret an extended period with no progress update as a "stall", the fault of either the application or their machine, and respond unconstructively: aborting the task, disabling work requests for the specific application, or even abandoning Einstein altogether.
It would be good to report progress more frequently.
I think many of us assume that checkpointing is tied to progress reporting. Is this actually true? More specifically, for the current FGRP4 roughly how frequently is there a checkpoint--so that the intermittent user may hope not to be wasting large amounts of already invested CPU time each time they shut down their PC?
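To illustrate why the two need not be tied together, here is a minimal Python simulation of a work loop in which progress reports are driven by step count and checkpoints by elapsed time. It is only a sketch: the function name, step counts, and intervals are all made up for illustration, and nothing here is taken from the actual FGRP4 application. In the BOINC API the two are separate calls (boinc_fraction_done() for progress, boinc_time_to_checkpoint() and boinc_checkpoint_completed() for checkpoints), but whether FGRP4 couples them in practice is exactly the open question above.

```python
def run_task(total_steps, seconds_per_step, checkpoint_interval_s, report_every_steps):
    """Simulate a work loop; return (progress_reports, checkpoint_times).

    All parameters are hypothetical; this is not the FGRP4 code.
    """
    progress_reports = []   # fractions that would be shown to the user
    checkpoint_times = []   # simulated elapsed seconds at each checkpoint
    elapsed = 0.0
    last_checkpoint = 0.0
    for step in range(1, total_steps + 1):
        elapsed += seconds_per_step
        # Progress reporting: driven by how many steps are done.
        if step % report_every_steps == 0:
            progress_reports.append(step / total_steps)
        # Checkpointing: driven by elapsed time since the last checkpoint.
        if elapsed - last_checkpoint >= checkpoint_interval_s:
            checkpoint_times.append(elapsed)
            last_checkpoint = elapsed
    return progress_reports, checkpoint_times


# Coarse progress (every 100 of 300 steps) with frequent checkpoints (every 60 s).
reports, checkpoints = run_task(
    total_steps=300,
    seconds_per_step=10.0,
    checkpoint_interval_s=60.0,
    report_every_steps=100,
)
print(len(reports), "progress reports;", len(checkpoints), "checkpoints")
```

In this made-up case the user would see only three progress updates, yet a shutdown would lose at most about a minute of work. So coarse progress reporting does not by itself imply infrequent checkpoints.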
Copyright © 2024 Einstein@Home. All rights reserved.
FGRP4 Observations and Problems
For those desiring FGRP4 work:
1. currently this is in beta-test status, so your Einstein@Home project preferences for the location (aka venue) of the host in question must have a "Yes" for "Run beta/test application versions?"
2. also in your Einstein@Home preferences, in the "Run only the selected applications" section, you must have a "Yes" for "Gamma-ray pulsar search #4".
I am a bit unclear on the default setting of the Run Only... item for a newly listed application, but for my locations all were initially set to "No". So it required an active intervention on my part after the preferences page introduced listing of this application to get this type of work.
I got and processed a unit of FGRP4 work on a second host, also running Windows 7 but with a very modern desktop CPU.
Once again the execution time was a large multiple of the prediction. Though I did not observe the prediction directly, I could easily observe that after completion of the FGRP4 result all executing work on the host went to High Priority mode because the estimated completion times were greatly elongated. A few hours after the FGRP4 result completed, the estimated completion times for FGRP3 and Perseus work are between 5 and 10 times longer than recent experience.
As I have the requested work buffer size set to a little over two days, this will resolve itself pretty soon. People running larger work buffers may find this effect more disruptive.
So for two different Intel/Windows 7 hosts, the execution time estimates were low by the better part of an order of magnitude. Results on other hosts may differ.
RE: For those desiring
This is not quite true in my experience. One of my venues (w/ 2 hosts) was set to Yes for "run beta" but No for "GR gpu" and No for "run cpu for apps w/ gpu", and I still got a large number (~120) of FGRP4 tasks downloaded. Disabling "beta" and aborting the tasks cured the problem. I'm back to BRP5 only on both hosts...
Gord
RE: This is not quite true
I think it may well be true because you may have received the unexpected tasks for a different reason. I should add that I haven't received any FGRP4 tasks because I haven't (yet) enabled the FGRP4 preference. I'm still trying to figure out how I'm going to juggle venues (yet again) to allow me to do so in a controlled way. Also, I'm in no hurry until I see the initial problems (as reported by Holmis) corrected.
I assume you must have selected the preference for the FGRP4 run in that venue? I also assume that "run cpu for apps w/ gpu" refers to the pref setting labelled "Run CPU versions of applications for which GPU versions are available"? If so, setting this pref to 'No' may not (of itself) prevent you from getting FGRP4 tasks because there is no GPU app 'available' for FGRP4 and so the pref setting may not even be looked at.
Could you have also cured the problem by disabling FGRP4? If you did get FGRP4 tasks with that run preference already disabled, that's another problem that will need to be rectified.
Cheers,
Gary.
In the technical news thread on this topic Gary Roberts has pointed out that some execution time and credit reporting for these tasks may be atypical because of the "short ends" problem.
So I don't know how representative my result may be but I will point out that all three hosts in my flotilla which have received this work have greatly increased their duration correction factor, and have thus been driven into executing work in high-priority mode immediately after completing their first task of this type.
While I did not log the reported duration correction factor before beginning this process, the values as of this morning a few hours after first processing this type of work are:
17.735275
9.560836
10.259696
So, for the work distributed initially, it seems all three of my hosts have had significantly low run time estimates.
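For anyone wondering what those numbers mean in practice, here is a small Python sketch of how a duration correction factor scales a runtime estimate. It assumes the displayed estimate is simply the uncorrected estimate multiplied by the host's DCF, and that the same factor applies to other Einstein work on that host, which would be consistent with the lengthened FGRP3 and Perseus estimates mentioned earlier. The 2-hour uncorrected estimate is a made-up figure, not a measurement.

```python
def displayed_estimate_hours(raw_estimate_hours, dcf):
    """Scale an uncorrected runtime estimate by the duration correction factor.

    Assumption: displayed estimate = raw estimate * DCF.
    """
    return raw_estimate_hours * dcf


# DCF values reported above for the three hosts.
dcfs = [17.735275, 9.560836, 10.259696]

# Hypothetical uncorrected estimate for some other task in the queue.
raw_hours = 2.0

estimates = [round(displayed_estimate_hours(raw_hours, d), 2) for d in dcfs]
for dcf, est in zip(dcfs, estimates):
    print(f"DCF {dcf:.2f}: a {raw_hours:.1f} h task is shown as {est:.2f} h")
```

With these inputs, a task that would normally show as 2 hours shows as roughly 19 to 35 hours, which is enough to make the scheduler think queued work is at risk of missing its deadline.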
One other attribute of the FGRP4 work I have received is tight deadlines, just 48 hours after the "sent" time.
I imagine this has to do with the beta test status of the currently distributing work. But together with the DCF transient associated with the severe underestimate of time required, this will be disruptive for some users, particularly those who choose to run long queue lengths, and also those whose machines are only actively processing BOINC work intermittently.
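To make the deadline concern concrete, here is a rough Python sketch that checks whether a task can finish inside a 48-hour window given how much work is queued ahead of it and what fraction of the time the machine is actually crunching. It assumes a single core working through the queue in order, and every number in the examples is hypothetical.

```python
def meets_deadline(queue_hours_ahead, task_hours, duty_cycle, deadline_hours=48.0):
    """Return (ok, wall_clock_hours_needed) for a task behind other queued work.

    duty_cycle is the fraction of wall-clock time the host is crunching
    (1.0 for a 24/7 machine). Single-core, in-order processing is assumed.
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    wall_hours = (queue_hours_ahead + task_hours) / duty_cycle
    return wall_hours <= deadline_hours, wall_hours


# Hypothetical 24/7 host: 12 h of work queued ahead, 6 h task.
always_on = meets_deadline(queue_hours_ahead=12.0, task_hours=6.0, duty_cycle=1.0)

# Hypothetical laptop that crunches a quarter of the time, same queue.
intermittent = meets_deadline(queue_hours_ahead=12.0, task_hours=6.0, duty_cycle=0.25)

print("24/7 host:", always_on)
print("intermittent host:", intermittent)
```

In this example the always-on host needs 18 wall-clock hours and is fine, while the intermittent one needs 72 and misses the 48-hour deadline with exactly the same queue.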
RE: I assume you must have
This was set to 'No' for FGRP4. BRP5 is the only app enabled in all my venues. I suspect it had something to do with 'Beta' being enabled, but then 'run cpu for gpu apps' was set to no. It appears no doesn't quite mean no.
Gord
With under two dozen units processed, I've had two errors on two different machines--a far higher error rate than I've been seeing.
The error on my laptop came almost immediately upon start of execution, and has quite a short stderr, of which only two lines look potentially interesting
The error on my fastest PC has a much longer stderr, of which one entry reads "Maximum elapsed time exceeded", although another entry buried deep in the stderr might be interesting and reads
[edit: the first of these two errors was my only v 1.02 job. That version has been deprecated.]
RE: The error on my fastest
That's the normal error when BOINC aborts the job because of "Maximum elapsed time exceeded".
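As a rough illustration of how such a limit can be hit, here is a Python sketch based on my understanding that the client derives the limit from the workunit's rsc_fpops_bound divided by the speed it assumes for the application. Treat that formula as an assumption rather than a statement of the client's exact logic. The function names and all the numbers below are hypothetical.

```python
def max_elapsed_seconds(rsc_fpops_bound, assumed_flops):
    """Assumed limit: operation bound divided by the assumed speed."""
    return rsc_fpops_bound / assumed_flops


def would_abort(elapsed_seconds, rsc_fpops_bound, assumed_flops):
    """True if the elapsed time exceeds the assumed limit."""
    return elapsed_seconds > max_elapsed_seconds(rsc_fpops_bound, assumed_flops)


# Hypothetical workunit: bound of 2e14 floating-point operations.
fpops_bound = 2e14
actual_runtime_s = 30_000.0  # hypothetical real runtime

# With a realistic speed assumption (2 GFLOPS) the limit is 100,000 s.
ok_case = would_abort(actual_runtime_s, fpops_bound, assumed_flops=2e9)

# If the assumed speed is 10x too high, the limit shrinks to 10,000 s.
abort_case = would_abort(actual_runtime_s, fpops_bound, assumed_flops=2e10)

print("realistic speed assumption -> abort?", ok_case)
print("overestimated speed -> abort?", abort_case)
```

If this picture is right, the same kind of overestimated speed that makes runtime estimates far too short would also shrink the abort limit, so a perfectly healthy task could be killed for running "too long".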
RE: RE: The error on my
Rom Walton once told me it was a deliberate choice by the developers. One possible reason for a task running far longer than expected is that the execution path for that particular dataset has branched into a previously undetected infinite loop. The full program debug logs are to help the developer find that loop.