28-Sep-2016 18:49:57 [---] max memory usage when active: 2993.09MB
28-Sep-2016 18:49:57 [---] max memory usage when idle: 2993.09MB
28-Sep-2016 18:49:57 [---] max disk usage: 92.96GB
28-Sep-2016 18:49:57 [---] max CPUs used: 6
28-Sep-2016 18:49:57 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
and there are several things wrong with it.
No 'mod_time' at start of file - it should be at the start, between 'source' and 'max_cpus_pct'
'mod_time' 1475084831 between 'school' and 'work' venues - Wed, 28 Sep 2016 17:47:11 GMT, less than three minutes earlier.
'disk_max_used_gb' zero (school and work) is unfamiliar to me - I expected to see 100 GB. But my BOINC account has been open so long, and changed in so many projects, that it would be impossible to track it all the way back to source.
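The mod_time arithmetic is easy to check. A minimal sketch, assuming the event log timestamps are UK local time (BST, UTC+1); that time zone is my assumption, the log itself doesn't state it:

```python
from datetime import datetime, timedelta, timezone

MOD_TIME = 1475084831  # <mod_time> value quoted above

# Convert the Unix timestamp to a UTC (GMT) datetime and format it.
mod_dt = datetime.fromtimestamp(MOD_TIME, tz=timezone.utc)
mod_text = mod_dt.strftime("%a, %d %b %Y %H:%M:%S GMT")

# Time of the event log lines, ASSUMED to be British Summer Time (UTC+1).
bst = timezone(timedelta(hours=1))
log_dt = datetime(2016, 9, 28, 18, 49, 57, tzinfo=bst)

# Gap between the prefs modification and the log entry, in seconds.
gap_seconds = int((log_dt - mod_dt).total_seconds())

print(mod_text)
print(gap_seconds)
```

Under that assumption the gap comes out at 166 seconds, which fits "less than three minutes earlier".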
The log lines for 'last modified' are interesting. I remember a previous problem with preference propagation between projects, which resulted in errors like
21-Mar-2016 22:59:59 [NumberFields@home] Scheduler request completed: got 1 new tasks
21-Mar-2016 22:59:59 [NumberFields@home] [sched_op] Server version 705
21-Mar-2016 22:59:59 [NumberFields@home] Project requested delay of 91 seconds
21-Mar-2016 22:59:59 [SETI@home] General prefs: from SETI@home (last modified ---)
21-Mar-2016 22:59:59 [SETI@home] Computer location: school
21-Mar-2016 22:59:59 [SETI@home] General prefs: no separate prefs for school; using your defaults
but were fixed by some playing with the mod_time field:
21-Mar-2016 23:00:07 [SETI@home] Scheduler request completed: got 1 new tasks
21-Mar-2016 23:00:07 [SETI@home] [sched_op] Server version 707
21-Mar-2016 23:00:07 [SETI@home] Project requested delay of 303 seconds
21-Mar-2016 23:00:07 [SETI@home] General prefs: from SETI@home (last modified 21-Mar-2016 22:59:59)
21-Mar-2016 23:00:07 [SETI@home] Computer location: school
21-Mar-2016 23:00:07 [---] General prefs: using separate prefs for school
I wrote that up somewhere (probably in email) - I'll try to find it again.
That's all I can glean from this machine - I'll need to check the others.
OK, back again, and posting from host 8864187 - the only one with the problem.
And it's also the only one which received the new preference set via propagation from a traditional BOINC server:
28-Sep-2016 21:22:39 [SETI@home] Scheduler request completed: got 0 new tasks
28-Sep-2016 21:22:39 [SETI@home] [sched_op] Server version 707
28-Sep-2016 21:22:39 [SETI@home] Project has no tasks available
28-Sep-2016 21:22:39 [SETI@home] Project requested delay of 303 seconds
28-Sep-2016 21:22:39 [Einstein@Home] General prefs: from Einstein@Home (last modified ---)
28-Sep-2016 21:22:39 [Einstein@Home] Computer location: school
The global_prefs.xml here - which is copied complete into sched_request_einstein.phys.uwm.edu.xml - is missing the final </venue> on the "work" venue.
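A quick way to confirm that a preference file is malformed is to hand it to any strict XML parser. This is a rough sketch using Python's standard library; the sample XML is a hypothetical, heavily shortened stand-in, not the real file from this host:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as exc:
        return False, str(exc)

# Hypothetical, shortened preference set (NOT the real global_prefs.xml).
GOOD = """<global_preferences>
<mod_time>1475084831</mod_time>
<venue name="work">
<disk_max_used_gb>0</disk_max_used_gb>
</venue>
</global_preferences>"""

# Same content with the closing </venue> dropped, as described above.
BAD = GOOD.replace("</venue>\n", "")

good_ok, _ = is_well_formed(GOOD)
bad_ok, bad_err = is_well_formed(BAD)
print(good_ok, bad_ok, bad_err)
```

The second check fails with a mismatched-tag error, which is the kind of fault a strict server-side parser would reject.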
I think I can (roughly) re-create the sequence of events.
I have evidence from a cloned BOINC data folder that the global_prefs I re-activated on 21 March only had two venues - 'default' and 'school'.
On 28 September, the SETI@Home workunit generators failed to restart after weekly maintenance, and by the evening my computers were beginning to run out of work. I visited this site to re-enable work fetch for GPUs (a project preference, obviously), but because I'm still struggling with the new layout, I visited the global preference page instead. I remember seeing a message about a new venue being created and automatically saved with default values, but paid it no attention - I changed the intended project preferences, got the machines active again with Einstein work, and thought no more about it until today.
The significant features in the sequence of events leading to the 'Invalid global preferences supplied, please check' message would seem to be:
empty venue in existing global prefs set
trigger creation of automatic new venue set by Einstein, with Einstein defaults
automatic save of new venue
propagation to traditional BOINC server
propagation back to Einstein from traditional server.
I suspect it's the new Einstein web code which has generated the misplaced <mod_time> and failed to generate the </venue> tag. I suspect it's also the Einstein code which has defaulted to zero GB maximum disk usage (perhaps assuming that 'zero' means 'unrestricted').
I've tried to decode the 'traditional' default values from
$disk_prefs=array(
new PREF_OPT_NUM(
tra("Use no more than"),
tra("Limit the total amount of disk space used by BOINC."),
Richard,
We've been working on updates to the web site code, not yet fully implemented, that should solve some of these problems. For one, the auto-saving of preferences upon visiting the preferences page will be disabled.
As for the disk_max_used_gb preference: zero does mean unlimited disk space, so I don't think that's the problem. In fact, further down in the https://github.com/BOINC/boinc/blob/master/html/inc/prefs.inc file (line 279) you will see a default value of zero.
Finally, the placement of the <mod_time> tag shouldn't matter: as long as it is present in the global_preferences field, it should be read.
I'll have a closer look on Monday as well. If it is the case that a closing </venue> tag is not being written, that's a problem we have to solve.
Thank you for the comprehensive and detailed bug report.
I've found my previous exploration of <mod_time> tags - it was on a project forum, not email. Starts at http://setiathome.berkeley.edu/forum_thread.php?id=79270&postid=1773054#1773054
In bug-hunting mode, I feel that both 'max used disk space: zero means unlimited' and 'placement of the <mod_time> tag shouldn't matter' are aspirational intentions, and we need to be skeptical about whether the implementation matches the aspiration. All my machines have global_prefs_override.xml files, so they are functioning normally in spite of the oddities. I'll leave them running like that to preserve the evidence, then once we're sure we've extracted all useful information from the broken files, try re-making them to see if they behave as expected.
Hi Richard,
The problem is that we don't yet know where the missing </venue> tag is being lost. The fault is definitely not coming from our webcode or the BOINC webcode (when updating preferences on the website). The position of <mod_time> is also not triggering the "Invalid global preferences supplied" message. Only invalid XML triggers this message. You should be able to check on your client whether the content of <global_preferences> is the same in your client_state.xml and sched_request_einstein*.xml files.
Do you have the same set of preferences on all your computers? They should have the same <mod_time> in client_state.xml. Can you compare a working host with the faulty host and check that both are attached to the same projects?
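The comparison suggested above could be scripted roughly as follows. This is only a sketch: the helper names are my own, and it uses plain text matching rather than an XML parser, so it still works when the block is malformed.

```python
import re

def extract_global_prefs(text):
    """Return the raw <global_preferences>...</global_preferences> block, or None."""
    match = re.search(r"<global_preferences>.*?</global_preferences>", text, re.DOTALL)
    return match.group(0) if match else None

def normalise(block):
    """Strip each line and drop blank ones, so CRLF/LF and indentation don't count."""
    return [line.strip() for line in block.splitlines() if line.strip()]

def same_prefs(text_a, text_b):
    """True if both texts carry the same global preferences, line for line."""
    a, b = extract_global_prefs(text_a), extract_global_prefs(text_b)
    if a is None or b is None:
        return False
    return normalise(a) == normalise(b)

def compare_files(path_a, path_b):
    """Convenience wrapper; the paths are whatever your BOINC data folder holds."""
    with open(path_a, encoding="utf-8", errors="replace") as fa, \
         open(path_b, encoding="utf-8", errors="replace") as fb:
        return same_prefs(fa.read(), fb.read())

# Tiny hypothetical samples (mixed line endings, different wrapper elements).
state = "<client_state>\n<global_preferences>\r\n  <mod_time>1</mod_time>\r\n</global_preferences>\n</client_state>"
request = "<scheduler_request>\n<global_preferences>\n<mod_time>1</mod_time>\n</global_preferences>\n</scheduler_request>"
changed = request.replace("<mod_time>1<", "<mod_time>2<")
print(same_prefs(state, request), same_prefs(state, changed))
```

Because the comparison is line by line, two tags sharing one line in one file but not in the other will show up as a difference, which is worth knowing in this particular hunt.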
I think I'd call it "Where does the missing </venue> tag disappear to?". The closing tag was present in the very first appearance of the new global_prefs file direct from Einstein on 28 September, but was (is) missing on the only host which received it indirectly via SETI. Something ate it en route, either on the way to or from SETI.
All my machines are on the same user account, so all participate in the game of 'pass the preference' as far as global preferences are concerned. I do make extensive use of 'venue' preferences, especially for project preferences - it was an intention to change a couple of Einstein venue preferences to 'use NVidia' which triggered all this off.
BTW, I have long felt that BOINC is somewhat ambiguous about the use of venues. Is a host supposed to be attached to a single venue, for all purposes and all projects? Or can a host be on different venues at different projects? I've always chosen the latter interpretation, and it's always seemed to work.
My machines have been attached to up to 16 projects over the years, but I'm currently down to six active projects, and this list hasn't changed since the events of 28 September.
Einstein@Home
FiND@Home (dormant)
GPUGrid
LHC@Home 1.0 Classic (intermittent work)
NumberFields@Home
SETI@Home
The host giving the 'invalid global preferences' message isn't attached to GPUGrid, but is attached to the other five active projects.
I'll go and check the state files you suggest, and come back later if I can make any sense of what I find.
I wrote more than that, but it wouldn't display. Here's the rest:
new NUM_SPEC(tra("GB"), 0, 9999999, $dp->disk_max_used_gb, 1, 100)
in https://github.com/BOINC/boinc/blob/master/html/inc/prefs.inc, but failed. But I suspect it may be a zero entry which has triggered the bug report at BOINC today: 'What can I do with the following messages about "more disk space needed"?'.
The BOINC dev problem report is with BOINC v7.6.22: I'm using nothing older than v7.6.33, and in some cases home builds from the 'master' source code.
OK, I think I have some smoking guns here. David Anderson isn't going to like this...
1) on the mod_time issue.
The BOINC client is supposed to show the source and time of the last prefs update at startup. Mine were showing source, but not time. On a whim, I stopped a client, moved the <mod_time> line to the general area, so it became the last line above the first <venue> section. There, it was recognised and written in the event log at next startup.
Looking at the client code, I think the prefs parser is designed to stop listening for a <mod_time> when it enters venue parsing, and maybe doesn't start again when it leaves the venue.
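To make the suspicion concrete, here is a toy line-based parser that behaves the way I think the client might. It is an illustration of the suspected logic only, not the BOINC client code, and the function name is made up:

```python
def parse_mod_time(lines):
    """Toy parser: <mod_time> is only honoured before the first <venue>.

    The flag is never switched back on after </venue>, which mirrors the
    SUSPECTED (unconfirmed) behaviour described above.
    """
    in_general_area = True
    mod_time = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("<venue"):
            in_general_area = False  # stop listening from here on
        elif in_general_area and line.startswith("<mod_time>"):
            value = line[len("<mod_time>"):line.index("</mod_time>")]
            mod_time = int(float(value))
    return mod_time

# mod_time in the general area, above the first venue: recognised.
early = ["<global_preferences>", "<mod_time>1475084831</mod_time>",
         '<venue name="school">', "</venue>", "</global_preferences>"]

# mod_time between two venues, as in the problem file: ignored.
late = ["<global_preferences>", '<venue name="school">', "</venue>",
        "<mod_time>1475084831</mod_time>", '<venue name="work">', "</venue>",
        "</global_preferences>"]

print(parse_mod_time(early), parse_mod_time(late))
```

With the tag above the first venue the value is picked up; with the tag between venues the parser returns nothing, which would match the "last modified ---" log lines.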
2) on the missing </venue> tag.
I also bumped the mod_time by one second, to trigger the 'pass the parcel' preferences propagation. This gets quite tricky: sched_request and sched_reply files are written completely afresh for each server RPC, and only the first RPC reply after an update conveys the new information. Timing matters. But I have a pair of files which I'll send you by email.
The request to SETI has at the end of global preferences
...
<dont_verify_images>0</dont_verify_images>
</venue>
</global_preferences>
as you'd expect - proper structure.
But the sched_reply to the next host in the daisy chain ended
...
<dont_verify_images>0</dont_verify_images>
</venue></global_preferences>
like that - no line terminator (either LF or CRLF - the files themselves are mixed-mode) between the two successive closing tags.
And in writing that segment to the local global_prefs.xml file, the local client left out the </venue> tag completely. QED.
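One plausible mechanism, sketched below, is a line-oriented copy that stops at the first line containing the end tag and throws that whole line away. Whether the BOINC client really works like this is my assumption, and `copy_until_end_tag` is a hypothetical helper, not a real BOINC function:

```python
def copy_until_end_tag(lines, end_tag):
    """Copy lines until one CONTAINS end_tag; that entire line is discarded."""
    copied = []
    for line in lines:
        if end_tag in line:
            return copied  # anything else on this line is lost too
        copied.append(line)
    raise ValueError("end tag not found")

END = "</global_preferences>"

# Closing tags on separate lines, as in the request sent to SETI.
separate = ["<dont_verify_images>0</dont_verify_images>", "</venue>", END]

# Closing tags sharing one line, as in the reply to the next host.
joined = ["<dont_verify_images>0</dont_verify_images>", "</venue>" + END]

kept_separate = copy_until_end_tag(separate, END)
kept_joined = copy_until_end_tag(joined, END)
print(kept_separate)
print(kept_joined)
```

In the second case the </venue> disappears along with the end tag, which is exactly the damage seen in the local global_prefs.xml.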
Richard Haselgrove wrote:
1) on the mod_time issue.
I'll take a look at this. It should be irrelevant where <mod_time> is. Since fixing this involves a client change, we will also need to modify our server code to make sure that mod_time comes before any venue, which might be tricky because we let Drupal convert the PHP objects into XML and there might be no way to specify a particular order of tags.
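If the serialiser can't be told the tag order, one fallback would be to post-process the generated XML text. The sketch below shows the idea in Python purely for illustration (the project's server code is PHP/Drupal), and `hoist_mod_time` is a hypothetical name:

```python
import re

def hoist_mod_time(prefs_xml):
    """Move the first <mod_time> element so it precedes the first <venue>."""
    mod = re.search(r"[ \t]*<mod_time>[^<]*</mod_time>[ \t]*\r?\n?", prefs_xml)
    venue = re.search(r"[ \t]*<venue\b", prefs_xml)
    if mod is None or venue is None or mod.start() < venue.start():
        return prefs_xml  # nothing to move
    # Cut the element out; it sits after the venue, so venue.start() is unchanged.
    without = prefs_xml[:mod.start()] + prefs_xml[mod.end():]
    element = mod.group(0).strip() + "\n"
    return without[:venue.start()] + element + without[venue.start():]

# Hypothetical, shortened input with mod_time between two venues.
src = ('<global_preferences>\n<venue name="school">\n</venue>\n'
       '<mod_time>1475084831</mod_time>\n<venue name="work">\n</venue>\n'
       '</global_preferences>\n')

fixed = hoist_mod_time(src)
print(fixed)
```

Text-level rewriting like this is fragile compared with controlling the serialiser, so treat it as a stopgap idea rather than a recommendation.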
Richard Haselgrove wrote:
2) on the missing </venue> tag.
That verifies that it is a server issue. Whether this is restricted to SETI I don't know yet, but since they tend to run the latest BOINC server code, it is most likely a general issue for all projects.
Ok, I'm going to post this here as I'm not sure if it's a server problem or what.
Ever since Einstein changed to the new format, it has continually ignored local preferences to limit work units to 1 day's worth.
I now have nearly 200 GW work units on 1 machine and over 150 each on 2 others.
No way am I going to be able to complete all of them before the deadline.
It was doing the same for gamma rays, but those didn't have the higher level of importance that the GW do.
If this continues, I'm afraid that a large amount of data is going to fail to be processed before the deadlines (not just from me but from other Einstein users).
I see this on my computing preferences: here at Einstein the cache is set for 0.1+0.25 days (only Generic settings are stored) but at SETI it shows 0.6+0.3 days (only preferences without any name are there). But they should be the same, should they not?