As a test, I just now stopped BOINC, removed the .pdb file from the Einstein directory, and restarted.
It did indeed raise an error, but not on initially reading app_info; it came instead when processing of an actual result started up. And not just one result: it ripped through my entire Einstein queue, erroring out all Einstein results within a very few seconds (all of them by the time I looked).
Don't try this at home.
Oddly enough, when I put the .pdb file back and restarted BOINC, I _did_ get a startup error:
[error]Deleting file xxxx.pdb while in use
Then it went into a sulk: communication with the project was postponed for 24 hours.
I'm sorry but I can't answer this as I'm not really familiar with what other projects do or what an "opt-in option" actually is. Can you explain it a bit?
Sure. Many projects, Malaria Control for example, have a setting the user can turn on/off in the project settings portion of their account. If checked, the project server will send beta applications and/or WUs to the user's machines. Here is a snapshot:
By specifying the .pdb file in the app_info.xml file you actually ensure that a complaint gets raised if the .pdb is missing.
That is the answer to my question. Thanks.
RE: As a test, I just now
Ouch!! I didn't realise it was that destructive. I've had situations where I've specified multiple versions of apps and their support files and then forgotten to make sure one of the earlier versions was actually there. When you start BOINC in this situation, you get a complaint that the earlier version's .exe and its associated .pdb couldn't be found. In my case this was harmless because there wasn't actually any work in the cache branded with that particular version number at the time.
This is the key point. If you specify program and support files as being needed for a particular version and if you have work branded with that particular version in your cache, the app_info mechanism will cause that work to be deleted if the specified program files are not there when needed.
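The failure mode just described can be caught before restarting BOINC. Below is a minimal Python sketch, not a BOINC tool: it assumes app_info.xml uses the usual `<app_version>` / `<app_name>` / `<version_num>` / `<file_ref>` / `<file_name>` layout and that every referenced file should sit in the project directory next to app_info.xml. The function name `missing_files_by_version` is hypothetical. Any version it reports is one whose queued work, going by the experience in this thread, would be at risk.

```python
import os
import xml.etree.ElementTree as ET


def missing_files_by_version(app_info_path, project_dir):
    """Map (app_name, version_num) -> list of referenced files absent from project_dir.

    Hypothetical helper. Only versions with at least one missing file are
    included; an empty dict means every <file_ref> in every <app_version>
    points at a file that exists in project_dir.
    """
    root = ET.parse(app_info_path).getroot()
    problems = {}
    for app_version in root.findall("app_version"):
        app_name = app_version.findtext("app_name", default="?").strip()
        version = app_version.findtext("version_num", default="?").strip()
        missing = []
        for file_ref in app_version.findall("file_ref"):
            name = file_ref.findtext("file_name", default="").strip()
            # A blank or absent <file_name> is reported too, since it cannot resolve to a file.
            if not name or not os.path.isfile(os.path.join(project_dir, name)):
                missing.append(name)
        if missing:
            problems[(app_name, version)] = missing
    return problems
```

For example, you would point it at the project folder after editing app_info.xml and before restarting; an empty result means every listed .exe and .pdb was found.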
This probably has something to do with the fact that, on startup, when the app_info file is first read, its contents get inserted into the state file (client_state.xml). I haven't actually checked this, but perhaps what is in the state file gets modified (i.e. bits deleted) when certain specified files (the .pdb) can't be found. If you then put the .pdb back, perhaps the state file refuses to accept it because it's not smart enough to update its own internal information, and simply chooses to delete what it doesn't like instead. You would think that the app_info file would get reread each time BOINC starts, but perhaps it doesn't bother to do that if the app_info file hasn't changed, i.e. has the same date as when last read. This is all just speculation, and I'm not suggesting that you repeat the experiment and risk another full cache just to find out :).
Thanks very much for reporting this - I've profited from your experiment without having to endure the consequences :).
Cheers,
Gary.
RE: RE: I'm sorry but I
Thanks for that. Now I understand :).
I imagine that when projects have the luxury of planning new computational runs that are at least somewhat separate from the current production run, and are not under any pressure to release them yesterday, an opt-in beta test is the perfect way to do things. Here, I don't think there are enough Bernds around to do this, and the next production run always seems to depend on what happened with the previous one. For that reason, the new code can't really be developed, let alone finalised, too far in advance. I think we will be having concurrent betas for quite a while yet.
Cheers,
Gary.
RE: Thanks very much for
You are welcome. I feel a bit less wasteful for having contributed a bit to the knowledge pool.
As always there are
As always, there are advantages and disadvantages to both of these methods of doing betas. The “opt-in option” has the advantage of making it easy for participants to join in and start running; the drawback is that the participant isn’t aware when a beta is running, so he can't keep an eye on it. The app_info approach has the advantage of making it possible to start running the new beta in the middle of a result, if the beta allows this. The participant is aware that he is running the beta, knows exactly which version of the beta he is running, and can report its progress.
When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.
RE: RE: By specifying the
Good point! This is why I go through the following process ANY time I monkey with a BOINC project setup:
1. Disable network access (so if the modified folders go kablooey, I can restore them and I haven't sent anything, including screwed-up WUs, back home)
2. Shut down BOINC
3. Copy the entire BOINC folder and its subfolders to a backup folder
4. Make changes as appropriate
5. Restart BOINC
6. Wait for the system to stabilize, until I'm confident it won't flush the queue with errors
7. Enable network access
If #6 fails, I can shut down BOINC and restore my backup, and I haven't lost any work.
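Step 3 and the "if #6 fails" restore are plain folder copies, so they can be scripted. This is a minimal Python sketch under stated assumptions: the data and backup folder paths are whatever you pass in (the BOINC folder location varies by platform and install), BOINC must already be shut down, and the function names are hypothetical. Disabling and enabling network access and stopping or restarting BOINC are left to the BOINC Manager and are not shown.

```python
import shutil
from pathlib import Path


def backup_boinc_dir(data_dir, backup_dir):
    """Step 3: copy the whole BOINC folder, subfolders included, to backup_dir.

    Call only while BOINC is shut down, so no file is in use mid-copy.
    Refuses to overwrite an existing backup rather than clobbering a good one.
    """
    data_dir, backup_dir = Path(data_dir), Path(backup_dir)
    if backup_dir.exists():
        raise FileExistsError(f"backup already exists: {backup_dir}")
    shutil.copytree(data_dir, backup_dir)
    return backup_dir


def restore_boinc_dir(backup_dir, data_dir):
    """'If #6 fails': throw away the modified folder and put the backup back.

    BOINC must be shut down again before calling this.
    """
    data_dir, backup_dir = Path(data_dir), Path(backup_dir)
    if not backup_dir.is_dir():
        raise FileNotFoundError(f"no backup at {backup_dir}")
    if data_dir.exists():
        shutil.rmtree(data_dir)
    shutil.copytree(backup_dir, data_dir)
    return data_dir
```

Refusing to overwrite an existing backup is a deliberate choice: if a previous tweak went wrong, the last known-good copy is the thing you least want to lose.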
Seti Classic Final Total: 11446 WU.
RE: RE: Don't try this
Thank you for a great piece of advice for those (like me) who like to tweak things "under the hood" :).
If I'm testing things that I'm not 100% confident about, I usually do exactly what you are suggesting and this has saved me from disaster a number of times.
If I'm making complex changes that I am confident about, I will also use this as a safeguard against typos. This has also saved me a few times :).
If I'm making a simple, easy change, I quite often get lazy and don't bother with the extra effort. This is when most of the disasters actually happen :).
I guess there's a message in there somewhere :).
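For the "safeguard against typos" case, one cheap extra check before restarting BOINC is to confirm that the edited app_info.xml is still well-formed XML. A minimal Python sketch with a hypothetical helper name, `find_xml_typo`; it only catches malformed XML such as a mismatched or unclosed tag, not a well-formed file that names the wrong file or version.

```python
import xml.etree.ElementTree as ET


def find_xml_typo(path):
    """Return None if the file is well-formed XML, else (line, column, message).

    ParseError.position gives the (line, column) where the parser gave up,
    which is usually at or just after the typo.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None
```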
Cheers,
Gary.