New Host Seems to Have Merged

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118600567513
RAC: 18210774

Sorry, I've had other commitments that have prevented me getting back to you in a timely manner. I will get to your followup message now that I've finished answering this one. Just bear with me a little longer ...

Quote:

It's actually had 3 separate IDs. The first was the same one as the Pentium D (ID=12129501). The second, the one you call "oldest ID", is the one I got from my work-around of uninstalling/reinstalling and attaching with boinccmd.

And the third is because I've detached again since then.
....


There's potentially a big difference between uninstalling/reinstalling and detaching/reattaching. I use the terms attach and detach because that's the way it used to be named. Add and Remove are just the current names for exactly the same process of connecting to or disconnecting from projects.

If you detach or remove a project and then you add that project back again at a later date, I believe you may very well get the same host ID that you had previously because I've seen that happen to other people. I don't know this for certain because I never attach (or detach) a new host. I choose to reuse existing host IDs by installing a complete BOINC template on any new host and the template just recycles one of my previous host IDs from a retired and replaced machine. I can imagine that detaching and reattaching might be capable of doing much the same thing if the host ID has been retained somewhere since detaching a project leaves the BOINC tree in place and doesn't wipe it. As I understand it, detaching removes a project's entry in the state file but not the state file itself. There are backup copies of the state file so perhaps if a project is reattached, the previous host ID might be obtainable from a backup copy even if it no longer exists in the master copy. This is all speculation on my part in trying to guess how a host ID might survive the removal of a project.

On the other hand uninstalling/reinstalling probably involves complete removal and subsequent recreation of the BOINC tree. If the disk was also reformatted, there's no way any contents could be retained. Under those circumstances, you are bound to get a new host ID. This is not usually a problem - there's no limit on how many you can have and you can always merge the different IDs for the same physical hardware. The only way I know of a 'different' host getting the same ID is if you move (a copy of) the disk to a different host so that two hosts have a copy of the same BOINC tree. The host ID is in the state file so if that gets copied you will have two different hosts claiming the same ID.
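
To make the "host ID lives in the state file" point concrete, here is a minimal sketch in Python of pulling the per-project host ID out of a state file. It works on a small made-up sample rather than a real file, and the element names it looks for (`project`, `master_url`, `hostid`) are my assumption about the layout, so check them against your own copy before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a BOINC state file.
# The element names (<project>, <master_url>, <hostid>) are assumptions
# about the layout; verify them against your own file.
SAMPLE_STATE = """
<client_state>
  <project>
    <master_url>https://einsteinathome.org/</master_url>
    <hostid>12129501</hostid>
  </project>
</client_state>
"""


def host_ids(state_xml: str) -> dict[str, int]:
    """Map each project's master URL to the host ID recorded for it."""
    root = ET.fromstring(state_xml.strip())
    ids = {}
    for project in root.iter("project"):
        url = project.findtext("master_url")
        hostid = project.findtext("hostid")
        # Skip project entries that lack either field.
        if url is not None and hostid is not None:
            ids[url.strip()] = int(hostid)
    return ids


print(host_ids(SAMPLE_STATE))
```

If two machines were started from copies of the same file, both would report the same number here, which is the 'two hosts claiming the same ID' situation described above.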

Quote:
It's a brand new machine I built myself. I've never installed a CPU before and I used the included stock cooler. I was monitoring temps and alarmed at how high it was even running 50% CPU time.


Although stock coolers are poor performers, I use lots of them under quite adverse conditions (room temp > 30C) and have some machines that have run like that for many years. I have about 5 with exactly the same cooler you are buying and they do run a bit cooler. I've never had a CPU with a stock cooler fail. They will throttle automatically if they get too hot. They seem quite resistant to damage through overheating. I'm not advocating running them hot. I'm just suggesting that maybe things aren't quite as bad as you think.

What actual temperature values are you seeing and what are you using to measure them? Is the CPU hot when idle or only under full load? If the CPU is hot when idle, it's a sign that your cooler may not be correctly attached and making proper contact with the heat spreader surface on the CPU. The CPU should be reasonably cool when idle, even with a stock cooler.

I haven't seen stock coolers for the latest CPUs but if they still use the 4 push-down pins to spread and lock the ridged plastic 'fingers' in the motherboard holes, you should closely check the back side of the board. All 4 sets of 'fingers' should protrude the same distance through the board, with their 'ridges' properly engaged with the underside of the board, and the central black 'pin' that spreads the fingers should be fully through as well. I've seen examples where the 'fingers' have been spread by the black pin even though they haven't made it all the way through the board. You can test this without removing the board to look at the back by feeling for any lateral movement at the top of the black pin (the round flat end you push down to engage it). If there is lateral movement (i.e. the pin 'wobbles' or feels loose), it's probable that the mechanism isn't properly engaged and the cooler isn't making proper contact with the heat spreader.

Quote:
I spent a lot of time trying to get Ubuntu 14 to detect my fan so I could increase the speed. It wasn't detecting so I decided to format the drive and try Ubuntu 15.


I usually set the fan to full speed in the BIOS/UEFI if the machine is to be a cruncher :-).

Quote:
*HOWEVER*, before I did that I pressed Remove in BOINC Manager because I want to be a good citizen and let the server know that I'm removing this machine.


If you are temporarily removing a machine (say for a week or less) and you don't have a large cache of work, there is no need to do anything other than stopping BOINC and saving a copy of your BOINC tree. When the machine is ready to return to service, perhaps with a new OS installed, just copy the BOINC tree back to where it was (preserving ownership and permissions of files) and then reinstall BOINC. BOINC will find the existing file structure, the host will have its former ID and the in-progress tasks will resume from the last checkpoints that were saved before BOINC was shut down.
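
As a rough illustration of "saving a copy of your BOINC tree", the following Python sketch packs a directory into a tar archive, which records each file's mode and owner along with its contents. The function names are made up for this example, and the paths in the usage comment are assumptions (a packaged install on Ubuntu typically keeps its data under /var/lib/boinc-client, but check where yours lives). Stop BOINC before running anything like this.

```python
import tarfile
from pathlib import Path


def backup_tree(boinc_dir: str, archive_path: str) -> None:
    """Pack a whole directory tree into a gzip-compressed tar archive.

    tar headers store each entry's permission bits, uid and gid, so the
    archive carries that information alongside the file contents.
    """
    src = Path(boinc_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the tree's own name.
        tar.add(str(src), arcname=src.name)


def list_modes(archive_path: str) -> dict[str, int]:
    """Return the permission bits stored for each member of the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return {m.name: m.mode & 0o777 for m in tar.getmembers()}


# Example call (hypothetical paths, run with enough rights to read the tree):
# backup_tree("/var/lib/boinc-client", "/root/boinc-backup.tar.gz")
```

`list_modes` is only there as a sanity check that the permissions made it into the archive. Restoring ownership when unpacking generally needs root, so verify the restored tree before starting BOINC again.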

If you suspect that the break will be longer so that deadlines will be missed, the only difference that needs to be made to the above is to abort all tasks that are likely to miss their deadlines and report those aborted tasks. There's still no need to detach if your ultimate intention is to resume crunching at some point, even well into the future. Sure, if you don't think you will be returning, by all means detach if you wish. Just be aware that the "good citizen" thing is not the detaching but rather the aborting and reporting of all the excess tasks. Detaching doesn't clear them. They have to sit in the database until they eventually time out. This is why you still see them all and why I suggested in the previous message that you should 'reclaim' them using the 'resend lost tasks' mechanism.

Quote:

This has happened twice now, two different brand new hosts have gotten the ID of a previous running host: Pentium D, ID 12129501. All three of them were attached through Manager(locally installed). The brand new hosts would immediately have an impossibly high RAC, the same RAC as the Pentium D. Except the Pentium D was now "gone". I'd then go to the Manager of the Pentium D and run Update, the Domain Name and CPU Type of ID 12129501 would change back to the Pentium D, and the new host would "disappear", until the new host updated. Then the same ID would have that Domain Name, CPU Type,etc, and the Pentium D would be gone again.

Since it's happened twice, I won't be surprised if I can actually reproduce this. I'll take screenshots if anyone is interested :)


I would love you to try to make it happen again and to document all the things you do when you re-commission the machine. If you are starting with a 'clean slate', you will get a further new host ID. It should be brand new and NOT a duplicate of any existing ID. If the slate is not perfectly clean, you could get the previous ID that this host had (and this would be a good thing because you should pick up the previous tasks) but you shouldn't be able to get the ID of the Pentium D. If you can, please show me how it's done :-).

If you would like to make sure you get one of the previous IDs of this host and hence get all the 'in progress' tasks for that host, let me know before you attempt to add the Einstein project to the new machine. It's a relatively simple procedure that just involves a bit of text editing. Have you used a text editor of any description under Linux before?

Whatever way you decide to proceed, once you are up and running, we can try to get all your previous tasks from the previous IDs.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118600567513
RAC: 18210774

Quote:

I'm not out of the woods yet. I just realized my poor little dual-core Pentium D is sitting on 16 CPU tasks: https://einsteinathome.org/host/12129501

and its GPU has zero tasks. I think it's because this host id is the one that was constantly confused for something else, in this case, it was probably sent work meant for an i7 without a GPU.


Your Pentium D is fine and I'm sure that any perceived 'problems' are not really problems and you'll be able to quickly rectify them. Please answer the following (but don't try to guess why I'm asking and start changing things :-) ):

1. Do you set preferences on the website (globally) or do you set preferences locally on each client through boinccmd or the GUI (Manager)?

2. Do you understand the key point that local preferences trump global preferences?

3. Have you used 'venues' (default, home, school, work) as a method for separating hosts into different groups so as to implement different preferences for each group?

Quote:
Ran Update in Manager but the project is sitting on "Wont get new tasks". It'll take 10 days to work through all of those before it has a chance to resolve itself.


4. Is the "Wont get new tasks" showing under the "Status" heading on the "Projects" tab of BOINC Manager? If it is, this is why that host is not requesting further GPU tasks. Before you allow it to request new tasks, can you please click on the Tools -> Computing preferences menu item and on the window that opens, select the "network usage" tab so you can see what values are in play for "minimum work buffer" and "maximum additional work buffer". I want you to be sure you don't have excessive values for those two settings. I would suggest you set the first one to no more than say 3.0 days and the second one to a very low value - say 0.01 days. This would maintain a constant 3 day work cache for that machine. If you were to set the 2nd value to a larger number - say 2.0 days - the cache size would fill up to 5 days worth and then not fill again until it had declined to below 3 days. You may like this sort of 'hysteresis' but I don't :-). I like to keep the same amount at all times through regular fillups. However the choice is entirely yours. I'm going through all this because I just want to be sure you don't have excessive cache settings before you open the flood gates, so to speak ;-).
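
To show what I mean by 'hysteresis', here is a toy model in Python of how the two settings interact. It is only a sketch of the behaviour described above (refill to the sum of the two values whenever the cache drops below the first one), not BOINC's actual work-fetch logic, and the function name and the 0.01 day time step are invented for the example.

```python
def simulate_cache(min_days: float, extra_days: float, total_days: float):
    """Toy model of a work cache; returns (lowest, highest, refill count).

    Time advances in steps of 0.01 days and everything is tracked in
    integer hundredths of a day to avoid floating point drift.
    """
    low_mark = round(min_days * 100)                 # "minimum work buffer"
    high_mark = low_mark + round(extra_days * 100)   # minimum + additional
    cache = high_mark                                # start with a full cache
    lowest = highest = cache
    refills = 0
    for _ in range(round(total_days * 100)):
        cache -= 1                    # crunch 0.01 days of work
        if cache < low_mark:          # dropped below the minimum: top up
            cache = high_mark
            refills += 1
        lowest = min(lowest, cache)
        highest = max(highest, cache)
    return lowest / 100, highest / 100, refills


# Steady cache: 3.0 days minimum, 0.01 days additional.
print(simulate_cache(3.0, 0.01, 10))
# Hysteresis: 3.0 days minimum, 2.0 days additional.
print(simulate_cache(3.0, 2.0, 10))
```

In this model the first setting tops up every couple of steps and stays between 3.0 and 3.01 days, while the second swings between 3.0 and 5.0 days and only refills a handful of times over the same 10 days.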

When you have confirmed (and reset if necessary) your cache settings, you can 'OK' any changes, or 'cancel' if there are none, to exit the window. Back on the Projects tab of the Manager, click the Einstein project to select it. This will activate the button on the LHS that says "Allow new tasks". Click it and it will change to "No new tasks", the Status message of "Wont get new tasks" will disappear and new GPU tasks will come flooding in.

Just remember that the label on the button on the LHS tells you what will happen if you click it and NOT the current status. The current status (as shown in the Status column) is always the opposite of what the button reads.

Quote:
Is it better to wait 10 days, or reinstall BOINC and attach with boinccmd? I have not done it this machine because I didn't realize it too had a problem.


The Pentium D doesn't have a problem. It has quite a bit of CPU work which is why I got you to check your cache settings above. Once you allow it to get new tasks, all should be well with that machine.

Cheers,
Gary.
