host's current tasks don't show up on website after merging

tomjtoth
Joined: 18 Sep 17
Posts: 8
Credit: 25673428
RAC: 13746
Topic 218789

Hey there,

Here's my situation:

1 - I had Ubuntu 18.10 (hostname was "u15-ab125no") running fine with BAM! as well, then

2 - I checked my "live" Arch Linux thumbdrive (hostname: "Kulkuri128") and added that to BAM! as well - just to check whether I could get OpenCL to work on Arch - then

3 - I installed Arch on the same machine (hostname: "15-ab125no") and added it to BAM! as well, then

4 - I merged the old Ubuntu host with the current Arch host.

5 - There are only 10+ tasks visible - including completed and failed ones: https://einsteinathome.org/host/12775239/tasks/0/0

6 - I counted around 80 tasks in progress via boincmgr

How do I get all my tasks to show up on the website as well?

What did I do wrong?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118422285189
RAC: 25871203

Tamas Toth wrote:
How do I get all my tasks to show up on the website as well?

Actually, I believe they still do, but probably not in the way you would like to see them.  The host you linked to seems to have a companion with a somewhat newer host ID of 12775435.  This host has 75 tasks but unfortunately they all show as aborted.

Tamas Toth wrote:
What did I do wrong?

The last time I merged hosts was way more than 10 years ago so my information is long forgotten :-).  I seem to remember that it was important to allow all tasks still showing in the online database to 'disappear' before trying to merge the 'old' host into its 'reincarnation'.  I'm guessing that trying to merge two entities, each with database entries, might not be handled correctly.  I can vaguely remember doing a few merges in the early days and being careful to wait until zero tasks were showing for the old host.

For a long time now, if I want to set up a new host to replace an old host that has either failed or has been decommissioned, I 'reuse' the previous ID (in fact it could be any ID that currently shows on my account).  There are several ways to do this but one easy way is to copy the entire BOINC tree from the machine being decommissioned (before decommissioning it) and then installing that in the same place on the new machine.  I only use Linux and never install BOINC.  I just copy a fully populated 'template' into place and launch the client as a daemon.  It also works with Windows - check out this fairly recent success story.
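
For anyone wanting to try the same thing, a minimal sketch of the 'copy the tree and launch' approach might look like the following.  This is not Gary's actual procedure - the "newbox" hostname and the /var/lib/boinc path are assumptions; adjust them to your own layout:

    #!/bin/bash
    # Sketch only: clone a fully populated BOINC data tree (including the
    # projects/ directory with its apps and data files) to a replacement
    # machine, then launch the client there as a daemon.

    OLD_TREE=/var/lib/boinc   # the template / old machine's BOINC data directory
    NEW_HOST=newbox           # hypothetical hostname of the replacement machine

    # Copy the whole tree so the new client inherits identity and data files.
    rsync -a "$OLD_TREE"/ "$NEW_HOST":/var/lib/boinc/

    # On the new machine, start the client as a daemon from the data directory.
    ssh "$NEW_HOST" 'cd /var/lib/boinc && boinc --daemon'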

Just a few days ago, I acquired 6 working machines from a local small business that was closing down and added them to my farm.  I put an RX 570 in each machine.  4 had PSUs with sufficient rating and I put 2 new PSUs in the others.  I wiped Windows, installed Linux, installed the BOINC/Einstein template on each, edited each template state file to give it the host ID I wanted to recycle, and launched the BOINC client.  Here is the most recent box I've just put to work.  Note the host ID - 713788.  It was an ex-business machine I acquired at a computer auction and first put to work in 2006.  It was decommissioned around 2010 and I'm just reusing that ID rather than creating a whole new one.  My old Pentium III is now reborn as a much better looking i7 :-).

To re-use one of your previous (not current) host IDs, you need a copy of your account file (account_einstein.phys.....xml) for this project (taken from any existing machine) and a template state file (client_state.xml) - the latter with 4 particular data values in the <project>....</project> block (a sketch of applying them follows the list).  Those are :-

  1. The location (venue) where you want the machine to be (generic, home, work, school).  Edit the details page on the website for the host ID you are reusing to have this same venue beforehand.  Ignore this if you don't use different locations.
  2. The host ID you are planning to reuse.
  3. The number (plus 1) of times the client has contacted the server - look it up on the details page.  Add 1 to that number and put that value into the template state file as <rpc_seqno>nnnn</rpc_seqno>.
  4. The directive <dont_request_more_work/>.  If you set no more work on an existing machine and browse the state file, you will see exactly where it goes.  I put it in the same place in the template.  The purpose is to give you time to see that everything is running OK, benchmarks are completed, etc, and you have reviewed your cache size settings so you can start small and not get a whole flood of work if the estimates are way out of whack.
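
Putting those together, here is a rough sketch of stamping the four values into a single-project template with GNU sed.  The tag names follow the standard BOINC state file, but this is not Gary's actual method and the values shown are placeholders - take the real ones from the host's details page:

    #!/bin/bash
    # Sketch only: stamp the four critical values into a template client_state.xml.
    # Verify the tag names against your own state file before relying on this.

    STATE=client_state.xml
    HOST_ID=713788    # the old host ID being reused (Gary's example from this thread)
    RPC_SEQNO=1234    # placeholder: the website's "contacted server" count plus 1
    VENUE=home        # generic/home/work/school - set the same venue on the website first

    # These assume the tags already exist in the template's <project> block
    # and that the template contains only the one project.
    sed -i "s|<hostid>.*</hostid>|<hostid>${HOST_ID}</hostid>|" "$STATE"
    sed -i "s|<rpc_seqno>.*</rpc_seqno>|<rpc_seqno>${RPC_SEQNO}</rpc_seqno>|" "$STATE"
    sed -i "s|<host_venue>.*</host_venue>|<host_venue>${VENUE}</host_venue>|" "$STATE"

    # Hold off new work until you've checked everything over; add the empty
    # directive just before the block's closing tag if it isn't already there.
    grep -q '<dont_request_more_work/>' "$STATE" || \
        sed -i 's|</project>|    <dont_request_more_work/>\n</project>|' "$STATE"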

In addition to the above critical data values, I populate the projects/einstein.phys.uwm.edu directory with the full suite of apps and standard data files that are needed.  That can save a whole raft of initial downloads when the client is launched.

Cheers,
Gary.

tomjtoth
Joined: 18 Sep 17
Posts: 8
Credit: 25673428
RAC: 13746

Hello Gary,

Thanks for the quick reply! Too bad I didn't receive any notifications and went ahead with re-connecting the project.
It's even weirder now, according to my post here:
- the new ID on Arch was 12775239 - I only saw this on the website
- you mentioned it was connected to 12775435 somehow
- and 12 hours later when deleting the project and adding it again I get the same ID 12775435
Confusing. I guess I'll have to merge again after the tasks have been zeroed out on the now-old Arch host.

Funny method you have, btw; it makes sense, but you also lose out on nostalgia if you don't see those machines' stats years after decommissioning. Nice achievement on Windows as well.

I checked your host - nice! I never tried PCLinuxOS. Hopefully I won't have to hop distros anymore; now I'm back to Arch for good!
I have (similar to this host) a future plan involving used and old hardware and as many GPUs as possible; I might share it in a relevant forum topic once I've started it.
How many hosts do you have in total?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118422285189
RAC: 25871203

Tamas Toth wrote:
- the new ID on Arch was 12775239 - I only saw this on the website

That ID was created on 26th April, 17:41:31 UTC.  That's close to 7 days before your initial message about the situation.  It seems to be the original ID, not the new one.

Tamas Toth wrote:
- you mentioned it was connected to 12775435 somehow

I called it a "companion".  What I meant by that was that there was a second listing in your hosts list with what looked like an identical set of hardware.  Perhaps I should have called it a 'twin brother'.  I don't know if you have a second machine with the same hardware but if you don't then you have managed to get a second host ID for the one physical machine.  It's not really supposed to happen but it does.  It doesn't happen to me so I don't have any clues as to exactly how it happens.  The ID 12775435 was created 2 days later on 28th April, 13:34:39 UTC.  Perhaps you might be able to remember what you were doing at that time that caused the second ID to be created.

Tamas Toth wrote:
- and 12 hours later when deleting the project and adding it again I get the same ID 12775435

By 'deleting', are you referring to the button in BOINC Manager currently called 'Remove'?  It used to be called 'Detach' (I think).  I just want to be sure you didn't delete the BOINC tree or something like that.  I've never had to detach and then re-attach but I believe that you are supposed to get back the same host ID if you do.  That seems to be exactly what happened with your "deleting the project", so that's not surprising.  Your duplicate ID was 12775435 and you got that same one again - nothing changed.

Tamas Toth wrote:
Confusing. I guess I'll have to merge again after the tasks have been zeroed out on the now-old Arch host.

Your former ID 12775239 shows 8 tasks - 1 valid, 1 invalid and 6 in error.  Depending on how long others take to complete the 8 quorums, it could be weeks before they all disappear.  If you merge at that point, everything should successfully end up under the current ID of 12775435.  When I looked, that one showed 177 tasks - none completed, 81 errors and 96 in progress.  Is that machine actually crunching?  It's rather odd that nothing (apart from errors) has been returned.  The 'in progress' are all FGRPB1G GPU tasks.  Is there a problem with the GPU?

Tamas Toth wrote:
Funny method you have, btw; it makes sense, but you also lose out on nostalgia if you don't see those machines' stats years after decommissioning.

I currently have a total of 587 host IDs under my account.  At the moment I have 108 actively returning work, of which 104 have a decent GPU - about 5 or 6 have 2 GPUs.  Around 10 years ago, I decided not to unnecessarily create more new host IDs if I could avoid it.  At that time I had over 200 active machines (CPU only).  I'm not really into nostalgia but I certainly wouldn't lose out on it if I were.  When I decide to reuse an existing host ID, it's very easy to write a short entry in a notebook of the website details as of that time.

In my previous post I gave a link to the most recent machine I've added, the i7-4771 I purchased recently.  It reused the host ID of a Coppermine PIII 933MHz that ran between 2006 and 2010.  By chance, I just stumbled on this old message I wrote in 2008 within which I have a link to that very PIII 933MHz. If you click that link in that old message, you will see today's i7 :-).  What a coincidence! :-).

I chose PCLinuxOS around 2007.  Up to that time I had been attending computer auctions and buying ex-business machines - sometimes workstations, sometimes servers.  I came across a large number of machines, all the same brand, all with Tualatin Celeron 1300 processors.  Because of the 'Celeron' in the name, most other bidders avoided them like the plague so I had virtually no competition.  They were presented in job lots of 20 at a time with one lot per monthly sale.  Over a 9 month period I ended up with about 8 lots.

I had already had a good experience with Coppermine PIII's and excellent results from Tualatin PIII's in servers.  With a bit of research I discovered that Tualatin Celerons were just as good and were easily overclockable.  People avoided them because of the very poor performance of the early P4 Celerons.  The boxes I was buying all came with Windows XP product codes on them so I started running them under XP.  As the number of hosts grew, management became increasingly difficult so I decided to try Linux.  PCLOS was my first choice and I found it to be a perfect fit for the way I like to do things.  Back in the late 1970s I had used the Bourne shell under BSD Unix and had done quite a bit of shell scripting so the Bash shell under Linux was a joy to use for scripting purposes.  I really liked the KDE 3 desktop so converting everything to PCLOS was a no-brainer for me.  I've since had very brief encounters with other Linux distros but nothing compares with how well PCLOS works for me.  Find what works best for you and stick with it is my advice!! :-).

Cheers,
Gary.

tomjtoth
Joined: 18 Sep 17
Posts: 8
Credit: 25673428
RAC: 13746

Actually we are talking about 1 physical machine; it's an "HP Pavilion 15-ab125no".

- About 3 months ago I installed Ubuntu 18.10 on the built-in SSD; BOINC ran just fine (I don't know what the ID was).

- I aborted all unstarted jobs, wiped 18.10 and installed Ubuntu 19.04 on the built-in SSD, but since I couldn't get OpenCL up and running, I didn't even install BOINC = no host ID.

- I used my "live" Arch Linux thumbdrive, managed to get OpenCL to work, and got plenty of (70..150) tasks at 26 Apr 2019 17:41:31 UTC under host ID 12775239,

- wiped the 19.04 Ubuntu install from the built-in SSD and deployed Arch in its place at Fri Apr 26 19:35:55 2019 UTC (according to the root filesystem creation date),

- then I booted the new install, whose ID I no longer know, again got a lot of jobs, and started merging with the Ubuntu host ID and the live medium's ID,

- and this is where I think I made a huge mistake: I forgot to abort the jobs on the live medium, so I booted it back up and aborted them all, then went back to the fresh Arch install on the built-in SSD.

I believe this is why the merging resulted in inheriting the live medium's ID, since I used it after I had already merged its ID - but honestly I do not remember which I did first: merging the 3 hosts or aborting the remaining tasks on the live medium.

Yes, I meant detaching by "delete", sorry. I never touch the files of packages (boinc) on Linux if I don't have to, but now I'm starting to feel like I should copy at least those files to identify the machine. The GPU should be working fine, although it has now produced the first invalid I've ever seen on Einstein. I'll keep an eye on its performance; maybe swapping in SETI GPU WUs has something to do with it - I was running those as well just now.

Haha, that's a handful of computers to maintain :D How many different rooms/locations do you store them all in? I suppose they are not all in 1 household :)
I should opt in to such auctions as well once I get my future plan started :D

Oh yes, I love Bash. I have a script I'm really proud of: I merged 10+ individual backup scripts (rsync over ssh) into 1, and since then I've also added a few really useful tools to it, all with --flags :D
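
(Not that script, of course, but a minimal sketch of that kind of merged rsync-over-ssh wrapper with --flags could look like the following - every path, host and flag name here is invented.)

    #!/bin/bash
    # Sketch of a combined backup script: several former one-off rsync-over-ssh
    # jobs merged into one and selected via --flags. All names are hypothetical.

    do_home=0; do_etc=0; dry=""

    for arg in "$@"; do
        case "$arg" in
            --home)    do_home=1 ;;
            --etc)     do_etc=1 ;;
            --dry-run) dry="--dry-run" ;;
            *) echo "unknown flag: $arg" >&2; exit 1 ;;
        esac
    done

    DEST="backupbox:/srv/backups"   # hypothetical ssh target

    [ "$do_home" -eq 1 ] && rsync -a $dry -e ssh "$HOME"/ "$DEST/home/"
    [ "$do_etc"  -eq 1 ] && rsync -a $dry -e ssh /etc/    "$DEST/etc/"
    exit 0
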
I've been using Linux distros since 2009 as my main OS, and for at least 50% of those years it's been Arch; I've learnt everything I know now by using it and reading its wiki.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

On the subject of merging 2 copies of the same machine, my understanding is that one doesn't need to wait until the results are purged from the database. The older host number should be merged into the newer host number, and all tasks in the database should follow along.
Only when one wants to delete a host record from the database does one need to wait until every task associated with the host is purged before that action is allowed.

tomjtoth
Joined: 18 Sep 17
Posts: 8
Credit: 25673428
RAC: 13746

I think I'll still hold off on the merging - I'm in no hurry. But thank you for the info; I'm pretty sure now that merging the 2 hosts and then booting up the older host is what caused the problem.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118422285189
RAC: 25871203

Tamas Toth wrote:
... maybe swapping for SETI GPU WUs has something to do with it. Now I was running them as well.

OK, that explains it.  The GPU was probably running Seti tasks rather than Einstein when I looked.  I didn't realise you were running Seti as well.

Tamas Toth wrote:
Haha, that's a handful of computers to maintain :D How many different rooms/locations do you store them all? I suppose they are not all in 1 household :)

All computers are in a single 12m deep x 6.5m wide room on the ground floor of an industrial warehouse I own.  They've been there for the last 7 years.  My space is a small separate area at the back of (and below) a much larger tenancy that is used by a commercial Real Estate business.

There is no air conditioning used for the computers.  I have installed two large industrial fans with suitable ducting, one to bring in fresh outside air and the other to expel heated air.  The airflow is through the computers which are all 'open case' and mounted in pallet racking to allow good ventilation.  It works a treat and despite the sub-tropical climate, I run all year round with very little 'excess heat' problems.  I do have (through a purpose built control script) the ability to pause crunching if the room temperature becomes 'excessive' :-).
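
Gary hasn't posted the control script itself, but the pausing part can be done with the stock boinccmd tool.  A rough sketch, run from cron every few minutes - the sensor path and thresholds are assumptions, not Gary's actual values:

    #!/bin/bash
    # Sketch only: pause crunching when the room runs hot, resume when it cools.

    SENSOR=/sys/class/thermal/thermal_zone0/temp   # millidegrees C on many systems
    HOT=45000    # pause above 45.0 C
    COOL=38000   # resume below 38.0 C

    TEMP=$(cat "$SENSOR")

    if   [ "$TEMP" -gt "$HOT" ];  then boinccmd --set_run_mode never
    elif [ "$TEMP" -lt "$COOL" ]; then boinccmd --set_run_mode auto
    fi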

As an example of longevity, here is a Q6600 quad core machine I built as a CPU cruncher in 2008.  These days it has an RX 580 but it has been in service 24/7 for more than 10 years and it's still going strong.  I built 6 of them at the time and they're all still running as GPU hosts.

Two years ago, I installed 100 x 340W solar panels on the roof of the building.  I commented on that in this particular message in Mike Hewson's "Sunny Thoughts" thread that he started in the Cafe.  In my next message after that (July 9), I showed a "Site Layout" for the solar panels and several pretty pictures giving graphs of power and energy stats as of mid-winter.  6 months later, at the height of summer, I showed another graph showing the massive increase in production for summer.

Tamas Toth wrote:
I've been using linux distros since 2009 as my main OS and at least 50% of those years it's been Arch, I've learnt everything I know now by using it and reading their wiki.

The Arch wiki is great.  I use it quite a lot whenever I need good detailed information about stuff I'm not familiar with.

Cheers,
Gary.

tomjtoth
Joined: 18 Sep 17
Posts: 8
Credit: 25673428
RAC: 13746

Aaah sweet Jesus, 500 kWh :D Nice relief with Solar.

I'm hoping for similar endurance from the parts I recently bought. They are already 7..10 years old; I couldn't resist anymore and started buying these old, OpenCL 1.1 (at least) capable GPUs. I want to put as many GPUs as possible in one system with the help of PCIe risers and splitters. I was also thinking about buying a B250 Mining Expert motherboard, but then I wouldn't be able to actually learn how these different parts fit together. So my current vision is to build at least 1 machine for each of the major drivers I know of on Linux, so that I don't have to worry about driver and KMS conflicts:

 - amdgpu
 - radeon
 - nvidia-340xx
 - nvidia-390xx
 - nvidia

So far I already have 5 cards for the nvidia-340xx machine and 4 cards for the radeon machine. I hope I can get more cards and machines in the near future.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118422285189
RAC: 25871203

Tamas Toth wrote:
They are already 7..10 years old; I couldn't resist anymore and started buying these old, OpenCL 1.1 (at least) capable GPUs.

Older GPUs that still have a reasonable crunching capability may well have had a hard life in a gaming rig.  At the very least, expect fan lubrication problems and perhaps dubious BIOS mods or voltage mods to 'improve' gaming performance.  That would be too much of a risk for me to feel comfortable with.  They will also be power hungry and probably quite inefficient for crunching.  You need to choose carefully.

Tamas Toth wrote:
I want to put as many GPUs as possible in one system with the help of PCIe risers and splitters.

Be careful.  Although PCIe bandwidth with the current Einstein apps is not as limiting as it used to be, it's still likely to cause problems.  The bandwidth available when using x16 and x8 is certainly quite OK and there doesn't seem to be a huge penalty when using x4.  You should at least hook up a single GPU through an x1 slot and riser (whilst still using your x16 slot(s)) and test for yourself.  It will crunch more slowly and, if that seems acceptable, hook up a second or third if you can, because each additional connection will subtract from the overall available bandwidth and cause slowdowns.

The bandwidth requirements are quite different to what you have with mining apps and my expectation is that the performance through x1 slots will be rather poor.  You really need to test this for yourself.
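
One quick way to see what link width a riser-connected card has actually negotiated, before even benchmarking, is lspci.  A sketch - the bus address below is a placeholder, so take it from the first command's output:

    # Find the GPU's PCI bus address:
    lspci | grep -i vga

    # Then compare what the card supports (LnkCap) with what it actually
    # negotiated (LnkSta) - an x16 card behind an x1 riser shows "Width x1":
    sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap:|LnkSta:'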

Asrock have this mining board that they claim will support 13 GPUs.  I only know about it because I noticed that one of my local computer shops has it on special for just $AUD55.  They quote the RRP of $AUD160.  That gives me a very clear message that either the board is a dud or that mining is in such a decline that the suppliers are desperate to get rid of excess stock while they can.  It's most likely the latter.

The board should have a very decent voltage control system so it should do a very good job of running a single high end GPU in the sole x16 slot.  I might buy one and stick a Radeon VII in it :-).  I do have a spare RX 570 so I could try that first - just in case the Asrock board was a dud after all :-).

Cheers,
Gary.

mmonnin
Joined: 29 May 16
Posts: 292
Credit: 3444726540
RAC: 441184

Some users in the Pent were running OK with x1 risers on former mining machines.
