I just picked up three at 22:34 UTC for my Ubuntu machine.
But maybe that was the last of them?
I haven't received any today. Crunchers are cold from the lack of work from Seti, Einstein and MilkyWay. They got a little bit of work from GPUGrid but that didn't last long.
I knew there would be trouble when MilkyWay announced that they would be starting regular Tuesday maintenance schedules along with Seti. I've always depended on Einstein to carry the load when my other projects dropped out. Now Einstein is not dependable either. So what's the use of having backup projects when your backup projects don't have work either? Bah humbug!
Jim1348 wrote: ... maybe that was the last of them?
They were probably resends - extra copies of tasks that failed or missed their deadline. You can easily distinguish resends from the initial 'primary' tasks: they will have an _2 (or higher) extension in their name, as opposed to _0 or _1 for the original copies.
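If you're curious about what's sitting in your own cache, a rough sketch like the one below (untested, and it assumes boinccmd is on the PATH of the machine running the client) will count the two kinds by looking at that trailing number.

```python
# Rough sketch (untested): count primary tasks vs. resends in the local BOINC cache.
# Assumes 'boinccmd' is on the PATH and is run on the machine hosting the client.
import re
import subprocess

out = subprocess.run(["boinccmd", "--get_tasks"],
                     capture_output=True, text=True, check=True).stdout

primaries = resends = 0
for line in out.splitlines():
    line = line.strip()
    if line.startswith("name:"):
        name = line.split(":", 1)[1].strip()
        m = re.search(r"_(\d+)$", name)   # trailing _0, _1, _2, ...
        if m:
            if int(m.group(1)) >= 2:
                resends += 1              # _2 or higher => resend
            else:
                primaries += 1            # _0 or _1 => original copy

print(f"primary copies: {primaries}, resends: {resends}")
```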
There are currently almost 0.25M tasks in progress and at a rough guess, maybe 10-20% of those will fail in some way or not be returned. So even if the workunit generator isn't kicked into life at today's start-of-business in Hannover, there will be quite a few more resends to be picked up if you happen to be asking at just the 'right' time :-).
Only problem is the 'right' time could be anywhere between now and a few weeks into the future :-).
We just have to hope that last night wasn't 'Wild Party Night' in Hannover and that people get to work on time and sufficiently awake to notice :-).
I'm always a bit bemused when one of these 'once in a blue moon' events happens. Other projects have problems/outages more regularly, but if it happens here, this project has suddenly become unreliable?? :-).
Cheers,
Gary.
There has been a problem in the pre-processing pipeline that we weren't able to fix before the buffers ran dry. We're frantically working on it.
Details: the central file system of our compute cluster Atlas is currently unstable, and apparently neither we nor the vendor support is able to determine the root cause to ultimately fix it. Currently it's working, and if it doesn't tip over in the next 12h we should be able to push at least one or two FGRPB1G datasets through. Typing this with fingers crossed...
BM
Keith Myers wrote: Now Einstein is not dependable either. So what's the use of having backup projects when your backup projects don't have work either? Bah humbug!
A bit exaggerated in my opinion; luckily Einstein is a solid project! I just dislike the website graphics, but that is just an aesthetic personal preference!
We seem to have some work again!! :-).
Cheers,
Gary.
Yep, I didn't run dry and life is good in Einsteinland.
Confirming, received some WUs
Betreger wrote: Yep, I didn't run dry and life is good in Einsteinland.
All of mine ran dry.
I keep a fairly small work cache on each machine and I have a script that gets all machines to top up to an extra 0.5 days worth for the over-night period. As Murphy's Law would always predict, the well ran dry just in time to make sure that nobody was going to get a drink last evening :-). I hung around for an extra hour or two after Bernd said he was on to it with fingers crossed, but as we all know, crossed fingers can't defeat Murphy :-).
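For anyone wondering, the script itself is nothing exotic. Roughly speaking (this is only an illustrative sketch, not the actual script; the path and the 0.5-day figure are just examples), it rewrites global_prefs_override.xml with a bigger additional buffer and asks the client to re-read it.

```python
# Illustrative sketch only (not the actual script): bump the extra work buffer for the
# overnight period by rewriting global_prefs_override.xml and asking the client to
# re-read it. The path and the 0.5-day figure are examples; needs write permission
# on the BOINC data directory.
import subprocess

OVERRIDE = "/var/lib/boinc-client/global_prefs_override.xml"   # common Linux location

prefs = """<global_preferences>
   <work_buf_min_days>0.1</work_buf_min_days>
   <work_buf_additional_days>0.5</work_buf_additional_days>
</global_preferences>
"""

with open(OVERRIDE, "w") as f:
    f.write(prefs)

# Make the running client pick up the new preferences immediately.
subprocess.run(["boinccmd", "--read_global_prefs_override"], check=True)
```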
All of mine are pretty much back to normal now. There are two particular problems for me if hosts run dry temporarily. The first is that after trying to get work and being refused, hosts will go into increasingly longer backoffs which means that there can be quite a delay in asking for a drink when the well fills up again. I've solved that one by having an option in one of my scripts that requests the local boinccmd to force an 'update' on each host. Whatever stage of the backoff a host happens to be in, this gets canceled by the 'update' and the host can take an immediate drink.
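The guts of that option amount to a forced project update on each host. A rough sketch (hostnames, password and URL are placeholders, and each host has to allow GUI RPC access from the controlling machine) looks like this:

```python
# Rough sketch (placeholder hostnames, password and URL): force an immediate project
# update on each host, which cancels whatever scheduler backoff it is sitting in.
# Each host must allow GUI RPC access from this machine (remote_hosts.cfg).
import subprocess

HOSTS = ["cruncher01", "cruncher02", "cruncher03"]   # placeholder hostnames
RPC_PASSWD = "secret"                                # each host's gui_rpc_auth.cfg password
PROJECT_URL = "https://einsteinathome.org/"          # use the URL your client reports

for host in HOSTS:
    subprocess.run(["boinccmd", "--host", host, "--passwd", RPC_PASSWD,
                    "--project", PROJECT_URL, "update"], check=False)
```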
The other problem is that when tasks finally arrive (with all my GPUs running at least 2 concurrently) concurrent tasks all start together and tend to stay that way. My experience suggests that you don't get the lowest crunch time if tasks start and finish at the same time, so I try to make sure that start times are staggered so this inefficiency is avoided as much as possible. I've spent the last few hours sorting that out manually. I'm now designing a new module that hopefully will be able to do the job automatically.
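One possible way to induce the offset automatically (purely a sketch of the idea, not necessarily how the new module will end up working; the task name and delay are placeholders) is to briefly suspend one of a freshly started pair and resume it a few minutes later.

```python
# Sketch of the idea only: stagger two tasks that started together on one GPU by
# suspending one of them briefly, then resuming it. Task name and delay are placeholders.
import subprocess
import time

PROJECT_URL = "https://einsteinathome.org/"   # use the URL your client reports
TASK_TO_DELAY = "PLACEHOLDER_TASK_NAME_0"     # pick one of the pair from 'boinccmd --get_tasks'
OFFSET_SECONDS = 300                          # roughly the stagger you want between the pair

subprocess.run(["boinccmd", "--task", PROJECT_URL, TASK_TO_DELAY, "suspend"], check=True)
time.sleep(OFFSET_SECONDS)
subprocess.run(["boinccmd", "--task", PROJECT_URL, TASK_TO_DELAY, "resume"], check=True)
```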
Cheers,
Gary.
Gary Roberts wrote: The other problem is that when tasks finally arrive (with all my GPUs running at least 2 concurrently) concurrent tasks all start together and tend to stay that way.
Gary, while I also try to adjust matters from time to time to get two jobs sharing a GPU to be offset in time, I suspect this is less useful with the current flavor of WU data file than it has been at other times. At least on my Nvidia/Windows 10 platforms, the time spent in the "past the end" portion has shrunk to the vanishing point. I think the main virtue of offset times was allowing one of the two jobs to be running in the main portion while the other was "past the end". As that condition can't occupy any appreciable time at the moment, I suspect taking the trouble to induce an offset is unusually unfruitful right now.
None of which says anything about what a next batch of WU files might be like.