I've found that both BRP4G and BRP6 tasks respond best to increases in GPU memory clock speed, with the BRP6 Beta CUDA55 tasks seeming to benefit the most. So I would bump the memory up first until you get errors, then back it off and try some small bumps to the core clock until you error.
Sorry about the hijack, I have a bad tendency to do that.
Hi Keith,
You may like to try assigning a different project to one GPU using the 'exclude' option in cc_config.xml, then simply suspending the added project. If you are monitoring the GPUs using SIV or OHM you will see which GPU goes back to basic running and which keeps on crunching.
Working out which GPU is which as far as BOINC is concerned is a PITA, but it can be determined with a bit of work :-)
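For reference, a minimal cc_config.xml sketch of the 'exclude' idea (the project URLs and device numbers are only examples; they need to match what BOINC reports for your own host, and the client needs a config re-read or restart afterwards):

<cc_config>
  <options>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>1</device_num>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://milkyway.cs.rpi.edu/milkyway/</url>
      <device_num>0</device_num>
    </exclude_gpu>
  </options>
</cc_config>

The first block keeps Einstein@Home off GPU 1 (so it only runs on GPU 0), and the second keeps Milkyway@Home off GPU 0.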
What I'd really like to know is why no two WUs take even remotely the same time to crunch :-(
I've had one BRP6 WU complete in 45 mins, the next in 50 mins, and so on.. Every WU seems different from every other WU, so working out how long tasks will take at 2x or 3x doesn't come up with a standard time..
Cliff,
Been there, Done that, Still no damm T Shirt.
It is not a difference in the Einstein BRP6 work units. They have the same computational content, and on appropriately configured systems take remarkably similar times to complete.
When you have multiple tasks running, which task gets swapped in next can be effectively non-random: it depends on whether the support task running on the CPU gets swapped in promptly, how the Windows scheduler decides which core to put it on, how "sticky" that assignment is, and so on.
In short, the answer is in the details of your machine and of the Windows scheduler, not in variations in the work units themselves.
Some Einstein applications in the past have had substantial variation in WU work content, and that is of course also true at some other projects. My answer is narrowly about BRP6.
Hi Archae86,
Well, I am a bit OCD where BOINC is concerned, i.e. I monitor my computer personally and watch it like a hawk :-)
And watching BOINC crunch 1x BRP6 on a single GPU [EVGA 980Ti Hybrid] I see
WU alpha complete in xx mins and nn secs, and then the very next WU complete in xx+n mins and nn+n secs; the difference in timing can be as much as 10 mins between WUs..
Hence my question: if ALL WUs are precisely the same size, why don't they take the same time to complete? They run 1x on the same GPU, since I use the 'exclude' command in my cc_config.xml to reserve that particular GPU for E@H tasks only and the other 980Ti for MW@H, whose tasks DO all take exactly the same time to complete, i.e. 31 seconds.
It's just the E@H WUs that vary, and by a significant amount :-( Hence trying to determine a time to complete is next to impossible for any category of GPU task;
even BRP4G WUs vary from one to the next by some amount.
So either my computer is unique in its manner of working (with my version of BOINC, using the task properties, one can see the throughput rate, and it varies with each WU).. or there is summat else going on..
I've been running 1x for months as my old CPU was a bit iffy heat-wise, so running a single instance made sense. Now that I have a brand new CPU and heat isn't so much of a problem, I've decided to try out 2x and now 3x on the GPU, but once again I'm bugged by radically different timings.
Cliff,
Been there, Done that, Still no damm T Shirt.
I'm not entirely sure what projects you are running, but I might suggest picking one single application, e.g. BRP6, and running it only at 1x with no other tasks.
If stable, move to running at 2x etc.
I would also run CPU-Z and GPU-Z in a window over the length of several tasks - it may reveal some down-clocking (heat related, for example) which will affect times.
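As a rough sketch of the 2x step, one common way is an app_config.xml in the Einstein@Home project folder; the app name and cpu_usage value below are assumptions (check the <name> entries in client_state.xml for the exact short name on your host), and BOINC needs a config re-read or restart afterwards:

<app_config>
  <app>
    <name>einsteinbinary_BRP6</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

With gpu_usage set to 0.5 the client schedules two of these tasks per GPU; 0.33 would allow three.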
Well, I now have 3 WUs crunching.
PM138_671_44_1 is running about 3% ahead of the other 2:
PM138_00671_140_0 & PM138_00671_134_0 are progressing at 55.080% per hour, while the faster WU is running at 61.920% per hour....
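(For reference, assuming those rates hold steady: 100 / 55.080 is roughly 1.82 hours, about 109 minutes per task, versus 100 / 61.920, roughly 1.62 hours or about 97 minutes for the faster one, so around a 12-minute spread at 3x.)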
And that faster WU was suspended for 5 mins in order to try to get it to complete at about the same time as the other 2, and it will need to be suspended yet again to get a similar completion time so the next 3 will load at the same time..
It's bleeding daft :-(
Cliff,
Been there, Done that, Still no damm T Shirt.
Hi Agent B,
E@H runs on GPU0 and MW@H on GPU1; I use the 'exclude' command to assign each project's WUs to a specific GPU.
I ran 1x for quite a long time, with several GPUs and with 3 AMD CPUs, and the problem has existed since I started crunching E@H tasks..
I use OHM to monitor both CPU and GPU temps; the only O/C is the GTX 980Ti mem P02 state using NVI, and that's set to 3505 MHz for both GPUs.
Running GPU-Z wouldn't give me any more info temp-wise than OHM, and running yet another program would simply tie up more CPU cycles.. OHM also monitors memory speed and GPU temps on the fly...
No CPU tasks are being run for any project on the machine in question.
Cliff,
Been there, Done that, Still no damm T Shirt.
3505 would be your memory speed on a 980Ti. Is the core speed stable, or is its boost OC fluctuating?
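For example (a minimal sketch, assuming the NVIDIA driver's nvidia-smi tool is available on the system), the clocks can be logged while a task runs with:

nvidia-smi --query-gpu=index,clocks.sm,clocks.mem,temperature.gpu --format=csv -l 5

That prints the current core (SM) and memory clocks plus temperature for each GPU every 5 seconds, so any boost fluctuation or thermal down-clocking shows up over the length of a WU.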
Hi mmonnin,
Core speeds are stable for both 980Tis.
I'm using BOINC 7.6.22, and by highlighting each WU in turn and using the properties tab I can check the throughput as BOINC sees it; the WUs often differ, and only the odd 2x pair manages to achieve the same throughput.
Dunno what exactly is up, but far too many WUs are being completed at quite different speeds.
Cliff,
Been there, Done that, Still no damm T Shirt.