Multi-Directional Gravitational Wave Search on O3 data (O3MD1/F)

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2968485233
RAC: 693480


Ta. I'm used to projects where you have to read the errors from the bottom up. Yes, 0.6/0.4 will do it - I'll go round the shrubbery again.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4330
Credit: 251147520
RAC: 41618


The workunit generator of O3MD1(V2) (CPU) ran wild over the weekend and generated way too many WUs (1M) and tasks (2M). The run needs to be restarted, but probably not this year. GPU will continue. We also need to review our memory requirements: the G1 tasks were estimated to take 1.8 GB, but they really need >3 GB.

BM

mikey
Joined: 22 Jan 05
Posts: 12743
Credit: 1839145099
RAC: 3525


Ian&Steve C. wrote:

Boca Raton Community HS wrote:

Each CPU task is requiring ~2 GB of RAM.(!) I don't think I have ever seen tasks with such large memory requirements. Our systems are chewing away at them, but wow, very memory intensive.

I'm not sure whether it still does, but I know at one time Rosetta@home was also using about 2 GB per task.

 

GPUGRID's Python tasks, which are hybrid CUDA/MT tasks, use ~10 GB of system RAM, ~3 GB of VRAM, and 32+ cores for each task lol.

I'm running those on a 12/24-core Ryzen that is running a different BOINC project on 23 of the CPU cores, plus an Nvidia 3060, and they are not taking 10 GB of RAM for each task, though they are taking a long time to run:

208,848.50 208,848.50 87,500.00

Python apps for GPU hosts v4.03 (cuda1131)

As for the O3 GPU tasks, I am doing really well on those:

 

All (4558) In Progress (584) Pending (571) Valid (3232) and Error (170)

GWGeorge007
Joined: 8 Jan 18
Posts: 3105
Credit: 4989180777
RAC: 1293965


mikey wrote:

       

As for the O3 GPU tasks, I am doing really well on those:

All (4558) In Progress (584) Pending (571) Valid (3232) and Error (170)

Hi Mikey,

What did you do to get nice, successful processing of the O3 GPU tasks?

I don't have a single validation, and I have a bunch of errors.  Another member said that I may not have execute permissions enabled on the app files.  I don't recall any app...  where would it be?

George

Proud member of the Old Farts Association

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 258
Credit: 10722800555
RAC: 11615627


GWGeorge007 wrote:

mikey wrote:

       

As for the O3 GPU tasks, I am doing really well on those:

All (4558) In Progress (584) Pending (571) Valid (3232) and Error (170)

Hi Mikey,

What did you do to get nice, successful processing of the O3 GPU tasks?

I don't have a single validation, and I have a bunch of errors.  Another member said that I may not have execute permissions enabled on the app files.  I don't recall any app...  where would it be?

 

Like Mikey, we did not really have too many issues with these tasks. We were running three of the GPU tasks simultaneously, and we ran them as hard as we could for about a week to send back a large enough sample set of completed tasks to be somewhat helpful (well, hopefully large enough).

All (6481)
In progress (4)
Pending (267)
Valid (6119)
Invalid (0)
Error (84)

 

GWGeorge007
Joined: 8 Jan 18
Posts: 3105
Credit: 4989180777
RAC: 1293965


Boca Raton Community HS wrote:

Like Mikey, we did not really have too many issues with these tasks. We were running three of the GPU tasks simultaneously, and we ran them as hard as we could for about a week to send back a large enough sample set of completed tasks to be somewhat helpful (well, hopefully large enough).

All (6481)
In progress (4)
Pending (267)
Valid (6119)
Invalid (0)
Error (84)

Thanks for the response.  I know I'll have to wait at least a couple of weeks to try it again.

Did you have to set the permissions for execution in order to get the tasks completed?  I just set mine now.

George

Proud member of the Old Farts Association

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4013
Credit: 47602332084
RAC: 44405047


mikey wrote:

I'm running those on a 12/24-core Ryzen that is running a different BOINC project on 23 of the CPU cores, plus an Nvidia 3060, and they are not taking 10 GB of RAM for each task

When I said 10 GB, I was referring to system RAM, i.e. CPU memory. I was also rounding up to give yourself some breathing room. My big system with 2x 3060, running 4 tasks each (8 tasks total), uses ~76 GB of system memory.

Not video memory or VRAM; each task uses about 3 GB of VRAM.

_________________________________________________________________________

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519464903
RAC: 10206


Keith Myers wrote:

Not when they impact all the other users who don't want to run beta applications.

The O3MD* work generators are running unthrottled and producing more than enough work for the few users who want to run beta tasks.

But they have overloaded the RTS buffers, and everybody else who is running Gamma-ray and BRP4/7 work is getting no work when requested even though there is plenty of it in the Ready to Send categories.

The beta work is swamping the download servers and schedulers and preventing all the other work from being sent out.

I am down over a thousand tasks on my 3-card hosts from my set cache levels and continuing to fall without replenishment. I will be out of work in just 8 hours.

Seems like the server isn't too bright (although this is BOINC...).

You would think that if there are tasks of types A, B, C, D and E needing to be done, the scheduler would take an even amount of each: a separate queue for each type, users take from whichever queue(s) they want, and when a queue runs low it takes more from the relevant generator.  It would be monumentally stupid to just allow one generator to fill the scheduler up with one type of task.  Milkyway, for example, has 10000 tasks queued for Separation and 1000 for N-body.  One doesn't swamp out the other.  What's gone wrong here?
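Purely as a toy illustration of the even-draw idea above (this is not the actual BOINC or Milkyway scheduler code; the queue names and buffer size are made up for the example), a feeder that drew round-robin from separate per-app queues could never let one queue crowd out the other:

# Toy sketch: one shared feeder buffer filled round-robin from per-app queues.
from collections import deque

def fill_buffer_round_robin(queues, buffer_size=100):
    """Draw one task at a time from each non-empty queue in turn."""
    buffer = []
    while len(buffer) < buffer_size and any(queues.values()):
        for app, queue in queues.items():
            if queue and len(buffer) < buffer_size:
                buffer.append((app, queue.popleft()))
    return buffer

# Example: Milkyway-style imbalance, 10000 Separation vs 1000 N-body tasks.
queues = {
    "separation": deque(range(10000)),
    "nbody": deque(range(1000)),
}
shared_buffer = fill_buffer_round_robin(queues)
counts = {app: sum(1 for a, _ in shared_buffer if a == app) for app in queues}
print(counts)  # {'separation': 50, 'nbody': 50} -- neither swamps the other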

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519464903
RAC: 10206


Elphidieus wrote:

Beta Settings: Run Test Applications = Yes, as long as they are native ARM apps, neither Intel nor legacy apps...

Allow non-preferred apps = Already No...

 

Looks like I have to turn Beta Settings OFF then... sad...

 

 

Thanks archae86...

Something is up if you allow beta and it sends you a beta version of an app you haven't selected.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Keith Myers
Joined: 11 Feb 11
Posts: 4992
Credit: 18831363049
RAC: 5826110


Milkyway had the same issue with N-body tasks swamping the download server buffers.  Nobody was getting any Separation work even though there was plenty in the RTS buffers.

The RTS category is not the same thing as the download buffer.  If projects are configured the way the Seti servers were, the download buffer holds 100 tasks.  That is all.  When you hit the scheduler with a work request, the scheduler fills it out of that download buffer of exactly 100 tasks.

When the buffer gets emptied, it refills from all the Ready to Send sub-project caches.  If a fast host has just emptied it right before your scheduler connection is serviced, the buffer is empty and you get the "no tasks to send" message.

When the Ready to Send cache of a single sub-project is 10X-100X the size of the other sub-project caches, the download buffer will be swamped and filled entirely from that unthrottled, oversized cache, and there will not be a single task of any other type in that 100-task buffer.

So you get the same message from the scheduler... no work to send.  The end result is that the one sub-project, in our case the new O3MD* work, completely excluded all other sub-project work from being available.
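As a rough sketch of the effect described above (this is not the real BOINC feeder code; the buffer size, cache sizes and app names are illustrative assumptions only), refilling a fixed 100-slot buffer in proportion to Ready to Send cache sizes leaves almost nothing for the smaller caches:

# Toy sketch: fixed 100-slot buffer refilled in proportion to RTS cache sizes.
import random

def refill_proportional(rts_caches, buffer_size=100):
    """Fill the shared buffer by sampling apps weighted by RTS cache size."""
    apps = list(rts_caches)
    weights = [rts_caches[app] for app in apps]
    return random.choices(apps, weights=weights, k=buffer_size)

# One unthrottled sub-project whose RTS cache dwarfs the others.
rts_caches = {"O3MD*": 1_000_000, "Gamma-ray": 10_000, "BRP4/7": 10_000}
shared_buffer = refill_proportional(rts_caches)
print({app: shared_buffer.count(app) for app in rts_caches})
# Typically ~98 O3MD* slots and 0-2 of anything else, so most requests
# for the other apps come back with "no tasks to send".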

 
