Multi-Directional Gravitational Wave Search on O3 data (O3MD1/F)

Vato

Joined: 19 Jun 10

Posts: 2

Credit: 78642280

RAC: 78607

13 Jan 2023 23:23:20 UTC

Message 206552

At around the same time that O3MD1 tasks started flowing again, I stopped receiving O3MDF tasks for my NVIDIA GPU under Linux. Is anyone else seeing this issue? Any ideas? Host is https://einsteinathome.org/host/12844421

archae86

Joined: 6 Dec 05

Posts: 3159

Credit: 7246059936

RAC: 1326492

14 Jan 2023 2:28:14 UTC

Message 206567 in response to message 206552

Vato wrote:

At around the same time that O3MD1 tasks started flowing again, I stopped receiving O3MDF tasks for my NVIDIA GPU under Linux. Is anyone else seeing this issue? Any ideas? Host is https://einsteinathome.org/host/12844421

Yes, though my case is a bit odd.

I have three hosts running Einstein, and with the "fixed" application the one that formerly errored all O3 GPU units in early December was now able to run them to completion and validation. Initially all three hosts got a really large fraction of GW tasks relative to BRP tasks, so as a means of throttling I turned off O3 task download for all but about an hour a day.

But a couple of days ago or so, this resulted in zero O3 tasks during the hour I permitted both. The next day, as a test, I turned off BRP permission, and still got zero O3 tasks in rather more than an hour. As gazillions of O3 tasks show as ready to send, it seems something thought my system in some way unsuitable.

While composing this comment, I've switched preferences again temporarily only to request O3 GPU tasks. I'll see whether any come now.

[edit to add observations:

After more than an hour with all three hosts requesting O3 GPU work only repeatedly, zero O3 tasks were sent.

Here are what I imagine are the relevant lines from the work request log from one of those hosts late in this hour:

Quote:

[send] Not using matchmaker scheduling; Not using EDF sim
[send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
[send] ATI: req 8725.21 sec, 0.00 instances; est delay 0.00
[send] work_req_seconds: 0.00 secs
[send] available disk 58.75 GB, work_buf_min 172800
[send] active_frac 0.869587 on_frac 0.999825 DCF 0.196890
[mixed] sending locality work first (0.0542)
[send] send_old_work() no feasible result older than 336.0 hours
[send] send_old_work() no feasible result younger than 208.7 hours and older than 168.0 hours
[mixed] sending non-locality work second
[send] [HOST#12260865] will accept beta work. Scanning for beta work.
[debug] [HOST#12260865] MSG(high) No work sent
Sending reply to [HOST#12260865]: 0 results, delay req 60.00
Scheduler ran 12.210 seconds

I've decided to request FGRP only again for a while pending resolution of this situation]
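
For anyone who wants to pull the per-resource request figures out of a scheduler log like the one quoted above, here is a minimal Python sketch. It assumes the `[send] <RESOURCE>: req ... sec, ... instances` line format seen in that excerpt; the function name `requested_seconds` is made up for illustration and is not part of any BOINC tool.

```python
import re

# Matches lines such as:
#   [send] ATI: req 8725.21 sec, 0.00 instances; est delay 0.00
# The format is inferred from the quoted log excerpt (assumption).
REQ_LINE = re.compile(
    r"\[send\]\s+(?P<resource>\w+): req (?P<seconds>[\d.]+) sec, "
    r"(?P<instances>[\d.]+) instances"
)


def requested_seconds(log_text):
    """Return {resource: seconds of work requested} for each request line."""
    requests = {}
    for line in log_text.splitlines():
        match = REQ_LINE.search(line)
        if match:
            requests[match.group("resource")] = float(match.group("seconds"))
    return requests


# Three lines copied from the quoted log.
SAMPLE = """\
[send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
[send] ATI: req 8725.21 sec, 0.00 instances; est delay 0.00
[send] work_req_seconds: 0.00 secs
"""

if __name__ == "__main__":
    print(requested_seconds(SAMPLE))
```

Seeing which resource type the request was logged under may help when comparing against what the host actually has installed.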

Ereignishorizont

Joined: 17 May 21

Posts: 19

Credit: 3033637047

RAC: 820412

14 Jan 2023 7:46:15 UTC

Message 206579 in response to message 206552

Vato wrote:

At around the same time that O3MD1 tasks started flowing again, I stopped receiving O3MDF tasks for my NVIDIA GPU under Linux. Is anyone else seeing this issue? Any ideas? Host is https://einsteinathome.org/host/12844421

Same here. No O3MDF tasks for my NVIDIA GPUs for a few days now.

DF1DX

Joined: 14 Aug 10

Posts: 106

Credit: 3903377854

RAC: 2284925

16 Jan 2023 10:13:42 UTC

Message 206674

I can confirm this. Still haven't received any O3MDF today.

On my CPU (AMD 3700X at 45 W, running 8 tasks) the O3MD1 tasks take about 21 hours each.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4330

Credit: 251182139

RAC: 41772

16 Jan 2023 11:22:39 UTC

Message 206677

There was a problem with the project configuration that was fixed just minutes ago. It should work again now.

Aurum

Joined: 12 Jul 17

Posts: 77

Credit: 3412397040

RAC: 133

16 Jan 2023 15:10:00 UTC

Message 206683

What is Error 1152 and can I do anything to alleviate it?

MAIN: XLALComputeFstat() failed with errno=1152
2023-01-13 23:35:17.1679 (432423) [CRITICAL]: ERROR: MAIN() returned with error '1152'

https://einsteinathome.org/task/1409447039

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4016

Credit: 47630918331

RAC: 43866462

16 Jan 2023 15:15:23 UTC

Message 206684 in response to message 206683

Aurum wrote:

What is Error 1152 and can I do anything to alleviate it?
MAIN: XLALComputeFstat() failed with errno=1152
2023-01-13 23:35:17.1679 (432423) [CRITICAL]: ERROR: MAIN() returned with error '1152'
https://einsteinathome.org/task/1409447039

You need to look at the first error in the chain. Everything after that is just cascading errors as fallout.

Your real issue is this:

failed with OpenCL error: CL_MEM_OBJECT_ALLOCATION_FAILURE

You ran out of VRAM. If you're trying to run 4x tasks, it won't work. There is only enough VRAM on the 3080 Ti for 3x tasks.
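
As a rough illustration of "look at the first error in the chain", here is a small Python sketch that returns the earliest line in a task's stderr output mentioning an error. The sample text is stitched together from the three lines quoted in this thread; their exact order and surrounding context in the real task output are assumed, and the function name and marker words are only illustrative.

```python
def first_error_line(stderr_text, markers=("error", "failed")):
    """Return the first line containing any marker (case-insensitive), or None."""
    for line in stderr_text.splitlines():
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            return line.strip()
    return None


# Assembled from lines quoted in the thread; ordering is an assumption.
SAMPLE = """\
failed with OpenCL error: CL_MEM_OBJECT_ALLOCATION_FAILURE
MAIN: XLALComputeFstat() failed with errno=1152
2023-01-13 23:35:17.1679 (432423) [CRITICAL]: ERROR: MAIN() returned with error '1152'
"""

if __name__ == "__main__":
    print(first_error_line(SAMPLE))
```

The later `errno=1152` lines are then easy to recognize as fallout from whatever that first line reports.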

Aurum

Joined: 12 Jul 17

Posts: 77

Credit: 3412397040

RAC: 133

16 Jan 2023 15:55:40 UTC

Message 206686 in response to message 206684

Thanks, I was running 3 tasks at a time but now I'm running just one on all GPU models. So far so good.

This project DLs far too many WUs and so they quickly trigger Running High Priority. This sometimes switches a running WU to Waiting and so with 3 WUs running and one or two Waiting it may have wanted too much VRAM.

If the supply is going to be continuous, it might be a good idea to run in RZM.
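
The VRAM concern above can be sketched as simple arithmetic. This is only a back-of-the-envelope Python example: the 3.5 GB per-task figure is a placeholder I picked so that three tasks fit on a 12 GB card and four do not, not a measured value, and treating Waiting tasks as still holding their GPU memory is the hypothesis from this post, not confirmed client behaviour.

```python
def max_tasks(total_vram_gb, per_task_gb):
    """Largest number of tasks whose combined VRAM fits on the card."""
    return int(total_vram_gb // per_task_gb)


def fits_in_vram(total_vram_gb, per_task_gb, running, waiting=0):
    """True if running plus waiting tasks fit.

    Assumption: a waiting task keeps its full VRAM allocation.
    """
    return (running + waiting) * per_task_gb <= total_vram_gb


if __name__ == "__main__":
    CARD_GB = 12.0      # RTX 3080 Ti
    PER_TASK_GB = 3.5   # hypothetical placeholder, not measured
    print(max_tasks(CARD_GB, PER_TASK_GB))
    print(fits_in_vram(CARD_GB, PER_TASK_GB, running=3))
    print(fits_in_vram(CARD_GB, PER_TASK_GB, running=3, waiting=1))
```

With real per-task numbers from a GPU monitoring tool plugged in, the same check shows how much headroom a given task count leaves.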

archae86

Joined: 6 Dec 05

Posts: 3159

Credit: 7246059936

RAC: 1326492

16 Jan 2023 17:01:17 UTC

Message 206687 in response to message 206677

Bernd Machenschalk wrote:

There was a problem with the project configuration that was fixed just minutes ago. It should work again now.

I confirm that all three of my hosts received new GW O3 GPU work after the change today. They had not received any since seven days earlier, with the last at 14:26 UTC on January 9.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5874

Credit: 117991248131

RAC: 21123178

16 Jan 2023 21:38:42 UTC

Message 206694 in response to message 206686

Aurum wrote:

This project DLs ...

No, it doesn't. The project tries to supply exactly what the client asks for. Your client needs to stop asking :-).

You have to figure out why the client is asking for so much work that high priority mode is being triggered. Because of things like you describe, you really, really, really don't want to allow the client to go into high priority mode (panic mode). Things can become really complicated if you run multiple projects, multiple searches per project and asymmetric resource shares. Perhaps as a first step you might review the settings for work cache size to see if a reduction there lowers the amount of work on hand for Einstein to a point where panic mode is never triggered.

Cheers,
Gary.
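
To get a feel for when the work on hand is too much, a crude Python estimate like the one below may help. It is a sketch only, not the client's actual scheduling simulation: it just compares the time needed to clear the cache against a deadline. The 21 hours per task and 8 concurrent tasks are borrowed from the CPU example earlier in the thread, while the 336-hour (14-day) deadline and the task counts are assumptions for illustration.

```python
def hours_to_clear(tasks_on_hand, hours_per_task, concurrent_tasks, on_fraction=1.0):
    """Wall-clock hours to finish all cached tasks at the given concurrency.

    on_fraction is the share of time the host is actually crunching (0..1].
    """
    return tasks_on_hand * hours_per_task / (concurrent_tasks * on_fraction)


def risks_deadline(tasks_on_hand, hours_per_task, concurrent_tasks,
                   deadline_hours, on_fraction=1.0):
    """True if the cached work cannot all finish before the deadline."""
    needed = hours_to_clear(tasks_on_hand, hours_per_task,
                            concurrent_tasks, on_fraction)
    return needed > deadline_hours


if __name__ == "__main__":
    DEADLINE_HOURS = 336.0  # assumed 14-day deadline
    for cached in (100, 200):  # hypothetical cache sizes
        print(cached, hours_to_clear(cached, 21.0, 8),
              risks_deadline(cached, 21.0, 8, DEADLINE_HOURS))
```

If the estimate comes out anywhere near the deadline, a smaller work cache setting is the obvious first thing to try.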
