Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

B.I.G wrote:

Mine all finished with an error, for example:

https://einsteinathome.org/task/893062516

Aborted the remaining tasks, 1.0.7 worked fine on that machine.

I assume you were referring to the 'Gamma-ray pulsar binary search #1 on GPUs v1.17 () x86_64-apple-darwin' application. Your host has run those tasks successfully.

That Mac has an NVIDIA GeForce GT 650M (1024MB) GPU and an Intel GPU chip. It tried to run these tasks on the Nvidia GPU. The Intel chip wouldn't be compatible at this point, that's for sure, but it also seems that Nvidia GPUs from that generation have not been compatible with this application in a Windows environment either. That GPU architecture may not meet the internal requirements of these GW tasks.

B.I.G
Joined: 26 Oct 07
Posts: 117
Credit: 1176657400
RAC: 993672

I was referring to the previous gravitational wave tasks, which worked fine (1.0.7 and 1.0.8, if I remember correctly). I got new gravitational wave tasks yesterday, and they all ended with an error. So should I just uncheck GW tasks for this host and only allow Gamma-Ray Pulsar Search tasks?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250581552
RAC: 34667

I told the workunit generator to avoid the frequencies around 505Hz and turned on the GW search again.

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250581552
RAC: 34667

The 'validate errors' from the new ("O2MD1G2") tasks should now be back to normal (i.e. mostly switched to 'valid'). I can only see 20 'validate error' tasks, which are for good reasons.

BM

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Thanks Bernd, that sounds great!

B.I.G wrote:
I was referring to the previous gravitational wave tasks, which worked fine (1.0.7 and 1.0.8, if I remember correctly). I got new gravitational wave tasks yesterday, and they all ended with an error. So should I just uncheck GW tasks for this host and only allow Gamma-Ray Pulsar Search tasks?

Okay, I just didn't find any earlier GW stuff on the task list of that particular computer. Maybe not everything is visible there anymore. I think it's a good idea to uncheck GW stuff for now. But of course you could try the app (maybe a new version) again later. Something in the app might have changed by then.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Bernd Machenschalk wrote:
I told the workunit generator to avoid the frequencies around 505Hz and turned on the GW search again.

How wide is that avoided area? This host hadn't received new GPU tasks in a while (there was a 'resend' thing going on, which drained the queue). The host got the single v2.00 CPU task it was missing, and I aborted it. Then the host got a fresh bunch of GPU tasks, containing 502.30 - 502.50 Hz stuff.
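The thread does not say how wide the avoided band is, so whether 502.30 - 502.50 Hz should have been excluded cannot be decided from here. As a purely hypothetical sketch (the real workunit generator is not shown, and `is_avoided`, `allowed_bands` and `half_width_hz` are made-up names), a symmetric exclusion band around 505 Hz would behave like this:

```python
# Hypothetical sketch only: the actual workunit generator logic and the
# width of the avoided band are not stated in this thread.
AVOID_CENTER_HZ = 505.0  # from the post: "frequencies around 505Hz"


def is_avoided(freq_hz: float, half_width_hz: float) -> bool:
    """Return True if freq_hz lies within +/- half_width_hz of the avoided centre."""
    return abs(freq_hz - AVOID_CENTER_HZ) <= half_width_hz


def allowed_bands(freqs_hz, half_width_hz):
    """Keep only the frequencies a generator using this rule would still hand out."""
    return [f for f in freqs_hz if not is_avoided(f, half_width_hz)]


if __name__ == "__main__":
    # Frequencies mentioned in this thread.
    observed = [502.30, 502.50, 504.85, 510.0, 523.0]
    for hw in (1.0, 3.0):
        print(hw, allowed_bands(observed, hw))
```

With a half-width of 1 Hz the 502 Hz tasks would still be handed out; with 3 Hz they would not. The same frequency list gives different answers depending only on that one unknown parameter.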

Some of them seem to run ten times slower than their sisters from the same bunch. I don't think that's how it should be, so I'm aborting those slow tasks after watching their progress long enough to identify them one by one.

The host that had problems with 505 Hz tasks earlier got three more 505 Hz tasks after I let it download fresh work. They errored out just like before, and I set NNT (No New Tasks). I thought the system might have had some remnants of the old work files that caused this.

So I hit 'Reset project'... and after that the host received 510 Hz stuff. These tasks run fine. I was just wondering whether 'Reset project' was actually needed for the frequency avoidance to take effect. Should this other host have received those 502 Hz tasks at all after the change had already been applied?

edit: "502.30 - 502.50 Hz stuff" ...

I noticed that running them only 1x doesn't slow them down. I left them running 1x overnight and they are still going strong. The slowdown with 2x is probably not anything generic, but related to this specific hardware and those 502 Hz tasks. I will let the remaining tasks finish. Then I'll click "Reset project" and see what frequency the new tasks have and whether 2x works again.

edit #2: I did a "Reset project". The host then received 523 Hz tasks. They run fine at 1x, but when I tried 2x the card was choking again. The first task, which was already halfway done when a parallel task was allowed, still ran to the end at full speed. But the other task was extremely slow from the beginning, and all tasks after that were also extremely slow right from the start.

This seems to be specific to this host, and I don't know why it is occurring now. I'll consider it a mystery and leave it as is. 2x used to work on this host but doesn't anymore. Run times with 1x are alright, so I'll just let it be. For the record, this host has an R9 390 + Windows 10. Another host with an R9 390 + Windows 7 has been running these tasks happily at 2x all along, with the same GPU driver version on both.

tictoc
Joined: 1 Jan 13
Posts: 44
Credit: 7226195409
RAC: 7828771

After seeing some positive results on a 5700XT https://einsteinathome.org/goto/comment/174106, I decided to fire mine up and see how things went in Linux.

Unfortunately, I am not seeing the same results running a 5700XT in Linux. So far all of the GW tasks I ran have come back invalid. https://einsteinathome.org/host/12492619/tasks/5/0

Nothing is obviously wrong in the output, but I'm going to assume it is a driver issue since a 5700XT running in Windows is not having any invalid results on the v2.02 tasks.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Bernd Machenschalk wrote:
This looks like a problem with the workunit setup that we have already seen in O2AS. There seems to be an anomaly in the application code around 505Hz.

Just to confirm that: host 12768123 received eight 504.85Hz tasks from the O2AS v1.09 search today. All of those tasks crashed within 30 secs, this time on an Nvidia card. Meanwhile, two tasks with 582Hz content ran fine and validated successfully.

Ubik
Joined: 30 Oct 09
Posts: 9
Credit: 6149130
RAC: 10660

I had a bunch of O2MD1 tasks on a Mac (v2.02 (GW-opencl-nvidia) x86_64-apple-darwin) error out with error code

197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

exceeded elapsed time limit 12278.46

When the time limit of 12,278 sec was exceeded, the progress bar (if trustworthy) showed that the computation was 66% complete.
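A rough way to see how far short the task fell is to extrapolate the total run time from the progress bar. This is a minimal sketch that assumes progress grows linearly with elapsed time, which the progress bar does not guarantee:

```python
def projected_total_s(elapsed_s: float, fraction_done: float) -> float:
    """Extrapolate total run time, assuming progress grows linearly with time."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done


limit_s = 12278.46  # elapsed time limit from the error message
done = 0.66         # progress shown when the task was stopped
total = projected_total_s(limit_s, done)
print(f"projected total: {total:.0f} s, shortfall: {total - limit_s:.0f} s")
```

On these numbers a linearly progressing task would need roughly 18,600 s, about 6,300 s more than it was allowed. If progress slows down as it goes, the real figure would be higher still.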

What surprises me, though, is that one of those WUs was completed successfully when sent to Windows hosts, but with run times of 21,500 sec and 37,000 sec, respectively. This is much longer than the 12,278 sec my setup was allowed to run for.

See for example https://einsteinathome.org/de/workunit/424661775

Thus I am wondering whether the max time limit for O2MD1-v2.02 (GW-opencl-nvidia) x86_64-apple-darwin WUs is unnecessarily short, or whether my rig is really underpowered for the job.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Ubik wrote:
What surprises me, though, is that one of those WUs was completed successfully when sent to Windows hosts, but with run times of 21,500 sec and 37,000 sec, respectively. This is much longer than the 12,278 sec my setup was allowed to run for.

Hi! The difference is that your host tried to run the GPU version (2.02) of that application, while the other hosts ran the CPU version (2.00). A GPU is supposed to complete tasks much faster than a CPU, so I guess the run time limit for CPU tasks is much longer because of that.
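To my understanding, the BOINC client stops a task with EXIT_TIME_LIMIT_EXCEEDED once its elapsed time exceeds the workunit's `rsc_fpops_bound` divided by the estimated speed (FLOPS) of the app version running it. An app version with a higher speed estimate therefore gets a proportionally shorter limit. The sketch below only illustrates that ratio; the numbers are invented and are not Einstein@Home's actual settings:

```python
def max_elapsed_s(rsc_fpops_bound: float, est_flops: float) -> float:
    """Elapsed-time limit: the workunit's FLOP bound divided by the app version's speed estimate."""
    return rsc_fpops_bound / est_flops


# Hypothetical, illustrative numbers only (not the project's real values).
FPOPS_BOUND = 2.0e15  # same workunit bound for both app versions
CPU_FLOPS = 4.0e9     # assumed speed estimate of a CPU app version
GPU_FLOPS = 1.6e11    # assumed speed estimate of a GPU app version

cpu_limit = max_elapsed_s(FPOPS_BOUND, CPU_FLOPS)
gpu_limit = max_elapsed_s(FPOPS_BOUND, GPU_FLOPS)
print(f"CPU limit: {cpu_limit:.0f} s, GPU limit: {gpu_limit:.0f} s")
```

With the same FLOP bound, the app version estimated to be 40 times faster gets a limit 40 times shorter, which would explain why a slow GPU can hit the limit long before a CPU host would.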

Your host has an Nvidia GT 650M, which in theory should have roughly 25% of the computational power of, for example, a GTX 960. If that were reflected directly in the run time of these tasks, I believe the GT 650M should be able to complete a task in 2 - 3 hours at most. From that perspective, your GPU was already running those tasks overtime when the progress was at 66%.
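The estimate above is a simple inverse scaling of run time with compute power. A minimal sketch, where the 45-minute GTX 960 run time is a hypothetical placeholder (it is what the upper end of the 2 - 3 hour estimate implies at 25%), not a measured value:

```python
HOUR_S = 3600.0


def scaled_runtime_s(reference_s: float, relative_power: float) -> float:
    """Naive estimate: run time grows in inverse proportion to relative compute power."""
    if relative_power <= 0:
        raise ValueError("relative_power must be positive")
    return reference_s / relative_power


# Hypothetical GTX 960 run time (45 min); 0.25 is the rough ratio from the post.
estimate_s = scaled_runtime_s(45 * 60, 0.25)
elapsed_s = 12278.46  # elapsed time when the task was stopped at 66%
print(f"estimate {estimate_s / HOUR_S:.1f} h, elapsed {elapsed_s / HOUR_S:.2f} h")
print("already over the estimate:", elapsed_s > estimate_s)
```

The elapsed time (about 3.4 h) already exceeds the 3 h upper estimate while only two thirds of the work was done, which is the "overtime" point made above.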

But I believe the GT 650M is simply incompatible with the current GW app. That's why the progress is slow, or slows down exponentially the further it gets, so those tasks can't complete at all.
