ABP2 CPU-only applications

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250657113

RAC: 34363

RE: RE: RE: Got

19 Feb 2010 16:44:16 UTC

Message 96501 in response to message 96500

Quote:

Quote:
Quote:
Got two (161484922, 161484846) signal 11 results on my i7 920 root server with the quad ABP2 WUs yesterday. Before and afterwards everything works like a charm. Both WUs crashed at the same time, no hints in the system log files.

Interesting. Anything running on this machine that could eat up memory at that time?

Quote:
Interesting thing is I still get ABP2 WUs stamped to be done by app 1.08, while the actual app should be 1.11.

The CUDA App version is at 1.11; the CPU App is 1.08. That's ok.

BM

The server has 8GB RAM and low load, just some Apache instances, a mail server and PostgreSQL running.

I see these 'segfault' errors happen occasionally on some machines. Usually all app instances running there get the signal at the very same time, regardless of which source code line they are in or which data they are processing, so this isn't a real programming error.

I suspected Linux 'optimistic memory allocation' to be responsible: it randomly kills processes when physical memory can't cover the memory it 'optimistically' assigned to them. But it's hard to believe that this is the case here.

We currently lose up to ~2000h of computing time per day due to this problem.
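
One way to check the 'optimistic memory allocation' suspicion on an affected host is to look at the kernel's overcommit setting and search the kernel log for out-of-memory killer messages. The following is only a minimal sketch: the helper names are made up for illustration, the sample log lines are invented, and the exact kernel wording varies between versions.

```python
import re
from pathlib import Path

# Values of /proc/sys/vm/overcommit_memory on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}

# Phrases the kernel typically logs when the OOM killer acts.
# The exact wording differs between kernel versions (assumption).
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory", re.IGNORECASE)


def describe_overcommit(raw: str) -> str:
    """Translate the content of overcommit_memory into a readable label."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def find_oom_lines(log_lines):
    """Return only those log lines that look like OOM killer activity."""
    return [line for line in log_lines if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    setting = Path("/proc/sys/vm/overcommit_memory")
    if setting.exists():
        print("overcommit policy:", describe_overcommit(setting.read_text()))
    # Invented sample lines, not taken from any host in this thread.
    sample = [
        "kernel: apache2 invoked oom-killer: gfp_mask=0x201da, order=0",
        "kernel: eth0: link up",
    ]
    print(find_oom_lines(sample))
```

If the log shows no such lines around the time of the crashes, the OOM killer is an unlikely cause, which would fit the 'no hints in the system log files' report.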

M. Schmitt

Joined: 27 Jun 05

Posts: 478

Credit: 15872262

RAC: 0

RE: RE: RE: RE: Got

19 Feb 2010 17:49:03 UTC

Message 96502 in response to message 96501

Quote:

Quote:
Quote:
Quote:
Got two (161484922, 161484846) signal 11 results on my i7 920 root server with the quad ABP2 WUs yesterday. Before and afterwards everything works like a charm. Both WUs crashed at the same time, no hints in the system log files.

The server has 8GB RAM and low load, just some Apache instances, a mail server and PostgreSQL running.

I see these 'segfault' errors happen occasionally on some machines. Usually all app instances running there get the signal at the very same time, regardless of which source code line they are in or which data they are processing, so this isn't a real programming error.

Same thing here. Server is running kernel...think you know.. ;)

Quote:

I suspected Linux 'optimistic memory allocation' to be responsible: it randomly kills processes when physical memory can't cover the memory it 'optimistically' assigned to them. But it's hard to believe that this is the case here.

Hm, anything related to 64bit os and 32bit compatibility libs maybe?

Quote:

We currently lose up to ~2000h of computing time per day due to this problem.

Oh, this is ugly. Any information about the distributions/kernels involved? I haven't seen this problem on my other hosts so far. The server runs OpenSuse 11.1 (64bit), my laptop runs OpenSuse 11.2 (32bit), and the old Athlon XP 3000 runs OpenSuse 10.3, like my development host (64bit). The former root server ran OpenSuse 10.3 (64bit/8GB) without segfaults. And there is still a small chance of CPU errors or memory failures. I hear about memory problems more and more, maybe a consequence of low profits for the manufacturers and higher integration.
But: why don't other apps (except for FF ;)) crash from time to time if this is a Linux problem? I really can't remember a fatal crash on one of my systems in the last years.
And last but not least, could the problem be circumvented by restarting the program after it is killed by OOM? That would require changing the BOINC client, or a wrapper program calling/controlling the science apps (overhead?). But I'm no C/C++ coder, so I might be far off track. ;)

cu,
Michael

[Edit]'killed by OOM' should read as 'ended by out-of-memory killer'.
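
The wrapper idea above could look roughly like the following sketch. It is not how the BOINC client or any Einstein@Home app actually works; `run_with_restart` and its parameters are hypothetical, and it assumes a POSIX system where a negative return code -N means the child died from signal N. Note that the Linux out-of-memory killer terminates processes with SIGKILL (signal 9), while the failures reported here are signal 11 (SIGSEGV), so the sketch retries on both.

```python
import signal
import subprocess
import sys


def run_with_restart(cmd, max_restarts=3):
    """Run cmd and restart it while it dies from SIGSEGV or SIGKILL.

    Returns (last_return_code, attempts_made). Hypothetical helper,
    not part of BOINC.
    """
    retry_signals = {int(signal.SIGSEGV), int(signal.SIGKILL)}
    attempts = 0
    while True:
        attempts += 1
        code = subprocess.run(cmd).returncode
        # On POSIX, a return code of -N means "terminated by signal N".
        killed_by_retry_signal = code < 0 and -code in retry_signals
        if not killed_by_retry_signal or attempts > max_restarts:
            return code, attempts


if __name__ == "__main__":
    # Stand-in for a science app: a child process that exits normally.
    print(run_with_restart([sys.executable, "-c", "pass"]))
```

A restart like this only pays off if the application can resume from a checkpoint; otherwise each retry starts the work from scratch. The overhead of the wrapper itself is small, since it just sleeps in `subprocess.run` while the child computes.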

[Edit2]Last signal 11 on my X2 5000 with E@H:
2008-01-18 18:06:30 [Einstein@Home] Reason: Unrecoverable error for result h1_0762.95_S5R2__255_S5R3a_0 (process got signal 11)

Logfile started 2006 :)

Athlon XP 3000+ running 24/365:
Never ever any signal 11 since 14-May-2008(logging started)

dan

Joined: 7 Sep 10

Posts: 3

Credit: 5343

RAC: 0

Hi Gary I have been working

14 Sep 2010 6:00:47 UTC

Message 96503

Hi Gary, I have been working with the Windows 7 Task Manager, and it seems that when I am not using my computer I can boost the running programs by changing the CPUs and raising the usage. I also bring BOINC to the front.

Fred J. Verster

Joined: 27 Apr 08

Posts: 118

Credit: 22451438

RAC: 0

Since I switched from

15 Sep 2010 10:16:11 UTC

Message 96504 in response to message 96503

Since I switched from a 9800GTX+ and 8500GT to a GTX470 and now a GTX480, I have had no more problems with CUDA.
I don't know if it's accepted, but according to the card's GPU and memory load, it should be possible to run 2 at a time?

(I run 3 SETI MB at a time, which gives a good load on the GPU: 99%, and 60% for the memory controller on its 384-bit bus.)

Gundolf Jahn

Joined: 1 Mar 05

Posts: 1079

Credit: 341280

RAC: 0

RE: I don't know if it's

15 Sep 2010 11:22:54 UTC

Message 96505 in response to message 96504

Quote:

I don't know if it's accepted, but according to the card's GPU and memory load, it should be possible to run 2 at a time?

You are in the wrong thread (CPU vs. GPU), but I think it's accepted anyway. ;-)

You'll just have to create an app_info.xml with the correct entries. There is at least one other thread with information about that.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

tullio

Joined: 22 Jan 05

Posts: 2118

Credit: 61407735

RAC: 0

On my SuSE Linux 11.1 32-bit

26 Oct 2010 13:50:14 UTC

Message 96506

On my SuSE Linux 11.1 32-bit PAE system I can see the ABP2 graphics when I want, but not the S5GC1 graphics. I rarely use them because they take a lot of CPU, but I am wondering why.
Tullio

Rechenkuenstler

Joined: 22 Aug 10

Posts: 138

Credit: 102567115

RAC: 0

I don't know, if this is the

27 Oct 2010 7:36:32 UTC

Message 96507 in response to message 96506

I don't know if this is the right thread, but I can't find a better one. Can anybody tell me about the update cycle of the web pages? I see a number of tasks which have been finished and uploaded for almost 24 hours and are still shown as "in progress". What's the reason for this delay?

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6591

Credit: 319481835

RAC: 428791

RE: I don't know, if this

27 Oct 2010 7:52:27 UTC

Message 96508 in response to message 96507

Quote:

I don't know if this is the right thread, but I can't find a better one. Can anybody tell me about the update cycle of the web pages? I see a number of tasks which have been finished and uploaded for almost 24 hours and are still shown as "in progress". What's the reason for this delay?

Well, they are still in progress. But you'd be waiting on your 'wingman'. All work is duplicated (at least) to two different hosts, of which you are one in this case. When the other host returns its work, validation occurs, credit is awarded etc., and all being well the matter is settled. How long to wait? Well, that depends on the activity of the other host and/or other circumstances like missed deadlines, possible re-issue to complete the quorum (2 validated results) and the like...

Cheers, Mike.

( edit ) One is always welcome to fire up a new thread if you judge there is no current suitable one ... :-)
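
The wingman and quorum logic described above can be pictured with a small sketch. This is an illustration only, not the project's actual validator: the `Result` class, the single numeric `value` standing in for a real result file, and the tolerance are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

QUORUM = 2  # two agreeing results, as described in the post above


@dataclass
class Result:
    host: str
    returned: bool
    # Hypothetical stand-in for a real result file; set once returned.
    value: Optional[float] = None


def workunit_status(results: List[Result], tolerance: float = 1e-6) -> str:
    """Classify a work unit from the results issued for it."""
    returned = [r for r in results if r.returned]
    if len(returned) < QUORUM:
        return "pending: waiting for wingman"
    # Look for a group of at least QUORUM results that agree with each other.
    for candidate in returned:
        agreeing = [r for r in returned
                    if abs(r.value - candidate.value) <= tolerance]
        if len(agreeing) >= QUORUM:
            return "validated"
    return "inconclusive: another result needed"


if __name__ == "__main__":
    mine = Result("my-host", returned=True, value=1.0)
    wingman = Result("wingman", returned=False)
    print(workunit_status([mine, wingman]))
```

With only one result returned, the work unit stays pending no matter how long ago that result was uploaded; if two returned results disagree, a further copy has to be issued before anything can validate.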

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Gundolf Jahn

Joined: 1 Mar 05

Posts: 1079

Credit: 341280

RAC: 0

RE: I see a number of

27 Oct 2010 16:36:04 UTC

Message 96509 in response to message 96507

Quote:

I see a number of tasks, which are finished and uploaded for almost 24 hours still as "in progress". What's the reason for this delay?

There's no delay. Tasks are considered "in progress" until they are reported, which is a process separate from uploading.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Cannibal Corpse

Joined: 21 Feb 05

Posts: 18

Credit: 1555535

RAC: 0

Hello all!! Before I go out

14 Nov 2010 0:39:18 UTC

Message 96510

Hello all!! Before I go out and get a loan for the new GTX 580, will it be usable to crunch? Is there a compatibility issue? Oh, I will get one regardless, unless there is a better card?
If and/or when it can crunch, I will post the results and/or report any bugs.

DO WHAT THOU WILT SHALL BE THE WHOLE OF THE LAW.
PROUD MEMBER OF THE CARL SAGAN TEAM.
