What's up with LHC at Home?

Donald A. Tevault

Joined: 17 Feb 06

Posts: 439

Credit: 73516529

RAC: 0

RE: If I "build" the task

6 Feb 2009 18:32:30 UTC

Message 89997 in response to message 89996


Quote:

If I "build" the task from scratch I have no need to process it, I know what is in the task because I am the one that created the signals / data that is in the task. In a simple example I create a task on SaH and put into that task exactly one pulse ... well, it should be pretty easy to find that pulse ...

My objection to the use, as current standard, of real world signals is that with all the noise in the sample you really don't know what is in there. We run them and say that we know what is in there ... but, it is not something that you can eyeball to validate ... with generated and "clean" test signals you can actually look at the data in the file and can see the test signals.

After we have validated the system, and note that this is not only a test of the computer hardware, it is also a test of the running software, we can then add noise to the test samples and rerun the tests ...

It seems that when I looked at the follow-up report for one of the previous E@H runs, I saw that there actually have been workunits that have contained injected test signals. Is that still the case?

tullio

Joined: 22 Jan 05

Posts: 2118

Credit: 61407735

RAC: 0

It seems to me that today

6 Feb 2009 18:46:59 UTC

Message 89998


It seems to me that today computers are not confined to datacenters and managed by professionals in white coats. Any cell phone, any GPS unit, any car, any boat, any airplane contains one or more computers even if the user is not a data professional. And yet we telephone, drive a car, travel in a plane without even thinking about possible errors or faults. We assume that everything works and is "good enough", as they say. I have been running seven projects 24/7 on my Opteron 1210 under Linux since January 2008 and have never had a compute error on any project. Quorum is 1 on climateprediction.net, CPDN Beta, AQUA and QMC; 2 on SETI and Einstein; and 3 on LHC, though work from this last one is almost nonexistent. So my CPU must be "good enough" and that is enough for me.
Tullio

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2958836198

RAC: 714123

RE: It seems to me that

6 Feb 2009 19:21:25 UTC

Message 89999 in response to message 89998


Quote:

It seems to me that today computers are not confined to datacenters and managed by professionals in white coats. Any cell phone, any GPS unit, any car, any boat, any airplane contains one or more computers even if the user is not a data professional. And yet we telephone, drive a car, travel in a plane without even thinking about possible errors or faults.

The trouble is, as BOINCers know only too well, that computers are not infallible.

A case in point: the ABS computer in my car has developed a fault. The garage assures me that it's safe to drive, because the computer has built-in self-monitoring, detects the fault, and switches itself off so the car drives like an old model without ABS fitted. I know how to cope with that, even with snow on the road.

The trouble comes when I reboot the computer (turn the ignition key). For the first couple of miles, the self-monitoring doesn't notice the fault, and the ABS tries to 'help' me slow down - actually quite alarming the first time it did it. So now I have to drive extra-carefully to start with, until the fault monitoring kicks in, and only then can I drive as normal.

BOINC, just like the real world, needs that fault monitoring to work all the time.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 729203928

RAC: 1196987

RE: It seems that when I

6 Feb 2009 19:22:59 UTC

Message 90000 in response to message 89997


Quote:

It seems that when I looked at the follow-up report for one of the previous E@H runs, I saw that there actually have been workunits that have contained injected test signals. Is that still the case?

The hardware injected Pulsars? Yes, these injections of simulated pulsar signals happen independently of Einstein@Home, right on-site at the LIGO interferometers.

CU
Bikeman

tullio

Joined: 22 Jan 05

Posts: 2118

Credit: 61407735

RAC: 0

RE: RE: It seems that

6 Feb 2009 19:31:07 UTC

Message 90001 in response to message 90000


Quote:

Quote:

It seems that when I looked at the follow-up report for one of the previous E@H runs, I saw that there actually have been workunits that have contained injected test signals. Is that still the case?

The hardware injected Pulsars? Yes, these injections of simulated pulsar signals happen independently of Einstein@Home, right on-site at the LIGO interferometers.

CU
Bikeman

This is also mentioned in the article published in Physical Review D. See the Einstein@home web page for instructions on downloading it.
Tullio

tullio

Joined: 22 Jan 05

Posts: 2118

Credit: 61407735

RAC: 0

RE: The trouble is, as

6 Feb 2009 19:39:16 UTC

Message 90002 in response to message 89999


Quote:

The trouble is, as BOINCers know only too well, that computers are not infallible.

A case in point: the ABS computer in my car has developed a fault. The garage assures me that it's safe to drive, because the computer has built-in self-monitoring, detects the fault, and switches itself off so the car drives like an old model without ABS fitted. I know how to cope with that, even with snow on the road.

The trouble comes when I reboot the computer (turn the ignition key). For the first couple of miles, the self-monitoring doesn't notice the fault, and the ABS tries to 'help' me slow down - actually quite alarming the first time it did it. So now I have to drive extra-carefully to start with, until the fault monitoring kicks in, and only then can I drive as normal.

BOINC, just like the real world, needs that fault monitoring to work all the time.

I do not have ABS in my car, it is an old model. But several aircraft accidents occurred on modern planes trying to land on icy runways. The ABS would not let the pilot brake the wheels. I read it in IEEE Spectrum, in an article dedicated to computers on aircraft.

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 317951182

RAC: 390674

RE: Is the risk that the

6 Feb 2009 21:45:09 UTC

Message 90003 in response to message 89994


Quote:

Is the risk that the BOINC hardware is less reliable than a large in-house super-computer / cluster a significant risk?

Is the risk that BOINCers deliberately return incorrect results more often than another researcher / institution fakes their results a significant risk?

I understand false results from BOINC could lead to interesting WUs being ignored by a project. But surely, prior to publishing, every project either checks the interesting WUs on their own hardware, or ignores oddball outliers?

And as for "check WUs":

How do you know that the in-house hardware used to compare the "check WU" is accurate?

How do you ensure that the "check WU" will quickly find a hardware issue given that the issue might not appear until the hardware is stressed (run 24x7, or overheats)?

You're right, these issues can arise regardless. But I think the management of these risks is quite different in-house vs. distributed computing. Because 'open invitation distributed computing' is pretty new, there is bound to be some inertia/reluctance until it is 'proven' as an avenue of research. This has as much to do with the prevailing culture of science researchers as with the technical aspects - specifically what they are actually prepared to accept as valid, rather than what we think they might be thinking! :-)

There was considerable ruckus in the mathematical community when some began using computers ( because of their sheer raw crunching power ) to plow through large sets to examine/define/display various math ideas. The basic tension was over 'correctness', validity and methods of proof etc ... much like here. The Four Colour Map Theorem proof was a bit of a watershed, as was the use of digital orreries to test the old saw of solar system stability. Let us not forget the discomforts occurring with Akos Fekete's really neat optimisations of the E@H binaries - while brilliant, they had to be 'internalised' by LIGO.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 317951182

RAC: 390674

RE: I do not have ABS in my

6 Feb 2009 22:01:26 UTC

Message 90004 in response to message 90002


Quote:

I do not have ABS in my car, it is an old model. But several aircraft accidents occurred on modern planes trying to land on icy runways. The ABS would not let the pilot brake the wheels. I read it in IEEE Spectrum, in an article dedicated to computers on aircraft.

Indeed. There have been concerns raised about some fly-by-wire systems denying pilot control inputs on the basis of algorithms that predict unsafe flight configurations should such inputs persist. However as such algorithms cannot appreciate/predict the circumstances which may elicit said pilot inputs, then you have a problem.

I may jerk the wheel on my car in a split second to veer while avoiding an unexpected errant child, even though that movement if persisting would take me some really unhappy places. So if I'm in the rear of the plane looking right at a mountainside I'd like the pilot to be untethered by technology in his control options.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Paul D. Buck

Joined: 17 Jan 05

Posts: 754

Credit: 5385205

RAC: 0

The problem with injected

7 Feb 2009 6:44:27 UTC

Message 90005


The problem with injected test signals is they tend to be singular. And yes, I was aware of that test signal injection. Just as I am aware that there are situations where the black box tests fine in the aircraft, tests bad at the next level of maintenance, and then the cards flagged as faulty test good at the circuit card level. I was a member of a group that ran these tests at the three levels of Navy maintenance. The problem is that the testing methods used in each case are different ... and so detect different failure modes.

With a single test injection you may be able to detect that one signal correctly, but what about the others?
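To make the "what about the others" point concrete, here is a minimal sketch of a test that injects several known tones instead of one, first into a clean sample and then into a noisy one. Everything in it is an assumption for illustration: the names make_samples, power_spectrum and detect, the bin numbers, the amplitudes and the noise level are made up, and the plain DFT is a stand-in, not what SETI@home or Einstein@Home actually run.

```python
import cmath
import math
import random

def make_samples(n, injected, noise_sigma, seed=0):
    """Build n samples holding the given (frequency_bin, amplitude) tones plus Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the test data is reproducible
    samples = []
    for t in range(n):
        value = sum(a * math.sin(2 * math.pi * k * t / n) for k, a in injected)
        samples.append(value + rng.gauss(0.0, noise_sigma))
    return samples

def power_spectrum(samples):
    """Plain O(n^2) DFT power for bins 0 .. n/2 - 1 (slow, but easy to inspect)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def detect(samples, count):
    """Return the `count` strongest non-DC frequency bins, sorted by bin number."""
    spectrum = power_spectrum(samples)
    ranked = sorted(range(1, len(spectrum)), key=lambda k: spectrum[k], reverse=True)
    return sorted(ranked[:count])

# Hypothetical injected signals: (frequency bin, amplitude)
INJECTED = [(5, 1.0), (17, 0.8), (40, 0.6)]

clean = make_samples(256, INJECTED, noise_sigma=0.0)  # "eyeball-able" test task
noisy = make_samples(256, INJECTED, noise_sigma=0.5)  # same signals, noise added

print(detect(clean, 3))
print(detect(noisy, 3))
```

Because the injected bins are known in advance, the detector can be checked against every one of them, on the clean data first and on the noisy data afterwards.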

As to the point about resistance to the new methods ... well, all the more reason to increase the level of rigor to prevent naysayers like me, and those other staid scientists holding up their noses at this newfangled way, from holding it back. And the reason I have been bringing this up is that I am well aware of the "resistance to the new way" problem ... which, as I said, is all the more reason to be more rigorous ...

The fact that you have not had a compute failure on CPDN is truly remarkable ... I have not had that many models run flawlessly over the long haul ... I get streaks ... but the CPDN models are known to be unstable, and computer errors and crashes are part of the territory.

But the problems I am talking about are the more insidious ones where you have a "silent failure": not an obvious death of the model / task, but one where all looks good. And, as I have pointed out, we have the classic problem of the FDIV bug in a whole line of CPUs, which means that "poisoned" calculations can occur ... but if the CPU is in wide use then all the outputs will agree, though they are wrong.
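A toy sketch of why a quorum does not catch this kind of failure. It is only an illustration under stated assumptions: validate, good_host and flawed_host are made-up names, the defect is a hypothetical stand-in for an FDIV-style bug (not the real Pentium behaviour), and real project validators are more involved than a rounded equality check.

```python
from collections import Counter

def validate(results, quorum, digits=9):
    """Return the value that at least `quorum` hosts agree on, or None if no quorum."""
    rounded = [round(r, digits) for r in results]
    if not rounded:
        return None
    value, votes = Counter(rounded).most_common(1)[0]
    return value if votes >= quorum else None

def good_host(x, y):
    """A host whose CPU divides correctly."""
    return x / y

def flawed_host(x, y):
    """Hypothetical silent defect: the quotient is slightly, but consistently, wrong."""
    return (x / y) * (1.0 - 6e-5)

x, y = 4195835, 3145727
reference = round(good_host(x, y), 9)

# Two good hosts outvote one flawed host: the correct value validates.
mixed = validate([good_host(x, y), flawed_host(x, y), good_host(x, y)], quorum=2)

# One good and one flawed host disagree: no quorum, the task would be reissued.
split = validate([good_host(x, y), flawed_host(x, y)], quorum=2)

# Two hosts with the same defect agree with each other: the wrong value validates.
all_flawed = validate([flawed_host(x, y), flawed_host(x, y)], quorum=2)

print(mixed == reference)
print(split)
print(all_flawed is not None and all_flawed != reference)
```

The last case is the insidious one: agreement between hosts only shows that they computed the same thing, not that they computed the right thing.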

There are other long-standing bugs that have shown up in our computers; one was the calculator that effectively could not subtract two numbers correctly, giving rise to the situation where 1 - 1 was not zero (I forget the exact error values).

Oh, and in many fly-by-wire aircraft, if you untether the pilot he can initiate a movement of the aircraft that will cause it to come apart in flight. So, the plane will miss the mountain, but the parts, and you, will not ... raining debris all over that mountainside you wanted to miss ...

tullio

Joined: 22 Jan 05

Posts: 2118

Credit: 61407735

RAC: 0

Now that you remind me I had

7 Feb 2009 8:52:23 UTC

Message 90006


Now that you remind me, I had one CPDN Beta crash, but it happened after I had set my system time back by about 5 minutes because my NTP server in Germany was not working. I then hitched to a NIST server in the USA and it has been working all right. I loaded another CPDN Beta model and it has been working since then (that was months ago).
Tullio
