Sort of...you said they were the same, unless they are different.....I was being nit-picky.
I'm having a problem understanding how they can want the same OS and the same CPU and then not care how far off the results are.
Depending on the project, you have it exactly right. For SETI@Home, the exact power and frequency values reported in the result are not that critical. The more important aspect is that we have a correct count, which should be pretty much the same for all returns. But if one says the pulse is 4.3 inches high, another says 4.2, and a third says 4.1 ... well, who cares ...
If I say there are 20 pulses and you say 10, well, we are not in the same ballpark and even SETI@Home will barf on this ...
In the case of, say, LHC@Home, and I suspect Einstein@Home, the tolerances are much tighter. It may be hard to understand why iterative processes have trouble, but it is fairly simple (and I am surprised you did not take me to task for not having this example in the Wiki ...).
If we have an operation that returns a minor error of, say, 0.0000000000001 ... we can ignore it, right? Maybe, maybe not ...
If we have a loop
[pre]
for i := 1 to 10000 do {
    x := (x * y) + 0.0000000000001
}
[/pre]
No problem ... as yet ... but ...
[pre]
for i := 1 to 1000000000000 do {
    x := (x * y) + 0.0000000000001
}
[/pre]
Wrong ... by now that tiny error has been piled on a trillion times and it is not so tiny any more ...
Worse, if I am doing that and you are doing:
[pre]
for i := 1 to 1000000000000 do {
    x := (x * y) - 0.0000000000001
}
[/pre]
Well, we both may as well have stayed in bed.
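If you want to see the drift for yourself, here is a little throwaway C program I cooked up for this post (it is NOT code from any project, just the idea above with the two signs side by side, and y held at 1 so only the error accumulates):
[pre]
#include <stdio.h>

int main(void)
{
    double x_plus  = 1.0;   /* my machine: the error creeps in as +0.0000000000001 */
    double x_minus = 1.0;   /* your machine: the error creeps in as -0.0000000000001 */
    double y = 1.0;         /* keep the multiply harmless so only the error piles up */

    for (long i = 0; i < 100000000L; i++) {   /* "only" 100 million passes */
        x_plus  = (x_plus  * y) + 0.0000000000001;
        x_minus = (x_minus * y) - 0.0000000000001;
    }

    printf("x_plus     = %.15f\n", x_plus);          /* drifts above 1.0 */
    printf("x_minus    = %.15f\n", x_minus);         /* drifts below 1.0 */
    printf("difference = %.15f\n", x_plus - x_minus);
    return 0;
}
[/pre]
Even at "only" 100 million passes the two results no longer agree ... now ask a validator to match them bit for bit and you see the problem ...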
RE: RE: Sort of...you
I guess you found out I don't take the Wiki to the can for reading ...
I say prove the formula first; it seems others are saying just optimize it and find the differences sooner ... to me that's Development and Testing ... not Production.
I noticed somewhere you said you were a SCPO. Just out of curiosity, what was your field?
RE: RE: Sort of...you
Which projects decide who gets WUs based on OS and CPU?
I see Predictor does.....which others?
RE: I see Predictor
LHC@Home and Einstein@Home I thought were also ... but I cannot find an example ...
All I can find are Predictor@Home and The Lattice Project...
So, what do I know ...
RE: ......So, what do I
More than most, My Son! Thanks for all your hard work. It's appreciated.
You are most welcome. I
You are most welcome.
It gives me something to live for ... as I was gradually failing I found an employer that would take the hours I would give and be content. He even set up a special medical plan for JUST me, so I could see certain doctors ... but when I had to give up driving ... well ... I had to give that up too ... :(
But, I have the time and the talent ... so ... here I am, writing and trying to teach what little I know about BOINC ...
Paul, In your opinion is
Paul,
In your opinion, is it a good or bad idea to have optimised clients?
RE: Paul, In your opinion
Good ... however, ...
Which is more like *HOWEVER COMMA (hear the rolling tones as a voice from heaven ...) ...
If the accuracy of the application suffers, which it can, then there is no point.
Let's see if I can get this across ... computer programs are a tension between creativeness and codeability ... in general you want to make the programs as "small" as possible with regard to repeated code segments. This means we should write as many functions as possible, with each function doing exactly one thing and one thing only ... an "ideal" function in this model would be one line of code long ...
On the other hand, having 3,000,000 one-line functions makes it nearly impossible to use those functions effectively (one of the problems with Microsoft's .NET library/framework (whatever they are calling it this month)). So, programmers tend to write longer modules that really do more than one thing at a time.
Got it?
Ok, optimizations come in "flavors" ... the optimization that pays off the most is the one where you select the most efficient algorithm possible. For example, there are about 50 or so major sorting algorithms. Some are good with data that is almost already sorted, some are more efficient if the data is random, etc. So, that's one ...
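As a tiny made-up illustration of that first flavor (mine, not anything out of a BOINC application): insertion sort is a poor choice on random data, but on data that is already almost in order it does hardly any work at all, so simply choosing it for that case is an optimization all by itself ...
[pre]
#include <stdio.h>

/* Insertion sort: roughly n*n element moves on random data,
   but close to zero moves when the input is already nearly sorted. */
static long insertion_sort(int *a, int n)
{
    long moves = 0;                 /* count the work we actually do */
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];        /* shift larger elements right */
            j--;
            moves++;
        }
        a[j + 1] = key;
    }
    return moves;
}

int main(void)
{
    int nearly_sorted[8] = { 1, 2, 4, 3, 5, 6, 8, 7 };  /* almost in order */
    int random_ish[8]    = { 8, 3, 7, 1, 6, 2, 5, 4 };

    printf("nearly sorted: %ld moves\n", insertion_sort(nearly_sorted, 8));
    printf("random-ish:    %ld moves\n", insertion_sort(random_ish, 8));
    return 0;
}
[/pre]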
Secondly, we have "peephole" optimizations ... where the compiler tries to find the "best" and most efficient instructions to accomplish the behavior necessary ... this is like using MMX or SSE instructions when those instructions generate faster execution ... One of the points of RISC was to have all instructions execute in one clock cycle, so they were all equally efficient, and to have as few unique instructions as possible. The bad news is that I have never seen a truly RISC machine ... way too easy to make the Instruction Set Architecture (ISA) more complex (we emit 8086 code that ALL modern CPUs translate into RISC-like instructions used internally - that is why the cache on the Intel processors is named the way it is, the cache holds "uOPs" - but I digress ...).
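To show the flavor of that second one, here is a small hand-rolled sketch (again, something I wrote just for this post) doing the same adds two ways; the second loop uses the SSE intrinsics by hand to do the kind of thing a good peephole/vectorizing pass would pick for you automatically:
[pre]
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics; on gcc compile with -msse */

int main(void)
{
    float a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    float b[8] = { 8, 7, 6, 5, 4, 3, 2, 1 };
    float c[8];

    /* Scalar version: one floating point add per instruction. */
    for (int i = 0; i < 8; i++)
        c[i] = a[i] + b[i];

    /* SSE version: one ADDPS instruction adds four floats at a time. */
    for (int i = 0; i < 8; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);   /* load 4 floats */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);    /* 4 adds in one go */
        _mm_storeu_ps(&c[i], vc);          /* store 4 results */
    }

    for (int i = 0; i < 8; i++)
        printf("%.1f ", c[i]);
    printf("\n");
    return 0;
}
[/pre]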
Ok, then we have Global Optimization ... this is where the compiler and code generator look at using the CPU's internals in the most efficient manner across the entire program ...
Still with me?
Ok, what is wrong with this picture? Well, if we have a lot of functions, we spend a lot of time doing what we call a "context switch": we save where we are and the internal state, jump to the subroutine, initialize THIS context to what we need, do the work, jump back, dump the function's context, and restore the calling function's context ... if we jump/call into the OS, especially the kernel, we have a hugely expensive context switch. So, none of this is easy, and this is vastly simplified ...
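Here is a toy (mine, and vastly simplified like everything else in this post) that gives a feel for that save/jump/restore cost ... the first loop pays a real call on every pass, the second does the same arithmetic with no call at all:
[pre]
#include <stdio.h>
#include <time.h>

/* A "one line" function, the ideal little building block ... */
static double add_tiny(double x) { return x + 0.000001; }

/* Calling through a volatile function pointer keeps the compiler from
   inlining it away, so every iteration really pays the call/return cost. */
static double (* volatile fp)(double) = add_tiny;

int main(void)
{
    const long N = 100000000L;   /* 100 million passes */
    double x;
    clock_t t0, t1;

    x = 0.0;
    t0 = clock();
    for (long i = 0; i < N; i++)
        x = fp(x);               /* call overhead on every pass */
    t1 = clock();
    printf("with calls: %.2f s (x = %f)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, x);

    x = 0.0;
    t0 = clock();
    for (long i = 0; i < N; i++)
        x = x + 0.000001;        /* the same arithmetic, no call */
    t1 = clock();
    printf("no calls:   %.2f s (x = %f)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, x);
    return 0;
}
[/pre]
On most machines the version with the calls comes out noticeably slower, and that is just a plain function call ... a trip into the OS costs far, far more ...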
But, a seemingly "identical" set of operations, even when mathematically equivalent, WILL return different answers. There is material in the Wiki in the area of Floating Point where I explain some of this in more detail and give an example ...
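If you want to see that with your own eyes before you go digging in the Wiki, here is about the smallest program I can write that shows it (just the general point, not the Wiki example itself):
[pre]
#include <stdio.h>

int main(void)
{
    /* Two "mathematically identical" ways to add the same three numbers. */
    double left  = (0.1 + 0.2) + 0.3;
    double right = 0.1 + (0.2 + 0.3);

    printf("left   = %.17g\n", left);
    printf("right  = %.17g\n", right);
    printf("equal? %s\n", (left == right) ? "yes" : "no");
    return 0;
}
[/pre]
Same three numbers, same additions, different grouping ... on an ordinary IEEE double machine the two results do not match bit for bit ... now let an optimizer regroup a few million operations like that and you see where the validator headaches come from ...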
Ok, coming up levels, if you run the program on your AMD, optimized for AMD, and I run it on my Intel ... optimized for Intel ... we are not going to see matching results. Now we have to talk about what we mean by identical results (I have a very high-level example in the validation process) ... in SETI@Home, if we have about the same stuff at about the same height, it is good enough ... like you hold up your thumb to measure something ... But in LHC@Home, the answers have to be virtually bit-for-bit identical ... much harder to accomplish ...
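Just to pin down the two extremes of "identical" (this is only my sketch of the idea, not the actual validator code from any project):
[pre]
#include <stdio.h>
#include <string.h>
#include <math.h>

/* The "hold up your thumb" test: are two results within some tolerance? */
static int close_enough(double a, double b, double tolerance)
{
    return fabs(a - b) <= tolerance * fabs(b);
}

/* The much stricter test: are they the very same bits? */
static int bit_for_bit(double a, double b)
{
    return memcmp(&a, &b, sizeof(double)) == 0;
}

int main(void)
{
    double mine = 4.2, yours = 4.3;   /* two reported pulse heights */

    printf("within 5%%?   %s\n", close_enough(mine, yours, 0.05) ? "yes" : "no");
    printf("bit for bit? %s\n", bit_for_bit(mine, yours) ? "yes" : "no");
    return 0;
}
[/pre]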
So, we also skipped over optimization for speed (what you are thinking of almost all the time) versus optimization for space ... as an example, if I need a very fast program written in a higher-level language that also has great size characteristics, Forth is the language of choice because it does both pretty well ... and no ... you would have to be really convincing to talk me into using C or any of its derivatives (heck, we could eliminate an entire class of exploits just by using a real computer language) ...
So, there are other types of optimizations we can think of ... like maximizing the use of the resources in the processor ... HT (Hyper-Threading) is a good example: the long pipelines have the drawback of being very expensive if you have a "stall" ... so, they change hats and do something else while getting that straightened out ... virtue out of adversity ... :)
Itanium was another attempt in that direction, and only time will tell if we will get there or not ... there is another interesting technology from HP called "Dynamo" ... it is a program, run on a computer, that emulates another CPU in software. In the paper I think they emulated something called a 1000 series, and the simulator simulated the CPU of the very computer it was running on (really it makes no difference what computer you emulate or what you run the emulation on). And the study found that, contrary to expectations, the software simulation, even with the software overhead, could run programs faster than the native machine running the program directly ...
Unexpected result is an understatement ... the efficiency came from the fact that the programs they were running were streamlined during execution ... So, instead of paying the cost of a context switch, they would effectively inline code from the function call and drop the unneeded context switch ... why THIS is noteworthy is that right now, if you run an optimized client program, ALL CALLS TO THE OS ARE NOT OPTIMIZED ... so, not only do you have a very expensive context switch, but the code you run is not optimized ... and any history the processor learns is tossed as soon as we stop running that small segment of code.
What if we could run a program while the CPU monitors what is going on, begins to "learn", and then saves that learning? Oh, my ...
Ok, we covered a lot of ground and I know people are going to chime in on things that I glossed over or drastically simplified ... yes, this is simplified ... :)
Ask questions ... glad to answer ... And even if you did it to pull me up some ... well ...thanks ... :)
Paul D. Buck said: >Good
Paul D. Buck said:
>Good ... however, ...
>
>Which is more like *HOWEVER COMMA (hear the rolling tones as a voice from heaven ...) ...
>
>etc...
>
Wow Paul - I haven't had that kind of in-depth talk laid on me since I was in my computer architecture class some time ago. The inner-computer-nerd in me is basking in the moment of getting to parse through some low-level hardware talk. That was a nice little trip down an avenue I haven't traversed for quite some time now...
Regards,
Clint
www.clintcollins.org - spouting off at the speed of site
Yeah, it is the "teacher" in
Yeah, it is the "teacher" in me ... :)
I taught computer architecture a couple of times, though the students were usually puzzled by the way I approached it: the first month I spent on the history of computers. One of the things that I have always found funny is that with every new generation we usually make the very same mistakes. The most obvious is the implicit assumption in each new generation that memory space only needs to be "n", with "n" being a small number.
The latest one is for PDAs and phones ... yet we have made this mistake in each new class of computing machine. It is like "640K is plenty and no one will need more" ... then, sure enough, we need more, but the architecture needs a kludge to make it come out right ...
First off: Paul is obviously
First off: Paul is obviously smarter than most of us at computer technology, so I can hardly offer any serious answer to him. But I would like to add my two cents: not only should Einstein let Mr. T.M. Rai have a shot at their source code, I would also recommend anyone running SETI alongside Einstein consider using his optimized code for the SETI project.
Before running the optimized code I was getting a little over 50 credits a day (on an average day when I was running both projects most of the day...). Since I started using the optimized code for SETI I am now averaging well over a hundred credits a day for both SETI and the unoptimized Einstein code. I expected this from the new SETI code, but I was surprised that Einstein also improved, since I forget how the BOINC program attempts to balance competing applications. I notice that, for instance right now, I have 2 Einstein WU's suspended in memory (one at 79%, the other 59% done...) and two SETI WU's are happily crunching away. Other times it will let both Einstein WU's hog the resources.
My feeling is the scientists who run the projects must know whether the results they're getting back are valid, so I would call the optimized code a great success. And I'm only one machine - imagine if all the machines were running optimized code! Whew, I gotta settle down: that's the old DP tech in me trying to get back out.