I understand your confusion and I'm equally confused by all of this. Let me explain it another way.
In a previous run we recognized that the same task had different runtimes when run on specific CPUs, and we tried to find out which property of the host or CPU was responsible. Through a runtime analysis I was able to divide the host population into two parts: hosts that show a lower runtime and hosts that show a higher runtime for the same type of task. The difference is roughly a factor of 2, which is significant enough to break BOINC's runtime estimation. We tried to find an explanation for this, but so far we have had no luck.
What we did was adjust the amount of work in each task to control the actual runtime. Each task is designed to run for 8-10 hours on a recent CPU; we test this on our internal nodes, which fall into the "fast" host category. We then divide the run into 2 subruns. Previously this was done by target; this time we have only one target, so we divided by search frequency. The work put into tasks for the "slow" hosts (2 times the runtime of the fast hosts) is simply cut in half (with respect to the other work) so that those hosts also reach the same expected runtime of 8-10 hours per task. That way we get the same expected runtime on all hosts.
The difference between the tasks of the Lo and Hi applications is just the length of the tasks, not how much computational power they demand.
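The sizing scheme described above can be sketched in a few lines. This is only an illustration of the arithmetic (the names, the 9-hour midpoint, and the `fast_rate` parameter are all invented for the example), not the project's actual workunit generator:

```python
# Sketch of the workunit-sizing idea: every task should run ~8-10 hours,
# and tasks destined for "slow" hosts carry half the work so their
# wall-clock runtime matches that of tasks sent to "fast" hosts.

TARGET_HOURS = 9.0   # midpoint of the 8-10 hour design target (assumption)
SLOW_FACTOR = 2.0    # slow hosts take roughly 2x as long per unit of work

def work_units_per_task(host_class: str, fast_rate: float = 1.0) -> float:
    """Work to pack into one task so it runs ~TARGET_HOURS on the given
    host class; fast_rate is the (hypothetical) work/hour of a fast host."""
    rate = fast_rate if host_class == "fast" else fast_rate / SLOW_FACTOR
    return TARGET_HOURS * rate

print(work_units_per_task("fast"))   # 9.0 units of work
print(work_units_per_task("slow"))   # 4.5 units -> half the work, same runtime
```

Under these assumptions a slow-host task carries exactly half the work of a fast-host task, which is the "cut in half" step described above.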
Thanks for this detailed explanation. But I think you do not fully understand my concerns, so I'll try to explain it in a simpler way. There are two computers:
- a server based on Intel Xeon L5520 processors, let's call it A (for simplicity),
- a laptop with an Intel i7 Q 840 processor, let's call it B.
A is getting Lo tasks, which is OK as each takes about 8h 20m to compute, a perfect match for the expected runtime you wrote about. So A is obviously a "slow" class host. Now, A is faster than B, so B should also be in the "slow" class. But for some reason B is getting Hi tasks, and each takes about 16h to finish even when running only one at a time. Right now B is processing four Hi tasks in parallel and the estimated time is about 24h.
So to summarize: there are two hosts, A and B. A is faster than B and is in the "slow" class, but B, which is slower than A, is in the "fast" class.
Could we be hypothetical and say B magically gets a LowFreq task? Note I say LowFreq - this has nothing to do with CPU clock speeds, or the sound waves elephants might make either :) it's just one of the other task types.
I have listed some possible answers for how long it might take:
a) 18 hours.
b) 16 hours.
c) 15 hours and 59 minutes
d) 14 hours
e) 8 hours 20 minutes
How do you decide?
Maybe, once again, I'm not understanding the point :-).
I would have thought you can't give an answer without specifying the number of tasks running in parallel, since from the information previously given the crunch time varies from 16 to 24 hrs depending on whether 1 or 4 tasks run concurrently - assuming that what was called an estimate turns out to be the real crunch time. There's not much point in quoting an estimate if it's not a reasonable approximation of reality, you would think.
Or am I missing the point of your little 'QI style' test :-). Is that a klaxon I hear?? :-).
And why give two separate options just 1 minute apart? I sure don't understand the reason for that :-).
Cheers,
Gary.
If you compare your BOINC benchmarks, your i7 is faster than any of your three L5520 Xeons.
Your assumption that a higher CPU frequency means A is faster than B is correct in general if you look at benchmarks and simple applications. But our gravitational wave search is no simple application, and as I wrote earlier, we also don't understand the reason for the different runtimes. You would, for example, expect some AMD CPUs to be in the "fast" host category, and I would have expected the Ryzen to be, but that is not the case. There are, as you would call them, slow CPUs that outperform faster CPUs when running our gravitational wave search application. Since we need more than just raw CPU cycles - we also transfer data to and from the CPU - the actual runtime per task you see in the end is influenced by a myriad of properties of your host.
The i7 family of processors in general falls into the "fast hosts" category, but this is not a black-and-white decision. There are some hosts or subtypes of CPUs that behave differently from the others with respect to runtime per task. So it could be that while the i7 CPUs on EaH show an average runtime of 8h for Hi tasks, your particular i7 shows a runtime of 16h per task. We do not distinguish between subtypes (like different generations) of CPU families.
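The actual Einstein@Home scheduler logic is not public, but a family-level classifier of the kind described here might, purely as an illustration, look like the sketch below. The matching rule is invented; the point is that a rule this coarse cannot tell CPU generations or subtypes apart, which is exactly why an individual i7 can land in the "fast" class yet run twice as long as the class average:

```python
# Illustrative only: classify a host as "fast" or "slow" from its CPU
# model string. A substring match on "i7" lumps every i7 generation and
# subtype (mobile, desktop, server) into one class, with no awareness
# of the actual per-task runtime of that specific part.

def classify_host(cpu_model: str) -> str:
    return "fast" if "i7" in cpu_model.lower() else "slow"

print(classify_host("Intel(R) Core(TM) i7 CPU Q 840 @ 1.87GHz"))  # fast
print(classify_host("Intel(R) Xeon(R) CPU L5520 @ 2.27GHz"))      # slow
```

Under a rule like this, Sebastian's laptop i7 is "fast" and his Xeon server is "slow" regardless of how they actually perform on the search code.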
We have no quantitative data to compare for the case in question (we probably will not get any), and the words fast/slow are being used, mixed with other things, to muddy the water.
There is a problem with perception and understanding of what the OP is trying to say; I have read the thread and not understood some things. The options were meant to provide some insight into the "speed" words being used.
If A takes 16 hours and B takes 15 hours 59 minutes B is faster than A, yes?
I'm interested in knowing what the OP thinks this specifically means in this case. Then perhaps we can turn it into numbers which we can relate to.
OK, I see. I was misunderstanding the point of the quiz. Thanks for clearing that up.
I was also misunderstanding what Sebastian was trying to say but the latest exchanges with Christian seem to have cleared that up.
Cheers,
Gary.
Hope so; it would be good to hear from Sebastian whether it has.
It is confusing when fast / slow / high / low and frequency are used in several contexts.
I was just curious why the computer with greater performance in processing the GW Search application is not getting Galactic Center highFreq tasks, while at the same time the other one, which has lower performance in processing the GW Search application, is getting those Galactic Center highFreq tasks. In a previous post in this thread I gave numbers measured with perf and Intel PCM during execution of the same (copied between hosts) Galactic Center highFreq task on both of these computers. So I didn't make an assumption but a measurement. And when I was talking about the clock speeds of these two platforms, I didn't compare apples to oranges, like for example AMD to Intel, or an i7 based on Haswell to an i7 based on Sandy Bridge. I was comparing apples to apples, as both of those CPUs are based on the Nehalem microarchitecture. And that's why it is not clear to me why GW tasks are distributed that way.
After the response from Christian:
Quote:
The i7 family of processors in general falls into the "fast hosts" category...
I think, but this is only my guess, that the decision is made based on the processor name string returned by the BOINC client, rather than on combinations of the Vendor name and the Family, Model and Stepping numbers, which is what I had assumed. Maybe Christian will find some time to give some more details about this.
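The distinction being guessed at here can be made concrete. A scheduler keying on the marketing name string behaves differently from one keying on the vendor/family/model/stepping fields (which come from CPUID and do distinguish microarchitectures). The sketch below is hypothetical on both sides: the classification rules and the model-number threshold are invented, and the CPUID-style numbers are representative values chosen for the example, not verified data:

```python
# Hypothetical comparison of the two classification strategies Sebastian
# describes. Field values are representative examples, not measurements.

nehalem_i7 = {"name": "Intel(R) Core(TM) i7 CPU Q 840",
              "vendor": "GenuineIntel", "family": 6, "model": 30}
newer_i7   = {"name": "Intel(R) Core(TM) i7-4770 CPU",
              "vendor": "GenuineIntel", "family": 6, "model": 60}

def by_name(host: dict) -> str:
    # Name-string matching: every i7, of any generation, is "fast".
    return "fast" if "i7" in host["name"].lower() else "slow"

def by_cpuid(host: dict) -> str:
    # Family/model matching: treat only newer family-6 microarchitectures
    # as "fast". The >= 42 cutoff is invented for illustration.
    return "fast" if host["family"] == 6 and host["model"] >= 42 else "slow"

print(by_name(nehalem_i7), by_cpuid(nehalem_i7))  # fast slow
print(by_name(newer_i7), by_cpuid(newer_i7))      # fast fast
```

Only the second scheme can separate an old Nehalem-era i7 from a newer one, which is the crux of the guess above.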
@Harri Liljeroos
Harri, as you probably already know, BOINC benchmarks are a very poor measurement of host performance. They are run only occasionally (if they are not disabled with <skip_cpu_benchmarks>), in a very uncontrolled state of the host and BOINC client configuration, and they indicate only the theoretical capabilities of the execution units in isolation from the other parts of the system.