HostID 1001562 - Richard Haselgrove's Q6600 Quad Core

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7309801689
RAC: 2311365

RE: Sounds like you've

Message 78837 in response to message 78835

Quote:
Sounds like you've somehow unchecked the 'Menu Bar' option. Right-click on one of the tool bars and check it back on. Should be able to get "help about" then.

Ah, thank you.

The answer is 7.0.5730.11

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7309801689
RAC: 2311365

RE: I'll continue to trawl

Message 78838 in response to message 78836

Quote:


I'll continue to trawl through the identified hosts looking for examples of starting just below a boundary and then traversing it.


Thanks for the stdoutdae.txt tip.

I've found a case last October where a host got the 270 skygrid, then started at the h1 and l1 files for 269.45. Over the next four weeks it worked its way up .05 at a time, reaching 270.05. However, it never started a task from a frequency higher than 269.85 before moving on to another skygrid altogether. Assuming the scheduler knows what it is doing, this at least suggests that 270 results could need the 270.05 h,l files in some cases.

Another case on the same host may be more relevant:

It downloaded skygrid 610, without, so far as my records go, ever having obtained skygrid 600.

Then it got an initial batch of h,l files starting from 600.00 up through 600.25, and its first actual result start was announced by this line:

"Starting task h1_0600.00_S5R2__224_S5R3a_1 using einstein_S5R3 version 415"

Combining a few different bits of indirect evidence, I surmise:

A skygrid of 610 works for results from 600.00 through 600.95.

Computing a result requires the h,l files for that frequency plus (optionally?) the h,l files for one (or possibly more?) higher .05 steps.

If I got this right, then for the way I defined it the correct value of "about1" is 0.00 and it can be dropped from the ceiling function argument.

I would strongly welcome other evidence or reasoning supporting or refuting this interpretation.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118652180529
RAC: 18889712

RE: RE: I'll continue to

Message 78839 in response to message 78838

Quote:
Quote:


I'll continue to trawl through the identified hosts looking for examples of starting just below a boundary and then traversing it.


Thanks for the stdoutdae.txt tip.

You're welcome! I've found an example of traversing the boundary from my own hosts as well. Here is the log snippet:

Quote:

15-Dec-2007 18:59:33 [Einstein@Home] Got server request to delete file h1_0589.65_S5R2
15-Dec-2007 18:59:33 [Einstein@Home] Got server request to delete file l1_0589.65_S5R2
15-Dec-2007 18:59:35 [Einstein@Home] Started download of h1_0589.90_S5R2
15-Dec-2007 18:59:35 [Einstein@Home] Started download of l1_0589.90_S5R2
15-Dec-2007 19:00:20 [Einstein@Home] Finished download of l1_0589.90_S5R2
15-Dec-2007 19:00:20 [Einstein@Home] Started download of h1_0589.95_S5R2
15-Dec-2007 19:00:37 [Einstein@Home] Finished download of h1_0589.90_S5R2
15-Dec-2007 19:00:37 [Einstein@Home] Started download of l1_0589.95_S5R2
15-Dec-2007 19:01:17 [Einstein@Home] Finished download of h1_0589.95_S5R2
15-Dec-2007 19:01:29 [Einstein@Home] Finished download of l1_0589.95_S5R2
16-Dec-2007 05:37:47 [Einstein@Home] Computation for task h1_0589.35_S5R2__22_S5R3a_1 finished
16-Dec-2007 05:37:47 [Einstein@Home] Starting h1_0589.40_S5R2__36_S5R3a_0
16-Dec-2007 05:37:48 [Einstein@Home] Starting task h1_0589.40_S5R2__36_S5R3a_0 using einstein_S5R3 version 415
16-Dec-2007 05:37:50 [Einstein@Home] Started upload of h1_0589.35_S5R2__22_S5R3a_1_0
16-Dec-2007 05:37:52 [Einstein@Home] Sending scheduler request: To fetch work. Requesting 986 seconds of work, reporting 0 completed tasks
16-Dec-2007 05:37:57 [Einstein@Home] Scheduler request succeeded: got 1 new tasks
16-Dec-2007 05:37:57 [Einstein@Home] Got server request to delete file h1_0589.70_S5R2
16-Dec-2007 05:37:57 [Einstein@Home] Got server request to delete file l1_0589.70_S5R2

16-Dec-2007 05:37:59 [Einstein@Home] Finished upload of h1_0589.35_S5R2__22_S5R3a_1_0
16-Dec-2007 05:37:59 [Einstein@Home] Started download of h1_0590.00_S5R2
16-Dec-2007 05:37:59 [Einstein@Home] Started download of l1_0590.00_S5R2
16-Dec-2007 05:38:43 [Einstein@Home] Finished download of l1_0590.00_S5R2
16-Dec-2007 05:38:47 [Einstein@Home] Finished download of h1_0590.00_S5R2

16-Dec-2007 16:49:23 [Einstein@Home] Computation for task h1_0589.40_S5R2__37_S5R3a_0 finished
16-Dec-2007 16:49:23 [Einstein@Home] Starting h1_0589.60_S5R2__14_S5R3a_1
16-Dec-2007 16:49:23 [Einstein@Home] Starting task h1_0589.60_S5R2__14_S5R3a_1 using einstein_S5R3 version 415
16-Dec-2007 16:49:25 [Einstein@Home] Started upload of h1_0589.40_S5R2__37_S5R3a_0_0
16-Dec-2007 16:49:37 [Einstein@Home] Finished upload of h1_0589.40_S5R2__37_S5R3a_0_0
16-Dec-2007 18:16:35 [Einstein@Home] Sending scheduler request: To fetch work. Requesting 106 seconds of work, reporting 2 completed tasks
16-Dec-2007 18:16:45 [Einstein@Home] Scheduler request succeeded: got 1 new tasks
16-Dec-2007 18:16:45 [Einstein@Home] Got server request to delete file h1_0589.75_S5R2
16-Dec-2007 18:16:45 [Einstein@Home] Got server request to delete file l1_0589.75_S5R2
16-Dec-2007 18:16:45 [Einstein@Home] Got server request to delete file h1_0589.80_S5R2
16-Dec-2007 18:16:45 [Einstein@Home] Got server request to delete file l1_0589.80_S5R2
16-Dec-2007 18:16:47 [Einstein@Home] File skygrid_0600Hz_S5R3.dat exists already, skipping download
16-Dec-2007 18:16:47 [Einstein@Home] Started download of h1_0590.05_S5R2
16-Dec-2007 18:16:47 [Einstein@Home] Started download of l1_0590.05_S5R2
16-Dec-2007 18:17:40 [Einstein@Home] Finished download of h1_0590.05_S5R2
16-Dec-2007 18:17:40 [Einstein@Home] Started download of h1_0590.10_S5R2
16-Dec-2007 18:17:44 [Einstein@Home] Finished download of l1_0590.05_S5R2
16-Dec-2007 18:17:44 [Einstein@Home] Started download of l1_0590.10_S5R2
16-Dec-2007 18:18:34 [Einstein@Home] Finished download of l1_0590.10_S5R2
16-Dec-2007 18:18:35 [Einstein@Home] Finished download of h1_0590.10_S5R2

The host had been processing 0589.xx data starting from around 0589.35. The skygrid file at the start of this particular run was 590, as you would expect. I've started the log at the point where the .65 files were being deleted and the .90 and .95 files were being requested. No change in skygrid at this point.

At the next work request (16-Dec-2007 05:37:52, shown in blue), the .70 files were deleted and the 0590.00 files were downloaded, but there was no request for a new skygrid.

At the following work request (16-Dec-2007 18:16:35, shown in red), the .75 and .80 files were deleted and the 0590.05 and 0590.10 files were downloaded. It was at this point that the 600 skygrid file was first requested. My assumption is that a skygrid file is sent at the same time as any data that needs that skygrid, irrespective of when that data might actually be crunched. I interpret the log as indicating that 0590.00 doesn't need skygrid 600 but 0590.05 and above do.

It would appear that I must have been mistaken in saying that I'd seen a data filename with the fractional part just above the precise nnn.00. I apologise for having misled people about this.

So unless I find a clear example to the contrary, and in the light of your examples as well, I now conclude that the data range of skygrid nn0 must be from n(n-1)0.00 to nn0.00. I'm wondering if actual nn0.00 data could be handled by either skygrid, which could perhaps explain your observations when the run started at 0600.00.
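For anyone who wants to check other stdoutdae.txt logs against this reading, here is a minimal Python sketch. It pulls the frequency out of data-file names like h1_0590.05_S5R2 and rounds it up to the next multiple of 10 Hz, leaving exact multiples unchanged. The function names and the regular expression are illustrative assumptions, and the rounding rule encodes only the hypothesis above, not confirmed scheduler behaviour.

```python
import re
from decimal import Decimal, ROUND_CEILING

# Matches data-file names such as "h1_0590.05_S5R2" or "l1_0589.95_S5R2".
# The trailing \b deliberately rejects task names like
# "h1_0589.35_S5R2__22_S5R3a_1", where an underscore follows "S5R2".
DATA_FILE = re.compile(r"\b[hl]1_(\d{4}\.\d{2})_S5R2\b")


def data_frequency(log_line):
    """Return the data-file frequency in a log line as a Decimal, or None."""
    match = DATA_FILE.search(log_line)
    return Decimal(match.group(1)) if match else None


def implied_skygrid(freq):
    """Round up to the next multiple of 10 Hz; exact multiples stay put.

    ASSUMPTION: this is the hypothesis from the post (skygrid nn0 covers
    data up to and including nn0.00), not documented scheduler behaviour.
    """
    tens = (freq / 10).to_integral_value(rounding=ROUND_CEILING)
    return int(tens) * 10


lines = [
    "15-Dec-2007 18:59:35 [Einstein@Home] Started download of h1_0589.90_S5R2",
    "16-Dec-2007 05:37:59 [Einstein@Home] Started download of h1_0590.00_S5R2",
    "16-Dec-2007 18:16:47 [Einstein@Home] Started download of h1_0590.05_S5R2",
]
for line in lines:
    freq = data_frequency(line)
    print(freq, "->", implied_skygrid(freq))
```

Decimal is used instead of float so that a value like 590.00 cannot be nudged across the boundary by rounding error.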

Also, if you're wondering why my log says that the download of the 600 skygrid was skipped, it's because I save skygrids when they first appear and roll them out to all other machines on the LAN. As shown here, it saves unnecessary downloading that way.

I agree that your "about1" value looks like being .00 and therefore should be dropped. I'm still wondering about the .433 value however.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118652180529
RAC: 18889712

RE: ... I think I'll put a

Message 78840 in response to message 78822

Quote:

... I think I'll put a daily update of the two hosts - yours (1001562) and peanut's (997488) on the website with a link here. The link should stay the same - all I'll do is replace the files approximately each day. The files will keep growing as results are added. I'll put a note in this thread each time I update the files.

These results are from RH's machine and these are from peanut's.

Fresh full sets of results have been uploaded, replacing the previous files. The links are unchanged. Please report any errors or problems here.

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2991076307
RAC: 703912

909.15_429 is in, so here's

909.15_429 is in, so here's the first graph:


(direct link)

We're approaching the trough, so this is just less than half a cycle (RR7A gives trough at 426, peak at 511). I don't know why we've got all the oscillation down the bottom - anyone care to speculate?

Here are the remaining data points for 909.15 (beyond the ones which Gary filed this morning):

909.15,429,20998.61,4.32
909.15,430,21297,4.32
909.15,431,21002.11,4.32
909.15,432,21401.08,4.32
909.15,433,21010.2,4.32
909.15,434,21489.61,4.32
909.15,435,21025.31,4.32
909.15,436,21585.81,4.32


and the RR analysis:

		Period of task cycle = 170.3
		Number of points = 67
		Minimum runtime in data = 20998.61
		Maximum runtime in data = 27352.58
		Estimated peak runtime = 28658
		Estimated average runtime = 23924
		Estimated trough runtime = 21221
		Estimated runtime variance = 0.259
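The even/odd oscillation can be put into numbers with a short Python sketch over the eight 909.15 data points listed above. It assumes the columns are frequency, sequence number, runtime in seconds and application version; that column reading and the function name are assumptions made for illustration.

```python
# The eight 909.15 rows quoted in the post.
# ASSUMED columns: frequency, sequence number, runtime (s), app version.
ROWS = """909.15,429,20998.61,4.32
909.15,430,21297,4.32
909.15,431,21002.11,4.32
909.15,432,21401.08,4.32
909.15,433,21010.2,4.32
909.15,434,21489.61,4.32
909.15,435,21025.31,4.32
909.15,436,21585.81,4.32"""


def split_by_parity(csv_text):
    """Return (odd_runtimes, even_runtimes) keyed on the sequence number."""
    odd, even = [], []
    for line in csv_text.strip().splitlines():
        _freq, seq, runtime, _app = line.split(",")
        target = even if int(seq) % 2 == 0 else odd
        target.append(float(runtime))
    return odd, even


odd, even = split_by_parity(ROWS)
odd_mean = sum(odd) / len(odd)
even_mean = sum(even) / len(even)
print(f"odd mean  = {odd_mean:.1f} s")
print(f"even mean = {even_mean:.1f} s")
print(f"gap       = {even_mean - odd_mean:.1f} s")
```

On these points the even-numbered tasks average roughly 434 s longer than the odd-numbered ones. That measures the size of the effect without explaining it.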


The first three at 909.25 are part-cooked, and I've restarted CPDN on the fourth core to try and give the WU generator and scheduler a chance to keep up - I'm already 8 tasks ahead of the nearest allocated wingman. This 4.32 app is incredibly fast (max 7 h 36 m per task, min 5 h 50 m).

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7309801689
RAC: 2311365

RE: 909.15_429 is

Message 78842 in response to message 78841

Quote:

909.15_429 is in,

We're approaching the trough, so this is just less than half a cycle (RR7A gives trough at 426, peak at 511).


The method of rounding up to the next integer multiple of 10 Hz, adding 0.433, and using a sky grid constant of .00020417 predicts a trough at 423 and a peak at 508 for 909.15 (and for any other frequency between 900 and 910). It will be a little hard to tell the difference, though. The chances would be much better if this spanned a peak instead of a trough.
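As a rough check on those numbers, here is a Python sketch of the calculation. Two things are assumptions here: that the period is the constant times the square of the adjusted frequency (the form RR already uses), and that peaks fall at whole multiples of the period with troughs half a period earlier. The second is inferred from the quoted RR7A figures (period 170.3, trough 426, peak 511), not taken from RR's source. The function names are made up for illustration.

```python
SKYGRID_CONST = 0.00020417  # sky grid constant quoted in the thread
OFFSET_HZ = 0.433           # empirical offset quoted in the thread


def step_frequency(frequency_hz):
    """Round up to the next multiple of 10 Hz (exact multiples unchanged).

    Works in integer hundredths of a Hz to avoid float boundary surprises.
    """
    hundredths = round(frequency_hz * 100)
    return -(-hundredths // 1000) * 10  # integer ceiling division


def period(frequency_hz):
    """Cycle period in sequence numbers under the 10-Hz step hypothesis."""
    return SKYGRID_CONST * (step_frequency(frequency_hz) + OFFSET_HZ) ** 2


def trough_and_peak(period_tasks, cycle=2):
    """ASSUMED layout: trough at (cycle + 0.5) periods, peak at (cycle + 1)."""
    trough = round((cycle + 0.5) * period_tasks)
    peak = round((cycle + 1) * period_tasks)
    return trough, peak


print("RR7A period 170.3:", trough_and_peak(170.3))
print("stepped formula  :", trough_and_peak(period(909.15)))
```

Under these assumptions the stepped formula gives a period of about 169.2 for every frequency above 900 up to 910, which is where the 423 and 508 come from.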

I've seen that even/odd oscillation somewhere before, but don't recall where, and don't recall whether a plausible explanation turned up.

The closest thing I have to a guess is some sort of "drafting". Two processes running very similar applications in near sync may see the leading process charged more CPU time for the same work, if the trailing process is less likely to get swapped out. But that does not feel very persuasive at all. Other bids?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118652180529
RAC: 18889712

RE: ... I've restarted CPDN

Message 78843 in response to message 78841

Quote:
... I've restarted CPDN on the fourth core to try and give the WU generator and scheduler a chance to keep up ....

I'm seeing exactly the same on a bunch of dual PIII 1400 servers. I don't think it's anything to do with the speed of the host. I believe the WU generator is creating sequence numbers from the top down on all frequency bands belonging to a given skygrid file at about the same rate, and therefore won't create the lower sequence numbers on any given band until it's ready to do that for all bands. I say this because I've noticed that when it stops on one band it doesn't jump to the topmost sequence number of the next band - it jumps to an approximately equivalent but perhaps somewhat higher position in the next band. In other words, all bands are "draining" at what seems to be a similar rate, and perhaps the scheduler is thinking that any adjacent band that might be lagging a bit needs a bit of help .... :).

It will be very interesting to see if it ever "jumps back" to a previous band once it has "evened-up" the "laggers". I hope so but I suspect not, unless the band you vacated now happens to be one of the next set of "laggers" :).

At the end of the day, it might not be all that important. It should be possible to simulate complete data sets for multiple cycles by adjusting the adjacent frequencies to a single value. A good reason to really nail down whether or not the period function is stepwise in frequency.

EDIT:
I've only had a very casual look at peanut's data that I've been posting - just enough of a look to experience the dramatic slowdown in RR performance as you cycle to different frequency bands that Peter reported. However, my impression was that peanut's machine was jumping from band to band at about the same seq# position but perhaps back a bit each time.

PS:
My meaning might be clearer if the frequency "band" description was replaced with frequency "step". I'm referring to the .00, .05, .10, .15 ... frequency values as "bands", which is probably not the best term to use.

Cheers,
Gary.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6591
Credit: 325497562
RAC: 65844

RE: A good reason to really

Message 78844 in response to message 78843

Quote:
A good reason to really nail down whether or not the period function is stepwise in frequency.


I can do a quick & dirty change to RR_V7C ( call it V7D ) to auto-adjust/shift the frequencies on input according to some function. Also the period derivation for that matter.

As I've been too busy to follow the discussion closely, what forms would you prefer for the period and/or frequency adjustments?

Currently :

period = some_constant * frequency^2

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7309801689
RAC: 2311365

RE: RE: A good reason to

Message 78845 in response to message 78844

Quote:
Quote:
A good reason to really nail down whether or not the period function is stepwise in frequency.

I can do a quick & dirty change to RR_V7C ( call it V7D ) to auto-adjust/shift the frequencies on input according to some function. Also the period derivation for that matter.

As I've been too busy to follow the discussion closely, what forms would you prefer for the period and/or frequency adjustments?

Currently :

period = some_constant * frequency^2

Cheers, Mike.


I currently prefer:

period = .00020417*(ceiling(frequency,10) + .433)^2

using Excel notation.

Notes:

1. The ceiling function implements the 10-Hz step that Bikeman has strongly suggested should be expected from the influence of the number of lines in the skygrid file. We've not managed a direct observation confirmation, but it is not excluded either.

2. The .433 "fiddle" is something that Gary found to match the skygrid linecount data over a wide range (at least I think that is where he got it). My observational comparisons do not object, though they cannot confirm it to that resolution.

3. Even if the 10-Hz step matter is not fully true (though at this moment I suspect it is), "batching" observations together should be helpful in the current generator climate in which we are seeing our hosts hop from a frequency to that +.05 daily or even more often.

4. If you are feeling very generous, you might provide that the data from a 10-Hz step are all handled together for the purposes of estimation, statistics, and graphing (probably labelled by the 10-Hz multiple that comes out of the ceiling function), but are distinguished on the graph by some combination of plotted color and plotted symbol. For some cases (probably those with non-sequential hops) this would help us notice if the 10-Hz step hypothesis is actually not true.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118652180529
RAC: 18889712

RE: RE: ... what forms

Message 78846 in response to message 78845

Quote:
Quote:
... what forms would you prefer for the period and/or frequency adjustments?

I currently prefer:

period = .00020417*(ceiling(frequency,10) + .433)^2 ....

Since Mike hasn't been following closely, a short summary might help :).

    * Bikeman suggested - Const = GridSize / Skypoints / Frequency^2
    * I showed - this works extremely well if Freq = (skygrid filename value) + 0.433
    * I did this by extracting #lines (GridSize) from a big range of skygrid files and playing with freq to see what best fitted. There was no theoretical basis for what I did - just suck it and see.
    * Using this approach the const is 0.00020417 +/- about 0.00000001 when tested on skygrid frequencies ranging from 380 to 930, which I think you'll agree is a pretty substantial range.
    * The constancy of the const is too good to be a coincidence. There must be an explanation although I have no understanding of it. However it would seem that part of the explanation may well be that frequency should be stepwise in 10Hz steps when calculating the period.
    * I asked Mike to test appropriately modified values of frequency and const on some data that was clearly drifting off the model line at the higher cycles where small errors in period would start to really be visible.
    * I thought, and Mike agreed at the time, that these mods gave a better fit between the data and the model. The data was from RH's archive.
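The "playing with freq to see what best fitted" step in the list above can be sketched in Python as a simple scan. This is not Gary's actual procedure or data. It assumes you already have one value per skygrid that should scale as frequency squared, and it tries candidate offsets until the ratio value / (freq + offset)^2 is as constant as possible. The check data below is SYNTHETIC, generated from the model itself purely to show that the scan recovers a known offset. All names are illustrative.

```python
import statistics


def relative_spread(freqs, values, offset):
    """Coefficient of variation of value / (freq + offset)^2 across skygrids."""
    ratios = [v / (f + offset) ** 2 for f, v in zip(freqs, values)]
    return statistics.pstdev(ratios) / statistics.mean(ratios)


def best_offset(freqs, values, candidates):
    """Return the candidate offset that makes the ratio most nearly constant."""
    return min(candidates, key=lambda d: relative_spread(freqs, values, d))


# SYNTHETIC self-check only: values built from the model with offset 0.433.
# Real use would substitute numbers derived from actual skygrid line counts.
freqs = list(range(380, 940, 10))                 # 380 .. 930 Hz
values = [0.00020417 * (f + 0.433) ** 2 for f in freqs]
candidates = [i / 1000 for i in range(0, 1001)]   # 0.000 .. 1.000 Hz

found = best_offset(freqs, values, candidates)
print("recovered offset:", found)
```

With real line-count data the minimum would not be exactly zero spread, so it would be worth looking at how sharply the spread rises on either side of the best offset before trusting the third decimal place.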

Please note that this summary is for my own benefit as much as anyone else's since brain fades are not the exclusive domain of MH :).

I also agree with Peter's suggestions for a mod to RR.

Cheers,
Gary.
