I just ran a large batch of engineering tasks, 4000-5000, and had very little trouble except when I was experimenting too much with hardware. Even some at 2X (two strings per GPU) on two RX 570 4GB cards did well, but I had to watch that closely and decided next time to just crunch along at one string per GPU. I am in no hurry. I am retired, every morning.
Oh! I found much better luck with multiple strings per GPU when I staggered the start times by half the completion time.
I was wondering if it would be possible to do this to better utilize the GPU during the CPU post-processing of the alternate task. How did you accomplish this?
You need to be using an app_config.xml file so that you have instantaneous local control. Launch BOINC with this file in place but with the <gpu_usage> value set to 1, so that only one GPU task will be running. You will have <cpu_usage> set to 1 and you will leave it there, since these tasks do require a lot of CPU support.
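For anyone who hasn't built one before, a minimal app_config.xml along those lines might look like the sketch below (indicative only; the app name shown is just a placeholder, and how to find the proper short name is explained further down):

    <app_config>
      <app>
        <name>einstein_O3ASE</name>      <!-- placeholder: use the real short name from client_state.xml -->
        <gpu_versions>
          <gpu_usage>1</gpu_usage>       <!-- start at 1 so only one GPU task runs -->
          <cpu_usage>1</cpu_usage>       <!-- leave at 1: these tasks need full CPU support -->
        </gpu_versions>
      </app>
    </app_config>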
After startup, with the single GPU task running, open the config file with a plain text editor and change the <gpu_usage> value to 0.5. Save the change.
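After that edit, the <gpu_versions> section of the same sketch would read something like:

    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>   <!-- was 1; two tasks may now share each GPU -->
      <cpu_usage>1</cpu_usage>     <!-- unchanged -->
    </gpu_versions>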
In BOINC Manager, when the running task gets to ~48% (Tasks tab in BM advanced view), just click Options -> Read config files and a second task will immediately start. If things get out of sync, some simple mental arithmetic will tell you when to suspend a running task that's too close to its partner. For example, let's say you observed one task at ~15% and the other at ~13% and you wanted to correct this as easily as possible. Just wait until the first task hits ~26% and then suspend the second task. When a replacement for it starts up, just resume the suspended task (which should be at around 24%). You can figure out for yourself why this would work :-). If you miss the 25% sweet spot, you could always use the other one at 75%.
I was running at x4 initially (then x3 at higher frequencies - RX 570 8GB) on the previous GW GPU search, using equidistant spacing of tasks (just an extension of the above). It was fairly easy to maintain since most tasks tended to take about the same amount of time. I'm not yet running this Engineering test run since I don't have time to babysit hosts with apps that don't have stable performance (or timing), particularly when this test may end shortly and the real science run with actual data may turn out to behave differently.
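If anyone wants to try the same thing at x3 or x4 (assuming the card has the memory for it), the only app_config.xml change is the <gpu_usage> value, with the staggering spaced accordingly; a sketch:

    <!-- pick one value, not both -->
    <gpu_usage>0.33</gpu_usage>   <!-- x3: three tasks per GPU, staggered roughly 33% apart -->
    <gpu_usage>0.25</gpu_usage>   <!-- x4: four tasks per GPU, staggered roughly 25% apart -->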
In constructing your app_config.xml file, if you don't know the proper short name to use, it will be listed in an <app> ... </app> section of your current state file (client_state.xml). You can browse that file but be very careful not to change anything. On past behaviour, I would guess it will be einstein_O3ASE.
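For reference, the section you're looking for in client_state.xml looks roughly like this; only the <name> value matters for app_config.xml purposes, and the value shown is just my guess at what it will be:

    <app>
        <name>einstein_O3ASE</name>                    <!-- the short name to copy into app_config.xml -->
        <user_friendly_name>...</user_friendly_name>   <!-- long descriptive name; not needed here -->
        ...
    </app>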
For others reading this and perhaps running GRP GPU tasks instead of GW: the current batch has a relatively long followup stage where the GPU is essentially idle. The length of this stage seems to be increasing as the frequency term in the task name (around 596.0Hz at the moment) increases. You will gain a big performance boost in the second task of an x2 setup (just for the duration of the followup stage) if you make sure it is nowhere near the finish line when the first task is there. Since the length of the followup stage depends on CPU speed, this is particularly of use if you have an old and/or slow CPU.
I let the tasks queue in, just loading 0.1 days at first. Then select no new tasks. Then select all engineering tasks and suspend them. Then select the first task and resume it. After a minute or two, depending on how fast your hardware runs these tasks, select the next task and resume it. That will fill the first GPU. Select the third task and resume it right away. Now wait a minute to a few minutes and select the fourth task to resume. Now both GPUs are full with two strings running each, which start and stop out of sync. I guess it will work for however many GPUs you have. Just get 'er done before GPU 1 is ready for its first new task. Now resume all remaining tasks and ask for more tasks.
My completion time for single strings was roughly 19 minutes. My completion time per task only went up to 22 minutes each running two strings per GPU. Really GOOD! However, everything needs to be just right so your GPU doesn't have or get a bug and stop. If you have run two strings before, then you probably won't have anything but smooth sailing.
It also might need doing over sometimes, unless you can do what Gary explained to keep them from meshing again.
This also assumes you know how to change the settings in your account preferences to run 2X tasks per GPU.
Yes, getting started needs a bit of babysitting, but once tasks are staggered, I find that they tick along unattended with no problem. This is with two RX 570 (4GB) running tasks @ 3X with a 6-core CPU; productivity is an average completion time of ~23 min for ~375 reported tasks/day (last week was ~325/day; mileage varies with WU Hz?).
I used to suspend and resume tasks that occasionally would pile up at 99% completion, but productivity seems to be the same with or without that hand-holding, so I just let them sort themselves out. If the number of cores is limiting, then I can see where ongoing pile-ups (meshing?) could lead to a productivity issue.
What I have not figured out is that with my computing preferences cranked down to 0.01 days of work stored, BOINC queues ~1000 tasks. It used to be ~30 tasks in queue with storage set to 0.05 days of work, so I don't know what happened to change that. It's a host thing, because on my other host, BOINC task storage is as expected.
Ideas are not fixed, nor should they be; we live in model-dependent reality.
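(For reference, the two lines in global_preferences_override.xml that set the work cache are shown below; a sketch only, with example values like those mentioned above, and assuming nothing else is overriding them:)

    <global_preferences>
       ...
       <work_buf_min_days>0.01</work_buf_min_days>                 <!-- "store at least X days of work" -->
       <work_buf_additional_days>0.00</work_buf_additional_days>   <!-- "store up to an additional X days" -->
       ...
    </global_preferences>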
Check to see if the host picked up a web-based preference from another project.
My best solution is to set local preferences so they won't get overridden by a change in another project's web preferences propagating out to the rest of my hosts via BOINC's amalgamation.
DITTO!
I also lined up 400-500 tasks per host so fast I thought Gary had something to do with it to get us to run these;-)
Hmmmm. According to BOINC Manager Computing Preferences, I am using local prefs. And the global_preferences_override.xml file reflects any changes I make. And the Event log says it read the override file when I invoke "Read local prefs file". But BOINC Manager still keeps a queue of 1001 or 1002 tasks no matter what. I mean, everything is running fine, and I don't much mind having 3 days of work lined up, but why won't it listen to me?
Ideas are not fixed, nor should they be; we live in model-dependent reality.
If you are using a <max_concurrent> line in your app_config file, that could do it. There seems to be a bug in the work fetch algorithm of the BOINC client: it will just keep asking for more work until the server says 'no more work for you'.
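(For clarity, the line in question lives in the <app> section of app_config.xml, something like the sketch below; the value and app name are only examples:)

    <app_config>
      <app>
        <name>einstein_O3ASE</name>            <!-- example short name -->
        <max_concurrent>4</max_concurrent>     <!-- the element that seems to trip up work fetch -->
        <gpu_versions>
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>1</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>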
I seem to remember several days ago, when I selected this task type for the first time, the task timer or clock showed an estimated time to complete of 4 minutes each. That would explain why we get so many. I have no idea how that all works or changes. Maybe something needs more engineering. I'm very envious of all you smart code writers. I learned FORTRAN in high school, FORTRAN II in college. Then it was COBOL and I said I wasn't learning a new language every two years and turned off my interest. Now I find out COBOL may have been the one! Sort of like the De Beers heiress I dated in college. If only I had learned COBOL in '76. Too late for both. Her grandfather made her go to Missouri School of Mines.
Cecht, do you remember what estimate the original batch you queued showed per task? And is it getting closer? Sounds like it is not. Guess I could try again. Now I am interested. Just in case, I'm only going to allow one computer to get them, one I know can run x2 per GPU. Tomorrow. Then I'll report.
Here is a link to a comment that Richard Haselgrove posted about problems with <max_concurrent> a little while ago. It was the first reference I found with a search just now and I think there were other comments from him with even more details.
Hopefully he'll see his name being used in vain and respond accordingly with more information :-). Knowing Richard, he has probably continued to pursue this relentlessly :-).