Allen wrote: ... I've realized that all of my machines were in panic mode.
Hallelujah!! Praise the flying Spaghetti Monster!! I've been trying to convince you about this for many days now :-).
I was prepared to do a deep dive on one machine but NOT all of them since I'm not a total masochist. I was pretty sure about the problem, and once you'd gotten one of them out of trouble, you would know what to do to fix any others with the same condition.
Allen wrote:
All numbers are increasing steadily.
Hopefully not! :-).
Excess tasks on hand should be decreasing. The host I'm watching is now down to ~300 so hopefully you don't have anything suspended anymore and the client can start asking for new work soon. Also, crunch times have decreased a little but I've now noticed that you've exhausted the backlog of the old 4021 series tasks and are now doing the new 3012 series which do run about 5-10% faster.
I understand you weren't referring to those metrics but more likely things like RAC. Credit will take care of itself if you have your machine running efficiently.
Allen wrote:
Still wonder (like you) what is causing the oddity on the 560's ...
Once the machine has settled into a stable routine with regular work fetching and result return, things to look at include the driver version, OpenCL version, operating frequencies (core & mem), etc. Also, it would be really useful to narrow down the problem to either the GPUs or to the machine/OS.
You have a machine running Linux. The simplest thing to do when things are stable is shut down both machines temporarily and swap a 560 to replace the existing 580. There should be no issue with doing this swap - drivers, etc., are all OK. I have done swaps like this many times before.
By the way, the current RAC of the 580 under Linux shows as 680K and that compares very favourably with 660K that I see for one of my hosts with an RX570. A 580 should be a bit better, so that all fits well with expectations. The Linux machine is performing as expected.
After restarting this machine with the temporary 560, just allow a few tasks to run at x2. I assume the 580 has been adjusted to run at the more sensible x2 - if not, change app_config.xml before restarting BOINC to use x2. There will be partially completed tasks, so allow them to restart and finish, and just look at new tasks done entirely on the 560. You will be able to get good values for comparison by allowing just a few tasks (e.g. 5 or so) to run to completion. If the new crunch time comes in at what we have been expecting (low to mid 30 mins for x2 on a 560), you will know that the card is OK and the problem may be something to do with the Windows system.
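If the card is still at x1, a minimal app_config.xml along these lines sets two tasks per GPU. The app name below is an assumption (Einstein@Home's gamma-ray pulsar GPU search is the usual one); check client_state.xml for the name your host actually runs:

```xml
<app_config>
  <app>
    <!-- app name is an assumption; copy the real one from client_state.xml -->
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <!-- 0.5 GPU per task means 2 tasks share each GPU (x2) -->
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The file goes in the project directory, and 'read config files' (or a client restart) picks it up.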
As soon as you get the answer, just reverse the swap and restart both machines. I'll be interested to see the outcome - probably another puzzle :-). Also, when you have the card out, please record details like model numbers, part numbers, etc so that you can google for anything that might be inhibiting card performance.
You're right about the 580 Linux machine. I have released all tasks and even though it had over 500, it downloaded more. I think it has settled on about 750. It is also cranking out about 755K per day, per BOINC.
The host you are watching cranked out only 370K today.
I have found that MY 3 machines with 1050 Ti cards have a slightly better output at 3 tasks than at 2.
Thanks to your help, I have found out a lot about how to make adjustments that were just plain stupid before I reduced the tasks on hand. I had never heard of panic mode before you brought it up.
I'm not sure I'm going to attempt the swap, since the 580 takes a 6 pin connector which I am not sure the 560's machine has. We'll see on that.
Allen wrote: You're right about the 580 Linux machine. I have released all tasks and even though it had over 500, it downloaded more. I think it has settled on about 750. It is also cranking out about 755K per day, per BOINC.
I don't know where you are getting the 750 tasks in progress from. The website says that host has 309 tasks in progress which sounds reasonable for a 2 day cache size.
Also, the website shows the current RAC is 685K. Please show how you arrive at the 755K figure. Please also copy and paste the "Resources:" line from a properties page for an in-progress task to show the resource allocation. The stderr outputs for completed tasks on the website don't include that information, which must indicate that the Windows and Linux versions of the app report different things. That must be why I've never noticed it before in website stderr outputs, since I tend to be looking mainly at Linux results.
Allen wrote:
The host you are watching, cranked out only 370K today.
The website says the current RAC is 300K, which is up a bit from where it was last time I looked. Once again, how do you calculate 370K? For my hosts, the current RAC from the statistics page in BOINC Manager (with a granularity of 1 day) is pretty much in agreement with what shows on the website, which gets updated when new results get validated (ie. at regular intervals during the day).
Allen wrote:
I'm not sure I'm going to attempt the swap, since the 580 takes a 6 pin connector which I am not sure the 560's machine has. We'll see on that.
You need to take a better look.
Both RX 570s (like mine) and RX 580s (like yours) should (I believe) require a single 8pin (6+2) PCIe power connector, in addition to power supplied through the slot they plug into. Some RX 560s (like my pair) don't need a PCIe power cable at all and get all power from the motherboard slot. This is why I was mistaken initially, thinking that they were RX 460s which do use a bit less power and usually don't require any extra cable.
If yours do require a PCIe power cable (a lot do) then all you need to do when you take out the RX 580 is remove the 8pin plug from the card, put the card in a nice safe place (so you can reinsert it after the experiment is over) and plug in the RX 560 in its place. There is no need to put the RX 580 into the other machine - just leave that machine off for the duration.
If your RX 560 needs power, it will be a 6pin connector, so you look at the free 8pin plug you now have (or the other one on the cable since they usually come in pairs) and split one of them (as they are designed to do) into a 6pin bit plus a 2pin bit. Plug the 6pin bit into the RX 560 and you're good to go.
Just remember that we are trying to do this experiment at x2, so make a quick edit of app_config.xml before launching BOINC if necessary. If you do need an edit, you can easily reverse it once the experiment concludes.
The experiment will take at most a couple of hours - less if the first couple of tasks show a big improvement on what you're used to from the Windows machine. When you know the outcome, stop the experiment and return both machines to their prior duties. It's very simple and should give you a definite answer one way or the other.
Allen may be looking at BOINCtasks under "credit per day" to acquire his task performance, not his "average" RAC.
I'm just guessing though...
Proud member of the Old Farts Association
GWGeorge007 wrote: Allen may be looking at BOINCtasks under "credit per day" to acquire his task performance, not his "average" RAC.
Yes George, you are totally correct. It is NOT the RAC. Thanks for pointing that out.
Allen
https://www.boincstats.com/stats/-1/host/list/12/0/327cc69513d11332c8c50708cae7ca52
This is where I got my per day information.
RAC on the 560's was at 440K at one time and is now about 304K. The RAC increased 8K over night.
BTW, the 580 machine finally settled down and has a cache of about 300 tasks.
Allen
Thanks to BoincTasks and others, I have found that the days of tasks are based on the number of tasks running at the same time. So, if you are running 1 task and are set at 2 days of work, then you will see 2 days' worth of tasks. OTOH, if you have 2 tasks at one time, you will get 4 days of work.
I always thought it was based on 1 task, not many.
Thanks all!
Allen
It’s based on number of devices and estimated runtime.
If you had been running at 1x on a GPU for a while and have a solid runtime estimate based on 1x, and then you switch to 2x, the runtime estimate doesn't immediately change to 2x that previous value. It will have to evolve naturally. So by switching to two tasks (0.5 GPU), the client now thinks you can process 2x the work in the same amount of time. This will gradually fix itself once the runtime estimate grows to its new average value.
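A toy calculation (my own sketch of the arithmetic, not actual client code) shows why the stale estimate triggers a burst of work requests the moment you switch:

```python
# Simplified model of the client's work-fetch shortfall calculation.
# Assumption: this only illustrates the arithmetic described above;
# it is not taken from the BOINC source.

def shortfall_hours(concurrency, est_runtime_h, tasks_on_hand, cache_days):
    """Hours of work the client believes it still needs to fetch."""
    # Wall-clock time to drain the cache at the given multiplicity.
    work_on_hand_h = tasks_on_hand * est_runtime_h / concurrency
    return max(0.0, cache_days * 24 - work_on_hand_h)

# 96 tasks at an estimated 0.5 h each exactly fills a 2-day cache at 1x...
print(shortfall_hours(1, 0.5, 96, 2.0))  # 0.0
# ...but the instant you switch to 2x, the same cache looks half empty.
print(shortfall_hours(2, 0.5, 96, 2.0))  # 24.0
```

Once completed tasks teach the client the real (roughly doubled) per-task runtime, the apparent shortfall disappears again.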
Ian&Steve C. wrote: ... This will gradually fix itself once the runtime estimation grows to its new average value.
For the benefit of any reader unsure about what happens with increasing the multiplicity, here is a description of what you will see and how to avoid any tendency to over-fetch work. As was mentioned, the client doesn't adjust the estimated run time when the config change occurs but it will do so fairly soon afterwards and most likely in just two stages.
For a 1x to 2x change, when the 'read config files' is clicked, the second task starts whilst the first running task slows to close to half pace. If the work cache was previously topped up to the proper level, the client will use the old estimate and assume the cache is now only half full. It will immediately try to fetch a lot more work.
The sensible way to do the transition is to temporarily set 'no new tasks' (NNT) in the manager before clicking 'read config files', because the in-progress tasks you already have will be fairly close to the correct amount anyway. Not long after the config is changed, the task that was in progress will complete, and it will have a longer than usual completion time. If the increase is >10% above the former estimate, the client will immediately choose this as the new estimate; otherwise only ~10% of the increase will be applied immediately. With NNT set, the client won't be able to fetch any work. This is the first stage of 'fixing' the estimate.
As soon as the new task that was just started completes, the run time will be perhaps not far short of double the 1x run time and is therefore guaranteed to cause the client to use this full amount immediately as the new estimate. So the maximum time for the client to get back at least close to 'on track' is just the completion time for the new task that was created by the change. This is the second stage of the change.
The final thing to do is cancel NNT and allow the client to manage any future work requests as per normal.
If you are decreasing multiplicity (2x -> 1x) there is no risk of over-fetch. Tasks will complete more quickly and the estimate will be too high for the 'new normal'. Every task completion will cause the client to gradually reduce the estimate by 10% of the difference. You will have a cache of work that tends to be 'a bit less than full' because the client thinks it's over-full. This will continue until the estimate has been reduced to the current run time. During this time, the client has been gradually increasing the work on hand to the correct amount. With GPU tasks, it doesn't take that long.
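The two-stage correction described above can be sketched as a simple update rule. To be clear, the 10% figures are taken from this description of the behaviour, not from inspecting the client source:

```python
def update_estimate(estimate, runtime):
    """Apply the correction described above after each task completes."""
    if runtime > estimate * 1.10:
        return runtime  # more than 10% over: adopt the new runtime outright
    return estimate + 0.10 * (runtime - estimate)  # otherwise close 10% of the gap

# 1x -> 2x: the new task runs close to double the old time, so the
# estimate snaps straight to it (the 'second stage').
print(update_estimate(30.0, 55.0))   # 55.0
# 2x -> 1x: tasks now finish faster, so the estimate only decays gradually.
print(update_estimate(50.0, 30.0))   # 48.0
```

The asymmetry is the point: increases beyond the 10% band are adopted at once (risking over-fetch if you don't set NNT), while decreases always creep down 10% of the gap per completion.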
Cheers,
Gary.
New Question:
My Ryzen machine is a Ryzen 7 4600G, so it has internal ATI graphics and 16 threads, and runs at about 4.2GHz.
I have a 1050ti card in this machine too.
I was running 2 nVidia tasks and 2 ATI tasks.
Times were about 31 minutes on the nVidia tasks and 2 hrs 10 minutes on the ATI.
I changed the tasks to 3 each and now the times are 48 min and 2 hrs 22 min.
From this, I gather that I lost about 1.5 minutes per task on the nVidia and gained another task on the ATI at a cost of only 12 minutes, so all in all, I gained production, and that's what the results show.
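For anyone who wants to check the arithmetic, steady-state throughput is just concurrency divided by run time. Using the times quoted above:

```python
def tasks_per_hour(concurrency, minutes_per_task):
    # Steady-state throughput at a given multiplicity.
    return concurrency * 60.0 / minutes_per_task

nv_2x, nv_3x = tasks_per_hour(2, 31), tasks_per_hour(3, 48)      # ~3.87 vs ~3.75
ati_2x, ati_3x = tasks_per_hour(2, 130), tasks_per_hour(3, 142)  # ~0.92 vs ~1.27

# The 1050 Ti actually loses a little at 3x; the overall gain
# comes entirely from the integrated ATI GPU.
print(round(nv_2x + ati_2x, 2), round(nv_3x + ati_3x, 2))  # 4.79 5.02
```

So the combined output does go up at 3x, but a mixed setting (2x on the nVidia, 3x on the ATI) would, on these numbers, be slightly better still.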
I don't really understand this, but it makes me ask: how can I specify more ATI tasks and fewer nVidia tasks running at the same time?
Thanks for any ideas or facts.
Allen
BTW, I am not running any CPU tasks. Also, I made the task change from 2 to 3 a few days ago.