My daily driver is a Windows box. The current CPU is a Ryzen 5 2400G (4c/8t, iGPU). I am running PrimeGrid on the iGPU and a random assortment of e@h CPU tasks on the CPU.
Which CPU tasks will maximize my RAC?
I am looking at both maximizing credit payout and minimizing CPU wall clock time.
I look forward to your analysis.
Tom M
A Proud member of the O.F.A. (Old Farts Association).
Copyright © 2024 Einstein@Home. All rights reserved.
Are you limited to any particular project, or are you open to all of them? There's a chart that might help you, though I don't think it's been updated in a while:
https://www.boincstats.com/stats/-1/cpcs
I think Tom M means the separate CPU apps of e@h. I would say O3MD1 is rewarded less than FGRP5 and BRP4X64. I can't tell whether BRP4X64 or FGRP5 gives the better credit payout; they are almost the same, and task runtimes also differ to some extent. Finished FGRP5 results often stay "pending" for weeks: often the second task of a workunit exceeds its deadline and has to be resent, and the deadline is longer (2 weeks instead of 1 week for BRP4X64). So RAC for FGRP5 can only be compared to BRP4X64 after running it for some weeks.
Yes, I was presuming e@h. It hadn't occurred to me that I could also wander outside of e@h. This means, of course, that I could also consider an "all PrimeGrid" solution. :)
Anyway, I eyeballed the results I had already, and it looks like "Multi-Directional Gravitational Wave search on O3 (CPU) v1.03 () windows_x86_64" pays out 1,000 per validated task and takes about 1/3 longer than any of the 500-payout tasks.
So I am trying that. These tasks take up to nearly a day to run (at least so far), so it will be quite some time before I have confirmation that this was a good choice. And I still don't think my CPU-based RAC will jump up much :)
Tom M
If they only take 1/3 longer, then you will achieve the highest RAC with O3MD1. You have to try it on your hardware; it depends on the CPU type.
On my old computer, O3MD1 tasks (1000 credits) take 100 to 150 % longer than BRP4X64 tasks (500 credits). These tasks even seem to stress the CPU more than BRP4X64 or FGRP5: core temperature rises higher when running the same number of O3MD1 tasks compared to FGRP5 or BRP4X64. Runtimes are also 50 to 100 % longer than FGRP5 (693 credits). So for me, running O3MD1 reduces RAC noticeably; RAC was highest when running only a mix of FGRP5 and BRP4X64 on all cores. But I don't care. E@h's scheduler decides my task mix. My computer also lacks the memory to run O3MD1 on more than half of the cores, so I can't test it.
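The comparison in these two posts comes down to credits divided by runtime. Here is a minimal sketch of that arithmetic, using only the figures quoted in this thread. Runtimes are expressed relative to a 500-credit task, the function name is made up for illustration, and the result ignores validation delays and how many tasks run at once.

```python
def relative_credit_rate(credits: float, relative_runtime: float) -> float:
    """Credits per unit of runtime, with runtime measured relative to a baseline task."""
    return credits / relative_runtime


# Baseline: a 500-credit task (e.g. BRP4X64) with relative runtime 1.0.
baseline = relative_credit_rate(500, 1.0)

# Tom M's observation: O3MD1 pays 1000 and takes about 1/3 longer.
tom_o3md1 = relative_credit_rate(1000, 1 + 1 / 3)

# Scrooge McDuck's observation: O3MD1 takes 100 % to 150 % longer than BRP4X64.
scrooge_best = relative_credit_rate(1000, 2.0)
scrooge_worst = relative_credit_rate(1000, 2.5)

print(f"500-credit baseline:        {baseline:.0f}")
print(f"O3MD1 at 1.33x runtime:     {tom_o3md1:.0f}")
print(f"O3MD1 at 2.0x-2.5x runtime: {scrooge_worst:.0f} to {scrooge_best:.0f}")
```

On these numbers O3MD1 only comes out ahead when it takes less than twice as long as a 500-credit task, which is why the answer differs from one CPU to the next.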
...and you have to find out how many CPU cores you can use without slowing down the iGPU. The iGPU will surely generate more credits than 2 to 3 CPU cores.
... but what ideas can I, the dwarf, give to a power cruncher with ~10M RAC. Happy crunching!
I am running at 75% of the 8 threads available. The iGPU seems to have slowed down some (the graph under Tasks is drooping). This may be due to running more CPU threads.
But yesterday I asked the BIOS to try to "overclock" the CPU a little bit. It claims it is up 8%, and I told the iGPU to work a little harder too.
So I have several things "in motion" to see what happens next.
I also have a Ryzen 5600G on order (6c/12t, iGPU). Since it is a later generation of CPU, and maybe even of GPU, I may get more bang out of this system yet. :)
>>.. but what ideas can I, the dwarf, give to a power cruncher with ~10M RAC. Happy crunching!
You mistake "Power Cruncher" for Omniscience. I ain't Omniscient. That is why I have been asking for help here.
If my RAC on my Windows daily driver starts declining I will be reverting to the "any CPU task you like" approach that is working for you.
Tom M
No, I didn't mean "Omniscience" but the kind of experience I don't have: experience with the newest and different types of hardware and different GPU cards, how to run them efficiently, keep them busy, adjust concurrent tasks, overclock them or their memory... and all the stuff the guys from your team discuss in the threads on GPU topics here. Impressive. So it's you and others who enable e@h to get their science runs finished in reasonable time. I add some crumbs to it...
Be careful with the number of O3MD1 tasks you pull. I let my 24GB 12c/24t system pull O3MD1 tasks and it is running out of memory, as each task uses 1.8GB. In my case this means only 10 out of 20 threads can be used, due to lack of memory.
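A rough way to estimate the memory ceiling before pulling work is sketched below, using the 1.8GB per task quoted above. The `reserved_gb` parameter (memory kept back for the OS and other programs) and the 5GB value used for it are assumptions for illustration, not measurements from this thread.

```python
import math


def max_tasks_by_memory(ram_gb: float, per_task_gb: float, reserved_gb: float = 0.0) -> int:
    """Number of tasks that fit in RAM after setting aside reserved_gb for everything else."""
    usable_gb = max(ram_gb - reserved_gb, 0.0)
    return math.floor(usable_gb / per_task_gb)


# 24GB system, 1.8GB per O3MD1 task, nothing reserved: a theoretical upper bound.
print(max_tasks_by_memory(24, 1.8))

# Same system with a hypothetical 5GB held back for the OS and other apps.
print(max_tasks_by_memory(24, 1.8, reserved_gb=5))

# A 16GB system, nothing reserved.
print(max_tasks_by_memory(16, 1.8))
```

The real limit is usually lower than the raw division suggests, because BOINC's own "use at most X% of memory" preference and other running programs also take their share.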
While I didn't have that issue, once I got all 6 of the 8 CPU threads going, the processing time seriously ballooned to a day and a half or so per task.
I am switching to MeerKAT(?) and "run non-preferred tasks" to see if that helps the "balloon" collapse.
I have a Ryzen 5600G on order. It should allow me to run out of memory :) Right now I am "only" using 91% of my available 16GB.
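A longer per-task runtime does not by itself mean lower output, since more tasks are running at once. A small sketch of the throughput check, using the "6 threads, a day and a half per task" figure from this post, follows. The comparison case of 4 threads at 1 day per task is a hypothetical for illustration, because the thread count behind the earlier "nearly a day" runtime is not stated.

```python
def tasks_per_day(concurrent_tasks: int, days_per_task: float) -> float:
    """Completed tasks per day when concurrent_tasks run side by side."""
    return concurrent_tasks / days_per_task


# Reported: 6 threads busy, about 1.5 days per task.
six_threads = tasks_per_day(6, 1.5)

# Hypothetical comparison: 4 threads busy, about 1 day per task.
four_threads = tasks_per_day(4, 1.0)

print(f"6 threads: {six_threads:.2f} tasks/day")
print(f"4 threads: {four_threads:.2f} tasks/day")
```

If the two figures come out about equal, the extra threads are adding heat and memory use without adding credit, and the lower thread count is the better setting.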
Tom M
Your Ryzen 5 2400 GE is a quad-core CPU supporting 8 threads (Intel calls the feature Hyper-Threading; I don't know AMD's name for it). Each of the 4 physical cores contains a number of functional units (FUs), e.g. 1: integer & SSE_ADD; 2: integer & SSE_ADD & FP_ADD; 3: integer & FP_MUL & SSE_MUL/DIV; 4: load_memory; 5: store_mem; ...
A reorder buffer sits upstream of the FUs. There, the micro-operations (µOps) of the active threads are pre-sorted so that as many FUs as possible can be used in parallel (e.g. 2 integer µOps, 1 FP & 1 integer, or 1 load & 1 FP, ...). For floating-point multiply and divide there is usually only one FU. So a physical core can run two integer ops or two FP add ops in parallel, but not two FP multiplies.
My example is from an old Intel Core Nehalem micro architecture that I remember:
Intel Nehalem arch - Nehalem (microarchitecture) - Wikipedia
The current AMD Zen microarchitecture offers way more µOps in parallel: 4 integer (ALU), 2 load/store, 4 FP of which 2 are FP MUL...
Zen microarchitecture - Zen (1st gen) - Wikipedia
The reason: 64-bit FP multiplier hardware requires a large number of logic gates (chip area). So it depends on the specific CPU (which generation of Intel or AMD microarchitecture, and how many FUs of which type each physical core has), and on the running threads, which µOps the CPU cores can execute in parallel.

The O3MD1 app (Gravitational Wave O3) appears to be a very demanding app that probably uses FP_MUL ops extensively. This means that 6 of these threads may not be able to run fully in parallel on 4 physical cores; some MUL or DIV ops may have to be executed sequentially. Threads are then slowed down as soon as #threads > #physical CPU cores. But it is possible that a different mix of Einstein threads (different apps, e.g. BRP4X64, FGRP5) makes better use of the virtual threads (available FUs) offered by the physical cores.

One has to try it out for each CPU microarchitecture, e.g. by limiting the maximum number of concurrent tasks of a specific science app via BOINC's client configuration, or by limiting the number of CPU cores to use (client configuration), as you did: 6 of 8 = 75% of (virtual) CPU cores. "CPU cores" is a misleading term here; they are really threads.
e.g.: app_config.xml in project's directory:
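As an illustration only, not the file from the original post, here is a minimal Python sketch that generates such an app_config.xml with a per-app `max_concurrent` limit. The app name `einstein_O3MD1` and the limit of 4 are assumptions; check the actual app names in your client_state.xml or the project's applications page before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent_by_app: dict[str, int]) -> str:
    """Return app_config.xml text that caps concurrent tasks per app."""
    root = ET.Element("app_config")
    for app_name, limit in max_concurrent_by_app.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(limit)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Assumed app name and limit: at most 4 O3MD1 tasks, one per physical core.
limits = {"einstein_O3MD1": 4}
xml_text = build_app_config(limits)
print(xml_text)
```

The printed text would be saved as app_config.xml in the Einstein@Home project directory and picked up via "Read config files" in BOINC Manager or a client restart.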
BRP7, alias "MeerKAT", is GPU-only at the moment, and only ATI and Nvidia apps are available.