Hi everyone,
I've been a happy cruncher for Einstein@Home for quite some time, on Windows and Linux. Just a few weeks ago I bought a new system: a Ryzen 3700X with 32 GB RAM, an NVIDIA 2070 Super, and a Gigabyte Aorus X570 Elite mainboard. Everything runs perfectly; the hardware has been checked and no problems could be found. However (points 1 to 3 are minor issues only, but they might hint at something, which is why I included them here although they are actually off-topic):
- As usual, I ran BOINC under Ubuntu 19.10 (I will move to 20.04 LTS for several reasons) and found that Einstein@Home did not download any workunits for the GPU (it did so for Asteroids@Home, where the GPU worked nicely). However, I only tried for two days, so it could be a mere coincidence that no GPU workunits were downloaded. GPU workunits for Einstein@Home are, however, downloaded and executed under Windows 10. That's why I switched to that OS for the moment.
- From here on, both Linux and Windows 10 are affected: There are now exactly twice as many workunits active as I have cores (16 to 8). As far as I have read, that is normal and has to do with the Ryzen reporting two threads for each core. I'm not sure whether this is efficient, but I trust the developers here.
- At first, the CPU workunits raced up to around 25% progress, only to be reset to 0% after 50 to 70 minutes. The CPU time is then 0 as well. I understood from different threads that this is common behaviour. Is that necessary? I never observed that on my other (Intel) systems.
- The worst problem: whenever I start calculating (e.g. after booting, or after pausing Einstein@Home), there is no progress for around 30 to 90 minutes (today even more, nearly 120 minutes). This is true for all workunits, no matter how far they have progressed or what kind of calculations they do (Gamma-ray pulsar search as well as gravitational wave search). CPU time counts up, but there is no progress. All workunits are affected in the same way. They all need the same time (probably not to the second) until they start. Temperature readings of the CPU show that it does calculate something, but it is clearly below the normal working temperature at full load. See the second edit for a remarkable exception!
Since I often run my system only for a few hours, waiting for an hour or two every time is very annoying. Since both Windows and Linux are affected, but not my other computers, there seems to be some issue with that particular computer.
I have no other programs/apps/processes running, just the OS and its typical background load (typically 1% to 4%). Asteroids@Home is paused at the moment, as otherwise I would never get Einstein@Home calculating. Sorry for using your time and many thanks for every kind of help!
Clear Skies,
Guenther
Edit: Just after finishing this post I was so frustrated that I installed something, CPU-load went above 25% and BOINC paused all workunits. A few seconds later, all continued. Still, after starting BOINC, no workunit of any kind progresses. Many thanks!
Edit: Once a workunit has reached 89.979%, it "gets stuck", i.e. it shows no progress. However, after some time (around 2h), it jumps directly to 100% and thus finishes. Fine. Interestingly, this also happens while calculations are in the state described in point 4. So for late-stage workunits above 89.979%, point 4 does not apply!
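One way to put numbers on point 4 is to compare two snapshots of the task list taken some minutes apart and flag every task whose CPU time grew while its progress did not. The following is only a minimal sketch: it assumes "key: value" text in the style of `boinccmd --get_tasks`, the field labels (`name`, `current CPU time`, `fraction done`) are assumptions that should be checked against the real output, and the task names and numbers are made up.

```python
def parse_tasks(text):
    """Return {task name: {"cpu": seconds, "done": fraction}} from 'key: value' lines."""
    tasks, current = {}, None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # separator lines such as "1) -----------"
        if key == "name":  # assumed label that starts a new task entry
            current = tasks.setdefault(value, {"cpu": 0.0, "done": 0.0})
        elif current is not None and key == "current CPU time":
            current["cpu"] = float(value)
        elif current is not None and key == "fraction done":
            current["done"] = float(value)
    return tasks


def stalled(before, after):
    """Names of tasks whose CPU time grew while their progress did not."""
    names = []
    for name, new in after.items():
        old = before.get(name)
        if old and new["cpu"] > old["cpu"] and new["done"] <= old["done"]:
            names.append(name)
    return names


# Hypothetical snapshots taken ten minutes apart (not real client output).
SNAPSHOT_1 = """\
1) -----------
   name: task_a
   current CPU time: 100.0
   fraction done: 0.250000
2) -----------
   name: task_b
   current CPU time: 100.0
   fraction done: 0.500000
"""
SNAPSHOT_2 = """\
1) -----------
   name: task_a
   current CPU time: 700.0
   fraction done: 0.250000
2) -----------
   name: task_b
   current CPU time: 700.0
   fraction done: 0.550000
"""

print(stalled(parse_tasks(SNAPSHOT_1), parse_tasks(SNAPSHOT_2)))
```

With real data, the two snapshots would come from saving the client's task list twice; a task that keeps appearing in the result matches the "CPU time counts up, but no progress" symptom.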
Hi,
Did you try setting "use at most 50% of CPU" in your BOINC Manager under Options - Computing preferences, to see what happens?
Also, FGRP tasks and O2... tasks may not like each other, so try to limit your crunching to one of those in your account - preferences - project.
In Ubuntu, make sure the system is OpenCL capable.
:-)
Hi Solling2,
I have removed the ticks from the three O2-fields in the preferences. Thanks for that, let's see how it works out. However, there are still many O2-workunits active, so it may take a while.
I limited the CPUs to 50% twice, first deactivating one half of the workunits, then deactivating the other half. Now one half is deactivated and the other half does not progress (point 4 of my initial post). My mistake, I should have done that only once... Before, it calculated 7.75% per hour (my own measurement, not the client's), so let's wait and see how much it is with 50% of the CPUs.
Cheers!
Reducing the number of cores to 50% seems to increase the speed per core, but not enough to balance the lack of cores. Good to know ;-)
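To make "not enough to balance" concrete, here is a small back-of-the-envelope calculation. It is a sketch under assumptions: it takes the 7.75% per hour measured earlier to be the rate of a single workunit while 16 were running, and it treats all workunits as equal in size. The function names are made up for this example.

```python
def total_rate(tasks, percent_per_hour_each):
    """Overall throughput in workunits per hour for `tasks` parallel workunits."""
    return tasks * percent_per_hour_each / 100.0


def break_even_rate(percent_per_hour_each, tasks_before, tasks_after):
    """Per-workunit rate needed after the change to match the old total throughput."""
    return percent_per_hour_each * tasks_before / tasks_after


measured = 7.75  # % per hour per workunit, assumed to be with 16 running

print(f"16 workunits: {total_rate(16, measured):.2f} workunits/hour in total")
print(f"one workunit takes about {100 / measured:.1f} hours")
print(f"8 workunits need {break_even_rate(measured, 16, 8):.2f} %/hour each to break even")
```

In other words, with half the workunits running, each one would have to run twice as fast (15.5% per hour under these assumptions) just to keep the same total output.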
I unticked all three entries for O2 in my preferences and updated the client, still several O2 workunits were downloaded - CPU as well as GPU! Do I need to restart the calculations?
However, problem 4, my main trouble, remains unaffected. I did not check for OpenCL under Linux yet; that may take a few days or more.
Cheers and clear skies!
To balance cores, a cc_config.xml or a second BOINC instance may be useful. Just keep in mind that NVIDIA GPUs require one CPU core per task for support. I guess the BIOS and all chipset drivers are up to date?
Thanks for the hints! I double-checked drivers and BIOS, all were the latest available versions.
I believe you are talking about the Usage Limits section while he was talking about the When To Suspend section for the 50%.
Indeed, I was talking about the usage limit section. Since BOINC never gets suspended (apart from the one mistake I made by manually installing something), I did not think about the "when to suspend" section. Anyway, I tried changing that value from 25% to 50%, but the result is the same as before.
However, I made a strange observation: my workunits "get stuck" at 89.979% for around 2h and then jump directly to 100%. No problem for me. However, they also do so during the time when there is no progress for the workunits which have not reached 89.979%. So point 4 of my initial post does not apply to workunits which have reached this percentage! I'm more and more puzzled!
Thanks and clear skies!
Try turning HT (HyperThreading) off.
Worked for me ...
I believe this is normal: at the end of the workunit it's cleaning up and checking everything, so the progress can slow to a crawl and then jump as it does its thing.
As for the Suspend part, try turning everything off and see what happens over the next couple of hours. The PC could become unusable and very laggy, and you may have to set it back to the current settings, but it's worth a test to see if things speed up for you as far as crunching goes. If they do, then you have your answer... cut back on the cores for crunching or get another PC just for crunching.
Thanks everyone for the many helpful ideas! After playing around a lot, I think I found a solution/workaround:
For the moment, I'm happy with the current situation. Many thanks once again and clear skies!