Overwhelming proportion of all tasks don't validate

CElliott
Joined: 9 Feb 05
Posts: 28
Credit: 1043043712
RAC: 1681954
Topic 229346

Platform:

AuthenticAMD AMD Ryzen 7 5800X 8-Core Processor [Family 25 Model 33 Stepping 2]

Number of processors:8

Coprocessors:[2] AMD AMD Radeon (TM) R9 390 Series (8192MB)

Operating system: Microsoft Windows 11 Professional x64 Edition, (10.00.22621.00)

BOINC client version:7.20.2

Memory:65450.66 MiB

Cache:512 KiB

Swap space:69546.66 MiB

Total disk space:930.78 GiB

Application: 1.03 Multi-Directional Gravitational Wave search on O3 (GPU) (GW-opencl-ati)

Normally I process WUs for MilkyWay@Home at RPI, where ALL my WUs validate, but recently that site went down and my work room was cold, so I returned to Einstein@Home for a few days.  As far as I can tell, from March 31 to April 5 I processed about 176 WUs on this computer, of which 39 were judged valid and 137 invalid.  On my Intel machine, 145 were deemed valid and 55 invalid.  
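For reference, those counts can be turned into invalid rates with a few lines of Python. This is only a sketch of the arithmetic: the host labels and the `invalid_rate` helper are made up for illustration, and the counts are the ones quoted above.

```python
def invalid_rate(valid, invalid):
    """Fraction of completed workunits that were judged invalid."""
    total = valid + invalid
    return invalid / total

# (valid, invalid) counts as reported in the post; labels are illustrative.
hosts = {
    "AMD Ryzen host": (39, 137),
    "Intel host": (145, 55),
}

for name, (valid, invalid) in hosts.items():
    total = valid + invalid
    rate = invalid_rate(valid, invalid)
    print(f"{name}: {invalid}/{total} invalid = {rate:.1%}")
```

That works out to roughly 78% invalid on the AMD machine and 27.5% on the Intel machine.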

 

My request is this: When a high proportion of a user's WUs are held invalid, could Einstein@Home send out a notice, similar to the notice sent out when a new version of the BOINC client is released, giving the user's valid/invalid statistics and one or more diagnostic messages telling the user why his or her WUs don't pass muster?  This forum, in a different thread, claims that most validate errors occur due to deficiencies in the user's computer.  So prove it by giving us information pointing to the cause of the errors.  This is a brand new computer that I just built.  If it is causing errors, I would much rather know sooner than later.  In addition, in Southeast Pennsylvania the power company compares consumers' use of electricity by neighborhood. Every month I get a letter from the power company telling me that I use much more electricity than my neighbors. I import electricity from a power company in another state that uses all renewable sources, so I'm not worried about climate change. Still, generating a high proportion of invalid work units is a terrible waste of electricity, not to mention money. It would help a lot if Einstein@Home could tell me what I'm doing wrong.

 

Thank you in advance for your thoughtful consideration of this request.

Charles Elliott 

CElliott
Joined: 9 Feb 05
Posts: 28
Credit: 1043043712
RAC: 1681954

Will you please rename the applications listed on the user's account page, preferences tab, to correspond to the column names on the Einstein@Home server status page.  Now it is not possible to tell what statistics apply to which application.

mikey
Joined: 22 Jan 05
Posts: 12776
Credit: 1861024624
RAC: 1443392

CElliott wrote:

Will you please rename the applications listed on the user's account page, preferences tab, to correspond to the column names on the Einstein@Home server status page.  Now it is not possible to tell what statistics apply to which application. 

This has been asked for many, many times, yet it hasn't been done so far. It would be nice if this time is the time it gets done.

mikey
Joined: 22 Jan 05
Posts: 12776
Credit: 1861024624
RAC: 1443392

CElliott wrote:

Platform:

AuthenticAMD AMD Ryzen 7 5800X 8-Core Processor [Family 25 Model 33 Stepping 2]

Number of processors:8

Coprocessors:[2] AMD AMD Radeon (TM) R9 390 Series (8192MB)

Operating system: Microsoft Windows 11 Professional x64 Edition, (10.00.22621.00)

BOINC client version:7.20.2

Memory:65450.66 MiB

Cache:512 KiB

Swap space:69546.66 MiB

Total disk space:930.78 GiB

Application: 1.03 Multi-Directional Gravitational Wave search on O3 (GPU) (GW-opencl-ati)

Normally I process WUs for MilkyWay@Home at RPI, where ALL my WUs validate, but recently that site went down and my work room was cold, so I returned to Einstein@Home for a few days.  As far as I can tell, from March 31 to April 5 I processed about 176 WUs on this computer, of which 39 were judged valid and 137 invalid.  On my Intel machine, 145 were deemed valid and 55 invalid.  

 

My request is this: When a high proportion of a user's WUs are held invalid, could Einstein@Home send out a notice, similar to the notice sent out when a new version of the BOINC client is released, giving the user's valid/invalid statistics and one or more diagnostic messages telling the user why his or her WUs don't pass muster?  This forum, in a different thread, claims that most validate errors occur due to deficiencies in the user's computer.  So prove it by giving us information pointing to the cause of the errors.  This is a brand new computer that I just built.  If it is causing errors, I would much rather know sooner than later.  In addition, in Southeast Pennsylvania the power company compares consumers' use of electricity by neighborhood. Every month I get a letter from the power company telling me that I use much more electricity than my neighbors. I import electricity from a power company in another state that uses all renewable sources, so I'm not worried about climate change. Still, generating a high proportion of invalid work units is a terrible waste of electricity, not to mention money. It would help a lot if Einstein@Home could tell me what I'm doing wrong.

 

Thank you in advance for your thoughtful consideration of this request.

Charles Elliott  

You may get more answers in the Cruncher's Corner forum than in this one.

I have no idea what's wrong or why it's failing for you, but have you tried updating the Visual C runtimes with the all-in-one 2.23 version yet? I have no idea if this will fix it, but I've updated it before to get other projects' GPU tasks working again.

https://www.majorgeeksoft.com/visual-c-runtime-installer-all-in-one/

CElliott
Joined: 9 Feb 05
Posts: 28
Credit: 1043043712
RAC: 1681954

I downloaded and installed Visual C++ 2015-2022 / 14.24.31938 Redistributable Package (x64) as you suggested.  Apparently only File Explorer and BitDefender use the new cruntime.dll.  All the other programs use msvcrt.dll, which has an old file date.  These results are according to the Windows built-in program, Resource Monitor.

 

It is the MeerKAT workunits that are failing, by over 50%.  Hundreds of hours of processing time and thousands of kilowatts of electricity are being wasted every day.  Why can't project administrators take an interest in this crap they are purveying?  If these failures are the users' fault, then tell us what we can do about it.  If it is the project's fault, then stop distributing a broken application.

 

Users are making great sacrifices to conserve electricity, water, gasoline, and other resources.  Why can't Einstein@Home and the University of  Wisconsin -- Milwaukee get with the program?


mikey
Joined: 22 Jan 05
Posts: 12776
Credit: 1861024624
RAC: 1443392

CElliott wrote:

I downloaded and installed Visual C++ 2015-2022 / 14.24.31938 Redistributable Package (x64) as you suggested.  Apparently only File Explorer and BitDefender use the new cruntime.dll.  All the other programs use msvcrt.dll, which has an old file date.  These results are according to the Windows built-in program, Resource Monitor.

 

It is the MeerKAT workunits that are failing, by over 50%.  Hundreds of hours of processing time and thousands of kilowatts of electricity are being wasted every day.  Why can't project administrators take an interest in this crap they are purveying?  If these failures are the users' fault, then tell us what we can do about it.  If it is the project's fault, then stop distributing a broken application.

 

Users are making great sacrifices to conserve electricity, water, gasoline, and other resources.  Why can't Einstein@Home and the University of  Wisconsin -- Milwaukee get with the program?

On the other hand, I am also running the MeerKAT GPU tasks on my own GPUs, and these are my stats for those:

So they can be run to completion and give valid results; you just have to figure out why yours aren't.  The 50 Error tasks are mostly from me aborting tasks that were running on a GPU that I didn't want to run tasks on.

Keith Myers
Joined: 11 Feb 11
Posts: 5020
Credit: 18920917688
RAC: 6508762

Bernd made an interesting comment in the thread about the issues with the 4090: the BRP7 applications for each platform use different FFT libraries when compiled.

So each application will produce a slightly different answer because of the differences in the FFT libraries, which leads to a much higher invalid rate compared to the other GPU applications.

So there is a much, much higher chance of invalids when each wingman uses a different card type.  Nvidia won't match against AMD or Intel.  Intel won't match against Nvidia or AMD, and AMD won't match against Nvidia or Intel.

It is poor application design for the BRP7 application NOT to use the same code path for each card type.
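The effect described here can be illustrated without any GPU. The sketch below is not the BRP7 code or any of the FFT libraries involved; it is a hypothetical pure-Python comparison of two textbook ways to compute the same transform (a direct DFT and a recursive radix-2 FFT). Both are mathematically equivalent, but they perform the floating-point operations in a different order, so their outputs usually differ in the last few bits and only agree under a tolerance. The function names and the tolerance value are assumptions.

```python
import cmath
import math

def dft_naive(x):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def max_abs_diff(a, b):
    """Largest element-wise difference between two spectra."""
    return max(abs(p - q) for p, q in zip(a, b))

def agree_within(a, b, rel_tol=1e-9):
    """Tolerance-based comparison, scaled by the largest magnitude seen."""
    scale = max(max(abs(p) for p in a), max(abs(q) for q in b), 1.0)
    return max_abs_diff(a, b) <= rel_tol * scale

# Arbitrary test signal of length 64 (a power of two).
signal = [math.sin(0.37 * t) + 0.5 * math.cos(1.3 * t) for t in range(64)]
a = dft_naive(signal)
b = fft_radix2(signal)

print("bit-identical:", a == b)
print("max abs difference:", max_abs_diff(a, b))
print("agree within tolerance:", agree_within(a, b))
```

How strict a tolerance a validator can afford is a project decision; this only shows why an exact comparison across different implementations is fragile.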

 

Scrooge McDuck
Joined: 2 May 07
Posts: 1077
Credit: 18244286
RAC: 11685

Maybe it's the hardware differences of Nvidia, AMD, Intel enforcing this? Internal number representation, bit widths, precision of vectorized floating point numbers? Rounding rules? Do we need a 21st century version of a "Father of floating point" for GPUs, as William Kahan was in 1979 when designing the i8087 and afterwards standardizing a universal floating-point arithmetic (IEEE 754) for scientific computers?

Forum - Cruncher's Corner: Generic CPU discussion - ARCHAE86's comment on IEEE754 / i8087
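On the rounding question: IEEE 754 defines how each individual operation is rounded, but floating-point addition is still not associative, so code that adds the same numbers in a different order (as parallel reductions tend to do) can give a slightly different result even on fully compliant hardware. A minimal Python illustration, using ordinary double precision on the CPU rather than anything GPU-specific:

```python
import math

# Same three numbers, two different groupings.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left, right)
print("exactly equal:", left == right)                            # False
print("equal within 1e-12:", math.isclose(left, right, rel_tol=1e-12))  # True
```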

mikey
Joined: 22 Jan 05
Posts: 12776
Credit: 1861024624
RAC: 1443392

Scrooge McDuck wrote:

Maybe it's the hardware differences of Nvidia, AMD, Intel enforcing this? Internal number representation, bit widths, precision of vectorized floating point numbers? Rounding rules? Do we need a 21st century version of a "Father of floating point" for GPUs, as William Kahan was in 1979 when designing the i8087 and afterwards standardizing a universal floating-point arithmetic (IEEE 754) for scientific computers?

Forum - Cruncher's Corner: Generic CPU discussion - ARCHAE86's comment on IEEE754 / i8087 

An easy answer in the meantime would be to only validate each brand of gpu against itself, telling people that this could/WILL slow down validation a bit but in the end should produce more valid tasks for each individual user.
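A rough sketch of what validating each brand only against itself could look like is below. It is not the BOINC validator or Einstein@Home's code; the record layout, the `validate_by_vendor` and `results_match` names, the quorum of 2, and the tolerance are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def results_match(a, b, rel_tol=1e-5):
    """Hypothetical comparison: same length and element-wise close."""
    return len(a) == len(b) and all(
        math.isclose(p, q, rel_tol=rel_tol) for p, q in zip(a, b)
    )

def validate_by_vendor(results, quorum=2):
    """results: list of (host_id, vendor, values) for one workunit.

    Results are compared only with results from the same GPU vendor.
    Returns a dict mapping host_id to 'valid', 'invalid', 'inconclusive',
    or 'pending' (not enough same-vendor results yet).
    """
    groups = defaultdict(list)
    for host, vendor, values in results:
        groups[vendor].append((host, values))

    status = {}
    for vendor, members in groups.items():
        if len(members) < quorum:
            for host, _ in members:
                status[host] = "pending"
            continue
        # Find the largest set of hosts agreeing with some reference result.
        best = []
        for _, ref in members:
            agreeing = [h for h, v in members if results_match(ref, v)]
            if len(agreeing) > len(best):
                best = agreeing
        for host, _ in members:
            if len(best) >= quorum:
                status[host] = "valid" if host in best else "invalid"
            else:
                status[host] = "inconclusive"
    return status

# Made-up results for a single workunit.
results = [
    ("host1", "nvidia", [1.0, 2.0]),
    ("host2", "amd", [1.001, 2.002]),
    ("host3", "nvidia", [1.0000001, 2.0000001]),
    ("host4", "amd", [1.001, 2.002000001]),
    ("host5", "intel", [0.999, 1.998]),
]
status = validate_by_vendor(results)
print(status)
```

Note how the lone Intel result stays pending until a second Intel host returns the same workunit, which is the slowdown in validation mentioned above.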

Keith Myers
Joined: 11 Feb 11
Posts: 5020
Credit: 18920917688
RAC: 6508762

It likely has nothing to do with precision, though the rounding issue has been brought up before.

It's the fact that different FFT libraries were used for each application. This is Bernd's comment:

FGRP - HIGH INVALID RATE ON NVIDIA 4090?

Quote:

However, the different apps use different libraries for the FFT:

* FGRP uses "clFFT", originally developed by AMD for their cards, now open source on GitHub
* BRP CUDA (BRP7 Windows) uses cuFFT
* BRP OpenCL uses an in-house development based on an Apple OpenCL code example, which seems to be derived from an early cuFFT version

N.B.

The GPU Users Group also uses one of Petri's custom Linux BRP7 applications using CUDA12 so that app uses Nvidia's cuFFT library. So again a mismatch in FFT libraries against the stock BRP7 apps.

 

Keith Myers
Joined: 11 Feb 11
Posts: 5020
Credit: 18920917688
RAC: 6508762

mikey wrote:

An easy answer in the meantime would be to only validate each brand of gpu against itself, telling people that this could/WILL slow down validation a bit but in the end should produce more valid tasks for each individual user.

That would require the project admins and developers to run three different validator processes to segregate the results.  They would then still have to compare those results against the apps for the other card types.

 
