Gravitational Wave search GPU App version

Due to the excellent work of our French volunteer Christophe Choquet, we finally have a working OpenCL version of the Gravitational Wave search ("S6CasA") application. Thank you, Christophe!

This app version is currently considered 'Beta' and is being tested on Einstein@Home. To participate in the beta test, you need to edit your Einstein@Home preferences and set "Run beta/test application versions?" to "yes".

It is currently available for Windows (32-bit) and Linux (64-bit) only, and you should have a card that supports double-precision floating point in hardware.

BM

Comments

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

The shortname is shown in BOINC when a task starts. Open the event log and look for "Starting task xx using [application shortname] version x.x" or something similar.

For the S6 application it's: einstein_S6CasA

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4345
Credit: 3208867971
RAC: 2003426

Thanks for the advice, but BOINC doesn't show the task short name. I'm using BOINC 7.2.42 on Windows 7 64-bit with BoincTasks 1.58. Here's what's shown in the BoincTasks messages:

7213	Einstein@Home	21.4.2014 11:33:31	Starting task h1_0808.15_S6Directed__S6CasAf40a_809.15Hz_70_1	
7214	Einstein@Home	21.4.2014 11:33:34	Started upload of h1_0808.00_S6Directed__S6CasAf40a_809.1Hz_5_1_0	
7215	Einstein@Home	21.4.2014 11:33:34	Started upload of h1_0808.00_S6Directed__S6CasAf40a_809.1Hz_5_1_1	
7216	Einstein@Home	21.4.2014 11:33:39	Finished upload of h1_0808.00_S6Directed__S6CasAf40a_809.1Hz_5_1_0	
7217	Einstein@Home	21.4.2014 11:33:39	Finished upload of h1_0808.00_S6Directed__S6CasAf40a_809.1Hz_5_1_1	

[edit]Same thing with BOINC Manager.[/edit]

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Sorry about that, it seems one needs the <cpu_sched> log flag set in cc_config.xml to get the full message.

With the log flag set I see:

21/04/2014 14:36:22 | Einstein@Home | [cpu_sched] Starting task h1_0926.30_S6Directed__S6CasAf40a_926.6Hz_1014_0 using einstein_S6CasA version 108 (GWopencl-nvidia-Beta) in slot 8
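For anyone wanting the same detail: the flag lives in cc_config.xml in the BOINC data directory, something like this (then restart BOINC or use the manager's "read config files" option):

<cc_config>
   <log_flags>
      <cpu_sched>1</cpu_sched>
   </log_flags>
</cc_config>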

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4345
Credit: 3208867971
RAC: 2003426

Quote:

Sorry about that, it seems one needs the <cpu_sched> log flag set in cc_config.xml to get the full message.

With the log flag set I see:

21/04/2014 14:36:22 | Einstein@Home | [cpu_sched] Starting task h1_0926.30_S6Directed__S6CasAf40a_926.6Hz_1014_0 using einstein_S6CasA version 108 (GWopencl-nvidia-Beta) in slot 8

I'll give it a try.

[edit]Yep, it works like that! Thanks again.[/edit]

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250498964
RAC: 34611

Quote:
We plan to update the server software so that the quota would be per app version, which would be especially important for beta-test apps (we don't want to penalize beta testers, who are more likely to suffer from massive task failures). I'm not sure though how fast we can do this update.

FWIW we have been planning this update for months and possibly years.

For the time being I raised the quota for GPU hosts quite a bit. (There is a "gpu_factor" that determines how many CPU cores a GPU counts as when determining the total number of tasks per day; I raised it from 2 to 8.)
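Roughly, the arithmetic behind it (an illustrative sketch, not the actual scheduler code):

/* illustrative sketch of the per-host daily quota calculation;
   identifiers are made up, not the real BOINC server code */
int daily_task_quota(int quota_per_core, int ncpus, int ngpus, int gpu_factor)
{
    /* each GPU counts as gpu_factor CPU cores */
    return quota_per_core * (ncpus + gpu_factor * ngpus);
}

With gpu_factor = 8, a quad-core host with one GPU is treated as 4 + 8 = 12 cores instead of 4 + 2 = 6, so its daily quota doubles.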

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250498964
RAC: 34611

I added a check for double-precision FP support. GPUs that don't support it shouldn't get any GWopencl tasks anymore.
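(For the curious: OpenCL exposes this via clGetDeviceInfo. A minimal stand-alone check looks roughly like the following - a sketch, not necessarily how the project's check is implemented.)

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_device_fp_config fp64 = 0;

    /* first platform, first GPU is enough for a quick test */
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* an empty capability mask means no double precision in hardware */
    clGetDeviceInfo(device, CL_DEVICE_DOUBLE_FP_CONFIG,
                    sizeof(fp64), &fp64, NULL);
    printf("FP64 support: %s\n", fp64 ? "yes" : "no");
    return 0;
}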

BM

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Great approximation! Thanks.

8 is much closer to 10 - which is becoming a reality even in those Intel integrated CPU+GPU chips.
Ex. 1: Core i5-3320M (22 nm) with HD4000 iGPU: 3.412 GFLOPS per CPU core, and 45 GFLOPS *peak* for the GPU!
Ex. 2: AMD A8-3870K (I have it at home), CPU at stock 3.0 GHz: 2.7 GFLOPS per core. The GPU in this APU: 480 GFLOPS peak... oops, ratio overflow! That is far more than 10 times*. :-)

*It is 155.55 times per core, when I take the GPU as a 420 GFLOPS unit (its realistic average).

EDIT: Corrected the GFLOPS of an i5-3320M CPU core from 4.1 (my guess) to reality, 3.412, as reported by BOINC when I attached that notebook to the Asteroids@home project.

BRISINGR-II: PRIME X370-PRO,AMD'Zen 1800X 3.7/4.1, 2x8 DDR4 G.Sk.3602@2400,Asus STRIX GTX1070 DirectCUIII 8GB,*2017-04-08

BRISINGR: nb ASUS G73-JH,i7 1.73,4x2 DDR3-1333 CL7,ATi5870M 1GB,*2011-02-24

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578180208
RAC: 201563

The BOINC benchmark measures... whatever. Higher numbers are better, but beyond that it has nothing to do with actual performance in any project, since, well, it's just that: an artificial benchmark.

If you want to compare with peak performance (as quoted for GPUs), you should take into account how many single-precision floating point operations the CPUs can do per clock. That's 16 per core for your Ivy Bridge i3 and 8 for the A8, using AVX or SSE respectively.

At base clock speed of 2.6 GHz this yields for the i3: 16 * 2.6e9 = 41.6 GFlops per core
And at 3.0 GHz for the A8: 8 * 3.0e9 = 24 GFlops per core

Note that HT wouldn't provide any benefit if the core could somehow sustain 16 ops/clock, i.e. the maximum throughput of the i3's CPU cores is "just" 83.2 GFlops (96 GFlops for the A8).

And you can obviously argue that the CPU will never run at exactly peak performance... but neither do GPUs. So what is a realistic estimate of real-world performance, then? Better call Saul... ehm, ask the guys who are programming and profiling their apps.

MrS

Scanning for our furry friends since Jan 2002

Pollux_P3D
Joined: 8 Feb 11
Posts: 30
Credit: 212418648
RAC: 0

Google Translate:
Running the GWopencl-ati-Beta tasks on the ATI 7750? It has no double precision. I have received quite a few!
http://einsteinathome.org/host/11142353/tasks&offset=0&show_names=0&state=0&appid=0

Filipe
Joined: 10 Mar 05
Posts: 186
Credit: 406300184
RAC: 370728

Quote:

The amount of memory displayed won't matter (1023 is enough), but your driver 266.71 is almost ancient, considering the pace of change in the GPU world. Upgrade to the current WHQL and it should work.

MrS

I've upgraded to the latest NVIDIA driver but still can't get work units:

2014-04-24 20:12:44.1327 [PID=18949] [version] Checking plan class 'GWopencl-nvidia-Beta'
2014-04-24 20:12:44.1328 [PID=18949] [version] parsed project prefs setting 'gpu_util_gw': 1.000000
2014-04-24 20:12:44.1328 [PID=18949] [version] OpenCL GPU RAM required min: 1073741824.000000, supplied: 1073283072

Pollux_P3D
Joined: 8 Feb 11
Posts: 30
Credit: 212418648
RAC: 0

Correction: the ATI 7750 does have double precision. Everything is OK.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117607759709
RAC: 35248164

Quote:
OpenCL GPU RAM required min: 1073741824.000000, supplied: 1073283072


A 1GB 550Ti should be fine to pass that test. Having upgraded the driver, the next thing to try would be to upgrade BOINC to 7.2.42 (which you already have on one of your other hosts). If that still doesn't work then my next guess would be that it's something to do with Win XP.

You could always try upgrading to Linux :-). The Q8400 is still quite a reasonable CPU for powering a system. It runs very well under Linux :-).

Cheers,
Gary.

Jacob Klein
Joined: 22 Jun 11
Posts: 45
Credit: 114028547
RAC: 0

Quote:
Quote:
OpenCL GPU RAM required min: 1073741824.000000, supplied: 1073283072

A 1GB 550Ti should be fine to pass that test. Having upgraded the driver, the next thing to try would be to upgrade BOINC to 7.2.42 (which you already have on one of your other hosts). If that still doesn't work then my next guess would be that it's something to do with Win XP.

You could always try upgrading to Linux :-). The Q8400 is still quite a reasonable CPU for powering a system. It runs very well under Linux :-).

The error message is pretty self-explanatory, isn't it?

The app says it needs 1,073,741,824 bytes, which is EXACTLY 1 GB.
Your GPU reported having 1,073,283,072 bytes, which is UNDER 1 GB.
That's not enough.

I don't know how to fix it. You might consider contacting either your GPU vendor, or NVIDIA, to see if the value your GPU reports is correct or not.
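For what it's worth, the "supplied" figure is simply what the OpenCL driver reports for the device. You can query it yourself with a few lines of OpenCL (a stand-alone sketch, not BOINC's exact detection code):

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_ulong mem = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* this is the number the scheduler log shows as "supplied" */
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(mem), &mem, NULL);
    printf("CL_DEVICE_GLOBAL_MEM_SIZE: %llu bytes\n",
           (unsigned long long)mem);
    return 0;
}

If that prints 1,073,283,072 rather than 1,073,741,824, it would suggest the missing 448 KiB is being held back by the driver rather than miscounted by BOINC.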

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117607759709
RAC: 35248164

Quote:
The error message is pretty self-explanatory, isn't it?


Of course.

Quote:
The app says it needs 1,073,741,824 bytes, which is EXACTLY 1 GB.


Again, of course!

Quote:
Your GPU reported ...


Are you certain? Or could it perhaps be, "Your version of BOINC miscalculated ..."?
That's why I suggested, as a first step, upgrading BOINC - just in case.

Quote:
I don't know how to fix it. You might consider contacting either your GPU vendor, or NVIDIA, to see if the value your GPU reports is correct or not.


Neither do I, but I rather suspect that it could be BOINC (or perhaps Win XP), rather than NVIDIA, that is causing the problem. At least it's worth checking the easy stuff first.

Maybe the Devs will feel inclined to drop the limit ever so slightly so that all the 1GB cards that seem to erroneously report as 1023MB can still be used, whatever the real cause of the problem. I just had a look at some of my own cards: some report as 1024 and some as 1023 - so go figure. I haven't tried to run any S6CasA opencl-beta tasks yet. I'm planning to, but on a 2GB card that actually reports as 2048MB.

Cheers,
Gary.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 727312466
RAC: 1231765

Quote:

Maybe the Devs will feel inclined to drop the limit ever so slightly so that all the 1GB cards that seem to erroneously report as 1023MB can still be used, whatever the real cause of the problem.

Good idea, will do that.

Cheers
HB

Filipe
Joined: 10 Mar 05
Posts: 186
Credit: 406300184
RAC: 370728

Quote:

Maybe the Devs will feel inclined to drop the limit ever so slightly so that all the 1GB cards that seem to erroneously report as 1023MB can still be used, whatever the real cause of the problem.

Good idea, will do that.

Thanks. It's working now. The first one has validated against a CPU host:

http://einsteinathome.org/workunit/188655451

sorcrosc
Joined: 3 May 13
Posts: 8
Credit: 16046006
RAC: 22

Quote:
Seems the v1.07 (GWopencl-ati-Beta) tasks I processed will all be marked as invalid:
http://einsteinathome.org/host/7803636/tasks&offset=0&show_names=1&state=0&appid=24

With v1.08 they are all failing too.

straubertlajos31
Joined: 17 Oct 13
Posts: 2
Credit: 95031
RAC: 0

Like to set for BETA TEST
HOW TO DO ?
USING WIN 8.1 64BIT/google.com
PLEASE ADVISE

LEWIS

Jacob Klein
Joined: 22 Jun 11
Posts: 45
Credit: 114028547
RAC: 0

Your post:

Quote:

Like to set for BETA TEST
HOW TO DO ?
USING WIN 8.1 64BIT/google.com
PLEASE ADVISE

LEWIS STRAUBERT

First post in this thread:

Quote:

Due to the excellent work of our French volunteer Christophe Choquet, we finally have a working OpenCL version of the Gravitational Wave search ("S6CasA") application. Thank you, Christophe!

This app version is currently considered 'Beta' and is being tested on Einstein@Home. To participate in the beta test, you need to edit your Einstein@Home preferences and set "Run beta/test application versions?" to "yes".

It is currently available for Windows (32-bit) and Linux (64-bit) only, and you should have a card that supports double-precision floating point in hardware.

BM

Summary:
So... go to your Einstein@Home preferences, and set "Run beta/test application versions?" to "yes". Then in BOINC you can click "Update" on the Einstein project, and the server will include beta application versions among the possible tasks it sends your GPU. This new beta application is only available for Windows and Linux, there is a restriction that the GPU must be compute capability 1.3 or higher, and there may be additional restrictions such as driver version checks.

If you were asking a different question, then please be more precise.

-- Jacob

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:
Like to set for BETA TEST
HOW TO DO ?
USING WIN 8.1 64BIT/google.com
PLEASE ADVISE


Your GPU is an NVIDIA GeForce 210 with 512MB of video RAM and compute capability 1.2.
This beta GPU app requires compute capability 1.3 and 1GB of video RAM, so that's a no-go, sorry.

I see that you're running only GPU tasks on your machine. The problem is that they are all failing, and the error message is:

Quote:

The window cannot act on the sent message.
(0x3ea) - exit code 1002 (0x3ea)

and

[13:41:40][3140][ERROR] Failed to enable CUDA thread yielding for device #0 (error: 999)! Sorry, will try to occupy one CPU core...
[13:41:41][3140][ERROR] Couldn't acquire CUDA context of device #0 (error: 999)!
[13:41:41][3140][ERROR] Demodulation failed (error: 1002)!


I'm not sure exactly what's wrong; it could be that you're running out of video RAM, but I suspect that would throw a different error.
First of all, try rebooting the machine. If that doesn't help, you could try reinstalling the graphics driver - do a clean install by choosing "Advanced" in the installation guide and selecting "clean install". If that doesn't work, I'll have to defer this to someone more knowledgeable.

Chris
Joined: 9 Apr 12
Posts: 61
Credit: 45056670
RAC: 0

Was this supposed to go out to 64 bit Windows?

I had to suspend all CPU tasks to get something resembling a running system. It's still pretty unsteady with only one GPU task and no CPU tasks. I was only running 2 CPU tasks anyway, after some poor performance with BRP4 and 5.

GTX 650 and a Phenom II X4 820. Not the newest, but this is just grinding to a stop.

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

I have been doing some testing and have found that the new GW search GPU application does not benefit from extra PCI-E bandwidth the way the BRP4 and BRP5 GPU applications do. I have a system currently configured with slot one at x16 2.0 and slot two at x16 3.0. The runtime per task is approximately the same for tasks completed on both GPUs.

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

Quote:


FWIW we have been planning this update for months and possibly years.

For the time being I raised the quota for GPU hosts quite a bit. (There is a "gpu_factor" that determines how many CPU cores a GPU counts as when determining the total number of tasks per day; I raised it from 2 to 8.)

BM

Thank you for making this change. My system has not run into any new quota limits since the change was made.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117607759709
RAC: 35248164

Quote:
Was this supposed to go out to 64 bit Windows?


The opening post said 32-bit Windows and 64-bit Linux. As far as I know, that hasn't changed. I presume the Windows app is 32-bit, so I imagine it can run on both 32- and 64-bit Windows. It's listed for both on the applications page.

Quote:
I had to suspend all CPU tasks to get something resembling a running system. It's still pretty unsteady with only one GPU task and no CPU tasks. I was only running 2 CPU tasks anyway, after some poor performance with BRP4 and 5.


I've just set up a host with a Q8400 CPU and an HD7850 (2GB) GPU running 64-bit Linux. The first opencl-beta task started running (0.5 CPU + 1 GPU) but extremely slowly - ~8% after 28 mins - with 4 CPU tasks running. I changed the CPU utilization to 75%, 1 CPU task paused, and the GPU task took off, racing to completion in a further 14 mins. The very next task took just 17.5 mins to complete.

I then set up an app_config.xml file (see below) to process two concurrent GPU tasks, with each one reserving a CPU core. With 2 GPU tasks and 2 CPU tasks actually running, the GPU tasks now take around 23 minutes, i.e. 11.5 mins each - a very worthwhile improvement. It seems the key (at least for AMD GPUs) is to have 1 CPU core available for each running GPU task.
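For reference, the app_config.xml I used is essentially this (using the einstein_S6CasA short name mentioned earlier in the thread; it goes in the project's folder under projects/ in the BOINC data directory):

<app_config>
   <app>
      <name>einstein_S6CasA</name>
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>
         <cpu_usage>1.0</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

A gpu_usage of 0.5 runs two tasks per GPU, and a cpu_usage of 1.0 reserves a full CPU core for each of them.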

Quote:
GTX 650 and a Phenom II X4 820. Not the newest, but this is just grinding to a stop.


My Q8400 is probably about the same vintage, so I don't think your CPU should be a problem. I took a look at your tasks list: you don't seem to have any that have been completed and returned yet. Are any still being processed? Are you still not running any CPU tasks? I can't think of a reason why the GPU task shouldn't run OK under those conditions. Perhaps the driver?

Cheers,
Gary.

Chris
Joined: 9 Apr 12
Posts: 61
Credit: 45056670
RAC: 0

Task #1 took 54 minutes to run, with no CPU tasks running. It hasn't uploaded yet.

I've suspended the GW tasks; I'll try to burn them off overnight when I'm not actually doing anything.

I usually run 2 CPU tasks (CPDN when they have work) and 2 GPU tasks. For whatever reason, sometime last year I started needing to leave 2 cores open to keep the GPU fed and the desktop environment usable. So it could be me, but I'm not sure why.

GPU-Z showed some memory controller load, then a drop to 0, then a plateau of load. Not sure what that could mean either (except that it's slower).

kiska
Joined: 31 Mar 12
Posts: 9
Credit: 108916550
RAC: 0

Well, it seems my GT 525M is about 4x faster than my Core i5-2450M, but then I have to use a 23x multiplier for my CPU, otherwise it would overheat. So it's probably 3.5x faster than a stock Core i5-2450M.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Quote:
Was this supposed to go out to 64 bit Windows?


I have been running it on Win7 64-bit for the last six days.

Quote:


I had to suspend all CPU tasks to get something resembling a running system. It's still pretty unsteady with only one GPU task and no CPU tasks. I was only running 2 CPU tasks anyway, after some poor performance with BRP4 and 5.

GTX 650 and a Phenom II X4 820. Not the newest, but this is just grinding to a stop.


A work unit takes 38 to 40 minutes on my GTX 650 Ti, supported by one virtual core of an i7-3770. The other seven cores are on WCG/CEP2 (Win7 64-bit).

On another machine running WinXP, a work unit takes 28 to 30 minutes on a GTX 660, and 26 to 29 minutes on a GTX 750 Ti. Each GPU is supported by a single core of an E8400 Core2 Duo.

But note that these are dedicated PCs and I don't try to do other work on them, so the desktop might well be sluggish in that case - I haven't checked.

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Nice! :)
Does it do FP64 on 1/4 of those SIMD units, or how many?
I mean, if an FP64 op is also 3 times slower than a 32-bit op, that's 4 * 3 = 12 times fewer FP64 FLOPS than FP32 FLOPS. But that's still good - my mobile GPU (AMD HD5870M 1GB GDDR5) has exactly zero FP64 cores. :-)

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

I have bought an FP64 desktop GPU (to compile and test the Asteroids@home CUDA app, which I need to fully understand in order to rewrite it in OpenCL :-) )! It's a cheap NVIDIA card with CUDA compute capability 3.5: an ASUS GT640-1GD5-L. :-)
I will let you know how the Einstein test/beta app works on it as soon as I have it at home! :-)

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578180208
RAC: 201563

@Overtonesinger:

The GT520 you're referring to is a mainstream Fermi (like almost all 400- and 500-series cards) and hence performs FP64 at 1/12th the FP32 rate. However, that's fine for the current Einstein app, as most instructions use FP32.

Your GT640 is a mainstream Kepler (like almost all 600- and 700-series cards) and hence performs FP64 at 1/24th the FP32 rate. However, the GT640 is severely limited by memory bandwidth anyway, which will be even more pronounced in the new Einstein app (which requires far more bandwidth than "normal" programs). So it should work, but not as well as other cards relative to what the chip could do with faster memory (but then it's called a GTX650).

MrS

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Thanks.
I see it is really slow in FP64... it takes forever to finish this new Einstein GPU beta app (it might be 1/24th of FP32, as you said). :-) But at least - unlike the extreme 1.12 TFLOPS GPU in my gaming notebook, which has ZERO FP64 support - I *can* test my FP64 apps on it! :D

OK, BOINC tells me it has 803 GFLOPS peak (I suppose that estimates the FP32 speed). That means it has only 33.46 GFLOPS in 64-bit!
LOL! What a nice buy. :-)

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

I ran the new beta application on a variety of graphics cards under Windows 7 64-bit to see how each performs. Below are the results.

Card, Runtime (s)

GTX 580, 702-707
GTX 680, 1053-1060
GTX 780 Ti, 692-693
7970, 442-501 (463 Avg)

The GTX 580 does have a higher FP64 rating than the GTX 680, which may account for the difference, but there may be another reason as well. The GTX 780 Ti has a slightly higher FP64 rating than the GTX 580. It would be interesting to see how the Titan Black or Tesla K40 handles this application with FP64 mode enabled.

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

That was actually an i5-3320M (an Ivy Bridge i5, not an i3), but never mind. :)

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Oh, really? You said "severely limited by memory"?
How is that possible, when running the Einstein@Home S6CasA GWopencl-Beta app peaks the memory controller load at typically 76 percent, with a long-term average load of 33%? :)

This is the GDDR5 version of the *new* 28 nm desktop GT640! Thus it has 40.1 GB/s memory throughput, not 13 GB/s like the DDR3 version. :-)

Here are screenshots of a running S6CasA GWopencl-Beta app, with CPU-Z and GPU-Z. As a base for the GPU I use the A8-3870K at 3028 MHz plus dual-channel DDR3-2133 RAM overclocked to 2261.8 MHz:

https://dl.dropboxusercontent.com/u/69462289/scr/CaS6-GT640_II.png

https://dl.dropboxusercontent.com/u/69462289/scr/CaS6-GT640.png

A GWopencl-Beta WU completes in around 81 minutes (1 hour 21 min).

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578180208
RAC: 201563

It's good that you got the better card. The old one, with 900 MHz DDR3 on a 128-bit bus, achieves 28 GB/s though. When going to GDDR5 at 2500 MHz, nVidia dropped the width to 64 bit, hence it's still "only" got 40 GB/s. But 1/3 more bandwidth is certainly welcome! GDDR5 should also overclock better ;)
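The arithmetic, for reference: peak bandwidth = effective transfer rate x bus width / 8. DDR3 version: 1.8 GT/s x 128 bit / 8 = 28.8 GB/s; GDDR5 version: 5.0 GT/s x 64 bit / 8 = 40 GB/s.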

Regarding the limits: I'm not a developer, but I have followed the posts about this app. Apparently there are 2 phases: one with low memory controller load and almost 100% GPU load, and another where ~80% memory controller load can "only" achieve ~80% GPU load. In the latter phase, more memory throughput should help proportionally.

MrS

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Thanks!
I am now going to overclock the GT640, testing stability with the Heaven 4.0 benchmark. Memory first, as I always do.
When I find the highest stable memory clock, I will start experimenting with the GPU core clock to find its highest stable value. I suppose that at 47 °C (max) it has pretty good overclocking potential! :-)

I only hope it will also be stable in FP64 once it's stable in the FP32 and DX11 benchmarks.

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1886
Credit: 1408864554
RAC: 1160806

You will have no problem with a card temperature that low on the GT640, since the max is 102 °C.

My 550Ti's, 650Ti's and 660Ti's are all OC'd or "superclocked" and have been for a couple of years, with a temperature average of 60 °C, and they never have any problems.

I check GPU-Z when I load more tasks, and I use EVGA Precision X 4.2.1.

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

OK, MAGIC, you are right - I don't have any temperature problems here on the GT640 when overclocking it.
I used ASUS GPU Tweak and raised the memory (GDDR5), as most people recommended, from 1250 to 1500 MHz (from 5000 to 6000 effective). When that was stable in Unigine Heaven 4.0 with the temperature still at most 62 °C, I raised the GPU core clock from 1046 to 1136.6 MHz. Stable! I can't OC much further without finding a way to increase the voltage. Any tips? :-) Do I have to modify the VBIOS to do that? I don't like that idea (though I have done it on my mobile GPU, the ATI HD5870M)...
In the Heaven benchmark it now reaches at most 64 °C, and with the Einstein@Home GWopencl-nvidia-Beta app only 54 °C.

Now the first WU has been validated, so it seems stable in FP64 calculations too. :-) Only a 4.1% speedup, though.
EDIT: Sorry, from the latest result it looks more like a 7.8% average speedup!

Average GPU load increased from 77 to 80 percent!
Cool! Still, I think this app needs some optimization to get closer to 98% average usage (and without needing to OC the GDDR5 memory!) - maybe by overlapping some host-to-device memory transfers with computation.

http://einsteinathome.org/workunit/191325825

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

It always does (except for those integrated into APUs).
And its mobile version never does.

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

I can no longer edit my post above, so: about the GT640 OC.
Memory (64-bit GDDR5, 1250 MHz) behaviour in the S6CasA GWopencl-nvidia-Beta app:

1. Original speed, 5000 MHz effective (40 GB/s): load avg 33%, max 78%.
2. OC to 1500 MHz, 6000 MHz effective (48 GB/s): load avg 31%, max 72%.

That's all, folks.

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

Quote:
I used ASUS GPU Tweak and raised the memory (GDDR5)... I can't OC much further without finding a way to increase the voltage. Any tips? :-)


You could try MSI Afterburner or RivaTuner instead of ASUS GPU Tweak. I know Afterburner has a voltage adjustment, but I'm not sure about RivaTuner. There are other GPU utilities out there as well... I just don't know what they are off the top of my head.

TJ
Joined: 11 Feb 05
Posts: 178
Credit: 21041858
RAC: 0

Quote:
Quote:
I used ASUS GPU Tweak and raised the memory (GDDR5)... I can't OC much further without finding a way to increase the voltage. Any tips? :-)

You could try MSI Afterburner or RivaTuner instead of ASUS GPU Tweak. I know Afterburner has a voltage adjustment, but I'm not sure about RivaTuner. There are other GPU utilities out there as well... I just don't know what they are off the top of my head.


Correct about MSI Afterburner's voltage adjuster, but for some cards it doesn't work - I have experience with that, with the GTX 660 for instance. With EVGA Precision X, voltage can be adjusted (in a separate pop-up window). It works great, but it also depends on the card: on some, more regulators can be altered, on some fewer. You have to try it.

Greetings from
TJ

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Quote:

I can no longer edit my post above, so: about the GT640 OC.
Memory (64-bit GDDR5, 1250 MHz) behaviour in the S6CasA GWopencl-nvidia-Beta app:

1. Original speed, 5000 MHz effective (40 GB/s): load avg 33%, max 78%.
2. OC to 1500 MHz, 6000 MHz effective (48 GB/s): load avg 31%, max 72%.

That's all, folks.

Now my GT640 is overclocked from 1046 to 1162.6 MHz core, and the memory to 1499 MHz.
It now typically completes a WU in 1 hour 10 minutes 30 seconds! :-)

It still validates those GWopencl WUs... and stays at 65 °C max.

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

I have added an ultra-quiet 135 mm fan (Zalman ZM-F4) to the *big* passive Zalman cube cooler on the CPU, so the CPU now never underclocks from 3000 to 900 MHz. No surprise that an Asteroids@home WU now completes in 3 hours 10 minutes instead of 4:20.

The surprise, however, is the Einstein@Home WU:
a GWopencl-Beta task now completes in 52 minutes instead of 1 hour 10 minutes!

I am getting close to the performance of a GTX 750 Ti! LOL! :)

agony
Joined: 2 Jul 07
Posts: 7
Credit: 1397467
RAC: 0

My ATI 7870 knocks out such a workunit in about 13 minutes, and so far I've had just one bad one among them.

Filipe
Joined: 10 Mar 05
Posts: 186
Credit: 406300184
RAC: 370728

My old 550 Ti does a WU in 39 min.

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

Since we're on the topic of run times: my HD 7970 crunches GW tasks 3 at a time in 1150 s on average, so that's an average of ~383 s/task (or 6 min 23 s per task). I've crunched thousands of these tasks by now without a single error or invalid, at stock clocks (900 MHz core / 1375 MHz memory).

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Yes, cool! The AMD 7970 is very powerful, I know. I always wanted this GPU but never had a desktop to plug it into (no money left for a desktop).
I love AMD, because they don't limit the number of FP64-capable stream processors (except for omitting them completely in mobile GPUs and in all APUs). :)
But right now I need to practice CUDA + OpenCL development on the cheapest FP64-capable GPU.

BTW: further stable OC: 1176 MHz core. Still validates. :O

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

Quote:
I love AMD, because they don't limit the number of FP64-capable stream processors (except for omitting them completely in mobile GPUs and in all APUs). :)


Careful now... with the current generation of AMD GPUs, the 290X and the 290 have their FP64 performance limited to 1/8 of their FP32 performance, while the 280X has its FP64 performance limited to 1/4 of its FP32 performance. So AMD's current-gen mid-range GPU actually outperforms its big brothers in FP64. Notice that all the top hosts running Einstein@Home have R9 280X's or 7970's (the former is actually just the latter rebadged as a current-gen part); there isn't a single R9 290X or 290 in the top 20.
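In rough numbers at reference clocks: an R9 280X at ~4.1 TFLOPS FP32 and a 1/4 rate gives ~1.0 TFLOPS FP64, while an R9 290X at ~5.6 TFLOPS FP32 and a 1/8 rate gives only ~0.7 TFLOPS FP64.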

Grutte Pier [Wa Oars]~GP500
Joined: 18 May 09
Posts: 39
Credit: 6098013
RAC: 0

I'm processing these WUs now on an HD 7870.

The first few didn't finish.

The WU I have now is very, very slow in the last 1%: there is no GPU load, and no CPU load either.

It's been creeping up very slowly over the last few hours, with 0.012% to go.

Is this common? Is there a solution?