Gravitational Wave search GPU App version

Due to the excellent work of our French volunteer Christophe Choquet, we finally have a working OpenCL version of the Gravitational Wave search ("S6CasA") application. Thank you, Christophe!

This app version is currently considered 'Beta' and is being tested on Einstein@Home. To participate in the beta test, you need to edit your Einstein@Home preferences and set "Run beta/test application versions?" to "yes".

It is currently available for Windows (32-bit) and Linux (64-bit) only, and you should have a card that supports double-precision floating point in hardware.

BM

Comments

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

The shortname is shown in BOINC when a task starts. Open the event log and look for "Starting task xx using [application shortname] version x.x" or something similar.

For the S6 application it's: einstein_S6CasA

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4345
Credit: 3208867971
RAC: 2003426

Thanks for the advice, but BOINC doesn't show the task short name. I'm using BOINC 7.2.42 on Windows 7 64-bit with BoincTasks 1.58. Here's what's shown in the BoincTasks messages:

7213	Einstein@Home	21.4.2014 11:33:31	Starting task h1_0808.15_S6Directed__S6CasAf40a_809.15Hz_70_1	
7214	Einstein@Home	21.4.2014 11:33:34	Started upload of h1_0808.00_S6Directed__S6CasAf40a_809.1Hz_5_1_0	
7215	Einstein@Home	21.4.2014 11:33:34	Started upload of h1_0808.00_S6Directed__S6CasAf40a_809.1Hz_5_1_1	
7216	Einstein@Home	21.4.2014 11:33:39	Finished upload of h1_0808.00_S6Directed__S6CasAf40a_809.1Hz_5_1_0	
7217	Einstein@Home	21.4.2014 11:33:39	Finished upload of h1_0808.00_S6Directed__S6CasAf40a_809.1Hz_5_1_1	

[edit]Same thing with BOINC Manager.[/edit]

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Sorry about that, it seems one needs the <cpu_sched> log flag set in cc_config.xml to get the full message.

With the log flag set I see:

21/04/2014 14:36:22 | Einstein@Home | [cpu_sched] Starting task h1_0926.30_S6Directed__S6CasAf40a_926.6Hz_1014_0 using einstein_S6CasA version 108 (GWopencl-nvidia-Beta) in slot 8
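For anyone wanting the same detail: the flag lives in cc_config.xml in the BOINC data directory, something like this (then restart BOINC or use the manager's "read config files" option):

<cc_config>
   <log_flags>
      <cpu_sched>1</cpu_sched>
   </log_flags>
</cc_config>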

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4345
Credit: 3208867971
RAC: 2003426

Quote:

Sorry about that, it seems one needs the <cpu_sched> log flag set in cc_config.xml to get the full message.

With the log flag set I see:

21/04/2014 14:36:22 | Einstein@Home | [cpu_sched] Starting task h1_0926.30_S6Directed__S6CasAf40a_926.6Hz_1014_0 using einstein_S6CasA version 108 (GWopencl-nvidia-Beta) in slot 8

I'll give it a try.

[edit]Yep, it works like that! Thanks again.[/edit]

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250498964
RAC: 34611

Quote:
We plan to update the server software so that the quota would be per app version, which would be especially important for beta-test apps (we don't want to penalize beta testers, who are more likely to suffer from massive task failures). I'm not sure though how fast we can do this update.

FWIW we have been planning this update for months and possibly years.

For the time being I raised the quota for GPU hosts quite a bit. (There is a "gpu_factor" that determines how many CPU cores a GPU counts as when determining the total number of tasks per day; I raised it from 2 to 8.)
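Roughly, the arithmetic behind it (an illustrative sketch, not the actual scheduler code):

/* illustrative sketch of the per-host daily quota calculation;
   identifiers are made up, not the real BOINC server code */
int daily_task_quota(int quota_per_core, int ncpus, int ngpus, int gpu_factor)
{
    /* each GPU counts as gpu_factor CPU cores */
    return quota_per_core * (ncpus + gpu_factor * ngpus);
}

With gpu_factor = 8, a quad-core host with one GPU is treated as 4 + 8 = 12 cores instead of 4 + 2 = 6, so its daily quota doubles.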

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250498964
RAC: 34611

I added a check for double-precision FP support. GPUs that don't support it shouldn't get any GWopencl tasks anymore.
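(For the curious: OpenCL exposes this via clGetDeviceInfo. A minimal stand-alone check looks roughly like the following - a sketch, not necessarily how the project's check is implemented.)

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_device_fp_config fp64 = 0;

    /* first platform, first GPU is enough for a quick test */
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* an empty capability mask means no double precision in hardware */
    clGetDeviceInfo(device, CL_DEVICE_DOUBLE_FP_CONFIG,
                    sizeof(fp64), &fp64, NULL);
    printf("FP64 support: %s\n", fp64 ? "yes" : "no");
    return 0;
}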

BM

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Great approximation! Thanks.

8 is much closer to 10 - which is becoming a reality even in those Intel integrated CPU+GPU chips.
Ex. 1: Core i5-3320M (22 nm) with HD4000 iGPU: 3.412 GFLOPS per CPU core, and 45 GFLOPS *peak* for the GPU!
Ex. 2: AMD A8-3870K (I have it at home), CPU at stock 3.0 GHz: 2.7 GFLOPS per core. The GPU in this APU: 480 GFLOPS peak... oops, ratio overflow! That is far more than 10 times*. :-)

*It is 155.55 times per core, when I take the GPU as a 420 GFLOPS unit (its realistic average).

EDIT: Corrected the GFLOPS of an i5-3320M CPU core from 4.1 (my guess) to reality, 3.412, as reported by BOINC when I attached that notebook to the Asteroids@home project.

BRISINGR-II: PRIME X370-PRO,AMD'Zen 1800X 3.7/4.1, 2x8 DDR4 G.Sk.3602@2400,Asus STRIX GTX1070 DirectCUIII 8GB,*2017-04-08

BRISINGR: nb ASUS G73-JH,i7 1.73,4x2 DDR3-1333 CL7,ATi5870M 1GB,*2011-02-24

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578180208
RAC: 201563

The BOINC benchmark measures... whatever. Higher numbers are better, but beyond that it has nothing to do with actual performance in any project, since, well, it's just that: an artificial benchmark.

If you want to compare with peak performance (as quoted for GPUs), you should take into account how many single-precision floating point operations the CPUs can do per clock. That's 16 per core for your Ivy Bridge i3 and 8 for the A8, using AVX or SSE respectively.

At base clock speed of 2.6 GHz this yields for the i3: 16 * 2.6e9 = 41.6 GFlops per core
And at 3.0 GHz for the A8: 8 * 3.0e9 = 24 GFlops per core

Note that HT wouldn't provide any benefit if the core could somehow sustain 16 ops/clock, i.e. the maximum throughput of the i3's CPU cores is "just" 83.2 GFlops (96 GFlops for the A8).

And you can obviously argue that the CPU will never run at exactly peak performance... but neither do GPUs. So what is a realistic estimate of real-world performance, then? Better call Saul... ehm, ask the guys who are programming and profiling their apps.

MrS

Scanning for our furry friends since Jan 2002

Pollux_P3D
Joined: 8 Feb 11
Posts: 30
Credit: 212418648
RAC: 0

Google Translate:
Running the GWopencl-ati-Beta tasks on the ATI 7750? It has no double precision. I have received quite a few!
http://einsteinathome.org/host/11142353/tasks&offset=0&show_names=0&state=0&appid=0

Filipe
Joined: 10 Mar 05
Posts: 186
Credit: 406300184
RAC: 370728

Quote:

The amount of memory displayed won't matter (1023 is enough), but your driver 266.71 is almost ancient, considering the pace of change in the GPU world. Upgrade to the current WHQL and it should work.

MrS

I've upgraded to the latest NVIDIA driver but still can't get work units:

2014-04-24 20:12:44.1327 [PID=18949] [version] Checking plan class 'GWopencl-nvidia-Beta'
2014-04-24 20:12:44.1328 [PID=18949] [version] parsed project prefs setting 'gpu_util_gw': 1.000000
2014-04-24 20:12:44.1328 [PID=18949] [version] OpenCL GPU RAM required min: 1073741824.000000, supplied: 1073283072

Pollux_P3D
Joined: 8 Feb 11
Posts: 30
Credit: 212418648
RAC: 0

Correction: the ATI 7750 does have double precision. Everything is OK.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117607759709
RAC: 35248164

Quote:
OpenCL GPU RAM required min: 1073741824.000000, supplied: 1073283072


A 1GB 550Ti should be fine to pass that test. Having upgraded the driver, the next thing to try would be to upgrade BOINC to 7.2.42 (which you already have on one of your other hosts). If that still doesn't work then my next guess would be that it's something to do with Win XP.

You could always try upgrading to Linux :-). The Q8400 is still quite a reasonable CPU for powering a system. It runs very well under Linux :-).

Cheers,
Gary.

Jacob Klein
Joined: 22 Jun 11
Posts: 45
Credit: 114028547
RAC: 0

Quote:
Quote:
OpenCL GPU RAM required min: 1073741824.000000, supplied: 1073283072

A 1GB 550Ti should be fine to pass that test. Having upgraded the driver, the next thing to try would be to upgrade BOINC to 7.2.42 (which you already have on one of your other hosts). If that still doesn't work then my next guess would be that it's something to do with Win XP.

You could always try upgrading to Linux :-). The Q8400 is still quite a reasonable CPU for powering a system. It runs very well under Linux :-).

The error message is pretty self-explanatory, isn't it?

The app says it needs 1,073,741,824 bytes, which is EXACTLY 1 GB.
Your GPU reported having 1,073,283,072 bytes, which is UNDER 1 GB.
That's not enough.

I don't know how to fix it. You might consider contacting either your GPU vendor, or NVIDIA, to see if the value your GPU reports is correct or not.
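For what it's worth, the "supplied" figure is simply what the OpenCL driver reports for the device. You can query it yourself with a few lines of OpenCL (a stand-alone sketch, not BOINC's exact detection code):

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_ulong mem = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* this is the number the scheduler log shows as "supplied" */
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(mem), &mem, NULL);
    printf("CL_DEVICE_GLOBAL_MEM_SIZE: %llu bytes\n",
           (unsigned long long)mem);
    return 0;
}

If that prints 1,073,283,072 rather than 1,073,741,824, it would suggest the missing 448 KiB is being held back by the driver rather than miscounted by BOINC.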

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117607759709
RAC: 35248164

Quote:
The error message is pretty self-explanatory, isn't it?


Of course.

Quote:
The app says it needs 1,073,741,824 bytes, which is EXACTLY 1 GB.


Again, of course!

Quote:
Your GPU reported ...


Are you certain? Or could it perhaps be, "Your version of BOINC miscalculated ..."?
That's why I suggested, as a first step, upgrading BOINC - just in case.

Quote:
I don't know how to fix it. You might consider contacting either your GPU vendor, or NVIDIA, to see if the value your GPU reports is correct or not.


Neither do I, but I rather suspect that it could be BOINC (or perhaps Win XP), rather than NVIDIA, that is causing the problem. At least it's worth checking the easy stuff first.

Maybe the Devs will feel inclined to drop the limit ever so slightly so that all the 1GB cards that seem to erroneously report as 1023MB can still be used, whatever the real cause of the problem. I just had a look at some of my own cards: some report as 1024 and some as 1023 - so go figure. I haven't tried to run any S6CasA opencl-beta tasks yet. I'm planning to, but on a 2GB card that actually reports as 2048MB.

Cheers,
Gary.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 727312466
RAC: 1231765

Quote:

Maybe the Devs will feel inclined to drop the limit ever so slightly so that all the 1GB cards that seem to erroneously report as 1023MB can still be used, whatever the real cause of the problem.

Good idea, will do that.

Cheers
HB

Filipe
Joined: 10 Mar 05
Posts: 186
Credit: 406300184
RAC: 370728

Quote:

Maybe the Devs will feel inclined to drop the limit ever so slightly so that all the 1GB cards that seem to erroneously report as 1023MB can still be used, whatever the real cause of the problem.

Good idea, will do that.

Thanks. It's working now. The first one has validated against a CPU host:

http://einsteinathome.org/workunit/188655451

sorcrosc
Joined: 3 May 13
Posts: 8
Credit: 16046006
RAC: 22

Quote:
Seems the v1.07 (GWopencl-ati-Beta) tasks I processed will all be marked as invalid:
http://einsteinathome.org/host/7803636/tasks&offset=0&show_names=1&state=0&appid=24

With v1.08 they are all failing too.

straubertlajos31
Joined: 17 Oct 13
Posts: 2
Credit: 95031
RAC: 0

Like to set for BETA TEST
HOW TO DO ?
USING WIN 8.1 64BIT/google.com
PLEASE ADVISE

LEWIS

Jacob Klein
Joined: 22 Jun 11
Posts: 45
Credit: 114028547
RAC: 0

Your post:

Quote:

Like to set for BETA TEST
HOW TO DO ?
USING WIN 8.1 64BIT/google.com
PLEASE ADVISE

LEWIS STRAUBERT

First post in this thread:

Quote:

Due to the excellent work of our French volunteer Christophe Choquet, we finally have a working OpenCL version of the Gravitational Wave search ("S6CasA") application. Thank you, Christophe!

This app version is currently considered 'Beta' and is being tested on Einstein@Home. To participate in the beta test, you need to edit your Einstein@Home preferences and set "Run beta/test application versions?" to "yes".

It is currently available for Windows (32-bit) and Linux (64-bit) only, and you should have a card that supports double-precision floating point in hardware.

BM

Summary:
So... go to your Einstein@Home preferences, and set "Run beta/test application versions?" to "yes". Then in BOINC you can click "Update" on the Einstein project, and the server will include beta application versions among the possible tasks it sends your GPU. This new beta application is only available for Windows and Linux, there is a restriction that the GPU must be compute capability 1.3 or higher, and there may be additional restrictions such as driver version checks.

If you were asking a different question, then please be more precise.

-- Jacob

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:
Like to set for BETA TEST
HOW TO DO ?
USING WIN 8.1 64BIT/google.com
PLEASE ADVISE


Your GPU is an NVIDIA GeForce 210 with 512MB of video RAM and compute capability 1.2.
This beta GPU app requires compute capability 1.3 and 1GB of video RAM, so that's a no-go, sorry.

I see that you're running only GPU tasks on your machine. The problem is that they are all failing, and the error message is:

Quote:

The window cannot act on the sent message.
(0x3ea) - exit code 1002 (0x3ea)

and

[13:41:40][3140][ERROR] Failed to enable CUDA thread yielding for device #0 (error: 999)! Sorry, will try to occupy one CPU core...
[13:41:41][3140][ERROR] Couldn't acquire CUDA context of device #0 (error: 999)!
[13:41:41][3140][ERROR] Demodulation failed (error: 1002)!


I'm not sure exactly what's wrong; it could be that you're running out of video RAM, but I suspect that would throw a different error.
First of all, try rebooting the machine. If that doesn't help, you could try reinstalling the graphics driver - do a clean install by choosing "Advanced" in the installation guide and selecting "clean install". If that doesn't work, I'll have to defer this to someone more knowledgeable.

Chris
Joined: 9 Apr 12
Posts: 61
Credit: 45056670
RAC: 0

Was this supposed to go out to 64 bit Windows?

I had to suspend all CPU tasks to get something resembling a running system. It's still pretty unsteady with only one GPU task and no CPU tasks. I was only running 2 CPU tasks anyway, after some poor performance with BRP4 and 5.

GTX 650 and a Phenom II X4 820. Not the newest, but this is just grinding to a stop.

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

I have been doing some testing and have found that the new GW search GPU application does not benefit from extra PCI-E bandwidth the way the BRP4 and BRP5 GPU applications do. I have a system currently configured with slot one at x16 2.0 and slot two at x16 3.0. The runtime per task is approximately the same for tasks completed on both GPUs.

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

Quote:


FWIW we have been planning this update for months and possibly years.

For the time being I raised the quota for GPU hosts quite a bit. (There is a "gpu_factor" that determines how many CPU cores a GPU counts as when determining the total number of tasks per day; I raised it from 2 to 8.)

BM

Thank you for making this change. My system has not run into any new quota limits since the change was made.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117607759709
RAC: 35248164

Quote:
Was this supposed to go out to 64 bit Windows?


The opening post said 32-bit Windows and 64-bit Linux. As far as I know, that hasn't changed. I presume the Windows app is 32-bit, so I imagine it can run on both 32- and 64-bit Windows. It's listed for both on the applications page.

Quote:
I had to suspend all CPU tasks to get something resembling a running system. It's still pretty unsteady with only one GPU task and no CPU tasks. I was only running 2 CPU tasks anyway, after some poor performance with BRP4 and 5.


I've just set up a host with a Q8400 CPU and an HD7850 (2GB) GPU running 64-bit Linux. The first opencl-beta task started running (0.5 CPU + 1 GPU) but extremely slowly - ~8% after 28 mins - with 4 CPU tasks running. I changed the CPU utilization to 75%, 1 CPU task paused, and the GPU task took off, racing to completion in a further 14 mins. The very next task took just 17.5 mins to complete.

I then set up an app_config.xml file (see below) to process two concurrent GPU tasks, with each one reserving a CPU core. With 2 GPU tasks and 2 CPU tasks actually running, the GPU tasks now take around 23 minutes, i.e. 11.5 mins each - a very worthwhile improvement. It seems the key (at least for AMD GPUs) is to have 1 CPU core available for each running GPU task.
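For reference, the app_config.xml I used is essentially this (using the einstein_S6CasA short name mentioned earlier in the thread; it goes in the project's folder under projects/ in the BOINC data directory):

<app_config>
   <app>
      <name>einstein_S6CasA</name>
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>
         <cpu_usage>1.0</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

A gpu_usage of 0.5 runs two tasks per GPU, and a cpu_usage of 1.0 reserves a full CPU core for each of them.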

Quote:
GTX 650 and a Phenom II X4 820. Not the newest, but this is just grinding to a stop.


My Q8400 is probably about the same vintage, so I don't think your CPU should be a problem. I took a look at your tasks list: you don't seem to have any that have been completed and returned yet. Are any still being processed? Are you still not running any CPU tasks? I can't think of a reason why the GPU task shouldn't run OK under those conditions. Perhaps the driver?

Cheers,
Gary.

Chris
Joined: 9 Apr 12
Posts: 61
Credit: 45056670
RAC: 0

Task #1 took 54 minutes to run, with no CPU tasks running. It hasn't uploaded yet.

I've suspended the GW tasks; I'll try to burn them off overnight when I'm not actually doing anything.

I usually run 2 CPU tasks (CPDN when they have work) and 2 GPU tasks. For whatever reason, sometime last year I started needing to leave 2 cores open to keep the GPU fed and the desktop environment usable. So it could be me, but I'm not sure why.

GPU-Z showed some memory controller load, then a drop to 0, then a plateau of load. Not sure what that could mean either (except that it's slower).

kiska
Joined: 31 Mar 12
Posts: 9
Credit: 108916550
RAC: 0

Well, it seems my GT 525M is about 4x faster than my Core i5-2450M, but then I have to use a 23x multiplier for my CPU, otherwise it would overheat. So it's probably 3.5x faster than a stock Core i5-2450M.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Quote:
Was this supposed to go out to 64 bit Windows?


I have been running it on Win7 64-bit for the last six days.

Quote:


I had to suspend all CPU tasks to get something resembling a running system. It's still pretty unsteady with only one GPU task and no CPU tasks. I was only running 2 CPU tasks anyway, after some poor performance with BRP4 and 5.

GTX 650 and a Phenom II X4 820. Not the newest, but this is just grinding to a stop.


A work unit takes 38 to 40 minutes on my GTX 650 Ti, supported by one virtual core of an i7-3770. The other seven cores are on WCG/CEP2 (Win7 64-bit).

On another machine running WinXP, a work unit takes 28 to 30 minutes on a GTX 660, and 26 to 29 minutes on a GTX 750 Ti. Each GPU is supported by a single core of an E8400 Core2 Duo.

But note that these are dedicated PCs and I don't try to do other work on them, so the desktop might well be sluggish in that case - I haven't checked.

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Nice! :)
Does it do FP64 on 1/4 of those SIMD units, or how many?
I mean, if an FP64 op is also 3 times slower than a 32-bit op, that's 4 * 3 = 12 times fewer FP64 FLOPS than FP32 FLOPS. But that's still good - my mobile GPU (AMD HD5870M 1GB GDDR5) has exactly zero FP64 cores. :-)

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

I have bought an FP64 desktop GPU (to compile and test the Asteroids@home CUDA app, which I need to fully understand in order to rewrite it in OpenCL :-) )! It's a cheap NVIDIA card with CUDA compute capability 3.5: an ASUS GT640-1GD5-L. :-)
I will let you know how the Einstein test/beta app works on it as soon as I have it at home! :-)

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578180208
RAC: 201563

@Overtonesinger:

The GT520 you're referring to is a mainstream Fermi (like almost all 400- and 500-series cards) and hence performs FP64 at 1/12th the FP32 rate. However, that's fine for the current Einstein app, as most instructions use FP32.

Your GT640 is a mainstream Kepler (like almost all 600- and 700-series cards) and hence performs FP64 at 1/24th the FP32 rate. However, the GT640 is severely limited by memory bandwidth anyway, which will be even more pronounced in the new Einstein app (which requires far more bandwidth than "normal" programs). So it should work, but not as well as other cards relative to what the chip could do with faster memory (but then it's called a GTX650).

MrS

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Thanks.
I see it is really slow in FP64... it takes forever to finish this new Einstein GPU beta app (it might be 1/24th of FP32, as you said). :-) But at least - unlike the extreme 1.12 TFLOPS GPU in my gaming notebook, which has ZERO FP64 support - I *can* test my FP64 apps on it! :D

OK, BOINC tells me it has 803 GFLOPS peak (I suppose that estimates the FP32 speed). That means it has only 33.46 GFLOPS in 64-bit!
LOL! What a nice buy. :-)

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

I ran the new beta application on a variety of graphics cards under Windows 7 64-bit to see how each performs. Below are the results.

Card, Runtime (s)

GTX 580, 702-707
GTX 680, 1053-1060
GTX 780 Ti, 692-693
7970, 442-501 (463 Avg)

The GTX 580 does have a higher FP64 rating than the GTX 680, which may account for the difference, but there may be another reason as well. The GTX 780 Ti has a slightly higher FP64 rating than the GTX 580. It would be interesting to see how the Titan Black or Tesla K40 handles this application with FP64 mode enabled.

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

That was actually an i5-3320M (an Ivy Bridge i5, not an i3), but never mind. :)

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Oh, really? You said "severely limited by memory"?
How is that possible, when running the Einstein@Home S6CasA GWopencl-Beta app peaks the memory controller load at typically 76 percent, with a long-term average load of 33%? :)

This is the GDDR5 version of the *new* 28 nm desktop GT640! Thus it has 40.1 GB/s memory throughput, not 13 GB/s like the DDR3 version. :-)

Here are screenshots of a running S6CasA GWopencl-Beta app, with CPU-Z and GPU-Z. As a base for the GPU I use the A8-3870K at 3028 MHz plus dual-channel DDR3-2133 RAM overclocked to 2261.8 MHz:

https://dl.dropboxusercontent.com/u/69462289/scr/CaS6-GT640_II.png

https://dl.dropboxusercontent.com/u/69462289/scr/CaS6-GT640.png

A GWopencl-Beta WU completes in around 81 minutes (1 hour 21 min).

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578180208
RAC: 201563

It's good that you got the better card. The old one, with 900 MHz DDR3 on a 128-bit bus, achieves 28 GB/s though. When going to GDDR5 at 2500 MHz, nVidia dropped the width to 64 bit, hence it's still "only" got 40 GB/s. But 1/3 more bandwidth is certainly welcome! GDDR5 should also overclock better ;)
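The arithmetic, for reference: peak bandwidth = effective transfer rate x bus width / 8. DDR3 version: 1.8 GT/s x 128 bit / 8 = 28.8 GB/s; GDDR5 version: 5.0 GT/s x 64 bit / 8 = 40 GB/s.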

Regarding the limits: I'm not a developer, but I have followed the posts about this app. Apparently there are 2 phases: one with low memory controller load and almost 100% GPU load, and another where ~80% memory controller load can "only" achieve ~80% GPU load. In the latter phase, more memory throughput should help proportionally.

MrS

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Thanks!
I am now going to overclock the GT640, testing stability with the Heaven 4.0 benchmark. Memory first, as I always do.
When I find the highest stable memory clock, I will start experimenting with the GPU core clock to find its highest stable value. I suppose that at 47 °C (max) it has pretty good overclocking potential! :-)

I only hope it will also be stable in FP64 once it's stable in the FP32 and DX11 benchmarks.

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1886
Credit: 1408864554
RAC: 1160806

You will have no problem with a card temperature that low on the GT640, since the max is 102 °C.

My 550Ti's, 650Ti's and 660Ti's are all OC'd or "superclocked" and have been for a couple of years, with a temperature average of 60 °C, and they never have any problems.

I check GPU-Z when I load more tasks, and I use EVGA Precision X 4.2.1.

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

OK, MAGIC, you are right - I don't have any temperature problems here on the GT640 when overclocking it.
I used ASUS GPU Tweak and raised the memory (GDDR5), as most people recommended, from 1250 to 1500 MHz (from 5000 to 6000 effective). When that was stable in Unigine Heaven 4.0 with the temperature still at most 62 °C, I raised the GPU core clock from 1046 to 1136.6 MHz. Stable! I can't OC much further without finding a way to increase the voltage. Any tips? :-) Do I have to modify the VBIOS to do that? I don't like that idea (though I have done it on my mobile GPU, the ATI HD5870M)...
In the Heaven benchmark it now reaches at most 64 °C, and with the Einstein@Home GWopencl-nvidia-Beta app only 54 °C.

Now the first WU has been validated, so it seems stable in FP64 calculations too. :-) Only a 4.1% speedup, though.
EDIT: Sorry, from the latest result it looks more like a 7.8% average speedup!

Average GPU load increased from 77 to 80 percent!
Cool! Still, I think this app needs some optimization to get closer to 98% average usage (and without needing to OC the GDDR5 memory!) - maybe by overlapping some host-to-device memory transfers with computation.

http://einsteinathome.org/workunit/191325825

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

It always does (except for those integrated into APUs).
And its mobile version never does.

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

I can no longer edit my post above, so: about the GT640 OC.
Memory (64-bit GDDR5, 1250 MHz) behaviour in the S6CasA GWopencl-nvidia-Beta app:

1. Original speed, 5000 MHz effective (40 GB/s): load avg 33%, max 78%.
2. OC to 1500 MHz, 6000 MHz effective (48 GB/s): load avg 31%, max 72%.

That's all, folks.

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

Quote:
I used ASUS GPU Tweak and raised the memory (GDDR5)... I can't OC much further without finding a way to increase the voltage. Any tips? :-)


You could try MSI Afterburner or RivaTuner instead of ASUS GPU Tweak. I know Afterburner has a voltage adjustment, but I'm not sure about RivaTuner. There are other GPU utilities out there as well... I just don't know what they are off the top of my head.

TJ
Joined: 11 Feb 05
Posts: 178
Credit: 21041858
RAC: 0

Quote:
Quote:
I used ASUS GPU Tweak and raised the memory (GDDR5)... I can't OC much further without finding a way to increase the voltage. Any tips? :-)

You could try MSI Afterburner or RivaTuner instead of ASUS GPU Tweak. I know Afterburner has a voltage adjustment, but I'm not sure about RivaTuner. There are other GPU utilities out there as well... I just don't know what they are off the top of my head.


Correct about MSI Afterburner's voltage adjuster, but for some cards it doesn't work - I have experience with that, with the GTX 660 for instance. With EVGA Precision X, voltage can be adjusted (in a separate pop-up window). It works great, but it also depends on the card: on some, more regulators can be altered, on some fewer. You have to try it.

Greetings from
TJ

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Quote:

I can no longer edit my post above, so: about the GT640 OC.
Memory (64-bit GDDR5, 1250 MHz) behaviour in the S6CasA GWopencl-nvidia-Beta app:

1. Original speed, 5000 MHz effective (40 GB/s): load avg 33%, max 78%.
2. OC to 1500 MHz, 6000 MHz effective (48 GB/s): load avg 31%, max 72%.

That's all, folks.

Now my GT640 is overclocked from 1046 to 1162.6 MHz core, and the memory to 1499 MHz.
It now typically completes a WU in 1 hour 10 minutes 30 seconds! :-)

It still validates those GWopencl WUs... and stays at 65 °C max.

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

I have added an ultra-quiet 135 mm fan (Zalman ZM-F4) to the *big* passive Zalman cube cooler on the CPU, so the CPU now never underclocks from 3000 to 900 MHz. No surprise that an Asteroids@home WU now completes in 3 hours 10 minutes instead of 4:20.

The surprise, however, is the Einstein@Home WU:
a GWopencl-Beta task now completes in 52 minutes instead of 1 hour 10 minutes!

I am getting close to the performance of a GTX 750 Ti! LOL! :)

agony
Joined: 2 Jul 07
Posts: 7
Credit: 1397467
RAC: 0

My ATI 7870 knocks out such a workunit in about 13 minutes, and so far I've had just one bad one among them.

Filipe
Joined: 10 Mar 05
Posts: 186
Credit: 406300184
RAC: 370728

My old 550 Ti does a WU in 39 min.

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

Since we're on the topic of run times: my HD 7970 crunches GW tasks 3 at a time in 1150 s on average, so that's an average of ~383 s/task (or 6 min 23 s per task). I've crunched thousands of these tasks by now without a single error or invalid, at stock clocks (900 MHz core / 1375 MHz memory).

Overtonesinger
Joined: 23 Jan 06
Posts: 21
Credit: 92797680
RAC: 116043

Yes, cool! The AMD 7970 is very powerful, I know. I always wanted this GPU but never had a desktop to plug it into (no money left for a desktop).
I love AMD, because they don't limit the number of FP64-capable stream processors (except for omitting them completely in mobile GPUs and in all APUs). :)
But right now I need to practice CUDA + OpenCL development on the cheapest FP64-capable GPU.

BTW: further stable OC: 1176 MHz core. Still validates. :O

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

Quote:
I love AMD, because they don't limit the number of FP64-capable stream processors (except for omitting them completely in mobile GPUs and in all APUs). :)


Careful now... with the current generation of AMD GPUs, the 290X and the 290 have their FP64 performance limited to 1/8 of their FP32 performance, while the 280X has its FP64 performance limited to 1/4 of its FP32 performance. So AMD's current-gen mid-range GPU actually outperforms its big brothers in FP64. Notice that all the top hosts running Einstein@Home have R9 280X's or 7970's (the former is actually just the latter rebadged as a current-gen part); there isn't a single R9 290X or 290 in the top 20.
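In rough numbers at reference clocks: an R9 280X at ~4.1 TFLOPS FP32 and a 1/4 rate gives ~1.0 TFLOPS FP64, while an R9 290X at ~5.6 TFLOPS FP32 and a 1/8 rate gives only ~0.7 TFLOPS FP64.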

Grutte Pier [Wa Oars]~GP500
Joined: 18 May 09
Posts: 39
Credit: 6098013
RAC: 0

I'm processing these WUs now on an HD 7870.

The first few didn't finish.

The WU I have now is very, very slow in the last 1%: there is no GPU load, and no CPU load either.

It's been creeping up very slowly over the last few hours, with 0.012% to go.

Is this common? Is there a solution?