Improvements in the code of the clients

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250480775

RAC: 35261

9 Aug 2021 15:05:40 UTC

Message 188136

Hm. twiddle_dee is not in our own code; there may be a twiddle_dee in the clFFT library. That library, however, is linked but should not be used in the GW App, at least not on NVIDIA.

Bernd Machenschalk

9 Aug 2021 15:09:00 UTC

Message 188137

In clFFT there is

 src/library/generator.stockham.h, lines 209-212:

    inline std::string TwTableLargeName()
    {
        return "twiddle_dee";
    }

...

 src/library/generator.stockham.h, lines 282-284:

    ss << "\n __constant ";
    ss << RegBaseType<PR>(2);
    ss << " " << TwTableLargeName();

I'll patch that.
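The generator excerpt above hard-codes the `__constant` qualifier for the `twiddle_dee` table. The later posts in this thread describe the patch as moving the table to `__global`. A minimal C++ sketch of making the qualifier selectable, assuming a flat table of `float2` values; `EmitTwiddleTableDecl` and `AddressSpace` are hypothetical names, not clFFT's actual code, and the real generator's `RegBaseType<PR>(2)` is replaced here by a fixed `float2`.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: address space in which the twiddle table is declared.
enum class AddressSpace { Constant, Global };

// Hypothetical helper, loosely modelled on the clFFT generator excerpt.
// Emits a program-scope OpenCL C declaration such as
//   __global float2 twiddle_dee[2] = { (float2)(...), (float2)(...) };
// A program-scope __global table is only legal in OpenCL C 2.0 and later,
// so that variant must be compiled with -cl-std=CL2.0.
std::string EmitTwiddleTableDecl(AddressSpace space, const std::string& name,
                                 const std::vector<std::pair<float, float>>& table)
{
    std::ostringstream ss;
    // Scientific notation always prints a decimal point, so appending 'f'
    // yields a valid float literal (e.g. 1.000000000e+00f).
    ss << std::scientific << std::setprecision(9);
    ss << "\n" << (space == AddressSpace::Global ? "__global " : "__constant ");
    ss << "float2 " << name << "[" << table.size() << "] = {";
    for (std::size_t i = 0; i < table.size(); ++i)
        ss << (i ? ", " : " ") << "(float2)(" << table[i].first << "f, "
           << table[i].second << "f)";
    ss << " };\n";
    return ss.str();
}
```

Keeping the qualifier a parameter, rather than editing the literal string, would let one generator serve both OpenCL 1.2 hosts (`__constant`) and OpenCL 2.0 hosts (`__global`).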

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3953

Credit: 46820962642

RAC: 64267330

9 Aug 2021 15:31:19 UTC

Message 188138

Just be aware of the consequences of this change with regard to supported OpenCL features. Defining the table as a program-scope __global variable is only supported in OpenCL 2.0 and later, and I think it's safe to assume that a large portion of the user base is running drivers that only support OpenCL 1.2.

You might need to gatekeep the application from hosts with incompatible drivers to avoid mass errors. This applies to both AMD and NVIDIA.
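One way to do that gatekeeping on the client is to parse the device's version string. The OpenCL specification formats `CL_DEVICE_VERSION` as `OpenCL<space><major.minor><space><vendor-specific information>` and `CL_DEVICE_OPENCL_C_VERSION` as `OpenCL<space>C<space><major.minor>...`. A minimal C++ sketch, assuming the string has already been fetched with `clGetDeviceInfo`; which of the two queries the project would rely on is an assumption, and `ParseClVersion` / `SupportsOpenCL20` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Field names avoid "major"/"minor", which older glibc defines as macros.
struct ClVersion { int major_version = 0; int minor_version = 0; };

// Parses "OpenCL 2.0 CUDA" (CL_DEVICE_VERSION style) or
// "OpenCL C 1.2 ..." (CL_DEVICE_OPENCL_C_VERSION style).
bool ParseClVersion(const std::string& s, ClVersion& out)
{
    std::size_t pos = 0;
    if (s.rfind("OpenCL C ", 0) == 0)    pos = 9;  // starts with "OpenCL C "
    else if (s.rfind("OpenCL ", 0) == 0) pos = 7;  // starts with "OpenCL "
    else return false;
    int maj = 0, mnr = 0;
    if (std::sscanf(s.c_str() + pos, "%d.%d", &maj, &mnr) != 2) return false;
    out.major_version = maj;
    out.minor_version = mnr;
    return true;
}

// Gate: only accept devices that report OpenCL 2.0 or later.
bool SupportsOpenCL20(const std::string& version_string)
{
    ClVersion v;
    if (!ParseClVersion(version_string, v)) return false;  // unknown format: refuse
    return v.major_version >= 2;
}
```

Refusing on an unparseable string is the conservative choice here: a skipped host costs less than a batch of errored tasks.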

Bernd Machenschalk

10 Aug 2021 7:27:00 UTC

Message 188154

There is now a 1.25 FGRP App (Beta test) that should include clFFT patched in the suggested way. For now it is restricted in the respective plan class to NVIDIA Pascal and newer (compute capability >= 6.0) with OpenCL 2.0. The app is available for all three major platforms (Windows, Linux, OSX).
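The restriction described above is a conjunction of a vendor check and two version comparisons. A minimal C++ sketch of that predicate, assuming compute capability and OpenCL version are already available as integer major/minor pairs; `GpuInfo` and `EligibleForFgrpBeta` are hypothetical, and this is not the project's actual plan-class definition, which lives on the server.

```cpp
#include <cassert>
#include <tuple>

struct GpuInfo {
    bool is_nvidia;
    int cc_major, cc_minor;  // CUDA compute capability, e.g. 6.1 for a GTX 1080 (Pascal)
    int cl_major, cl_minor;  // OpenCL version reported by the driver
};

// Hypothetical eligibility test mirroring the 1.25 beta restriction:
// NVIDIA, compute capability >= 6.0, OpenCL >= 2.0.
bool EligibleForFgrpBeta(const GpuInfo& g)
{
    // Tuples compare lexicographically, so (major, minor) >= (6, 0)
    // handles versions like 7.5 vs 6.1 correctly.
    const bool pascal_or_newer = std::tie(g.cc_major, g.cc_minor) >= std::make_tuple(6, 0);
    const bool opencl20 = std::tie(g.cl_major, g.cl_minor) >= std::make_tuple(2, 0);
    return g.is_nvidia && pascal_or_newer && opencl20;
}
```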

Ian&Steve C.

10 Aug 2021 13:36:25 UTC

Message 188159

I tested the app, but its run speed and behavior are not really any different. Looking at the binary in a hex editor, I can now see twiddle_dee declared as __global. Applying my patch on top of the new 1.25 app makes it fast again.

There may be other changes petri has made besides this one that are complementary.

Bernd Machenschalk

10 Aug 2021 14:21:37 UTC

Message 188160

I would be happy to receive this patch. As long as it's OpenCL, we should be able to incorporate it in the App.

Ian&Steve C.

10 Aug 2021 16:37:33 UTC

Message 188163 in response to message 188160

Bernd Machenschalk wrote:

I would be happy to receive this patch. As long as it's OpenCL, we should be able to incorporate it in the App.

I've PMed you a link to the code and instructions/info.

Bernd Machenschalk

12 Aug 2021 11:26:41 UTC

Message 188188

Thanks, got it. Regarding the GPU code, this only changes the type of twiddle_dee and adds some options to the OpenCL compiler, in particular to use OpenCL 2.0 (*). I built app version 1.26 with that, please give it a try. It's the same plan class as 1.25, so the same restrictions apply.

(*) The other stuff in there just puts the CPU to sleep while the GPU is running; our app already implements a different method for that.
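For context on the compiler option: in the OpenCL API the language version is selected with the standard `-cl-std=CL2.0` build option passed to `clBuildProgram`. Per the specification, if no `-cl-std` option is given, the highest OpenCL C 1.x version the device supports is used, and that rejects program-scope `__global` variables such as the retyped `twiddle_dee`. The exact options in the patch are not quoted in the thread; the helper below is a hypothetical sketch of appending the flag to an existing options string.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the options string for clBuildProgram.
// Appends -cl-std=CL2.0 when requested, but never overrides an explicit
// -cl-std=... that is already present.
std::string MakeBuildOptions(const std::string& existing, bool use_cl20)
{
    std::string opts = existing;
    if (use_cl20 && opts.find("-cl-std=") == std::string::npos) {
        if (!opts.empty()) opts += ' ';
        opts += "-cl-std=CL2.0";
    }
    return opts;
}
```

Usage would look like `clBuildProgram(program, 1, &device, MakeBuildOptions("-cl-mad-enable", true).c_str(), nullptr, nullptr);`, where `program` and `device` come from the usual OpenCL setup.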

Ian&Steve C.

12 Aug 2021 13:13:39 UTC

Message 188191 in response to message 188188

Bernd Machenschalk wrote:

Thanks, got it. Regarding the GPU code, this only changes the type of twiddle_dee and adds some options to the OpenCL compiler, in particular to use OpenCL 2.0 (*). I built app version 1.26 with that, please give it a try. It's the same plan class as 1.25, so the same restrictions apply.

(*) The other stuff in there just puts the CPU to sleep while the GPU is running; our app already implements a different method for that.

Hi Bernd, I tried 1.26 and I see the same performance as with 1.25. Did you add the other changes from my follow-up PM last night?

petri33 wrote:

First difference: a kernel that begins like this:

__attribute__(( reqd_work_group_size( 16, 16, 1 ) ))
kernel void
transpose_gcn_tw_fwd( global float2* restrict pmComplexIn, global float2* restrict pmComplexOut )
{
   const Tile localIndex = { get_local_id( 0 ), get_local_id( 1 ) };
   const Tile localExtent = { get_local_size( 0 ), get_local_size( 1 ) };

...

   local float2 lds[ 64 ][ 64 ];

should be changed to

   local float2 lds[ 64 ][ 65 ];

It uses a bit more memory, but the accesses are better aligned and the number of memory bank conflicts is reduced.
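Why one extra column helps can be checked with a little arithmetic. Assume local memory is split into 32 banks of 4 bytes each (typical of recent NVIDIA GPUs, but an assumption here), so a `float2` spans two consecutive banks. A row of `lds[64][64]` is 64 × 8 = 512 bytes, a multiple of 32 × 4 = 128, so every row starts in the same bank and work-items reading down one column all hit it. A row of `lds[64][65]` is 520 bytes (130 words, and 130 mod 32 = 2), so successive rows shift by two banks. The sketch counts distinct starting banks for 16 work-items (one row of the 16 × 16 work-group) reading one column; the model is simplified and ignores how the hardware splits 64-bit accesses, and `DistinctBanksForColumnRead` is a hypothetical name.

```cpp
#include <cassert>
#include <set>

// Assumed model: 32 banks, 4 bytes each; a float2 covers 2 consecutive words.
constexpr int kBanks = 32;
constexpr int kWordsPerFloat2 = 2;

// Distinct starting banks touched when `threads` work-items read column
// `col` (rows 0..threads-1) of a local float2 tile with `cols` columns.
int DistinctBanksForColumnRead(int cols, int col, int threads)
{
    std::set<int> banks;
    for (int row = 0; row < threads; ++row) {
        const int word = (row * cols + col) * kWordsPerFloat2;  // 4-byte word offset
        banks.insert(word % kBanks);
    }
    return static_cast<int>(banks.size());
}
```

Under this model, `DistinctBanksForColumnRead(64, 0, 16)` gives 1 (all 16 reads collide) while `DistinctBanksForColumnRead(65, 0, 16)` gives 16 (no two reads share a starting bank).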

petri33 wrote:

The same applies to the kernel called transpose_gcn_tw_back.

Both of those gcn_tw kernels can also have the line:

const Tile localExtent = { get_local_size( 0 ), get_local_size( 1 ) };

changed to

const Tile localExtent = { 16, 16 };

to help the compiler generate better code, since the extent is then known at compile time. (16, 16) is the required work-group size, so it can be hard-coded.

That should deliver the rest of the speedup.
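The point of replacing `get_local_size()` with the literal `{ 16, 16 }` is that loops striding by the work-group extent over the 64 × 64 tile get constant trip counts (64 / 16 = 4), which the compiler can unroll and fold. The kernel body is elided in the thread, so the following is only a CPU-side C++ analogue under stated assumptions, not the actual kernel: a hypothetical tiled transpose through a padded 64 × 65 tile, where the `(lx, ly)` loops stand in for the 16 × 16 work-items and the extent is a `constexpr`.

```cpp
#include <array>
#include <cassert>
#include <complex>

constexpr int kTile = 64;   // tile edge, as in lds[64][...]
constexpr int kPad = 1;     // extra column, as in lds[64][65]
constexpr int kLocal = 16;  // reqd_work_group_size(16, 16, 1), fixed at compile time

using Float2 = std::complex<float>;
using Matrix = std::array<std::array<Float2, kTile>, kTile>;

// Hypothetical CPU analogue of a tiled transpose. Because kLocal is
// constexpr, each work-item's strided loops run exactly kTile / kLocal
// times, a constant the compiler can exploit.
Matrix TransposeViaTile(const Matrix& in)
{
    Float2 lds[kTile][kTile + kPad];  // padded "local memory" tile
    Matrix out{};
    // Phase 1: each work-item (lx, ly) loads a strided 4 x 4 subset.
    for (int ly = 0; ly < kLocal; ++ly)
        for (int lx = 0; lx < kLocal; ++lx)
            for (int y = ly; y < kTile; y += kLocal)
                for (int x = lx; x < kTile; x += kLocal)
                    lds[y][x] = in[y][x];
    // A kernel would need barrier(CLK_LOCAL_MEM_FENCE) here.
    // Phase 2: each work-item writes its transposed elements.
    for (int ly = 0; ly < kLocal; ++ly)
        for (int lx = 0; lx < kLocal; ++lx)
            for (int y = ly; y < kTile; y += kLocal)
                for (int x = lx; x < kTile; x += kLocal)
                    out[y][x] = lds[x][y];
    return out;
}
```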

Bernd Machenschalk

12 Aug 2021 14:06:07 UTC

Message 188192

Thanks, probably missed that. I'll take another look next week.
