Fast Fourier Transform optimization?

cecht
cecht
Joined: 7 Mar 18
Posts: 1534
Credit: 2907902101
RAC: 2148128
Topic 220779

Are there any E@H apps that allow for command line optimization of FFT in either AMD or NVidia GPU apps? More generally, is there specific information from the Linux command 'clinfo' that would be useful for folks wanting to optimize GPU run parameters? I'm asking to help with upgrades to the program amdgpu-utils available from GitHub.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

steffen_moeller
steffen_moeller
Joined: 9 Feb 05
Posts: 78
Credit: 1773655132
RAC: 0

Hello Cecht, the tool you

Hello Cecht,
the tool you mentioned (https://github.com/Ricks-Lab/amdgpu-utils) is also available for Debian and Ubuntu (https://packages.ubuntu.com/focal/ricks-amdgpu-utils). I admit not to have understood you question, though :o/

Mike Hewson
Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 317816235
RAC: 387688

I think that's a no to both

I think that's a no to both questions I'm afraid.

[This may not be relevant/helpful]

IIRC the bottleneck in the FFT is the evaluation of the sine and cosine of theta, where theta is the ( radian measured ) argument with the precision/granularity of the base transform ie. however many data points we are transforming. Evaluation by power series approximation converges way too slow, but I think it was solved using lookup of precalculated values in a table and double angle formulae eg. 

sin(A + B ) = sinA * cosB + cosA * sinB ; cos(A + B) = cosA * cosB - sinA * sinB

making this amenable to a fused multiply/add scheme, if available.

[/This may not be relevant/helpful]

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2958636218
RAC: 713053

Mike Hewson wrote:making this

Mike Hewson wrote:
making this amenable to a fused multiply/add scheme, if available.

Isn't there a problem with loss of precision with fused multiply/add on some hardware, such as Intel GPUs?

Mike Hewson
Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 317816235
RAC: 387688

Richard Haselgrove wrote:Mike

Richard Haselgrove wrote:
Mike Hewson wrote:
making this amenable to a fused multiply/add scheme, if available.

Isn't there a problem with loss of precision with fused multiply/add on some hardware, such as Intel GPUs?

Is there ? That's sad. I guess the question is equivalent to "does Intel/whoever comply with  IEEE754-2008 ?" As the prime market for GPUs is not computing then it may be better to be fast than right .....

[dull, dirty and dangerous]

So you are right to worry, as with E@H the FFTs are of order 222 data points IIRC. So you have to have a precision equal/deeper than that in the representation. Of interest is that 32 bit IEEE-754 floats have 23 bits significant for the fractional part of an operand.

This is a clue to calculating the trig functions via double angle formula. Pre-calculate the sine and cosine for each 2n from 2-1 thru to 2-23. Put these into respective tables/vectors indexed suitably, making two tables each of 23 entries of some precision. Now any 23 bit theta argument, from zero up until 1 - 2-23 , is going to be some sum of certain such powers of 2 ie.

a1 * 2-1 + a2 * 2-2 +  ..... + ak * 2-k   + ...... a23 * 2-23 { Each ak is either 0 or 1 }

Set the value of two accumulators, call them Asin and Acos. The first is set to sin(0.0) = 0.0 and the second is set to cos(0.0) = 1.0.

Now, by for looping say, shift the operand to the right by one bit and inspect the bit that falls off the right end ( first you will get a23 then another shift gets a22 etc ..... all along to a1 ). At each shift test the bit's value :

- if ak is 0 then forget it as 2-k  does not contribute to theta's value, so neither will sin(2-k) or cos(2-k) contribute to our task here. Exit that loop iteration.

- if ak is 1 then 2-k does contribute to theta's value. So by using sin(2-k) and cos(2-k) from your tables plug it into the double angle formulae :

Asinold <---- Asin

Acosold <---- Acos

Asin <---- Asinold * cos(2-k) + Acosold * sin(2-k

Acos <---- Acosold * cos(2-k) - Asinold * sin(2-k)

... where Asinold and Acosold are temporary variables with <---- indicating assignment. So actually I don't need FMA at all now that I come to think in detail ....

When the loop finishes looking at all the bit's of theta then you will have Asin = sin(theta) and Acos = cos(theta). To that level of precision anyway. For that matter the precision of the trigs doesn't have to equal the precision of the argument, though the precision of sine must be the same as cosine. But you need a full circle ie. so from 0 to just below 2 * PI is sought : mutatis mutandis to reach that range.

Plus to be truly general you want to enact the arguments moduli PI, bringing the argument back to some principal range. Why not go from -PI to +PI, a lovely (anti-)symmetric range ? Know that :

sin(-theta) = - sin(theta )

cos(-theta) = cos(theta)

{ Ultimately you're actually after cos(theta) + i * sin(theta) = ei*theta }

Alternatively : to be utterly brutal you could just pre-calculate every sine and cosine for every argument in your principal range. That unwraps all the above gobbledygook to simple lookups from two arrays each of 223 entries. What a luxury that would be ...

[/dull, dirty and dangerous]

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

cecht
cecht
Joined: 7 Mar 18
Posts: 1534
Credit: 2907902101
RAC: 2148128

steffen_moeller wrote:Hello

steffen_moeller wrote:
Hello Cecht,
the tool you mentioned (https://github.com/Ricks-Lab/amdgpu-utils) is also available for Debian and Ubuntu (https://packages.ubuntu.com/focal/ricks-amdgpu-utils). I admit not to have understood you question, though :o/

Right, though amdgpu-utils 3.0.0 has been released, so the deb package should be updated.

My questions related to a rather minor point on what information should be listed from the amdgpu-ls --clinfo command. Rick, the dev, set that option to include parameters that are relevant for setting command line arguments of SETI apps. He wasn't sure whether E@H apps used similar or additional information (related to FFT), so I offered to ask here.

 

Ideas are not fixed, nor should they be; we live in model-dependent reality.

cecht
cecht
Joined: 7 Mar 18
Posts: 1534
Credit: 2907902101
RAC: 2148128

Mike Hewson wrote:I think

Mike Hewson wrote:
I think that's a no to both questions I'm afraid.

Thank you. The rest of what you and Richard posted is fascinating (though above my head!) and hopefully will be helpful to someone down the road.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.