Support AVX

Sebastian M. Bo...

Joined: 20 Feb 05

Posts: 63

Credit: 1529602660

RAC: 105

6 Sep 2013 13:16:02 UTC

Message 106363 in response to message 106362

Maybe now, after almost two years, it's time to think again about supporting AVX and FMA (FMA3/FMA4)?

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 578346872

RAC: 197536

14 Sep 2013 21:27:35 UTC

Message 106364

With AVX, Haswell can send 256-bit vectors into the pipeline each clock tick, whereas Ivy Bridge still needed two 128-bit vectors over two clocks. Intel's slides say they want to widen the vectors to 512 bits in two years or so. Sounds like something worth using, if it's not too much hassle.

And, with all due respect, for the next few years AVX support will likely gain you more throughput than Einstein@Android.

MrS

Scanning for our furry friends since Jan 2002

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 578346872

RAC: 197536

15 Sep 2014 18:45:03 UTC

Message 106365

Any updates after one more year?

MrS

Scanning for our furry friends since Jan 2002

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

16 Sep 2014 11:46:38 UTC

Message 106366

Later BOINC clients (i.e. 7.3 and 7.4) also report the AVX feature if the CPU supports it.
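For anyone wondering how a reported feature list could be checked, here is a minimal sketch in C. It assumes the client reports CPU features as a space-separated list of lowercase names (something like "fpu sse sse2 avx"); that format and the function name has_cpu_feature are assumptions for illustration, not taken from the BOINC sources. The point it shows is that the match has to be on whole tokens, because a plain substring search for "avx" would also hit "avx2".

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole space-separated token in
 * `features`. Hypothetical helper; the list format is an assumption. */
bool has_cpu_feature(const char *features, const char *name)
{
    size_t name_len = strlen(name);
    if (name_len == 0) {
        return false;
    }

    const char *p = features;
    while (*p != '\0') {
        /* Skip any separators before the next token. */
        while (*p == ' ') {
            p++;
        }
        /* Length of the current token (0 if we reached the end). */
        size_t tok_len = strcspn(p, " ");
        if (tok_len == name_len && strncmp(p, name, name_len) == 0) {
            return true;
        }
        p += tok_len;
    }
    return false;
}
```

With this, has_cpu_feature("fpu sse sse2 avx2", "avx") is false, while the same call on a list that contains the separate token "avx" is true.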

BOINC blog

ahj

Joined: 25 Jul 10

Posts: 17

Credit: 4331992

RAC: 0

29 Mar 2015 4:01:43 UTC

Message 106367

Paging Bernd for any updates

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 728279580

RAC: 1177325

6 Apr 2015 9:41:24 UTC

Message 106368

Fair question.

Currently the foundations are being laid so that the next-generation GW app will use AVX (actually the best SIMD architecture available on a given host). I think we might see this on E@H later this year.
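To make "the best SIMD architecture available on a given host" a bit more concrete, here is a rough sketch of runtime dispatch in C. It is not the project's code. The capability bits, the kernel names and select_sum are all made up for illustration, and the three kernels are plain-C stand-ins that only mimic different lane counts; in a real app each would be written with intrinsics and compiled with the matching flags. The detection uses the GCC/Clang builtin __builtin_cpu_supports on x86 and simply reports nothing on other compilers or architectures.

```c
#include <stddef.h>

/* Hypothetical capability bits, for illustration only. */
enum { CPU_SSE2 = 1 << 0, CPU_AVX = 1 << 1 };

typedef double (*sum_fn)(const double *x, size_t n);

/* Stand-in kernel body: accumulates into `lanes` partial sums, the way a
 * vector unit with that many double lanes would. Plain C, no intrinsics. */
static double sum_lanes(const double *x, size_t n, size_t lanes)
{
    double acc[4] = {0.0, 0.0, 0.0, 0.0};
    for (size_t i = 0; i < n; i++) {
        acc[i % lanes] += x[i];
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}

double sum_generic(const double *x, size_t n) { return sum_lanes(x, n, 1); }
double sum_sse2(const double *x, size_t n)    { return sum_lanes(x, n, 2); }
double sum_avx(const double *x, size_t n)     { return sum_lanes(x, n, 4); }

/* Ask the CPU what it supports. Only implemented for GCC/Clang on x86;
 * everywhere else this returns 0 and the generic kernel gets used. */
unsigned detect_cpu_caps(void)
{
    unsigned caps = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) caps |= CPU_SSE2;
    if (__builtin_cpu_supports("avx"))  caps |= CPU_AVX;
#endif
    return caps;
}

/* Pick the widest kernel the host can run, falling back step by step. */
sum_fn select_sum(unsigned caps)
{
    if (caps & CPU_AVX)  return sum_avx;
    if (caps & CPU_SSE2) return sum_sse2;
    return sum_generic;
}
```

The app would call select_sum(detect_cpu_caps()) once at startup and use the returned function pointer in the hot loop, so a single binary can serve hosts with and without AVX.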

The BRP app is mainly intended for GPU now and we won't touch the CPU code, I guess.

The FGRP app (gamma-ray pulsar search in Fermi LAT data) could benefit from an AVX-enabled FFT.

Cheers
HB

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 578346872

RAC: 197536

11 Apr 2015 12:17:38 UTC

Message 106369 in response to message 106368

Quote:

the best SIMD architecture available on a given host

That would be the ideal solution :)

I'm curious: what do you have to do to realize this?

MrS

Scanning for our furry friends since Jan 2002

Stranger7777

Joined: 17 Mar 05

Posts: 436

Credit: 429865636

RAC: 78586

11 Sep 2015 4:37:44 UTC

Message 106370

The time has come. The next GW run will surely support AVX, won't it? ;-)

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 578346872

RAC: 197536

19 Mar 2016 20:19:32 UTC

Message 106371 in response to message 106370

The new search on the advanced-generation LIGO detector data has an AVX app, although currently only for Linux.

MrS

Scanning for our furry friends since Jan 2002

Jesse Viviano

Joined: 8 Jun 05

Posts: 33

Credit: 133045917

RAC: 0

20 Mar 2016 16:39:54 UTC

Message 106372

If you implement AVX, make sure you have a way to deny 256-bit wide AVX to AMD Bulldozer and Piledriver processors and instead serve either SSE3 or 128-bit wide AVX plus FMA4 to those processors, unless you can prove that the 256-bit AVX code meets a special case. See http://www.agner.org/optimize/ on why this should be done in most cases.

The only advantage I can see to sending 256-bit AVX to those processors is if the programmer can fit the entire working set in the 256-bit registers but not in the 128-bit registers. If it fits in neither, 128-bit AVX and SSE3 are faster than 256-bit AVX because of the horrendous performance of the 256-bit registers when they need to be written out to memory, especially on Piledriver. If it fits in both, 128-bit AVX or SSE is better because a 256-bit instruction takes two of the four shared decoders to decode while a 128-bit instruction uses just one.

Bulldozer's set of four shared instruction decoders also has problems handling 256-bit AVX instructions, each of which must be split into two 128-bit instructions: the set can only split one such instruction per clock cycle, so a second 256-bit instruction could stall the decoders.

Steamroller fixes these problems, so you should serve 256-bit wide AVX with optional FMA4 to this processor with no problem. I would expect the same for Excavator.
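As a sketch of what such a rule might look like in C: the function below takes the CPUID vendor string, the display family and model, and two feature flags, and returns which code path to serve. The enum, the function name and the parameter layout are hypothetical. The model cut-off is also an assumption that should be checked against AMD's documentation: it treats family 15h models below 30h as Bulldozer/Piledriver and models from 30h upward as Steamroller or Excavator.

```c
#include <string.h>

/* Hypothetical set of code paths an app could ship. */
typedef enum { PATH_SSE3, PATH_AVX128_FMA4, PATH_AVX256 } simd_path;

/* vendor:  CPUID vendor string, e.g. "AuthenticAMD" or "GenuineIntel"
 * family:  display family (base + extended), 0x15 for the Bulldozer line
 * model:   display model (extended model in the high nibble)
 * has_avx, has_fma4: feature flags as reported by the host */
simd_path choose_simd_path(const char *vendor, unsigned family,
                           unsigned model, int has_avx, int has_fma4)
{
    if (!has_avx) {
        return PATH_SSE3;
    }

    /* Assumption: family 15h, models 00h-2Fh are Bulldozer/Piledriver,
     * where 256-bit AVX is usually slower than the 128-bit paths. */
    if (strcmp(vendor, "AuthenticAMD") == 0 && family == 0x15 && model < 0x30) {
        return has_fma4 ? PATH_AVX128_FMA4 : PATH_SSE3;
    }

    /* Everything else with AVX, including Steamroller and later. */
    return PATH_AVX256;
}
```

The special case mentioned above (a working set that fits only in the 256-bit registers) is not modelled here; an app that meets it would need its own override.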
