PCIe 40x, 16x, what next?

merle van osdol

17 Dec 2014 16:44:55 UTC

Topic 197881

I guess the way it is now is that most standard MoBos have 16 PCIe lanes, and the enthusiast 40-lane MoBos cost about $500. What is the future for MoBo PCIe lanes? Will they gradually move toward 40 lanes? Maybe the next generation will be 16x, 8x, 4x, or are we permanently stuck with the situation as it is now?

merle

What is freedom of expression? Without the freedom to offend, it ceases to exist.

— Salman Rushdie

woohoo

PCIe 40x, 16x, what next?

17 Dec 2014 17:00:22 UTC

Message 128404

LGA 1150 CPUs have 16 lanes, so with two cards you usually top out at 8x 8x.
My 1150 mobo has a PLX chip, so it can do 8x 8x 8x 8x, but in reality PLX is like cheating: the cards are really only getting 4x 4x 4x 4x, which isn't ideal.

LGA 2011-3 CPUs have 28 or 40 lanes, so assuming a 40-lane CPU you can do 16x 8x 8x 8x.
With PLX you can do 16x 16x 16x 16x.
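The lane arithmetic above boils down to "the slot widths have to add up to no more than the CPU's lanes." Here is a rough sketch of that check, using only the lane counts quoted in this post (16 for LGA 1150, 28 or 40 for LGA 2011-3). The dictionary keys and function names are made up for illustration, and real boards only offer the fixed splits their slot wiring allows.

```python
# Sketch: does a per-slot lane split fit within the CPU's PCIe lane budget?
# Lane counts are the ones quoted in this thread; names are hypothetical,
# and real boards only offer the fixed splits their slot wiring allows.

CPU_LANES = {
    "LGA 1150": 16,
    "LGA 2011-3 (28-lane)": 28,
    "LGA 2011-3 (40-lane)": 40,
}

def fits(cpu: str, split: list[int]) -> bool:
    """True if the slot widths (e.g. [16, 8, 8, 8]) use no more lanes than the CPU has."""
    return sum(split) <= CPU_LANES[cpu]

def widest_equal_split(cpu: str, gpus: int) -> int:
    """Widest standard width (x16/x8/x4/x2/x1) every GPU gets if the lanes are shared equally."""
    for width in (16, 8, 4, 2, 1):
        if width * gpus <= CPU_LANES[cpu]:
            return width
    return 0  # more GPUs than lanes

print(fits("LGA 1150", [8, 8]))                     # True
print(fits("LGA 1150", [16, 8]))                    # False
print(fits("LGA 2011-3 (40-lane)", [16, 8, 8, 8]))  # True
print(widest_equal_split("LGA 1150", 4))            # 4
```

A PLX switch doesn't change this budget. It only adds lanes on the GPU side of the switch, and everything still funnels through the CPU lanes counted here.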

Logforme

Guess we'll know in 2016:

17 Dec 2014 17:32:53 UTC

Message 128405

Guess we'll know in 2016: https://www.pcisig.com/news_room/faqs/FAQ_PCI_Express_4.0/

merle van osdol

Thanks for the info

17 Dec 2014 18:02:14 UTC

Message 128406

Thanks for the info, folks.

Seems like I'm the last one to learn about the new stuff. :-)

merle

mikey

RE: Guess we'll know in

18 Dec 2014 11:55:00 UTC

Message 128407 in response to message 128405

Quote:

Guess we'll know in 2016: https://www.pcisig.com/news_room/faqs/FAQ_PCI_Express_4.0/

I'm VERY glad to see the new 4.0 slots will be compatible with the current cards!

merle van osdol

RE: 1150 cpu has 16 lanes

18 Dec 2014 20:10:15 UTC

Message 128408 in response to message 128404

Quote:

LGA 1150 CPUs have 16 lanes, so with two cards you usually top out at 8x 8x.
My 1150 mobo has a PLX chip, so it can do 8x 8x 8x 8x, but in reality PLX is like cheating: the cards are really only getting 4x 4x 4x 4x, which isn't ideal.

LGA 2011-3 CPUs have 28 or 40 lanes, so assuming a 40-lane CPU you can do 16x 8x 8x 8x.
With PLX you can do 16x 16x 16x 16x.

Now I'm confused. What is a PLX chip, and what do you get when you use 3 cards at 8x, 8x, 8x? The 2011-3 sounds like it is already equivalent to PCIe 4.0 in terms of 16 GT/s.

I guess I need to do a little more studying.
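The confusion here is between lane count and per-lane speed. The GT/s figure is a per-lane transfer rate: PCIe 3.0 runs at 8 GT/s per lane and PCIe 4.0 at 16 GT/s per lane, both with 128b/130b encoding. A 40-lane LGA 2011-3 CPU gives you more PCIe 3.0 lanes, not faster ones. Below is a rough sketch of the usual back-of-the-envelope formula. It ignores packet and protocol overhead, so real throughput is somewhat lower, and the function name is made up for illustration.

```python
# Sketch: raw one-direction PCIe link bandwidth
#   = per-lane transfer rate x encoding efficiency x lanes / 8 bits per byte.
# Ignores packet/protocol overhead, so real-world throughput is somewhat lower.

GEN = {
    # generation: (transfer rate in GT/s per lane, line-encoding efficiency)
    2: (5.0, 8 / 10),      # PCIe 2.0 uses 8b/10b encoding
    3: (8.0, 128 / 130),   # PCIe 3.0 uses 128b/130b encoding
    4: (16.0, 128 / 130),  # PCIe 4.0 doubles the per-lane rate, same encoding
}

def link_gbytes_per_s(gen: int, lanes: int) -> float:
    """Approximate usable bandwidth in GB/s, per direction."""
    rate_gt, efficiency = GEN[gen]
    return rate_gt * efficiency * lanes / 8

for gen, lanes in [(3, 8), (3, 16), (4, 8), (4, 16)]:
    print(f"PCIe {gen}.0 x{lanes}: {link_gbytes_per_s(gen, lanes):.2f} GB/s")
```

This prints about 7.88, 15.75, 15.75 and 31.51 GB/s. So a PCIe 4.0 x8 link roughly matches a PCIe 3.0 x16 link, and adding lanes never raises the per-lane rate.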

merle

woohoo

the PLX doubles the lanes

18 Dec 2014 20:39:14 UTC

Message 128409

The PLX doubles the lanes that the video cards see. For one or two GPUs you don't need it, since on 1150 you can get 16x with one GPU and 8x 8x with two GPUs, all without PLX.

The problem is that when you use three GPUs it becomes 8x 4x 4x, and at that point Nvidia triple SLI isn't supported, as it requires a minimum of 8x per card to work. The PLX makes the video cards see 16x 8x 8x even though the underlying connections are still 8x 4x 4x.

ExtraTerrestria...

It's a bit more than that. A

18 Dec 2014 22:58:40 UTC

Message 128410 in response to message 128409

It's a bit more than that. A PLX chip really does provide the quoted number of lanes to the GPUs. It sits between the chipset / CPU and the GPUs, e.g.:

No PLX:
- GPU1 with 8x to the CPU
- GPU2 with 8x to the CPU

With PLX:
- GPU1 with 16x to the PLX
- GPU2 with 16x to the PLX
- PLX with 16x to the CPU

This is great for games, where the GPUs have to talk to each other and can do so at full 16x bandwidth through the PLX chip, without needing to go through the CPU. For BOINC / GP-GPU it is less useful, because each GPU is only talking to the CPU, which in both cases is through a 16x connection.

There's still some benefit, though: no decent GP-GPU app needs PCIe bandwidth all the time. So each GPU can get the full 16x bandwidth whenever the other one doesn't currently need it, which means better load balancing.
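The load-balancing point can be put into a toy model. It assumes each GPU has an x16 link to the switch, all GPU-to-CPU traffic shares the switch's single x16 uplink, and that uplink is split evenly among the GPUs transferring at that moment. Without the switch, each GPU has a fixed x8 link to the CPU. It ignores switch latency and GPU-to-GPU (peer) traffic, and the names are made up for illustration.

```python
# Toy model of a PCIe switch (e.g. a PLX chip) vs a fixed x8/x8 split.
# Assumption: the switch's x16 uplink is shared evenly by GPUs transferring at the same time.

UPLINK_LANES = 16      # switch <-> CPU
DOWNSTREAM_LANES = 16  # each GPU <-> switch
STATIC_LANES = 8       # no-switch case: x8/x8 straight to the CPU

def effective_lanes_with_switch(active_gpus: int) -> float:
    """Lane-equivalents each currently transferring GPU gets toward the CPU through the switch."""
    if active_gpus == 0:
        return 0.0
    return float(min(DOWNSTREAM_LANES, UPLINK_LANES / active_gpus))

for active in (1, 2):
    print(f"{active} GPU(s) transferring: switch x{effective_lanes_with_switch(active):g} each, "
          f"no switch x{STATIC_LANES} each")
```

When only one GPU is transferring, it gets x16 through the switch instead of a fixed x8. When both transfer at once, each gets x8 either way. That is the "better load balancing" in a nutshell, and it is also why the gain for BOINC is modest.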

I can't give you any exact numbers on how this impacts Einstein performance, though. What's certain is that PCIe 4 is still some time away, and socket 2011-3 boards and CPUs are expensive. For supporting GPUs it might actually be better to get 2 MSI "Eco" mainboards with cheap CPUs (with 16 PCIe 3 lanes each) than to try to build one high-performance system.

MrS

Scanning for our furry friends since Jan 2002

woohoo

what i should have mentioned

18 Dec 2014 23:56:30 UTC

Message 128411

What I should have mentioned is that while PLX can help for games, I don't think it does much for Einstein.

I haven't been on the project full time, but I have one rig with a single 290X, and it can probably output 65-70k daily.

My other rig is running a single 290X and a single 295X2 (which is just two 290Xs on one card), but it only outputs ~140k daily. It's an ASUS Maximus VI Extreme with a PLX chip, but since I'm only using two slots the PLX doesn't apply here (8x 8x). The 295X2 has a PLX chip on board that dilutes its 8x into 4x 4x.

I'm only running one WU at a time per GPU. On the first rig, with 16x, I was completing units in less than an hour until I installed the Omega driver; now it's a little over an hour.

On the second rig, the 290X (with a slight factory overclock) on 8x finishes a unit in 1:15, but the 295X2 on 8x takes 1:46 per unit.

The drop from 16x to 8x doesn't seem so bad, but the drop to 4x really starts to hurt the times.
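To put numbers on "really starts to hurt," here is a quick sketch of the arithmetic using only the two precise times given above: 1:15 for the 290X on 8x and 1:46 for each 295X2 GPU behind the 8x slot that its onboard PLX splits into 4x 4x. The "a little over an hour" 16x time is too vague to include. The 290X also has a slight factory overclock, so not all of the gap is PCIe. The helper names are made up for illustration.

```python
# Arithmetic on the run times reported above (one WU at a time per GPU).

def minutes(hmm: str) -> int:
    """Convert an 'h:mm' run time to minutes."""
    hours, mins = hmm.split(":")
    return int(hours) * 60 + int(mins)

def wus_per_day(run_minutes: float) -> float:
    """WUs per day for one GPU running one WU at a time, ignoring gaps between tasks."""
    return 24 * 60 / run_minutes

x8 = minutes("1:15")  # 290X on 8x: 75 min
x4 = minutes("1:46")  # one 295X2 GPU, effectively 4x: 106 min

print(f"4x runs take {x4 / x8:.2f}x as long as 8x")
print(f"8x: {wus_per_day(x8):.1f} WUs/day, 4x: {wus_per_day(x4):.1f} WUs/day per GPU")
```

That works out to roughly 41% longer per WU, or about 19.2 vs 13.6 WUs per day per GPU when running one WU at a time.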

ExtraTerrestria...

For Arecibo tasks you need to

20 Dec 2014 14:38:05 UTC

Message 128412 in response to message 128411

For Arecibo tasks you need to run at least 2 WUs per GPU on any half-decent card. Even my HD 4000 iGPU benefits significantly from 2 WUs in parallel. Your high-end cards might be happier running even more.

The reason I say this: with only 1 concurrent WU, the GPU stalls as soon as anything gets in the way of crunching, be it PCIe transfers, CPU support or GPU memory access. With more concurrent WUs there can be better load balancing, i.e. if the GPU is waiting for something to continue one WU, another can take over. It isn't as fine-grained as on CPUs yet, but there's certainly a huge net benefit for high-end cards.
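The stall argument above can be sketched as a toy model. It assumes each WU keeps the GPU busy only a fraction of the time and stalls for the rest (PCIe transfers, CPU support, memory waits), and that the stalls of different WUs don't line up, so N WUs can fill each other's gaps up to 100%. The 60% busy figure is hypothetical, not a measurement, and real GPUs pay some switching cost, so treat the result as an upper bound.

```python
# Toy model: idealized GPU utilization with several concurrent WUs.
# Assumption: each WU keeps the GPU busy a fraction `busy` of the time, and
# the stalls of different WUs never overlap. The 0.6 below is hypothetical.

def utilization(busy: float, n_wus: int) -> float:
    """Idealized upper bound on GPU utilization with n_wus running concurrently."""
    return min(1.0, busy * n_wus)

for n in (1, 2, 3):
    print(f"{n} WU(s): up to {utilization(0.6, n):.0%} busy")
```

With these made-up numbers, one WU leaves the GPU idle 40% of the time, while two can in principle keep it fully busy. In practice the gain is smaller, because stalls sometimes coincide.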

BTW: the top hosts with 2 Tahitis (smaller than your Hawaiis) achieve about 120k RAC per GPU, using i7-4770Ks as hosts, i.e. with 8x PCIe 3 or with a PLX.
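For a rough per-GPU comparison, the daily figures from this thread can be divided by the number of GPUs in each host. This sketch assumes RAC roughly equals credits per day for a host in steady state, so the numbers are only loosely comparable. It is also not a like-for-like benchmark, since woohoo runs 1 WU per GPU. The per-GPU function name is made up for illustration.

```python
# Per-GPU daily credit from the figures quoted in this thread.
# Assumption: RAC ~ credits/day in steady state; figures are the posters' own estimates.

def per_gpu(credits_per_day: float, gpus: int) -> float:
    """Average daily credit per GPU."""
    return credits_per_day / gpus

hosts = {
    # host: (credits per day, number of GPUs)
    "rig 1: one 290X on 16x (low estimate)": (65_000, 1),
    "rig 1: one 290X on 16x (high estimate)": (70_000, 1),
    "rig 2: 290X + 295X2 (3 GPUs)": (140_000, 3),
    "top host: 2 Tahitis": (2 * 120_000, 2),  # quoted as ~120k RAC per GPU
}

for name, (credit, gpus) in hosts.items():
    print(f"{name}: ~{per_gpu(credit, gpus) / 1000:.1f}k per GPU")
```

Rig 2 comes out at about 46.7k per GPU, against 65-70k for the single 290X and ~120k per GPU for the top Tahiti hosts.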

MrS

woohoo

I was more interested in the

21 Dec 2014 1:12:51 UTC

Message 128413

I was more interested in the comparison between single and triple GPU than in trying to achieve maximum output. The benefit of lower GPU utilization is lower power/heat.
