I guess the way it is now, most standard MoBos are 16x and the enthusiast 40x MoBos cost about $500. What is the future for MoBo PCIe lanes? Will they gradually move toward 40x? Maybe the next generation will be 16x, 8x, 4x, or are we permanently stuck with the situation as it is now?
merle
What is freedom of expression? Without the freedom to offend, it ceases to exist.
— Salman Rushdie
PCIe 40x, 16x, what next?
an 1150 cpu has 16 lanes, so you usually top out at 8x 8x.
my 1150 mobo has a PLX chip so it can do 8x 8x 8x 8x, but in reality the PLX is like cheating: the cards are really only getting 4x 4x 4x 4x, which isn't ideal.
a 2011-3 cpu has 28 or 40 lanes, so assuming a 40-lane cpu you can do 16x 8x 8x 8x.
with a PLX you can do 16x 16x 16x 16x.
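for a rough sense of what those widths mean in bandwidth, here's a quick back-of-the-envelope python snippet (the per-lane figures are the commonly quoted usable rates after encoding overhead, so treat them as approximate):

    # approximate usable bandwidth per lane in GB/s after encoding
    # overhead (PCIe 2 uses 8b/10b, PCIe 3/4 use 128b/130b)
    PER_LANE_GBPS = {"PCIe2": 0.5, "PCIe3": 0.985, "PCIe4": 1.969}

    for gen, per_lane in PER_LANE_GBPS.items():
        for width in (16, 8, 4):
            print(f"{gen} x{width}: ~{per_lane * width:.1f} GB/s")

note that a PCIe 4 x8 slot would land about where PCIe 3 x16 is now, so more speed per lane can partly substitute for more lanes.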
Guess we'll know in 2016:
Guess we'll know in 2016: https://www.pcisig.com/news_room/faqs/FAQ_PCI_Express_4.0/
Thanks for the info
Thanks for the info, folks.
Seems like I'm the last one to learn about the new stuff. :-)
merle
What is freedom of expression? Without the freedom to offend, it ceases to exist.
— Salman Rushdie
RE: Guess we'll know in 2016
I'm VERY glad to see the new 4.0 slots will be compatible with the current cards!
RE: 1150 cpu has 16 lanes
Now I'm confused. What is a PLX chip, and what do you get when you use 3 cards at 8x, 8x, 8x? The 2011-3 sounds like it is already equivalent to PCIe 4.0 in terms of 16 GT/s.
I guess I need to do a little more studying.
merle
What is freedom of expression? Without the freedom to offend, it ceases to exist.
— Salman Rushdie
the PLX doubles the lanes
the PLX doubles the lanes that the video cards see. for one or two gpus you don't need it, since on 1150 you can get 16x with one gpu and 8x 8x with two gpus, all without a PLX.
the problem is that when you use three gpus it becomes 8x 4x 4x, and at that point Nvidia triple SLI wouldn't be supported, as it requires a minimum of 8x to work. the PLX makes the video cards see 16x 8x 8x even though the underlying connections are still 8x 4x 4x.
It's a bit more than that
It's a bit more than that. A PLX chip really provides the number of lanes quoted to the GPUs. It sits between the chipset / CPU and the GPUs, e.g.:
No PLX:
- GPU1 with 8x to the CPU
- GPU2 with 8x to the CPU
With a PLX we get:
- GPU1 with 16x to the PLX
- GPU2 with 16x to the PLX
- PLX with 16x to the CPU
This is great for games, where the GPUs have to talk to each other and can do so at full 16x bandwidth through the PLX chip, without needing to go through the CPU. For BOINC / GP-GPU it is less useful, because each GPU is only talking to the CPU, which in both cases is through a 16x connection.
There's still some benefit, though: no decent GP-GPU app requires PCIe bandwidth all the time. So both GPUs can get access to the full 16x bandwidth, if the other one currently doesn't need it. The load balancing is better.
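As a crude illustration of that sharing, here's a toy model (it assumes an ideal switch with zero overhead, which real PLX chips are not, so the numbers are only indicative):

    # Toy model: two GPUs behind a PLX switch with a 16x uplink
    # to the CPU. Assumes an ideal switch with zero overhead.
    UPLINK_LANES = 16

    def effective_lanes(active_gpus):
        # lanes' worth of bandwidth each transferring GPU gets when
        # 'active_gpus' of them hit the shared uplink at the same time
        return UPLINK_LANES / max(active_gpus, 1)

    print(effective_lanes(1))  # 16.0 -> a lone transfer bursts at full 16x
    print(effective_lanes(2))  # 8.0  -> simultaneous transfers share the uplink

Compare that to a plain 8x / 8x board: there a lone GPU still only gets 8x, even while the other one is idle. That is exactly the load balancing benefit.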
I can't give you any exact numbers, though, on how this impacts Einstein performance. What's for sure is that PCIe 4 is still some time away, and socket 2011-3 boards and CPUs are expensive. To support GPUs it might actually be better to get two MSI "Eco" mainboards with cheap CPUs (with 16 PCIe 3 lanes each) than to try to build one high-performance system.
MrS
Scanning for our furry friends since Jan 2002
what i should have mentioned
what i should have mentioned is that while a PLX can help for games, i don't think it does much for Einstein.
i haven't been on the project full time, but i have one rig with a single 290x and it can probably output 65-70k daily.
my other rig is running a single 290x and a single 295x2 (which is just two 290xs on one card), but it only outputs ~140k daily. it's an asus maximus vi extreme with a PLX chip, but since i'm only using two slots the PLX doesn't apply here (8x 8x). the 295x2 has a PLX chip on board that dilutes the 8x into 4x 4x.
i'm only running one wu at a time per gpu. on the first rig with 16x i was completing units in less than an hour until i installed Omega, so now it's a little bit over an hour.
on the second rig the 290x, with a slight factory overclock, finishes a unit on 8x in 1:15, but the 295x2 on 8x takes 1:46 per unit.
the drop from 16x to 8x doesn't seem so bad, but the drop to 4x starts to really hurt the times.
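turning my times above into daily throughput (calling the 16x time ~65 min and ignoring that the cards aren't clocked identically, so this is only a rough comparison):

    # my rough per-WU runtimes from above, in minutes, one WU per GPU
    runtimes_min = {"16x": 65, "8x": 75, "4x (295x2)": 106}

    base = runtimes_min["16x"]
    for slot, t in runtimes_min.items():
        print(f"{slot}: {t} min/WU, ~{24 * 60 / t:.1f} WU/day, "
              f"+{100 * (t - base) / base:.0f}% time vs 16x")

so 8x costs me roughly 15% per unit, but 4x costs over 60%.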
For Arecibo tasks you need to run at least 2 WUs per GPU
For Arecibo tasks you need to run at least 2 WUs per GPU on any half-decent card. Even my HD4000 iGPU benefits significantly from 2 WUs in parallel. Your high-end cards might be happier running even more.
The reason I say this: with only 1 concurrent WU the GPU stalls as soon as anything gets in the way of crunching, be it PCIe transfers, CPU support or GPU memory access. With more concurrent WUs there can be better load balancing, i.e. if the GPU is waiting for something to continue one WU, another can take over. It doesn't work as fine-grained as on CPUs yet, but there's certainly a huge net benefit for high-end cards.
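If anyone wants to try it: running 2 WUs per GPU only takes an app_config.xml in the Einstein@Home project directory, along these lines (the app name below is my guess for the Arecibo BRP search; check client_state.xml on your host for the exact name):

    <app_config>
      <app>
        <name>einsteinbinary_BRP4G</name> <!-- guessed name, verify in client_state.xml -->
        <gpu_versions>
          <gpu_usage>0.5</gpu_usage> <!-- 0.5 GPU per task = 2 tasks per GPU -->
          <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

Then tell BOINC to re-read its config files (or restart the client) and it will start two tasks per GPU.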
BTW: the top hosts with 2 Tahitis (smaller than your Hawaiis) achieve about 120k RAC per GPU, using i7 4770K as hosts, i.e. with 8x PCIe 3 or with a PLX.
MrS
Scanning for our furry friends since Jan 2002
I was more interested in the comparison
I was more interested in the comparison between single and triple GPU than in trying to achieve maximum output. The benefit of lower GPU utilization is lower power/heat.