All things Amd GPU

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3983

Credit: 47421722642

RAC: 62430029

19 Jul 2024 21:57:48 UTC

Message 227005

I remain skeptical that an MI100 would be even twice as fast for O3AS. The spec sheets show a theoretical FP32 advantage of only ~1.75x (FP64 doesn’t really matter for O3AS, and memory bandwidth is only ~20% better), and the app is heavily dependent on CPU speed. The CPU portion won’t speed up at all.

I would have loved to try one out via a Vast rental, but none are available.
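
The point about the CPU portion not speeding up can be put in Amdahl's-law terms. The sketch below is only an illustration: the GPU-bound fractions are hypothetical placeholders, not measured O3AS values, and the 1.75x figure is the theoretical FP32 ratio mentioned above.

```python
def overall_speedup(gpu_fraction: float, gpu_speedup: float) -> float:
    """Amdahl's law: only the GPU-bound share of the runtime gets faster.

    gpu_fraction is the share of task runtime spent on the GPU (0..1).
    gpu_speedup is how much faster the new GPU runs that share.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    return 1.0 / ((1.0 - gpu_fraction) + gpu_fraction / gpu_speedup)


if __name__ == "__main__":
    # Hypothetical GPU-bound fractions; the real O3AS split is not known here.
    for fraction in (0.5, 0.7, 0.9, 1.0):
        speedup = overall_speedup(fraction, 1.75)
        print(f"{fraction:.0%} GPU-bound -> {speedup:.2f}x overall")
```

Under these assumptions the overall gain only reaches 1.75x if the task is entirely GPU-bound. Running several tasks concurrently can hide part of the CPU time, so this simple model is a rough guide rather than a prediction.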

_________________________________________________________________________

pututu

Joined: 6 Apr 17

Posts: 63

Credit: 653417392

RAC: 2

20 Jul 2024 19:14:52 UTC

Message 227025

Nothing beats actual performance numbers; until we have them, all we have are our best educated guesses based on published specifications and how the O3AS compute works. I still remain optimistic that if the MI100 can get 1.75x the performance of the VII, it will still do 5M PPD, or about 500 O3AS tasks per day. As for the CPU, pretty much all the consumer/prosumer CPUs released over the past 5 years can easily do 4GHz+, either at stock or with overclocking.
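
The arithmetic behind that estimate can be written out as a quick check. This is a sketch that uses only the figures quoted in the post (5M PPD, 500 tasks per day, 1.75x over the VII); the credit per task and the VII baseline are derived from those figures, not measured.

```python
TARGET_PPD = 5_000_000          # points per day quoted in the post
TARGET_TASKS_PER_DAY = 500      # tasks per day quoted in the post
ASSUMED_SPEEDUP = 1.75          # hoped-for MI100 gain over the Radeon VII

# Credit per task implied by the two targets above (not an official value).
CREDIT_PER_TASK = TARGET_PPD / TARGET_TASKS_PER_DAY


def implied_baseline(target: float, speedup: float) -> float:
    """Baseline rate a slower card must already reach for the target to hold."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return target / speedup


if __name__ == "__main__":
    vii_tasks = implied_baseline(TARGET_TASKS_PER_DAY, ASSUMED_SPEEDUP)
    vii_ppd = implied_baseline(TARGET_PPD, ASSUMED_SPEEDUP)
    print(f"Implied credit per task: {CREDIT_PER_TASK:,.0f}")
    print(f"Implied VII baseline: {vii_tasks:.0f} tasks/day, {vii_ppd:,.0f} PPD")
```

In other words, the 5M PPD target only holds if a VII already manages roughly 286 tasks per day, and if the full 1.75x carries over to O3AS.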

Tom M

Joined: 2 Feb 06

Posts: 6483

Credit: 9608607016

RAC: 5516543

20 Jul 2024 22:19:44 UTC

Message 227028 in response to message 227025

Looks like there are 4 available on eBay for $699 or best offer.

A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!

pututu

Joined: 6 Apr 17

Posts: 63

Credit: 653417392

RAC: 2

21 Jul 2024 0:43:00 UTC

Message 227029

AMD's HPC presentation slides at NASA Ames Research Center include a notable mention of Instinct MI100 performance. As this is a first-party presentation, take it with a grain of salt, as always.

I didn't go through all the slides carefully, but on slide #13 AMD claims the MI100 is faster than the V100 in computational simulation workloads.

On slide #23, AMD claims that its HIP/hipcc compiler can match the CUDA compiler in runtime performance. On slide #22, AMD talks about seamless porting from CUDA to HIP APIs. Not relevant here, but worth mentioning that AMD plans to take a slice out of CUDA's dominance.

The end notes on slide #33 describe the test conditions and assumptions used.

tictoc

Joined: 1 Jan 13

Posts: 45

Credit: 7268031987

RAC: 7072309

21 Jul 2024 2:38:13 UTC

Message 227032 in response to message 227029

pututu wrote:

AMD's HPC presentation slides at NASA Ames Research Center include a notable mention of Instinct MI100 performance. As this is a first-party presentation, take it with a grain of salt, as always.

I didn't go through all the slides carefully, but on slide #13 AMD claims the MI100 is faster than the V100 in computational simulation workloads.

On slide #23, AMD claims that its HIP/hipcc compiler can match the CUDA compiler in runtime performance. On slide #22, AMD talks about seamless porting from CUDA to HIP APIs. Not relevant here, but worth mentioning that AMD plans to take a slice out of CUDA's dominance.

The end notes on slide #33 describe the test conditions and assumptions used.

There are a number of GPU compute projects (non-BOINC) that have successfully ported CUDA apps over to HIP. AMD has some tools that can be used for converting CUDA source to HIP. https://gpuopen.com/learn/amd-lab-notes/amd-lab-notes-hipify-readme/ Additionally, HIP code can run on both AMD and NVIDIA GPUs.

I've yet to see any BOINC projects look at using HIP for AMD GPUs. That pretty much boils down to the fact that the full ROCm and HIP stack is only available on Linux, and for BOINC projects, AMD GPU users running BOINC on Linux are a niche of a niche. Also, the ecosystem around ROCm, while much better than it was just a few years ago, is still in its infancy compared to CUDA (SDK released in 2007).

The only DC-related project where I've seen any HIP work is Folding@Home. Folding@Home does not currently have a HIP application, but the code that runs the GPU work (OpenMM) does have a plugin for running on AMD GPUs in Linux via HIP. For OpenMM, the improvement over the OpenCL implementation is massive. OpenMM HIP plugin maintained by AMD: https://github.com/amd/openmm-hip . The original plugin repo, with some OpenCL-HIP comparison tests on consumer hardware: https://github.com/StreamHPC/openmm-hip-old/issues/1

tictoc

Joined: 1 Jan 13

Posts: 45

Credit: 7268031987

RAC: 7072309

24 Jul 2024 3:40:58 UTC

Message 227086 in response to message 227025

pututu wrote:

Nothing beats actual performance numbers; until we have them, all we have are our best educated guesses based on published specifications and how the O3AS compute works. I still remain optimistic that if the MI100 can get 1.75x the performance of the VII, it will still do 5M PPD, or about 500 O3AS tasks per day. As for the CPU, pretty much all the consumer/prosumer CPUs released over the past 5 years can easily do 4GHz+, either at stock or with overclocking.

Back in 2021, when I had a handful of MI100s to test, we did see about a 2x performance improvement over the MI50, but that was with a primarily FP64 simulation workload in a 4-GPU cluster. That is not at all comparable to what we are running here. Single-GPU performance was around 1.2x-1.6x, depending on the workload.

For anyone who is interested, I just put an MI100 on my test bench. Once I have everything set up, I'll post some results.

Tom M

Joined: 2 Feb 06

Posts: 6483

Credit: 9608607016

RAC: 5516543

25 Jul 2024 19:44:41 UTC

Message 227099 in response to message 227086

Tictoc,

I am definitely interested in your MI100 results.

###edit###

Your computers are hidden. Could you post a link?

TY.

tictoc

Joined: 1 Jan 13

Posts: 45

Credit: 7268031987

RAC: 7072309

26 Jul 2024 1:24:29 UTC

Message 227138

Sorry, I forgot I changed that. They should be visible now. System: https://einsteinathome.org/host/13191242 Tasks: https://einsteinathome.org/host/13191242/tasks/0/0

Initial results are about what I expected: roughly in line with a Radeon VII. I haven't dialed anything in yet, and the MI100 was temperature throttling (HBM) under load. Results are pretty sloppy, and not really a good representation of what the ultimate performance can be. I'll sort out the cooling issues and post some proper results over the weekend. The shroud I printed plus an 80mm 3k RPM fan is not up to the task of cooling over 200W on my test bench. Tasks in the results are running 6x with the GPU power capped at 215W, and the rest of the system specs are below.

OS: Arch Linux | Kernel: 6.10 | ROCm: 6.1.2 | CPU: AMD Threadripper 7960X locked @ 5GHz all-core.

Tom M

Joined: 2 Feb 06

Posts: 6483

Credit: 9608607016

RAC: 5516543

26 Jul 2024 2:11:07 UTC

Message 227139 in response to message 227138

It also looks like you're using a lot less CPU time than run time.

pututu

Joined: 6 Apr 17

Posts: 63

Credit: 653417392

RAC: 2

26 Jul 2024 3:14:35 UTC

Message 227141 in response to message 227138

tictoc wrote:

Sorry, I forgot I changed that. They should be visible now. System: https://einsteinathome.org/host/13191242 Tasks: https://einsteinathome.org/host/13191242/tasks/0/0

Initial results are about what I expected: roughly in line with a Radeon VII. I haven't dialed anything in yet, and the MI100 was temperature throttling (HBM) under load. Results are pretty sloppy, and not really a good representation of what the ultimate performance can be. I'll sort out the cooling issues and post some proper results over the weekend. The shroud I printed plus an 80mm 3k RPM fan is not up to the task of cooling over 200W on my test bench. Tasks in the results are running 6x with the GPU power capped at 215W, and the rest of the system specs are below.

OS: Arch Linux | Kernel: 6.10 | ROCm: 6.1.2 | CPU: AMD Threadripper 7960X locked @ 5GHz all-core.

For my P100 with a passive heatsink, I use a push-pull fan configuration. The card runs with the power limit in the 140 to 150W range that I usually use. With this setup, which I posted on the STH site, I didn't get much of a temperature drop, probably around 2-3°C IIRC, but it is good enough for my use case (trying to run in the low seventies °C and "quiet"; the push fan ran at around 2200 rpm without the jet engine noise). The pull/exhaust fan is just a regular low-rpm fan, so I zip-tied two fans together to increase the static pressure a bit. It would probably be better with a true high-rpm server fan. The post above me uses a blower fan (Nidec gamma 30) on the V100, and I read somewhere that a similar setup will be loud.

BTW, can the MI100 be power limited with corectrl or by some other means?

As a reference, I ran 6 tasks concurrently on the VII. I do see a slight gain going from 5 to 6 tasks. The MI100 should, in theory, be able to do more.
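
For comparing concurrency settings, daily throughput can be estimated from the number of concurrent tasks and the wall-clock time each task takes at that setting. A minimal sketch, assuming all concurrent tasks take about the same time; the function names are made up for illustration, and the only inputs taken from this thread are the 6x concurrency and the 500 tasks per day target.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(concurrency: int, runtime_s: float) -> float:
    """Tasks finished per day when `concurrency` tasks each take `runtime_s`."""
    if concurrency <= 0 or runtime_s <= 0:
        raise ValueError("concurrency and runtime_s must be positive")
    return concurrency * SECONDS_PER_DAY / runtime_s


def runtime_needed(concurrency: int, target_tasks_per_day: float) -> float:
    """Per-task wall-clock seconds required to hit a daily task target."""
    if concurrency <= 0 or target_tasks_per_day <= 0:
        raise ValueError("concurrency and target must be positive")
    return concurrency * SECONDS_PER_DAY / target_tasks_per_day


if __name__ == "__main__":
    needed = runtime_needed(6, 500)
    print(f"At 6x, each task must finish in about {needed:.0f} s for 500/day")
```

Going from 5x to 6x only pays off if the per-task runtime grows by less than 20%, which is one way to check whether the slight gain is real.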
