All things AMD GPU

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3983
Credit: 47421722642
RAC: 62430029

I remain skeptical that an MI100 would be even twice as fast for O3AS. The published specs suggest a theoretical FP32 speedup of ~1.75x (FP64 doesn’t really matter for O3AS, and memory bandwidth is only ~20% better), but O3AS is so dependent on CPU speed, and the CPU portion won’t speed up at all.

I would have loved to try one out via a Vast.ai rental, but none are available.
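
The CPU-bound argument above can be sketched with Amdahl's law. This is a minimal Python illustration, not a measurement: the thread does not say how O3AS runtime splits between GPU and CPU, so the `gpu_fraction` values below are hypothetical placeholders, and `overall_speedup` is a name made up for this sketch. Only the 1.75x FP32 figure comes from the post.

```python
def overall_speedup(gpu_fraction: float, gpu_speedup: float) -> float:
    """Amdahl's law: only the GPU-bound share of the runtime gets faster."""
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    cpu_fraction = 1.0 - gpu_fraction  # this share does not speed up at all
    return 1.0 / (cpu_fraction + gpu_fraction / gpu_speedup)


if __name__ == "__main__":
    # Hypothetical splits; the real O3AS GPU/CPU split is not given in the thread.
    for gpu_fraction in (0.5, 0.7, 0.9, 1.0):
        speedup = overall_speedup(gpu_fraction, 1.75)
        print(f"{gpu_fraction:.0%} GPU-bound -> {speedup:.2f}x overall")
```

Whatever the real split is, the overall gain can only reach the full 1.75x if the task is entirely GPU-bound; any fixed CPU share pulls it lower.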

_________________________________________________________________________

pututu
Joined: 6 Apr 17
Posts: 63
Credit: 653417392
RAC: 2

Nothing beats actual performance numbers; until we have them, all we have are our best educated guesses based on published specifications and how the O3AS computation works. I remain optimistic that if the MI100 can get 1.75x the performance of the VII, it will still do 5M PPD, or complete 500 O3AS tasks per day. As for the CPU, pretty much all consumer/prosumer CPUs released over the past 5 years can easily do 4GHz+, either at stock or with overclocking.
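
The figures in this post imply a few numbers that are easy to back out. The Python sketch below assumes that 5M PPD and 500 tasks per day describe the same MI100 running at the hoped-for 1.75x over the VII. The function name `implied_figures` is made up for this illustration, and the derived baseline values are arithmetic consequences of that assumption, not measured results.

```python
SECONDS_PER_DAY = 86_400


def implied_figures(target_ppd: float, target_tasks_per_day: float, speedup: float) -> dict:
    """Back out per-task credit and the baseline card's implied daily output."""
    if target_tasks_per_day <= 0 or speedup <= 0:
        raise ValueError("target_tasks_per_day and speedup must be positive")
    return {
        # Credit awarded per task if both targets describe the same card.
        "credits_per_task": target_ppd / target_tasks_per_day,
        # What the baseline card (here the VII) would have to be doing today.
        "baseline_ppd": target_ppd / speedup,
        "baseline_tasks_per_day": target_tasks_per_day / speedup,
        # Average wall-clock interval between completed tasks at the target rate.
        "seconds_per_completed_task": SECONDS_PER_DAY / target_tasks_per_day,
    }


if __name__ == "__main__":
    for name, value in implied_figures(5_000_000, 500, 1.75).items():
        print(f"{name}: {value:,.1f}")
```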

 

Tom M
Joined: 2 Feb 06
Posts: 6483
Credit: 9608607016
RAC: 5516543

Looks like there are 4 available on eBay for $699 or best offer.

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

pututu
Joined: 6 Apr 17
Posts: 63
Credit: 653417392
RAC: 2

AMD gave an HPC presentation at NASA Ames Research Center, and the slides make notable mention of Instinct MI100 performance. Since this is a first-party presentation, take it with a grain of salt, as always.

I didn't go through the whole deck carefully, but on slide #13 AMD claims the MI100 is faster than the V100 in computational simulation workloads.

On slide #23, AMD claims that its HIP compiler (hipcc) can match the CUDA compiler in runtime performance. On slide #22, AMD talks about seamless porting from CUDA to the HIP APIs. Not relevant here, but worth mentioning: AMD plans to take a slice out of CUDA's dominance.

The end notes on slide #33 describe the test conditions and assumptions used.

tictoc
Joined: 1 Jan 13
Posts: 45
Credit: 7268031987
RAC: 7072309

pututu wrote:

AMD gave an HPC presentation at NASA Ames Research Center, and the slides make notable mention of Instinct MI100 performance. Since this is a first-party presentation, take it with a grain of salt, as always.

I didn't go through the whole deck carefully, but on slide #13 AMD claims the MI100 is faster than the V100 in computational simulation workloads.

On slide #23, AMD claims that its HIP compiler (hipcc) can match the CUDA compiler in runtime performance. On slide #22, AMD talks about seamless porting from CUDA to the HIP APIs. Not relevant here, but worth mentioning: AMD plans to take a slice out of CUDA's dominance.

The end notes on slide #33 describe the test conditions and assumptions used.

There are a number of GPU compute projects (non-BOINC) that have successfully ported CUDA apps over to HIP. AMD has tools (HIPIFY) for converting CUDA source to HIP: https://gpuopen.com/learn/amd-lab-notes/amd-lab-notes-hipify-readme/ Additionally, HIP code can run on both AMD and NVIDIA GPUs.

I've yet to see any BOINC project look at using HIP for AMD GPUs. That pretty much boils down to the fact that the full ROCm and HIP stack is only available on Linux, and for BOINC projects, AMD GPU users running BOINC on Linux are a niche of a niche. Also, the ecosystem around ROCm, while much better than it was just a few years ago, is still in its infancy compared to CUDA (SDK released in 2007).

The only DC-related project where I've seen any HIP-related work is Folding@Home. Folding@Home does not currently have a HIP application, but the code that runs the GPU work (OpenMM) does have a plugin for running on AMD GPUs under Linux via HIP. For OpenMM, the improvement over the OpenCL implementation is massive. OpenMM HIP plugin maintained by AMD: https://github.com/amd/openmm-hip . The original plugin repo, with some OpenCL vs. HIP comparison tests on consumer hardware: https://github.com/StreamHPC/openmm-hip-old/issues/1

tictoc
Joined: 1 Jan 13
Posts: 45
Credit: 7268031987
RAC: 7072309

pututu wrote:

Nothing beats actual performance numbers; until we have them, all we have are our best educated guesses based on published specifications and how the O3AS computation works. I remain optimistic that if the MI100 can get 1.75x the performance of the VII, it will still do 5M PPD, or complete 500 O3AS tasks per day. As for the CPU, pretty much all consumer/prosumer CPUs released over the past 5 years can easily do 4GHz+, either at stock or with overclocking.

Back in 2021, when I had a handful of MI100s to test, we did see about a 2x performance improvement over the MI50, but that was with a primarily FP64 simulation workload in a 4-GPU cluster. That is not at all comparable to what we are running here. Single-GPU performance was around 1.2x-1.6x, depending on the workload.

For anyone who is interested, I just put an MI100 on my test bench. Once I have everything set up, I'll post some results.

Tom M
Joined: 2 Feb 06
Posts: 6483
Credit: 9608607016
RAC: 5516543

Tictoc,

I am definitely interested in your MI100 results.

###edit###

Your computers are hidden. Could you post a link?

TY.


tictoc
Joined: 1 Jan 13
Posts: 45
Credit: 7268031987
RAC: 7072309

Sorry, I forgot I changed that. They should be visible now.

System: https://einsteinathome.org/host/13191242
Tasks: https://einsteinathome.org/host/13191242/tasks/0/0

Initial results are about what I expected: roughly in line with a Radeon VII. I haven't dialed anything in yet, and the MI100 was thermal throttling (HBM) under load. The results are pretty sloppy and not a good representation of what the ultimate performance can be. I'll sort out the cooling issues and post some proper results over the weekend. The shroud I printed plus an 80mm 3k RPM fan is not up to the task of cooling over 200W on my test bench. The tasks in the results were running 6x (six concurrent tasks) with the GPU power capped at 215W; the rest of the system specs are below.

OS: Arch Linux | Kernel: 6.10 | ROCm: 6.1.2 | CPU: AMD Threadripper 7960X locked @ 5GHz all-core.

 

Tom M
Joined: 2 Feb 06
Posts: 6483
Credit: 9608607016
RAC: 5516543

It also looks like you're using a lot less CPU time than run time.


pututu
Joined: 6 Apr 17
Posts: 63
Credit: 653417392
RAC: 2

tictoc wrote:

Sorry, I forgot I changed that. They should be visible now.

System: https://einsteinathome.org/host/13191242
Tasks: https://einsteinathome.org/host/13191242/tasks/0/0

Initial results are about what I expected: roughly in line with a Radeon VII. I haven't dialed anything in yet, and the MI100 was thermal throttling (HBM) under load. The results are pretty sloppy and not a good representation of what the ultimate performance can be. I'll sort out the cooling issues and post some proper results over the weekend. The shroud I printed plus an 80mm 3k RPM fan is not up to the task of cooling over 200W on my test bench. The tasks in the results were running 6x (six concurrent tasks) with the GPU power capped at 215W; the rest of the system specs are below.

OS: Arch Linux | Kernel: 6.10 | ROCm: 6.1.2 | CPU: AMD Threadripper 7960X locked @ 5GHz all-core.

For my P100 with its passive heatsink, I use a push-pull fan configuration. I usually run the card with a power limit in the 140 to 150W range. With this setup, which I posted on the STH site, I didn't get much of a temperature drop, probably around 2-3°C IIRC, but that is good enough for my use case (trying to run in the low seventies °C and "quiet"; the push fan ran at about 2200 rpm without the jet-engine noise). The pull/exhaust fan is just a regular low-rpm fan, so I zip-tied two fans together to increase the static pressure a bit. It would probably do better with a true high-rpm server fan. The post above mine uses a blower fan (Nidec gamma 30) on a V100, and I read somewhere that a similar setup will be loud.

BTW, can the MI100 be power limited with CoreCtrl or by some other means?

For reference, I run 6 tasks concurrently on the VII, and I do see a slight gain going from 5 to 6 tasks. The MI100 should, in theory, be able to handle more.
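
Whether a higher task count actually helps comes down to simple arithmetic: running more tasks at once only pays off if the per-task run time grows by less than the concurrency does. Below is a minimal Python sketch of that comparison. The function names are made up for this illustration, and the run times in the example are hypothetical placeholders, not results from any host in this thread.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(concurrent_tasks: int, runtime_seconds: float) -> float:
    """Daily throughput when `concurrent_tasks` run at once, each taking `runtime_seconds`."""
    if concurrent_tasks <= 0 or runtime_seconds <= 0:
        raise ValueError("concurrent_tasks and runtime_seconds must be positive")
    return concurrent_tasks * SECONDS_PER_DAY / runtime_seconds


def relative_gain(before: float, after: float) -> float:
    """Fractional throughput change, e.g. 0.03 means 3% more tasks per day."""
    return after / before - 1.0


if __name__ == "__main__":
    # Hypothetical run times: 5x at 1500 s per task vs. 6x at 1750 s per task.
    five_up = tasks_per_day(5, 1500.0)
    six_up = tasks_per_day(6, 1750.0)
    print(f"5x: {five_up:.1f} tasks/day, 6x: {six_up:.1f} tasks/day")
    print(f"gain from 5x to 6x: {relative_gain(five_up, six_up):.1%}")
```

Plugging in the run times reported on a host's task page for each concurrency setting gives a like-for-like comparison between cards.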
