CPU utilization while running GPU OpenCL applications is a problem on NVidia as well as Intel. The application uses only asynchronous kernel and memory-copy functions, but the driver "overrides" that and decides to block anyway. One "hack" suggested by NVidia was to allocate a large number of command queues so that the driver would actually take the asynchronous path, but that has not worked for me. If it weren't driver related, why would the same application work fine on AMD GPUs? So, write your local Intel developer and ask them why their driver needs CPU time to run asynchronous OpenCL kernels.
You would think it would work properly with NVidia GPUs since OpenCL came from them. Nope. NVidia can't even follow their own specs properly. And if they can't, why should Intel? AMD isn't off the hook though: while their driver does async properly, it won't always compile the app properly. Maybe NVidia, Intel and AMD should be renamed to Larry, Moe, and Curly!
LoL, well written: http://boinc.thesonntags.com/collatz/forum_thread.php?id=1019&postid=16833
At least I'm not alone in this boat :)
Unfortunately, he has absolutely no idea what that "kernels per reduction" setting does in his app, so it's hard to comment on how it would change the load.
Regarding many queues: this didn't work for me either. I tried creating something like 20 queues, and no change in app behavior was detected. Unlike Slicker, I use a simpler approach most of the time, synchronous launches rather than async ones... but it looks like going async will not help either.
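For reference, the multi-queue workaround I tried looks roughly like this (a sketch only; the context, device and kernel are assumed created elsewhere, and the queue count and work size are just for illustration):

```c
/* Sketch of the "many command queues" workaround: spread asynchronous
 * launches across several queues and synchronize once via events,
 * instead of a blocking call after every launch. Error checking omitted. */
#include <CL/cl.h>

#define NUM_QUEUES 20

static void launch_across_queues(cl_context ctx, cl_device_id dev,
                                 cl_kernel kernel)
{
    cl_command_queue queues[NUM_QUEUES];
    cl_event         events[NUM_QUEUES];
    size_t           gsize = 65536;  /* global work size, illustration only */
    cl_int           err;

    for (int i = 0; i < NUM_QUEUES; ++i)
        queues[i] = clCreateCommandQueue(ctx, dev, 0, &err);

    /* Each enqueue is asynchronous per the spec; none of these calls
     * should block the host. */
    for (int i = 0; i < NUM_QUEUES; ++i)
        clEnqueueNDRangeKernel(queues[i], kernel, 1, NULL, &gsize,
                               NULL, 0, NULL, &events[i]);

    /* One wait point at the end; whether the host sleeps here or
     * spin-polls (burning a CPU core) is up to the driver. */
    clWaitForEvents(NUM_QUEUES, events);

    for (int i = 0; i < NUM_QUEUES; ++i) {
        clReleaseEvent(events[i]);
        clReleaseCommandQueue(queues[i]);
    }
}
```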
Since we're starting to share Intel OpenCL experience here, I would like to discuss another issue I have with the SETI iGPU AP: loss of precision.
It looks like this app produces slightly different results and more often leads to inconclusives. So far I have tracked it down to an FFT call whose output has ALL values slightly bigger than the reference array (the AMD build was used as the reference; it validates against the stock CPU app most of the time). Such a non-random deviation to one side will surely lead to a deviation in the final result, but where this systematic shift first appears is not quite clear.
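One way to tell such a one-sided shift from ordinary rounding noise is to count the sign of the per-element deviation; a quick sketch (array names are hypothetical):

```c
/* Sketch: distinguish a systematic one-sided shift from random rounding
 * noise by counting deviation signs between a test FFT output and a
 * reference output. Array names are hypothetical. */
#include <stdio.h>

static void check_bias(const float *test, const float *ref, size_t n)
{
    size_t bigger = 0, smaller = 0;
    double mean_dev = 0.0;

    for (size_t i = 0; i < n; ++i) {
        double d = (double)test[i] - (double)ref[i];
        mean_dev += d / (double)n;
        if (d > 0.0) ++bigger;
        else if (d < 0.0) ++smaller;
    }
    /* Rounding noise: bigger ~ smaller and mean_dev ~ 0.
     * Systematic shift: almost all deviations share one sign. */
    printf("bigger: %zu  smaller: %zu  mean deviation: %g\n",
           bigger, smaller, mean_dev);
}
```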
What about Einstein's Intel app validation? Any issues compared with the same app on AMD/NV?
Any issue like that would (should) be reported in the BRP4 Intel GPU app feedback thread in Problems and Bug Reports.
A quick search through it doesn't reveal any validation problems, except when ETA deliberately lowered the chip voltage and went too far.
My Haswell (154,193 credits in a month, so at least 2,500 completed tasks) is showing zero errors and zero invalids at the moment.
Quote:
Since we're starting to share Intel OpenCL experience here, I would like to discuss another issue I have with the SETI iGPU AP.
Hm, I think any deeper discussions should move to (a) separate thread(s).
Quote:
Loss of precision.
It looks like this app produces slightly different results and more often leads to inconclusives. So far I have tracked it down to an FFT call whose output has ALL values slightly bigger than the reference array (the AMD build was used as the reference; it validates against the stock CPU app most of the time). Such a non-random deviation to one side will surely lead to a deviation in the final result, but where this systematic shift first appears is not quite clear.
Are you sure AMD's FFT is portable to other vendors' GPUs? Also, as far as I know, they generate different kernel implementations (which you can then dump) based on the FFT setup and potentially even the hardware. We use a customised version of Apple's reference FFT, originally designed for NV's G80 architecture. You can get our version here.
Quote:
What about Einstein's Intel app validation? Any issues compared with the same app on AMD/NV?
If anything, the Intel GPUs are even more stable in terms of validation than the AMD GPUs. Our Intel tasks exhibit less than 0.1% validation issues, which is about the level for CPUs. NVIDIA GPUs are solely covered by CUDA in our case, so there's no point in comparing them to the OpenCL tasks. FYI, we build all our OpenCL apps using AMD's APP SDK.
More precisely: SETI AP is an OpenCL 1.0 app. It can run under all OpenCL drivers, starting from the very first ones (it appeared quite long ago, right when AMD started to implement OpenCL on their GPUs).
We didn't care about the 1.0 models as they were so slow that even a contemporary CPU core was faster :-)
Quote:
If Einstein's app is a true OpenCL 1.1 one, it can use different methods for host-GPU synching (events). So I wonder, is that the case or not?
Nope, nothing fancy there. You may have a look at the source code (binary radio pulsar search application: src/opencl/app) to see for yourself (I know, not the best design). It's not the very latest version, but there's been only one (irrelevant) functional OpenCL change since then.
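For readers wondering what the "events" in the question refer to: OpenCL 1.1 lets the host register a completion callback instead of blocking. A minimal sketch, assuming the queue, kernel and work size already exist:

```c
/* Sketch of OpenCL 1.1 event-based host/GPU synching: register a
 * completion callback instead of blocking in clFinish()/clWaitForEvents().
 * queue, kernel and gsize are assumed to exist already. */
#include <CL/cl.h>
#include <stdio.h>

static void CL_CALLBACK on_done(cl_event ev, cl_int status, void *user_data)
{
    (void)ev; (void)user_data;
    printf("kernel finished with status %d\n", status);
}

static void launch_with_callback(cl_command_queue queue, cl_kernel kernel,
                                 size_t gsize)
{
    cl_event ev;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gsize, NULL,
                           0, NULL, &ev);
    clSetEventCallback(ev, CL_COMPLETE, on_done, NULL);
    clFlush(queue);      /* submit the work; the host thread stays free */
    clReleaseEvent(ev);  /* drop our reference; the runtime keeps the
                            event alive until the command completes */
}
```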
So I'm interested in the typical kernel launch size for the Einstein app. I'm not aware whether any profiling tools exist for Intel, but as both our apps are capable of running on all 3 GPU types, profiling on NV, for example (or on ATI), would be quite enough for my purpose. I have rich profiling data on ATI GPUs for comparison.
As stated in a previous post, NVIDIA is out of the comparison as we only use CUDA on those devices (for the time being). While the algorithm is more or less the same, the CUDA app itself is only roughly comparable to the OpenCL app as it has some natural differences and, most importantly, uses a different FFT implementation (CUFFT).
While we did use NVIDIA's CUDA profiler at some point, we didn't yet get around to using the AMD profiler because of technical constraints (cross-compiling for Windows, headless Linux nodes). Not sure I can find the time right now to try this again with the latest tools like CodeXL, though...
Quote:
Also, are Einstein's app sources available, and if so, where?
See my previous post :-) You should be able to build the app following the link above and the (basic) instructions provided on that page. You may then give it a try and profile it on your hardware, for a direct comparison.
Quote:
Are you sure AMD's FFT is portable to other vendors' GPUs? Also, as far as I know, they generate different kernel implementations (which you can then dump) based on the FFT setup and potentially even the hardware. We use a customised version of Apple's reference FFT, originally designed for NV's G80 architecture. You can get our version here.
Actually, I use a modded oclFFT (Apple's implementation) too. I just meant "AMD reference" in the sense that the results came from the app running on an AMD GPU, not on Intel. Both apps use oclFFT, not an AMD-specific FFT.
Quote:
FYI, we build all our OpenCL apps using AMD's APP SDK.
Thanks. That differs from my approach: I build the app for a particular GPU rather than with a particular vendor's SDK.
Quote:
We didn't care about the 1.0 models as they were so slow that even a contemporary CPU core was faster :-)
Well, the HD4870 was fast enough even when used via Brook+ ;)
Quote:
You may have a look at the source code (binary radio pulsar search application: src/opencl/app) to see for yourself (I know, not the best design). It's not the very latest version, but there's been only one (irrelevant) functional OpenCL change since then.
Thanks, will look.
PS: If I understood right, the Intel GPU binary I have on my system could run on an NV GPU too without modification, right?
Is there any bench package for offline testing? AFAIK Einstein uses a lot of additional data files, so offline testing could be a big issue...
Quote:
Actually, I use a modded oclFFT (Apple's implementation) too. I just meant "AMD reference" in the sense that the results came from the app running on an AMD GPU, not on Intel. Both apps use oclFFT, not an AMD-specific FFT.
Oh, in that case I recommend you try our version. Apple's implementation had some issues on a few AMD Radeon GPUs (the 6900 series, IIRC) because of its use of the faster but less accurate native_sin/native_cos functions. Have a look at the commit log (starting at 48a3c01) to get an idea.
I know, it's unrelated to the CPU usage issue, but it might help with your potential validation problems.
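To illustrate the kind of change involved: OpenCL's native_* math functions have implementation-defined precision. A simplified, hypothetical twiddle-factor kernel (not the actual oclFFT code):

```c
/* OpenCL C illustration of the native_sin/native_cos trade-off in a
 * twiddle-factor computation. Simplified, hypothetical kernel; the
 * real oclFFT code differs. */
__kernel void twiddle(__global float2 *w, const float scale)
{
    int i = get_global_id(0);
    float angle = scale * (float)i;
#ifdef USE_NATIVE_MATH
    /* fast, but precision is implementation-defined */
    w[i] = (float2)(native_cos(angle), native_sin(angle));
#else
    /* slower, but accurate within the guaranteed ULP bounds */
    w[i] = (float2)(cos(angle), sin(angle));
#endif
}
```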
Quote:
PS: If I understood right, the Intel GPU binary I have on my system could run on an NV GPU too without modification, right?
Well, in principle yes. However, this OpenCL app never really worked on NVIDIA GPUs, primarily in terms of validation - even when we used their SDK to build it. NVIDIA's support of OpenCL is a dead end anyway.
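The portability in question comes from OpenCL discovering platforms at runtime and compiling kernel source with whichever driver is present; a minimal enumeration sketch:

```c
/* Why one OpenCL host binary can, in principle, run on any vendor's
 * GPU: platforms are discovered at runtime and kernel source is built
 * by whichever driver is installed. Minimal enumeration sketch. */
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint n = 0;
    char name[256];

    clGetPlatformIDs(8, platforms, &n);
    for (cl_uint i = 0; i < n; ++i) {
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                          sizeof(name), name, NULL);
        /* e.g. "Intel(R) OpenCL", "NVIDIA CUDA",
         * "AMD Accelerated Parallel Processing" */
        printf("platform %u: %s\n", i, name);
    }
    return 0;
}
```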
Quote:
Is there any bench package for offline testing? AFAIK Einstein uses a lot of additional data files, so offline testing could be a big issue...
Not provided with that source code package, but you can always hook up your host to our project and get a task for it. Take the downloaded files and the command line and you should be good to go. I recommend you use the BRP4 app, as it requires the fewest input/data files (just 3).