BRP6_1.56_cuda55 - Driver API not loading on either Host

JBird
Joined: 22 Dec 14
Posts: 1963
Credit: 4046216051
RAC: 0

Fog Buster!

I think I've found my particular *reason why: an existing app_config.xml, left over from the previous 1.52 app, that bears the BRP6_cuda32 plan class instead of the BRP6-Beta-cuda55 moniker retrieved from the BOINC Manager Tasks header.

Can only experiment by editing this and running 1 or 2 tasks.
Here is the current app_config:


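<app_config>
  <app_version>
    <app_name>einsteinbinary_BRP6</app_name>
    <plan_class>BRP6-cuda32-nv301</plan_class>
    <avg_ncpus>0.2</avg_ncpus>
    <ngpus>.5</ngpus>
  </app_version>
</app_config>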


===================
Do you concur that changing the plan_class to this: BRP6-Beta-cuda55 would fix it? Or do I need to add v1.56 in there somewhere?
===
Please advise. Thanks!
=

@ floyd = Exactly!
=
@ Richard = Thanks for the Notes. I'm not versed in Process Explorer *yet- good time to start! I appreciate the nudge. - these cufft and cudart dlls came in with my fresh reload on both machines. In the recent past when changing apps, BOINC Einstein Server has kindly "removed" the no longer needed .bins and .zaps etc. Didn't happen that way this time. Cache was empty, all Pendings Validated.
And while I waited for Cache to empty before refilling, y'all released a new Beta!
=
@ ME = My bad for not checking. So glad to find this bug today!
=
@ archae86 = Thanks a million for finding what *you found, upon inspection - and sharing those findings with me/us.
=

JBird
Joined: 22 Dec 14
Posts: 1963
Credit: 4046216051
RAC: 0

Perhaps you wanted to know why the reported GFLOPS was negative?

Yes in fact!

Below is BOINC's own entry in the BOINC Manager Event Log, which does give an accurate number.

7/9/2015 6:10:39 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 960 (driver version 353.30, CUDA version 7.5, compute capability 5.2, 2048MB, 1923MB available, 2618 GFLOPS peak)
=
Same for the 970 on other Host
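For anyone who wants to pull the numbers out of that Event Log line rather than read them by eye, here is a minimal Python sketch. The pattern is inferred from the single line quoted above, so treat the field layout as an assumption; other client versions may word it differently. The function name parse_gpu_line is just an illustrative choice.

```python
import re

# Pattern inferred from the one Event Log line quoted above (an assumption,
# not a documented format).
GPU_LINE = re.compile(
    r"CUDA: NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \("
    r"driver version (?P<driver>[\d.]+), "
    r"CUDA version (?P<cuda>[\d.]+), "
    r"compute capability (?P<cc>[\d.]+), "
    r"(?P<mem_mb>\d+)MB, (?P<free_mb>\d+)MB available, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_gpu_line(line):
    """Return the GPU fields as a dict, or None if the line does not match."""
    match = GPU_LINE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the purely numeric fields; version strings stay as text.
    for key in ("index", "mem_mb", "free_mb", "gflops"):
        info[key] = int(info[key])
    return info


line = ("7/9/2015 6:10:39 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 960 "
        "(driver version 353.30, CUDA version 7.5, compute capability 5.2, "
        "2048MB, 1923MB available, 2618 GFLOPS peak)")
info = parse_gpu_line(line)
print(info["name"], info["gflops"])
```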
=
So, just hoping the application's execution code is not dependent on this misread.
It may only be meaningful to dcf or whatever the estimate clock is based on; I've seen Estimated Computation size: 590000 gflops listed in Properties of tasks.
But I won't know whether the estimated completion time will be vastly improved until I edit the cuda55 entry in app_config. I'm expecting it to be.
Here's hoping....
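As a rough illustration of why a speed figure could matter to the estimate clock: assuming the estimate is roughly "computation size divided by assumed speed, times a correction factor" (a simplification, and the function below is a hypothetical helper, not BOINC code), the two numbers quoted above combine like this. Real tasks run well below the peak figure, so the peak-based result is only a lower bound, not a prediction.

```python
def estimated_seconds(size_gflop, speed_gflops, correction=1.0):
    """Simplified runtime estimate: work / speed, scaled by a correction factor.

    This is an assumed model for illustration; the real client logic has
    more inputs than this.
    """
    return size_gflop / speed_gflops * correction


# 590000 is the estimated computation size from the task Properties,
# 2618 is the peak GFLOPS from the Event Log line.
at_peak = estimated_seconds(590000, 2618)
print(round(at_peak, 1), "seconds if the card ran at its full peak rate")
```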

Adjusting the detection routine in stderr is on you guys.

JBird
Joined: 22 Dec 14
Posts: 1963
Credit: 4046216051
RAC: 0

OK edited app_config to this:


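<app_config>
  <app_version>
    <app_name>einsteinbinary_BRP6</app_name>
    <plan_class>BRP6-Beta-cuda55</plan_class>
    <avg_ncpus>0.2</avg_ncpus>
    <ngpus>.5</ngpus>
  </app_version>
</app_config>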


=
Running 2 now.
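The same edit can be scripted instead of done by hand. This is only a sketch, assuming the file uses BOINC's app_version layout (app_name, plan_class, avg_ncpus, ngpus); set_plan_class is a made-up helper name, and the sample text stands in for the real file.

```python
import xml.etree.ElementTree as ET


def set_plan_class(xml_text, app_name, new_plan_class):
    """Return (new_xml, count) with plan_class replaced for one app name."""
    root = ET.fromstring(xml_text)
    changed = 0
    for version in root.iter("app_version"):
        if version.findtext("app_name", "").strip() != app_name:
            continue
        node = version.find("plan_class")
        if node is not None:
            node.text = new_plan_class
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Stand-in for the old app_config.xml contents (assumed tag layout).
old = """<app_config>
  <app_version>
    <app_name>einsteinbinary_BRP6</app_name>
    <plan_class>BRP6-cuda32-nv301</plan_class>
    <avg_ncpus>0.2</avg_ncpus>
    <ngpus>.5</ngpus>
  </app_version>
</app_config>"""

new, changed = set_plan_class(old, "einsteinbinary_BRP6", "BRP6-Beta-cuda55")
print(new)
```

Either way, the client has to re-read its config files (or be restarted) before the change is picked up.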
=
went to slot 11 and found this
[20:43:54][3204][INFO ] Version of installed CUDA driver: 7050
[20:43:54][3204][INFO ] Version of CUDA driver API used: 3020
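Those four-digit numbers follow CUDA's integer version encoding (1000 * major + 10 * minor), so 7050 reads as 7.5, matching the "CUDA version 7.5" in the Event Log, and 3020 reads as 3.2. A small sketch for decoding them from stderr text follows; the regular expression is based only on the two lines quoted here, and the function names are illustrative.

```python
import re

# Matches the two stderr lines quoted above; other builds may word them differently.
VERSION_LINE = re.compile(
    r"Version of (installed CUDA driver|CUDA driver API used): (\d+)"
)


def decode_cuda_version(code):
    """Split CUDA's integer encoding (1000 * major + 10 * minor) into a tuple."""
    return code // 1000, (code % 1000) // 10


def read_versions(stderr_text):
    """Map each label found in the text to a (major, minor) tuple."""
    found = {}
    for label, code in VERSION_LINE.findall(stderr_text):
        found[label] = decode_cuda_version(int(code))
    return found


sample = """[20:43:54][3204][INFO ] Version of installed CUDA driver: 7050
[20:43:54][3204][INFO ] Version of CUDA driver API used: 3020
"""
versions = read_versions(sample)
for label, (major, minor) in versions.items():
    print(f"{label}: {major}.{minor}")
```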

and the cudart32_55 and cufft32_55 DLLs present (and the executable, of course).
=
So, still missing something - perhaps a specific cuda55 _5050 file_ref or other such ID - to load 5050.

Care to share your app_config that WORKS and Loads this Driver - Anyone?

Note: This stderr from slot 11 was timed with the app start

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117542363460
RAC: 35347667

Quote:
You did raise *another question I've had for a while, re: negative GFLOPS reporting; which I believe, I can remedy if I can find the post.


OK, since you are concerned about this, here is a response. You need to understand that the stderr output is generated by the BRP6 app. The author(s) of that app are NOT the BOINC Devs. It takes time and effort to add code to properly detect all varieties of GPU cards and it's not surprising that the Einstein Devs don't waste their time trying to keep up with every different hardware variant that comes on the market when they don't have to. This information in stderr output is purely cosmetic. It does NOT in any way affect the calculations.

With BOINC, it's a different matter. The BOINC Devs take a lot of time and effort to have the BOINC client properly detect the hardware. This is why the client tends to keep up with new hardware but the science app doesn't have to. If BOINC properly detects the hardware, then the science app can use it even if it makes cosmetic mistakes about what the hardware is really capable of. Sure, it would be nice to see the same value that BOINC produces but it doesn't make any difference to the performance if it is reported incorrectly.

Quote:
But, back to my original query: Yes I should have said,
"The *appropriate (5050) CUDA Driver API was not loaded"


I'm sorry, this doesn't really make sense to me because you don't load an API. As I said previously, I'm not a programmer so my understanding is likely to be flawed but here goes anyway.

API stands for Application Programming Interface. It is a language and message format used by an application program to communicate with the operating system or some other control program or communications protocol. APIs are implemented by writing function calls in the program, which provide the linkage to the required system routines for execution. Over time, APIs evolve and have different versions, just like other software systems do. Since a driver - a bit of software that handles a particular bit of hardware - is a different thing to an API - a language and message format - I can't see any reason why these two different things would necessarily have to have the same version number.

All you need to worry about is having the correct driver and CUDA libs installed on your computer. The Developer will worry about whether or not the correct API is being used for the app to properly communicate with and use the proper OS routines. The only thing you should expect to see is a message saying that the app has found a Driver Version equal to or greater than 5050. You shouldn't care about the API version, particularly if the app is working and results are validating.
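That check can be written down in a few lines. This is a sketch only: it assumes the stderr wording seen earlier in this thread ("Version of installed CUDA driver: NNNN"), uses 5050 as the threshold named above, and deliberately ignores the API line. driver_meets_requirement is a hypothetical helper name.

```python
import re

# Wording taken from the stderr lines posted earlier in this thread.
DRIVER_LINE = re.compile(r"Version of installed CUDA driver: (\d+)")


def driver_meets_requirement(stderr_text, required=5050):
    """True/False if a driver version is reported, None if the line is absent."""
    match = DRIVER_LINE.search(stderr_text)
    if match is None:
        return None
    return int(match.group(1)) >= required


stderr_text = (
    "[20:43:54][3204][INFO ] Version of installed CUDA driver: 7050\n"
    "[20:43:54][3204][INFO ] Version of CUDA driver API used: 3020\n"
)
ok = driver_meets_requirement(stderr_text)
print("driver new enough:", ok)
```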

Cheers,
Gary.

JBird
Joined: 22 Dec 14
Posts: 1963
Credit: 4046216051
RAC: 0

Thanks Gary for the 411.
Semantics aside,
My goal remains re: 1.56 BRP6-Beta-cuda55.
I am not *sufficiently configured here

Please tell me what file needs modifying - editing -spell-checked or what, on my machines to enable cuda55/5050 USE.
=


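<app_config>
  <app_version>
    <app_name>einsteinbinary_BRP6</app_name>
    <plan_class>BRP6-Beta-cuda55</plan_class>
    <avg_ncpus>0.2</avg_ncpus>
    <ngpus>.5</ngpus>
  </app_version>
</app_config>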


=
Something is *missing - or improper Syntax, bad or missing File name -
Something somewhere (another file?) is not properly activated or in the wrong place.
What and where is it?

Please and Thank you

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18714219530
RAC: 6368488

Quote:

Use Process Explorer while the task is running to see which DLL files are being used, and where they're being loaded from.

I was caught out many years ago at SETI Beta, when I confidently predicted that an application would fail because it had been deployed wrongly - but it ran successfully and validated.

Process explorer demonstrated (screenshot in that link) that Windows had found some SDK files I'd forgotten I'd even installed on that machine - and they were the right ones for the application.

In general, Windows will try to use the files supplied by Einstein first, but if you've installed developer files (not needed for normal crunching), they may be used instead.
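The "first match wins" idea behind that can be sketched as below. This is a deliberate simplification: the real Windows DLL search order has more steps (system directories, PATH, and so on), and the folder names here are throwaway stand-ins. It only shows why a copy in a later folder gets used when the earlier folder lacks the file.

```python
import os
import tempfile


def first_match(dll_name, search_dirs):
    """Return the first directory in search_dirs that contains dll_name."""
    wanted = dll_name.lower()  # Windows file names are case-insensitive
    for directory in search_dirs:
        try:
            names = os.listdir(directory)
        except OSError:
            continue  # skip folders that do not exist or cannot be read
        if any(name.lower() == wanted for name in names):
            return directory
    return None


with tempfile.TemporaryDirectory() as base:
    slot_dir = os.path.join(base, "slot")  # stand-in for the task's folder
    sdk_dir = os.path.join(base, "sdk")    # stand-in for a developer install
    os.makedirs(slot_dir)
    os.makedirs(sdk_dir)

    # Only the SDK folder has the DLL: it is the one that gets picked up.
    open(os.path.join(sdk_dir, "cudart32_55.dll"), "w").close()
    found_in_sdk = first_match("cudart32_55.dll", [slot_dir, sdk_dir]) == sdk_dir

    # Once the slot folder has its own copy, that copy wins.
    open(os.path.join(slot_dir, "cudart32_55.dll"), "w").close()
    found_in_slot = first_match("cudart32_55.dll", [slot_dir, sdk_dir]) == slot_dir

print(found_in_sdk, found_in_slot)
```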

Richard, I have no experience with Process Explorer yet but I did see something curious when I ran the 1.57 CUDA 5.5 app against Dependency Walker. It showed the naming of the CUDA dll's being different than the actual names that Einstein gives the CUDA 5.5 dll's. Dependency Walker throws errors, "file not found".

 

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18714219530
RAC: 6368488

I've completed two 1.57 tasks so far. One valid and another pending. I'm seeing absolutely no improvement in processing times. The tasks ran in the same normalized times as my previous 1.52 tasks. Based on the times reported for the new CUDA 5.5 tasks documented in this thread I should have seen an improvement on the order of 1000-2000 sec less running time. So is this proof that the app is running on the old CUDA 3.2 libraries?

 

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956179797
RAC: 715852

Quote:
Quote:

Use Process Explorer while the task is running to see which DLL files are being used, and where they're being loaded from.

I was caught out many years ago at SETI Beta, when I confidently predicted that an application would fail because it had been deployed wrongly - but it ran successfully and validated.

Process explorer demonstrated (screenshot in that link) that Windows had found some SDK files I'd forgotten I'd even installed on that machine - and they were the right ones for the application.

In general, Windows will try to use the files supplied by Einstein first, but if you've installed developer files (not needed for normal crunching), they may be used instead.


Richard, I have no experience with Process Explorer yet but I did see something curious when I ran the 1.57 CUDA 5.5 app against Dependency Walker. It showed the naming of the CUDA dll's being different than the actual names that Einstein gives the CUDA 5.5 dll's. Dependency Walker throws errors, "file not found".


See archae86's message 141792 in Gary's timing discussion thread, in Cruncher's Corner.

Einstein uses BOINC's "copy and rename" facility on these DLLs. Copy the app and anything mentioned in the sections listed to a scratch folder. Rename those copies (only) of the two cuda DLLs as shown in , and retry Dependency Walker. That's how I found the dependency on LIBWINPTHREAD-1.DLL for v1.54 reported in Technical News.
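The scratch-folder step can also be scripted. A minimal sketch, with loud assumptions: the "as downloaded" file names and the executable name below are hypothetical placeholders, so substitute the names from your own project folder and client_state.xml. Only the copies are renamed; the originals are left alone.

```python
import os
import shutil
import tempfile


def stage_copies(source_dir, scratch_dir, rename_map, extra_files=()):
    """Copy files into scratch_dir, renaming those listed in rename_map.

    Files in source_dir are never modified. Returns the staged names, sorted.
    """
    os.makedirs(scratch_dir, exist_ok=True)
    staged = []
    for name in list(rename_map) + list(extra_files):
        target = rename_map.get(name, name)  # unchanged unless mapped
        shutil.copy2(os.path.join(source_dir, name),
                     os.path.join(scratch_dir, target))
        staged.append(target)
    return sorted(staged)


# Hypothetical file names, for illustration only.
RENAMES = {
    "cudart_as_downloaded.dll": "cudart32_55.dll",
    "cufft_as_downloaded.dll": "cufft32_55.dll",
}

with tempfile.TemporaryDirectory() as base:
    project_dir = os.path.join(base, "project")
    scratch_dir = os.path.join(base, "scratch")
    os.makedirs(project_dir)
    for name in list(RENAMES) + ["science_app.exe"]:
        open(os.path.join(project_dir, name), "w").close()

    staged = stage_copies(project_dir, scratch_dir, RENAMES, ["science_app.exe"])
    originals_intact = sorted(os.listdir(project_dir))

print(staged)
```

Dependency Walker would then be pointed at the executable inside the scratch folder.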

Edit - in reply to your second post, look in the slot directory while the app is running. And double-check your own <app_version> section for v1.57 in client_state.xml
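For the client_state.xml check, something like the following could list what the client has registered for an app. It is a sketch under assumptions: the element names (app_version, app_name, version_num, plan_class, file_ref, file_name, open_name) are how I understand the file to be laid out, the sample below is made up for illustration, and a real client_state.xml may need more forgiving parsing. Work on a copy of the file, not the live one.

```python
import xml.etree.ElementTree as ET


def list_app_versions(client_state_text, app_name):
    """Return one dict per app_version entry recorded for app_name."""
    root = ET.fromstring(client_state_text)
    rows = []
    for entry in root.iter("app_version"):
        if entry.findtext("app_name", "").strip() != app_name:
            continue
        files = [
            (ref.findtext("file_name", "").strip(),
             ref.findtext("open_name", "").strip())
            for ref in entry.iter("file_ref")
        ]
        rows.append({
            "version_num": entry.findtext("version_num", "").strip(),
            "plan_class": entry.findtext("plan_class", "").strip(),
            "files": files,
        })
    return rows


# Made-up sample with hypothetical file names (assumed layout).
SAMPLE = """<client_state>
  <app_version>
    <app_name>einsteinbinary_BRP6</app_name>
    <version_num>157</version_num>
    <plan_class>BRP6-Beta-cuda55</plan_class>
    <file_ref>
      <file_name>cudart_as_downloaded.dll</file_name>
      <open_name>cudart32_55.dll</open_name>
    </file_ref>
  </app_version>
</client_state>"""

rows = list_app_versions(SAMPLE, "einsteinbinary_BRP6")
for row in rows:
    print(row["version_num"], row["plan_class"], row["files"])
```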

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18714219530
RAC: 6368488

Thanks for the help, Richard. I can see now where the explicit files get renamed for use in the slots. I tried your experiment with a scratch directory and the files and got rid of the file not found errors for the CUDA 5.5 dll's.

I still have quite a few file errors in Dependency Walker though. Then after a few minutes of DW being open on the files in the scratch directory, a good portion of the files not found were then found. Still have errors in IESHIMS.DLL and DCOMP.DLL. Six more files with the API-MS-WIN header listed also. Don't know the significance of that.

Looked into the client_state file and the app version seems to be in place for 1.57. I've looked at several other Einstein users reporting the 1.57 app as valids and all their results show the CUDA API version as 3020 also. So I guess all that is benign and nothing to worry about. Sure would like to see some improvement in runtimes though I haven't finished many yet.

 

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 577766879
RAC: 199463

JBird, wouldn't it be easier to simply rename the app_info and test? This way everything should run as intended by the project. Then you know how it should work and can try again to recreate this via an app_info, if you want to use the new beta app for all tasks. Otherwise simply use the much easier app_config to set the number of concurrent tasks.

MrS

Scanning for our furry friends since Jan 2002
