CUDA WUs failing on Linux

induktio
Joined: 1 Oct 10
Posts: 15
Credit: 10144774
RAC: 0
Topic 195779

This isn't necessarily a bug report, but I thought I'd post it because the Boinc application handles it in a very user-unfriendly way.

I'm using Debian Linux (Squeeze) on this computer. I recently installed a new NVIDIA GPU to crunch some CUDA work units. Installation of the kernel module went fine and Boinc detected the card. Problems started when all the CUDA work units immediately terminated with "Computation error" or similar. It gave no other information about what to do, so I searched like this:

# cd /var/lib/boinc-client/
# grep error *
client_state.xml:../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270: error while loading shared libraries: libz.so.1: cannot open shared object file: No such file or directory
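When the grep turns up many lines, the library named in the loader message can be pulled out with sed. This is a minimal sketch that assumes the standard glibc wording "error while loading shared libraries: NAME: ..."; the function name failed_lib is hypothetical.

```shell
# Hypothetical helper: read dynamic-loader error lines on stdin and print
# the library name that follows "error while loading shared libraries: ".
failed_lib() {
    sed -n 's/.*error while loading shared libraries: \([^:]*\):.*/\1/p'
}

# Usage (same directory as above):
#   cd /var/lib/boinc-client/
#   grep -h 'error while loading shared libraries' client_state.xml | failed_lib
```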

So there's the cause. Let's check the other dependencies too:

# ldd projects/einstein.phys.uwm.edu/einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270
linux-gate.so.1 => (0xf76f5000)
libcufft.so.3 => not found
libcudart.so.3 => not found
libcuda.so.1 => /usr/lib32/libcuda.so.1 (0xf6ccb000)
libpthread.so.0 => /lib32/libpthread.so.0 (0xf6cb2000)
libm.so.6 => /lib32/libm.so.6 (0xf6c8c000)
libstdc++.so.6 => not found
libc.so.6 => /lib32/libc.so.6 (0xf6b45000)
/lib/ld-linux.so.2 (0xf76f6000)
libz.so.1 => not found
libdl.so.2 => /lib32/libdl.so.2 (0xf6b40000)
libgcc_s.so.1 => /usr/lib32/libgcc_s.so.1 (0xf6b22000)
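In a long ldd listing it is easy to overlook an entry, so a small awk filter that prints only the unresolved libraries can help. This is a sketch assuming ldd's usual "name => not found" format; missing_libs is a hypothetical name. Keep in mind that ldd only searches the standard library paths (plus LD_LIBRARY_PATH and any RPATH in the binary), so a library installed somewhere else can still be reported as not found.

```shell
# Hypothetical helper: read ldd output on stdin and print the name of
# every library that ldd reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd projects/einstein.phys.uwm.edu/einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270 | missing_libs
```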

It turns out libcufft.so.3 and libcudart.so.3 are already installed, so only two libraries remain:

# apt-file find libz.so.1
lib32z1: /usr/lib32/libz.so.1

# apt-file find libstdc++.so.6
lib32stdc++6: /usr/lib32/libstdc++.so.6

In short, Debian users have to install the packages lib32z1 and lib32stdc++6 to be able to process CUDA work units.
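On Debian the packages are installed with apt-get, and afterwards ldd can be rerun to confirm that nothing else is unresolved. Because libcufft.so.3 and libcudart.so.3 showed up as "not found" even though they are installed, the sketch below accepts a list of library names to ignore. The function deps_ok and its argument convention are hypothetical, not part of Boinc or Debian.

```shell
# Install the two 32-bit libraries (as root):
#   apt-get install lib32z1 lib32stdc++6
#
# Hypothetical check: read ldd output on stdin, ignore any library named
# as an argument, and fail (printing the name) for every other library
# that is still "not found".
deps_ok() {
    rc=0
    for lib in $(awk '/=> not found/ { print $1 }'); do
        ignored=no
        for skip in "$@"; do
            if [ "$lib" = "$skip" ]; then
                ignored=yes
            fi
        done
        if [ "$ignored" = no ]; then
            echo "missing: $lib"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   ldd projects/einstein.phys.uwm.edu/einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270 \
#       | deps_ok libcufft.so.3 libcudart.so.3 && echo "all other libraries found"
```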

mikey
Joined: 22 Jan 05
Posts: 12829
Credit: 1883631328
RAC: 1106449

CUDA WUs failing on Linux

Quote:

This isn't necessarily a bug report, but I thought to post it because the Boinc application handles it in a very user-unfriendly way.

I'm using Debian Linux (Squeeze) on this computer. I recently installed a new NVIDIA GPU to crunch some CUDA work units. Installation of the kernel module went fine and Boinc detected the card. Problems started when all the CUDA work units immediately terminated with "Computation error" or similar. It gave no other information about what to do, so I searched like this:

It turns out libcufft.so.3 and libcudart.so.3 are already installed, so only two libraries remain:

In short, Debian users have to install the packages lib32z1 and lib32stdc++6 to be able to process CUDA work units.

What you are seeing is a major reason a lot of people don't move to Linux: it is not yet truly plug and play for the many things people can do with Windows. No, Windows is NOT better, but in a lot of respects it is easier. Boinc for Windows installs everything you will need and just works; the couple of things it has trouble with (switching users while crunching on the GPU, and multiple GPUs) are being addressed in the beta versions now.

induktio
Joined: 1 Oct 10
Posts: 15
Credit: 10144774
RAC: 0

RE: What you are seeing is

Quote:
What you are seeing is a major reason a lot of people don't move to Linux: it is not yet truly plug and play for the many things people can do with Windows. No, Windows is NOT better, but in a lot of respects it is easier. Boinc for Windows installs everything you will need and just works; the couple of things it has trouble with (switching users while crunching on the GPU, and multiple GPUs) are being addressed in the beta versions now.


Sure, Windows has the advantage of being easier to install; for example, hardware manufacturers' foremost concern is making their products plug and play on Windows. There are also countless desktop applications that do not run on platforms other than Windows without considerable effort, so in that sense Windows has a locked-in user base.

On the server side, it's a whole different world. Even the thought of having to manage a Windows server is just *painful* to me. On HPC clusters Linux is even more dominant, and it's easy to see why over 90% of them use it: with Linux you can get maximum performance out of just about any hardware.

I actually have a couple of Debian machines running Boinc from a diskless PXE boot setup using NFS. There's nothing particularly special about it, but the basic system would scale easily to dozens of computers. I might write a tutorial on it. Anyone interested?

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 1

RE: Boinc for Windows

Quote:
Boinc for Windows installs everything you will need and just works


Actually, you will find that you need a fairly up-to-date version of Windows before it works: at least Windows Installer 3.1, plus all the security and root certificate updates. It's also a good idea to update DirectX to one of the latest 9.0c releases.

SP3 isn't necessary; it'll work on SP2, but only just, and there's no telling for how long.

telegd
Joined: 17 Apr 07
Posts: 91
Credit: 10212522
RAC: 0

Interesting. I have found, on

Interesting. I have found, on Kubuntu, that certain kernel/NVIDIA driver combinations have trouble. I haven't been able to pin down why one works and another doesn't. However, once I have a working combination, I am very reluctant to upgrade either my kernel or my NVIDIA drivers.

I have also noticed that having compositing enabled on the KDE desktop can occasionally make the system unhappy with CUDA.

As for the libraries mentioned above, do you think they should be set up as dependencies of one of the components in the Boinc/CUDA/E@H chain?

mikey
Joined: 22 Jan 05
Posts: 12829
Credit: 1883631328
RAC: 1106449

RE: RE: Boinc for Windows

Quote:
Quote:
Boinc for Windows installs everything you will need and just works

Actually, you will find that you need a fairly up-to-date version of Windows before it works: at least Windows Installer 3.1, plus all the security and root certificate updates. It's also a good idea to update DirectX to one of the latest 9.0c releases.

SP3 isn't necessary; it'll work on SP2, but only just, and there's no telling for how long.

I agree! Both you, Jord, and Temporal are correct. Windows is only easier, not better, in terms of being able to configure the PC! Windows can be a royal pain and very finicky about how things are done. Windows does have a huge user base, though, because a lot of workplace PCs run it. People don't like learning too much, so when they learn Windows at work, they bring it home with them too. My wife now has a Mac for her pictures but a Windows machine for work. The Mac is new, and she sometimes presses keys expecting things to work and then has to remember that they are different on a Mac!

induktio
Joined: 1 Oct 10
Posts: 15
Credit: 10144774
RAC: 0

RE: Interesting. I have

Quote:

Interesting. I have found, on Kubuntu, that certain kernel/NVIDIA driver combinations have trouble. I haven't been able to pin down why one works and another doesn't. However, once I have a working combination, I am very reluctant to upgrade either my kernel or my NVIDIA drivers.

I have also noticed that having compositing enabled on the KDE desktop can occasionally make the system unhappy with CUDA.

As for the libraries mentioned above, do you think they should be set up as dependencies of one of the components in the Boinc/CUDA/E@H chain?


Dependencies in which system, do you mean? They are general-purpose libraries, so I guess the Debian package manager should install them. Making them dependencies of the boinc-client package wouldn't make much sense, because this is only one of the many projects in the Boinc system. I'm not sure how the dependencies could be better managed. Maybe it just needs to output more meaningful error messages.

As for the NVIDIA driver problems, I can't say much without knowing more details. Possibly it is an issue with the stock Ubuntu kernel?

telegd
Joined: 17 Apr 07
Posts: 91
Credit: 10212522
RAC: 0

I am guessing that the only

I am guessing that the only trigger points would be when either the NVIDIA drivers or Boinc is installed, so the apt/dpkg dependency system should be installing them at one of those two stages. I would hope the whole point of a package dependency system is to avoid this kind of hunting in the dark.

Am I right in assuming you are running a 64-bit OS? There has been some discussion of needing to install 32-bit compatibility packages to make it all work. I am sticking with 32-bit for now, so my input is limited if that is the case.
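One quick way to check for that 32-bit/64-bit mismatch is to compare the application binary's ELF class with the machine architecture reported by `uname -m` (`file` reports the same information in words). The ELF header stores the class in byte 4 (EI_CLASS): 1 means 32-bit, 2 means 64-bit. This is a minimal sketch; elf_class is a hypothetical name, and it does not validate the ELF magic bytes.

```shell
# Hypothetical helper: print 32 or 64 for an ELF binary by reading its
# EI_CLASS byte (offset 4 in the header); prints "unknown" otherwise.
# Note: does not check the "\177ELF" magic at the start of the file.
elf_class() {
    class=$(od -An -t u1 -j 4 -N 1 "$1" | tr -d ' ')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Usage: a 32 here together with x86_64 from uname -m means the
# application needs the 32-bit compatibility libraries.
#   elf_class projects/einstein.phys.uwm.edu/einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270
#   uname -m
```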

szaman
Joined: 24 May 11
Posts: 1
Credit: 712633
RAC: 0

I had a similar problem (on

I had a similar problem (on amd64 Debian unstable) and everything worked after installing the following packages:

lib32cudart3
lib32cufft3
lib32stdc++6
lib32z1
libcuda1-ia32

Funnily enough, with my Phenom II X6 1090T (6-core, 3.2 GHz) CPU and GTX 480 CUDA GPU I get the following estimated speeds:

In Windows 7 (x64): 2.6 GFlops for CPU and 26 GFlops for GPU
In Linux: 3 GFlops for CPU and 30 GFlops for GPU
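Both pairs of estimates differ by the same ratio (3/2.6 = 30/26), so the Linux figures come out about 15% higher for the CPU and the GPU alike. The arithmetic as a tiny awk call; speedup is a hypothetical helper name.

```shell
# Hypothetical helper: percentage by which the Linux estimate exceeds
# the Windows 7 estimate (same units, e.g. GFlops).
speedup() {
    # $1 = Windows 7 estimate, $2 = Linux estimate
    awk -v w="$1" -v l="$2" 'BEGIN { printf "%.1f%%\n", (l / w - 1) * 100 }'
}

speedup 2.6 3    # CPU: 15.4%
speedup 26 30    # GPU: 15.4%
```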
