RE: And your Computer
That is certainly true now but it wasn't the case a day or so ago. At that point, the host details said (erroneously) that there were [2] GTX 460s whereas there would have been just one plus the 8400 GS. I just happened to look at the host details at an opportune moment but I didn't have time to compose a message at that point. I guess both cards have been removed now to stop the trashing of tasks.
If you look at the stderr.txt output from a few tasks you can find examples like this one which shows the GPU used (device #1) as being the 8400 GS. That task (like many others) immediately failed with the error message
[06:20:46][5798][ERROR] Error executing CUDA FFT plan (error code: 6)
[06:20:46][5798][ERROR] Demodulation failed (error: 1012)!
There are a very small number of examples like this one which have actually completed successfully. These were crunched using device #0 which is listed as a GTX 460. I can find at least two of these so the 460 must have been up and running correctly for a few hours at least. It seems like things would work properly if just the 8400 GS were removed.
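For anyone who wants to scan a saved stderr.txt for failures like the ones above, here is a minimal Python sketch. It assumes every log line has the [time][pid][LEVEL] layout seen in the two quoted error lines; the function name parse_errors is made up for illustration and is not part of BOINC or the Einstein@Home app.

```python
import re

# Sample lines copied from the stderr.txt excerpt quoted in this thread.
SAMPLE = """\
[06:20:46][5798][ERROR] Error executing CUDA FFT plan (error code: 6)
[06:20:46][5798][ERROR] Demodulation failed (error: 1012)!
"""

# ASSUMPTION: each line looks like [HH:MM:SS][pid][LEVEL] message
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[(\d+)\]\[(\w+)\]\s+(.*)$")
# Matches both "(error code: 6)" and "(error: 1012)"
CODE_RE = re.compile(r"\((?:error code|error):\s*(\d+)\)")


def parse_errors(text):
    """Return a (time, pid, message, code) tuple for each ERROR line."""
    found = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group(3) != "ERROR":
            continue  # skip non-matching lines and other log levels
        code = CODE_RE.search(m.group(4))
        found.append((m.group(1), int(m.group(2)), m.group(4),
                      int(code.group(1)) if code else None))
    return found


for entry in parse_errors(SAMPLE):
    print(entry)
```

A task whose stderr.txt contains these two errors within the first second or so is one that failed immediately, which is the pattern described for the 8400 GS.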
Cheers,
Gary.
RE: It seems like things
Quote:
It seems like things would work properly if just the 8400 GS were removed.
Or were ignored by BOINC, by using the <ignore_cuda_dev>1</ignore_cuda_dev> option in cc_config.xml.
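As a rough illustration of what such a cc_config.xml could contain, here is a small Python sketch that builds the file contents. The option name ignore_cuda_dev is an assumption based on the 6.10-era client and does not appear verbatim in this thread; newer clients may use a different name, so check the client configuration documentation for your version. The helper build_cc_config is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ignored_devices):
    """Return cc_config.xml text that tells the client to skip the given GPUs."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for dev in ignored_devices:
        # ASSUMPTION: <ignore_cuda_dev> is the option name for this client version.
        ET.SubElement(options, "ignore_cuda_dev").text = str(dev)
    return ET.tostring(root, encoding="unicode")


# Device #1 is the 8400 GS on the host discussed here.
print(build_cc_config([1]))
```

The printed text would go into cc_config.xml in the BOINC data directory, and the client would most likely need a restart before it takes effect.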
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
Thanks to everyone for the feedback.
I was surprised when I checked and found that my boinc-client was not showing any cuda cards. I checked a number of discussion groups here and over at MW and got enough ideas that I think that I've got things fixed. Aside from the other changes, the really critical one has been, as soon as I'm logged in to the computer, to go to the terminal screen and, as root, do a "service boinc-client restart".
Then I start the Boinc Manager, go to the "Messages" tab and find that Boinc has correctly recognized both cuda cards. If I don't do a re-start, then Boinc doesn't show the cards.
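The manual restart could in principle be scripted. Below is a minimal Python sketch that waits for the GPU device node to appear and then runs the same restart command. It rests on two assumptions that are not confirmed anywhere in this thread: that the NVIDIA driver creates /dev/nvidia0 once the card is usable, and that the script itself runs as root. The function names are made up for illustration.

```python
import os
import subprocess
import time


def wait_for_path(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists; return True if it appeared before the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def restart_boinc_when_gpu_ready(device_node="/dev/nvidia0"):
    """Restart boinc-client once the GPU device node is present."""
    # ASSUMPTION: the device node only shows up after the driver has loaded.
    if wait_for_path(device_node):
        # Same command as the one typed by hand above; needs root.
        subprocess.run(["service", "boinc-client", "restart"], check=True)
        return True
    return False
```

Calling restart_boinc_when_gpu_ready() from a late startup script would be one way to try it; whether it helps depends on when the driver really becomes ready on a given system.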
I can't explain why the "show computer" indicates 2 Fermi cards.
The other issue that I had mentioned, the disappearing listing of WU's on the Task tab, seems to be very small WU's for the GPUs. I noticed that normally the WU's indicate some hours of estimated completion times. Some of the MW units indicate up to 30 hrs., but when I manage to catch one of the flash WU's, its estimated time to completion is only, say, 30 minutes. Probably based on a CPU. If a GPU gets it, then it goes faster.
The last few times, I quickly got over to the "Messages" tab and saw that I was requesting WU's for the GPU specifically. Then Einstein or MW would quickly send me anywhere up to 14 tasks which were done in something like 45 seconds and the server recontacted for more, which it obliged. If there was a problem, wouldn't it refuse to send more work and stretch out the communications interval?
I checked and I do have "yes" set for Nvidia cards but "no" for stop GPU if computer is busy.
I haven't pulled any cards at all. It's still:
Wed 01 Dec 2010 02:27:30 PM EST NVIDIA GPU 0: GeForce GTX 460 (driver version unknown, CUDA version 3020, compute capability 2.1, 768MB, 673 GFLOPS peak)
Wed 01 Dec 2010 02:27:30 PM EST NVIDIA GPU 1: GeForce 8400 GS (driver version unknown, CUDA version 3020, compute capability 1.1, 511MB, 22 GFLOPS peak)
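To compare the two cards at a glance, the startup messages can be pulled apart with a few lines of Python. This is only a sketch: the regular expression is written against the two lines pasted above and may not match the message format of other client versions, and parse_gpus is a made-up helper name.

```python
import re

# The two startup messages quoted above.
MESSAGES = """\
Wed 01 Dec 2010 02:27:30 PM EST NVIDIA GPU 0: GeForce GTX 460 (driver version unknown, CUDA version 3020, compute capability 2.1, 768MB, 673 GFLOPS peak)
Wed 01 Dec 2010 02:27:30 PM EST NVIDIA GPU 1: GeForce 8400 GS (driver version unknown, CUDA version 3020, compute capability 1.1, 511MB, 22 GFLOPS peak)
"""

GPU_RE = re.compile(
    r"NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \("
    r".*?compute capability (?P<major>\d+)\.(?P<minor>\d+), "
    r"(?P<mem_mb>\d+)MB, (?P<gflops>\d+) GFLOPS peak\)"
)


def parse_gpus(text):
    """Extract device index, name, compute capability, memory and peak GFLOPS."""
    gpus = []
    for m in GPU_RE.finditer(text):
        gpus.append({
            "index": int(m["index"]),
            "name": m["name"],
            "capability": (int(m["major"]), int(m["minor"])),
            "mem_mb": int(m["mem_mb"]),
            "gflops": int(m["gflops"]),
        })
    return gpus


gpus = parse_gpus(MESSAGES)
weakest = min(gpus, key=lambda g: g["gflops"])
print("weakest device:", weakest["index"], weakest["name"])
```

On the lines above, the weakest device comes out as device 1, the 8400 GS.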
Anyway, that's where I am at the moment. From here, it looks like everything is working correctly since I'm not getting any error messages like before. I'm confused though, since it sounds like from where you sit, I'm not returning good WU's.
Regards,
Steve
RE: I can't explain why the
Quote:
I can't explain why the "show computer" indicates 2 Fermi cards.
That's a "feature" of BOINC (works as designed:-). It always shows the type of the "strongest" video adapter.
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
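RE: ... the really critical
Quote:
... the really critical one has been ... to go to the terminal screen and, as root, do a "service boinc-client restart".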
That suggests to me that BOINC is perhaps being started too early in the startup sequence - ie before the graphics hardware has been properly detected. If you get adventurous you could probably fix that fairly easily :-).
Quote:
I can't explain why the "show computer" indicates 2 Fermi cards.
I think it's just a BOINC limitation. I don't think it causes any huge problems.
Quote:
The other issue that I had mentioned, the disappearing listing of WU's on the Task tab, seems to be very small WU's for the GPUs....
Unfortunately, that can't be the case because "very small" tasks for the GPU don't exist. You need to look closely at your tasks list on the website. If you click on each TaskID for the failed ABP2 tasks you will see that they were being crunched on the 8400 GS and that they failed virtually immediately. If they weren't failing immediately, they would actually take considerably longer to crunch than those being crunched on the GTX 460 and even longer than being crunched on a CPU core only. If you want to keep the 8400 installed, probably you should just disable it in BOINC - follow the instructions that Gundolf supplied.
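Quote:
... Einstein or MW would quickly send me anywhere up to 14 tasks which were done in something like 45 seconds ...
Because they were done on the 8400 and errored out immediately.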
Quote:
... and the server recontacted for more, which it obliged. If there was a problem, wouldn't it refuse to send more work and stretch out the communications interval?
There is a daily quota which reduces by just one for each failed task. It takes quite a lot of failed tasks before your quota gets reduced to the minimum. If you get just one successful task occasionally, that would be enough to keep restoring your quota to the full value. If you look at your tasks list, you can see why your quota is always being restored.
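The quota rule as described here can be played through with a toy simulation. This is a sketch of the description only, not of the actual scheduler code: the maximum quota of 32 and the minimum of 1 are made-up numbers, and simulate_quota is a hypothetical helper.

```python
def simulate_quota(results, max_quota, min_quota=1):
    """Apply the rule as described: minus one per failure, back to max on success."""
    quota = max_quota
    history = []
    for ok in results:
        if ok:
            quota = max_quota  # one good task restores the full quota
        else:
            quota = max(min_quota, quota - 1)  # each failure costs just one
        history.append(quota)
    return history


# Thirteen failures followed by one success, repeated three times.
pattern = ([False] * 13 + [True]) * 3
history = simulate_quota(pattern, max_quota=32)
print("lowest quota reached:", min(history))
print("quota at the end:", history[-1])
```

With these made-up numbers the quota never drops below 19 and ends up back at 32, so the server keeps handing out work even though most tasks fail.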
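Quote:
... it looks like everything is working correctly since I'm not getting any error messages like before. I'm confused though, since it sounds like from where you sit, I'm not returning good WU's.
Exactly. It's only good at the moment because there is no regular supply of ABP2 tasks for maybe a week or two. The problems will return once the project starts distributing new radio pulsar data to crunch.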
Cheers,
Gary.
Gary and Gundolf,
Thanks for the responses.
When I was checking the other discussion groups, they also mentioned the possibility that Boinc might be starting too soon. I thought that the re-start would take care of that.
Right now, I've pulled the 8400GS card so that it's only the Fermi card and just checked the website. There is one successfully completed WU with the others "in progress". I clicked on the work unit and it successfully saw the 460 and worked with it.
Unfortunately, over at MW, it's still the same problem. When I click on the WU, it shows that it correctly identifies the card but then says that there is no cuda or double-precision card present. I wonder if it's a driver issue?
Here's what it said at MW, which is what I seem to remember E@H saying earlier:
6.10.58
process exited with code 1 (0x1, -255)
Device index specified on the command line was 0
Looking for a Double Precision capable NVIDIA GPU
The device GeForce GTX 460 from the command line cannot be used because a device supporting compute capability 1.3 (Double Precision) is required
Found 1 CUDA cards
Found a GeForce GTX 460
Device cannot be used, it does not have compute capability 1.3 support
No compute capability 1.3 cards have been found, exiting...
The GTX 460 is 2.1 compute capability. It's got to be backward-compatible! I wonder if the app has "1.3" hardwired into the code and that's why it's not working? I'll ask over there.
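To make the "hardwired 1.3" guess concrete, here is a small Python sketch contrasting an exact-match check with an at-least check on compute capability. It is purely hypothetical and is not the Milkyway application's code; both function names are invented for illustration.

```python
# Double precision first appeared at compute capability 1.3.
REQUIRED = (1, 3)


def usable_exact(capability):
    """Hypothetical hardwired check: accept only exactly 1.3."""
    return capability == REQUIRED


def usable_at_least(capability):
    """Hypothetical looser check: accept 1.3 or anything newer."""
    return capability >= REQUIRED


for name, cap in [("GeForce GTX 460", (2, 1)), ("GeForce 8400 GS", (1, 1))]:
    print(name, cap, "exact:", usable_exact(cap), "at least:", usable_at_least(cap))
```

Under the exact-match check both cards are rejected, which would fit the log above. Under the at-least check only the 8400 GS is rejected. Passing such a check would still say nothing about whether the application binary was built for the card.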
I'm no programmer and have just about hit the wall on this one.
Thanks for taking the time on this.
Regards,
Steve
RE: Gary and
I was on the MilkyWay boards and saw this:
"I'm getting a new Nvidia GTX460 card tomorrow for my @home computer and was wondering if Milky Way will detect it and its double-precision capability when it starts up or will I have to do something like detach/re-attach to the project.
The old cards although Cuda-capable did not support double-precision.
Thanks for any help that you can provide.
Steve
You will need to install a modified app for the 460 to actually run on Milkyway.
http://www.arkayn.us/milkyway/MW_0.24_CUDA.zip
That allows the Fermi cards to run the work units."
RE: It's got to be
Quote:
It's got to be backward-compatible!
It isn't.
Applications built for earlier CUDA cards (CUDA1.0 - 3.1) can't run on Fermi style GPUs. However, applications built for Fermi can be run on earlier (non-Fermi) GPUs.
This isn't something the projects decided, it's something that Nvidia decided!
Projects will have to re-release their applications built against the latest Nvidia APIs and they tend to want to do so when they actually change applications anyway as it costs time and money to build, test and implement them.
Mikey,
Thanks for the suggestions. Do you know if there is a Linux (Ubuntu) version?
One good thing is that with the recent changes that folks here suggested, I am now successfully running all of the E@H WU's including the ABP ones with GPU. I don't know if these are double-precision WU's but they are GPU WUs.
Regards,
Steve
Jord,
Thanks for the information.
Regards,
Steve