CUDA probs and other things

KittenKaboodle
Joined: 9 Feb 11
Posts: 13
Credit: 10765731
RAC: 0
Topic 195669

Hi,
I have 2 PCs with one Nvidia GeForce 460 each.
http://einsteinathome.org/host/3947978
http://einsteinathome.org/host/3947696

I have the following problems:

1) When only 1 CUDA task is running on 3947696, BOINC estimates the time to compute one WU at one hour, which is close to the real execution time of 0:58. However, when I run 2 CUDA tasks on that PC, BOINC estimates the execution time at 21 hours, while the real value is only 1:25.
That means that even with the buffer set to 10 days, BOINC only downloads about 22 CUDA WUs, and those WUs are finished in about 7.5 hours. Since this PC is disconnected from the internet for about 8 hours at night, the GeForce 460 often runs out of work.
It also means that hundreds of CPU WUs get downloaded as well, and those take many days to compute, so my wingmen have to wait a long time for their credits.
Is there any way to get more CUDA WUs?

2) Although both PCs have the same graphics card, the one with more memory and a better processor (3947696) is a lot faster when crunching CUDA WUs (about 35%). Giving the CUDA task on 3947978 more processor time doesn't change anything. Any idea how I can improve the slow PC (3947978)?

3) This morning, when 3947696 reported the completed WUs back to the server, a lot of WUs had errors.
The log says "WU download error: couldn't get input files" and "MD5 check failed". Some of those WUs have 0 CPU time, which means there was no computing at all, while others have very high CPU time values. How can the CPU time be 5000 seconds if the log says that it couldn't get input files? It's the first time this has happened, so I am asking myself what went wrong.

Thanks in advance for helping me.

mikey
Joined: 22 Jan 05
Posts: 12829
Credit: 1883671703
RAC: 1106012

CUDA probs and other things

Quote:

Hi,
I have 2 PCs with one Nvidia GeForce 460 each.
http://einsteinathome.org/host/3947978
http://einsteinathome.org/host/3947696

I have the following problems:

1) When only 1 CUDA task is running on 3947696, BOINC estimates the time to compute one WU at one hour, which is close to the real execution time of 0:58. However, when I run 2 CUDA tasks on that PC, BOINC estimates the execution time at 21 hours, while the real value is only 1:25.
That means that even with the buffer set to 10 days, BOINC only downloads about 22 CUDA WUs, and those WUs are finished in about 7.5 hours. Since this PC is disconnected from the internet for about 8 hours at night, the GeForce 460 often runs out of work.
It also means that hundreds of CPU WUs get downloaded as well, and those take many days to compute, so my wingmen have to wait a long time for their credits.
Is there any way to get more CUDA WUs?

Maybe... but it would involve switching things manually. If you switch to single-WU crunching and set your cache to 10 days, you should get a ton of units; then switch back to dual-WU crunching for the night. In the morning, report all those units and get more, then in the evening switch again to fill up the cache for the overnight run.

Quote:
2) Although both PCs have the same graphics card, the one with more memory and a better processor (3947696) is a lot faster when crunching CUDA WUs (about 35%). Giving the CUDA task on 3947978 more processor time doesn't change anything. Any idea how I can improve the slow PC (3947978)?

You could make sure the drivers are the same on both, but that probably won't help. The simple answer is no, there is probably not a lot you can do. The PCs and the OSes are different, which means the motherboards, the RAM, and the hard drives are different too: lots of differences, all contributing to different crunching times.

Quote:
3) This morning, when 3947696 reported the completed WUs back to the server, a lot of WUs had errors.
The log says "WU download error: couldn't get input files" and "MD5 check failed". Some of those WUs have 0 CPU time, which means there was no computing at all, while others have very high CPU time values. How can the CPU time be 5000 seconds if the log says that it couldn't get input files? It's the first time this has happened, so I am asking myself what went wrong.

Thanks in advance for helping me.

Do you switch users on the PCs? If so, you must stop crunching before you do or errors will occur. This is a Windows thing and is not fixable in the current version of BOINC. I have read that the newer 6.12.xx versions address this, but I am not totally sure it is fixed. It would be the newer 6.12.xx versions, not the older ones; 6.12.15 is the newest beta version. On the home page, click on Download BOINC, then click the All Versions link on the next page, and you will see the beta version download link.

KittenKaboodle
Joined: 9 Feb 11
Posts: 13
Credit: 10765731
RAC: 0

RE: Maybe...but it would

Quote:
Maybe... but it would involve switching things manually. If you switch to single-WU crunching and set your cache to 10 days, you should get a ton of units; then switch back to dual-WU crunching for the night. In the morning, report all those units and get more, then in the evening switch again to fill up the cache for the overnight run.

That's what I am doing right now. I thought there might be an explanation for why BOINC is estimating the time so badly.

Quote:
You could make sure the drivers are the same on both, but that probably won't help. The simple answer is no, there is probably not a lot you can do. The PCs and the OSes are different, which means the motherboards, the RAM, and the hard drives are different too: lots of differences, all contributing to different crunching times.

OK, I guess I have to live with it.

Quote:
Do you switch users on the PCs? If so, you must stop crunching before you do or errors will occur. This is a Windows thing and is not fixable in the current version of BOINC. I have read that the newer 6.12.xx versions address this, but I am not totally sure it is fixed. It would be the newer 6.12.xx versions, not the older ones; 6.12.15 is the newest beta version. On the home page, click on Download BOINC, then click the All Versions link on the next page, and you will see the beta version download link.

No, I didn't switch users on that PC. There must be another reason...

mikey
Joined: 22 Jan 05
Posts: 12829
Credit: 1883671703
RAC: 1106012

RE: RE: RE: Do you

Quote:
Quote:
Quote:
Do you switch users on the PCs? If so, you must stop crunching before you do or errors will occur. This is a Windows thing and is not fixable in the current version of BOINC. I have read that the newer 6.12.xx versions address this, but I am not totally sure it is fixed. It would be the newer 6.12.xx versions, not the older ones; 6.12.15 is the newest beta version. On the home page, click on Download BOINC, then click the All Versions link on the next page, and you will see the beta version download link.

No, I didn't switch users on that PC. There must be another reason...

Did you play games, play videos, in short, did you do anything with the PC that would involve more than just BOINC crunching? Almost anything can cause GPU problems when crunching, as we are then asking it to do multiple things at once while it is running at nearly 100 percent usage. The problem then is that the only way to reset a GPU is by restarting the PC.

Oh, and it is not until version 6.12.16 that the switching-users fix will show up in the beta versions.

KittenKaboodle
Joined: 9 Feb 11
Posts: 13
Credit: 10765731
RAC: 0

RE: Did you play games,

Quote:
Did you play games, play videos, in short, did you do anything with the PC that would involve more than just BOINC crunching?

No, not at all. I simply clicked on "update" in the BOINC client as my first action in the morning.

Still asking myself why BOINC makes a wrong estimation of the execution time when more than 1 GPU WU is running. On my slower PC the estimated time is 14 hours and on the fast PC the estimated time is 21 hours. Really strange...

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 1

RE: Still asking myself why

Quote:
Still asking myself why BOINC makes a wrong estimation of the execution time when more than 1 GPU WU is running. On my slower PC the estimated time is 14 hours and on the fast PC the estimated time is 21 hours. Really strange...


Because the GPU scheduler in BOINC isn't built with running multiple tasks on the same piece of hardware in mind. Doing this is your own choice; it's not something the software anticipates or supports. And if I read the developers correctly, it isn't going to be supported (any time soon) either.

John Sheridan
Joined: 1 Oct 06
Posts: 6
Credit: 19301816
RAC: 0

RE: Still asking myself

Quote:

Still asking myself why BOINC makes a wrong estimation of the execution time when more than 1 GPU WU is running. On my slower PC the estimated time is 14 hours and on the fast PC the estimated time is 21 hours. Really strange...


It's probably because you're using an app_info file to run 2 WUs that's missing the <flops> tags (as are the ones posted on the 3wu thread).
Grab the flops value out of your client_state.xml and pop it into the app_info. You will then need to tweak the value a little until the times are roughly correct.
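
For anyone who would rather not hunt through client_state.xml by hand, here is a minimal sketch of the "grab the flops value" step. It assumes the file is well-formed XML and that each app_version entry carries app_name, plan_class and flops child elements; the sample data and the app name in it are made up for illustration, so check against your own file.

```python
import xml.etree.ElementTree as ET


def list_flops(client_state_xml: str):
    """Return (app_name, plan_class, flops) for every <app_version>
    element that has a non-empty <flops> child."""
    root = ET.fromstring(client_state_xml)
    rows = []
    for app_version in root.iter("app_version"):
        flops_text = app_version.findtext("flops")
        if not flops_text or not flops_text.strip():
            continue  # this entry has no flops value to copy
        rows.append((
            app_version.findtext("app_name", default="").strip(),
            app_version.findtext("plan_class", default="").strip(),
            float(flops_text),
        ))
    return rows


# Hypothetical sample, NOT copied from a real client_state.xml.
SAMPLE = """
<client_state>
  <app_version>
    <app_name>example_cuda_app</app_name>
    <plan_class>cuda</plan_class>
    <flops>25000000000.000000</flops>
  </app_version>
  <app_version>
    <app_name>example_cpu_app</app_name>
  </app_version>
</client_state>
"""

if __name__ == "__main__":
    for name, plan_class, flops in list_flops(SAMPLE):
        print(name, plan_class, flops)
```

To run it against a real installation, read the file from your BOINC data directory and pass its contents to list_flops. Stop BOINC before editing app_info.xml, and keep a backup of the original.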

KittenKaboodle
Joined: 9 Feb 11
Posts: 13
Credit: 10765731
RAC: 0

RE: RE: Still asking

Quote:
Quote:

Still asking myself why BOINC makes a wrong estimation of the execution time when more than 1 GPU WU is running. On my slower PC the estimated time is 14 hours and on the fast PC the estimated time is 21 hours. Really strange...

It's probably because you're using an app_info file to run 2 WUs that's missing the <flops> tags (as are the ones posted on the 3wu thread).
Grab the flops value out of your client_state.xml and pop it into the app_info. You will then need to tweak the value a little until the times are roughly correct.

Yeah, that did the trick! Thank you very much, John.
Now I am able to reduce the CPU WUs in the cache.
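
A rough sketch of the "tweak the value" step, for anyone repeating this. It assumes the estimated run time is inversely proportional to the flops value, so scaling flops by estimated/actual should pull the estimate toward the real time. BOINC applies other corrections as well (the duration correction factor, for one), so treat the result as a starting point for trial and error. The function name and the starting flops value are hypothetical; the times are the ones from this thread.

```python
def adjusted_flops(current_flops: float,
                   estimated_hours: float,
                   actual_hours: float) -> float:
    """Scale a flops value so that an estimate of `estimated_hours`
    would move toward `actual_hours`, assuming estimate ~ 1 / flops."""
    if current_flops <= 0 or estimated_hours <= 0 or actual_hours <= 0:
        raise ValueError("all arguments must be positive")
    return current_flops * estimated_hours / actual_hours


if __name__ == "__main__":
    # Times from this thread: BOINC estimated 21 hours,
    # the real run time with 2 tasks was 1:25 (85 minutes).
    estimated = 21.0
    actual = 85 / 60
    start = 1.0e10  # hypothetical placeholder, use your own value
    print(f"scale factor: {estimated / actual:.2f}")
    print(f"suggested flops: {adjusted_flops(start, estimated, actual):.3e}")
```

After changing the value, restart BOINC and compare the new estimate with a few real run times before adjusting again.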
