It appears that BOINC detects the GPU on this host and requests GPU work units even though app_info.xml contains no GPU application. Because app_info.xml contains no GPU application, the Einstein server responds with the "remove app_info.xml" message and then sets a four-hour delay before the next communication with it.
Ok, here's what's happening.
BOINC 6.6.36 has separate CPU and GPU work fetch schedulers. When a GPU is detected, the GPU work fetch scheduler will start running and try to fetch work from whichever project you are attached to. This can be considered as a ping to the project, to check if they have a GPU application available.
When a project has no GPU application available, you'll see this work request pass by a couple of times, with ever increasing gaps between tries, until it will only ask once a day as it'll be on a 24 hour back-off.
That's what your four-hour delay is as well. If you just let it continue, the next delay will be 8 hours, then 16 hours, then 24 hours.
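To illustrate the delays described above, here is a minimal sketch of a doubling back-off with a 24-hour cap. It is only an illustration of the 4, 8, 16, 24 pattern mentioned in this post, not BOINC's actual code; the function name and parameters are made up for the example.

```python
def backoff_schedule(first_delay_hours=4, cap_hours=24, tries=5):
    """Return successive delays in hours, doubling each time up to the cap.

    Hypothetical helper: it only mirrors the pattern described in the post.
    """
    delays = []
    delay = first_delay_hours
    for _ in range(tries):
        delays.append(min(delay, cap_hours))  # never wait longer than the cap
        delay *= 2                            # double for the next attempt
    return delays


print(backoff_schedule())  # [4, 8, 16, 24, 24]
```

Once the cap is reached, every further request is spaced a full day apart, which is why the messages end up appearing only once a day.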
With the anonymous platform app_info.xml file you specifically tell BOINC to use only the applications listed in the file and none of the others. It won't disable the individual work request schedulers; those will continue until the maximum back-off time is reached. It's not pretty, but it works quite well.
So you can go back to an app_info.xml file with only CPU applications and just ignore the messages about the file that the GPU work request triggers. They'll become less frequent and only appear once a day after that, with any luck in the middle of your night.
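If you want to confirm that your app_info.xml really lists only CPU applications, a small check like the one below may help. It is a sketch under the assumption that GPU app versions are marked by a `<coproc>` element inside `<app_version>`; the sample file content and the name `example_app` are hypothetical and not taken from any real Einstein@Home package.

```python
import xml.etree.ElementTree as ET

# Hypothetical, CPU-only app_info.xml content used only for this example.
SAMPLE = """<app_info>
  <app><name>example_app</name></app>
  <app_version>
    <app_name>example_app</app_name>
    <version_num>100</version_num>
  </app_version>
</app_info>"""


def gpu_app_versions(app_info_xml):
    """Return the app names of all app versions that declare a coprocessor."""
    root = ET.fromstring(app_info_xml)
    return [
        av.findtext("app_name", default="?")
        for av in root.iter("app_version")
        if av.find("coproc") is not None  # assumption: <coproc> marks a GPU version
    ]


print(gpu_app_versions(SAMPLE))  # [] means no GPU application is listed
```

An empty list matches the situation described here: BOINC still asks for GPU work, but nothing in the file can run it.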
Hi,
from time to time I get the following error:
6.10.4
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
Activated exception handling...
[08:31:04][6088][INFO ] Starting data processing...
[08:31:04][6088][INFO ] Using CUDA device #0 "GeForce 9800 GX2" (642.82 GFLOPS)
[08:31:04][6088][ERROR] Couldn't open template bank file: templates_400Hz_2.bank (No such file or directory).
[08:31:04][6088][ERROR] Demodulation failed (error: 3)!
called boinc_finish
I have changed to the new BOINC version (from 6.6.36) because I had the same problem there.
Earlier I got this problem:
[00:18:22][5104][ERROR] Error creating CUDA FFT plan (error code: 2)
[00:18:22][5104][ERROR] Demodulation failed (error: 3)!
Einstein@Home is the only thing using the GPU on this host. No games or other 3D applications.
System:
WinXP SP3 32-bit, Q6600, 9800GX2 with driver 190.38, SLI disabled
Any suggestions?
It seems to be a bad link to some files: "Das System kann den angegebenen Pfad nicht finden" ("The system cannot find the path specified").
Is BOINC installed in a different place?
Maybe the *.xml needs to be edited.
PS: Any update on the speed improvement? (20% is too small to be useful.)
I just want to share my experience with the CUDA app. As I mentioned earlier, about half of the work units end with a computation error:
Exit status 3 (0x3)
6.10.4
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
Activated exception handling...
[02:13:28][2040][INFO ] Starting data processing...
[02:13:28][2040][INFO ] Using CUDA device #1 "GeForce 9800 GX2" (642.82 GFLOPS)
[02:13:28][2040][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[02:13:28][2040][INFO ] Header contents:
------> Original WAPP file: p2030_53647_84633_0048_G55.97-00.86.N_4.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53647.979548611111
------> Number of samples/record: 512
------> Center freq in MHz: 1420
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 193815.465068
------> DEC (J2000): 200556.780691
------> Galactic l: 56.0573
------> Galactic b: -0.7369
------> Name: G55.97-00.86.N
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 470.4938
------> ZA at start: 4.9436
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Arzoumanian
------> File size (bytes): 16190702
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 420.4 cm^-3 pc
------> Scale factor: 8060.69
[02:13:29][2040][INFO ] Seed for random number generator is -1132392027.
[02:13:30][2040][ERROR] Error creating CUDA FFT plan (error code: 2)
[02:13:30][2040][ERROR] Demodulation failed (error: 3)!
called boinc_finish
I updated BOINC from 6.6.36 to 6.10.4 because I had the problem before. I also updated the NVIDIA driver to the current 190.62 on a 9800GX2 (2x512MB).
BOINC behaves as follows:
First: it kills a number of work units equal to the number of GPUs. (I have added a second 9800GX2, and now it kills 4 work units instead of 2.)
Second: it computes 4 work units to 100%, then kills 4 work units.
It is now in a state where there are 4 WUs at 94%, 4 WUs at 80%, 2 WUs at 4%, 3 WUs at 1% and 16 WUs below 1%. The status on every line is suppressed/displaced (German: "verdrängt").
Before I upgraded to two 9800GX2 cards, the situation was as follows:
One GPU kept crunching a single WU, but the second kept jumping between the other WUs: it started with a low-percentage WU, and whenever its progress reached that of the next WU, it suspended the current WU and jumped to the next one. This kept happening up to a high-percentage WU. Then it marked a new WU as suppressed and started all over again.
To me it looks more like a scheduler problem. The result is that the GPU runs out of memory, and with no free RAM left a WU gets killed.
Now I will change the OS from WinXP 32-bit SP3 to Win7 64-bit RC and see whether the same thing happens.
hotze
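When comparing failed work units, it can help to pull just the ERROR lines out of the stderr output. The following is a minimal sketch that assumes the `[time][pid][LEVEL] message` layout seen in the excerpts above; the function name is made up for the example, and the sample text is copied from the log in this post.

```python
import re

# Matches lines such as "[02:13:30][2040][ERROR] Demodulation failed (error: 3)!"
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[(\d+)\]\[(\w+)\s*\]\s*(.*)$")


def error_lines(log_text):
    """Return (timestamp, message) pairs for every ERROR line in the log."""
    errors = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(3) == "ERROR":
            errors.append((match.group(1), match.group(4)))
    return errors


SAMPLE_LOG = """\
[02:13:28][2040][INFO ] Starting data processing...
[02:13:30][2040][ERROR] Error creating CUDA FFT plan (error code: 2)
[02:13:30][2040][ERROR] Demodulation failed (error: 3)!
called boinc_finish
"""

for timestamp, message in error_lines(SAMPLE_LOG):
    print(timestamp, message)
```

Lines that do not follow the bracketed layout, such as the header dump or "called boinc_finish", are simply skipped.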
...To me it looks more like a scheduler problem. The result is that the GPU runs out of memory, and with no free RAM left a WU gets killed.
Yes, and if I recall correctly, it was fixed in 6.6.38 and a 6.10.x version after 6.10.4.
Regards,
Gundolf
OK, I will give it a try. But 6.6.38 has another bug (at least on Win7 64-bit): it only recognizes one graphics card. The second card is not detected. 6.5 is working fine so far.
I will try 6.6.38 on WinXP later.
I seemed to get a 4-hour wait every time, not the increasing intervals each time as you described.
It's okay now, anyway. Got my new GPU installed and working happily now. Just would have been a concern if I remained with the weaker GPU, though...
Soli Deo Gloria
Tested with Nvidia driver version 190.62 on GeForce GTX 280.
BOINC CC 6.6.36
WU completes successfully with no problems.
I have a Q9400, Win7, 4 GB RAM, GTX 275.
The following WUs ended in error:
140476115
140474992
140193504
140164839
The following WUs errored out immediately (I also tried updating the cudart and cudafft DLLs to 2.3, but that didn't help):
141150679
141150578
141150530
141150497
141150473
141150232
GeForce 8800 GTS with 320MB (compute capability 1.0, cuda 2.3, driver 190.62)