CUDA App einsteinbinary 3.10 for Windows available for Beta Test

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 62


Message 94429 in response to message 94425

Quote:
It appears that BOINC detects the presence of a GPU on this particular host and will request GPU units regardless of the fact that there is no GPU application included in the app_info.xml. And it is because of the fact that there are no GPU applications in the app_info.xml that the Einstein server responds with the "remove app_info.xml" message and will then set a four-hour delay for the next communication with it.


Ok, here's what's happening.

BOINC 6.6.36 has separate CPU and GPU work fetch schedulers. When a GPU is detected, the GPU work fetch scheduler will start running and try to fetch work from whichever project you are attached to. You can think of this as a ping to the project, checking whether it has a GPU application available.

When a project has no GPU application available, you'll see this work request come by a couple of times, with ever-increasing gaps between tries, until it asks only once a day because it has reached the 24-hour back-off.

That's what your four-hour delay is as well. If you just let it continue, the next delay will be 8 hours, then 16 hours, then 24 hours.
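The retry pattern described above (4 hours, then 8, then 16, then a 24-hour ceiling) is a capped exponential back-off. A minimal Python sketch of that schedule, assuming a plain doubling from the first delay; `backoff_schedule` is a hypothetical helper written for illustration, and the real BOINC client's code may compute its intervals differently (for example, by adding random jitter).

```python
def backoff_schedule(initial_hours: float, cap_hours: float, attempts: int) -> list[float]:
    """Return the delay before each retry, doubling every time but never exceeding the cap.

    Hypothetical illustration of the pattern described in the post, not BOINC's actual code.
    """
    delays = []
    delay = initial_hours
    for _ in range(attempts):
        delays.append(min(delay, cap_hours))  # clamp to the maximum back-off
        delay *= 2                            # exponential growth between failed requests
    return delays


if __name__ == "__main__":
    print(backoff_schedule(4, 24, 5))  # [4, 8, 16, 24, 24]
```

Once the cap is reached, every further request waits the full 24 hours, which is why the messages settle at roughly once a day.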

What you do with the anonymous platform app_info.xml file is tell BOINC specifically to use only the applications listed in the file and none of the others. It won't disable the separate work request schedulers; those will keep trying until the maximum back-off time is reached. It's not pretty, but it works quite well.

So you can go back to an app_info.xml file with only CPU applications and simply ignore the messages about the file that the GPU work request triggers. They will become less frequent and, after that, appear only once a day, with any luck in the middle of your night.

Wedge009
Joined: 5 Mar 05
Posts: 122
Credit: 17403887667
RAC: 7018901


I seemed to get a 4-hour wait every time, not the increasing intervals you described.

It's okay now, anyway. I've got my new GPU installed and it's working happily. It would have been a concern if I had stayed with the weaker GPU, though...

Soli Deo Gloria

Michael Milan
Joined: 3 Nov 05
Posts: 10
Credit: 1102562
RAC: 0


Tested with Nvidia driver version 190.62 on GeForce GTX 280.
BOINC CC 6.6.36

WU completes successfully with no problems.

hotze33
Joined: 10 Nov 04
Posts: 100
Credit: 368387400
RAC: 0


Hi,
from time to time I get the following error:
6.10.4

Das System kann den angegebenen Pfad nicht finden. (0x3) - exit code 3 (0x3)

Activated exception handling...
[08:31:04][6088][INFO ] Starting data processing...
[08:31:04][6088][INFO ] Using CUDA device #0 "GeForce 9800 GX2" (642.82 GFLOPS)
[08:31:04][6088][ERROR] Couldn't open template bank file: templates_400Hz_2.bank (No such file or directory).
[08:31:04][6088][ERROR] Demodulation failed (error: 3)!
called boinc_finish


I have changed to the new BOINC version (from 6.6.36) because I had the same problem there.
Earlier I got this problem:
[00:18:22][5104][ERROR] Error creating CUDA FFT plan (error code: 2)
[00:18:22][5104][ERROR] Demodulation failed (error: 3)!
On this host, only Einstein@Home is running on the GPU. No games or other 3D applications.
System:
WinXP SP3 32bit, q6600, 9800GX2 with 190.38 driver, SLI disabled
Any suggestions?

Grutte Pier [Wa Oars]~GP500
Joined: 18 May 09
Posts: 39
Credit: 6098013
RAC: 0


Message 94433 in response to message 94432

Quote:

Hi,
from time to time I get the following error:
[...]
Any suggestions?

It seems to be a bad link to some files: "Das System kann den angegebenen Pfad nicht finden" means "The system cannot find the path specified."
Is BOINC installed in a different location?

Maybe the *.xml needs to be edited.
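One quick way to test the "bad path" theory is to check whether the file named in the log (templates_400Hz_2.bank) is actually present where the app expects it. A minimal Python sketch, assuming you know the Einstein@Home project directory on the host; `missing_files` and the directory path below are hypothetical placeholders, not part of BOINC or the Einstein@Home app.

```python
from pathlib import Path


def missing_files(directory: str, filenames: list[str]) -> list[str]:
    """Return the names from `filenames` that are not regular files inside `directory`."""
    base = Path(directory)
    return [name for name in filenames if not (base / name).is_file()]


if __name__ == "__main__":
    # Hypothetical path: replace it with the real project directory on your host.
    project_dir = r"C:\path\to\einstein_project_dir"
    print(missing_files(project_dir, ["templates_400Hz_2.bank"]))
```

An empty list means the file exists in that directory, which would point away from a missing file and toward a wrong working directory or path in the configuration.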

PS: Any update on the speed improvement? 20% is too small to be useful.

morse [E.R.] - BOINC.Italy
Joined: 20 Feb 05
Posts: 2
Credit: 10747102
RAC: 0


My system: Q9400, Win7, 4 GB RAM, GTX 275.

The following WUs ended in an error:

140476115
140474992
140193504
140164839

cenit
Joined: 25 Nov 05
Posts: 3
Credit: 14241293
RAC: 0


The following WUs got an error immediately (I also tried updating the cudart and cudafft DLLs to 2.3, but it didn't help):

141150679
141150578
141150530
141150497
141150473
141150232

GeForce 8800 GTS with 320MB (compute capability 1.0, cuda 2.3, driver 190.62)

hotze33
Joined: 10 Nov 04
Posts: 100
Credit: 368387400
RAC: 0


I just want to share my experience with the CUDA app. As I mentioned earlier, about half of the work units end with a computation error:

Exit status 3 (0x3)

6.10.4

Das System kann den angegebenen Pfad nicht finden. (0x3) - exit code 3 (0x3)

Activated exception handling...
[02:13:28][2040][INFO ] Starting data processing...
[02:13:28][2040][INFO ] Using CUDA device #1 "GeForce 9800 GX2" (642.82 GFLOPS)
[02:13:28][2040][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[02:13:28][2040][INFO ] Header contents:
------> Original WAPP file: p2030_53647_84633_0048_G55.97-00.86.N_4.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53647.979548611111
------> Number of samples/record: 512
------> Center freq in MHz: 1420
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 193815.465068
------> DEC (J2000): 200556.780691
------> Galactic l: 56.0573
------> Galactic b: -0.7369
------> Name: G55.97-00.86.N
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 470.4938
------> ZA at start: 4.9436
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Arzoumanian
------> File size (bytes): 16190702
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 420.4 cm^-3 pc
------> Scale factor: 8060.69
[02:13:29][2040][INFO ] Seed for random number generator is -1132392027.
[02:13:30][2040][ERROR] Error creating CUDA FFT plan (error code: 2)
[02:13:30][2040][ERROR] Demodulation failed (error: 3)!
called boinc_finish

I have updated BOINC from 6.6.36 to 6.10.4 because I had the same problem with the older version. I also updated the NVIDIA driver to the current 190.62 on a 9800GX2 (2x512MB).
BOINC behaves as follows:
First, it kills a number of work units equal to the number of GPUs. (I have added a second 9800GX2, and now it kills 4 work units instead of 2.)
Second, it computes 4 work units to 100%, then kills 4 work units.
Now it is in a state where there are 4 WUs at 94%, 4 at 80%, 2 at 4%, 3 at 1% and 16 below 1%. The status on every line is "suppressed/displaced" (German: "verdrängt").
Before I upgraded to two 9800GX2s, the situation was as follows:
one GPU was crunching one WU, but the second kept jumping between the other WUs. It started with a low-% WU, and whenever its progress reached that of the next WU, it suspended the current WU and jumped to the next one. This kept happening up to a high-% WU. Then it marked a new WU as suppressed and started all over again.

To me it looks more like a scheduler problem. The result is that the GPU runs out of memory, and with no free memory left, a WU gets killed.

Now I will switch the OS from WinXP 32-bit SP3 to Win7 64-bit RC and see if the same thing happens.
hotze

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0


Message 94437 in response to message 94436

Quote:
...For me it seems more like a scheduler problem. The result is running out of memory on the gpu and than have no empty ram and so killing a wu.


Yes, and if I recall correctly, it was fixed in 6.6.38 and a 6.10.x version after 6.10.4.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

hotze33
Joined: 10 Nov 04
Posts: 100
Credit: 368387400
RAC: 0


Message 94438 in response to message 94437

Quote:
Quote:
...For me it seems more like a scheduler problem. The result is running out of memory on the gpu and than have no empty ram and so killing a wu.

Yes, and if I recall correctly, it was fixed in 6.6.38 and a 6.10.x version after 6.10.4.

Regards,
Gundolf

OK, I will give it a try. But 6.6.38 has another bug (at least in Win7 64-bit): it only recognizes one graphics card, and the second card is not detected. 6.5 is working fine so far.
I will try 6.6.38 in WinXP later.

hotze
