Many downloads, no tasks started!

ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86563414
RAC: 2
Topic 194837

I'm on Boinc 6.10.43:

Since upgrading, Boinc has downloaded a few tens of binary pulsar search CUDA WUs for Einstein, yet has not started any of them. I've had to set "NNT" (No New Tasks) to stop the steady stream of downloads.

This is on a 64-bit Linux system with an nVidia GPU. Might there be a problem with Einstein trying to use a 32-bit nVidia runtime?...

Or any clues on what to try?

Anyone seen anything similar?

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Many downloads, no tasks started!

Since your computers are hidden, I can't tell for sure, but I'll bet that your GPU has 512 MB memory. That probably doesn't leave enough available memory to start the tasks.

See this thread for more info.

Regards,
Gundolf

Computers aren't everything in life. (Just a little joke)

ML1

RE: Since your computers

Message 97494 in response to message 97493

Quote:

Since your computers are hidden, I can't tell for sure, but I'll bet that your GPU has 512 MB memory. That probably doesn't leave enough available memory to start the tasks.

See this thread for more info.


That's exactly right, but now for Linux rather than the Mac. The nVidia card is 512MB video RAM and I only noticed the problem sometime after adding a second monitor...

I'm running Boinc with the CPU scheduler and work fetch debug flags enabled to see what clues show up in the messages.
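For reference, the "cpu and work fetch debug" messages correspond to the `cpu_sched_debug` and `work_fetch_debug` log flags (the `[cpu_sched_debug]` tag appears in the log line quoted further down this post). They go in the `<log_flags>` section of `cc_config.xml` in the BOINC data directory. The Python sketch below only generates such a file; the helper name `build_cc_config` is made up for illustration, the data-directory location differs between distributions, and the client has to be restarted (or told to re-read its config) before the flags take effect.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag set to 1 (on)."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each debug flag is its own element inside <log_flags>.
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the file contents; save them as cc_config.xml in the
    # BOINC data directory (its location depends on the install).
    print(build_cc_config(["cpu_sched_debug", "work_fetch_debug"]))
```

Other log flags can be added to the list the same way; each becomes one `<flag>1</flag>` element.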

Looks like quite a bug to aimlessly download WUs that can never be run... Downloads should be automatically disabled until the presently held WUs succeed, fail, or expire.

Then again, the reported available GPU memory should be checked before downloading in the first place...
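The suggestion above amounts to a simple gate on work fetch. The sketch below is not BOINC's actual work-fetch code: `Task`, `can_start` and `should_fetch_gpu_work` are hypothetical names, and the 400 MB / 150 MB figures are taken from the `cpu_sched_debug` message quoted later in this post.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    gpu_ram_mb: int  # GPU RAM the task needs before it can start


def can_start(task: Task, free_gpu_mb: int) -> bool:
    return task.gpu_ram_mb <= free_gpu_mb


def should_fetch_gpu_work(queued: list[Task], free_gpu_mb: int,
                          no_new_tasks: bool = False) -> bool:
    """Hypothetical gate: never fetch while NNT is set, and don't fetch while
    every queued GPU task is blocked on memory. The gate reopens once those
    tasks finish, fail or expire, or enough GPU memory frees up."""
    if no_new_tasks:
        return False
    if not queued:
        return True
    return any(can_start(t, free_gpu_mb) for t in queued)


if __name__ == "__main__":
    queue = [Task("p2030_53831_43886_0165_G65.12-00.39.C_2.dm_420_0", 400)]
    print(should_fetch_gpu_work(queue, free_gpu_mb=150))  # False
```

A real client would also have to weigh CPU work, deadlines and resource shares; the point here is only that an unrunnable queue should not grow.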

Thanks for the quick answer!

Happy crunchin',
Martin

[edit]

Phew! This clinches it!

[Einstein@Home] [cpu_sched_debug] p2030_53831_43886_0165_G65.12-00.39.C_2.dm_420_0: insufficient GPU RAM (150MB < 400MB)

I've got that much open and active?!... Maybe I have!...

Also, I'm using Xinerama for the two screens. That may well have equally divided the available GPU RAM into 256MBytes each even before applications take their share... :-(

[/edit]
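To see how many queued tasks are being held back for this reason, the saved message log can be scanned for lines like the `insufficient GPU RAM` one above. A minimal Python sketch, assuming the message text has exactly the shape shown in this post (other BOINC versions may word it differently; a leading timestamp is fine because `re.search` skips over it). `gpu_ram_shortfalls` is a hypothetical helper name.

```python
import re
import sys

# Matches e.g. "[Einstein@Home] [cpu_sched_debug] <task>: insufficient GPU RAM (150MB < 400MB)"
SHORTFALL = re.compile(
    r"\[(?P<project>[^\]]+)\] \[cpu_sched_debug\] (?P<task>\S+): "
    r"insufficient GPU RAM \((?P<free>\d+)MB < (?P<need>\d+)MB\)"
)


def gpu_ram_shortfalls(lines):
    """Yield (project, task, free_mb, needed_mb) for each matching log line."""
    for line in lines:
        m = SHORTFALL.search(line)
        if m:
            yield m["project"], m["task"], int(m["free"]), int(m["need"])


if __name__ == "__main__":
    # Usage: python shortfalls.py saved_messages.txt [more files...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for project, task, free, need in gpu_ram_shortfalls(f):
                print(f"{project}: {task} needs {need} MB, only {free} MB free")
```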


Gundolf Jahn

RE: Looks like quite a bug

Message 97495 in response to message 97494

Quote:
Looks like quite a bug to aimlessly download WUs that can never be run... Downloads should be automatically disabled until the presently held WUs succeed, fail, or expire.


Absolutely correct! ;-)

I think the devs are aware of the problem, and have been for quite some time already... :-)

Regards,
Gundolf


Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 0

RE: Looks like quite a bug

Message 97496 in response to message 97494

Quote:
Looks like quite a bug to aimlessly download WUs that can never be run... Downloads should be automatically disabled until the presently held WUs succeed, fail, or expire.


That should have been included since 6.10.37, but I'll report it to the developers.
Can you run with cc_config.xml with and on, please?

Quote:
Also, I'm using Xinerama for the two screens. That may well have equally divided the available GPU RAM into 256MBytes each even before applications take their share...


Yes, that'll happen with multi-monitor support. Not only is the available memory split between the monitors, but each will also take an X amount of memory for the GDI (Graphics Device Interface).
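The arithmetic behind this is easy to check. Under the hypothesis discussed here (a 512 MB card, memory divided evenly between the screens, then each screen's own overhead, the "X amount", taken out), the sketch below compares the per-screen share with the 400 MB that the binary pulsar task asked for in the `cpu_sched_debug` message earlier in the thread. The even split and the `overhead_mb` parameter are assumptions from this discussion, not documented nVidia driver behaviour.

```python
TASK_NEED_MB = 400  # from the "insufficient GPU RAM (150MB < 400MB)" message
CARD_MB = 512


def per_screen_free_mb(total_mb: int, screens: int, overhead_mb: int = 0) -> int:
    """Assumed model: VRAM split evenly per screen, minus that screen's overhead."""
    return total_mb // screens - overhead_mb


for screens in (1, 2):
    free = per_screen_free_mb(CARD_MB, screens)
    # Even with zero overhead, two screens leave only 256 MB each.
    print(f"{screens} screen(s): {free} MB per screen, task fits: {free >= TASK_NEED_MB}")
```

With one screen the model leaves room for up to 112 MB (512 - 400) of display overhead before the task no longer fits, which would be consistent with the single-monitor test later in the thread.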

I don't know if this is possible with Nvidia's control panel, but when I ran Matrox's multi-monitor cards, I could set how much memory each VGA output could take up, thereby shifting the amount of free memory per monitor. Might be something to check.

ML1

RE: RE: Looks like quite

Message 97497 in response to message 97496

Quote:
Quote:
Looks like quite a bug to aimlessly download WUs that can never be run... Downloads should be automatically disabled until the presently held WUs succeed, fail, or expire.

That should have been included since 6.10.37, but I'll report it to the developers.
Can you run with cc_config.xml with and on, please?


Can do. "NNT" is set at the moment. I'll run with NNT set, then without NNT for a while.

At the moment, I'm letting a few run through to see if they run and validate ok. However, there's a killer problem...

Quote:
Quote:
Also, I'm using Xinerama for the two screens. That may well have equally divided the available GPU RAM into 256MBytes each even before applications take their share...

Yes, that'll happen with multi-monitor support. Not only is the available memory split between the monitors, but each will also take an X amount of memory for the GDI (Graphics Device Interface).

I don't know if this is possible with Nvidia's control panel, but when I ran Matrox's multi-monitor cards, I could set how much memory each VGA output could take up, thereby shifting the amount of free memory per monitor. Might be something to check.


nVidia's control panel is rather neat and reports the Xinerama setup fine, but I didn't notice any controls for adjusting the memory split.

As a test, I'm running with just one monitor and no Xinerama for the moment. The Einstein binary pulsar WUs look to be running through fine.

However, there's a killer problem:

The Einstein CUDA app corrupts the visible display! Even just a 'vesa' 80x40 character terminal display!! It also causes the KDE window manager to lock up, with the display looking very psychedelic as it zooms through all the colours (it looks like the colour palette is being overwritten). A short while later, the display shows a repeated pattern of small coloured blocks that then stays static. Xorg then crashes and the keyboard is ineffective for everything except the SysRq keys.

Just as well I can ssh into that machine to sort out the mess!

The system is all 64bit with v2.3 nVidia 64bit CUDA libraries. I notice that Einstein is 32bit and uses its own 32bit CUDA... Is there a clash? Or a more general bug?...
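One way to check the 32/64-bit question is to read the ELF class byte (offset 4 of the ELF header: 1 means 32-bit, 2 means 64-bit) of the Einstein application binary and of the CUDA runtime library it ships with; the standard `file` command reports the same information. A minimal Python sketch: which files to pass in is up to you, since the exact names in the Einstein project directory aren't given here, and `elf_bits` is a hypothetical helper name. Note that a 32-bit CUDA application generally also needs the driver's 32-bit `libcuda.so` installed alongside the 64-bit one.

```python
import sys

ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # offset of the class byte in e_ident
CLASS_BITS = {1: 32, 2: 64}  # ELFCLASS32, ELFCLASS64


def elf_bits(path):
    """Return 32 or 64 for an ELF file, or None if it isn't ELF."""
    with open(path, "rb") as f:
        ident = f.read(EI_CLASS + 1)
    if len(ident) <= EI_CLASS or not ident.startswith(ELF_MAGIC):
        return None
    return CLASS_BITS.get(ident[EI_CLASS])


if __name__ == "__main__":
    # Usage: python elfbits.py <app binary> <bundled CUDA library> ...
    for path in sys.argv[1:]:
        print(path, elf_bits(path))
```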

After the tests, I'll have to drop out of the binary pulsar search until fixed.

Regards,
Martin


paul milton
Joined: 16 Sep 05
Posts: 329
Credit: 35825044
RAC: 0

sounds like it would make for

Message 97498 in response to message 97497

sounds like it would make for a great screen saver, except for the crash part.

seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

ML1

RE: sounds like it would

Message 97499 in response to message 97498

Quote:
sounds like it would make for a great screen saver, except for the crash part.


I always wondered what the data would look like while being worked on. It could indeed make for a fun 'screensaver' if it didn't cause data corruption or a crash!

Happy crunchin',
Martin


ML1

OK... Three WUs appear to

OK... Three WUs appear to have made it through ok:

h1_0166.90_S5R4__47_S5GCEa_1

h1_0166.90_S5R4__49_S5GCEa_0

p2030_53572_13320_0031_G35.83+01.66.N_3.dm_256_0

Happy crunchin',
Martin


ML1

RE: OK... Three WUs appear

Message 97501 in response to message 97500

Quote:
OK... Three WUs appear to have made it through ok...


As have many more.

However...

Despite having set NNT, Boinc v6.10.43 is still downloading new tasks! 600 or so and still counting!!

Most tasks look to take about 3 hours. One was paused at about that mark and still showed another 2 hours to complete. I'm just wondering if that one task being held in memory caused a few other Einstein tasks to error out. Just in case, I've aborted it to give the other WUs a clear run.

Brief tests will be done soon but I'm guessing I'll just have to do a big abort.

Meanwhile...

Happy crunchin',
Martin


Gundolf Jahn

RE: However... Despite

Message 97502 in response to message 97501

Quote:

However...

Despite having set NNT, Boinc v6.10.43 is still downloading new tasks! 600 or so and still counting!!


That would be a big BOINC bug!

What does the button show when you highlight the Einstein project?

What does the Status column show for Einstein?

Regards,
Gundolf

