Can someone please check these links to see what happened, and whether anything can be done about it? Thanks :)
http://einsteinathome.org/task/206299674
http://einsteinathome.org/task/206299620
http://einsteinathome.org/task/206299080
Error while computing
Did you try a reboot to clear up anything stuck with your GPU?
Regards,
Gundolf
Computers aren't everything in life. (Just a little joke)
RE: Can someone please
When you installed BOINC, did you use the defaults or did you specify the directories etc.? Also, are you the only person using the PC, or are there several people using it, each with a different login?
RE: Can someone please
Hi Apostolos,
Welcome to the Einstein project.
I've checked those three links and the info that each contained. I don't run any CUDA-capable GPUs, so I have no direct experience, but I think I can work out what happened, though not why it happened. Unfortunately, there is nothing that can be done about the tasks that were trashed; they will have been resent to someone else for completion.
As you listed them, it was actually the third one that failed first (check the 'sent' and 'returned' times recorded in each file), and the other two failed as a direct consequence of that first failure. So, looking at the data for the third link, we see the following snippets together with my commentary.
This is a message from your OS that you should investigate (google?) but it doesn't seem to be causing a problem for BOINC.
These lines are quite normal and indicate the successful start of an ABP2 task. If you ever stop and restart BOINC or reboot your computer, you will see a repeat of these startup lines each time. You will find one of these restarts later in the file.
These follow immediately after the previous initialisation output and show a checkpoint being saved every minute. ABP2 tasks (binary pulsar search) each consist of 10 'mini-tasks' sent together to form one large task. This is simply for server convenience. So in a fully completed task, you should find 10 sets of these 'Checkpoint committed' messages. In your case there are 7 full sets and an 8th 'partial set'. This partial one is immediately followed by
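To make the structure concrete, here is a minimal sketch of how those 'Checkpoint committed' messages group into sets per mini-task. The log format and function names below are hypothetical illustrations, not the real app's internals; the only facts taken from above are the 10-mini-task bundling and the 7-full-plus-1-partial pattern in your failed task.

```python
# Hypothetical sketch: group 'Checkpoint committed' log lines by mini-task.
# A fully completed ABP2 work unit would show 10 groups; yours showed 7
# complete sets plus a partial 8th one.

NUM_MINI_TASKS = 10  # an ABP2 work unit bundles 10 mini-tasks

def checkpoint_sets(log_messages):
    """Return a dict mapping mini-task index -> checkpoints committed."""
    sets = {}
    for msg in log_messages:
        if msg.startswith("Checkpoint committed"):
            # assumed log format: "Checkpoint committed (mini-task 3)"
            idx = int(msg.rsplit(" ", 1)[-1].rstrip(")"))
            sets[idx] = sets.get(idx, 0) + 1
    return sets

# Simulate the failed task: 7 full sets, then a partial 8th set.
log = ["Checkpoint committed (mini-task %d)" % i
       for i in range(1, 8) for _ in range(5)]
log += ["Checkpoint committed (mini-task 8)"] * 2

sets = checkpoint_sets(log)
print(len(sets))   # 8 mini-tasks reached at least one checkpoint
print(sets[8])     # 2 -- the partial set that was interrupted
```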
which indicates that soon after the 20:16:48 checkpoint was written, either crunching was stopped and restarted or the machine was rebooted, or something of this nature. The timestamp given immediately after restarting [21:00:30] shows that crunching had stopped for about 43 minutes.
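The "about 43 minutes" figure comes straight from subtracting the two timestamps quoted above (assuming both fall on the same day, since only times were logged):

```python
from datetime import datetime

# Timestamps taken from the task's stderr output quoted above.
last_checkpoint = datetime.strptime("20:16:48", "%H:%M:%S")
restart = datetime.strptime("21:00:30", "%H:%M:%S")

gap = restart - last_checkpoint
print(gap)  # 0:43:42 -- crunching was off for about 43 minutes
```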
On restarting, you can see that the seven completed 'mini-tasks' were acknowledged and skipped, and that an attempt was made to reload the 8th, uncompleted one from a saved checkpoint. Immediately following this you see
This is the actual problem. My guess is that at the time the checkpoint was to be loaded into GPU memory, there wasn't enough free GPU memory to hold it, or something like that. I have no idea why this happened. You might be able to deduce the reason if you can remember why crunching was off for the 43 minutes that were logged. Did you run something else that consumed, and didn't release, your GPU RAM?
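The failure mode can be sketched in a few lines: restoring a checkpoint only works if the saved state fits in whatever GPU memory is free at restart time. Everything below is a hypothetical illustration, assuming an allocation check of this kind; the real CUDA app's internals aren't visible from the task logs.

```python
# Hypothetical sketch: a checkpoint restore that aborts when the saved
# state cannot fit into the GPU memory currently free. If another program
# grabbed GPU RAM during the 43-minute pause and never released it, a
# check like this would fail and the task would error out.

class CheckpointRestoreError(RuntimeError):
    """Raised when a saved checkpoint cannot be reloaded onto the GPU."""

def restore_checkpoint(checkpoint_bytes, free_gpu_bytes):
    """Refuse to restore if the checkpoint won't fit in free GPU memory."""
    if checkpoint_bytes > free_gpu_bytes:
        raise CheckpointRestoreError(
            "not enough free GPU memory to reload the saved state")
    return "restored"

# Plenty of room: 200 MiB of state into 512 MiB free succeeds.
print(restore_checkpoint(200 * 2**20, 512 * 2**20))  # restored
```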
Once the first task had failed, the next two were immediate casualties of the same set of circumstances. At least you didn't have any crunching time wasted with those two.
Will this problem happen again? Possibly. The CUDA app is being worked on and a new version is expected 'when it's ready'. We are currently consuming the available ABP2 data at seven times the rate that new data is being produced, so ABP2 tasks will probably become a lot scarcer soon. The current CUDA app requires both a CPU and a GPU, and not very much of the total calculation load can actually run on the GPU. There will (usually) be an improvement in total crunch time, but the downside is that you tie up both a CPU and the GPU to achieve it. The improvement will be modest or even non-existent on a low-end GPU. For these reasons, some volunteers prefer to use their GPUs for projects that make more efficient use of them.
Cheers,
Gary.