Both instructions access only the 0x0040AD30 memory area directly, not the '0x00000000' (invalid address).
Quote:
[pre]***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x0040AA17 read attempt to address 0xFFFFFFFF[/pre]
It seems to be the same problem.
I think the processor could not prepare the address in time.
It has lots of work with this code...
RE: I didn't get new
)
So it's overclocking. I will watch closely on my other overclocked PC when I test the S-version on it. Thank you for the explanation.
Mine isn't OCed (it's a Tyan
)
Mine isn't OCed (it's a Tyan board without any OC settings), it was quite warm though as one case vent broke, 11°C more than usual on CPU1.
RE: RE: [pre]***UNHANDLED
)
I have some information that may be relevant to this issue of unhandled exceptions. I have around 70 or so machines crunching for EAH, many of which are overclocked and most of which are running the S41.06/07 version of Albert. Some are running S40.12 and just a few are still running the stock app. I only noticed your brilliant optimisation work around a week ago and have been converting machines to the S41.07 version as quickly as possible as this is giving the best results for me.
Around 20 of my machines are HP Vectra VL420s which I purchased at a surplus government equipment auction. (It's amazing what the Government is prepared to throw away for a song - but I digress). They have a P4 1.6G Willamette CPU and use PC133 SDRAM. They have a suitable PLL chip (ICS 950202) so the FSB can be tweaked with almost limitless precision using CPUFSB while running under WindowsXP-SP2. The stock configuration is 100 FSB and 16x multiplier. With CPUFSB, approximately 70% of these machines are running quite happily at 125 FSB and 16x multiplier = 2.0 GHz. They are Prime95 stable (approx 3 hrs runtime) at that level. Prime95 errors seem to creep in around 2050 to 2100MHz. Any machines that show Prime95 rounding errors lower than 2050 get backed off to around 1950 and so on. For some reason one or two machines in the batch need to be around 1750 - 1800 before they will operate stably. Collectively, these machines have done thousands of results without me noticing any invalids or any abnormal program terminations until now. Be aware however that I don't have time to monitor closely 70+ machines so I probably wouldn't notice the odd error or even the odd batch of errors if the problem went away quickly. I would certainly notice a machine lockup, and that hasn't been happening with these machines.
Just one machine in the whole batch of VL420s has now within the last 24 hours had 32 of these unhandled exceptions and this was enough for me to notice. Having read your comments implicating overclocking, I have tried to investigate this issue thoroughly. Here is the information I have gleaned:
Machine: HP Vectra VL420 - P4 Willamette 1600 @ 1960MHz (FSB = 122.5MHz)
Optimised App Running: S41.06 - Boinc Version 5.2.13
EAH CPUID: 536520
Identical Machine CPUID: 610566 - running @ 2000MHz - no errors or invalids
Recent Result Names (536520): r1_1221.5, z1_1086.0, z1_1320.0, z1_1158.5, z1_1269.0
Error Message:
Quote:
5.2.13
- exit code -1073741819 (0xc0000005)
2006-05-08 10:41:09.6250 [normal]: Optimised by akosf S41.06 --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'.
2006-05-08 10:41:09.6250 [normal]: Started search at lalDebugLevel = 0
2006-05-08 10:41:11.3281 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-05-08 10:41:11.3281 [normal]: No usable checkpoint found, starting from beginning.
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x0040ABF8 read attempt to address 0x02024000
Of the various recent result files processed, the errors (32 of them in total) have all come from just one file - z1_1269.0 and none of the others. There have been no successfully completed results from z1_1269.0. The addresses mentioned in the message always seemed to be the same (I checked about 10 or so). Interspersed with the errors were perfectly normal and valid results which came from the other mentioned result files. There have been no errors from any other result file that I have noticed.
There are still three results from z1_1269.0 left in the work cache, which will be done in the next few hours or so once others which are processing correctly have been finished. I have backed off the CPU speed to 1900 from 1960 (should I back it off more??) and will report if this allows the last three problem results to complete correctly. Apart from these unhandled exceptions, everything else about the operation of the computer seems perfectly normal. The puzzling thing is why there are errors only from z1_1269.0 and not from any other result file.
I will report further once the three results of interest have been processed. There are still six @ 1.5 hrs each ahead of them in the worklist.
Cheers (and many thanks for your brilliant work),
Gary.
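As a side note on the log above: the exit code appears in two forms, -1073741819 and 0xc0000005. They are the same 32-bit value, read once as a signed and once as an unsigned integer. A minimal Python sketch of the conversion follows; the function name is made up for illustration and is not part of BOINC or the Albert app.

```python
def exit_code_to_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it in hex.

    Hypothetical helper for illustration only.
    """
    # Masking with 0xFFFFFFFF maps a negative value onto its
    # two's-complement unsigned 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# The exit code reported in the log
print(exit_code_to_hex(-1073741819))  # 0xc0000005
```

So the negative exit code and the access violation code in the exception message are one and the same.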
RE: Cheers, (and many
)
GARY!!!
It's so good to see you've returned!!! Where the devil have you been, old friend?
Michael Roycraft
microcraft
"The arc of history is long, but it bends toward justice" - MLK
RE: Where the devil have
)
Hi Michael,
So as not to disrupt this thread, I'll email you when I get a chance.
It's good to see you still contributing here.
Cheers,
Gary.
RE: I wiil report further
)
The six have now finished successfully and the three from the file z1_1269.0 have failed yet again, even with the lowered FSB. In the meantime, two more from this same problem file have been downloaded so I've further reduced the FSB so that the CPU is at 1800MHz for the next lot. Once again there are a number of others from non-problem files to be processed before the next from z1_1269.0 comes up for processing.
I'll report what happens with the next one when it gets to run. There are two of these and they are separated so if the first fails again I'll have a further opportunity to go even lower with FSB. Maybe even more will download in the interim.
I have at least a dozen identical machines running the FSB at 125 (CPU at 2000MHz) showing no signs of similar problems. Why does z1_1269.0 have to be so picky :)
Cheers,
Gary.
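For reference, the clock speeds quoted in this thread follow from FSB × multiplier. A small Python sketch, assuming the 16x multiplier stated for the VL420 stays fixed; the function names are illustrative only, and the FSB values for 1900 and 1800 MHz are derived here rather than quoted from the posts.

```python
MULTIPLIER = 16  # stock multiplier quoted for the P4 1.6G in the VL420


def cpu_clock_mhz(fsb_mhz: float, multiplier: int = MULTIPLIER) -> float:
    """CPU clock in MHz for a given FSB and multiplier."""
    return fsb_mhz * multiplier


def fsb_for_clock(target_mhz: float, multiplier: int = MULTIPLIER) -> float:
    """FSB in MHz needed to reach a target CPU clock."""
    return target_mhz / multiplier


# FSB settings mentioned in the thread
for fsb in (100, 122.5, 125):
    print(f"FSB {fsb} MHz -> CPU {cpu_clock_mhz(fsb):.0f} MHz")

# FSB implied by the reduced clocks (derived, not quoted)
for target in (1900, 1800):
    print(f"CPU {target} MHz -> FSB {fsb_for_clock(target)} MHz")
```

Under that assumption, backing off from 1960 to 1800 MHz corresponds to dropping the FSB from 122.5 to 112.5 MHz.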
Greetings All Just to add
)
Greetings All
Just to add my bit - done probably around 20 WU so far, both small and big on this 2.6 Intel with no over-clocking but well-tweaked OS, and no errors so far, tho still several pending. Obviously using S41.07
To Akosf, many thanks !!
Gray
RE: [pre]***UNHANDLED
)
Yes. This FSB modification totally eliminates this error.
So, it was a good cpu stability test. :)
hi akos, i can give you a
)
hi akos,
I can give you a full VPN/VNC account on a 3.0 GHz Xeon with HT for testing new Albert versions. The machine is running 24/7 only for Einstein@Home :^)
Contact:
EggZZ
RE: i can give you a vpn -
)
Thanks EggZZ.
But I cannot take advantage of the opportunity.
I have a lot of work and a lot of other things, no free time.