Negative Time Warp

Der Mann mit der Ledertasche
Der Mann mit de...
Joined: 12 Dec 05
Posts: 151
Credit: 302594178
RAC: 0
Topic 198221

Hi Folks,

something looks curious on one of my Maschine (http://einsteinathome.org/host/12019173). This one crunching well for several weeks (Boinc 7.6.42/7.6.6/7.6.9). The Host running only 1 Intel BRP6 Beta and 3 FGRP4 (1Core reserved for GPU). Some weeks ago I noticed that the FGRP WU's started normal until a Progress of about 30% and a Runtime of about 3h. At this Point the Progress step back to 1,xxx % and finish every WU without any Error after about 30 hrs, formally about 12-14 hrs. The GPU WU's are not affected. I've checked Temp's, System, Drivers, I reset Project, I reinstall the hole BS, nothing helped. Searching "Dr. Google" didn't find something. Any suggestions?

Greetings from the North

AgentB
AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Negative Time Warp

Quote:

Hi Folks,

something looks curious on one of my Maschine (http://einsteinathome.org/host/12019173). This one crunching well for several weeks (Boinc 7.6.42/7.6.6/7.6.9). The Host running only 1 Intel BRP6 Beta and 3 FGRP4 (1Core reserved for GPU). Some weeks ago I noticed that the FGRP WU's started normal until a Progress of about 30% and a Runtime of about 3h. At this Point the Progress step back to 1,xxx % and finish every WU without any Error after about 30 hrs, formally about 12-14 hrs. The GPU WU's are not affected. I've checked Temp's, System, Drivers, I reset Project, I reinstall the hole BS, nothing helped. Searching "Dr. Google" didn't find something. Any suggestions?

Unfortunately that host no longer is showing any fast tasks, only slow ones.

I think it would help to look over the job_log_einstein.txt file in the boinc data directory

From one on my hosts, the job log shows

[pre]

1441351879 ue 35173.267327 ct 29178.480000 fe 105000000000000 nm LATeah1070E_80.0_2078_-2.43e-10_0 et 29200.990634
1441380962 ue 35173.267327 ct 29058.770000 fe 105000000000000 nm LATeah1070E_80.0_2442_-9.23e-10_1 et 29081.877956
1441409876 ue 35173.267327 ct 28890.660000 fe 105000000000000 nm LATeah1070E_80.0_2832_-7.27e-10_1 et 28912.355948
1441438959 ue 35277.227723 ct 29025.230000 fe 105000000000000 nm LATeah1070E_80.0_3248_-3e-12_0 et 29050.476094
1441468043 ue 35277.227723 ct 29059.110000 fe 105000000000000 nm LATeah1070E_80.0_3664_-3.79e-10_0 et 29082.448339
1441497029 ue 35277.227723 ct 28962.980000 fe 105000000000000 nm LATeah1070E_80.0_4158_-3.55e-10_1 et 28985.492337

[/pre]

first column time in seconds since T0

ue = estimated time
ct = cpu time
fe = flops (i think)
nm = task name
et = elapsed time

At least that will reveal exactly when the tasks slowed and perhaps that might able to be tied to some other clue, such as new release, or other events in the event logs.

Der Mann mit der Ledertasche
Der Mann mit de...
Joined: 12 Dec 05
Posts: 151
Credit: 302594178
RAC: 0

...and where can I get the

...and where can I get the needed information?

1440723982 ue 44916.202999 ct 2054.969000 fe 590000000000000 nm PM0051_014C1_164_1 et 57061.643153 es 0
1440784840 ue 28491.902557 ct 107274.200000 fe 105000000000000 nm LATeah1064E_48.0_434_-5.86e-10_0 et 111213.006718 es 0
1440786546 ue 28491.902557 ct 108681.600000 fe 105000000000000 nm LATeah1064E_48.0_434_-8.5e-10_1 et 112625.464962 es 0
1440786590 ue 28491.902557 ct 108991.000000 fe 105000000000000 nm LATeah1064E_48.0_461_-6.61e-10_0 et 112931.436409 es 0
1440787637 ue 44916.202999 ct 2239.750000 fe 590000000000000 nm PM0051_01421_258_0 et 63236.834308 es 0
1440851466 ue 44916.202999 ct 1901.547000 fe 590000000000000 nm PM0049_037C1_288_0 et 56450.569763 es 0
1440910310 ue 44916.202999 ct 1911.469000 fe 590000000000000 nm PM0049_03771_204_1 et 55706.958345 es 0
1440971163 ue 44916.202999 ct 2442.094000 fe 590000000000000 nm PM0049_03811_68_0 et 60647.574432 es 0
1440977471 ue 28491.902557 ct 121519.100000 fe 105000000000000 nm LATeah1061E_1360.0_254270_0.0_1 et 122164.401279 es 0
1440977476 ue 28491.902557 ct 121011.100000 fe 105000000000000 nm LATeah1061E_1328.0_693267_0.0_1 et 121655.255836 es 0
1440979267 ue 28491.902557 ct 122407.300000 fe 105000000000000 nm LATeah1061E_1296.0_500022_0.0_0 et 123058.956638 es 0
1441027998 ue 44916.202999 ct 2119.906000 fe 590000000000000 nm PM0049_037C1_276_0 et 56833.922279 es 0
1441085190 ue 44916.202999 ct 2114.875000 fe 590000000000000 nm PM0049_03771_228_0 et 57190.257339 es 0
1441100181 ue 28491.902557 ct 122043.000000 fe 105000000000000 nm LATeah1061E_1296.0_427697_0.0_0 et 122708.871290 es 0
1441100813 ue 28491.902557 ct 122666.700000 fe 105000000000000 nm LATeah1061E_1296.0_572072_0.0_0 et 123336.220974 es 0
1441101382 ue 28491.902557 ct 121452.300000 fe 105000000000000 nm LATeah1061E_1328.0_534317_0.0_0 et 122114.709385 es 0
1441145747 ue 44916.202999 ct 2376.063000 fe 590000000000000 nm PM0049_03871_220_1 et 59415.931336 es 0
1441201155 ue 44916.202999 ct 1887.547000 fe 590000000000000 nm PM0049_039A1_172_0 et 55406.588938 es 0
1441226862 ue 28491.902557 ct 123394.400000 fe 105000000000000 nm LATeah1061E_1328.0_262617_0.0_0 et 124417.673443 es 0
1441227961 ue 28491.902557 ct 124009.900000 fe 105000000000000 nm LATeah1061E_1328.0_347042_0.0_0 et 125045.424517 es 0
1441228080 ue 28491.902557 ct 123647.300000 fe 105000000000000 nm LATeah1061E_1328.0_519192_0.0_0 et 124676.194162 es 0
1441262915 ue 44916.202999 ct 2424.453000 fe 590000000000000 nm PM0051_02831_74_1 et 61759.263681 es 0
1441311362 ue 22229.952213 ct 82927.980000 fe 81923100000000 nm LATeah1062E_48.0_39_-4.78e-10_0 et 83281.410313 es 0
1441313888 ue 22229.952213 ct 85555.750000 fe 81923100000000 nm LATeah1062E_48.0_39_-8.08e-10_0 et 85926.539526 es 0
1441321886 ue 44916.202999 ct 2087.359000 fe 590000000000000 nm PM0051_02941_6_0 et 58970.257833 es 0
1441333405 ue 28491.902557 ct 106304.000000 fe 105000000000000 nm LATeah1062E_80.0_41_-7.9e-11_0 et 106763.213356 es 0
1441379716 ue 44916.202999 ct 1949.766000 fe 590000000000000 nm PM0051_03181_6_0 et 57828.836243 es 0
1441395093 ue 22229.952213 ct 83350.770000 fe 81923100000000 nm LATeah1062E_48.0_39_-8.66e-10_1 et 83730.082275 es 0
1441398184 ue 22229.952213 ct 83907.970000 fe 81923100000000 nm LATeah1062E_48.0_39_-8.65e-10_1 et 84294.570869 es 0
1441419421 ue 22229.952213 ct 85601.800000 fe 81923100000000 nm LATeah1062E_48.0_39_-8.11e-10_0 et 86014.443184 es 0
1441438836 ue 44916.202999 ct 2300.141000 fe 590000000000000 nm PM0051_033D1_296_0 et 59119.385736 es 0
1441497052 ue 44916.202999 ct 1961.125000 fe 590000000000000 nm PM0051_03481_44_0 et 58214.659645 es 0
1441503590 ue 28491.902557 ct 107622.900000 fe 105000000000000 nm LATeah1062E_80.0_41_-1.34e-10_0 et 108496.302332 es 0
1441506897 ue 28491.902557 ct 107845.800000 fe 105000000000000 nm LATeah1062E_80.0_41_-1.32e-10_0 et 108711.538296 es 0
1441526085 ue 28491.902557 ct 105827.200000 fe 105000000000000 nm LATeah1062E_80.0_41_-1.4e-11_1 et 106663.663586 es 0
1441556220 ue 44916.202999 ct 2072.266000 fe 590000000000000 nm PM0052_00151_164_0 et 59167.294909 es 0
1441613037 ue 28491.902557 ct 107877.400000 fe 105000000000000 nm LATeah1062E_80.0_41_-1e-12_1 et 108370.135456 es 0
1441615240 ue 44916.202999 ct 2052.781000 fe 590000000000000 nm PM0052_00381_180_1 et 58906.717886 es 0
1441619093 ue 28491.902557 ct 109998.200000 fe 105000000000000 nm LATeah1062E_80.0_41_-1.25e-10_1 et 110504.588925 es 0
1441635224 ue 28491.902557 ct 107585.000000 fe 105000000000000 nm LATeah1062E_80.0_41_-7.9e-10_1 et 108086.727041 es 0
1441673062 ue 43941.829497 ct 2118.578000 fe 590000000000000 nm PM0049_02081_28_2 et 57821.214819 es 0
1441722916 ue 28491.902557 ct 109316.400000 fe 105000000000000 nm LATeah1062E_80.0_67_-6.88e-10_1 et 109866.948675 es 0
1441726785 ue 28491.902557 ct 107132.800000 fe 105000000000000 nm LATeah1062E_80.0_41_-7.67e-10_2 et 107679.916601 es 0
1441731503 ue 43941.829497 ct 2310.141000 fe 590000000000000 nm PM0052_00751_334_0 et 58428.766060 es 0
1441742977 ue 28491.902557 ct 107175.300000 fe 105000000000000 nm LATeah1062E_80.0_41_-7.99e-10_1 et 107730.733374 es 0
1441789494 ue 43941.829497 ct 2111.594000 fe 590000000000000 nm PM0052_00941_258_1 et 57979.432055 es 0
1441829462 ue 28491.902557 ct 104752.400000 fe 105000000000000 nm LATeah1062E_80.0_41_-1.99e-10_0 et 105388.579685 es 0
1441832468 ue 28491.902557 ct 102753.600000 fe 105000000000000 nm LATeah1062E_80.0_67_-3.98e-10_0 et 103277.664434 es 0
1441848810 ue 43941.829497 ct 2140.828000 fe 590000000000000 nm PM0052_01021_244_1 et 58537.375689 es 0
1441850121 ue 27873.823712 ct 104449.900000 fe 105000000000000 nm LATeah1062E_80.0_93_-1.41e-10_0 et 104960.060425 es 0

Greetings from the North

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2989339814
RAC: 702067

First, separate out the PM

First, separate out the PM tasks from the LAT tasks.

1440723982 ue 44916.202999 ct 2054.969000 fe 590000000000000 nm PM0051_014C1_164_1 et 57061.643153 es 0
1440787637 ue 44916.202999 ct 2239.750000 fe 590000000000000 nm PM0051_01421_258_0 et 63236.834308 es 0
1440851466 ue 44916.202999 ct 1901.547000 fe 590000000000000 nm PM0049_037C1_288_0 et 56450.569763 es 0
1440910310 ue 44916.202999 ct 1911.469000 fe 590000000000000 nm PM0049_03771_204_1 et 55706.958345 es 0
1440971163 ue 44916.202999 ct 2442.094000 fe 590000000000000 nm PM0049_03811_68_0 et 60647.574432 es 0
1441027998 ue 44916.202999 ct 2119.906000 fe 590000000000000 nm PM0049_037C1_276_0 et 56833.922279 es 0
1441085190 ue 44916.202999 ct 2114.875000 fe 590000000000000 nm PM0049_03771_228_0 et 57190.257339 es 0
1441145747 ue 44916.202999 ct 2376.063000 fe 590000000000000 nm PM0049_03871_220_1 et 59415.931336 es 0
1441201155 ue 44916.202999 ct 1887.547000 fe 590000000000000 nm PM0049_039A1_172_0 et 55406.588938 es 0
1441262915 ue 44916.202999 ct 2424.453000 fe 590000000000000 nm PM0051_02831_74_1 et 61759.263681 es 0
1441321886 ue 44916.202999 ct 2087.359000 fe 590000000000000 nm PM0051_02941_6_0 et 58970.257833 es 0
1441379716 ue 44916.202999 ct 1949.766000 fe 590000000000000 nm PM0051_03181_6_0 et 57828.836243 es 0
1441438836 ue 44916.202999 ct 2300.141000 fe 590000000000000 nm PM0051_033D1_296_0 et 59119.385736 es 0
1441497052 ue 44916.202999 ct 1961.125000 fe 590000000000000 nm PM0051_03481_44_0 et 58214.659645 es 0
1441556220 ue 44916.202999 ct 2072.266000 fe 590000000000000 nm PM0052_00151_164_0 et 59167.294909 es 0
1441615240 ue 44916.202999 ct 2052.781000 fe 590000000000000 nm PM0052_00381_180_1 et 58906.717886 es 0
1441673062 ue 43941.829497 ct 2118.578000 fe 590000000000000 nm PM0049_02081_28_2 et 57821.214819 es 0
1441731503 ue 43941.829497 ct 2310.141000 fe 590000000000000 nm PM0052_00751_334_0 et 58428.766060 es 0
1441789494 ue 43941.829497 ct 2111.594000 fe 590000000000000 nm PM0052_00941_258_1 et 57979.432055 es 0
1441848810 ue 43941.829497 ct 2140.828000 fe 590000000000000 nm PM0052_01021_244_1 et 58537.375689 es 0

Then, see where the et transitions occurred

1440784840 ue 28491.902557 ct 107274.200000 fe 105000000000000 nm LATeah1064E_48.0_434_-5.86e-10_0 et 111213.006718 es 0
1440786546 ue 28491.902557 ct 108681.600000 fe 105000000000000 nm LATeah1064E_48.0_434_-8.5e-10_1 et 112625.464962 es 0
1440786590 ue 28491.902557 ct 108991.000000 fe 105000000000000 nm LATeah1064E_48.0_461_-6.61e-10_0 et 112931.436409 es 0
1440977471 ue 28491.902557 ct 121519.100000 fe 105000000000000 nm LATeah1061E_1360.0_254270_0.0_1 et 122164.401279 es 0
1440977476 ue 28491.902557 ct 121011.100000 fe 105000000000000 nm LATeah1061E_1328.0_693267_0.0_1 et 121655.255836 es 0
1440979267 ue 28491.902557 ct 122407.300000 fe 105000000000000 nm LATeah1061E_1296.0_500022_0.0_0 et 123058.956638 es 0
1441100181 ue 28491.902557 ct 122043.000000 fe 105000000000000 nm LATeah1061E_1296.0_427697_0.0_0 et 122708.871290 es 0
1441100813 ue 28491.902557 ct 122666.700000 fe 105000000000000 nm LATeah1061E_1296.0_572072_0.0_0 et 123336.220974 es 0
1441101382 ue 28491.902557 ct 121452.300000 fe 105000000000000 nm LATeah1061E_1328.0_534317_0.0_0 et 122114.709385 es 0
1441226862 ue 28491.902557 ct 123394.400000 fe 105000000000000 nm LATeah1061E_1328.0_262617_0.0_0 et 124417.673443 es 0
1441227961 ue 28491.902557 ct 124009.900000 fe 105000000000000 nm LATeah1061E_1328.0_347042_0.0_0 et 125045.424517 es 0
1441228080 ue 28491.902557 ct 123647.300000 fe 105000000000000 nm LATeah1061E_1328.0_519192_0.0_0 et 124676.194162 es 0

1441311362 ue 22229.952213 ct 82927.980000 fe 81923100000000 nm LATeah1062E_48.0_39_-4.78e-10_0 et 83281.410313 es 0
1441313888 ue 22229.952213 ct 85555.750000 fe 81923100000000 nm LATeah1062E_48.0_39_-8.08e-10_0 et 85926.539526 es 0

1441333405 ue 28491.902557 ct 106304.000000 fe 105000000000000 nm LATeah1062E_80.0_41_-7.9e-11_0 et 106763.213356 es 0
1441395093 ue 22229.952213 ct 83350.770000 fe 81923100000000 nm LATeah1062E_48.0_39_-8.66e-10_1 et 83730.082275 es 0
1441398184 ue 22229.952213 ct 83907.970000 fe 81923100000000 nm LATeah1062E_48.0_39_-8.65e-10_1 et 84294.570869 es 0
1441419421 ue 22229.952213 ct 85601.800000 fe 81923100000000 nm LATeah1062E_48.0_39_-8.11e-10_0 et 86014.443184 es 0

1441503590 ue 28491.902557 ct 107622.900000 fe 105000000000000 nm LATeah1062E_80.0_41_-1.34e-10_0 et 108496.302332 es 0
1441506897 ue 28491.902557 ct 107845.800000 fe 105000000000000 nm LATeah1062E_80.0_41_-1.32e-10_0 et 108711.538296 es 0
1441526085 ue 28491.902557 ct 105827.200000 fe 105000000000000 nm LATeah1062E_80.0_41_-1.4e-11_1 et 106663.663586 es 0
1441613037 ue 28491.902557 ct 107877.400000 fe 105000000000000 nm LATeah1062E_80.0_41_-1e-12_1 et 108370.135456 es 0
1441619093 ue 28491.902557 ct 109998.200000 fe 105000000000000 nm LATeah1062E_80.0_41_-1.25e-10_1 et 110504.588925 es 0
1441635224 ue 28491.902557 ct 107585.000000 fe 105000000000000 nm LATeah1062E_80.0_41_-7.9e-10_1 et 108086.727041 es 0
1441722916 ue 28491.902557 ct 109316.400000 fe 105000000000000 nm LATeah1062E_80.0_67_-6.88e-10_1 et 109866.948675 es 0
1441726785 ue 28491.902557 ct 107132.800000 fe 105000000000000 nm LATeah1062E_80.0_41_-7.67e-10_2 et 107679.916601 es 0
1441742977 ue 28491.902557 ct 107175.300000 fe 105000000000000 nm LATeah1062E_80.0_41_-7.99e-10_1 et 107730.733374 es 0
1441829462 ue 28491.902557 ct 104752.400000 fe 105000000000000 nm LATeah1062E_80.0_41_-1.99e-10_0 et 105388.579685 es 0
1441832468 ue 28491.902557 ct 102753.600000 fe 105000000000000 nm LATeah1062E_80.0_67_-3.98e-10_0 et 103277.664434 es 0
1441850121 ue 27873.823712 ct 104449.900000 fe 105000000000000 nm LATeah1062E_80.0_93_-1.41e-10_0 et 104960.060425 es 0

Finally, use a utility like Online Conversion to decode the timestamp in the first column. The first of the slow ones - 1441228080 - was completed at Wed, 02 Sep 2015 21:08:00 GMT. You can explore the other ones the same way.

Der Mann mit der Ledertasche
Der Mann mit de...
Joined: 12 Dec 05
Posts: 151
Credit: 302594178
RAC: 0

...ok, fine, so I can see

...ok, fine, so I can see when the WU were completed. But I can't see a timestamp where the WU slows down. I've to look for the next 30 hrs if the event protocol from Windows will show me something interesting.

But even if there is something slowing down my Computer after 3 hrs and 30-34% Progress; why switched the Progress back to zero or 1.xxx Percent?

Greetings from the North

AgentB
AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

RE: ...ok, fine, so I can

Quote:
...ok, fine, so I can see when the WU were completed. But I can't see a timestamp where the WU slows down.

You will not see that in this data.

I would link to thank Richard for presenting your log data so nicely.

You will see when, in history your WUs slowed down (or sped up).

Quote:


I've to look for the next 30 hrs if the event protocol from Windows will show me something interesting.

But even if there is something slowing down my Computer after 3 hrs and 30-34% Progress; why switched the Progress back to zero or 1.xxx Percent?

No, the problem is you are assuming the %complete is accurate all the time. It is rarely accurate at the best of times (especially currently).

boinc manager (what you see) takes a guess based on some info, how long it should take, and hopefully the application sends an update and the manager adjusts during the runtime. Its first guess is too low, so it gets a realistic update and that is what you see. (hint ue = estimated time)

So you need to look at your host and determine what happened to cause the slowdown. You have the timestamps, when it started and stopped and a Windows event log to play in.

Sep 2 and 30 hours before that would be a good place to look at.

Thanks again Richard.

Der Mann mit der Ledertasche
Der Mann mit de...
Joined: 12 Dec 05
Posts: 151
Credit: 302594178
RAC: 0

...perhaps we're talking in

...perhaps we're talking in different Directions.
If something went wrong on this Computer, e.g. slowing down, why begin the Progress nearly 0 Percent after a former Progress of more than 30% after 3+ hrs?

Greetings from the North

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.