Lots of FPU errors

next_ghost
Joined: 25 Mar 05
Posts: 12
Credit: 246383
RAC: 0

Message 77695 in response to message 77694

Quote:
Did you modify your app_info.xml file?

Not at first. I've modified app_info.xml to allow einstein_S5R1 4.17 to compute einstein_S5R3 4.20 workunits now. I'll be really surprised if some workunit passes input sanity checks...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5887
Credit: 119163370691
RAC: 24613696

Message 77696 in response to message 77695

Quote:
Quote:
Did you modify your app_info.xml file?

Not at first. I've modified app_info.xml to allow einstein_S5R1 4.17 to compute einstein_S5R3 4.20 workunits now. I'll be really surprised if some workunit passes input sanity checks...

If the data in your cache is now "branded" as 4.20, you must have been crunching again with that version app, after you attempted to switch from 4.24 back to 4.17. In your tasks list, I can see the most recent tasks crunched with 4.20 and earlier ones crunched with 4.24 but none yet crunched with 4.17. You have one more task "in progress" at the moment. Two questions. How is it "branded" when you look with Boinc Manager? What actual app version is running when it is crunching? Don't necessarily believe what Boinc Manager might say - check for yourself with ps.

If 4.17 is actually running then you have successfully transitioned back to that version and it will be very interesting if that runs to completion without a signal 8 (FPU error). I'm sure Bernd will be very interested if it turns out that the FPU errors are somehow associated with changes introduced in the more recent versions of the app.
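The check with ps suggested above can also be scripted. What follows is only a rough sketch: it assumes a Linux host with /proc mounted, and it assumes the executables follow the naming seen in this thread (for example einstein_S5R1_4.17_i686-pc-linux-gnu). The helper names are made up for illustration.

```python
import os
import re

# Assumed naming scheme, taken from the file names quoted in this thread,
# e.g. "einstein_S5R1_4.17_i686-pc-linux-gnu".
APP_NAME = re.compile(r"^einstein_(S\d+R\d+)_(\d+\.\d+)_")


def parse_app_name(exe_name):
    """Return (run, version) from an executable name, or None if no match."""
    m = APP_NAME.match(os.path.basename(exe_name))
    return (m.group(1), m.group(2)) if m else None


def running_einstein_apps(proc_root="/proc"):
    """Scan /proc (Linux only) and list (pid, run, version) for matching apps."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv0 = f.read().split(b"\0")[0].decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        parsed = parse_app_name(argv0)
        if parsed:
            found.append((int(entry), *parsed))
    return found


if __name__ == "__main__":
    for pid, run, version in running_einstein_apps():
        print(pid, run, version)
```

If this prints S5R1 and 4.17 while BOINC Manager reports 4.20, the older binary is the one doing the work.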

Cheers,
Gary.

next_ghost
Joined: 25 Mar 05
Posts: 12
Credit: 246383
RAC: 0

Message 77697 in response to message 77696

Quote:
If the data in your cache is now "branded" as 4.20, you must have been crunching again with that version app, after you attempted to switch from 4.24 back to 4.17. In your tasks list, I can see the most recent tasks crunched with 4.20 and earlier ones crunched with 4.24 but none yet crunched with 4.17. You have one more task "in progress" at the moment. Two questions. How is it "branded" when you look with Boinc Manager? What actual app version is running when it is crunching? Don't necessarily believe what Boinc Manager might say - check for yourself with ps.

That workunit got deleted when I modified app_info.xml. A few workunits got crunched by 4.20 because I accidentally left app_info.xml owned by root with 400 access permissions...
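A file owned by root with mode 400 can only be read by root, so a client running under a different account cannot read it. A quick way to spot this is sketched below. The path is hypothetical, and the script has to be run as the account the BOINC client uses for the readability result to mean anything.

```python
import os
import stat


def describe_access(path):
    """Return (owner_uid, permission_bits, readable_by_current_user)."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    return st.st_uid, mode, os.access(path, os.R_OK)


if __name__ == "__main__":
    # Hypothetical path; the project directory differs between installations.
    path = "app_info.xml"
    if os.path.exists(path):
        uid, mode, readable = describe_access(path)
        print(f"owner uid={uid} mode={mode:o} readable={readable}")
```

An owner uid of 0 together with mode 400 and readable=False would match the situation described here.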

I have no workunits to crunch at the moment and BOINC says
Einstein@Home|Message from server: To get more Einstein@Home work, finish current work, stop BOINC, remove app_info.xml file, and restart.

Yes, I do have this entry in app_info.xml:

    
        einstein_S5R3
        420
       
            einstein_S5R1_4.17_i686-pc-linux-gnu
            
        
        
            einstein_S5R1_4.17_i686-pc-linux-gnu.so
        
    

next_ghost
Joined: 25 Mar 05
Posts: 12
Credit: 246383
RAC: 0

When I rename app_info.xml and restart BOINC to get some work, then rename it back and restart BOINC again, BOINC says this:

Einstein@Home|Found app_info.xml; using anonymous platform
Einstein@Home|[error] State file error: bad application name einstein_S5R3
Einstein@Home|[error] State file error: missing application einstein_S5R3
Einstein@Home|[error] Can't handle workunit in state file
Einstein@Home|[error] State file error: missing task h1_0787.70_S5R2__291_S5R3a
Einstein@Home|[error] Can't link task h1_0787.70_S5R2__291_S5R3a_1 in state file

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5887
Credit: 119163370691
RAC: 24613696

Message 77699 in response to message 77697

Quote:

Yes, I do have this entry in app_info.xml:
    
        einstein_S5R3
        420
       
            einstein_S5R1_4.17_i686-pc-linux-gnu
            
        
        
            einstein_S5R1_4.17_i686-pc-linux-gnu.so
        
    

Your problem is the S5R1_4.17 bit!

I wasn't properly paying attention when Bikeman made the suggestion to use that app. We are currently doing the S5R3 run and so you need apps that have S5R3 in the name and not S5R1. On looking back through the S5R3 betas, there was one that was version 4.16 for Linux that came before version 4.20. The one before that was version 4.14 which is the one I've been using until recently. I've never used 4.16 and so I wasn't smart enough to notice that 4.17 was probably wrong for a Linux version.

Do you recall what version you were using before you started getting the FPU errors with version 4.20? It could have been 4.16 or 4.14, or an even earlier version. If you intend to go back and try 4.16, for example, you would make the same style of edit as above to the 4.16 app_info.xml file, so that 4.20 branded data can be crunched with 4.16. Once you have added the references for 4.20 to be crunched by 4.16 it should work. All you should need to do is:

    * Stop boinc.
    * Copy the 4.16 executables and the modified app_info.xml to the EAH project directory.
    * Restart boinc.
    * Boinc should now be using the 4.16 app to crunch 4.20 tasks, provided the modifications to the app_info.xml file are done correctly. It will say that it is using 4.20 if you have 4.20 tasks in your cache, but it will actually be using 4.16.
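The edit described above, adding an entry so that 4.20 branded tasks run on the 4.16 binary, could be sketched like this. It is an illustration only. The element names follow the app_info.xml entry quoted earlier in the thread, the 4.16 file names are guessed from the tarball name, and the existence of a matching .so for 4.16 is an assumption. A real file would typically also need matching file entries, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET


def add_app_version(root, app_name, version_num, main_file, extra_files=()):
    """Append an <app_version> that maps version_num onto the given files."""
    av = ET.SubElement(root, "app_version")
    ET.SubElement(av, "app_name").text = app_name
    ET.SubElement(av, "version_num").text = str(version_num)
    # The first file_ref is marked as the main program.
    ref = ET.SubElement(av, "file_ref")
    ET.SubElement(ref, "file_name").text = main_file
    ET.SubElement(ref, "main_program")
    # Any further files (for example a shared library) get plain file_refs.
    for name in extra_files:
        ref = ET.SubElement(av, "file_ref")
        ET.SubElement(ref, "file_name").text = name
    return av


if __name__ == "__main__":
    # Minimal stand-in for an existing app_info.xml (assumed structure).
    root = ET.fromstring("<app_info><app><name>einstein_S5R3</name></app></app_info>")
    exe = "einstein_S5R3_4.16_i686-pc-linux-gnu"  # assumed file name
    add_app_version(root, "einstein_S5R3", 420, exe, [exe + ".so"])
    print(ET.tostring(root, encoding="unicode"))
```

Keeping a copy of the original app_info.xml before writing any changes makes it easy to switch back.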

I'm very sorry for not noticing the error earlier with the suggestion to use 4.17.

Cheers,
Gary.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 805214239
RAC: 1246840

Oh sorry, my fault actually!!!

I wasn't aware the S5R1 apps were still on that server as well.

My apologies!!!

The S5R3 4.16 would be here:

http://einstein.phys.uwm.edu/app_test/linux/einstein_S5R3_4.16_i686-pc-linux-gnu.tar.gz

CU

Bikeman

next_ghost
Joined: 25 Mar 05
Posts: 12
Credit: 246383
RAC: 0

4.16 crashes as well. I've found out that glibc was updated to 2.7 around the time of the first crashes. I'll try again with glibc 2.6.
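For anyone wanting to compare hosts, the glibc version in use can be read programmatically. This is a small sketch that assumes a glibc based Linux system; on other C libraries it simply returns None. The function names are made up for illustration.

```python
import os


def parse_glibc(version_string):
    """Turn a string such as 'glibc 2.7' into (2, 7); None if not glibc."""
    parts = version_string.split()
    if len(parts) != 2 or parts[0] != "glibc":
        return None
    return tuple(int(p) for p in parts[1].split("."))


def current_glibc():
    """Ask the C library for its version; None where that is unavailable."""
    try:
        return parse_glibc(os.confstr("CS_GNU_LIBC_VERSION"))
    except (ValueError, OSError, AttributeError):
        return None


if __name__ == "__main__":
    print(current_glibc())
```

The tuples compare in version order, so parse_glibc("glibc 2.6") < parse_glibc("glibc 2.7") holds.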
