We proposed an additional GPU memory checking procedure that runs before CUDA tasks are launched. Background: today, among other things, the total device memory is used to decide whether or not we send work to a client requesting GPU work. This is, however, not enough to ensure that there's enough memory available on the device when the task is actually launched. The new method (already added to BOINC, pending release 6.10.25) checks the available memory and compares it to the minimum requirements of a given task. If it's not sufficient, the task is deferred for five minutes, and other GPU tasks may be launched in the meantime. It's up to the user to decide whether this memory limitation is temporary or permanent (e.g. caused by a multi-head setup and/or the screen resolution); in the latter case one should opt out of GPU work. This opt-out process should eventually be made more fine-grained, so that the decision can be made at the application rather than the project level.
The check-in notes mentioned that both the scheduler and the client needed to be updated, so this will only help Einstein@Home once they upgrade their server-side software, right?
I also hope that by then the E@H CUDA/GPU tasks will take less than 3 hours (a limit which appears to be what the people at S@H are using), so that E@H plays better with others on the GPU.
Our tests indicate that the upcoming ABP2 GPU tasks typically take ~0.6 hours per WU. This is still "just" a factor of 2-3 faster than the ABP2 CPU version, but after the ABP2 release we are going to concentrate on improving the GPU code.
Cheers,
Oliver
Could the inefficiencies of the actual CUDA app have to do with the high page-fault counts of the CPU app?
Here's a log from Sysinternals Process Explorer:
Process          PID   CPU     Virtual Size  Working Set  Page Faults  PF Delta  CPU Time
einsteinbinary   2276  100.00  138,348 K     100,908 K    39,551,450   4,990     2:11:58.562
Amolqc-preRC1_5  2840  0.00    261,496 K     75,600 K     18,882       0         1:37:35.015
Okay, QMC is not running at the moment, but look at the CPU times and compare the page-fault counts.
You mentioned elsewhere that this one uses single precision a lot more. Would there be a speed benefit to having an x64 app?
Milkyway has one for ATI/CAL.
BOINC blog
I did some experiments with x64 builds (under Linux) and found no significant performance increase so far.
CU
Bikeman
Hi Bikeman,
I presume that is the CPU app you are referring to?
Cheers
BOINC blog
Yes. If even the CPU app is not benefiting from x64 compilation, the GPU app should not benefit either.
CU
Bikeman
Any news on ABP2 CUDA? or a 'possible' ATI client?
Discussion of ABP2 CUDA app continues here:
http://einsteinathome.org/node/194710