[libomptarget] compile DeviceRTL bc files with -O3
Closed · Public

Authored by ye-luo on Jul 7 2022, 10:10 PM.

Details

Summary

bc files of DeviceRTL are compiled with -O3, the same as the static library.
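
As a rough illustration of what the optimization level means for a bitcode build, the sketch below assembles a generic clang command line that emits LLVM bitcode at a chosen `-O` level. It is not the build rule touched by this revision: the helper name `bitcode_compile_command` and the file names are hypothetical, and the real DeviceRTL build presumably passes additional target-specific flags that are not shown in this thread.

```python
import shlex


def bitcode_compile_command(source, output, opt_level="3", compiler="clang"):
    """Build a clang command line that emits LLVM bitcode at a given -O level.

    Hypothetical helper for illustration; only generic clang flags are used.
    """
    if opt_level not in {"0", "1", "2", "3", "s", "z"}:
        raise ValueError(f"unsupported optimization level: {opt_level}")
    # -emit-llvm together with -c makes clang write LLVM bitcode instead of
    # a native object file.
    return [compiler, f"-O{opt_level}", "-emit-llvm", "-c", source, "-o", output]


# Hypothetical file names, for illustration only.
old_cmd = bitcode_compile_command("Kernel.cpp", "Kernel.bc", opt_level="1")
new_cmd = bitcode_compile_command("Kernel.cpp", "Kernel.bc", opt_level="3")

print(shlex.join(old_cmd))
print(shlex.join(new_cmd))
```

The two command lines differ only in the `-O` flag, which is the kind of change the summary describes.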

Event Timeline

ye-luo created this revision. Jul 7 2022, 10:10 PM
Herald added a project: Restricted Project. Jul 7 2022, 10:10 PM
Herald added a subscriber: mgorny.
ye-luo requested review of this revision. Jul 7 2022, 10:10 PM
Herald added a project: Restricted Project.
ye-luo retitled this revision from [libomptarget] compile bc files with -O3 to [libomptarget] compile DeviceRTL bc files with -O3. Jul 7 2022, 10:13 PM
ye-luo edited the summary of this revision.
ye-luo added a reviewer: tianshilei1992.
ye-luo edited the summary of this revision.
jhuber6 accepted this revision. Jul 8 2022, 7:49 AM
jhuber6 added a subscriber: jhuber6.

LG, assuming this doesn't break anything anymore. We used to have problems with definitions getting optimized out, but that seems to be fixed. The plan is still to remove this in favor of the static library and LTO, but this should improve things until we make the change.

This revision is now accepted and ready to land. Jul 8 2022, 7:49 AM
This revision was automatically updated to reflect the committed changes.
domada added a subscriber: domada. Jul 13 2022, 11:07 AM

@ye-luo Hi,
Could you share some benchmark results showing that turning on -O3 optimization is worthwhile?

I don't have any graphs, but most applications will see some performance gain when using a more optimized runtime library. I've looked at XSBench, RSBench, MiniQMC, and SU3Bench. Is there a reason having O3 is not desirable? It should only slightly increase the build times for LLVM, which is hardly worth slower execution times.

When I compared miniQMC kernel performance with and without LTO, the difference came from the bc files (slower) being compiled with O1 while the static library used by LTO (faster) was compiled with O3. That was about a 30% difference on a kernel I was monitoring.
To reduce the variation among compilation options, it is better to just use O3.
For a long time, we could not change to O3 because the backend rejected kernels compiled with O3. This issue has been resolved, so I changed the bc compilation to O3.

@jhuber6 Your acceptance comment suggests that there is a risk associated with -O3 optimization. That's why I wanted to know whether turning on O3 is worth it. Thanks for the explanation.
@ye-luo Thanks for your response.