This is an archive of the discontinued LLVM Phabricator instance.

Change the default optimisation level of PTXAS from -O0 to -O3. This makes the optimisation levels of PTXAS and the ptxjitcompiler equal (ptxjitcompiler defaults to -O3).
Abandoned · Public

Authored by hdelan on Jan 4 2022, 2:26 AM.

Details

Reviewers
None

Diff Detail

Unit Tests: Failed

Event Timeline

hdelan created this revision. · Jan 4 2022, 2:26 AM
hdelan requested review of this revision. · Jan 4 2022, 2:26 AM
Herald added a project: Restricted Project. · Jan 4 2022, 2:26 AM
Herald added a subscriber: cfe-commits.
tra added a subscriber: tra. · Jan 4 2022, 10:44 AM
tra added inline comments.
clang/lib/Driver/ToolChains/Cuda.cpp
433

I think this would be contrary to the expectation that the lack of -O in clang means "do not optimize", which generally applies to the whole compilation chain, including the assembler. Matching whatever NVIDIA's tools do is an insufficient reason for breaking this assumption, IMO.

If you do want to run optimized ptxas on unoptimized PTX, you can use -Xcuda-ptxas -O3.
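
The behaviour under discussion can be illustrated with a small, self-contained model. This is only a sketch and not the code in Cuda.cpp: the function name ptxasArgs and its parameters are hypothetical, it handles only numeric clang levels, and it assumes that flags forwarded through -Xcuda-ptxas are appended after the driver's own -O choice so that they can take effect.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical model (NOT the real clang driver code) of how the -O flag
// handed to ptxas could be chosen.
//   clangOpt     : numeric value of clang's -O flag ("0".."3"), or
//                  std::nullopt when the user passed no -O at all.
//   xcudaPtxas   : arguments the user forwarded with -Xcuda-ptxas.
//   defaultNoOpt : ptxas level used when clang received no -O
//                  ("0" is the current behaviour described in the thread,
//                  "3" is what the patch proposed).
std::vector<std::string> ptxasArgs(const std::optional<std::string>& clangOpt,
                                   const std::vector<std::string>& xcudaPtxas,
                                   const std::string& defaultNoOpt = "0") {
  std::vector<std::string> args;
  // Assumption: a numeric clang level is passed through unchanged;
  // with no -O, the configured default is used.
  args.push_back("-O" + clangOpt.value_or(defaultNoOpt));
  // Assumption: user-forwarded flags come last, after the driver's choice.
  args.insert(args.end(), xcudaPtxas.begin(), xcudaPtxas.end());
  return args;
}
```

In this model, a build with no -O yields {"-O0"} today and would yield {"-O3"} with the proposed default, while forwarding -O3 through -Xcuda-ptxas gives {"-O0", "-O3"} without changing the default for anyone else.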

hdelan added inline comments. · Jan 5 2022, 2:34 AM
clang/lib/Driver/ToolChains/Cuda.cpp
433

I think for the average user, consistency across the ptxjitcompiler and ptxas is far more important than assuming that no -O means no optimization. I think most users will assume that, with no -O, each tool being used will take its own default optimization level, which in the case of clang is -O0 and in the case of ptxas is -O3.

We have had a few bugs with ptxas/ptxjitcompiler at higher optimization levels, which were quite hard to pin down since offline ptxas and ptxjitcompiler were using different optimisation levels, making bugs appear in one and not the other. Of course, we are aware of this now, but this inconsistency can result in bugs that are difficult to diagnose. Having consistency between the ptxjitcompiler and ptxas is therefore of practical benefit, whereas if we leave it as is, with ptxas defaulting to -O0, the benefit is purely semantic and not practical.

tra added inline comments. · Jan 5 2022, 10:52 AM
clang/lib/Driver/ToolChains/Cuda.cpp
433

I think for the average user, consistency across the ptxjitcompiler and ptxas is far more important than assuming that no -O means no optimization.

The default is intended to provide the fewest surprises for the most users. There are more users of clang as a CUDA compiler than users of clang as a CUDA compiler who care about consistency with ptxjitcompiler. My point is that the improvements for a subset of users should be balanced against usability in the common case. In this case the benefit does not justify the downsides, IMO.

Please add me as a reviewer when the patch is ready for public review and we'll discuss it with the wider LLVM community.

hdelan added inline comments. · Jan 31 2022, 1:56 AM
clang/lib/Driver/ToolChains/Cuda.cpp
433

We have come to the same conclusion: it is best to leave this unchanged upstream. However, this change has been made locally in intel/llvm.

hdelan abandoned this revision. · Jan 31 2022, 1:58 AM

Closing revision