This is an archive of the discontinued LLVM Phabricator instance.

Make InstCombine aware of TargetTransformInfo when optimize extension
Needs ReviewPublic

Authored by wengxt on Jun 25 2015, 3:10 PM.

Details

Summary

In NVPTX64, 64-bit register is simulated by two 32-bit register which
makes it more expensive on 64-bit operations.

In this patch we compute the extra cost of extending the expression
tree for zext and sext by exploiting TargetTransformInfo and make sure
the transform will not bring more costs.

Diff Detail

Event Timeline

wengxt updated this revision to Diff 28507.Jun 25 2015, 3:10 PM
wengxt retitled this revision from to Make InstCombine aware of TargetTransformInfo when optimize extension.
wengxt updated this object.
wengxt edited the test plan for this revision. (Show Details)
wengxt added reviewers: dberlin, chandlerc.
wengxt added subscribers: tstellarAMD, jingyue, broune and 3 others.
wengxt updated this revision to Diff 28508.Jun 25 2015, 3:15 PM

update reviewer

chandlerc requested changes to this revision.Jun 26 2015, 1:19 AM
chandlerc edited edge metadata.

This doesn't seem like the right approach at all.

I don't think we want to use cost heuristics to dictate the canonical form in the IR. It makes the canonicalization *much* more complex and hard to manage.

I think that NVPTX should either change the datalayout to make i64 illegal (and that seems reasonable for instcombine to respect, but that would for example mean it would be an *absolute* decision, not a cost decision) or you'll need to do a late transform to narrow the operations as much as possible.

This revision now requires changes to proceed.Jun 26 2015, 1:19 AM
wengxt updated this revision to Diff 28589.Jun 26 2015, 12:02 PM
wengxt edited edge metadata.

add datalayout test case to make sure it will not pass in old code

What's special about NVPTX is that NVPTX emits PTX, a virtual ISA, instead of real machine code (aka SASS). The CUDA driver JIT compiles PTX to machine code at runtime. Ideally, NVPTX should codegen machine code directly, but that requires SASS ISA to be public.

Therefore, i64 is a legal DL type for PTX, but just more expensive at runtime because the machine code needs two 32-bit registers to simulate an i64. Does LLVM's target-independent IR optimizer implicitly assume legal integer types are equally cheap? I don't think that's a right assumption. IIRC, AMDGPU also has similar issues where widening integers can hurt performance [https://llvm.org/bugs/show_bug.cgi?id=21148]. We solved that by disabling indvar widening if 64-bit arithmetics are more expensive than 32-bit.

I am not sure how viable the second approach (i.e. narrowing the operations as much as possible) is. While widening is sound, narrowing can be unsound in many cases, not mentioning the complexity of bit tracking etc. Anyhow, it doesn't feel right to undo something in certain targets rather than not doing that in the target-independent phase (with TTI checks of course) at the first place.

Alternatively, what do you think about modifying ShouldChangeType to disallow such conversion for NVPTX instead of introducing a cost model? ShouldChangeType sounds to me like a place we put legality checks. Although i64 is a DL legal type, such sext/zext can be considered "illegal" due to performance reasons.

LMK. Thanks!