This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Reduce 64-bit lshr by constant to 32-bit
ClosedPublic

Authored by arsenm on Jan 15 2016, 7:36 PM.

Details

Reviewers
tstellarAMD
Summary

64-bit shifts are very slow on some subtargets.
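
As a rough illustration of the idea in the title, here is a value-level sketch in plain C++. The review does not spell out the shift range, so it is an assumption here that the reduction applies to constant shift amounts of at least 32: the low half of the input is then shifted out entirely, and the result is the high half shifted by (C - 32) with a zero high half. The helper names `lshr64` and `lshr64_via32` are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference: ordinary 64-bit logical shift right.
uint64_t lshr64(uint64_t x, unsigned c) {
    return x >> c;
}

// Hypothetical sketch: the same result for 32 <= c < 64 using one 32-bit shift.
// Assumption: the combine targets constant shift amounts in this range.
uint64_t lshr64_via32(uint64_t x, unsigned c) {
    assert(c >= 32 && c < 64);
    // Take the high 32 bits. In a backend this would be a register-half
    // extract rather than a real 64-bit shift; C++ has no direct way to say so.
    uint32_t hi = static_cast<uint32_t>(x >> 32);
    // The only real shift is 32 bits wide.
    uint32_t lo_result = hi >> (c - 32);
    // The high half of the result is always zero.
    return static_cast<uint64_t>(lo_result);
}
```

The point of the sketch is only that one 32-bit shift plus a constant zero replaces the 64-bit shift; how the two halves are extracted and recombined in the DAG is what the inline comments below discuss.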

Event Timeline

arsenm updated this revision to Diff 45065. Jan 15 2016, 7:36 PM
arsenm retitled this revision to AMDGPU: Reduce 64-bit lshr by constant to 32-bit.
arsenm updated this object.
arsenm added a reviewer: tstellarAMD.
arsenm added a subscriber: llvm-commits.
tstellarAMD accepted this revision. Jan 18 2016, 7:41 AM
tstellarAMD edited edge metadata.

With those fixed, LGTM.

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2567–2569

Was this change meant for another commit?

2584–2586

The parentheses in this comment should be fixed to make it less confusing.

This revision is now accepted and ready to land. Jan 18 2016, 7:41 AM
arsenm added inline comments. Jan 18 2016, 12:29 PM
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2567–2569

Yes. Apparently you aren't supposed to use BUILD_PAIR/EXTRACT_ELEMENT after legalization, although we do it anyway and it happens to work. We currently have a mix of bitcast + build_vector and build_pair. I'm not sure we really want either, though. BUILD_PAIR isn't supposed to work, and the vector operations confuse other basic optimizations. For example, computeKnownBits doesn't look through vector extracts, although it could be special-cased for an extract from a build_vector with a constant index.

arsenm closed this revision. Jan 18 2016, 1:48 PM

r258090