This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046)
ClosedPublic

Authored by spatel on Aug 7 2017, 7:43 AM.

Details

Summary

I couldn't find any smaller folds to help the cases in:
https://bugs.llvm.org/show_bug.cgi?id=34046
after:
rL310141

The truncated rotate-by-variable patterns elude all of the existing transforms because of multiple uses and because the demanded-bits and known-bits information doesn't exist without seeing the whole pattern. So we need an unfortunately large pattern match. But by simplifying this pattern in IR, the backend is already able to generate rolb/rolw/rorb/rorw for x86 using its existing rotate matching logic. Note that rotate-by-constant doesn't have this problem - smaller folds should already produce the narrow IR ops.

For the motivating cases from the bug report, in addition to using narrow ops, we have a net win of two fewer instructions (we kill 3 zext/trunc but add a mask op). I initially forgot that we need that mask, but Alive confirms that it would be wrong to leave off the mask of the opposite shift amount:
http://rise4fun.com/Alive/GSy
http://rise4fun.com/Alive/hif
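
For reference, a hedged C++ sketch of the kind of 16-bit rotate-by-variable in question (a reconstruction; the exact source in the bug report may differ). Integer promotion forces the arithmetic into 32 bits, which is where the zext/trunc wrappers come from:

#include <cstdint>

uint16_t rotl16(uint16_t x, uint8_t amt) {
  // x and amt promote to 32-bit int, so the IR computes a 32-bit
  // shl/lshr/or wrapped in zext/trunc; this patch narrows that whole
  // pattern to i16 ops with both shift amounts masked to 0..15.
  return (uint16_t)((x << (amt & 15)) | (x >> ((16 - amt) & 15)));
}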

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Aug 7 2017, 7:43 AM
spatel retitled this revision from [InstCombine] narrow rotate left/patterns to eliminate zext/trunc (PR34046) to [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046).Aug 7 2017, 7:57 AM
efriedma edited edge metadata.Aug 7 2017, 12:27 PM

I spent a bit of time playing with the generated code for various implementations of rotate for 16-bit values. It looks like this transform actually makes things worse; e.g. on ARM the transformed version compiles to one more instruction. Granted, that's probably not important; I think the optimal lowering for a 16-bit rotate on ARM actually involves lowering it to a 32-bit rotate and fixing up the result, and that's probably a transform which doesn't make sense until legalization.
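
For the curious, a hedged sketch of that fix-up trick (a reconstruction, not necessarily what a legalization-time transform would emit): replicate the 16-bit value into both halves of a 32-bit register so one 32-bit rotate yields the 16-bit rotate in the low half:

#include <cstdint>

uint16_t rotr16_via_ror32(uint16_t x, unsigned amt) {
  uint32_t d = ((uint32_t)x << 16) | x;      // both halves hold x
  amt &= 15;
  d = (d >> amt) | (d << ((32 - amt) & 31)); // one 32-bit ror
  return (uint16_t)d;                        // low half is rotr16(x, amt)
}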

Also, on x86, it looks like we're missing an important pattern here: we generate the sequence andb $15, %cl; rolw %cl, %ax.

lib/Transforms/InstCombine/InstCombineCasts.cpp
468 ↗(On Diff #109986)

Would it make sense to use known bits here rather than explicitly checking for zext? This matters because we tend to transform zext(trunc(x)) -> x & c.
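
A small illustration of that canonicalization (values are made up): once zext(trunc(x)) is rewritten as a mask, there is no zext instruction left to match, but known bits still prove the high bits are zero:

#include <cassert>
#include <cstdint>

void zextVsMask(uint32_t x) {
  uint32_t viaZextTrunc = (uint32_t)(uint16_t)x; // zext(trunc(x))
  uint32_t viaMask = x & 0xFFFFu;                // canonical form: x & c
  assert(viaZextTrunc == viaMask); // same value; high 16 bits known zero
}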

492 ↗(On Diff #109986)

It seems like generating the mask is worth doing here, if we think it'll be eliminated by later lowering.

craig.topper edited edge metadata.Aug 7 2017, 12:41 PM

I think we miss the and x, 15 because we use the same pattern we do for shifts, where we look for 5-bit masking even in 16-bit mode because the hardware doesn't ignore bit 4. For shifts the implication is obvious: if bit 4 is set, the 16-bit shift fills with 0s. For rotates, I think it just means we'd rotate around a second time and still get the correct result, but I'm not sure.
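
A hedged demonstration of the difference (reference code, not the backend's): with an amount whose bit 4 is set, a 16-bit shift zero-fills, while a 16-bit rotate wraps around to the same result as the amount mod 16:

#include <cassert>
#include <cstdint>

static uint16_t rotl16_ref(uint16_t x, unsigned amt) { // rotate 1 bit at a time
  while (amt--)
    x = (uint16_t)((x << 1) | (x >> 15));
  return x;
}

int main() {
  uint16_t x = 0xABCD;
  assert((uint16_t)((uint32_t)x << 20) == 0);    // 16-bit shift by 20 zero-fills
  assert(rotl16_ref(x, 20) == rotl16_ref(x, 4)); // rotate by 20 == rotate by 4
}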

Also, on x86, it looks like we're missing an important pattern here: we generate the sequence andb $15, %cl; rolw %cl, %ax.

Yes, I saw that too. This was noted back in:
https://bugs.llvm.org/show_bug.cgi?id=17332#c6
...so we likely need to add to the tablegen pattern-matching.

lib/Transforms/InstCombine/InstCombineCasts.cpp
468 ↗(On Diff #109986)

Yes, that would be a more general solution.

492 ↗(On Diff #109986)

OK - I wasn't sure if the extra instruction would make this transform less desirable. But yes, if we add the mask ourselves, then I think we'll be able to handle sloppier versions of the incoming code. I'll add more tests.

spatel added a comment.Aug 7 2017, 2:03 PM

I spent a bit of time playing with the generated code for various implementations of rotate for 16-bit values. It looks like this transform actually makes things worse; e.g. on ARM the transformed version compiles to one more instruction. Granted, that's probably not important; I think the optimal lowering for a 16-bit rotate on ARM actually involves lowering it to a 32-bit rotate and fixing up the result, and that's probably a transform which doesn't make sense until legalization.

I just looked at this, and I think it's a non-issue because the transform is guarded by shouldChangeType(). We shouldn't transform to i8/i16 for ARM/AArch64 because they don't list those types as legal in the datalayout:

$ ./clang -O1 rot16.c -S -o - -target aarch64 -emit-llvm
...
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

$ ./clang -O1 rot16.c -S -o - -target x86_64 -emit-llvm
...
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

spatel updated this revision to Diff 110088.Aug 7 2017, 2:27 PM

Patch updated:

  1. Add an assert for the shouldChangeType() guard that's in the caller function.
  2. Replace the explicit match for zext of the shift value with a call to MaskedValueIsZero().
  3. Remove the MaskedValueIsZero() check for the shift amount and generate our own masks to make the transform safe (see the sketch after this list):

http://rise4fun.com/Alive/Qzm

  4. Add tests to exercise #2 and #3.
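
A hedged C++ model of item #3 (the sketch referenced above; illustrative, not the patch's IR): once both narrow shift amounts are masked, the narrow computation matches the wide one for any in-range amount, which is what the Alive proof verifies in IR:

#include <cassert>
#include <cstdint>

static uint16_t rotlWide(uint16_t x, uint32_t amt) {   // before: i32 ops
  uint32_t z = x;                                      // zext
  return (uint16_t)((z << amt) | (z >> (16 - amt)));   // trunc of the or
}
static uint16_t rotlNarrow(uint16_t x, uint32_t amt) { // after: i16 ops
  return (uint16_t)((x << (amt & 15)) | (x >> ((16 - amt) & 15)));
}

int main() {
  for (uint32_t amt = 0; amt < 16; ++amt)
    assert(rotlWide(0xABCD, amt) == rotlNarrow(0xABCD, amt));
}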

Looks fine, but give it a couple of days to see if anyone else wants to review.

craig.topper added inline comments.Aug 8 2017, 4:38 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
484 ↗(On Diff #110088)

Since we don't support vectors here, should this be getIntegerBitWidth()?

485 ↗(On Diff #110088)

Could you have used m_SpecificInt in the matchers above to avoid this separate check?

spatel added inline comments.Aug 8 2017, 4:47 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
484 ↗(On Diff #110088)

But we do support vectors - see the 2nd regression test.

485 ↗(On Diff #110088)

Because we do support vectors, we can't do that?

I think I confused myself reading this assert. So the !isa<IntegerType> is for the vector case? Would it be more readable as isa<VectorType>?

assert((!isa<IntegerType>(Trunc.getSrcTy()) ||
        shouldChangeType(Trunc.getSrcTy(), Trunc.getType())) &&
       "Don't narrow to an illegal type");

m_SpecificInt should work on vectors, right?

/// \brief Match a specific integer value or vector with all elements equal to
/// the value.
inline specific_intval m_SpecificInt(uint64_t V) { return specific_intval(V); }
spatel added a comment.Aug 9 2017, 6:21 AM
/// \brief Match a specific integer value or vector with all elements equal to
/// the value.
inline specific_intval m_SpecificInt(uint64_t V) { return specific_intval(V); }

Ah, never noticed/used that. So yes, let me use that and update the assert to be clearer. Thanks!
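
A hedged sketch of what that cleanup looks like (illustrative names, not the patch's exact code): m_SpecificInt folds the constant comparison into the match and also accepts splat vector constants, so one pattern covers the scalar and vector tests:

#include "llvm/IR/PatternMatch.h"
using namespace llvm::PatternMatch;

static bool matchMaskedAmt(llvm::Value *Amt, llvm::Value *&X, unsigned Width) {
  // Matches both (and i16 %x, 15) and (and <2 x i16> %x, <15, 15>).
  return match(Amt, m_And(m_Value(X), m_SpecificInt(Width - 1)));
}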

spatel updated this revision to Diff 110374.Aug 9 2017, 6:24 AM
spatel marked an inline comment as done.

Patch updated - no functional changes intended from the previous rev, but cleaner:

  1. Use isa<VectorType> in the assert and the caller's type check to make it clearer that vectors are allowed.
  2. Use m_SpecificInt in pattern matches to reduce code.
This revision is now accepted and ready to land.Aug 9 2017, 9:48 AM
This revision was automatically updated to reflect the committed changes.