This is an archive of the discontinued LLVM Phabricator instance.

[TargetLowering] Only demand a rotation's modulo amount bits
ClosedPublic

Authored by RKSimon on Mar 15 2020, 12:32 PM.

Details

Summary

ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.

Diff Detail

Event Timeline

RKSimon created this revision.Mar 15 2020, 12:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2020, 12:32 PM
RKSimon marked 2 inline comments as done.Mar 15 2020, 12:34 PM
RKSimon added inline comments.
llvm/test/CodeGen/SystemZ/rot-01.ll
31

@uweigand Should this be 32 or 64?

llvm/test/CodeGen/SystemZ/rot-02.ll
7

@uweigand This comment doesn't seem to match the test - any suggestions?

uweigand added inline comments.Mar 16 2020, 2:08 AM
llvm/test/CodeGen/SystemZ/rot-01.ll
13

In order for this test to still test what it is intended to test (see below), this should probably be changed to 16 instead of 32 now.

31

This should be 32. The point of this test is that we actually need the AND, but we should use the most efficient instruction (when using the result of the AND only in a rotate, we can implement the AND via NILL instead of the NILF we'd otherwise require).

llvm/test/CodeGen/SystemZ/rot-02.ll
7

With these changes, only 5 bits count (instead of 6 bits) when operating on 32-bit variables. You should probably change the comment here to 5 bits, and then change the AND constant from 31 to 15 or so.

llvm/test/CodeGen/SystemZ/shift-04.ll
100–101

Now that we actually do mask the amount, the comment should be updated.

120

Now I'm wondering: why doesn't the masking also work here?

150

Or here?

160

Likewise.

llvm/test/CodeGen/SystemZ/shift-08.ll
101–102

Same comments as with shift-04.ll apply to this file.

RKSimon updated this revision to Diff 250518.Mar 16 2020, 4:25 AM

Tweaked systemz tests based on feedback from @uweigand

aykevl added a comment.EditedMar 16 2020, 7:32 AM

See my inline comment. I only checked rol8 on AVR but I think the same applies to ror8. Basically it seems like the transformation is correct but will make the operation unacceptably slower.

I'm not sure I can help you fix this, I'm still very new to backend development.

llvm/test/CodeGen/AVR/rot.ll
7 ↗(On Diff #250518)

If I'm reading this IR, correctly, it does a rotate left just as the name implies.
I think the transformation here is correct in that it will still produce the same output, but without the mask the rotate may take a lot longer. I think the AVR assembly could be written as the following pseudocode (assume 8-bit unsigned integers everywhere):

// r24 = val
// r22 = amt
def rol8(r24, r22):
    r22 &= 7           // this mask is removed by this patch
    if r22 == 0:
        return r24     // return value (LBB0_2)
    while 1:
        r24 = (r24 << 1) | (r24 >> 7) // rotate r24 left by 1 (lsl, adc)
        r22 -= 1
        if r22 == 0:   // brne .LBB0_1
            return r24 // .LBB0_2, ret

So if you're calling rol8(x, 200) it will take 200 iterations to rotate a number.

RKSimon added inline comments.Mar 16 2020, 8:39 AM
llvm/test/CodeGen/AVR/rot.ll
7 ↗(On Diff #250518)

I think this is a fault in the AVR backend - ROTL/ROTR/FSHL/FSHR all assume modulo amounts, so AVRTargetLowering::LowerShifts should mask accordingly. I'll add this to the patch shortly.

RKSimon updated this revision to Diff 250587.Mar 16 2020, 10:08 AM

Added explicit masking to AVX rotation lowering.

Thank you for the fix for AVR! It's unfamiliar code to me but it seems reasonable.

lebedev.ri accepted this revision.Mar 16 2020, 11:25 PM
lebedev.ri added a subscriber: lebedev.ri.

Looks good to me.

This revision is now accepted and ready to land.Mar 16 2020, 11:25 PM
This revision was automatically updated to reflect the committed changes.