Page MenuHomePhabricator

[SelectionDAG] Utilize ARM shift behavior
AbandonedPublic

Authored by shawnl on May 1 2019, 2:03 PM.

Details

Summary

LLVM-IR shifts have UB if the shift amount is equal to or greater to
the shift range, but a number of architectures have well-defined
behavior in this case.

Utilize the behavior of ARM for shl and lshr to remove some code,
as can be seen in the tests.

Fixes PR41363

Diff Detail

Event Timeline

shawnl created this revision.May 1 2019, 2:03 PM
nikic added a comment.May 1 2019, 2:11 PM

I suspect that it's not safe to do this in this form: While the shift instruction on the target might be well defined, this is still operating on ISD opcodes that consider out-of-bounds shifts to be undef. After you have eliminated the check, some other combine could come along and optimize the shift to undef based on that.

craig.topper added inline comments.
include/llvm/CodeGen/SelectionDAGNodes.h
729

Is this used?

Wouldn't it be better/safer to create ARMISD shift opcodes to handle this behaviour?

dmgreen added a subscriber: dmgreen.EditedMay 1 2019, 2:44 PM

Hello

I was under the impression that the shift was by the bottom byte amount. i.e the mask is 255, and a shift of 256 is the same as a shift of 0. I have not tried it though.

For aarch64 the mask may to be the size of the datatype. The armarm will contain the correct information if you can decipher the pseudo code.

bjope added a comment.May 1 2019, 3:06 PM

Wouldn't it be better/safer to create ARMISD shift opcodes to handle this behaviour?

I think so too. Even if this could be something that other targets could benefit from as well. And we need something safer since as @nikic mentioned this transform seem to be wrong (as it leaves the undefined shift unguarded, and later transforms might find that the shift count is out-of-bounds and remove the shift).

About the idea in general:
Our downstream target also accepts shift counts that are "out-of-bounds", so this looks like something we could benefit from. Although our target also does not have separate instructions for left/right shift. So basically a negative shift count in an SHL will become a right shift if we just lower the SHL into a target shift instruction. So the solution here would not work right out-of-the-box for our target (we would need some special "shift behavior"). So it might be difficult to make this generic enough to safely be reused by other targets (the different shift behaviors must be specified in detail).

About introducing target specific opcodes:
One tricky part is to decide when to introduce such target specific shifts. As soon as something is turned into a target specific ISD node you are "on you own" to handle further combines etc for that ISD node (including value tracking support etc). So usually you want to do it quite late. But then there is a risk that it is harder to detect the pattern (if other combines/legalization etc already has lowered/folded the SETCC etc into something else).

Wouldn't it be better/safer to create ARMISD shift opcodes to handle this behaviour?

Agreed; we make assumptions about ISD::SHL etc. all over; changing that would be a lot of work, and might end up pessimizing code overall.

I was under the impression that the shift was by the bottom byte amount. i.e the mask is 255, and a shift of 256 is the same as a shift of 0. I have not tried it though.

This is true for NEON vectors, not scalars, as far as I can tell.

For aarch64 the mask may to be the size of the datatype. The armarm will contain the correct information if you can decipher the pseudo code.

Yes, lslv etc. masks the shift amount by the operand size.

shawnl marked an inline comment as done.EditedMay 1 2019, 3:25 PM

It is certainly not safe to do in LLVM-IR, but there has to be some point at which this can be done, and at this point it is still easy to match the pattern (although it does not match the -O0 pattern with an icmp instead of a select).

But yes, an ARMSHL/ARMLSR would be much safer.

include/llvm/CodeGen/SelectionDAGNodes.h
729

Oh, it is not any more...it will be removed.

I was under the impression that the shift was by the bottom byte amount. i.e the mask is 255, and a shift of 256 is the same as a shift of 0. I have not tried it though.

This is true for NEON vectors, not scalars, as far as I can tell.

The 8.1-m armarm seems to claim for LSL "The variable number of bits is read from the bottom byte of a register". It's an alias for MOV and there's a "shift_n = UInt(R[s]<7:0>);" in the pseudo code. I have been know to mis-read these things in the past though..

It's not the end of the world, but would mean that we would need to know that the shift amount is less that 256 as far as I understand.

The 8.1-m armarm seems to claim for LSL "The variable number of bits is read from the bottom byte of a register"

You're right, sorry, I misread armarm.

shawnl added a comment.May 1 2019, 6:09 PM

It's not the end of the world, but would mean that we would need to know that the shift amount is less that 256 as far as I understand.

It only works for this case:

if (n == 0)
return 0;
else
return (op0) << (64 - n);

shawnl planned changes to this revision.May 1 2019, 6:12 PM
RKSimon resigned from this revision.Jun 25 2020, 12:20 AM
shawnl abandoned this revision.Jun 25 2020, 2:02 AM