Page MenuHomePhabricator

[DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed
ClosedPublic

Authored by sameconrad on Jun 2 2018, 6:17 PM.

Details

Summary

Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op.

Patch by: sameconrad (Sam Conrad)

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
lebedev.ri added inline comments.Jun 3 2018, 1:11 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5096–5100 ↗(On Diff #149623)

Also, it is intentional that you extract from LHS and store it into a variable named RHS and vice-versa?
If yes, add a comment please.

sameconrad added inline comments.Jun 3 2018, 3:14 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5091–5102 ↗(On Diff #149623)

I'll add some extra comments here and update in a bit, but I don't think we're losing out on anything.
The idea here is that if MatchRotateHalf matches one half of the OR as a shl/srl but not the other, we try to extract the needed shift from the unmatched side to complete the rotate. The case where neither side matched (ie NeedRotLHS && NeedRotRHS) is handled above, so at this point either no side is missing and neither of these if cases will fire, or only one side is missing, in which case we try to extract the shift from the missing side (using the shift that did match from the opposite side to compute the shift amount and opcode needed, hence we pass LHSShift when trying to extract from the RHS and vice versa). If extraction fails at that point then we definitely can't complete the rotate and bail.

For instance if NeedRotRHS==true and LHSShift is (srl v 10) and bitwidth(v)==32, then extractShiftFromMulOrDiv will compute that we need to extract (shl v 22) from RHS based on the shift amount and opcode in LHSShift.

sameconrad updated this revision to Diff 149658.Jun 3 2018, 4:14 PM

Update tests with update_llc_test_checks
Improve comments in MatchRotate

sameconrad marked an inline comment as done.Jun 3 2018, 4:15 PM
sameconrad added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5096–5100 ↗(On Diff #149623)

Updated with some clarifying comments

sameconrad marked an inline comment as done.Jun 3 2018, 4:16 PM
lebedev.ri added inline comments.Jun 3 2018, 11:34 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5057 ↗(On Diff #149658)

Is this always considered worth doing,
or the check whether this should be done is on the caller's side?

test/CodeGen/X86/rotate-extract.ll
2 ↗(On Diff #149658)

You could also copy this into test/CodeGen/AArch64/rotate-extract.ll,
since that should also profit from the transform.

sameconrad updated this revision to Diff 149879.Jun 4 2018, 5:24 PM

Added AArch64 test

sameconrad marked an inline comment as done.Jun 4 2018, 5:37 PM
sameconrad added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5057 ↗(On Diff #149658)

It looks like MatchRotate always gets called from DAGCombiner::visitOr, I don't think there's any sort of profitability check, so it seems to assume its always worthwhile (however I'm still pretty new to LLVM and don't know if there are other places this could be done). In any case, the new code would transform 4 instructions to 2 (2 mul/udiv ops, 1 shift, 1 or -> one mul/udiv op and 1 rotate).

sameconrad updated this revision to Diff 149902.Jun 4 2018, 8:58 PM

Quick update to improve assert messages

The tests probably need committing with trunk's current codegen, so this patch then shows the diff.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4822 ↗(On Diff #149902)

You can merge these if statements:

if (Op.getOpcode() == ISD::AND &&
    DAG.isConstantIntBuildVectorOrConstantInt(Op.getOperand(1))) {
4831 ↗(On Diff #149902)

Does this need to be a DAGCombiner method any more? Just reduce it to a static.

4842 ↗(On Diff #149902)

Again, what do you gain from this being a DAGCombiner method?

4850 ↗(On Diff #149902)

Do we have any tests that use the Mask?

4878 ↗(On Diff #149902)

This is more convoluted than it needs to be - the GetConstant isn't really helping and comparing against APInt() is a bad idea

Would it be better to just do (not sure about this):

// Attempt to find non-zero constants for opposing shift and the mul/div +shift.
ConstantSDNode *OppShiftCst = isConstOrConstSplat(OpposingShift.getOperand(1));
ConstantSDNode *ShiftCst = isConstOrConstSplat(ShiftLHS.getOperand(1));
ConstantSDNode *MulOrDivCst = isConstOrConstSplat(MulOrDiv.getOperand(1));
if (!OppShiftCst || !ShiftCst !! MulOrDivCst ||
    !OppShiftCst->getAPIntValue() || !ShiftCst->getAPIntValue() || !MulOrDivCst->getAPIntValue())
  return SDValue();

Also, we should bail if the OppShiftCst is out of range.

4885 ↗(On Diff #149902)

Is this better?

APInt ShiftExtractDiv = APInt::getOneBitSet(VTWidth, NewShiftAmt;.getZextValue());
4901 ↗(On Diff #149902)

Worth doing this earlier?

5116 ↗(On Diff #149902)

This could be reduced to:

if (NeedRotRHS)
  RHSShift = extractShiftFromMulOrUDiv(LHSShift, RHS, RHSMask, DL);
if (NeedRotLHS)
  LHSShift = extractShiftFromMulOrUDiv(RHSShift, LHS, LHSMask, DL);
if (!RHSShift || !LHSShift)
  return nullptr;

BTW, that reminds me.
@sameconrad while you are looking at rotates in backend, can you please also check (at least add tests,
if there aren't any yet) that backend properly handles the cases where mul/div were transformed into shifts?

lebedev.ri added inline comments.Jun 13 2018, 11:09 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
490 ↗(On Diff #149902)

In other words, the cases where c0, c1, c2 were power-of-two, and thus instcombine turned them into shifts.
So additional [straight-forward] cases are:

///   (or (shrl (shl v c0') c1) (shl v c2')) ->
///   (or (shrl (shl v c0') c1) (shl (shl v c0') c3))
/// and
///   (or (shrl v c0') (shl (shrl v c1') c2)) ->
///   (or (shrl (shrl v c1') c3) (shl (shrl v c1') c2))

There is also an additional problem when we are/will convert shrl+shl / shl+rhrl into masking...

Clean up code quality
Add tests with bitmask case

Small bit of code cleanup that was missed

sameconrad added inline comments.Jun 13 2018, 6:58 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4850 ↗(On Diff #149902)

Added a mask test

4878 ↗(On Diff #149902)

Fixed up most of this but wasn't sure what you meant by out of range, do you mean if the existing shift amount is greater than the bitwidth?

sameconrad added a comment.EditedJun 13 2018, 7:02 PM

BTW, that reminds me.
@sameconrad while you are looking at rotates in backend, can you please also check (at least add tests,
if there aren't any yet) that backend properly handles the cases where mul/div were transformed into shifts?

I can take a look at that, will do that in a separate patch.

Actually nvm, I see what you're saying. I'll try to update this patch in a bit for that case.

BTW, that reminds me.
@sameconrad while you are looking at rotates in backend, can you please also check (at least add tests,
if there aren't any yet) that backend properly handles the cases where mul/div were transformed into shifts?

I can take a look at that, will do that in a separate patch.

Actually nvm, I see what you're saying. I'll try to update this patch in a bit for that case.

You can refer to the tests in D48229 for the examples of patterns that are related.

Update to also handle cases where we need to extract a smaller shift from a shl/srl.

BTW, that reminds me.
@sameconrad while you are looking at rotates in backend, can you please also check (at least add tests,
if there aren't any yet) that backend properly handles the cases where mul/div were transformed into shifts?

I can take a look at that, will do that in a separate patch.

Actually nvm, I see what you're saying. I'll try to update this patch in a bit for that case.

You can refer to the tests in D48229 for the examples of patterns that are related.

Updated the patch to handle extracting from a shl/srl.
For handling srl+shl->bitmask cases I'd prefer to do a follow-up patch since I need a bit more time to look at it.

BTW, that reminds me.
@sameconrad while you are looking at rotates in backend, can you please also check (at least add tests,
if there aren't any yet) that backend properly handles the cases where mul/div were transformed into shifts?

I can take a look at that, will do that in a separate patch.

Actually nvm, I see what you're saying. I'll try to update this patch in a bit for that case.

You can refer to the tests in D48229 for the examples of patterns that are related.

Updated the patch to handle extracting from a shl/srl.

For handling srl+shl->bitmask cases I'd prefer to do a follow-up patch since I need a bit more time to look at it.

Yes, absolutely. Currently we don't produce that pattern yet in all the cases, so it is of less importance!

lebedev.ri added inline comments.Jun 17 2018, 11:57 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4831–4841 ↗(On Diff #151651)

This should be:

//   (or (shrl (mul v c0) c1)      (mul v c2)) ->
//   (or (shrl (mul v c0) c1) (shl (mul v c3) c4))
//
//   (or       (udiv v c0)     (shl (udiv v c1) c2)) ->
//   (or (shrl (udiv v c3) c4) (shl (udiv v c1) c2))
//
//   (or (shrl (shl v c0) c1)      (shl v c2)) ->
//   (or (shrl (shl v c0) c1) (shl (shl v c3) c4))
//
//   (or       (shrl v c0)     (shl (shrl v c1) c2)) ->
//   (or (shrl (shrl v c1) c3) (shl (shrl v c1) c2))
//
// NEW:
//   (or (shrl (shl v c0) c1)      (mul v c2)) ->
//   (or (shrl (shl v c0) c1) (shl (mul v c3) c4))
//
//   (or       (shrl v c0)     (shl (div v c1) c2)) ->
//   (or (shrl (shrl v c1) c3) (shl (div v c1) c2))

We may not always succeed in converting mul/div into a shift,

4842–4843 ↗(On Diff #151651)

You want to add a comment that it will be called for both the LHS and RHS,
so it does not have to worry about looking at the other side here.

sameconrad added inline comments.Jun 18 2018, 7:00 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4831–4841 ↗(On Diff #151651)

Could you clarify a bit? The four cases shown in the original code comment were meant to be the valid expansions the function may produce, whereas the added cases don't seem to be valid expansions that could be used to form rotates. For example in your comment the first case was changed to show the LHS of the original shift as (mul v c0) but the LHS of the new shift as (mul v c3) which seems to imply c0 != c3. If the function encountered that it would return an empty SDValue because a rotate can't be formed if the LHS of the existing shift and the leftover LHS of the extracted shift don't match.

(or (shrl (shl v c0) c1) (mul v c2)) ->
// (or (shrl (shl v c0) c1) (shl (mul v c3) c4))

Would InstCombine would generate a case like this that would still provide a valid expansion after extracting the shift (ie where the resulting (mul v c3) is equivalent to (shl v c0))?
In its current state this function should see that the existing shift's LHS (shl v c0) has an opcode that doesn't match the extract op (mul v c3) and just return the SDValue(), because it expects that if InstCombine had merged a shl with another shl the result should have been a greater shl rather than a mul (and thus just assumes that after extracting we couldn't have a leftover op that matches (shl v c0)). I had assumed that if InstCombine merges a mul by a power of 2 with a shift it would produce a greater shift rather than a mul by a greater power of 2 but I would have to test that, if it doesn't and produces a mul then I could probably add some logic to check if the leftover mul can be converted to a shift after extraction to match the existing shift's LHS.

lebedev.ri added inline comments.Jun 18 2018, 11:55 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4831–4841 ↗(On Diff #151651)

If the function encountered that it would return an empty SDValue because a rotate can't be formed if the LHS of the existing shift and the leftover LHS of the extracted shift don't match.

So that is the requirement on the end result? That explains it. Then i was wrong.
Can you please state that in the comment, somehow, in addition to the Attempts to expand:.

Would InstCombine would generate a case like this that would still provide a valid expansion after extracting the shift (ie where the resulting (mul v c3) is equivalent to (shl v c0))?

Now that i actually think about, no, i think it's not possible.

I had assumed that if InstCombine merges a mul by a power of 2 with a shift it would produce a greater shift

Yes, pretty sure that is what will happen.

Try to fix up clarity of comments

Friendly ping. Any more issues with this?

I can add some nits, but i'm not confident reviewing this as a whole..

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4901 ↗(On Diff #149902)

+1 this should be done right ExtractFrom is modified.

4863–4876 ↗(On Diff #152011)

This does basically the same (but opposite) thing twice.
I would personally try to do something like

auto parse = [&IsMulOrDiv, &Opcode, OppShift, ExtractFrom](unsigned Op0, unsigned Op1, unsigned Op2) -> bool {
  if (OppShift.getOpcode() != Op0)
    return false;
  IsMulOrDiv = ExtractFrom.getOpcode() == Op1;
  if (IsMulOrDiv || ExtractFrom.getOpcode() == Op2)
    Opcode = Op2;
  return true;
};
if(!(parse(ISD::SRL, ISD::MUL,  ISD::SHL) ||
     parse(ISD::SHL, ISD::UDIV, ISD::SRL)) ||
   Opcode == ISD::DELETED_NODE)
  return SDValue();

Even if it's the same LOC, it significantly reduced duplication,
allowing to focus on the actual details..

4895–4897 ↗(On Diff #152011)

What does getAPIntValue() check? We don't want shifts/mul/div by 0?

RKSimon added inline comments.Jun 27 2018, 6:27 AM
test/CodeGen/X86/rotate-extract.ll
10 ↗(On Diff #152011)

Please can you add tests for a mixture of i8, i16, i32 and i64 integers - no need for duplication of all these tests just have decent range.

Ideally we'd include some vector tests as well, possibly in a separate test file.

sameconrad updated this revision to Diff 153659.Jul 1 2018, 6:39 PM

Address code style nits
Update scalar tests to use a variety of types/bitwidths
Add X86 vector tests

sameconrad updated this revision to Diff 153660.Jul 1 2018, 6:40 PM

Forgot to add vector test file

Some more nits.
Overall, this seems ok, but i don't feel confident reviewing this as a whole :/

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4901 ↗(On Diff #149902)

Not done.

4895–4897 ↗(On Diff #152011)

I didn't receive a reply here

4964 ↗(On Diff #153660)

Nit: unneeded {}

5037–5039 ↗(On Diff #153660)

Can't you check OppShiftLHS.getOperand(0) != ExtractFrom.getOperand(0) here?

5080 ↗(On Diff #153660)

bitwidth

5292–5294 ↗(On Diff #153660)

Do you need those extra variables to to track this?
Maybe you can s/NeedRotLHS/!LHSShift/, and drop the variables?
Otherwise i fear later on they may start meaning different things.

test/CodeGen/X86/rotate-extract.ll
83 ↗(On Diff #153660)

These are all rol's, and in aarch64 they are all ror's, even though the tests appear identical, interesting.
Any way to produce both ror and rol instructions for both arches?

efriedma added inline comments.Jul 6 2018, 12:49 PM
test/CodeGen/X86/rotate-extract.ll
83 ↗(On Diff #153660)

AArch64 doesn't have a rol instruction.

sameconrad updated this revision to Diff 154474.Jul 6 2018, 5:53 PM

Address comments

sameconrad marked 15 inline comments as done.Jul 6 2018, 6:10 PM
sameconrad added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4842–4843 ↗(On Diff #151651)

This is mentioned in the comments I added in MatchRotate

4901 ↗(On Diff #149902)

I moved the opcode check to the beginning of the function, I believe that was the original request here? The review comments get misaligned after updating the code

4895–4897 ↗(On Diff #152011)

Yes, just checks that is not 0. A rotate by zero would just be a noop, so it didn't seem to make sense to try and extract anything in that case.

5037–5039 ↗(On Diff #153660)

I moved the opcode check to right after stripConstantMask like you mentioned in your other comment, so this operand(0) check comes right after that now. I think we should verify the ExtractFrom opcode is a shift/mul/udiv before checking operands, otherwise we don't know what the operand count of the instruction is.

5292–5294 ↗(On Diff #153660)

Thanks for pointing that out. Those bools were completely unneeded.

test/CodeGen/X86/rotate-extract.ll
83 ↗(On Diff #153660)

The ISD selection in MatchRotate just looks like HasROTL ? ISD::ROTL : ISD::ROTR, so it looks will just always choose a rotl if available and fall back to a rotr like it does on AArch64. Since X86 has rotl it will always select that, I don't see any way to vary it within the same ISA.

It seems everyone else is busy with other differentials, and don't have time to spare to review this :/

Have you run this on a llvm test suite?

I don't have further comments, and i could accept this, if testsuite passes,
but it would be more comfortable if someone else could review this, too.

sameconrad marked 2 inline comments as done.Jul 18 2018, 6:16 PM

It seems everyone else is busy with other differentials, and don't have time to spare to review this :/

Have you run this on a llvm test suite?

I don't have further comments, and i could accept this, if testsuite passes,
but it would be more comfortable if someone else could review this, too.

Sorry about the late response, missed this comment before. I just ran the test-suite and it looks like all 2555 passed without issue.

lebedev.ri accepted this revision.Jul 18 2018, 11:15 PM

It seems everyone else is busy with other differentials, and don't have time to spare to review this :/

Have you run this on a llvm test suite?

I don't have further comments, and i could accept this, if testsuite passes,
but it would be more comfortable if someone else could review this, too.

Sorry about the late response, missed this comment before. I just ran the test-suite and it looks like all 2555 passed without issue.

Ok then, LG.
Please wait a bit (until monday?) just in case others may want to comment, and then feel free to commit.

This revision is now accepted and ready to land.Jul 18 2018, 11:15 PM

I've added your tests to trunk with current codegen so you need to rebase, which will show the effects of your patch.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5049 ↗(On Diff #154474)

(style) full stops at the end of comments.

5063 ↗(On Diff #154474)

It'd be great if this could support non-uniform constant vectors but a TODO comment is fine for now.

Rebase after tests were committed
Add full stops to comments
Add todo for handling non-splat vectors

sameconrad marked 2 inline comments as done.Jul 19 2018, 5:31 PM

Updated after rebasing
If this looks good someone else will have to commit since I don't have commit access
Thanks!

Last-minute thought - should this be limited to some specific value types?
On X86, the RCL/RCR/ROL/ROR only exist for 8/16/32/64 bit widths.

RKSimon added inline comments.Jul 20 2018, 12:37 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5371 ↗(On Diff #156406)

@lebedev.ri This should ensure legality of the ROT

Updated after rebasing
If this looks good someone else will have to commit since I don't have commit access
Thanks!

Two other landed differentials over the last year, might as well ask for the commit rights and land it yourself :)

@sameconrad Are you getting commit access (and in time for the Aug 1 branch)?

@sameconrad Are you getting commit access (and in time for the Aug 1 branch)?

I requested it but haven't heard anything back yet, so I'm not sure if I'll be able to commit by August 1st.

xbolva00 edited the summary of this revision. (Show Details)Jul 30 2018, 9:47 AM
This revision was automatically updated to reflect the committed changes.