Page MenuHomePhabricator

[InstCombine] Compacting or instructions whose operands are shift instructions
AbandonedPublic

Authored by opaparo on Sep 19 2017, 7:12 AM.

Details

Summary

Adding two kinds of transformations:

  1. Change order of or instructions to group together identical opcodes and constants:
    • (or (or (op1 A C1)(op2 B C2)) (or (op1 D C1)(op2 E C2))) --> (or (or (op1 A C1)(op1 D C1)) (or (op2 B C2)(op2 E C2)))
  2. Or of shifts and ands:
    • ((V<<C3)&C1) | ((V<<C4)&C2) --> ((V&C5)<<C3) | ((V&C5)<<C4), if C5 = C1>>C3 == C2>>C4, for both logical shifts
    • ((V&C5)<<C3) | ((V<<C4)&C2) --> ((V&C5)<<C3) | ((V&C5)<<C4)
    • ((V<<C3)&C1) | ((V&C5)<<C4) --> ((V&C5)<<C3) | ((V&C5)<<C4)

For the first transformation note that although op1 and op2 are not limited to shifts, cases where they are shifts will greatly benefit from it.

Diff Detail

Event Timeline

opaparo created this revision.Sep 19 2017, 7:12 AM
craig.topper added inline comments.Sep 27 2017, 9:53 AM
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2237

Shift of Ands are canonicalized to And of Shifts unless the 'and' has an additional use. I don't think your test cases cover that.

2255

I don't think there's any reason to make these lambda's "static"

2305

Why the replaceAllUsesWith call? We don't normally do this in InstCombine. Are you trying to replace other uses than this OR? Do you have test cases for that?

spatel edited edge metadata.

I haven't looked much at the reassociation pass, but this seems like something that would be simpler to canonicalize there. Did you look at adding a rule there?

Eg, in the "or_or_shift" test, you have a tree of 'or' ops. If you reassociate those so that similar shifts are paired together, instcombine would manage to fold that using existing patterns (although maybe not optimally yet):

define i32 @or_or_shift(i32 %x) local_unnamed_addr #0  {
  %1 = and i32 %x, 2
  %2 = and i32 %x, 4
  %3 = shl nuw nsw i32 %1, 6
  %4 = lshr exact i32 %1, 1
  %5 = shl nuw nsw i32 %2, 6
  %6 = lshr exact i32 %2, 1
  %7 = or i32 %3, %5    <--- shl with shl
  %8 = or i32 %4, %6    <--- lshr with lshr
  %9 = or i32 %7, %8
  ret i32 %9
}

$ ./opt -instcombine reass.ll -S

define i32 @or_or_shift(i32 %x) local_unnamed_addr {
  %1 = shl i32 %x, 6
  %2 = and i32 %1, 384
  %3 = lshr i32 %x, 1
  %4 = and i32 %3, 3
  %5 = or i32 %2, %4
  ret i32 %5
}
m_zuckerman added inline comments.Oct 10 2017, 1:25 PM
test/Transforms/InstCombine/or-xor.ll
419

(((x ^ C1) | (y ^ C2)) | ((x | y) & C2)) - > (((x ^ C1) | (y ^ C1)) | ((x | y) & C2))

opaparo updated this revision to Diff 120051.Oct 24 2017, 5:06 AM
  1. Removed the lambda MatchShiftOfAnd as Shift of And is canonicalized to And of Shift, so the lambda MatchAndOfShift is sufficient.
  2. The lambda MatchAndOfShift is now not static.
  3. Removed unnecessary calls to replaceAllUsesWith.
  4. Fixed test documentation.
opaparo marked 4 inline comments as done.Oct 24 2017, 5:10 AM
opaparo added inline comments.
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2237

The lambda MatchShiftOfAnd was indeed unnecessary. I've removed it.

2305

The calls to replaceAllUsesWith were indeed unnecessary. I removed them.

opaparo marked 2 inline comments as done.Oct 24 2017, 5:11 AM

I haven't looked much at the reassociation pass, but this seems like something that would be simpler to canonicalize there. Did you look at adding a rule there?

Eg, in the "or_or_shift" test, you have a tree of 'or' ops. If you reassociate those so that similar shifts are paired together, instcombine would manage to fold that using existing patterns (although maybe not optimally yet):

define i32 @or_or_shift(i32 %x) local_unnamed_addr #0  {
  %1 = and i32 %x, 2
  %2 = and i32 %x, 4
  %3 = shl nuw nsw i32 %1, 6
  %4 = lshr exact i32 %1, 1
  %5 = shl nuw nsw i32 %2, 6
  %6 = lshr exact i32 %2, 1
  %7 = or i32 %3, %5    <--- shl with shl
  %8 = or i32 %4, %6    <--- lshr with lshr
  %9 = or i32 %7, %8
  ret i32 %9
}

$ ./opt -instcombine reass.ll -S

define i32 @or_or_shift(i32 %x) local_unnamed_addr {
  %1 = shl i32 %x, 6
  %2 = and i32 %1, 384
  %3 = lshr i32 %x, 1
  %4 = and i32 %3, 3
  %5 = or i32 %2, %4
  ret i32 %5
}

I've looked into the reassociation pass a bit.
The kind of reassociation I'm interested in is somewhat different than the one implemented in this pass:
The pass takes a tree of a reassociable operator, and transforms it into a "linear tree", e.g. for addition a_0 + (a_1 + (a_2 + ... (a_n-1 + a_n))...). The tree is built such that the latter operands have lower "ranks". The ranking is such that, for example, constants have rank of zero which means they will be at the end of the tree which will cause them to eventually fold.
The kind of reassociation I'm looking for is different and is of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3).
Even when I changed the pass's code such that same-opcode-instructions-tree-operands will be grouped together, I got for my or_or_shift, which is:

define i32 @or_or_shift(i32 %x) local_unnamed_addr #0  {
  %1 = and i32 %x, 2
  %2 = and i32 %x, 4
  %3 = shl nuw nsw i32 %1, 6
  %4 = lshr exact i32 %1, 1
  %5 = or i32 %3, %4
  %6 = shl nuw nsw i32 %2, 6
  %7 = lshr exact i32 %2, 1
  %8 = or i32 %6, %7
  %9 = or i32 %5, %8
  ret i32 %9
}

This output:

define i32 @or_or_shift(i32 %x) local_unnamed_addr {
  %1 = and i32 %x, 2
  %2 = and i32 %x, 4
  %3 = shl nuw nsw i32 %1, 6
  %4 = lshr exact i32 %1, 1
  %5 = shl nuw nsw i32 %2, 6
  %6 = lshr exact i32 %2, 1
  %7 = or i32 %6, %4
  %8 = or i32 %7, %3
  %9 = or i32 %8, %5
  ret i32 %9
}

%7, being the leaf or in the tree, actually does InstCombine nicely. The rest, however, do not.
As I see it, if I wish to embed this change in the reassociation pass, I have two options:

  1. Change the reassociation pass code such that the tree's topology will fit my needs.
  2. Add more InstCombine code to handle the newly formed structure (Something like (or (or A (op B C1)) (op D C1)) --> (or A (or (op B C1) (op D C1))), which is still a kind of reassociation).

I strongly prefer not to change the reassociation pass's tree topology as I have a feeling it is greatly relied on. And if I add a reassociation to InstCombine, why not add my original one instead of adding that one in addition to the changes in the reassociation pass?

Please let me know your thoughts on this matter.

I haven't looked much at the reassociation pass, but this seems like something that would be simpler to canonicalize there. Did you look at adding a rule there?

Eg, in the "or_or_shift" test, you have a tree of 'or' ops. If you reassociate those so that similar shifts are paired together, instcombine would manage to fold that using existing patterns (although maybe not optimally yet):

define i32 @or_or_shift(i32 %x) local_unnamed_addr #0  {
  %1 = and i32 %x, 2
  %2 = and i32 %x, 4
  %3 = shl nuw nsw i32 %1, 6
  %4 = lshr exact i32 %1, 1
  %5 = shl nuw nsw i32 %2, 6
  %6 = lshr exact i32 %2, 1
  %7 = or i32 %3, %5    <--- shl with shl
  %8 = or i32 %4, %6    <--- lshr with lshr
  %9 = or i32 %7, %8
  ret i32 %9
}

$ ./opt -instcombine reass.ll -S

define i32 @or_or_shift(i32 %x) local_unnamed_addr {
  %1 = shl i32 %x, 6
  %2 = and i32 %1, 384
  %3 = lshr i32 %x, 1
  %4 = and i32 %3, 3
  %5 = or i32 %2, %4
  ret i32 %5
}

I've looked into the reassociation pass a bit.
The kind of reassociation I'm interested in is somewhat different than the one implemented in this pass:
The pass takes a tree of a reassociable operator, and transforms it into a "linear tree", e.g. for addition a_0 + (a_1 + (a_2 + ... (a_n-1 + a_n))...). The tree is built such that the latter operands have lower "ranks". The ranking is such that, for example, constants have rank of zero which means they will be at the end of the tree which will cause them to eventually fold.
The kind of reassociation I'm looking for is different and is of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3).
Even when I changed the pass's code such that same-opcode-instructions-tree-operands will be grouped together, I got for my or_or_shift, which is:

define i32 @or_or_shift(i32 %x) local_unnamed_addr #0  {
  %1 = and i32 %x, 2
  %2 = and i32 %x, 4
  %3 = shl nuw nsw i32 %1, 6
  %4 = lshr exact i32 %1, 1
  %5 = or i32 %3, %4
  %6 = shl nuw nsw i32 %2, 6
  %7 = lshr exact i32 %2, 1
  %8 = or i32 %6, %7
  %9 = or i32 %5, %8
  ret i32 %9
}

This output:

define i32 @or_or_shift(i32 %x) local_unnamed_addr {
  %1 = and i32 %x, 2
  %2 = and i32 %x, 4
  %3 = shl nuw nsw i32 %1, 6
  %4 = lshr exact i32 %1, 1
  %5 = shl nuw nsw i32 %2, 6
  %6 = lshr exact i32 %2, 1
  %7 = or i32 %6, %4
  %8 = or i32 %7, %3
  %9 = or i32 %8, %5
  ret i32 %9
}

%7, being the leaf or in the tree, actually does InstCombine nicely. The rest, however, do not.
As I see it, if I wish to embed this change in the reassociation pass, I have two options:

  1. Change the reassociation pass code such that the tree's topology will fit my needs.
  2. Add more InstCombine code to handle the newly formed structure (Something like (or (or A (op B C1)) (op D C1)) --> (or A (or (op B C1) (op D C1))), which is still a kind of reassociation).

I strongly prefer not to change the reassociation pass's tree topology as I have a feeling it is greatly relied on. And if I add a reassociation to InstCombine, why not add my original one instead of adding that one in addition to the changes in the reassociation pass?

Please let me know your thoughts on this matter.

Thanks for looking at and explaining Reassociate's current behavior. I don't know what would be best, so cc'ing some others that might have a better idea.

davide requested changes to this revision.Oct 26 2017, 10:18 AM
davide added inline comments.
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2291

IMHO, this got a little too far.
FWIW, matchers should look just a few instructions at the time and not whole trees.
Is there a way to split this up or do differently?
I really don't want reassociate to become the next instcombine.

This revision now requires changes to proceed.Oct 26 2017, 10:18 AM
opaparo abandoned this revision.Dec 14 2017, 6:06 AM

These changes will be handled by D41233, D39421 and perhaps a few more future changes.