This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Restrict "(X & 0xFF00) + xx00 --> (X + xx00) & 0xFF00"
Abandoned · Public

Authored by piotr on Jul 15 2022, 3:38 AM.

Details

Summary

Add an m_OneUse check on X, in addition to the existing one on the AND.

When the input expression forms part of an address computation
(a base address with an offset), the transformation obstructs the
(base_addr + offset) pattern that backends could otherwise handle
naturally in its original form.
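
For illustration, here is a minimal before/after sketch of the fold in LLVM IR, with stand-in constants for the 0xFF00/xx00 shapes in the title (the add constant must be covered by the mask):

  ; before the fold
  %masked = and i64 %X, -256
  %res    = add i64 %masked, 256

  ; after the fold
  %sum = add i64 %X, 256
  %res = and i64 %sum, -256

With this patch, the fold fires only when %X itself has no other users.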

Diff Detail

Event Timeline

piotr created this revision. Jul 15 2022, 3:38 AM
Herald added a project: Restricted Project. · View Herald Transcript · Jul 15 2022, 3:38 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
piotr requested review of this revision. Jul 15 2022, 3:38 AM
piotr added a subscriber: foad. Jul 15 2022, 3:58 AM

The existing fold can cause problems for backends, because it potentially obstructs the (base_addr + offset) pattern, which is not always possible to fix at the selection level (for instance, when the pattern spans basic blocks).

While the additional m_OneUse check fixes an important case for the AMDGPU backend and should be fairly uncontentious given the existing check, I think it is worth considering whether the transformation is useful at all.

I did a bit of archaeology: before https://reviews.llvm.org/rG080e6bc2050e28ae198d82f0e934ca7b4548c3b7 there was a comment suggesting that the pattern was part of another fold. Apart from the two tests that got changed in that commit, only one more pattern fails if I remove the transformation entirely; perhaps that needs to be fixed in some other way?

I did not attempt to remove it, as I do not have the full picture of the impact of such a change, but I can prepare a patch and help with testing (e.g. on AMDGPU workloads) if this would be a desirable change to make.

Lastly, here's an example (courtesy of @foad) that produces worse code in another backend (X86) when run through InstCombine:

define i8 @f(i8* %ptr, i64 %idx) {
  %idx1 = and i64 %idx, -4
  %idx2 = add i64 %idx1, 16
  %gep = getelementptr i8, i8* %ptr, i64 %idx2
  %val = load i8, i8* %gep
  ret i8 %val
}
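
For reference, the existing fold rewrites (X & -4) + 16 into (X + 16) & -4, so InstCombine turns the body into roughly:

  %idx1 = add i64 %idx, 16
  %idx2 = and i64 %idx1, -4
  %gep = getelementptr i8, i8* %ptr, i64 %idx2
  %val = load i8, i8* %gep
  ret i8 %val

after which the +16 is buried under the mask and is no longer a plain constant offset from an already-computed base, so the backend cannot simply fold it into the addressing mode.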
RKSimon added a comment (edited). Jul 15 2022, 4:21 AM

(Pedantic) The title says 'relax', but if anything this is an additional limitation.

llvm/test/Transforms/InstCombine/add.ll:773

Pre-commit this to trunk to show current codegen, and rebase the patch to show the diff.

piotr retitled this revision from [InstCombine] Relax "(X & 0xFF00) + xx00 --> (X + xx00) & 0xFF00" to [InstCombine] Restrict "(X & 0xFF00) + xx00 --> (X + xx00) & 0xFF00". Jul 15 2022, 4:25 AM

Thanks - will pre-commit the test.

piotr updated this revision to Diff 444945. Jul 15 2022, 5:06 AM

Rebased on top of 2d9332646a9c.

foad added a subscriber: lattner.

My preference would be to completely remove this combine if possible (@piotr already knows this since we discussed it offline).

It was originally implemented by @lattner in https://github.com/llvm/llvm-project/commit/bff91d9a2e556b1aadf274c563cec80a483725a4 with the comment "This comes up when doing adds to bitfield elements" and a reference to this test in test/Transforms/InstCombine/and.ll:

define i8 @test27(i8 %A) {
  %B = and i8 %A, 4
  %C = sub i8 %B, 16
  %D = and i8 %C, -16
  %E = add i8 %D, 16
  ret i8 %E
}
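
Working through the constants, the existing fold rewrites %E = (%C & -16) + 16 into (%C + 16) & -16, putting the add next to the sub so the pair cancels (a sketch; %t is a made-up intermediate name):

  %C = sub i8 %B, 16
  %t = add i8 %C, 16       ; cancels with the sub above
  %E = and i8 %t, -16      ; i.e. %E = %B & -16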

So it looks like the intention is to be able to cancel out the ADD and the SUB in ((B - 16) & -16) + 16 by pushing one of them into the AND. But if this is important, then we could also achieve it by pulling one of them out of the AND, i.e. doing the opposite combine with a patch like this:

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index a8f2cd79830a..998aae20c9f7 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -1843,6 +1843,13 @@ Instruction *InstCombinerImpl::visitAnd(BinaryOperator &I) {
         Value *NewAnd = Builder.CreateAnd(X, Op1);
         return BinaryOperator::CreateXor(NewAnd, Op1);
       }
+
+      // (X + 16) & -4 --> (X & -4) + 16
+      if (Op0->hasOneUse() &&
+          C->isNegative() && C->isShiftedMask() && *AddC == (*AddC & *C)) {
+        Value *NewAnd = Builder.CreateAnd(X, Op1);
+        return BinaryOperator::CreateAdd(NewAnd, ConstantInt::get(Ty, *AddC));
+      }
     }
 
     // ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if C2 fits in the

The advantage of doing it this way round is that it won't interfere with "base + constant offset" patterns that might be matched as addressing modes.

What's a good way of evaluating the effect of patches like this on codegen? The CodeGen lit tests aren't affected because they don't run InstCombine.

lattner added a comment (edited). Jul 15 2022, 3:02 PM

I haven't worked at this level for many years, but what I used to do is build the entire test-suite into a tree full of .s files with both patches, then diff them and sniff-test which looks better.

nikic added a comment. Jul 18 2022, 8:02 AM

I don't think the patch as written makes sense (we should never have one-use limitations on the leaves of a pattern, because the leaves are not affected by the fold). Changing the canonicalization direction seems fine though; I don't think we have a strong preference one way or the other. We should check whether there are any sibling folds (e.g. with or) that would also have to change direction, so that we consistently do logic before arithmetic.

piotr added a comment. Jul 18 2022, 8:46 AM

I agree the proper fix would be to change the direction of canonicalization. I will look into it.

piotr added a comment. Jul 19 2022, 5:57 AM

Discarding this patch in favor of D130080.

piotr abandoned this revision. Jul 19 2022, 5:57 AM