This is an archive of the discontinued LLVM Phabricator instance.

[X86] MatchVectorAllZeroTest - add support for icmp(bitcast(icmp_ne(X,Y)),0) vector reduction patterns
ClosedPublic

Authored by RKSimon on Mar 30 2023, 10:06 AM.

Details

Summary

Many allof/anyof/noneof reduction patterns are canonicalized by bitcasting a vXi1 vector comparison result to iN and compared against 0/-1.

This patch adds support for recognizing a icmp_ne vector comparison against 0, which matches an 'whole vectors are equal' comparison pattern.

There are a few more steps to follow in future patches - we need to add support to MatchVectorAllZeroTest for comparing against -1 (in some cases), and this initial refactoring of LowerVectorAllZero to LowerVectorAllEqual needs to be extended so we can fully merge with the similar combineVectorSizedSetCCEquality code (which deals with scalar integer memcmp patterns).

Another step towards Issue #53419

Diff Detail

Event Timeline

RKSimon created this revision.Mar 30 2023, 10:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 10:06 AM
RKSimon requested review of this revision.Mar 30 2023, 10:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 10:06 AM
goldstein.w.n added inline comments.Mar 30 2023, 11:19 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
24429

Is there an issue creating a node that may potentially unused?

24503

Maybe just PeekThroughBitCast(Op).getOpcode() == ISD::SETCC?

24509

Shouldnt there be a Src.getOperand(1).isZeroConstant()?

llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll
1031

why doesn't this micro-fuse anymore? Likewise below.

RKSimon added inline comments.Mar 30 2023, 12:53 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
24429

There shouldn't be - it's just a constant node so has no dependencies to the rest of the DAG, and will get cleaned up later after the combine call. Although there's the remote chance that we reuse an existing node and spoil a hasOneUse check before cleanup - but on constants that's very unlikely.

24503

We'd need to repeat the PeekThroughBitCast call - is that clearer or not? Very minor tbh, I'm happy to do either.

24509

No, the inner comparison doesn't need to be a match against zero - its just the outer comparison that needs to match zero to form the reduction pattern - that's why its a LowerVectorAllEqual call and not LowerVectorAllZero like the existing patterns.

goldstein.w.n added inline comments.Mar 30 2023, 1:01 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
24503

Cant you do:

SDValue Src = PeekThroughBitCast(Op);
If(Mask.isAllOnes() && Src.getOpcode() == ISD::SETCC ....

?

But generally think thats preferable just because bitcast(bitcast(....)) or no bitcast / etc...

RKSimon updated this revision to Diff 509949.Mar 31 2023, 2:38 AM

Use peekThroughBitcasts and add assert to make it clear we're matching a reduction by bitcasting a vector comparison result to a scalar value for comparison with zero

pengfei accepted this revision.Mar 31 2023, 6:31 AM

Test case changes look good to me.

llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll
1031

I found broadcast change in several places. It increases code size but reduces data size. I'm not sure which one is better.

This revision is now accepted and ready to land.Mar 31 2023, 6:31 AM
RKSimon added inline comments.Mar 31 2023, 7:17 AM
llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll
1031

I'd dearly like to move most of the constant build vector data lowering out of DAG as it makes far too many premature judgements.

We could just lower it to a regular constant pool if we have to (keeping it as BUILD_VECTOR nodes would be even better like we do zero/allones) - and leave a later pass to judge whether we should make it a load/broadcast/extend/etc. depending on uses, folding, register pressure etc. We could even materialize some easy uniform constants (e.g. shifted masks + gfni) in code.

goldstein.w.n added inline comments.Mar 31 2023, 9:04 AM
llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll
1031

Agreed!

alexfh added a subscriber: alexfh.Apr 5 2023, 9:22 AM

After this commit we started seeing clang crashes with "Do not know how to split this operator's operand" (likely coming from DAGTypeLegalizer::SplitVectorOperand,
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2831). I'm trying to get a reduced test case. This can take non-trivial time, since we're seeing this in a ThinLTO build.

alexfh added a comment.Apr 5 2023, 9:59 AM

An assertions-enabled clang build generates this:
assert.h assertion failed at llvm/include/llvm/CodeGen/ValueTypes.h:434 in EVT llvm::EVT::getHalfNumVectorElementsVT(LLVMContext &) const: EltCnt.isKnownEven() && "Splitting vector, but not in half!"

Seems related, but I'm not 100% sure it's the same issue.

This is an llvm-reduce'd test case:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i1 @q(<4 x i256> %0) {
entry:
  %1 = icmp ne <4 x i256> %0, zeroinitializer
  %2 = bitcast <4 x i1> %1 to i4
  %3 = icmp eq i4 %2, 0
  br i1 %3, label %l2, label %l1

l1:                        ; preds = %entry
  ret i1 false

l2:                                  ; preds = %entry
  ret i1 false
}
$ clang -c reduced.ll
assert.h assertion failed at llvm/include/llvm/CodeGen/ValueTypes.h:434 in EVT llvm::EVT::getHalfNumVectorElementsVT(LLVMContext &) const: EltCnt.isKnownEven() && "Splitting vector, but not in half!"
*** Check failure stack trace: ***
    @     0x55fb590ae6a4  __assert_fail
    @     0x55fb5633411d  llvm::EVT::getHalfNumVectorElementsVT()
    @     0x55fb57abc892  llvm::SelectionDAG::GetSplitDestVTs()
    @     0x55fb5715222d  LowerVectorAllEqual()
    @     0x55fb570aab19  MatchVectorAllEqualTest()
    @     0x55fb5713d07b  combineSetCC()
    @     0x55fb571141e2  llvm::X86TargetLowering::PerformDAGCombine()
    @     0x55fb579a7e98  (anonymous namespace)::DAGCombiner::combine()
    @     0x55fb579a55ed  llvm::SelectionDAG::Combine()
    @     0x55fb57b2534e  llvm::SelectionDAGISel::CodeGenAndEmitDAG()
    @     0x55fb57b240dc  llvm::SelectionDAGISel::SelectAllBasicBlocks()
    @     0x55fb57b20b0e  llvm::SelectionDAGISel::runOnMachineFunction()
    @     0x55fb572d357e  (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction()
    @     0x55fb577a8e45  llvm::MachineFunctionPass::runOnFunction()
    @     0x55fb58d7585d  llvm::FPPassManager::runOnFunction()
    @     0x55fb58d7d844  llvm::FPPassManager::runOnModule()
    @     0x55fb58d75fec  llvm::legacy::PassManagerImpl::run()
    @     0x55fb53ee8265  clang::EmitBackendOutput()
    @     0x55fb53ee3355  clang::CodeGenAction::ExecuteAction()
    @     0x55fb54b21863  clang::FrontendAction::Execute()
    @     0x55fb54a953cd  clang::CompilerInstance::ExecuteAction()
    @     0x55fb53ac0768  clang::ExecuteCompilerInvocation()
    @     0x55fb53ab4611  cc1_main()
    @     0x55fb53ab0908  ExecuteCC1Tool()
    @     0x55fb53aaf66e  clang_main()
    @     0x55fb53aac9c4  main
    @     0x7f2387986633  __libc_start_main
    @     0x55fb53aac92a  _start

Thanks @alexfh I have an idea what's going on and will deal with it tomorrow.

alexfh added a comment.Apr 5 2023, 1:17 PM

Thanks @alexfh I have an idea what's going on and will deal with it tomorrow.

Thanks! It would be nice to have a fix or revert soon, since we're blocked on this.

bgraur added a subscriber: bgraur.Apr 6 2023, 1:05 AM

Thanks @alexfh I have an idea what's going on and will deal with it tomorrow.

Thanks! It would be nice to have a fix or revert soon, since we're blocked on this.

@RKSimon could you please consider reverting as that should be the fastest and safest solution?
You would unblock us and be able to work on the fix with no time pressure.
Thank you in advance!

Thanks @alexfh I have an idea what's going on and will deal with it tomorrow.

Thanks! It would be nice to have a fix or revert soon, since we're blocked on this.

@RKSimon could you please consider reverting as that should be the fastest and safest solution?
You would unblock us and be able to work on the fix with no time pressure.
Thank you in advance!

I just put a quick fix together with another issue. I'm fine with either land it or revert this.

I should have the fix done shortly - please don't start reverting anything....

alexfh added a comment.Apr 6 2023, 2:27 AM

I should have the fix done shortly - please don't start reverting anything....

Glad to hear that! In case you wonder, there's a C++ test case as well:

$ cat q.ii
struct V {
  unsigned : 28;
  unsigned p_ : 4;
  unsigned : 32;
  unsigned : 32;
  unsigned : 56;
  unsigned q_ : 4;
  unsigned : 2;
  unsigned r_ : 1;
  unsigned : 1;
  unsigned s : 27;
  unsigned t : 1;
  unsigned : 4;
  unsigned : 1;
  unsigned : 75;
  unsigned p() { return p_; }
  unsigned q() { return q_; }
  unsigned r() { return r_; }
};

bool x(V a, V b) {
  return a.p() != b.p() || a.s || a.q() != b.q() || a.r() != b.r() || a.t;
}
$ ./clang  -O2 --target=x86_64-unknown-linux-gnu -fsanitize=address -c q.ii -o bad.o
Stack dump:
0.      Program arguments: ./clang -O2 --target=x86_64-unknown-linux-gnu -fsanitize=address -c q.ii -o bad.o
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'q.ii'.
4.      Running pass 'X86 DAG->DAG Instruction Selection' on function '@_Z1x1VS_'
...

And here: https://gcc.godbolt.org/z/P51G8xrxG