https://reviews.llvm.org/D140798 introduces a regression in the AMDGPU backend.
If the zero check is folded to llvm.usub.sat before inlining, the fold hides opportunities for other inst-combine transforms that should take precedence.
As a result, we get suboptimal code.
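For reference, the fold being blamed here (it is visible in @snork in the example below) is the canonicalization of a compare-and-decrement into usub.sat; roughly, with placeholder value names:

  %c = icmp ne i32 %x, 0
  %d = sub i32 %x, 1
  %r = select i1 %c, i32 %d, i32 0
; becomes
  %r = call i32 @llvm.usub.sat.i32(i32 %x, i32 1)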
Example:
define i32 @zot(i32 noundef %arg) {
bb:
  %inst = icmp eq i32 %arg, 0
  %inst1 = call i32 @llvm.cttz.i32(i32 %arg, i1 true)
  %inst2 = select i1 %inst, i32 -1, i32 %inst1
  %inst3 = add nsw i32 %inst2, 1
  ret i32 %inst3
}

define i32 @snork(i32 %arg, i32 noundef %arg1) {
bb:
  %inst = xor i32 %arg1, -1
  %inst2 = call i32 @zot(i32 noundef %inst)
  %inst3 = icmp ne i32 %inst2, 0
  %inst4 = sub i32 %inst2, 1
  %inst5 = select i1 %inst3, i32 %inst4, i32 0
  ret i32 %inst5
}
Running "opt -S -passes=instcombine test.ll" gives:
define i32 @zot(i32 noundef %arg) {
bb:
  %inst = icmp eq i32 %arg, 0
  %inst1 = call i32 @llvm.cttz.i32(i32 %arg, i1 true), !range !0
  %inst1.op = add nuw nsw i32 %inst1, 1
  %inst3 = select i1 %inst, i32 0, i32 %inst1.op
  ret i32 %inst3
}

define i32 @snork(i32 %arg, i32 noundef %arg1) {
bb:
  %inst = xor i32 %arg1, -1
  %inst2 = call i32 @zot(i32 noundef %inst)
  %inst5 = call i32 @llvm.usub.sat.i32(i32 %inst2, i32 1)
  ret i32 %inst5
}
"opt -S -passes=module-inline" gets
define i32 @snork(i32 %arg, i32 noundef %arg1) {
bb:
  %inst = xor i32 %arg1, -1
  %inst.i = icmp eq i32 %inst, 0
  %inst1.i = call i32 @llvm.cttz.i32(i32 %inst, i1 true), !range !0
  %inst1.op.i = add nuw nsw i32 %inst1.i, 1
  %inst3.i = select i1 %inst.i, i32 0, i32 %inst1.op.i
  %inst5 = call i32 @llvm.usub.sat.i32(i32 %inst3.i, i32 1)
  ret i32 %inst5
}
The next inst-combine invocation cannot do much with it:
define i32 @snork(i32 %arg, i32 noundef %arg1) {
bb:
  %inst = xor i32 %arg1, -1
  %inst.i = icmp eq i32 %arg1, -1
  %inst1.i = call i32 @llvm.cttz.i32(i32 %inst, i1 true), !range !0
  %inst1.op.i = add nuw nsw i32 %inst1.i, 1
  %inst3.i = select i1 %inst.i, i32 0, i32 %inst1.op.i
  %inst5 = call i32 @llvm.usub.sat.i32(i32 %inst3.i, i32 1)
  ret i32 %inst5
}
The result before the change looks better:
define i32 @snork(i32 %arg, i32 noundef %arg1) {
bb:
  %inst = xor i32 %arg1, -1
  %inst.i = icmp eq i32 %arg1, -1
  %inst1.i = call i32 @llvm.cttz.i32(i32 %inst, i1 true), !range !0
  %inst5 = select i1 %inst.i, i32 0, i32 %inst1.i
  ret i32 %inst5
}
Furthermore, the difference gets even worse if we follow it through the next inline pass, which inlines the ctlz-related code...
My idea is simply to postpone the zero-check folding while the compare argument is a call that has not been inlined yet.
If that call does get inlined, the next inst-combine invocation gets a chance to do a better job; if not, it does the same as before.
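To illustrate (a hand-reconstructed sketch, not actual pass output): with the usub.sat formation in @snork postponed, the IR right after inlining would keep the original compare/sub/select around the inlined body of @zot:

define i32 @snork(i32 %arg, i32 noundef %arg1) {
bb:
  %inst = xor i32 %arg1, -1
  %inst.i = icmp eq i32 %inst, 0
  %inst1.i = call i32 @llvm.cttz.i32(i32 %inst, i1 true), !range !0
  %inst1.op.i = add nuw nsw i32 %inst1.i, 1
  %inst3.i = select i1 %inst.i, i32 0, i32 %inst1.op.i
  %inst3 = icmp ne i32 %inst3.i, 0
  %inst4 = sub i32 %inst3.i, 1
  %inst5 = select i1 %inst3, i32 %inst4, i32 0
  ret i32 %inst5
}

From there the zero check on %inst3.i can be decided by looking through the select (the non-zero arm is an add nuw of 1 and therefore never zero), which is what would let the next inst-combine reach the better form shown above.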
As far as I can tell, this is just a matter of missing transforms for usub_sat. I've applied https://github.com/llvm/llvm-project/commit/6261adfa51f2d93c8258e7b2d9811d607f3e50a5 and https://github.com/llvm/llvm-project/commit/3f53a58597bc84418f8b83934ef2f5c5d615626b to fix the issue from your example. Let me know if you see further issues.
@nikic, thanks a lot, I have cherry-picked those commits, and they help.
I suspected we were "lucky" enough to hit the middle of an actively developed feature.
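For reference, my reading of what makes the difference here (placeholder names, not a description of the exact folds in those commits) is a simplification along these lines:

  %y = add nuw i32 %x, 1
  %s = select i1 %c, i32 0, i32 %y
  %r = call i32 @llvm.usub.sat.i32(i32 %s, i32 1)
; simplifies to
  %r = select i1 %c, i32 0, i32 %x

which is exactly what is needed to recover the optimal select in the inlined @snork.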