This is an archive of the discontinued LLVM Phabricator instance.


Differential D114962

[Support] improve known bits analysis for multiply by power-of-2 (1 set bit)
ClosedPublic

Authored by spatel on Dec 2 2021, 8:35 AM.


Details

Reviewers

lebedev.ri
RKSimon
craig.topper
rampitec
foad
arsenm
efriedma

Commits

rGe9179a6a029a: [Support] improve known bits analysis for multiply by power-of-2 (1 set bit)

Summary

This can be viewed as recognizing that multiply-by-power-of-2 doesn't have a carry into the top bit of an M-bit * N-bit number.
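
The no-carry argument can be checked with a small sketch. It models the leading-zero estimate with plain integers rather than `llvm::KnownBits`, and it treats each operand as a fully known 8-bit value, so a concrete popcount stands in for `countMaxPopulation()`. The helper names (`mulLeadZ`, `clzN`, `popcount32`, `leadZEstimateIsSound8`) are hypothetical. The sketch checks the formula exhaustively over all 8-bit operand pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Leading-zero estimate for a BitWidth-bit multiply, mirroring the formula in
// this patch but on plain integers (not llvm::KnownBits). BonusLZ is set when
// either operand has exactly one possibly-set bit, so the multiply is a shift.
unsigned mulLeadZ(unsigned LHSLeadZ, unsigned RHSLeadZ, unsigned BitWidth,
                  bool BonusLZ) {
  unsigned Sum = LHSLeadZ + RHSLeadZ + (BonusLZ ? 1u : 0u);
  return std::max(Sum, BitWidth) - BitWidth;
}

// Leading zeros of V viewed as a BitWidth-bit value (BitWidth <= 32).
unsigned clzN(uint32_t V, unsigned BitWidth) {
  unsigned N = 0;
  for (unsigned I = BitWidth; I > 0 && !((V >> (I - 1)) & 1u); --I)
    ++N;
  return N;
}

// Number of set bits in V.
unsigned popcount32(uint32_t V) {
  unsigned N = 0;
  for (; V != 0; V &= V - 1)
    ++N;
  return N;
}

// Treat every 8-bit value as fully known and check that the truncated product
// never has fewer leading zeros than the estimate promises.
bool leadZEstimateIsSound8() {
  const unsigned BW = 8;
  for (uint32_t X = 0; X < 256; ++X)
    for (uint32_t Y = 0; Y < 256; ++Y) {
      bool Bonus = popcount32(X) == 1 || popcount32(Y) == 1;
      unsigned Est = mulLeadZ(clzN(X, BW), clzN(Y, BW), BW, Bonus);
      if (clzN((X * Y) & 0xFFu, BW) < Est)
        return false;
    }
  return true;
}
```

For 32-bit operands in the `@mul_of_bool` pattern (a 0/1 value times a zero-extended `i8`), `mulLeadZ(31, 24, 32, true)` gives 24 leading zeros, versus 23 without the bonus bit. The bonus must stay restricted to power-of-2 operands: for `3 * 3` in 8 bits, `mulLeadZ(6, 6, 8, true)` would claim 5 leading zeros, but `clzN(9, 8)` is only 4.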

Enhancing the canonicalization of mul -> select might also handle some of these, if we were OK with increasing the instruction count (adding casts) in some cases.

This doesn't help https://llvm.org/PR49055, but it's a simpler pattern that we miss.
Note: "-sccp" already handles these examples using constant range analysis.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. Dec 2 2021, 8:35 AM

Herald added subscribers: dexonsmith, hiraditya, mcrosier. Dec 2 2021, 8:35 AM

spatel requested review of this revision. Dec 2 2021, 8:35 AM

Herald added a project: Restricted Project. Dec 2 2021, 8:35 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B137147: Diff 391333. Dec 2 2021, 9:12 AM

efriedma added a subscriber: efriedma. Dec 2 2021, 1:28 PM

efriedma added inline comments.

llvm/lib/Support/KnownBits.cpp
442	You could generalize further in two ways:
1. If one operand is known to be 0 or 1, we can copy all the known zero bits from the other operand to the result, not just the leading zeros.
2. If either operand has at most one bit set, we can use `LeadZ = std::max(LHSLeadZ + RHSLeadZ + 1, BitWidth) - BitWidth;`.
Not sure either one is actually useful, though.
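
Eli's first suggestion (not implemented in this revision) can be sanity-checked with a hypothetical sketch. It uses plain 8-bit masks instead of `llvm::KnownBits`, and `knownZeroOfMulByBool` and `boolMulKeepsKnownZeros8` are illustrative names only. If `B` is known to be 0 or 1, then `B * Y` is either 0 or `Y`, so every bit known to be zero in `Y` is also zero in the product.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the first suggestion (not part of the landed patch):
// if B is known to be 0 or 1, then B * Y is either 0 or Y, so every bit that is
// known zero in Y is also known zero in the product.
uint32_t knownZeroOfMulByBool(uint32_t YKnownZero) {
  return YKnownZero;
}

// Exhaustive 8-bit check over every known-zero mask for Y, every Y consistent
// with that mask, and both possible values of B.
bool boolMulKeepsKnownZeros8() {
  for (uint32_t Zero = 0; Zero < 256; ++Zero)
    for (uint32_t Y = 0; Y < 256; ++Y) {
      if ((Y & Zero) != 0)
        continue; // Y contradicts the known-zero mask
      for (uint32_t B = 0; B <= 1; ++B)
        if (((B * Y) & 0xFFu & knownZeroOfMulByBool(Zero)) != 0)
          return false;
    }
  return true;
}
```

For example, if `Y`'s known-zero mask is 0x30 (bits 4 and 5 clear, bits 0 and 7 unknown), `Y` has no known leading or trailing zeros. The leading-zero formula therefore gives nothing for an 8-bit `B * Y`, while this rule would still carry the 0x30 mask into the product.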

spatel added inline comments. Dec 2 2021, 2:46 PM

llvm/lib/Support/KnownBits.cpp
442	Thanks, Eli! If all bits are known, then we should simplify before we get here? I like the 2nd idea - that might get me closer to the solution for the motivating problem, and if I'm seeing it correctly, it can be implemented with just one extra line of code. Let me come up with some more tests to exercise that...

spatel added inline comments. Dec 2 2021, 2:48 PM

llvm/lib/Support/KnownBits.cpp
442	Disregard the question - I misread the suggestion.

Patch updated:
Generalized to power-of-2 (one possibly set bit) and added tests to exercise.
This turned up a couple of AMDGPU codegen diffs because we use knownbits down there. Those look like improvements to me, but I'm adding some more potential reviewers to confirm.

Herald added subscribers: kerbowa, nhaehnle, wdng, jvesely. Dec 3 2021, 9:33 AM

Harbormaster completed remote builds in B137374: Diff 391659. Dec 3 2021, 9:33 AM

spatel retitled this revision from [Support] improve known bits analysis for multiply with 1-bit op (bool) to [Support] improve known bits analysis for multiply by power-of-2 (1 set bit). Dec 3 2021, 9:45 AM

spatel edited the summary of this revision.

spatel added a reviewer: efriedma.

AMDGPU changes are progressions. Thanks!

spatel mentioned this in rGdccddb268be8: [InstCombine] add tests for icmp with mul op with known bits; NFC. Dec 5 2021, 6:57 AM

LGTM - @efriedma any more comments?

Is there an exhaustive test for this method?

In D114962#3176934, @lebedev.ri wrote:

Is there an exhaustive test for this method?

I found this:
https://github.com/llvm/llvm-project/blob/e9fae0f19eec1fce746101b410d2345f0fbf89b4/llvm/unittests/Support/KnownBitsTest.cpp#L233

And it did catch a bug in an early draft of this patch, so it seems to be working.

In D114962#3176934, @lebedev.ri wrote:

Is there an exhaustive test for this method?

Yes - in KnownBitsTest BinaryExhaustive
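
The idea behind an exhaustive test can be illustrated with a toy 4-bit version. This is not the `BinaryExhaustive` unittest in KnownBitsTest.cpp (its code is not reproduced here). `Known4`, `mulLeadZ`, and `exhaustiveLeadZCheck` are hypothetical names. The sketch enumerates every conflict-free pair of known-bits operands and every concrete value each one admits. The `AlwaysBonus` flag models an overly optimistic formula, to show the kind of mistake such a test is meant to catch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy 4-bit known-bits value (hypothetical; not llvm::KnownBits).
struct Known4 {
  uint32_t Zero; // bits known to be 0
  uint32_t One;  // bits known to be 1
};

const unsigned BW = 4;
const uint32_t Mask = (1u << BW) - 1;

// Leading bits known to be zero (cf. countMinLeadingZeros).
unsigned minLeadingZeros(const Known4 &K) {
  unsigned N = 0;
  for (unsigned I = BW; I > 0 && ((K.Zero >> (I - 1)) & 1u); --I)
    ++N;
  return N;
}

// Bits that are not known zero (cf. countMaxPopulation).
unsigned maxPopulation(const Known4 &K) {
  unsigned N = 0;
  for (unsigned I = 0; I < BW; ++I)
    N += ((K.Zero >> I) & 1u) ? 0u : 1u;
  return N;
}

// Leading zeros of a concrete 4-bit value.
unsigned clz4(uint32_t V) {
  unsigned N = 0;
  for (unsigned I = BW; I > 0 && !((V >> (I - 1)) & 1u); --I)
    ++N;
  return N;
}

// Leading-zero estimate for L * R. AlwaysBonus models a buggy variant that
// adds the extra zero even when neither operand is a power of 2.
unsigned mulLeadZ(const Known4 &L, const Known4 &R, bool AlwaysBonus) {
  bool Bonus = AlwaysBonus || maxPopulation(L) == 1 || maxPopulation(R) == 1;
  unsigned Sum = minLeadingZeros(L) + minLeadingZeros(R) + (Bonus ? 1u : 0u);
  return std::max(Sum, BW) - BW;
}

// True if concrete value V agrees with everything K claims to know.
bool consistent(uint32_t V, const Known4 &K) {
  return (V & K.Zero) == 0 && (V & K.One) == K.One;
}

// Returns false if the estimate is ever more optimistic than reality.
bool exhaustiveLeadZCheck(bool AlwaysBonus) {
  for (uint32_t LZ = 0; LZ <= Mask; ++LZ)
    for (uint32_t LO = 0; LO <= Mask; ++LO)
      for (uint32_t RZ = 0; RZ <= Mask; ++RZ)
        for (uint32_t RO = 0; RO <= Mask; ++RO) {
          if ((LZ & LO) || (RZ & RO))
            continue; // conflicting knowledge, skip
          Known4 L{LZ, LO}, R{RZ, RO};
          unsigned Est = mulLeadZ(L, R, AlwaysBonus);
          for (uint32_t X = 0; X <= Mask; ++X)
            for (uint32_t Y = 0; Y <= Mask; ++Y)
              if (consistent(X, L) && consistent(Y, R) &&
                  clz4((X * Y) & Mask) < Est)
                return false;
        }
  return true;
}
```

With the power-of-2 guard, the check passes. With `AlwaysBonus` set, it fails (for example, fully known `3 * 3` would be promised 1 leading zero, but 9 is `1001`).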

LGTM, thank you.

This revision is now accepted and ready to land. Dec 7 2021, 11:26 AM

This revision was landed with ongoing or failed builds. Dec 8 2021, 8:50 AM

Closed by commit rGe9179a6a029a: [Support] improve known bits analysis for multiply by power-of-2 (1 set bit) (authored by spatel).

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rGe9179a6a029a: [Support] improve known bits analysis for multiply by power-of-2 (1 set bit).

Revision Contents

Path                                            Size
llvm/lib/Support/KnownBits.cpp                  9 lines
llvm/test/CodeGen/AMDGPU/sdiv64.ll              9 lines
llvm/test/CodeGen/AMDGPU/srem64.ll              9 lines
llvm/test/Transforms/InstCombine/icmp-mul.ll    56 lines

Diff 392802

llvm/lib/Support/KnownBits.cpp

(415 unchanged lines not shown)
 KnownBits KnownBits::mul(const KnownBits &LHS, const KnownBits &RHS,
                          bool SelfMultiply) {
   unsigned BitWidth = LHS.getBitWidth();
   assert(BitWidth == RHS.getBitWidth() && !LHS.hasConflict() &&
          !RHS.hasConflict() && "Operand mismatch");
   assert((!SelfMultiply || (LHS.One == RHS.One && LHS.Zero == RHS.Zero)) &&
          "Self multiplication knownbits mismatch");
 
   // Compute a conservative estimate for high known-0 bits.
+  // TODO: This could be generalized to number of sign bits (negative numbers).
   unsigned LHSLeadZ = LHS.countMinLeadingZeros();
   unsigned RHSLeadZ = RHS.countMinLeadingZeros();
-  unsigned LeadZ = std::max(LHSLeadZ + RHSLeadZ, BitWidth) - BitWidth;
+  // If either operand is a power-of-2, the multiply is only shifting bits in
+  // the other operand (there can't be a carry into the M+N bit of the result).
+  // Note: if we know that a value is entirely 0, that should simplify below.
+  bool BonusLZ = LHS.countMaxPopulation() == 1 || RHS.countMaxPopulation() == 1;
+
+  unsigned LeadZ = std::max(LHSLeadZ + RHSLeadZ + BonusLZ, BitWidth) - BitWidth;
   assert(LeadZ <= BitWidth && "More zeros than bits?");
 
   // The result of the bottom bits of an integer multiply can be
   // inferred by looking at the bottom bits of both operands and
   // multiplying them together.
   // We can infer at least the minimum number of known trailing bits
   // of both operands. Depending on number of trailing zeros, we can
   // infer more bits, because (a*b) <=> ((a/m) * (b/n)) * (m*n) assuming
   // a and b are divisible by m and n respectively.
   // We then calculate how many of those bits are inferrable and set
   // the output. For example, the i8 mul:
   // a = XXXX1100 (12)
   // b = XXXX1110 (14)
   // We know the bottom 3 bits are zero since the first can be divided by
   // 4 and the second by 2, thus having ((12/4) * (14/2)) * (2*4).
   // Applying the multiplication to the trimmed arguments gets:
   // XX11 (3)
(181 unchanged lines not shown)
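
The trailing-bit reasoning in the context comment above (the i8 example `12 * 14`) can be checked with a small sketch. It follows the comment's argument on plain integers and is not the exact code in `KnownBits::mul`. `ctzCapped`, `mulLowBitsKnown`, and `i8ExampleHolds` are hypothetical names. Writing `A = 2^TZA * A'` and `B = 2^TZB * B'`, the low `min(KnownA - TZA, KnownB - TZB)` bits of `A' * B'` are determined, and the product is shifted left by `TZA + TZB` zero bits. The known bits equal `(LowA * LowB)` masked to that width.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Trailing zero bits of V, counting at most Limit bits.
unsigned ctzCapped(uint32_t V, unsigned Limit) {
  unsigned N = 0;
  while (N < Limit && !((V >> N) & 1u))
    ++N;
  return N;
}

// Hypothetical helper: the low KnownA bits of A are LowA and the low KnownB
// bits of B are LowB. Returns how many low bits of A * B are determined.
unsigned mulLowBitsKnown(uint32_t LowA, unsigned KnownA, uint32_t LowB,
                         unsigned KnownB, unsigned BitWidth) {
  unsigned TZA = ctzCapped(LowA, KnownA);
  unsigned TZB = ctzCapped(LowB, KnownB);
  // Known bits of the "trimmed" operands A' and B' (after dropping zeros).
  unsigned Trimmed = std::min(KnownA - TZA, KnownB - TZB);
  return std::min(Trimmed + TZA + TZB, BitWidth);
}

// Brute-force the i8 example: a = XXXX1100 (12), b = XXXX1110 (14).
bool i8ExampleHolds() {
  unsigned K = mulLowBitsKnown(12, 4, 14, 4, 8);
  uint32_t LowMask = (1u << K) - 1;
  uint32_t Expected = (12u * 14u) & LowMask;
  for (uint32_t P = 0; P < 16; ++P)
    for (uint32_t Q = 0; Q < 16; ++Q) {
      uint32_t A = (P << 4) | 12u;
      uint32_t B = (Q << 4) | 14u;
      if (((A * B) & 0xFFu & LowMask) != Expected)
        return false;
    }
  return true;
}
```

For the example, 2 bits are known in the trimmed operands and 3 trailing zeros come from `4 * 2`, so 5 low bits of the product are known: `12 * 14 = 168`, whose low 5 bits are `01000`.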

llvm/test/CodeGen/AMDGPU/sdiv64.ll

(1,522 unchanged lines not shown)
 ; GCN-NEXT: v_xor_b32_e32 v0, v0, v2
 ; GCN-NEXT: v_cvt_f32_u32_e32 v3, v0
 ; GCN-NEXT: v_cvt_f32_u32_e32 v4, v1
 ; GCN-NEXT: v_sub_i32_e32 v5, vcc, 0, v0
 ; GCN-NEXT: v_subb_u32_e32 v6, vcc, 0, v1, vcc
 ; GCN-NEXT: v_mac_f32_e32 v3, 0x4f800000, v4
 ; GCN-NEXT: v_rcp_f32_e32 v3, v3
 ; GCN-NEXT: v_mov_b32_e32 v12, 0
-; GCN-NEXT: s_mov_b32 s4, 0x8000
 ; GCN-NEXT: v_mul_f32_e32 v3, 0x5f7ffffc, v3
 ; GCN-NEXT: v_mul_f32_e32 v4, 0x2f800000, v3
 ; GCN-NEXT: v_trunc_f32_e32 v4, v4
 ; GCN-NEXT: v_mac_f32_e32 v3, 0xcf800000, v4
 ; GCN-NEXT: v_cvt_u32_f32_e32 v3, v3
 ; GCN-NEXT: v_cvt_u32_f32_e32 v4, v4
 ; GCN-NEXT: v_mul_hi_u32 v7, v5, v3
 ; GCN-NEXT: v_mul_lo_u32 v8, v5, v4
(33 unchanged lines not shown)
 ; GCN-NEXT: v_addc_u32_e32 v10, vcc, 0, v11, vcc
 ; GCN-NEXT: v_mul_lo_u32 v6, v4, v6
 ; GCN-NEXT: v_add_i32_e32 v5, vcc, v9, v5
 ; GCN-NEXT: v_addc_u32_e32 v5, vcc, v10, v8, vcc
 ; GCN-NEXT: v_addc_u32_e32 v7, vcc, v7, v12, vcc
 ; GCN-NEXT: v_add_i32_e32 v5, vcc, v5, v6
 ; GCN-NEXT: v_addc_u32_e32 v6, vcc, 0, v7, vcc
 ; GCN-NEXT: v_add_i32_e32 v3, vcc, v3, v5
-; GCN-NEXT: v_addc_u32_e32 v4, vcc, v4, v6, vcc
-; GCN-NEXT: v_lshrrev_b32_e32 v5, 17, v4
-; GCN-NEXT: v_lshlrev_b32_e32 v4, 15, v4
+; GCN-NEXT: v_addc_u32_e32 v3, vcc, v4, v6, vcc
 ; GCN-NEXT: v_lshrrev_b32_e32 v3, 17, v3
-; GCN-NEXT: v_add_i32_e32 v3, vcc, v3, v4
-; GCN-NEXT: v_addc_u32_e32 v3, vcc, 0, v5, vcc
 ; GCN-NEXT: v_mul_lo_u32 v4, v1, v3
 ; GCN-NEXT: v_mul_hi_u32 v5, v0, v3
 ; GCN-NEXT: v_add_i32_e32 v4, vcc, v5, v4
 ; GCN-NEXT: v_mul_lo_u32 v5, v0, v3
 ; GCN-NEXT: v_sub_i32_e32 v6, vcc, 0, v4
-; GCN-NEXT: v_sub_i32_e32 v5, vcc, s4, v5
+; GCN-NEXT: v_sub_i32_e32 v5, vcc, 0x8000, v5
 ; GCN-NEXT: v_subb_u32_e64 v6, s[4:5], v6, v1, vcc
 ; GCN-NEXT: v_sub_i32_e64 v7, s[4:5], v5, v0
 ; GCN-NEXT: v_subbrev_u32_e64 v6, s[4:5], 0, v6, s[4:5]
 ; GCN-NEXT: v_cmp_ge_u32_e64 s[4:5], v6, v1
 ; GCN-NEXT: v_cndmask_b32_e64 v8, 0, -1, s[4:5]
 ; GCN-NEXT: v_cmp_ge_u32_e64 s[4:5], v7, v0
 ; GCN-NEXT: v_cndmask_b32_e64 v7, 0, -1, s[4:5]
 ; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], v6, v1
(459 unchanged lines not shown)

llvm/test/CodeGen/AMDGPU/srem64.ll

(1,695 unchanged lines not shown)
 ; GCN-NEXT: v_xor_b32_e32 v0, v0, v2
 ; GCN-NEXT: v_cvt_f32_u32_e32 v2, v0
 ; GCN-NEXT: v_cvt_f32_u32_e32 v3, v1
 ; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v0
 ; GCN-NEXT: v_subb_u32_e32 v5, vcc, 0, v1, vcc
 ; GCN-NEXT: v_mac_f32_e32 v2, 0x4f800000, v3
 ; GCN-NEXT: v_rcp_f32_e32 v2, v2
 ; GCN-NEXT: v_mov_b32_e32 v11, 0
-; GCN-NEXT: s_mov_b32 s4, 0x8000
 ; GCN-NEXT: v_mul_f32_e32 v2, 0x5f7ffffc, v2
 ; GCN-NEXT: v_mul_f32_e32 v3, 0x2f800000, v2
 ; GCN-NEXT: v_trunc_f32_e32 v3, v3
 ; GCN-NEXT: v_mac_f32_e32 v2, 0xcf800000, v3
 ; GCN-NEXT: v_cvt_u32_f32_e32 v2, v2
 ; GCN-NEXT: v_cvt_u32_f32_e32 v3, v3
 ; GCN-NEXT: v_mul_hi_u32 v6, v4, v2
 ; GCN-NEXT: v_mul_lo_u32 v7, v4, v3
(33 unchanged lines not shown)
 ; GCN-NEXT: v_addc_u32_e32 v9, vcc, 0, v10, vcc
 ; GCN-NEXT: v_mul_lo_u32 v5, v3, v5
 ; GCN-NEXT: v_add_i32_e32 v4, vcc, v8, v4
 ; GCN-NEXT: v_addc_u32_e32 v4, vcc, v9, v7, vcc
 ; GCN-NEXT: v_addc_u32_e32 v6, vcc, v6, v11, vcc
 ; GCN-NEXT: v_add_i32_e32 v4, vcc, v4, v5
 ; GCN-NEXT: v_addc_u32_e32 v5, vcc, 0, v6, vcc
 ; GCN-NEXT: v_add_i32_e32 v2, vcc, v2, v4
-; GCN-NEXT: v_addc_u32_e32 v3, vcc, v3, v5, vcc
-; GCN-NEXT: v_lshrrev_b32_e32 v4, 17, v3
-; GCN-NEXT: v_lshlrev_b32_e32 v3, 15, v3
+; GCN-NEXT: v_addc_u32_e32 v2, vcc, v3, v5, vcc
 ; GCN-NEXT: v_lshrrev_b32_e32 v2, 17, v2
-; GCN-NEXT: v_add_i32_e32 v2, vcc, v2, v3
-; GCN-NEXT: v_addc_u32_e32 v2, vcc, 0, v4, vcc
 ; GCN-NEXT: v_mul_lo_u32 v3, v1, v2
 ; GCN-NEXT: v_mul_hi_u32 v4, v0, v2
 ; GCN-NEXT: v_mul_lo_u32 v2, v0, v2
 ; GCN-NEXT: v_add_i32_e32 v3, vcc, v4, v3
 ; GCN-NEXT: v_sub_i32_e32 v4, vcc, 0, v3
-; GCN-NEXT: v_sub_i32_e32 v2, vcc, s4, v2
+; GCN-NEXT: v_sub_i32_e32 v2, vcc, 0x8000, v2
 ; GCN-NEXT: v_subb_u32_e64 v4, s[4:5], v4, v1, vcc
 ; GCN-NEXT: v_sub_i32_e64 v5, s[4:5], v2, v0
 ; GCN-NEXT: v_subbrev_u32_e64 v6, s[6:7], 0, v4, s[4:5]
 ; GCN-NEXT: v_cmp_ge_u32_e64 s[6:7], v6, v1
 ; GCN-NEXT: v_cndmask_b32_e64 v7, 0, -1, s[6:7]
 ; GCN-NEXT: v_cmp_ge_u32_e64 s[6:7], v5, v0
 ; GCN-NEXT: v_cndmask_b32_e64 v8, 0, -1, s[6:7]
 ; GCN-NEXT: v_cmp_eq_u32_e64 s[6:7], v6, v1
(480 unchanged lines not shown)

llvm/test/Transforms/InstCombine/icmp-mul.ll

(678 unchanged lines not shown)
 ;
   %B13 = mul nsw i32 %arg, -65536
   %C10 = icmp ne i32 mul (i32 or (i32 zext (i1 icmp eq (i32* @g, i32* null) to i32), i32 65537), i32 -65536), %B13
   ret i1 %C10
 }
 
 define i1 @mul_of_bool(i32 %x, i8 %y) {
 ; CHECK-LABEL: @mul_of_bool(
-; CHECK-NEXT: [[B:%.*]] = and i32 [[X:%.*]], 1
-; CHECK-NEXT: [[Z:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[B]], [[Z]]
-; CHECK-NEXT: [[R:%.*]] = icmp ugt i32 [[M]], 255
-; CHECK-NEXT: ret i1 [[R]]
+; CHECK-NEXT: ret i1 false
 ;
   %b = and i32 %x, 1
   %z = zext i8 %y to i32
   %m = mul i32 %b, %z
   %r = icmp ugt i32 %m, 255
   ret i1 %r
 }
 
 define i1 @mul_of_bool_commute(i32 %x, i32 %y) {
 ; CHECK-LABEL: @mul_of_bool_commute(
-; CHECK-NEXT: [[X1:%.*]] = and i32 [[X:%.*]], 1
-; CHECK-NEXT: [[Y8:%.*]] = and i32 [[Y:%.*]], 255
-; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[Y8]], [[X1]]
-; CHECK-NEXT: [[R:%.*]] = icmp ugt i32 [[M]], 255
-; CHECK-NEXT: ret i1 [[R]]
+; CHECK-NEXT: ret i1 false
 ;
   %x1 = and i32 %x, 1
   %y8 = and i32 %y, 255
   %m = mul i32 %y8, %x1
   %r = icmp ugt i32 %m, 255
   ret i1 %r
 }
 
 define i1 @mul_of_bools(i32 %x, i32 %y) {
 ; CHECK-LABEL: @mul_of_bools(
-; CHECK-NEXT: [[X1:%.*]] = and i32 [[X:%.*]], 1
-; CHECK-NEXT: [[Y1:%.*]] = and i32 [[Y:%.*]], 1
-; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[X1]], [[Y1]]
-; CHECK-NEXT: [[R:%.*]] = icmp ult i32 [[M]], 2
-; CHECK-NEXT: ret i1 [[R]]
+; CHECK-NEXT: ret i1 true
 ;
   %x1 = and i32 %x, 1
   %y1 = and i32 %y, 1
   %m = mul i32 %x1, %y1
   %r = icmp ult i32 %m, 2
   ret i1 %r
 }
 
+; negative test - not a mask of low bit
+
 define i1 @not_mul_of_bool(i32 %x, i8 %y) {
 ; CHECK-LABEL: @not_mul_of_bool(
 ; CHECK-NEXT: [[Q:%.*]] = and i32 [[X:%.*]], 3
 ; CHECK-NEXT: [[Z:%.*]] = zext i8 [[Y:%.*]] to i32
 ; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[Q]], [[Z]]
 ; CHECK-NEXT: [[R:%.*]] = icmp ugt i32 [[M]], 255
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %q = and i32 %x, 3
   %z = zext i8 %y to i32
   %m = mul i32 %q, %z
   %r = icmp ugt i32 %m, 255
   ret i1 %r
 }
 
+; negative test - not a single low bit
+
 define i1 @not_mul_of_bool_commute(i32 %x, i32 %y) {
 ; CHECK-LABEL: @not_mul_of_bool_commute(
 ; CHECK-NEXT: [[X30:%.*]] = lshr i32 [[X:%.*]], 30
 ; CHECK-NEXT: [[Y8:%.*]] = and i32 [[Y:%.*]], 255
 ; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[Y8]], [[X30]]
 ; CHECK-NEXT: [[R:%.*]] = icmp ugt i32 [[M]], 255
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %x30 = lshr i32 %x, 30
   %y8 = and i32 %y, 255
   %m = mul i32 %y8, %x30
   %r = icmp ugt i32 %m, 255
   ret i1 %r
 }
 
+; negative test - no leading zeros for 's'
+; TODO: If analysis was generalized for sign bits, we could reduce this to false.
+
 define i1 @mul_of_bool_no_lz_other_op(i32 %x, i8 %y) {
 ; CHECK-LABEL: @mul_of_bool_no_lz_other_op(
 ; CHECK-NEXT: [[B:%.*]] = and i32 [[X:%.*]], 1
 ; CHECK-NEXT: [[S:%.*]] = sext i8 [[Y:%.*]] to i32
 ; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[B]], [[S]]
 ; CHECK-NEXT: [[R:%.*]] = icmp sgt i32 [[M]], 127
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %b = and i32 %x, 1
   %s = sext i8 %y to i32
   %m = mul nuw nsw i32 %b, %s
   %r = icmp sgt i32 %m, 127
   ret i1 %r
 }
 
+; high and low bits are known 0
+
 define i1 @mul_of_pow2(i32 %x, i8 %y) {
 ; CHECK-LABEL: @mul_of_pow2(
-; CHECK-NEXT: [[B:%.*]] = and i32 [[X:%.*]], 2
-; CHECK-NEXT: [[Z:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[B]], [[Z]]
-; CHECK-NEXT: [[R:%.*]] = icmp ugt i32 [[M]], 510
-; CHECK-NEXT: ret i1 [[R]]
+; CHECK-NEXT: ret i1 false
 ;
   %b = and i32 %x, 2
   %z = zext i8 %y to i32
   %m = mul i32 %b, %z
   %r = icmp ugt i32 %m, 510
   ret i1 %r
 }
 
+; high and low bits are known 0
+
 define i1 @mul_of_pow2_commute(i32 %x, i32 %y) {
 ; CHECK-LABEL: @mul_of_pow2_commute(
-; CHECK-NEXT: [[X4:%.*]] = and i32 [[X:%.*]], 4
-; CHECK-NEXT: [[Y8:%.*]] = and i32 [[Y:%.*]], 255
-; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[Y8]], [[X4]]
-; CHECK-NEXT: [[R:%.*]] = icmp ugt i32 [[M]], 1020
-; CHECK-NEXT: ret i1 [[R]]
+; CHECK-NEXT: ret i1 false
 ;
   %x4 = and i32 %x, 4
   %y8 = and i32 %y, 255
   %m = mul i32 %y8, %x4
   %r = icmp ugt i32 %m, 1020
   ret i1 %r
 }
 
+; only bit 7 can be set by the multiply
+
 define i32 @mul_of_pow2s(i32 %x, i32 %y) {
 ; CHECK-LABEL: @mul_of_pow2s(
-; CHECK-NEXT: [[X8:%.*]] = and i32 [[X:%.*]], 8
-; CHECK-NEXT: [[Y16:%.*]] = and i32 [[Y:%.*]], 16
-; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[X8]], [[Y16]]
-; CHECK-NEXT: [[BIT7:%.*]] = or i32 [[M]], 128
-; CHECK-NEXT: ret i32 [[BIT7]]
+; CHECK-NEXT: ret i32 128
 ;
   %x8 = and i32 %x, 8
   %y16 = and i32 %y, 16
   %m = mul i32 %x8, %y16
   %bit7 = or i32 %m, 128
   ret i32 %bit7
 }
 
+; negative test - 6 * 255 = 1530 (but constant range analysis can get this)
+
 define i1 @not_mul_of_pow2(i32 %x, i8 %y) {
 ; CHECK-LABEL: @not_mul_of_pow2(
 ; CHECK-NEXT: [[Q:%.*]] = and i32 [[X:%.*]], 6
 ; CHECK-NEXT: [[Z:%.*]] = zext i8 [[Y:%.*]] to i32
 ; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[Q]], [[Z]]
 ; CHECK-NEXT: [[R:%.*]] = icmp ugt i32 [[M]], 1530
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %q = and i32 %x, 6
   %z = zext i8 %y to i32
   %m = mul i32 %q, %z
   %r = icmp ugt i32 %m, 1530
   ret i1 %r
 }
 
+; negative test - 12 * 255 = 3060 (but constant range analysis can get this)
+
 define i1 @not_mul_of_pow2_commute(i32 %x, i32 %y) {
 ; CHECK-LABEL: @not_mul_of_pow2_commute(
 ; CHECK-NEXT: [[X30:%.*]] = and i32 [[X:%.*]], 12
 ; CHECK-NEXT: [[Y8:%.*]] = and i32 [[Y:%.*]], 255
 ; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[Y8]], [[X30]]
 ; CHECK-NEXT: [[R:%.*]] = icmp ugt i32 [[M]], 3060
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %x30 = and i32 %x, 12
   %y8 = and i32 %y, 255
   %m = mul i32 %y8, %x30
   %r = icmp ugt i32 %m, 3060
   ret i1 %r
 }
 
+; negative test - no leading zeros for 's'
+; TODO: If analysis was generalized for sign bits, we could reduce this to false.
+
 define i1 @mul_of_pow2_no_lz_other_op(i32 %x, i8 %y) {
 ; CHECK-LABEL: @mul_of_pow2_no_lz_other_op(
 ; CHECK-NEXT: [[B:%.*]] = and i32 [[X:%.*]], 2
 ; CHECK-NEXT: [[S:%.*]] = sext i8 [[Y:%.*]] to i32
 ; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i32 [[B]], [[S]]
 ; CHECK-NEXT: [[R:%.*]] = icmp sgt i32 [[M]], 254
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %b = and i32 %x, 2
   %s = sext i8 %y to i32
   %m = mul nuw nsw i32 %b, %s
   %r = icmp sgt i32 %m, 254
   ret i1 %r
 }
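
The new folds in these tests follow from combining the leading-zero estimate with the trailing-zero count. A hypothetical, simplified model tracks only guaranteed leading and trailing zeros of 32-bit values (`ZeroBounds`, `fromMask`, `mulBounds`, and `possibleBits` are illustrative names, not LLVM APIs). With it, the bits that may still be set in each product can be computed, and they give an upper bound on its value. `fromMask(255)` also stands in for `zext i8 %y to i32`, since both have exactly 24 known leading zeros.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical summary of a 32-bit value: guaranteed leading/trailing zeros,
// plus whether exactly one bit may be set (countMaxPopulation() == 1).
struct ZeroBounds {
  unsigned LeadZ;
  unsigned TrailZ;
  bool OnePossibleBit;
};

unsigned clz32(uint32_t V) {
  unsigned N = 0;
  for (unsigned I = 32; I > 0 && !((V >> (I - 1)) & 1u); --I)
    ++N;
  return N;
}

unsigned ctz32(uint32_t V) {
  unsigned N = 0;
  while (N < 32 && !((V >> N) & 1u))
    ++N;
  return N;
}

unsigned popcount32(uint32_t V) {
  unsigned N = 0;
  for (; V != 0; V &= V - 1)
    ++N;
  return N;
}

// Summary of `and i32 %x, C` for a nonzero constant mask C.
ZeroBounds fromMask(uint32_t C) {
  return ZeroBounds{clz32(C), ctz32(C), popcount32(C) == 1};
}

// Leading zeros use this patch's formula (UseBonus = false gives the old
// behavior); trailing zeros of a product add up. The result's
// OnePossibleBit is conservatively false.
ZeroBounds mulBounds(const ZeroBounds &A, const ZeroBounds &B, bool UseBonus) {
  unsigned Bonus =
      (UseBonus && (A.OnePossibleBit || B.OnePossibleBit)) ? 1u : 0u;
  unsigned LeadZ = std::max(A.LeadZ + B.LeadZ + Bonus, 32u) - 32u;
  unsigned TrailZ = std::min(A.TrailZ + B.TrailZ, 32u);
  return ZeroBounds{LeadZ, TrailZ, false};
}

// Bits not known to be zero; as an unsigned value this is also an upper
// bound on the product.
uint32_t possibleBits(const ZeroBounds &R) {
  if (R.LeadZ + R.TrailZ >= 32)
    return 0;
  uint64_t BelowLead = (uint64_t{1} << (32 - R.LeadZ)) - 1;
  uint64_t TrailMask = (uint64_t{1} << R.TrailZ) - 1;
  return static_cast<uint32_t>(BelowLead & ~TrailMask);
}
```

Under this model, `@mul_of_bool` can reach at most 255 (511 without the bonus bit), so `icmp ugt ..., 255` folds to false. `@mul_of_pow2` and `@mul_of_pow2_commute` top out at exactly 510 and 1020. `@mul_of_pow2s` can only set bit 7 (mask 128). For `@not_mul_of_pow2`, the bound is 2046, so 1530 cannot be proven with known bits alone.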
