
Details

Reviewers

efriedma
arsenm
hfinkel
nemanjai
RKSimon
xbolva00

Commits

rG2afe86411847: [DAG] Add SimplifyDemandedBits support for BSWAP

Summary

This causes a couple of test changes that I need some target-specific advice on:

aarch64 - we're losing the rev16/rev32 canonicalization because the zext is being simplified to aext, so the combine can no longer confirm that the upper bits are zero. I can try adding a zext(bswap(trunc(x))) variant if that'd be useful as an alternative?

amdgpu - I think these are all benign.
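The aarch64 point rests on a simple bit-level identity: when the upper 16 bits of a 32-bit value are zero, its byte swap has zero low 16 bits, so a logical shift right by 16 and a rotate right by 16 give the same result. Once the zext is relaxed to an any-extend, the upper bits are unknown and the two forms can differ. A minimal sketch in plain C++, where `bswap32`, `rotr32`, `srlForm` and `rotrForm` are illustrative helpers and not LLVM APIs:

```cpp
#include <cstdint>

// Illustrative helper (not an LLVM API): reverse the four bytes of x.
constexpr uint32_t bswap32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
         ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Illustrative helper: rotate x right by n bits (n taken modulo 32).
constexpr uint32_t rotr32(uint32_t x, unsigned n) {
  return (n & 31u) == 0 ? x
                        : (x >> (n & 31u)) | (x << (32u - (n & 31u)));
}

// srl(bswap(x), 16): the pattern written in the IR test.
constexpr uint32_t srlForm(uint32_t x) { return bswap32(x) >> 16; }

// rotr(bswap(x), 16): the canonical form described in the test comment.
constexpr uint32_t rotrForm(uint32_t x) { return rotr32(bswap32(x), 16); }

// Upper 16 bits are zero (as after zext from i16): both forms agree.
static_assert(srlForm(0x0000ABCDu) == rotrForm(0x0000ABCDu), "forms agree");
static_assert(srlForm(0x0000ABCDu) == 0x0000CDABu, "bytes of low half swapped");

// Upper 16 bits are arbitrary (as after anyext): the forms differ.
static_assert(srlForm(0x1234ABCDu) != rotrForm(0x1234ABCDu), "forms differ");
```

The last assertion is the reason the combine needs proof that the upper bits are zero before rewriting the shift as a rotate.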

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

RKSimon created this revision. Feb 10 2019, 11:34 AM

Herald added a project: Restricted Project. Feb 10 2019, 11:34 AM

Herald added subscribers: jsji, kristof.beyls, tpr and 5 others.

and+rev16/rev32 isn't really any better than rev+lsr; that's fine as far as it goes. But please make sure we have coverage for cases where the zero-extension is free (e.g. the operand is a load, or a zeroext value, or the result of i32 arithmetic).

Hmm, this is one of those cases where it'd be awesome to have a Godbolt for the tests.

AMDGPU/bitreverse.ll looks good to me. AMDGPU/bswap.ll is _probably_ fine but I'd feel more comfortable seeing the assembly in full.

arsenm added inline comments. Feb 22 2019, 7:06 AM

test/CodeGen/AMDGPU/bitreverse.ll
Lines 117–118 (on Diff #186158): This increased the instruction count?

RKSimon mentioned this in rL354863: [AMDGPU] Regenerate bswap/bitreverse tests. Feb 26 2019, 3:02 AM

RKSimon mentioned this in rG2ccc120d191f: [AMDGPU] Regenerate bswap/bitreverse tests.

rebase

RKSimon mentioned this in rL354869: [AArch64] Add 'free' zext bswap tests. Feb 26 2019, 4:05 AM

RKSimon mentioned this in rGe42be1eae230: [AArch64] Add 'free' zext bswap tests.

RKSimon mentioned this in rL354872: [AArch64] Add arithmetic zext bswap tests. Feb 26 2019, 5:21 AM

RKSimon mentioned this in rGd4a406e4998d: [AArch64] Add arithmetic zext bswap tests.

updated with the new aarch64 tests

the zext(load()) test fails to recognise the upper bits are zero as it becomes anyext(load())
the arithmetic ubfx tests do still exercise the rotr(bswap()) combine

just need to address the powerpc regression now

LGTM

This revision is now accepted and ready to land. Feb 26 2019, 7:40 AM

@nemanjai Do you have any comments on the powerpc issue? You created the original test case at rL347288.

efriedma added inline comments. Feb 26 2019, 1:27 PM

test/CodeGen/AArch64/arm64-rev.ll
Line 102 (on Diff #188350): This is a regression... but it's sort of orthogonal to this patch, I guess. It's okay for now.

RKSimon marked an inline comment as done. Feb 27 2019, 3:13 AM

RKSimon added inline comments.

test/CodeGen/AArch64/arm64-rev.ll
Line 102 (on Diff #188350): I've raised https://bugs.llvm.org/show_bug.cgi?id=40881 to cover this; it's a common issue as we make SimplifyDemandedBits more capable.

RKSimon mentioned this in D58070: [DAGCombine] Prune unnused nodes. Feb 28 2019, 9:07 AM

RKSimon mentioned this in rL355532: [PowerPC] Use real pointers instead of undef. Mar 6 2019, 10:48 AM

RKSimon mentioned this in rG417f8c5be4d0: [PowerPC] Use real pointers instead of undef. Mar 6 2019, 10:52 AM

RKSimon mentioned this in rL355933: Regenerate sign_extend.ll test. Mar 12 2019, 9:00 AM

RKSimon mentioned this in rGa6013c028637: Regenerate sign_extend.ll test.

rebase

RKSimon mentioned this in rL360534: [DAG] Add SimplifyDemandedBits support for BITREVERSE. May 11 2019, 1:53 PM

RKSimon mentioned this in rG605a840747be: [DAG] Add SimplifyDemandedBits support for BITREVERSE. May 11 2019, 1:57 PM

rebased now that BITREVERSE support has been committed in rL360534

sidorovd mentioned this in rG6160b9742a29: [DAG] Add SimplifyDemandedBits support for BITREVERSE. May 30 2019, 9:11 AM

sidorovd mentioned this in rG124f15b85328: [DAG] Add SimplifyDemandedBits support for BITREVERSE. May 30 2019, 10:12 AM

spatel mentioned this in rGb39009bf1dc9: [DAGCombiner] improve readability. Dec 12 2019, 10:23 AM

spatel mentioned this in rG8963332c3327: [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc. Dec 12 2019, 3:55 PM

spatel mentioned this in rG2f0c7fd2dbd0: [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc (2nd try). Dec 13 2019, 11:19 AM

Commandeering to rebase.

Herald added subscribers: steven.zhang, wuzish, mcrosier. Dec 14 2019, 7:35 AM

Patch updated:
No code changes, but the PPC regression disappears after rG2f0c7fd2dbd0, so I think this is safe to commit now.

Herald added a subscriber: hiraditya. Dec 14 2019, 7:37 AM

Seems fine.

Please update commit message.

spatel edited the summary of this revision. Dec 14 2019, 8:11 AM

Closed by commit rG2afe86411847: [DAG] Add SimplifyDemandedBits support for BSWAP (authored by spatel). Dec 15 2019, 6:00 AM

This revision was automatically updated to reflect the committed changes.

Thanks for completing this @spatel !

Diff 233961

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp


[...] case ISD::BITREVERSE: {
     APInt DemandedSrcBits = DemandedBits.reverseBits();
     if (SimplifyDemandedBits(Src, DemandedSrcBits, DemandedElts, Known2, TLO,
                              Depth + 1))
       return true;
     Known.One = Known2.One.reverseBits();
     Known.Zero = Known2.Zero.reverseBits();
     break;
   }
+  case ISD::BSWAP: {
+    SDValue Src = Op.getOperand(0);
+    APInt DemandedSrcBits = DemandedBits.byteSwap();
+    if (SimplifyDemandedBits(Src, DemandedSrcBits, DemandedElts, Known2, TLO,
+                             Depth + 1))
+      return true;
+    Known.One = Known2.One.byteSwap();
+    Known.Zero = Known2.Zero.byteSwap();
+    break;
+  }
   case ISD::SIGN_EXTEND_INREG: {
     SDValue Op0 = Op.getOperand(0);
     EVT ExVT = cast<VTSDNode>(Op.getOperand(1))->getVT();
     unsigned ExVTBits = ExVT.getScalarSizeInBits();

     // If we only care about the highest bit, don't bother shifting right.
     if (DemandedBits.isSignMask()) {
       unsigned NumSignBits = TLO.DAG.ComputeNumSignBits(Op0);
[...]
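The new BSWAP case works because a byte swap only permutes bits: the source bits that matter are the byte-swapped demanded mask, and the known bits of the result are the byte-swapped known bits of the source. The sketch below illustrates this with plain 32-bit integers rather than LLVM's APInt and KnownBits; `bswap32`, `Known32`, `demandedSrcBits`, `knownThroughBswap` and `demandedBitsAgree` are hypothetical names introduced only for this example.

```cpp
#include <cstdint>

// Illustrative helper (not an LLVM API): reverse the four bytes of x.
constexpr uint32_t bswap32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
         ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Hypothetical 32-bit stand-in for LLVM's KnownBits.
struct Known32 {
  uint32_t Zero; // bits known to be 0
  uint32_t One;  // bits known to be 1
};

// Demanded bits of the bswap result correspond to these bits of its source.
constexpr uint32_t demandedSrcBits(uint32_t DemandedBits) {
  return bswap32(DemandedBits);
}

// Known bits of the source become known bits of the result, byte-swapped.
constexpr Known32 knownThroughBswap(Known32 Src) {
  return Known32{bswap32(Src.Zero), bswap32(Src.One)};
}

// Clearing every non-demanded source bit leaves the demanded result bits
// unchanged, which is what lets the source be simplified.
constexpr bool demandedBitsAgree(uint32_t X, uint32_t DemandedBits) {
  return (bswap32(X) & DemandedBits) ==
         (bswap32(X & demandedSrcBits(DemandedBits)) & DemandedBits);
}

static_assert(demandedSrcBits(0xFFFF0000u) == 0x0000FFFFu, "mask is swapped");
static_assert(demandedBitsAgree(0x12345678u, 0xFFFF0000u), "bits agree");
static_assert(knownThroughBswap(Known32{0u, 0xFF000000u}).One == 0x000000FFu,
              "known ones move with their byte");
```

This is only a model of the reasoning; the real code also threads DemandedElts and the recursion depth through SimplifyDemandedBits, as the diff shows.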

llvm/test/CodeGen/AArch64/arm64-rev.ll

[...] entry:
   ret i64 %0
 }

 ; Canonicalize (srl (bswap x), 16) to (rotr (bswap x), 16) if the high 16-bits
 ; of %a are zero. This optimizes rev + lsr 16 to rev16.
 define i32 @test_rev_w_srl16(i16 %a) {
 ; CHECK-LABEL: test_rev_w_srl16:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: and w8, w0, #0xffff
-; CHECK-NEXT: rev16 w0, w8
+; CHECK-NEXT: rev w8, w0
+; CHECK-NEXT: lsr w0, w8, #16
 ; CHECK-NEXT: ret
 ;
 ; FALLBACK-LABEL: test_rev_w_srl16:
 ; FALLBACK: // %bb.0: // %entry
 ; FALLBACK-NEXT: and w8, w0, #0xffff
 ; FALLBACK-NEXT: rev w8, w8
 ; FALLBACK-NEXT: lsr w0, w8, #16
 ; FALLBACK-NEXT: ret
 entry:
   %0 = zext i16 %a to i32
   %1 = tail call i32 @llvm.bswap.i32(i32 %0)
   %2 = lshr i32 %1, 16
   ret i32 %2
 }

 define i32 @test_rev_w_srl16_load(i16 *%a) {
 ; CHECK-LABEL: test_rev_w_srl16_load:
 ; CHECK: // %bb.0: // %entry
 ; CHECK-NEXT: ldrh w8, [x0]
-; CHECK-NEXT: rev16 w0, w8
+; CHECK-NEXT: rev w8, w8
+; CHECK-NEXT: lsr w0, w8, #16
 ; CHECK-NEXT: ret
 ;
 ; FALLBACK-LABEL: test_rev_w_srl16_load:
 ; FALLBACK: // %bb.0: // %entry
 ; FALLBACK-NEXT: ldrh w8, [x0]
 ; FALLBACK-NEXT: rev w8, w8
 ; FALLBACK-NEXT: lsr w0, w8, #16
 ; FALLBACK-NEXT: ret
[...] entry:
   ret i32 %4
 }

 ; Canonicalize (srl (bswap x), 32) to (rotr (bswap x), 32) if the high 32-bits
 ; of %a are zero. This optimizes rev + lsr 32 to rev32.
 define i64 @test_rev_x_srl32(i32 %a) {
 ; CHECK-LABEL: test_rev_x_srl32:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: mov w8, w0
-; CHECK-NEXT: rev32 x0, x8
+; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT: rev x8, x0
+; CHECK-NEXT: lsr x0, x8, #32
 ; CHECK-NEXT: ret
 ;
 ; FALLBACK-LABEL: test_rev_x_srl32:
 ; FALLBACK: // %bb.0: // %entry
 ; FALLBACK-NEXT: // kill: def $w0 killed $w0 def $x0
 ; FALLBACK-NEXT: ubfx x8, x0, #0, #32
 ; FALLBACK-NEXT: rev x8, x8
 ; FALLBACK-NEXT: lsr x0, x8, #32
 ; FALLBACK-NEXT: ret
 entry:
   %0 = zext i32 %a to i64
   %1 = tail call i64 @llvm.bswap.i64(i64 %0)
   %2 = lshr i64 %1, 32
   ret i64 %2
 }

 define i64 @test_rev_x_srl32_load(i32 *%a) {
 ; CHECK-LABEL: test_rev_x_srl32_load:
 ; CHECK: // %bb.0: // %entry
 ; CHECK-NEXT: ldr w8, [x0]
-; CHECK-NEXT: rev32 x0, x8
+; CHECK-NEXT: rev x8, x8
+; CHECK-NEXT: lsr x0, x8, #32
 ; CHECK-NEXT: ret
 ;
 ; FALLBACK-LABEL: test_rev_x_srl32_load:
 ; FALLBACK: // %bb.0: // %entry
 ; FALLBACK-NEXT: ldr w8, [x0]
 ; FALLBACK-NEXT: rev x8, x8
 ; FALLBACK-NEXT: lsr x0, x8, #32
 ; FALLBACK-NEXT: ret
[...]

llvm/test/CodeGen/AMDGPU/bswap.ll

[...]
 ; SI-NEXT: v_bfi_b32 v0, s4, v0, v1
 ; SI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
 ; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
 ; SI-NEXT: s_setpc_b64 s[30:31]
 ;
 ; VI-LABEL: missing_truncate_promote_bswap:
 ; VI: ; %bb.0: ; %bb
 ; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; VI-NEXT: v_and_b32_e32 v0, 0xffff, v0
 ; VI-NEXT: v_alignbit_b32 v1, v0, v0, 8
 ; VI-NEXT: v_alignbit_b32 v0, v0, v0, 24
 ; VI-NEXT: s_mov_b32 s4, 0xff00ff
 ; VI-NEXT: v_bfi_b32 v0, s4, v0, v1
 ; VI-NEXT: v_cvt_f32_f16_sdwa v0, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
 ; VI-NEXT: s_setpc_b64 s[30:31]
 bb:
   %tmp = trunc i32 %arg to i16
   %tmp1 = call i16 @llvm.bswap.i16(i16 %tmp)
   %tmp2 = bitcast i16 %tmp1 to half
   %tmp3 = fpext half %tmp2 to float
   ret float %tmp3
 }
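One way to see why dropping the `v_and_b32` mask in the VI output looks benign: if the i16 byte swap is carried out as a 32-bit byte swap whose upper half is then read (as the `src0_sel:WORD_1` operand suggests), only the low 16 bits of the input can reach that upper half. The sketch below checks this in plain C++; the helper names are illustrative assumptions, not AMDGPU or LLVM APIs.

```cpp
#include <cstdint>

// Illustrative helper (not an LLVM API): reverse the four bytes of x.
constexpr uint32_t bswap32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
         ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Reference 16-bit byte swap.
constexpr uint16_t bswap16(uint16_t x) {
  return static_cast<uint16_t>((x >> 8) | (x << 8));
}

// i16 bswap done in 32 bits: swap all four bytes, then take the upper half.
constexpr uint16_t promotedBswap16(uint32_t arg) {
  return static_cast<uint16_t>(bswap32(arg) >> 16);
}

// The same computation with the explicit 0xffff mask applied first.
constexpr uint16_t promotedBswap16Masked(uint32_t arg) {
  return static_cast<uint16_t>(bswap32(arg & 0xFFFFu) >> 16);
}

// The upper input bits never reach the selected half, so the mask is redundant.
static_assert(promotedBswap16(0xDEADBEEFu) == bswap16(0xBEEFu), "matches i16");
static_assert(promotedBswap16(0xDEADBEEFu) == promotedBswap16Masked(0xDEADBEEFu),
              "mask has no effect");
static_assert(promotedBswap16(0xDEADBEEFu) == 0xEFBEu, "expected value");
```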

llvm/test/CodeGen/X86/combine-bswap.ll

[...] ; X64-NEXT: retq
   %b = call i32 @llvm.bswap.i32(i32 %a0)
   %c = call i32 @llvm.bswap.i32(i32 %b)
   ret i32 %c
 }

 define i32 @test_demandedbits_bswap(i32 %a0) nounwind {
 ; X86-LABEL: test_demandedbits_bswap:
 ; X86: # %bb.0:
-; X86-NEXT: movl $-16777216, %eax # imm = 0xFF000000
-; X86-NEXT: orl {{[0-9]+}}(%esp), %eax
+; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: bswapl %eax
 ; X86-NEXT: andl $-65536, %eax # imm = 0xFFFF0000
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: test_demandedbits_bswap:
 ; X64: # %bb.0:
 ; X64-NEXT: movl %edi, %eax
-; X64-NEXT: orl $-16777216, %eax # imm = 0xFF000000
 ; X64-NEXT: bswapl %eax
 ; X64-NEXT: andl $-65536, %eax # imm = 0xFFFF0000
 ; X64-NEXT: retq
   %b = or i32 %a0, 4278190080
   %c = call i32 @llvm.bswap.i32(i32 %b)
   %d = and i32 %c, 4294901760
   ret i32 %d
 }
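The X86 test shows the effect directly: the `or` sets only the top byte (4278190080 is 0xFF000000), the byte swap moves that byte to the bottom, and the final `and` with 0xFFFF0000 (4294901760) discards it, so the `or` can be removed. A small C++ check of that reasoning follows; `bswap32`, `withOr` and `withoutOr` are illustrative names, not part of the patch.

```cpp
#include <cstdint>

// Illustrative helper (not an LLVM API): reverse the four bytes of x.
constexpr uint32_t bswap32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
         ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Mirrors the IR of test_demandedbits_bswap: or, bswap, and.
constexpr uint32_t withOr(uint32_t a0) {
  return bswap32(a0 | 0xFF000000u) & 0xFFFF0000u;
}

// The same computation with the `or` dropped, as in the updated CHECK lines.
constexpr uint32_t withoutOr(uint32_t a0) {
  return bswap32(a0) & 0xFFFF0000u;
}

// The bits set by the `or` land in the low byte after the swap and are
// masked off, so both versions produce the same value.
static_assert(withOr(0x12345678u) == withoutOr(0x12345678u), "or is dead");
static_assert(withOr(0x12345678u) == 0x78560000u, "expected value");
static_assert(withOr(0u) == withoutOr(0u), "or is dead for zero input");
```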

This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Add SimplifyDemandedBits support for BSWAP
ClosedPublic
