This is an archive of the discontinued LLVM Phabricator instance.

lib/Target/X86/X86ISelLowering.cpp
37020 ↗	(On Diff #184089)	The operation is similar to what's done here.
37054 ↗	(On Diff #184089)	It's not obvious to me that it would work. The AND opcode itself uses it, so there is no need to redo it here. We also need to know whether masking took place to know what we can accept as the LHS of the SUB opcode.

craig.topper added inline comments. Jan 29 2019, 7:40 PM

lib/Target/X86/X86ISelLowering.cpp
37054 ↗	(On Diff #184089)	I know this (Bits - 1) check is used in other places earlier in this function, but is it valid for i16 SHLD/SHRD? The SHLD/SHRD hardware masks the shift amount to 5 bits on i16/i32 and 6 bits on i64.
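
To make the concern concrete, here is a minimal sketch that is not LLVM code. The helper names combineMask, hardwareMask and masksAgree are hypothetical, and the hardware masks are simply the ones stated in the comment above (5 bits for i16/i32, 6 bits for i64). It shows that the (Bits - 1) mask only coincides with the hardware mask for i32 and i64.

```cpp
// Mask that the combine compares against: (Bits - 1).
constexpr unsigned combineMask(unsigned Bits) { return Bits - 1; }

// Shift-count mask applied by SHLD/SHRD, as described in the review comment:
// 5 bits (31) for 16- and 32-bit operands, 6 bits (63) for 64-bit operands.
constexpr unsigned hardwareMask(unsigned Bits) { return Bits == 64 ? 63u : 31u; }

// True when an AND with (Bits - 1) matches what the hardware already does.
constexpr bool masksAgree(unsigned Bits) {
  return combineMask(Bits) == hardwareMask(Bits);
}

static_assert(masksAgree(32), "i32: 31 == 31");
static_assert(masksAgree(64), "i64: 63 == 63");
static_assert(!masksAgree(16), "i16: 15 != 31, so the masks differ");
```

For i16 the check looks for a mask of 15 while the instruction masks with 31, which is the mismatch being asked about.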

We might be better off just working on getting this code moved into DAGCombine (https://bugs.llvm.org/show_bug.cgi?id=40081) and using generic funnel shifts.

@RKSimon That sounds reasonable. The main motivation for this patch is to fix a regression introduced by D57367, so would it be possible to get at least a concept ack on it?

lib/Target/X86/X86ISelLowering.cpp
37054 ↗	(On Diff #184089)	I do not know if this is valid for i16. If it isn't, this code is already bogus. Would you have a test case I can rely on for this?

Herald added a project: Restricted Project. Jan 31 2019, 5:23 PM

craig.topper added inline comments. Feb 7 2019, 3:40 PM

lib/Target/X86/X86ISelLowering.cpp
37054 ↗	(On Diff #184089)	@RKSimon what are your thoughts on the existing uses of (Bits - 1) in this function for the i16 case? Those seem wrong to me.

RKSimon added inline comments. Feb 8 2019, 1:28 AM

lib/Target/X86/X86ISelLowering.cpp
37054 ↗	(On Diff #184089)	Yes, I think it's wrong - we're most likely being saved by the fact that it's so tricky to create i16 double shifts from code (PR35155). As I said, I'd much prefer to kill all this code and move it to FSHL/FSHR in DAGCombine (PR40081) - I'll take a look.

RKSimon mentioned this in rL353626: [X86] CombineOr - fold to generic funnel shifts. Feb 9 2019, 12:34 PM

RKSimon mentioned this in rG6bf7b30b10f5: [X86] CombineOr - fold to generic funnel shifts.

RKSimon added inline comments. Feb 9 2019, 12:52 PM

lib/Target/X86/X86ISelLowering.cpp
37054 ↗	(On Diff #184089)	Unravelling all of this code is proving trickier than I hoped due to various custom/legalization issues. I've committed rL353626 which folds to FSHL/FSHR and a fix for the i16 issue.

@deadalnix Please can you rebase this?

rebase on top of rL353626

Harbormaster completed remote builds in B28396: Diff 187855. Feb 21 2019, 1:32 PM

LGTM

This revision is now accepted and ready to land. Feb 22 2019, 1:39 AM

Rebase and add test cases that do not depend on D57367

Harbormaster completed remote builds in B28707: Diff 189022. Mar 1 2019, 6:35 PM

Closed by commit rL355260: [X86] Improve use of SHLD/SHRD (authored by deadalnix). Mar 1 2019, 6:43 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp	6 lines
llvm/trunk/test/CodeGen/X86/shift-double-x86_64.ll	10 lines
llvm/trunk/test/CodeGen/X86/shift-double.ll	24 lines

Diff 189023

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

static SDValue combineOr(SDNode *N, SelectionDAG &DAG,
[...]
   // OR( SHL( X, C ), SRL( SRL( Y, 1 ), XOR( C, 31 ) ) ) -> FSHL( X, Y, C )
   // OR( SRL( X, C ), SHL( SHL( Y, 1 ), XOR( C, 31 ) ) ) -> FSHR( Y, X, C )
   // OR( SHL( X, AND( C, 31 ) ), SRL( Y, AND( 0 - C, 31 ) ) ) -> FSHL( X, Y, C )
   // OR( SRL( X, AND( C, 31 ) ), SHL( Y, AND( 0 - C, 31 ) ) ) -> FSHR( Y, X, C )
   if (ShAmt1.getOpcode() == ISD::SUB) {
     SDValue Sum = ShAmt1.getOperand(0);
     if (auto *SumC = dyn_cast<ConstantSDNode>(Sum)) {
       SDValue ShAmt1Op1 = ShAmt1.getOperand(1);
+      if (ShAmt1Op1.getOpcode() == ISD::AND &&
+          isa<ConstantSDNode>(ShAmt1Op1.getOperand(1)) &&
+          ShAmt1Op1.getConstantOperandVal(1) == (Bits - 1)) {
+        ShMsk1 = ShAmt1Op1;
+        ShAmt1Op1 = ShAmt1Op1.getOperand(0);
+      }
       if (ShAmt1Op1.getOpcode() == ISD::TRUNCATE)
         ShAmt1Op1 = ShAmt1Op1.getOperand(0);
       if ((SumC->getAPIntValue() == Bits ||
            (SumC->getAPIntValue() == 0 && ShMsk1)) &&
           ShAmt1Op1 == ShAmt0)
         return GetFunnelShift(Op0, Op1, ShAmt0);
     }
   } else if (auto *ShAmt1C = dyn_cast<ConstantSDNode>(ShAmt1)) {
[...]
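
As a rough illustration of why the masked form can be folded, the sketch below models the 32-bit case in plain C++. It is not LLVM code: fshl32, maskedPattern and patternMatchesFunnelShift are hypothetical names, and the pattern is the shape exercised by test18 (shift left by C & 31, shift right by 32 - (C & 31)). The comparison skips a masked amount of 0, where the right shift would be by 32 and is therefore undefined in C++ (and poison in LLVM IR).

```cpp
#include <cstdint>

// Reference semantics of a 32-bit funnel shift left: concatenate Hi:Lo,
// shift left by Amt modulo 32, and keep the high 32 bits.
constexpr uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  Amt &= 31;
  return Amt == 0 ? Hi : (Hi << Amt) | (Lo >> (32 - Amt));
}

// Hand-written form of the matched shape:
// OR( SHL( Hi, AND( C, 31 ) ), SRL( Lo, SUB( 32, AND( C, 31 ) ) ) ).
// Callers must ensure (C & 31) != 0, otherwise the right shift is by 32.
constexpr uint32_t maskedPattern(uint32_t Hi, uint32_t Lo, uint32_t C) {
  uint32_t Masked = C & 31;
  return (Hi << Masked) | (Lo >> (32 - Masked));
}

// True if both forms agree for every shift amount with a defined result.
constexpr bool patternMatchesFunnelShift(uint32_t Hi, uint32_t Lo) {
  for (uint32_t C = 0; C < 256; ++C) {
    if ((C & 31) == 0)
      continue; // skip the undefined case
    if (maskedPattern(Hi, Lo, C) != fshl32(Hi, Lo, C))
      return false;
  }
  return true;
}

static_assert(patternMatchesFunnelShift(0x12345678u, 0x9abcdef0u),
              "masked shift pair agrees with the funnel shift model");
```

Because the two forms agree wherever the original expression is defined, replacing it with a funnel shift does not change any defined result in this model.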

llvm/trunk/test/CodeGen/X86/shift-double-x86_64.ll

[...]
 ; CHECK-NEXT: retq
   %sh_hi = lshr i64 %hi, %bits
   %sh = or i64 %sh_lo, %sh_hi
   ret i64 %sh
 }

 define i64 @test8(i64 %hi, i64 %lo, i64 %bits) nounwind {
 ; CHECK-LABEL: test8:
 ; CHECK: # %bb.0:
+; CHECK-NEXT: movq %rdx, %rcx
 ; CHECK-NEXT: movq %rdi, %rax
-; CHECK-NEXT: movl %edx, %ecx
-; CHECK-NEXT: andb $63, %cl
-; CHECK-NEXT: negb %cl
-; CHECK-NEXT: shrq %cl, %rsi
-; CHECK-NEXT: movl %edx, %ecx
-; CHECK-NEXT: shlq %cl, %rax
-; CHECK-NEXT: orq %rsi, %rax
+; CHECK-NEXT: # kill: def $cl killed $cl killed $rcx
+; CHECK-NEXT: shldq %cl, %rsi, %rax
 ; CHECK-NEXT: retq
   %tbits = trunc i64 %bits to i8
   %tand = and i8 %tbits, 63
   %tand64 = sub i8 64, %tand
   %and = zext i8 %tand to i64
   %and64 = zext i8 %tand64 to i64
   %sh_lo = lshr i64 %lo, %and64
   %sh_hi = shl i64 %hi, %and
   %sh = or i64 %sh_lo, %sh_hi
   ret i64 %sh
 }

llvm/trunk/test/CodeGen/X86/shift-double.ll

[...]
 ; X64-NEXT: retq
   %sh_hi = lshr i32 %hi, %bits
   %sh = or i32 %sh_lo, %sh_hi
   ret i32 %sh
 }

 define i32 @test18(i32 %hi, i32 %lo, i32 %bits) nounwind {
 ; X86-LABEL: test18:
 ; X86: # %bb.0:
-; X86-NEXT: pushl %esi
+; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
+; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movb {{[0-9]+}}(%esp), %dl
-; X86-NEXT: movl %edx, %ecx
-; X86-NEXT: andb $31, %cl
-; X86-NEXT: negb %cl
-; X86-NEXT: shrl %cl, %esi
-; X86-NEXT: movl %edx, %ecx
-; X86-NEXT: shll %cl, %eax
-; X86-NEXT: orl %esi, %eax
-; X86-NEXT: popl %esi
+; X86-NEXT: shldl %cl, %edx, %eax
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: test18:
 ; X64: # %bb.0:
-; X64-NEXT: movl %edi, %eax
-; X64-NEXT: movl %edx, %ecx
-; X64-NEXT: andb $31, %cl
-; X64-NEXT: negb %cl
-; X64-NEXT: shrl %cl, %esi
 ; X64-NEXT: movl %edx, %ecx
-; X64-NEXT: shll %cl, %eax
-; X64-NEXT: orl %esi, %eax
+; X64-NEXT: movl %edi, %eax
+; X64-NEXT: # kill: def $cl killed $cl killed $ecx
+; X64-NEXT: shldl %cl, %esi, %eax
 ; X64-NEXT: retq
   %tbits = trunc i32 %bits to i8
   %tand = and i8 %tbits, 31
   %tand64 = sub i8 32, %tand
   %and = zext i8 %tand to i32
   %and64 = zext i8 %tand64 to i32
   %sh_lo = lshr i32 %lo, %and64
   %sh_hi = shl i32 %hi, %and
[...]

[X86] Improve use of SHLD/SHRD (Closed, Public)