This is an archive of the discontinued LLVM Phabricator instance.

Differential D4978

Fix generic shift expansion when shift amount is 0
ClosedPublic

Authored by loladiro on Aug 19 2014, 6:22 PM.

Details

Reviewers

chfast
resistor

Commits

rG57c2f7c75648: Fix generic shift expansion when shift amount is 0
rL235370: Fix generic shift expansion when shift amount is 0

Summary

This fixes http://llvm.org/bugs/show_bug.cgi?id=16439.

This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since we only need to care about Amt up to nbits-1. It's hard to tell which one is better: the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type.
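The two options weighed in the summary can be compared in a small standalone model. This is only a sketch under assumptions: `hw_srl` is a hypothetical stand-in for a 64-bit machine shift that uses only the low 6 bits of its amount (as x86 does), and `carry_select` and `carry_split` are illustrative names, not functions from the patch. Both compute the bits of the low word that a left shift by `amt` carries into the high word. The patch itself applies its select to the final result word rather than to the carry term; the model folds that into the carry for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit machine shift that, like x86,
// looks only at the low 6 bits of the shift amount.
uint64_t hw_srl(uint64_t v, unsigned amt) { return v >> (amt & 63); }

// Select-based form: compare the amount with zero and bypass the
// out-of-range shift by 64 entirely.
uint64_t carry_select(uint64_t lo, unsigned amt) {
  assert(amt < 64 && "only the short case (amt < word size) is modeled");
  return amt == 0 ? 0 : hw_srl(lo, 64 - amt);
}

// Split form: (lo >> (63 - amt)) >> 1. Both shift amounts stay
// within 0..63, so no masking surprise can occur.
uint64_t carry_split(uint64_t lo, unsigned amt) {
  assert(amt < 64 && "only the short case (amt < word size) is modeled");
  return hw_srl(hw_srl(lo, 63 - amt), 1);
}
```

In this model both forms agree for every amount from 0 to 63. For an amount of 0, `carry_split(~0ULL, 0)` yields 0, whereas a bare `hw_srl(~0ULL, 64)` returns the input unchanged because 64 is masked to 0.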

Diff Detail

Repository: rL LLVM

Event Timeline

loladiro updated this revision to Diff 12687. Aug 19 2014, 6:22 PM

loladiro retitled this revision from to Fix generic shift expansion when shift amount is 0.

loladiro updated this object.

loladiro edited the test plan for this revision.

loladiro set the repository for this revision to rL LLVM.

loladiro added a subscriber: Unknown Object (MLST).

bump

Bump. Any thoughts on whether this is the correct way to do this or a different approach would be better?

chfast added a subscriber: chfast. Oct 2 2014, 8:21 AM

Do you know why shift by 0 produces wrong results?

Yes, see the linked bug report:

If the shift amount is 0, `shrdl %cl, %esi, %ebx` acts as `or` (since it's doing shift mod 32), but LLVM seems to assume that it's shifting it out to the left.

You are right, that fix works for my case too.

In the original code, e.g. when lowering i128 to 2 x i64 with a shift amount of 0, one of the words needs to be zeroed by shifting it by 64. That does not work, because the X86 architecture executes it as a shift by (64 mod 64 = 0).
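The failure described here can be reproduced in a standalone model of the i128 to 2 x i64 case. This is a rough sketch, not the legalizer code: `hw_shl` and `hw_srl` are hypothetical stand-ins for machine shifts that mask the amount to 6 bits, and `Words`, `shl_unguarded` and `shl_guarded` are names invented for illustration. The guarded variant mirrors the shape of the patch for a left shift: an extra zero check selects the unmodified high word.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of 64-bit machine shifts that, like x86,
// use only the low 6 bits of the shift amount.
uint64_t hw_shl(uint64_t v, unsigned amt) { return v << (amt & 63); }
uint64_t hw_srl(uint64_t v, unsigned amt) { return v >> (amt & 63); }

// A 128-bit value split into two 64-bit words.
struct Words {
  uint64_t lo;
  uint64_t hi;
};

// Expansion without the zero check: for amt == 0 the carry term is
// lo >> 64, which the modeled hardware executes as lo >> 0.
Words shl_unguarded(Words in, unsigned amt) {
  assert(amt < 128);
  bool isShort = amt < 64;
  uint64_t loS = hw_shl(in.lo, amt);
  uint64_t hiS = hw_shl(in.hi, amt) | hw_srl(in.lo, 64 - amt);
  uint64_t loL = 0;                         // Lo part is zero.
  uint64_t hiL = hw_shl(in.lo, amt - 64);   // Hi from Lo part.
  return {isShort ? loS : loL, isShort ? hiS : hiL};
}

// Expansion with the zero check: when amt == 0 the high word is
// passed through untouched, so the bad carry term is never used.
Words shl_guarded(Words in, unsigned amt) {
  Words r = shl_unguarded(in, amt);
  bool isZero = amt == 0;
  r.hi = isZero ? in.hi : r.hi;
  return r;
}
```

Shifting the value 1 by 0 (the same pattern as `shl i256 1, %c` in the test, scaled down to 128 bits) gives a high word of 1 in the unguarded model and the correct 0 in the guarded one.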

I have an idea for improvement: if shift amount is zero there is no need to compute anything more. The select that takes isZero into account can be placed up front.

Can I be a reviewer for that change?

chfast added a reviewer: chfast. Mar 3 2015, 10:26 AM

chfast accepted this revision. Mar 3 2015, 10:28 AM

chfast edited edge metadata.

This revision is now accepted and ready to land. Mar 3 2015, 10:28 AM

LGTM

I'd like to get approval from the SelectionDAG code owner, which seems to be @resistor?

Shallow, drop-by comment: does it make sense to generate a DAG that can be combined into a BEXTR when possible?

chfast mentioned this in D7752: Big shift test. Mar 10 2015, 2:08 AM

In D4978#136194, @sanjoy wrote:

Shallow, drop-by comment: does it make sense to generate a DAG that can be combined into a BEXTR when possible?

Can you explain more?

In D4978#140508, @chfast wrote:

In D4978#136194, @sanjoy wrote:

Shallow, drop-by comment: does it make sense to generate a DAG that can be combined into a BEXTR when possible?

Can you explain more?

For a left shift, I think HiResult can be HiInput << ShiftAmt | BEXTR LoInput, Start = (WordSize - ShiftAmt), Len = ShiftAmt. This will do the right thing for ShiftAmt = 0.

I think many x86 CPUs do not support BEXTR, so either this will have to be a target-dependent thing, or it will have to be a pattern that the DAG combiner will fold into a BEXTR.

In any case, this is very minor.
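The BEXTR formula above can be checked against a software model. The sketch below assumes the commonly documented BEXTR behavior (extract `len` bits starting at bit `start`, zero-extended, with a result of 0 when no bits are extracted); `bextr_model` and `shl_hi_bextr` are hypothetical helpers for illustration and do not correspond to any LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Software model of a 64-bit bit-field extract: returns bits
// [start, start + len) of src, zero-extended. Assumed to return 0
// when start is past the word or len is 0.
uint64_t bextr_model(uint64_t src, unsigned start, unsigned len) {
  if (start >= 64 || len == 0)
    return 0;
  uint64_t shifted = src >> start;
  if (len >= 64)
    return shifted;
  return shifted & ((uint64_t{1} << len) - 1);
}

// High word of a two-word left shift in the short case:
// HiInput << ShiftAmt | BEXTR(LoInput, WordSize - ShiftAmt, ShiftAmt).
uint64_t shl_hi_bextr(uint64_t lo, uint64_t hi, unsigned amt) {
  assert(amt < 64 && "only the short case (amt < word size) is modeled");
  return (hi << amt) | bextr_model(lo, 64 - amt, amt);
}
```

With a shift amount of 0 the extract has start 64 and length 0, so it contributes nothing and the high word is returned unchanged, which is the property the comment relies on.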

In D4978#140660, @sanjoy wrote:

For a left shift, I think HiResult can be HiInput << ShiftAmt | BEXTR LoInput, Start = (WordSize - ShiftAmt), Len = ShiftAmt. This will do the right thing for ShiftAmt = 0.

I think many x86 CPUs do not support BEXTR, so either this will have to be a target-dependent thing, or it will have to be a pattern that the DAG combiner will fold into a BEXTR.

In any case, this is very minor.

BEXTR would be useful only if shift amount is constant. I think this case is opposite.

Ping

LGTM.

As the patch is accepted, can I commit it?

Yes, but I'd really like to include the test you wrote, which is why I haven't committed it yet.

I asked about that test on LLVMdev today. There are opinions that it should check assembly generated instead of runtime results. I think we can check for "select" node in the assembly but the test will be implementation and target specific. What do you think? I can send you a test if you agree.

I think it's fine having this as a codegen test that checks that the generated code on, say, X86 is correct.

Closed by commit rL235370: Fix generic shift expansion when shift amount is 0 (authored by chfast). Apr 20 2015, 11:31 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                            Size
llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp    16 lines
llvm/trunk/test/CodeGen/X86/shift-i256.ll                       18 lines

Diff 24099

llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

[1,541 preceding lines not shown] ExpandShiftWithUnknownAmountBit(SDNode *N, SDValue &Lo, SDValue &Hi) {
   SDValue InL, InH;
   GetExpandedInteger(N->getOperand(0), InL, InH);

   SDValue NVBitsNode = DAG.getConstant(NVTBits, ShTy);
   SDValue AmtExcess = DAG.getNode(ISD::SUB, dl, ShTy, Amt, NVBitsNode);
   SDValue AmtLack = DAG.getNode(ISD::SUB, dl, ShTy, NVBitsNode, Amt);
   SDValue isShort = DAG.getSetCC(dl, getSetCCResultType(ShTy),
                                  Amt, NVBitsNode, ISD::SETULT);
+  SDValue isZero = DAG.getSetCC(dl, getSetCCResultType(ShTy),
+                                Amt, DAG.getConstant(0, ShTy),
+                                ISD::SETEQ);

   SDValue LoS, HiS, LoL, HiL;
   switch (N->getOpcode()) {
   default: llvm_unreachable("Unknown shift");
   case ISD::SHL:
     // Short: ShAmt < NVTBits
     LoS = DAG.getNode(ISD::SHL, dl, NVT, InL, Amt);
     HiS = DAG.getNode(ISD::OR, dl, NVT,
                       DAG.getNode(ISD::SHL, dl, NVT, InH, Amt),
-    // FIXME: If Amt is zero, the following shift generates an undefined result
-    // on some architectures.
                       DAG.getNode(ISD::SRL, dl, NVT, InL, AmtLack));

     // Long: ShAmt >= NVTBits
     LoL = DAG.getConstant(0, NVT); // Lo part is zero.
     HiL = DAG.getNode(ISD::SHL, dl, NVT, InL, AmtExcess); // Hi from Lo part.

     Lo = DAG.getSelect(dl, NVT, isShort, LoS, LoL);
-    Hi = DAG.getSelect(dl, NVT, isShort, HiS, HiL);
+    Hi = DAG.getSelect(dl, NVT, isZero, InH,
+                       DAG.getSelect(dl, NVT, isShort, HiS, HiL));
     return true;
   case ISD::SRL:
     // Short: ShAmt < NVTBits
     HiS = DAG.getNode(ISD::SRL, dl, NVT, InH, Amt);
     LoS = DAG.getNode(ISD::OR, dl, NVT,
                       DAG.getNode(ISD::SRL, dl, NVT, InL, Amt),
     // FIXME: If Amt is zero, the following shift generates an undefined result
     // on some architectures.
                       DAG.getNode(ISD::SHL, dl, NVT, InH, AmtLack));

     // Long: ShAmt >= NVTBits
     HiL = DAG.getConstant(0, NVT); // Hi part is zero.
     LoL = DAG.getNode(ISD::SRL, dl, NVT, InH, AmtExcess); // Lo from Hi part.

-    Lo = DAG.getSelect(dl, NVT, isShort, LoS, LoL);
+    Lo = DAG.getSelect(dl, NVT, isZero, InL,
+                       DAG.getSelect(dl, NVT, isShort, LoS, LoL));
     Hi = DAG.getSelect(dl, NVT, isShort, HiS, HiL);
     return true;
   case ISD::SRA:
     // Short: ShAmt < NVTBits
     HiS = DAG.getNode(ISD::SRA, dl, NVT, InH, Amt);
     LoS = DAG.getNode(ISD::OR, dl, NVT,
                       DAG.getNode(ISD::SRL, dl, NVT, InL, Amt),
-    // FIXME: If Amt is zero, the following shift generates an undefined result
-    // on some architectures.
                       DAG.getNode(ISD::SHL, dl, NVT, InH, AmtLack));

     // Long: ShAmt >= NVTBits
     HiL = DAG.getNode(ISD::SRA, dl, NVT, InH, // Sign of Hi part.
                       DAG.getConstant(NVTBits-1, ShTy));
     LoL = DAG.getNode(ISD::SRA, dl, NVT, InH, AmtExcess); // Lo from Hi part.

-    Lo = DAG.getSelect(dl, NVT, isShort, LoS, LoL);
+    Lo = DAG.getSelect(dl, NVT, isZero, InL,
+                       DAG.getSelect(dl, NVT, isShort, LoS, LoL));
     Hi = DAG.getSelect(dl, NVT, isShort, HiS, HiL);
     return true;
   }
 }

 void DAGTypeLegalizer::ExpandIntRes_ADDSUB(SDNode *N,
                                            SDValue &Lo, SDValue &Hi) {
   SDLoc dl(N);
[1,492 following lines not shown]

llvm/trunk/test/CodeGen/X86/shift-i256.ll

-; RUN: llc < %s -march=x86
-; RUN: llc < %s -march=x86-64
+; RUN: llc < %s -march=x86 | FileCheck %s
+; RUN: llc < %s -march=x86-64 -O0 | FileCheck %s -check-prefix=CHECK-X64
+; RUN: llc < %s -march=x86-64 -O2 | FileCheck %s -check-prefix=CHECK-X64

-define void @t(i256 %x, i256 %a, i256* nocapture %r) nounwind readnone {
+; CHECK-LABEL: shift1
+define void @shift1(i256 %x, i256 %a, i256* nocapture %r) nounwind readnone {
 entry:
   %0 = ashr i256 %x, %a
   store i256 %0, i256* %r
   ret void
 }
+
+; CHECK-LABEL: shift2
+define i256 @shift2(i256 %c) nounwind
+{
+  %b = shl i256 1, %c ; %c must not be a constant
+  ; Special case when %c is 0:
+  ; CHECK-X64: testb [[REG:%r[0-9]+b]], [[REG]]
+  ; CHECK-X64: cmoveq
+  ret i256 %b
+}