This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT.
ClosedPublic

Authored by bjope on Sep 28 2016, 1:25 AM.

Details

Reviewers

tstellarAMD
mkuper
bogner

Commits

rG12559441bd37: [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look…
rL283347: [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look…

Summary

Both computeKnownBits and ComputeNumSignBits can now do a simple look-through of EXTRACT_VECTOR_ELT. It will compute the result based on the known bits (or known sign bits) for the vector that the element is extracted from.
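As a rough illustration of the look-through described in the summary, here is a minimal, self-contained sketch. It does not use LLVM's APInt or SDValue types: the `KnownBits` struct, `lowMask` and `knownBitsOfExtract` are hypothetical names introduced for this example, and masks are limited to 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the (KnownZero, KnownOne) pair of masks.
struct KnownBits {
  uint64_t Zero; // bits known to be 0
  uint64_t One;  // bits known to be 1
};

// Mask with the low `Bits` bits set (0 <= Bits <= 64).
inline uint64_t lowMask(unsigned Bits) {
  return Bits >= 64 ? ~0ULL : (1ULL << Bits) - 1;
}

// Known bits of a scalar extracted from a vector.
// VecElt:     bits known for every element of the vector (EltBits wide).
// ResultBits: width of the extracted scalar, which may be wider than EltBits.
// When the result is wider, the extra high bits are any-extended, so they
// stay unknown: they are set in neither mask.
inline KnownBits knownBitsOfExtract(KnownBits VecElt, unsigned EltBits,
                                    unsigned ResultBits) {
  assert(EltBits >= 1 && EltBits <= ResultBits && ResultBits <= 64);
  KnownBits R;
  R.Zero = VecElt.Zero & lowMask(EltBits);
  R.One = VecElt.One & lowMask(EltBits);
  return R;
}
```

For example, if every 16-bit element is known to have its top two bits clear, a 32-bit extract keeps that knowledge for bits 14 and 15 and claims nothing about bits 16 to 31.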

Diff Detail

Event Timeline

bjope updated this revision to Diff 72643. Sep 28 2016, 1:25 AM

bjope retitled this revision from to [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT..

bjope updated this object.

bjope added reviewers: bogner, mkuper.

bjope added a subscriber: llvm-commits.

Herald added a reviewer: tstellarAMD. Sep 28 2016, 1:25 AM

Herald added subscribers: nhaehnle, jyknight.

RKSimon added a subscriber: RKSimon. Sep 28 2016, 1:36 AM

mkuper added inline comments. Oct 2 2016, 1:42 AM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2460	Note that this isn't as good as it could be - computeKnownBits for a vector returns the "lowest common denominator" for all vector elements. If you know, statically, which element you're looking for, you could, in theory, do better. I don't see a clean way to do this right now, but it'd perhaps be worth having a TODO. (There's already one in the BUILD_VECTOR code... :-) )
2741	I'm a bit surprised that ComputeNumSignBits does the right thing for vectors. In fact, I have a creeping suspicion it doesn't. Do you have a test for that specifically?

RKSimon added inline comments. Oct 2 2016, 6:50 AM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2460	FYI I looked into this when I added the BUILD_VECTOR support but was put off by the size of the refactor necessary for a full per-element result. A simpler option I've considered was to add a demanded elts argument - it still wouldn't return a per-element result but at least would be only the "lowest common denominator" of the elements we actually care about. But anyway, just a TODO for this patch makes sense.
2741	ComputeNumSignBits is being used in a few places with vector types already. I'm not sure how well it's tested, but the x86 SITOFP i64->FP to i32->FP transform is using it successfully.
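To make the "lowest common denominator" point concrete, here is a small sketch of how known bits for a whole vector are the intersection of the per-element known bits, and how a demanded-elements mask (the option mentioned above as a possible future direction) would narrow that intersection. Everything here is hypothetical illustration code: `KnownBits`, `knownBitsOfConstant16` and `commonKnownBits` are not LLVM APIs, and the bitmask encoding of demanded elements is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the (KnownZero, KnownOne) pair of masks.
struct KnownBits {
  uint64_t Zero; // bits known to be 0
  uint64_t One;  // bits known to be 1
};

// Every bit of a constant is known: set bits go to One, clear bits to Zero.
inline KnownBits knownBitsOfConstant16(uint16_t C) {
  return KnownBits{static_cast<uint64_t>(static_cast<uint16_t>(~C)), C};
}

// Intersect the known bits of the elements selected by DemandedElts
// (bit I set means element I is of interest). At least one element must be
// demanded; with none, both masks stay all-ones, which is not meaningful.
inline KnownBits commonKnownBits(const std::vector<KnownBits> &Elts,
                                 uint64_t DemandedElts) {
  KnownBits R{~0ULL, ~0ULL};
  for (std::size_t I = 0; I < Elts.size() && I < 64; ++I) {
    if (!((DemandedElts >> I) & 1))
      continue; // this element is never read, so it cannot weaken the result
    R.Zero &= Elts[I].Zero;
    R.One &= Elts[I].One;
  }
  return R;
}
```

With the elements 0x0001, 0x0003 and 0x8000, asking about all three leaves no bit known to be one, while demanding only the first two keeps bit 0 known to be one and bit 15 known to be zero.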

Added missing TODO about possible future optimizations.

Herald edited edge metadata. Oct 3 2016, 12:15 AM

Herald added a subscriber: wdng.

bjope added inline comments. Oct 3 2016, 12:55 AM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2460	The TODO has been added. Generally I guess BUILD_VECTOR could be more complicated to handle. But for EXTRACT_VECTOR_ELT we could add one argument (in all the recursive calls) telling which element we are looking for. I've not really looked into how messy such a solution would be.
2741	As Simon says, this method is used with vector types already. So I suspected that it did support vector types, and that there would be tests implemented already to verify that functionality ;-) My patch only makes it possible for an "automatic" transition from scalar->vector type during the recursion. It does not change the fact that this method can be called for any SDValue. If you suspect that it is doing something wrong for vectors, could you be more specific? By the way, the test I added ( test/CodeGen/SPARC/vector-extract-elt.ll ) is using ComputeNumSignBits. But I think that it is a little bit annoying to depend on a target-specific optimization to test these common helper functions in SelectionDAG. So if anyone has a tip on how to test these things using a different approach, please let me know.

The reason for my skepticism w.r.t. ComputeNumSignBits was that it didn't seem to explicitly handle BUILD_VECTOR (or VSELECT, vector_shuffle, etc.).
But I guess that just gets handled through computeKnownBits.
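One way a sign-bit count can fall out of known bits, sketched below under stated assumptions: if the sign bit is known, the run of leading bits known to have the same value gives a lower bound on the number of sign bits, and otherwise only the trivial answer 1 is safe. `numSignBitsFromKnown` is a hypothetical helper working on plain 64-bit masks, not the SelectionDAG implementation.

```cpp
#include <cassert>
#include <cstdint>

// Lower bound on the number of sign bits of a BitWidth-bit value
// (1 <= BitWidth <= 64), given masks of bits known to be 0 and known to be 1.
inline unsigned numSignBitsFromKnown(uint64_t KnownZero, uint64_t KnownOne,
                                     unsigned BitWidth) {
  assert(BitWidth >= 1 && BitWidth <= 64);
  const uint64_t SignBit = 1ULL << (BitWidth - 1);
  uint64_t Mask;
  if (KnownZero & SignBit)
    Mask = KnownZero; // value is non-negative: count leading known zeros
  else if (KnownOne & SignBit)
    Mask = KnownOne;  // value is negative: count leading known ones
  else
    return 1;         // sign bit unknown: only the sign bit itself counts
  unsigned N = 0;
  for (uint64_t Bit = SignBit; Bit != 0 && (Mask & Bit); Bit >>= 1)
    ++N;
  return N;
}
```

For a vector operand whose known bits already describe every element, the same count is then valid for each element, which is what the look-through relies on when the extracted width equals the element width.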

Regarding testing - unfortunately, I don't believe there's a better way to do this right now.

LGTM.

This revision is now accepted and ready to land. Oct 3 2016, 1:26 AM

@bjope - do you need someone to check this in for you?

In D25007#562330, @spatel wrote:

@bjope - do you need someone to check this in for you?

I really appreciate that you want to help out, but I think it would be nice to see that my team can commit something that has passed review ourselves.

There is currently only one person in my team that has commit permissions, and he has been busy with other things.
But the plan is to commit both this one and my similar change in ValueTracking ( https://reviews.llvm.org/D24955 ) tomorrow.

(I've also requested permission from @lattner to be able to do commits myself... we'll have to wait and see if I'm trusted...)

Closed by commit rL283347: [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look… (authored by bjope). Oct 5 2016, 10:49 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                            Size
lib/CodeGen/SelectionDAG/SelectionDAG.cpp       26 lines
test/CodeGen/AMDGPU/amdgpu.private-memory.ll    3 lines
test/CodeGen/SPARC/vector-extract-elt.ll        19 lines
test/CodeGen/X86/pr21792.ll                     4 lines

Diff 72643

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

[2,442 lines not shown; context: case ISD::EXTRACT_ELEMENT: {]
     KnownZero = KnownZero.getHiBits(KnownZero.getBitWidth() - Index * BitWidth);
     KnownOne = KnownOne.getHiBits(KnownOne.getBitWidth() - Index * BitWidth);

     // Remove high part of known bit mask
     KnownZero = KnownZero.trunc(BitWidth);
     KnownOne = KnownOne.trunc(BitWidth);
     break;
   }
+  case ISD::EXTRACT_VECTOR_ELT: {
+    const unsigned BitWidth = Op.getValueSizeInBits();
+    const unsigned EltBitWidth = Op.getOperand(0).getScalarValueSizeInBits();
+    // If BitWidth > EltBitWidth the value is anyext:ed. So we do not know
+    // anything about the extended bits.
+    if (BitWidth > EltBitWidth) {
+      KnownZero = KnownZero.trunc(EltBitWidth);
+      KnownOne = KnownOne.trunc(EltBitWidth);
+    }
+    computeKnownBits(Op.getOperand(0), KnownZero, KnownOne, Depth+1);
+    if (BitWidth > EltBitWidth) {
+      KnownZero = KnownZero.zext(BitWidth);
+      KnownOne = KnownOne.zext(BitWidth);
+    }
+    break;
+  }
   case ISD::BSWAP: {
     computeKnownBits(Op.getOperand(0), KnownZero2, KnownOne2, Depth+1);
     KnownZero = KnownZero2.byteSwap();
     KnownOne = KnownOne2.byteSwap();
     break;
   }
   case ISD::SMIN:
   case ISD::SMAX:
[251 lines not shown; context: case ISD::EXTRACT_ELEMENT: {]
     // little end. Sign starts at big end.
     const int rIndex = Items - 1 -
       cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();

     // If the sign portion ends in our element the subtraction gives correct
     // result. Otherwise it gives either negative or > bitwidth result
     return std::max(std::min(KnownSign - rIndex * BitWidth, BitWidth), 0);
   }
+  case ISD::EXTRACT_VECTOR_ELT: {
+    const unsigned BitWidth = Op.getValueSizeInBits();
+    const unsigned EltBitWidth = Op.getOperand(0).getScalarValueSizeInBits();
+    // If BitWidth > EltBitWidth the value is anyext:ed, and we do not know
+    // anything about sign bits. But if the sizes match we can derive knowledge
+    // about sign bits from the vector operand.
+    if (BitWidth == EltBitWidth)
+      return ComputeNumSignBits(Op.getOperand(0), Depth+1);
+    break;
+  }
   }

   // If we are looking at the loaded value of the SDNode.
   if (Op.getResNo() == 0) {
     // Handle LOADX separately here. EXTLOAD case will fallthrough.
     if (LoadSDNode *LD = dyn_cast<LoadSDNode>(Op)) {
       unsigned ExtType = LD->getExtensionType();
       switch (ExtType) {
[4,586 lines not shown]

test/CodeGen/AMDGPU/amdgpu.private-memory.ll

[223 lines not shown]
 }

 ; FUNC-LABEL: {{^}}short_array:

 ; R600: MOVA_INT

 ; SI-PROMOTE-DAG: buffer_store_short v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen ; encoding: [0x00,0x10,0x68,0xe0
 ; SI-PROMOTE-DAG: buffer_store_short v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen offset:2 ; encoding: [0x02,0x10,0x68,0xe0
-; SI-PROMOTE: buffer_load_sshort v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}}
+; Loaded value is 0 or 1, so sext will become zext, so we get buffer_load_ushort instead of buffer_load_sshort.
+; SI-PROMOTE: buffer_load_ushort v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}}
 define void @short_array(i32 addrspace(1)* %out, i32 %index) #0 {
 entry:
   %0 = alloca [2 x i16]
   %1 = getelementptr inbounds [2 x i16], [2 x i16]* %0, i32 0, i32 0
   %2 = getelementptr inbounds [2 x i16], [2 x i16]* %0, i32 0, i32 1
   store i16 0, i16* %1
   store i16 1, i16* %2
   %3 = getelementptr inbounds [2 x i16], [2 x i16]* %0, i32 0, i32 %index
[313 lines not shown]

test/CodeGen/SPARC/vector-extract-elt.ll

This file was added.

+; RUN: llc -march=sparc < %s | FileCheck %s
+
+
+; If computeKnownBits/computeKnownSignBits (in SelectionDAG) can do a simple
+; look-thru for extractelement then we we know that the add will yield a
+; non-negative result.
+define i1 @test1(<4 x i16>* %in) {
+; CHECK-LABEL: ! BB#0:
+; CHECK-NEXT: retl
+; CHECK-NEXT: sethi 0, %o0
+  %vec2 = load <4 x i16>, <4 x i16>* %in, align 1
+  %vec3 = lshr <4 x i16> %vec2, <i16 2, i16 2, i16 2, i16 2>
+  %vec4 = sext <4 x i16> %vec3 to <4 x i32>
+  %elt0 = extractelement <4 x i32> %vec4, i32 0
+  %elt1 = extractelement <4 x i32> %vec4, i32 1
+  %sum = add i32 %elt0, %elt1
+  %bool = icmp slt i32 %sum, 0
+  ret i1 %bool
+}
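The reasoning behind the expected constant result in this test can be checked with a small scalar model. This is only a sketch: `laneValue` and `sumCanBeNegative` are hypothetical helpers that mimic one vector lane with ordinary integer arithmetic, and they say nothing about how the SPARC backend actually folds the compare.

```cpp
#include <cassert>
#include <cstdint>

// Models one lane of the test: lshr i16 %x, 2 followed by sext i16 -> i32.
inline int32_t laneValue(uint16_t Raw) {
  // After the logical shift the top two bits of the i16 are zero.
  uint16_t Shifted = static_cast<uint16_t>(Raw >> 2);
  // The sign bit is zero, so sign extension yields a non-negative i32.
  return static_cast<int32_t>(static_cast<int16_t>(Shifted));
}

// Returns true if some pair of lanes could make `add i32 %elt0, %elt1`
// negative, either directly or by wrapping past INT32_MAX.
inline bool sumCanBeNegative() {
  int64_t Min = INT32_MAX, Max = INT32_MIN;
  for (uint32_t Raw = 0; Raw <= 0xFFFF; ++Raw) {
    int64_t V = laneValue(static_cast<uint16_t>(Raw));
    if (V < Min) Min = V;
    if (V > Max) Max = V;
  }
  // The smallest and largest possible sums bound every other sum.
  return Min + Min < 0 || Max + Max > INT32_MAX;
}
```

Every lane lies in the range 0 to 0x3FFF, so the sum is at most 0x7FFE and `icmp slt i32 %sum, 0` is always false, which matches the `sethi 0, %o0` the test expects.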

test/CodeGen/X86/pr21792.ll

[28 lines not shown; context: entry:]
   %tmp16 = bitcast i8* %add.ptr46 to double*
   %add.ptr51 = getelementptr inbounds i8, i8* bitcast (double* getelementptr inbounds ([256 x double], [256 x double]* @stuff, i64 0, i64 1) to i8*), i64 %idx.ext5
   %tmp17 = bitcast i8* %add.ptr51 to double*
   call void @toto(double* %tmp4, double* %tmp5, double* %tmp6, double* %tmp7, double* %tmp16, double* %tmp17)
   ret void
 ; CHECK-LABEL: func:
 ; CHECK: pextrq $1, %xmm0,
 ; CHECK-NEXT: movd %xmm0, %r[[AX:..]]
-; CHECK-NEXT: movslq %e[[AX]],
-; CHECK-NEXT: sarq $32, %r[[AX]]
+; CHECK-NEXT: movq %r[[AX]],
+; CHECK-NEXT: shrq $32, %r9
 }

 declare void @toto(double*, double*, double*, double*, double*, double*)