This is an archive of the discontinued LLVM Phabricator instance.

Do not assume that FP vector operands are never legalized by expanding
Closed Public

Authored by nemanjai on Oct 17 2016, 7:49 AM.

Details

Reviewers

tstellarAMD
bogner
kbarton
amehsan
hfinkel

Summary

This patch ensures that if a floating-point vector operand is legalized by expanding, it is legalized through the stack rather than by calling DAGTypeLegalizer::IntegerToVector, which would fail because the operand is a non-integer type.

This fixes PR 30715.
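
The decision described in the summary can be sketched as a small standalone model. This is only an illustration, not the LLVM code: ToyType, Strategy, toyIntegerToVector, chooseBeforePatch and chooseAfterPatch are hypothetical names, and the sketch ignores the additional legality checks the real function performs before splitting.

```cpp
#include <cassert>

// Hypothetical stand-in for the two type queries the patch relies on
// (isVector() and isInteger()); this is not LLVM's value-type class.
struct ToyType {
  bool IsVector;
  bool IsInteger;
};

enum class Strategy { SplitIntoVectorParts, ThroughStack };

// Toy version of the integer-only helper: like the assert discussed in the
// review, it rejects any operand that is not an integer.
inline void toyIntegerToVector(const ToyType &Operand) {
  assert(Operand.IsInteger && "can only split integer operands");
  (void)Operand; // keeps NDEBUG builds free of unused-parameter warnings
}

// Before the patch (simplified): only the result type is inspected, so a
// floating-point operand would still be routed to the integer-only helper.
inline Strategy chooseBeforePatch(const ToyType &Result, const ToyType &Operand) {
  (void)Operand;
  return Result.IsVector ? Strategy::SplitIntoVectorParts
                         : Strategy::ThroughStack;
}

// After the patch (simplified): the operand must also be an integer,
// otherwise the expansion falls back to going through the stack.
inline Strategy chooseAfterPatch(const ToyType &Result, const ToyType &Operand) {
  if (Result.IsVector && Operand.IsInteger) {
    toyIntegerToVector(Operand); // safe: the operand is known to be an integer
    return Strategy::SplitIntoVectorParts;
  }
  return Strategy::ThroughStack;
}
```

With a vector result and a ppc_fp128-like operand (IsVector = false, IsInteger = false), chooseBeforePatch selects the splitting path that would trip the assert, while chooseAfterPatch selects the stack path.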

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai updated this revision to Diff 74847. Oct 17 2016, 7:49 AM

nemanjai retitled this revision to "Do not assume that FP vector operands are never legalized by expanding".

nemanjai updated this object.

nemanjai added reviewers: tstellarAMD, hfinkel, kbarton, amehsan.

nemanjai set the repository for this revision to rL LLVM.

nemanjai added a subscriber: llvm-commits.

Herald added a subscriber: wdng. Oct 17 2016, 7:49 AM

amehsan added inline comments. Oct 17 2016, 9:34 AM

lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
337–340	So, for cases where your new condition is not satisfied, how does the generated code change? (The question applies to other platforms as well.) I believe there is no test case in the unit tests for this (otherwise it would have failed and been included in your changes). Still, this may happen in benchmarks and real workloads. I think you may need to investigate how that code pattern changes and what the potential performance impact of your change is for that code pattern.
test/CodeGen/PowerPC/pr30715.ll
8	Please remove the mangled name from the test file.

amehsan added inline comments. Oct 17 2016, 9:39 AM

lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
337–340	I am asking because this function is called from places like DAGTypeLegalizer::ExpandFloatOperand. My question is whether the failure you get for the floating-point case applies to all platforms or not.
test/CodeGen/PowerPC/pr30715.ll
8	Also, limit the lines to 80 characters.

nemanjai added inline comments. Oct 17 2016, 4:17 PM

lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
337–340	If you expand the context in the diff a bit, you'll see where the assert that we trip on is located, namely line 321. If this function were ever called with a node whose result is a vector and whose first operand is a floating-point type, that assert would have tripped. All this patch changes is that, rather than ignoring the possibility that the operand is a non-integer type and then failing at the assert, we check for this and don't call IntegerToVector.
Given that this conditional block could not possibly succeed if the result is a vector and the operand is non-integer, I don't see how we can investigate the performance impact of this change. The cases where we would enter this block before and don't now are exactly those that would not have compiled successfully before, so the performance difference between the code we produce now and the code we would produce without this patch cannot be studied.
I think this makes sense because, for example, we can easily expand something like:
tX: v2i64 = bitcast tY
tY: i128 = ...
by splitting the i128 value and passing the two i64 values to a BUILD_VECTOR node. However, doing something similar for the equivalent FP node would require adding some machinery that I'm not sure would be worth the effort. Namely, handling this:
tX: v2f64 = bitcast tY
tY: ppcf128 = ...
by splitting and using BUILD_VECTOR, rather than going through the stack, is not something that is likely to come up frequently enough in real code to warrant the additional machinery. And of course, doing the same thing for f128 would be significantly more tricky (I don't know enough about IEEE binary-128 to know if it's even possible). So I believe that going through the stack here is the right thing to do.
Finally, even if we find some code where this happens a lot for ppcf128, I am still not convinced that we would gain much by avoiding the stack, since the operations on the type are done in software, so it is unlikely that this code is expecting particularly high performance. But that's just my opinion.
test/CodeGen/PowerPC/pr30715.ll
8	OK, I'll remove the mangled name. But I don't think we generally limit the IR in test cases to 80 characters; it's whatever Clang produces for the original test case.
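
The integer case described in the comment above (splitting an i128 and passing the two i64 halves to a BUILD_VECTOR) can be sketched in plain C++. This is a toy model under stated assumptions, not the legalizer code: ExpandedI128 and toV2I64 are hypothetical names, and the only detail taken from the diff is that the two parts are swapped on big-endian targets.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical model of an expanded i128: its low and high 64-bit halves.
struct ExpandedI128 {
  std::uint64_t Lo;
  std::uint64_t Hi;
};

// Returns the two halves in vector-element order. As in the diff context,
// the parts are swapped when the target is big-endian.
inline std::array<std::uint64_t, 2> toV2I64(ExpandedI128 V, bool BigEndian) {
  std::array<std::uint64_t, 2> Parts{V.Lo, V.Hi};
  if (BigEndian)
    std::swap(Parts[0], Parts[1]);
  return Parts;
}
```

For a value with Lo = 1 and Hi = 2, the little-endian result is {1, 2} and the big-endian result is {2, 1}. Nothing comparable exists in this sketch for a floating-point operand, which is the gap the patch sidesteps by going through the stack.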

amehsan added inline comments. Oct 17 2016, 4:22 PM

lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
337–340	Yes, if the assert is inside this conditional this is fine. I did look at the code, but it was not quite obvious, or maybe I missed it.

nemanjai added a reviewer: bogner. Oct 18 2016, 7:12 AM

RKSimon added a subscriber: RKSimon. Oct 18 2016, 9:18 AM

LGTM
However, I'm not familiar with this logic, so someone else should also take a look to verify that this change is OK. It basically assumes that ExpandOp_BITCAST should be called on a vector of integers; if it is a vector of any other type, there will be no effect. @hfinkel or @tstellarAMD, would you be able to take a quick look? Thanks.

This revision is now accepted and ready to land. Oct 20 2016, 12:36 PM

As far as I can tell, ExpandOp_BITCAST can be called with two kinds of nodes: a bitcast where the operand is a large illegal integer type (DAGTypeLegalizer::ExpandIntegerOperand), and a bitcast where the operand is of type ppc_fp128 (DAGTypeLegalizer::ExpandFloatOperand; the name is slightly misleading, but it is in fact only used for ppc_fp128). So this should have no effect anywhere outside of PPC targets.

Given that, it's probably a good idea to reorganize the code so it's less likely someone will break ppc_fp128 in the future... but the bugfix is fine.
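
As a rough illustration of what going "through the stack" means for the ppc_fp128 case discussed in this review, the sketch below stores a value to a temporary buffer and reloads the same bytes as a two-element vector. It is an assumption-laden model, not the code LLVM emits: ToyPPCF128 (a pair of doubles) and bitcastThroughMemory are hypothetical names. The test's CHECK lines for stxsdx followed by lxvd2x appear consistent with this store-then-reload pattern.

```cpp
#include <array>
#include <cstring>

// Hypothetical model of ppc_fp128 as a pair of doubles.
struct ToyPPCF128 {
  double First;
  double Second;
};

// Reinterpret the 16 bytes of the value as two doubles by writing them to a
// temporary slot and reading them back, the way a stack store/load would.
inline std::array<double, 2> bitcastThroughMemory(const ToyPPCF128 &V) {
  static_assert(sizeof(ToyPPCF128) == 2 * sizeof(double),
                "the model assumes no padding between the two doubles");
  unsigned char Slot[2 * sizeof(double)];   // stand-in for the stack slot
  std::memcpy(Slot, &V, sizeof(Slot));      // "store" the wide value
  std::array<double, 2> Out{};
  std::memcpy(Out.data(), Slot, sizeof(Slot)); // "load" it as a vector
  return Out;
}
```

No splitting logic specific to the floating-point type is needed here, which matches the argument that the stack path avoids extra machinery at the cost of a memory round trip.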

Committed revision 285231.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp	3 lines
test/CodeGen/PowerPC/pr30715.ll	74 lines

Diff 74847

lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp

     if (DAG.getDataLayout().isBigEndian())
       std::swap(Parts[0], Parts[1]);
     IntegerToVector(Parts[0], NumElements, Ops, EltVT);
     IntegerToVector(Parts[1], NumElements, Ops, EltVT);
   } else {
     Ops.push_back(DAG.getNode(ISD::BITCAST, DL, EltVT, Op));
   }
 }

 SDValue DAGTypeLegalizer::ExpandOp_BITCAST(SDNode *N) {
   SDLoc dl(N);
-  if (N->getValueType(0).isVector()) {
+  if (N->getValueType(0).isVector() &&
+      N->getOperand(0).getValueType().isInteger()) {
     // An illegal expanding type is being converted to a legal vector type.
     // Make a two element vector out of the expanded parts and convert that
     // instead, but only if the new vector type is legal (otherwise there
     // is no point, and it might create expansion loops). For example, on
     // x86 this turns v1i64 = BITCAST i64 into v1i64 = BITCAST v2i32.
     //
     // FIXME: I'm not sure why we are first trying to split the input into
     // a 2 element vector, so I'm leaving it here to maintain the current

test/CodeGen/PowerPC/pr30715.ll

; RUN: llc -verify-machineinstrs -mcpu=pwr8 -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s

%class.FullMatrix = type { i8 }
%class.Vector = type { float* }

$_ZNK10FullMatrixIgE5m_fn2IfEEvR6VectorIT_ERKS4_b = comdat any

define weak_odr void @_ZNK10FullMatrixIgE5m_fn2IfEEvR6VectorIT_ERKS4_b(%class.FullMatrix* %this, %class.Vector* dereferenceable(8) %p1, %class.Vector* dereferenceable(8), i1 zeroext) {
entry:
  %call = tail call signext i32 @_ZNK10FullMatrixIgE5m_fn1Ev(%class.FullMatrix* %this)
  %cmp10 = icmp sgt i32 %call, 0
  br i1 %cmp10, label %for.body.lr.ph, label %for.cond.cleanup

for.body.lr.ph: ; preds = %entry
  %val.i = getelementptr inbounds %class.Vector, %class.Vector* %p1, i64 0, i32 0
  %2 = load float, float* %val.i, align 8
  %wide.trip.count = zext i32 %call to i64
  %min.iters.check = icmp ult i32 %call, 4
  br i1 %min.iters.check, label %for.body.preheader, label %min.iters.checked

for.body.preheader: ; preds = %middle.block, %min.iters.checked, %for.body.lr.ph
  %indvars.iv.ph = phi i64 [ 0, %min.iters.checked ], [ 0, %for.body.lr.ph ], [ %n.vec, %middle.block ]
  br label %for.body

min.iters.checked: ; preds = %for.body.lr.ph
  %3 = and i32 %call, 3
  %n.mod.vf = zext i32 %3 to i64
  %n.vec = sub nsw i64 %wide.trip.count, %n.mod.vf
  %cmp.zero = icmp eq i64 %n.vec, 0
  br i1 %cmp.zero, label %for.body.preheader, label %vector.body.preheader

vector.body.preheader: ; preds = %min.iters.checked
  br label %vector.body

vector.body: ; preds = %vector.body.preheader, %vector.body
  %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ]
  %4 = getelementptr inbounds float, float* %2, i64 %index
  %5 = bitcast float* %4 to <4 x float>*
  %wide.load = load <4 x float>, <4 x float>* %5, align 4
  %6 = fpext <4 x float> %wide.load to <4 x ppc_fp128>
  %7 = fadd <4 x ppc_fp128> %6, undef
  %8 = fptrunc <4 x ppc_fp128> %7 to <4 x float>
  %9 = bitcast float* %4 to <4 x float>*
  store <4 x float> %8, <4 x float>* %9, align 4
  %index.next = add i64 %index, 4
  %10 = icmp eq i64 %index.next, %n.vec
  br i1 %10, label %middle.block, label %vector.body

middle.block: ; preds = %vector.body
  %cmp.n = icmp eq i32 %3, 0
  br i1 %cmp.n, label %for.cond.cleanup, label %for.body.preheader

for.cond.cleanup.loopexit: ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %middle.block, %entry
  ret void

for.body: ; preds = %for.body.preheader, %for.body
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %indvars.iv.ph, %for.body.preheader ]
  %arrayidx.i = getelementptr inbounds float, float* %2, i64 %indvars.iv
  %11 = load float, float* %arrayidx.i, align 4
  %conv = fpext float %11 to ppc_fp128
  %add = fadd ppc_fp128 %conv, undef
  %conv4 = fptrunc ppc_fp128 %add to float
  store float %conv4, float* %arrayidx.i, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
  br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
; CHECK: stxsdx
; CHECK: lxvd2x
}

declare signext i32 @_ZNK10FullMatrixIgE5m_fn1Ev(%class.FullMatrix*) local_unnamed_addr #1