Details

Reviewers

srhines
t.p.northover
pirama
fhahn
olista01
arsenm

Commits

rGd68fa1be57ae: [SelectionDAG] Fixed f16-from-vector promotion problem
rL322120: [SelectionDAG] Fixed f16-from-vector promotion problem

Summary

In the case of an fp_extend of v1f16 to v1f32 where the v1f16 is the
result of a bitcast from i16, avoid creating an illegal fp16_to_fp where
the input is not a vector and the result is a v1f32.

Diff Detail

Build Status

Buildable 13209
Build 13209: arc lint + arc unit

Event Timeline

tpr created this revision. Dec 12 2017, 12:07 PM

Harbormaster completed remote builds in B13030: Diff 126599. Dec 12 2017, 12:07 PM

Herald added a subscriber: nhaehnle. Dec 12 2017, 12:07 PM

tpr added reviewers: pirama, fhahn, olista01. Dec 12 2017, 12:09 PM

arsenm added inline comments. Dec 12 2017, 12:33 PM

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
1977 ↗	(On Diff #126599)	I don't think this code should be encountering a v1* anything, i.e. getTypeAction(v1f16) should not be TypePromoteFloat?
1979 ↗	(On Diff #126599)	Shouldn't this be v1f16?
test/CodeGen/AMDGPU/unpack-half.ll
10	instnamer and check something
21	You can probably replace the intrinsics with a regular load and store

tpr added inline comments. Dec 12 2017, 2:41 PM

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
1977 ↗	(On Diff #126599)	OK, thanks, I'll have a look at that.
1979 ↗	(On Diff #126599)	I believe the idea of fp16_to_fp is that its input is an f16 but as an i16 type. That's what the code seems to be doing. This all seems to be to cope with f16 on hw that does not implement f16, so it does not do any of this on AMDGPU gfx800 and later because that implements hw f16.
test/CodeGen/AMDGPU/unpack-half.ll
10	Do you think I need to check something even though the point of this test is to check that the compiler does not get a fatal error? What do you mean "instnamer"?

arsenm added inline comments. Dec 12 2017, 3:01 PM

test/CodeGen/AMDGPU/unpack-half.ll
10	Usually tests without checks are frowned upon. For this it should reduce down to maybe one conversion instruction and the store. Run the pass instnamer on the test. I don't like committing tests with anonymous values since it makes them harder to modify in the future.

tpr added inline comments. Dec 13 2017, 1:29 AM

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
1977 ↗	(On Diff #126599)	getTypeAction(v1f16) is TypeScalarizeVector, but it is still getting here.

Ran instnamer on the test.

Also added a check for the v_cvt_f32_f16 instruction to the test.

tpr marked 4 inline comments as done. Dec 13 2017, 2:21 AM

tpr added inline comments.

test/CodeGen/AMDGPU/unpack-half.ll
21	That made it crash, probably because I didn't get the right address space or something. I think it's easier to leave it with the intrinsics from the original reproducer.

Are you running with asserts disabled? I see:
ScalarizeVectorOperand Op #0: t34: f32 = fp16_to_fp t25

LLVM ERROR: Do not know how to scalarize this operator's operand!

and sure enough ScalarizeVectorOperand doesn't handle this.

test/CodeGen/AMDGPU/unpack-half.ll
21	Replacing this with %tmp = load volatile float, float addrspace(1)* undef and store volatile i32 %tmp6, i32 addrspace(1)* undef works for me

Are you running with asserts disabled? I see:
ScalarizeVectorOperand Op #0: t34: f32 = fp16_to_fp t25

LLVM ERROR: Do not know how to scalarize this operator's operand!

and sure enough ScalarizeVectorOperand doesn't handle this.

Yes; that's the bug I'm fixing. :-)

In D41126#953997, @tpr wrote:

Are you running with asserts disabled? I see:
ScalarizeVectorOperand Op #0: t34: f32 = fp16_to_fp t25

LLVM ERROR: Do not know how to scalarize this operator's operand!

and sure enough ScalarizeVectorOperand doesn't handle this.

Yes; that's the bug I'm fixing. :-)

But your patch is hacking around it in PromoteFloatRes_BITCAST rather than handling the missing scalarization it's complaining about there

It can't handle it because it is a fp16_to_fp whose input is v1i16 and result is f32. My theory was that that is illegal, and I needed to fix where that was being generated, which is what I have done.

Do you think my theory is wrong?

Sorry, I mean the other way round: the fp16_to_fp has input i16 and result v1f32, and my theory was that this is illegal.

In D41126#954337, @tpr wrote:

Sorry, I mean the other way round: the fp16_to_fp has input i16 and result v1f32, and my theory was that this is illegal.

OK, I see. ScalarizeVectorOperand doesn't handle this, but it's already broken at this point because it isn't v1*<->v1*. Maybe bitcast is special, but I still would not expect to see a v1 input here. You aren't just doing a promote here, you are also manually doing the scalarize. The equivalent bitcast promote for integer types does not worry about this, but both should have the same problem. I think something is off, and ScalarizeVecOp_BITCAST should be fixing this

OK, I'll have another look tomorrow. The reason I thought that was the right place to do it is that it is folding an i16 -> v1i16 bitcast into the float promotion without thinking about the fact that the bitcast also adds vectorness.

Thanks.

So are you saying that having an i16 -> v1i16 bitcast there is bad?

Legalizing node: t28: v1i16 = truncate t26
Analyzing result type: v1i16
Scalarize node result 0: t28: v1i16 = truncate t26

Creating new node: t34: i16 = truncate t11
Legalizing node: t30: v1f16 = bitcast t28
Analyzing result type: v1f16
Scalarize node result 0: t30: v1f16 = bitcast t28

Creating new node: t35: f16 = bitcast t28
Legalizing node: t32: v1f32 = fp_extend t30
Analyzing result type: v1f32
Scalarize node result 0: t32: v1f32 = fp_extend t30

Creating new node: t36: f32 = fp_extend t35
Legalizing node: t35: f16 = bitcast t28
Analyzing result type: f16

When it scalarizes the t28 = trunc, it creates a new t34. But then when it scalarizes t30 = bitcast, it creates t35 = bitcast without bothering to scalarize its input t28 to the already existing t34. That's why I've got an i16 -> v1i16 bitcast. Is that bad?
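The situation described above can be sketched with a small stand-alone model. This is only an illustration under stated assumptions: `ToyNode`, `ToyLegalizer`, `makeExample`, and both `scalarizeBitcast*` functions are hypothetical names invented here, not LLVM classes or APIs. The node ids 28, 30, and 34 mirror t28, t30, and t34 from the debug log.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a DAG node; this is NOT LLVM's SDNode.
struct ToyNode {
  std::string Opcode; // e.g. "truncate" or "bitcast"
  bool IsV1Vector;    // true if the result type is a one-element vector
  int Operand;        // id of the single operand node, or -1 if not modelled
};

// Toy stand-in for the legalizer state (hypothetical, for illustration only).
struct ToyLegalizer {
  std::map<int, ToyNode> Nodes;  // node id -> node
  std::map<int, int> Scalarized; // v1 node id -> id of its scalar replacement
  int NextId = 100;

  int addNode(const ToyNode &N) {
    Nodes[NextId] = N;
    return NextId++;
  }

  // Behaviour described in the thread before the fix: the new scalar
  // bitcast keeps the original (possibly v1) operand.
  int scalarizeBitcastOld(int Id) {
    const ToyNode N = Nodes.at(Id);
    return addNode(ToyNode{"bitcast", false, N.Operand});
  }

  // Behaviour after the fix: a v1 operand is swapped for its
  // already-scalarized replacement before the new bitcast is created.
  int scalarizeBitcastNew(int Id) {
    const ToyNode N = Nodes.at(Id);
    int Op = N.Operand;
    if (Nodes.at(Op).IsV1Vector)
      Op = Scalarized.at(Op);
    return addNode(ToyNode{"bitcast", false, Op});
  }
};

// Builds the state from the log: t28 (v1i16 truncate) already has the
// scalar replacement t34, and t30 is a v1f16 bitcast of t28.
ToyLegalizer makeExample() {
  ToyLegalizer L;
  L.Nodes[28] = ToyNode{"truncate", true, -1};
  L.Nodes[34] = ToyNode{"truncate", false, -1};
  L.Nodes[30] = ToyNode{"bitcast", true, 28};
  L.Scalarized[28] = 34;
  return L;
}
```

In this model, `scalarizeBitcastOld(30)` yields a scalar bitcast that still reads the v1 node 28, while `scalarizeBitcastNew(30)` yields one that reads the scalar node 34.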

V2: The fix is now to avoid vector scalarization creating a v1->scalar bitcast.

Addressed review comments on test.

Another test tidy-up.

tpr marked an inline comment as done. Dec 16 2017, 8:21 AM

arsenm added inline comments. Dec 18 2017, 9:27 AM

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
172–178	There is already ScalarizeVecOp_BITCAST, so it seems the intent was this is a separate step. My question is more of why isn't that happening already in this example

tpr added inline comments. Dec 19 2017, 9:34 AM

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
172–178	Looking at how a single node is handled in DAGTypeLegalizer::run: First it looks at the result type. If it needs legalizing, it calls the applicable function (ScalarizeVectorResult in our case), and then skips past the code that checks the operand types. It is assuming that the ScalarizeVecRes_* also scalarized the operand(s) where applicable. That seems to be true for everything except bitcast, presumably because the operand type is less predictable. So that is why it is not calling ScalarizeVecOp_BITCAST. Can you think of a better way of fixing this? Maybe special case bitcast so, after creating the new one in ScalarizeVecRes_BITCAST, it somehow gets added back to the worklist so its operand gets scanned? Or shall we stick with the fix I have?
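The control flow described in the comment above can be modelled in a few lines. This is a deliberately simplified sketch, not the code of DAGTypeLegalizer::run; `Handled` and `legalizeOneNode` are hypothetical names, and the two booleans stand in for the real type-action queries.

```cpp
#include <cassert>

// Which handler the toy loop picked for a node.
enum class Handled { Result, Operand, None };

// Toy version of the per-node decision: if the result type needs
// legalizing, the result handler runs and the operand scan for this
// node is skipped, on the assumption that the result handler also
// took care of the operands.
Handled legalizeOneNode(bool ResultNeedsLegalizing,
                        bool OperandNeedsLegalizing) {
  if (ResultNeedsLegalizing)
    return Handled::Result; // operand scan never happens for this node
  if (OperandNeedsLegalizing)
    return Handled::Operand;
  return Handled::None;
}
```

For a bitcast whose result and operand both need legalizing, the model returns `Handled::Result`, which matches the observation that ScalarizeVecOp_BITCAST is not reached in this example.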

If there are no further comments, I'll stick with the fix I now have and land it.

LGTM

This revision is now accepted and ready to land. Jan 9 2018, 7:59 AM

arsenm added inline comments. Jan 9 2018, 8:03 AM

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
175	Can you add an assert that the v1 type isn't legal?

Closed by commit rL322120: [SelectionDAG] Fixed f16-from-vector promotion problem (authored by tpr). Jan 9 2018, 1:37 PM

This revision was automatically updated to reflect the committed changes.

samparker added a subscriber: samparker. Jan 11 2018, 6:30 AM

samparker added inline comments.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
175	Hi, The added assert is causing issues in our AArch64 tests... why is it necessary?

arsenm added inline comments. Jan 11 2018, 7:27 AM

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
175	It probably isn't necessary, but unlike the other operations handled here, it would make sense for bitcast to not separately scalarize its operand if v1 is a legal type. However, v1 as legal is a degenerate case which probably should not be allowed. It looks like AArch64 is using this as a hack for some reason with a FIXME about it.
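One way to read the comment above is that the operand should be scalarized only when the one-element vector type is not itself legal. The predicate below is a hypothetical sketch of that reading; the name `shouldScalarizeBitcastOperand` and its boolean parameters are assumptions made for illustration and are not taken from LLVM or from any follow-up patch.

```cpp
#include <cassert>

// Hypothetical predicate (not LLVM code): scalarize the bitcast operand
// only if it is a one-element vector whose type the target does not
// treat as legal. A target with a legal v1 type keeps the operand as-is.
bool shouldScalarizeBitcastOperand(bool OperandIsV1Vector,
                                   bool V1TypeIsLegal) {
  return OperandIsV1Vector && !V1TypeIsLegal;
}
```

Written as a condition rather than an assert, a target where a v1 type is legal simply takes the other branch instead of aborting.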

samparker added inline comments. Jan 11 2018, 7:53 AM

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
175	Could you point me to this FIXME please? I'm still confused as to why we should prevent a scalar from being produced just because the vector is legal, is this because of how the legalizer is expected to operate?

arsenm added inline comments. Jan 11 2018, 7:58 AM

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
175	AArch64ISelLowering.cpp:596
	if (Subtarget->hasNEON()) {
	  // FIXME: v1f64 shouldn't be legal if we can avoid it, because it leads to
	  // silliness like this:
	Bitcast is possibly special because it is a conversion. If the legalizer is expecting to leave some v1 operations in it, then it would probably be required to leave the bitcast in case some operation was expecting to legalize in terms of a bitcast from a v1 vector.

samparker mentioned this in D42097: [SelectionDAG] Convert assert to condtion. Jan 16 2018, 3:24 AM

Diff 127248

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

 SDValue DAGTypeLegalizer::ScalarizeVecRes_MERGE_VALUES(SDNode *N,
                                                        unsigned ResNo) {
   SDValue Op = DisintegrateMERGE_VALUES(N, ResNo);
   return GetScalarizedVector(Op);
 }

 SDValue DAGTypeLegalizer::ScalarizeVecRes_BITCAST(SDNode *N) {
+  SDValue Op = N->getOperand(0);
+  if (Op.getValueType().isVector()
+      && Op.getValueType().getVectorNumElements() == 1)
+    Op = GetScalarizedVector(Op);
   EVT NewVT = N->getValueType(0).getVectorElementType();
   return DAG.getNode(ISD::BITCAST, SDLoc(N),
-                     NewVT, N->getOperand(0));
+                     NewVT, Op);
 }

 SDValue DAGTypeLegalizer::ScalarizeVecRes_BUILD_VECTOR(SDNode *N) {
   EVT EltVT = N->getValueType(0).getVectorElementType();
   SDValue InOp = N->getOperand(0);
   // The BUILD_VECTOR operands may be of wider element types and
   // we may need to truncate them back to the requested return type.
   if (EltVT.isInteger())

test/CodeGen/AMDGPU/unpack-half.ll

This file was added.

; RUN: llc -march=amdgcn -mcpu=gfx600 -verify-machineinstrs < %s | FileCheck %s
; RUN: llc -march=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck %s

; On gfx6 and gfx7, this test shows a bug in SelectionDAG where scalarizing the
; extension of a vector of f16 generates an illegal node that errors later.

; CHECK-LABEL: {{^}}main:
; CHECK: v_cvt_f32_f16

define amdgpu_gs void @main(i32 inreg %arg) local_unnamed_addr #0 {
.entry:
  %tmp = load volatile float, float addrspace(1)* undef
  %tmp1 = bitcast float %tmp to i32
  %im0.i = lshr i32 %tmp1, 16
  %tmp2 = insertelement <2 x i32> undef, i32 %im0.i, i32 1
  %tmp3 = trunc <2 x i32> %tmp2 to <2 x i16>
  %tmp4 = bitcast <2 x i16> %tmp3 to <2 x half>
  %tmp5 = fpext <2 x half> %tmp4 to <2 x float>
  %bc = bitcast <2 x float> %tmp5 to <2 x i32>
  %tmp6 = extractelement <2 x i32> %bc, i32 1
  store volatile i32 %tmp6, i32 addrspace(1)* undef
  ret void
}

attributes #0 = { nounwind }

This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Fixed f16-from-vector promotion problem
ClosedPublic
