This is an archive of the discontinued LLVM Phabricator instance.

Differential D85804

[AMDGPU] Fix crash when dag-combining bitcast
ClosedPublic

Authored by ruiling on Aug 11 2020, 7:49 PM.


Details

Reviewers

arsenm

Commits

rG18b1e675232b: [AMDGPU] Fix crash when dag-combining bitcast

Summary

The code after the 'break' handles only 64-bit scalar and vector bitcasts,
so I think the break condition should be (cond1 || cond2). This means we
only execute the following code if the destination is 64-bit and a vector.

Also remove a previous fix which is not needed with this new fix.
(introduced in: 1349a04ef5f594dda705ec80474dda4837f26dba)
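
The effect of changing the guard from && to || can be checked in isolation. The following is a minimal sketch, not the LLVM code: FakeVT is a hypothetical stand-in for the two queries the guard makes (getSizeInBits() and isVector()), and the function names are invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the two EVT queries used by the guard.
// This is NOT LLVM's EVT class.
struct FakeVT {
  unsigned SizeInBits;
  bool IsVector;
};

// Guard before the patch: break only when BOTH conditions hold.
constexpr bool breaksBefore(FakeVT VT) {
  return VT.SizeInBits != 64 && !VT.IsVector;
}

// Guard after the patch: break when EITHER condition holds.
constexpr bool breaksAfter(FakeVT VT) {
  return VT.SizeInBits != 64 || !VT.IsVector;
}

constexpr FakeVT V1I32{32, true};  // <1 x i32>, the type from the test case
constexpr FakeVT V2I32{64, true};  // <2 x i32>, the type the fold is meant for
constexpr FakeVT I64{64, false};   // 64-bit scalar

void checkGuards() {
  // Old guard: a 32-bit vector falls through into the 64-bit constant fold.
  assert(!breaksBefore(V1I32));
  // New guard: the 32-bit vector breaks out early.
  assert(breaksAfter(V1I32));
  // Both guards let a 64-bit vector reach the fold.
  assert(!breaksBefore(V2I32));
  assert(!breaksAfter(V2I32));
  // The new guard also stops 64-bit scalars, which the old one let through.
  assert(!breaksBefore(I64));
  assert(breaksAfter(I64));
}
```

With the || form, only destinations that are both 64 bits wide and vectors reach the constant-folding code, which matches the intent stated in the summary.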

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ruiling created this revision. Aug 11 2020, 7:49 PM

Herald added a project: Restricted Project. Aug 11 2020, 7:49 PM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 7 others.

ruiling requested review of this revision. Aug 11 2020, 7:49 PM

Herald added a subscriber: wdng. Aug 11 2020, 7:49 PM

Harbormaster completed remote builds in B68047: Diff 284955. Aug 11 2020, 8:20 PM

arsenm added inline comments. Aug 12 2020, 5:34 AM

llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.ll
306	Does this need the constant canonicalize? I'd rather avoid relying on some specific constant folding behavior. I'm also not sure how this produces a 64-bit value

ruiling added inline comments. Aug 12 2020, 6:28 AM

llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.ll
306	I am not sure right now why the constant canonicalize was generated. The IR in the test case becomes a bitcast from an fp32 constant to <1 x i32> in SelectionDAG. I cannot write IR that directly bitcasts a constantfp to <1 x i32>, because that is folded into an i32 constant during optimization. The break condition here is wrong: a bitcast from constantfp to <1 x i32> should not execute this code, which handles 64-bit bitcasts. It should break out early. But the break condition evaluates to false because DestVT is a vector, so a 32-bit bitcast falls into if (ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(Src)). That is why I changed the break condition from && to ||. I think a bitcast from an fp32 constant to <2 x i16> or <4 x i8> also needs this fix.

LGTM. I'm slightly worried InstSimplify may get too smart and start folding out the intrinsic before it hits the DAG someday, but that probably won't happen

This revision is now accepted and ready to land. Aug 12 2020, 6:41 AM

I see. Is there another way to keep a bitcast from constantfp to i32 at the LLVM IR level from being optimized away before the DAG?

In D85804#2213241, @ruiling wrote:

I see. Is there another way to keep a bitcast from constantfp to i32 at the LLVM IR level from being optimized away before the DAG?

I don't have any great ideas. It can be difficult to figure out how to produce operations that will produce the right type combination at the right time in the DAG

Closed by commit rG18b1e675232b: [AMDGPU] Fix crash when dag-combining bitcast (authored by ruiling). Aug 12 2020, 7:24 PM

This revision was automatically updated to reflect the committed changes.

ruiling added a commit: rG18b1e675232b: [AMDGPU] Fix crash when dag-combining bitcast.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp	16 lines
llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.ll	12 lines

Diff 284955

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Context: if (DestVT.isVector()) {

         CastedElts.push_back(DAG.getNode(ISD::BITCAST, DL, DestEltVT, Elt));
       }

       return DAG.getBuildVector(DestVT, SL, CastedElts);
     }
   }
 }

-if (DestVT.getSizeInBits() != 64 && !DestVT.isVector())
+if (DestVT.getSizeInBits() != 64 || !DestVT.isVector())
   break;

 // Fold bitcasts of constants.
 //
 // v2i32 (bitcast i64:k) -> build_vector lo_32(k), hi_32(k)
 // TODO: Generalize and move to DAGCombiner
 SDValue Src = N->getOperand(0);
 if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Src)) {
-  if (Src.getValueType() == MVT::i64) {
   SDLoc SL(N);
   uint64_t CVal = C->getZExtValue();
   SDValue BV = DAG.getNode(ISD::BUILD_VECTOR, SL, MVT::v2i32,
                            DAG.getConstant(Lo_32(CVal), SL, MVT::i32),
                            DAG.getConstant(Hi_32(CVal), SL, MVT::i32));
   return DAG.getNode(ISD::BITCAST, SL, DestVT, BV);
-  }
 }

 if (ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(Src)) {
   const APInt &Val = C->getValueAPF().bitcastToAPInt();
   SDLoc SL(N);
   uint64_t CVal = Val.getZExtValue();
   SDValue Vec = DAG.getNode(ISD::BUILD_VECTOR, SL, MVT::v2i32,
                             DAG.getConstant(Lo_32(CVal), SL, MVT::i32),
                             DAG.getConstant(Hi_32(CVal), SL, MVT::i32));
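
The fold in this hunk splits a 64-bit constant into low and high 32-bit halves and builds a v2i32 from them. The sketch below mimics that arithmetic in plain C++ to show why a 32-bit float constant does not belong on this path. It is an illustration only: lo32, hi32, splitF64Bits and splitF32Bits are hypothetical local helpers (the first two stand in for the Lo_32/Hi_32 calls in the diff), and it assumes IEEE-754 float and double as on common targets.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

// Local stand-ins for the Lo_32/Hi_32 helpers that appear in the diff.
constexpr std::uint32_t lo32(std::uint64_t V) {
  return static_cast<std::uint32_t>(V & 0xFFFFFFFFu);
}
constexpr std::uint32_t hi32(std::uint64_t V) {
  return static_cast<std::uint32_t>(V >> 32);
}

// Split the bit pattern of a 64-bit float into {lo, hi} 32-bit halves,
// analogous to building a v2i32 from an f64 constant.
std::pair<std::uint32_t, std::uint32_t> splitF64Bits(double D) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "sketch assumes a 64-bit double");
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return {lo32(Bits), hi32(Bits)};
}

// The same split applied to a 32-bit float whose bits were zero-extended
// to 64 bits. The result is still two 32-bit elements (64 bits in total),
// which cannot be bitcast to a 32-bit destination such as <1 x i32>.
std::pair<std::uint32_t, std::uint32_t> splitF32Bits(float F) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t Bits32;
  std::memcpy(&Bits32, &F, sizeof(Bits32));
  std::uint64_t Bits = Bits32;  // zero-extension
  return {lo32(Bits), hi32(Bits)};
}
```

For a 32-bit source the high half is always zero, so the split produces a 64-bit vector that does not match a 32-bit destination type. That size mismatch is the situation the corrected break condition avoids.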

llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.ll

 ; FUNC-LABEL: {{^}}bitcast_v4f32_to_v2i64:
 ; GCN: s_buffer_load_dwordx4
 define <2 x i64> @bitcast_v4f32_to_v2i64(<2 x i64> %arg) {
   %val = call <4 x float> @llvm.amdgcn.s.buffer.load.v4f32(<4 x i32> undef, i32 0, i32 0)
   %cast = bitcast <4 x float> %val to <2 x i64>
   %div = udiv <2 x i64> %cast, %arg
   ret <2 x i64> %div
 }
+
+declare half @llvm.canonicalize.f16(half)
+
+; FUNC-LABEL: {{^}}bitcast_f32_to_v1i32:
+define amdgpu_kernel void @bitcast_f32_to_v1i32(i32 addrspace(1)* %out) {
+  %f16 = call arcp afn half @llvm.canonicalize.f16(half 0xH03F0)
+  %f32 = fpext half %f16 to float
+  %v = bitcast float %f32 to <1 x i32>
+  %v1 = extractelement <1 x i32> %v, i32 0
+  store i32 %v1, i32 addrspace(1)* %out
+  ret void
+}