This is an archive of the discontinued LLVM Phabricator instance.

[FPEnv] Don't transform FSUB(-0, X) -> FNEG(X) in SelectionDAGBuilder.
ClosedPublic

Authored by cameron.mcinally on Jul 17 2020, 10:47 AM.

Details

Summary

This is a subset of the patch in D73978. Hopefully this change is easier to digest...

This patch stops unconditionally transforming FSUB(-0, X) into an FNEG(X) while building the DAG. There are also two small changes to handle the new FSUB(-0,X) similarly to FNEG(X).

I've left some NOTE comments in the tests to illustrate the assembly changes, since I'm not that familiar with AMDGPU. These are just to ease reviewing. I'll remove those comments if the asm looks acceptable.

Diff Detail

Event Timeline

arsenm added inline comments.Jul 20 2020, 6:58 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9274–9275

I thought the point was this cannot be lowered this way?

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmed3.ll
39

This is a code size regression. We probably need

This should just change to use fneg. We could probably consider the fsub case in performFNegCombine though

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9274–9275

This isn't as bad as it seems, I think. This is isCanonicalized(...), so we're just saying the the FSUB(-0,X) *might* not flush denormals (if DAGCombine wants to change it to an FNEG(X) later).

We could check the DenormalMode here though. That would let us know what will happen to the FSUB(-0,X) in DAGCombine. Assuming it's not a can of worms, I could do that. If it is a can-of-worms, might be better left for separate patch. It should be functional with this patch, just not optimal.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmed3.ll
39

This should just change to use fneg.

Are you saying the IR in this test should just be updated to use FNEG?

I'll look to see if there's a clean way to combine FSUB with the FMED...

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmed3.ll
39

This problem is really sticky (and maybe a design flaw in DAGCombine). The initial DAGs for pre/post-Diff are largely the same, except for the FSUB/FNEG node.

The problem arrises in visitFSUB(...), when the FSUB(-0,X) is transformed to FNEG(X). This combine, of course, removes the FSUB(-0,X) from its current position in the Worklist, and the new FNEG(X) is placed at the end of the Worklist.

This new Worklist order is pretty unfortunate. The two transforms that are needed to fold the FNEG into the FMED3 are now out of order. (If DAGCombine ran one more time, it looks like we'd correctly get the fold.)

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmed3.ll
39

Actually, this is a bug in AMDGPUTargetLowering::performFNegCombine(...). Under the FMED3 case, we have this code:

if (Res.getOpcode() != AMDGPUISD::FMED3)
  return SDValue(); // Op got folded away.
if (!Op.hasOneUse()) 
  DAG.ReplaceAllUsesWith(Op, DAG.getNode(ISD::FNEG, SL, VT, Res));
return Res;

That's a problem. We're using RAUW to insert the new user(s) of Res. But we return Res, not the new user(s), to DAGCombine.

DAGCombine will then add the users of Res back to the Worklist, but not the users of the new FNEG. So we miss the new FNEG combine opportunity.

More details to come tomorrow...

Fix the FMED3 DAGCombine problem by explicitly adding the new node's users to the Worklist.

Heads up, @arsenm. There are other instances of this bug in performFNegCombine(...). I see at least two in the other FMA cases.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9274–9275

We could check the DenormalMode here though.

Guarding this code based on denormal mode is easy enough, but it affects the assembly in a number of tests where denormals are enabled. That will make this patch harder to review. It might be better to optimize in a stand-alone Diff.

Thoughts on this?

arsenm added inline comments.Jul 21 2020, 9:05 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmed3.ll
39

Yes, the IR should be changed. Theoretically the fsub case would be another test and a different optimization

arsenm added inline comments.Jul 21 2020, 9:08 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3796–3797 ↗(On Diff #279523)

I don't actually know why this would ever happen?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9274–9275

I think checking for denormals isn't quite correct; canonicalized doesn't just mean denormals are flushed. It also means at minimum signaling nans are quieted

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3796–3797 ↗(On Diff #279523)

Came from D50706. The IR for this particular problem is:

%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float %src1, float %src2)
%neg.med3 = fsub float -0.0, %med3
%med3.user = fmul float %med3, 4.0

The intention is that the FNEG is sunk into the FMED3 operands and the FMUL absorbs it by negating the constant.

Seems like a pretty specific peep to me, but I'm no expert on this instruction...

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9274–9275

That's true, but there is more nuance here...

For the general FSUB case, the result will be canonicalized. No problem there.

For the FSUB(-0,X) case, the result may or may not be canonicalized, depending on if the FSUB is replaced with a FNEG or not. And DAGCombine will only do that FNEG transform if in a particular denormal mode.

So, as long as both this check and the decision to transform FSUB(-0,X)->FNEG(X) are in lockstep, it's not really a problem. But, clearly, this is somewhat brittle and error prone.

Being a little pessimistic in order to more forward isn't the worst thing, assuming we don't regress too much. Figuring out a good solution to this subproblem will be easier (for me at least) if we can isolate it from the larger project.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9274–9275

This is the test that's tripping:

; GCN-LABEL: {{^}}v_test_canonicalize_fneg_fabs_var_f32:
; GFX678: v_mul_f32_e64 [[REG:v[0-9]+]], -1.0, |{{v[0-9]+}}|
; GFX9: v_max_f32_e64 [[REG:v[0-9]+]], -|{{v[0-9]+}}|, -|{{v[0-9]+}}|
; GCN: {{flat|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[REG]]
define amdgpu_kernel void @v_test_canonicalize_fneg_fabs_var_f32(float addrspace(1)* %out) #1 {
  %val = load float, float addrspace(1)* %out
  %val.fabs = call float @llvm.fabs.f32(float %val)
  %val.fabs.fneg = fsub float -0.0, %val.fabs
  %canonicalized = call float @llvm.canonicalize.f32(float %val.fabs.fneg)
  store float %canonicalized, float addrspace(1)* %out
  ret void
}

attributes #1 = { nounwind "denormal-fp-math-f32"="preserve-sign,preserve-sign" }

We need a way for isCanonicalized(...) to return that an FSUB(-0,X) might not canonicalize if it is combined into an FNEG. Otherwise, the @llvm.canonicalize.f32 will be removed and we end up with:

flat_load_dword v2, v[0:1]
s_waitcnt vmcnt(0) lgkmcnt(0)
v_or_b32_e32 v2, 0x80000000, v2
flat_store_dword v[0:1], v2

Maybe we need a isFSUBtoFNEGLegal(...) helper function? That way the check is uniform across all uses?

arsenm added inline comments.Jul 22 2020, 3:05 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9274–9275

Basically none of these tests you're running into are intended to use fsub. They're all fnegs that weren't ported since that's what the source modifiers are intended to directly match. Trying to do anything smarter with folding fsub's is an optimization beyond what this intended to check. I think we should just start from the position that it is not legal to lower fsub -0, x to fneg and work from there

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9274–9275

Basically none of these tests you're running into are intended to use fsub. They're all fnegs that weren't ported since that's what the source modifiers are intended to directly match. Trying to do anything smarter with folding fsub's is an optimization beyond what this intended to check.

Ok, that's fair. To reiterate, the tests should be updated to use FNEG. I'll prepare that patch for you to check out. I apologize for putting that burden on you, but I have very little intuition about the AMDGPU backend, so I want to change as little as possible.

I think we should just start from the position that it is not legal to lower fsub -0, x to fneg and work from there

I think this is the right thing to do, assuming we don't impact out-of-tree targets that might still generate the old FSUB(-0,X) idiom.

Thinking some more, I'm fairly sure there are bugs in opt that mistakenly generate the old FSUB(-0,X) pattern. So those will need to be sorted out too. And I'm not sure we can get away with bulk updating all the llc tests for other targets to use FNEG over FSUB(-0,X).

Threading a needle....

Update the tests failing from the SelectionDAGBuilder change to use FNEG.

arsenm accepted this revision.Jul 29 2020, 9:46 AM

LGTM. Can you also make the same fix in GlobalISel?

This revision is now accepted and ready to land.Jul 29 2020, 9:46 AM

I think this needs to be done in FastISel too?

Apologies for the slow reply. I was on vacation last week...

Thanks. Will do. I still need to unwind the unconditional FNEG transform in DAGCombine too.