This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][TargetLowering] Target hook for FCOPYSIGN arg cast folding
AbandonedPublic

Authored by luismarques on Aug 25 2019, 7:11 PM.

Details

Summary

The FCOPYSIGN IR instruction takes magnitude and sign arguments that can have different floating-point types. That instruction is typically generated by calls to copysign functions or intrinsics (copysign, copysignf, copysignl, llvm.copysign.f32, etc.) that take two arguments with the same type. Thus, if you call copysign with differently-typed values you'll get an FCOPYSIGN argument that is "casted" to the right type, using either fp_extend/fpext or fp_round/fptrunc. For example:

float copysignf(float x, float y);

float foo(float x, double y) {
    return copysignf(x, y);
}
Initial selection DAG: %bb.0 'foo:entry'
SelectionDAG has 11 nodes:
  t0: ch = EntryToken
      t2: f32,ch = CopyFromReg t0, Register:f32 %0
        t4: f64,ch = CopyFromReg t0, Register:f64 %1
      t6: f32 = fp_round t4, TargetConstant:i32<0>
    t7: f32 = fcopysign t2, t6
  t9: ch,glue = CopyToReg t0, Register:f32 $f10_32, t7
  t10: ch = RISCVISD::RET_FLAG t9, Register:f32 $f10_32, t9:1

The DAGCombiner currently folds away that cast, unless the sign type is an fp128:

Optimized lowered selection DAG: %bb.0 'foo:entry'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
      t2: f32,ch = CopyFromReg t0, Register:f32 %0
      t4: f64,ch = CopyFromReg t0, Register:f64 %1
    t11: f32 = fcopysign t2, t4
  t9: ch,glue = CopyToReg t0, Register:f32 $f10_32, t11
  t10: ch = RISCVISD::RET_FLAG t9, Register:f32 $f10_32, t9:1

One case where the folding is desirable is when FCOPYSIGN is expanded (in LegalizeDAG). In that case the conversion is pointless since the expanded code extracts the sign bit using integer bitwise operations and that can be easily done whatever the original floating-point type is, thus you save the cost of a doing the extension or round-down. Another case where you want elide the cast is when your ISA handles both floats and doubles using the same internal format and FP hardware (e.g. PowerPC converts everything to double).

On the other hand, when you have native copysign instructions (and don't convert everything to the same internal format) doing the fold is typically undesirable and just creates implementation busywork. The copysign machine instructions probably don't accept mixed precision types, so now you have to create selection patterns for the various FP type combination, using the appropriate rounding/extending instructions to recreate the cast that was folded away. See for instance the WebAssembly target.

In general, I would say that whether you want to fold the cast away or not depends on the target lowering. Currently the DAGCombiner does not use any target hook to check whether the fold should be done or not. Instead, it only checks for f128, with a vague comment about it being problematic on some targets like x86_64 (not clear exactly which...). This hard coded logic doesn't allow the folding to be done for f128 when it actually would be OK, and folds away when the target doesn't actually handle the mixed types (e.g. RISC-V, at the moment). You'll probably agree that having random x86 target limitations infect the DAGCombiner is not ideal. One possible solution is to introduce a target hook, which is what this patch does.

Feedback would be appreciated about the following:

  • Is there a way to do introduce a better check in the DAGCombiner that does not require introducing a target hook? I explored a couple of options, but I could only make it work with a hook.
  • Nitpick: naming suggestions for the hook. I tried to be consistent with the other hooks but all of the names I could come up with were slightly flawed in some away, including the one in this patch.
  • Semantics for the hook. In this patch we do the fold if the target indicates it can handle the given types. Maybe that's not the best way to express the problem.
  • Implementation for the hook. By default we consider that we can handle any types if the copysign would be expanded, since generally the expansion code should be able to handle any type unless the target messes that up with the legalization steps (that was the case for a couple of targets). Also, I was initially checking for the expansion outside the hook (in the DAGCombiner) but that failed for one x86 test, so I had to move that to the hook.
  • Arguably the comment that was originally on DAGCombiner should be moved to the X86 TargetLowering and adjusted for that new context. But given its vagueness I just deleted it.
  • Not all targets have tests for this, covering all of the relevant types. Not sure for which targets and which target options it's worth adding. For instance, for ARM there's a lack of tests for the f128 case, and it needs to be tested with more than -mtriple=arm.
  • The alternative to all of this would be to just exhaustively implement FCOPYSIGN in all targets, despite the busywork. Is it realistic that we can encounter FCOPYSIGN with all of the FP types without a cast? Doesn't seem so to me.
  • Isn't the root problem of this that we don't have the notion of an instruction being legal depending on the types of the various operands? I checked GlobalISel and it seems like it doesn't solve that either. It seems like it can set legalization actions based on the type of individual operands but not based on combinations of operands. So, more flexible than SelectionDAG but still not flexible enough to solve this without a target hook.
    • It seems like it's not the first time people have run into that limitation. For instance, in CodeGenPrepare it says FIXME: always querying the result type is just an approximation; some nodes' legality is determined by the operand or other means. There's no good way to find out though.

Diff Detail

Event Timeline

luismarques created this revision.Aug 25 2019, 7:11 PM
lenary added inline comments.Aug 27 2019, 8:48 AM
llvm/include/llvm/CodeGen/TargetLowering.h
2609

Which targets use this default and not SignTy != i128, beyond RISC-V and WebAssembly?

llvm/test/CodeGen/RISCV/copysign-casts.ll
19

NIT: Can you add noexcept to these tests?

luismarques marked an inline comment as done.

Adds nounwind attribute to the RISC-V tests.

luismarques marked an inline comment as done.Aug 27 2019, 4:45 PM
luismarques added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
2609

AArch64 overrides the hook but accepts all types, including f128. In-tree targets that keep the default implementation in this patch are ARC, AVR, BPF, Lanai, MSP430, NVPTX, Sparc, RISCV and WebAssembly.

luismarques marked an inline comment as done.Aug 27 2019, 4:54 PM
luismarques added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
2609

...and XCore.

lenary marked an inline comment as done.Sep 9 2019, 7:56 AM

I am happy with this. I think this hook is the correct way to go about choosing how to do this optimisation, and accurately conveys what's going on.

I would like to see a review by someone who works on cross-target parts of DAGCombiner, before this is landed.

llvm/include/llvm/CodeGen/TargetLowering.h
2609

Ok, nice, I just wanted to make sure this was a sensible default, rather than being overriden everywhere.

I would like to see a review by someone who works on cross-target parts of DAGCombiner, before this is landed.

@efriedma would you be willing to review this?

emaste added a subscriber: emaste.Oct 18 2019, 11:57 AM
bsdjhb added a subscriber: bsdjhb.Oct 21 2019, 1:29 PM

FYI, this change allowed me to compile and run a hard-float FreeBSD RISC-V userland. (With stock LLVM the build trips over this assertion)

For FCOPYSIGN, specifically, LegalizeDAG is going to query the target to ask, "is FCOPYSIGN legal for result type X", using the getOperationAction() API? The target has a few different ways to respond: here, the relevant possibilities are "Legal", "Expand", or "Custom". I guess the issue here is that on RISCV, this query is returning "Legal", when it actually isn't legal for all combinations of legal result/input types?

If this is a problem for a bunch of targets, maybe we need a different API for expressing whether an FCOPYSIGN is legal. I'd prefer to follow existing convention for other operations which involve multiple types, though; for example, see TargetLoweringBase::getLoadExtAction.

brooks added a subscriber: brooks.Nov 6 2019, 11:34 AM

Rebased and tweaked patch.

luismarques edited the summary of this revision. (Show Details)Nov 10 2019, 3:52 AM

For FCOPYSIGN, specifically, LegalizeDAG is going to query the target to ask, "is FCOPYSIGN legal for result type X", using the getOperationAction() API? The target has a few different ways to respond: here, the relevant possibilities are "Legal", "Expand", or "Custom". I guess the issue here is that on RISCV, this query is returning "Legal", when it actually isn't legal for all combinations of legal result/input types?
If this is a problem for a bunch of targets, maybe we need a different API for expressing whether an FCOPYSIGN is legal. I'd prefer to follow existing convention for other operations which involve multiple types, though; for example, see TargetLoweringBase::getLoadExtAction.

@efriedma: one problem is that whether we want to do the combine or not doesn't map well to whether FCOPYSIGN is legal (for return type X). For instance, if you don't have an FPU and you expand the copysign (to integer bit manipulations) you actually still want to do the combine, since it's pointless to emit a libcall for promoting/truncating the floating-point sign value -- it's just as easy to extract the sign bit from the original value. In any case, I experimented with the proposed approach and I will submit the patch for review for comparison. But I think it ends up addressing this problem in an even kludgier way (I'll add some details in that patch's summary).

Add nounwind to tests.

dmz added a subscriber: dmz.Nov 12 2019, 9:39 AM
mhorne added a subscriber: mhorne.Nov 16 2019, 2:50 PM
luismarques abandoned this revision.Nov 27 2019, 7:04 AM

Abandoned in favour of D70679 (and possibly other future patches).