The FCOPYSIGN IR instruction takes magnitude and sign arguments that can have different floating-point types. That instruction is typically generated by calls to copysign functions or intrinsics (copysign, copysignf, copysignl, llvm.copysign.f32, etc.) that take two arguments with the same type. Thus, if you call copysign with differently-typed values you'll get an FCOPYSIGN argument that is "casted" to the right type, using either fp_extend/fpext or fp_round/fptrunc. For example:
float copysignf(float x, float y); float foo(float x, double y) { return copysignf(x, y); }
Initial selection DAG: %bb.0 'foo:entry' SelectionDAG has 11 nodes: t0: ch = EntryToken t2: f32,ch = CopyFromReg t0, Register:f32 %0 t4: f64,ch = CopyFromReg t0, Register:f64 %1 t6: f32 = fp_round t4, TargetConstant:i32<0> t7: f32 = fcopysign t2, t6 t9: ch,glue = CopyToReg t0, Register:f32 $f10_32, t7 t10: ch = RISCVISD::RET_FLAG t9, Register:f32 $f10_32, t9:1
The DAGCombiner currently folds away that cast, unless the sign type is an fp128:
Optimized lowered selection DAG: %bb.0 'foo:entry' SelectionDAG has 9 nodes: t0: ch = EntryToken t2: f32,ch = CopyFromReg t0, Register:f32 %0 t4: f64,ch = CopyFromReg t0, Register:f64 %1 t11: f32 = fcopysign t2, t4 t9: ch,glue = CopyToReg t0, Register:f32 $f10_32, t11 t10: ch = RISCVISD::RET_FLAG t9, Register:f32 $f10_32, t9:1
One case where the folding is desirable is when FCOPYSIGN is expanded (in LegalizeDAG). In that case the conversion is pointless since the expanded code extracts the sign bit using integer bitwise operations and that can be easily done whatever the original floating-point type is, thus you save the cost of a doing the extension or round-down. Another case where you want elide the cast is when your ISA handles both floats and doubles using the same internal format and FP hardware (e.g. PowerPC converts everything to double).
On the other hand, when you have native copysign instructions (and don't convert everything to the same internal format) doing the fold is typically undesirable and just creates implementation busywork. The copysign machine instructions probably don't accept mixed precision types, so now you have to create selection patterns for the various FP type combination, using the appropriate rounding/extending instructions to recreate the cast that was folded away. See for instance the WebAssembly target.
In general, I would say that whether you want to fold the cast away or not depends on the target lowering. Currently the DAGCombiner does not use any target hook to check whether the fold should be done or not. Instead, it only checks for f128, with a vague comment about it being problematic on some targets like x86_64 (not clear exactly which...). This hard coded logic doesn't allow the folding to be done for f128 when it actually would be OK, and folds away when the target doesn't actually handle the mixed types (e.g. RISC-V, at the moment). You'll probably agree that having random x86 target limitations infect the DAGCombiner is not ideal. One possible solution is to introduce a target hook, which is what this patch does.
Feedback would be appreciated about the following:
- Is there a way to do introduce a better check in the DAGCombiner that does not require introducing a target hook? I explored a couple of options, but I could only make it work with a hook.
- Nitpick: naming suggestions for the hook. I tried to be consistent with the other hooks but all of the names I could come up with were slightly flawed in some away, including the one in this patch.
- Semantics for the hook. In this patch we do the fold if the target indicates it can handle the given types. Maybe that's not the best way to express the problem.
- Implementation for the hook. By default we consider that we can handle any types if the copysign would be expanded, since generally the expansion code should be able to handle any type unless the target messes that up with the legalization steps (that was the case for a couple of targets). Also, I was initially checking for the expansion outside the hook (in the DAGCombiner) but that failed for one x86 test, so I had to move that to the hook.
- Arguably the comment that was originally on DAGCombiner should be moved to the X86 TargetLowering and adjusted for that new context. But given its vagueness I just deleted it.
- Not all targets have tests for this, covering all of the relevant types. Not sure for which targets and which target options it's worth adding. For instance, for ARM there's a lack of tests for the f128 case, and it needs to be tested with more than -mtriple=arm.
- The alternative to all of this would be to just exhaustively implement FCOPYSIGN in all targets, despite the busywork. Is it realistic that we can encounter FCOPYSIGN with all of the FP types without a cast? Doesn't seem so to me.
- Isn't the root problem of this that we don't have the notion of an instruction being legal depending on the types of the various operands? I checked GlobalISel and it seems like it doesn't solve that either. It seems like it can set legalization actions based on the type of individual operands but not based on combinations of operands. So, more flexible than SelectionDAG but still not flexible enough to solve this without a target hook.
- It seems like it's not the first time people have run into that limitation. For instance, in CodeGenPrepare it says FIXME: always querying the result type is just an approximation; some nodes' legality is determined by the operand or other means. There's no good way to find out though.
Which targets use this default and not SignTy != i128, beyond RISC-V and WebAssembly?