This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] fold (fp_round (int_to_fp X)) -> (int_to_fp X)
AbandonedPublic

Authored by lenary on Apr 30 2020, 10:56 AM.

Details

Summary

This is a follow-up to D78906 which moves the required fold from a SelectionDAG
pattern into a (target-independent) DAGCombiner fold.

I believe the fold is valid if result type of the round can still accurately
represent the original integer type (signed or unsigned), and the intermediate
fp type is wider than the resulting fp type.

An example from these tests is where the original integer type is i32, the
intermediate float type is double, and the resulting fp type is float. My
understanding in this case is that the fp_round removes as much information as
just doing the int_to_fp directly.

The advantage of doing it in the DAGCombiner in this case is that it allows the
fold to happen, even if the intermediate floating point value is not a legal
type. In that case, the legaliser will turn the round and the int_to_fp into two
separate libcalls, before patterns can be used. The DAGCombiner does not have
this issue, as it is run before legalisation.

This patch needs a thorough review with respect to FP semantics. I am not
particularly familiar with how rounding and the like will affect this fold, and
I did find it hard to get the fold conditions correct to avoid infinite loops in
the AArch64 backend (which affected fp16 types), but to have the optimisation
work on the examples I expected in the RISC-V backend.

Diff Detail

Event Timeline

lenary created this revision.Apr 30 2020, 10:56 AM

(Alive2 is happy with the transform; see http://volta.cs.utah.edu:8080/z/oR3wKj .)

I did find it hard to get the fold conditions correct to avoid infinite loops in the AArch64 backend (which affected fp16 types),

Can you give an example here?

We probably need some code to avoid creating illegal SINT_TO_FP after legalization; that could lead to an infinite loop. There are lots of examples in DAGCombine to follow that call isOperationLegal.

(Alive2 is happy with the transform; see http://volta.cs.utah.edu:8080/z/oR3wKj .)

I did find it hard to get the fold conditions correct to avoid infinite loops in the AArch64 backend (which affected fp16 types),

Can you give an example here?

We probably need some code to avoid creating illegal SINT_TO_FP after legalization; that could lead to an infinite loop. There are lots of examples in DAGCombine to follow that call isOperationLegal.

This patch itself does not cause infinite loops. However, small changes to the validity checker of the transform seem to cause issues, in the worst case breaking all of the following:

LLVM :: CodeGen/AArch64/arm64-convert-v4f64.ll
LLVM :: CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll
LLVM :: CodeGen/AArch64/complex-int-to-fp.ll
LLVM :: CodeGen/AArch64/f16-instructions.ll
LLVM :: CodeGen/AArch64/fdiv_combine.ll
LLVM :: CodeGen/AArch64/fp16-v16-instructions.ll
LLVM :: CodeGen/AArch64/fp16-v4-instructions.ll
LLVM :: CodeGen/AArch64/fp16-v8-instructions.ll

I did try to ensure that SINT_TO_FP was legal, but I'm not sure the query I had for that was ever correct, because I'm not sure the query hasOperation(ISD::SINT_TO_FP, IntTy) necessarily tells you anything - I think I'm looking for the other half of this query, which says "is it legal to go *to* this FP type with sint_to_fp. Guidance on this would also be helpful :) I'm new to the DAGCombiner.

The worst offendors seemed to be the following. I looked into f16-instructions.ll with llc -debug-only=dagcombine and there was definitely an infinite loop somewhere there.

LLVM :: CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll
LLVM :: CodeGen/AArch64/f16-instructions.ll

However, small changes to the validity checker of the transform seem to cause issues

You mean, we don't have enough test coverage to catch all the issues? :)

Actually, I guess you won't run into issues with the current version of the patch because the legal integer types aren't narrow enough. I assume if you added a call to GetNumSignBits or whatever, the issues would pop back up. The problem is essentially that the fp16 type is "legal", but doesn't have SINT_TO_FP; we instead convert to a 32-bit float, then truncate the float to fp16. Which is exactly the pattern you're matching here.

I think if the result type of the fp_round is a half, the transform is legal for any integer type. half is so tiny that all integer half values fit into the mantissa of a 32-bit float, so there can't be any rounding issues. Alive2 agrees.

I'm not sure the query hasOperation(ISD::SINT_TO_FP, IntTy) necessarily tells you anything

If that returns Legal, it means SINT_TO_FP is legal for all legal floating-point types. If it's not legal for some types, the target has to mark it Custom and add handling for the types in question.

lenary added a comment.May 1 2020, 2:02 AM

Thanks for all the help, I will add the hasOperation(...) check to the fold.

I'm not convinced I'm using the semanticsPrecision from the correct node - I think perhaps I should be using the precision of the final resulting type of the fp_round. Once I've made the hasOperation change, I'll check this potential change against the AArch64 tests again.

lenary updated this revision to Diff 261453.May 1 2020, 4:15 AM

Address @efriedma's feedback and guidance:

  • Use hasOperation to ensure legality. This causes the optimisation not to happen for the testcase on on RV64*F, in this example, I think because i32 is not a legal type on that architecture. I'll follow-up this patch if I can get that working.

I have also refactored the check, and added an even greater commentary of why I
believe the transform to be correct, which I think means it applies in more
cases. This is based off the observation/assumption that fp_round always
removes precision of the final float, so that precision loss can always be
"moved" into the int_to_fp operation.

From http://volta.cs.utah.edu:8080/z/d6wpw5:

----------------------------------------
define float @src(i64 %x) {
%0:
  %xx = sitofp i64 %x to double
  %xxx = fptrunc double %xx to float
  ret float %xxx
}
=>
define float @tgt(i64 %x) {
%0:
  %xx = sitofp i64 %x to float
  ret float %xx
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i64 %x = #x0810400800000010 (581034755034710032)

Source:
double %xx = #x43a0208010000000 (581034755034710016)
float %xxx = #x5d010400 (581034720674971648)

Target:
float %xx = #x5d010401 (581034789394448384)
Source value: #x5d010400 (581034720674971648)
Target value: #x5d010401 (581034789394448384)

Use hasOperation to ensure legality. This causes the optimisation not to happen for the testcase on on RV64*F, in this example, I think because i32 is not a legal type on that architecture. I'll follow-up this patch if I can get that working.

You could relax the check a bit; if !LegalTypes, you don't need to worry whether the operation is legal.

Actually, it might make sense to do this in instcombine instead of DAGCombine.

lenary added a comment.May 1 2020, 3:46 PM

From http://volta.cs.utah.edu:8080/z/d6wpw5:

...
Source value: #x5d010400 (581034720674971648)
Target value: #x5d010401 (581034789394448384)

I *thought* something like this might pop up last time.

I'll defer to you over whether we move this check to instcombine. In the meantime, I'll reinstate the previous check (that the intermediate float can precisely represent the source int), and add !LegalTypes || to the hasOperation check.

lenary updated this revision to Diff 261640.May 2 2020, 6:39 AM

Updates to use original precision check (against intermediate float type), and
relaxes the hasOperation() check pre-legaltypes.

spatel added a comment.May 2 2020, 8:43 AM

Actually, it might make sense to do this in instcombine instead of DAGCombine.

Agree - if we can fold this earlier and universally, that would be better. See D79116 for a similar transform.

RKSimon added a subscriber: RKSimon.May 4 2020, 7:54 AM

rGc048a02b5b26 does this transform in IR. Do we still need a backend transform?

lenary abandoned this revision.May 31 2020, 11:18 AM

rGc048a02b5b26 does this transform in IR. Do we still need a backend transform?

I don't think so.