fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are UB.
Details
- Reviewers
- scanon - efriedma - craig.topper - RKSimon - nemanjai - hfinkel - javed.absar 
- Commits
- rG3d453ad7118a: [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
 rG6124cae8f765: [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
 rL330437: [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
 rL328921: [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
Diff Detail
- Repository
- rL LLVM
Event Timeline
This is the class of optimizations that I would call "formally allowed by the standard, but extremely likely to break things and surprise people." Which isn't to say that we shouldn't do it, just ... be prepared.
More of a concern (to me), this will be a significant perf regression for armv7 and non-SSE4.1 x86, and most other arches with hardware conversion but not trunc. We need so means to identify those targets and disable this (the double-conversion dance is frequently used specifically as an optimization to avoid calling trunc on such targets).
Ah, I wasn't thinking hard enough about targets/types that don't have direct support. We really don't want to do this in IR then. Let me look at a DAG combine instead.
Patch updated:
Move the transform to the DAG and only do it when the target says it is legal. This is a completely different patch than the last rev, but keeping it under the same review number for context.
Two questions, to which I do not know the answer:
(a) Are the semantics of ISD::FP_TO_[US]INT for out-of-range values specified anywhere? The do not seem to be, but maybe I'm just missing it.
(b) Are there test cases that make sure this transform is *not* applied for armv7 and x86 -sse4.1?
No, I don't see anything either. We just have:
/// FP_TO_[US]INT - Convert a floating point value to a signed or unsigned /// integer.
I was assuming these nodes follow the same rules as the IR instructions since they are mapped 1-to-1 in SelectionDAGBuilder::visitFPToSI() etc.
(b) Are there test cases that make sure this transform is *not* applied for armv7 and x86 -sse4.1?
I don't see coverage for armv7, but I may be overlooking it (cc @dmgreen). For x86, I also don't see any test of the roundtrip. I'll add a RUN line to the ftrunc.ll test file.
I'll buy that. Can you add a note specifying this?
(b) Are there test cases that make sure this transform is *not* applied for armv7 and x86 -sse4.1?
I don't see coverage for armv7, but I may be overlooking it (cc @dmgreen). For x86, I also don't see any test of the roundtrip. I'll add a RUN line to the ftrunc.ll test file.
Great. With those two questions addressed, this LGTM.
I think this is important to answer. I can see that on PPC, we use saturating FP -> Int conversions (i.e. values larger than 2^n -1 produce the maximum integral value, etc.). That behaviour will certainly change with this combine.
Patch updated:
- Add a comment to the DAG node definitions to indicate they have the same behavior as their IR counterparts.
- Add a basic test file for ARMv7 (let me know if that needs adjusting or more coverage).
- Added a RUN line for x86 pre-SSE4.1 to ftrunc.ll to show that this patch doesn't change anything for that target.
Ah, OK. The language reference clearly states that unrepresentable values produce undefined results. As far as PPC is concerned, this LGTM.
P.S. If you think it's appropriate, perhaps it would be nice to add this PPC test case since PPC has FTRUNC for vectors as well:
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-unknown \
; RUN:   -verify-machineinstrs < %s | FileCheck %s
define <4 x float> @truncf32(<4 x float> %a) {
; CHECK-LABEL: truncf32:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    xvrspiz 34, 34
; CHECK-NEXT:    blr
entry:
  %0 = fptosi <4 x float> %a to <4 x i32>
  %1 = sitofp <4 x i32> %0 to <4 x float>
  ret <4 x float> %1
}
define <2 x double> @truncf64(<2 x double> %a) {
; CHECK-LABEL: truncf64:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    xvrdpiz 34, 34
; CHECK-NEXT:    blr
entry:
  %0 = fptosi <2 x double> %a to <2 x i64>
  %1 = sitofp <2 x i64> %0 to <2 x double>
  ret <2 x double> %1
}
define <4 x float> @truncf32u(<4 x float> %a) {
; CHECK-LABEL: truncf32u:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    xvrspiz 34, 34
; CHECK-NEXT:    blr
entry:
  %0 = fptoui <4 x float> %a to <4 x i32>
  %1 = uitofp <4 x i32> %0 to <4 x float>
  ret <4 x float> %1
}
define <2 x double> @truncf64u(<2 x double> %a) {
; CHECK-LABEL: truncf64u:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    xvrdpiz 34, 34
; CHECK-NEXT:    blr
entry:
  %0 = fptoui <2 x double> %a to <2 x i64>
  %1 = uitofp <2 x i64> %0 to <2 x double>
  ret <2 x double> %1
}Thanks for adding the new test. LGTM
| test/CodeGen/ARM/ftrunc.ll | ||
|---|---|---|
| 2 ↗ | (On Diff #139871) | Mind adding a armv8 target too? It should be come up with something like a vrintz, if I'm understanding this correctly. | 
| 25 ↗ | (On Diff #139871) | This is fine as-is, but I think there are no double to i64 rounding in arm (there is aarch64), hence the aeabi calls. There will be a double to i32 and back, but this is definitely testing something sensible, and the armv8 target should show a big improvement :) | 
This broke some large integration test on arm64 in Chromium (https://crbug.com/831145 – doesn't yet have any information other than "something is broken" and that we bisected it to this revision). It sounds like the patch got rewritten during review; does "formally allowed by the standard, but extremely likely to break things and surprise people" still apply? It sounds like this is supposed to be behavior-preserving for values that aren't NaN and Inf, but it does change behavior for those two? (If so, in what way?)
The patch started off as an IR transform, and the quoted comment applied only to cases (I hope) where the target doesn’t have native support for an fptrunc instruction. NaN/inf inputs are undefined with these ops, so if something was relying on that undefined output, that could be the source of the problem.
I’m not an arm64 expert, but the tests show that we should codegen ‘frintz’ instructions with this patch. So that would be the place to look for diffs.
We found it: https://bugs.chromium.org/p/chromium/issues/detail?id=831145#c48
It's exactly the "formally allowed by the standard, but extremely likely to break things and surprise people" situation :-) The question is just what the appropriate fix is...
Aha - nice detective work. :)
If it helps, I can revert this while fixing the sources. Can this pattern be added to the UB sanitizer? Or maybe it's already there, but wasn't used?
We think there's an UB check for this already, but v8 isn't UB clean.
The fast way to do the "is this double an int" check on an assembly level is to convert to int and back and then compare if the roundtripped value is equal. After this change, is there some way to express this approach in C?
After this change, is there some way to express this approach in C?
You can use platform-specific intrinsics, e.g. _mm_cvtsd_si32 is defined to return INT_MIN for out-of-range values. Or, in portable C, you can do a range check: if (x <= INT_MAX && x >= INT_MIN && (double)(int)x == x).
C++ (templated) option is shown as an example here:
https://stackoverflow.com/questions/2544394/c-floating-point-to-integer-type-conversions
Or, in portable C, you can do a range check: if (x <= INT_MAX && x >= INT_MIN && (double)(int)x == x).
Sorry, the bounds on this aren't precisely correct; you want something more like if (x < INT_MAX+1LL && x >= INT_MIN-1LL && (double)(int)x == x). And note this depends on the fact that INT_MIN-1LL can be exactly represented as a double.
I see that patches were applied to the Chromium sources:
https://bugs.chromium.org/p/chromium/issues/detail?id=831145#c59
Is it ok to re-commit this and see what else breaks? :)
Sounds okay to me. It would probably be good to add a release notes entry about this though, explaining what can break and showing how to use UBSan to check for it. I can easily imagine other code relying on the same pattern that V8 used.
What is the expected behavior for casting float values in range (-1.0 , 0.0) from float->int->float? If the expected value is -0.0 (instead of 0.0), what is the reason for that?
Not sure I understand the question. Casting FP (-1.0, 0.0) to int with round-to-zero mode results in integer 0. Converting that back to FP is always 0.0.
Ah, I see - we get this wrong because we may now produce -0.0 instead of 0.0.
Ie, trunc(), ISD::FTRUNC, and presumably all of the hardware instructions that map to those ops play by the IEEE754 rounding rules:
"These operations convert zero operands to zero results of the same sign."
So we can't do this without 'nsz', right?
(FWIW since this was enabled again we had at least two bugs due to this -- https://crbug.com/845816 https://crbug.com/851415 -- and one confused thread at https://groups.google.com/a/chromium.org/forum/#!topic/cxx/W5wma_HXWOo )