[DAGCombine] (float)((int) f) --> ftrunc (PR36617)


[DAGCombine] (float)((int) f) --> ftrunc (PR36617)

This was originally committed at rL328921 and reverted at rL329920 to
investigate failures in Chrome. This time I've added to the ReleaseNotes
to warn users of the potential of exposing UB and let me repeat that
here for more exposure:

Optimization of floating-point casts is improved. This may cause surprising
results for code that is relying on undefined behavior. Code sanitizers can
be used to detect affected patterns such as this:

  int main() {
    float x = 4294967296.0f;
    x = (float)((int)x);
    printf("junk in the ftrunc: %f\n", x);
    return 0;

  $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out
  ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of 
                 representable values of type 'int'
  junk in the ftrunc: 0.000000

Original commit message:

fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909


spatelApr 20 2018, 8:07 AM
Differential Revision
D44909: [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
rL330436: [CostModel][X86] Add srem/urem constant cost tests

Event Timeline

scanon added a subscriber: scanon.Apr 20 2018, 8:43 AM

Tangential question: Do we have an intrinsic floating -> integer conversion with defined semantics for out-of-range values?

If not, we should probably add it, to support source languages that want to get rid of UB for these cases. I know both ARM and x86 provide fully-defined conversions in HW (unfortunately they did not pick the *same* fully-defined conversion: ARM clamps and returns 0x800...0 for NaN, x86 returns 0x800...0 for all out of range values). Should such an intrinsic be implementation defined to "what the HW does" (and a clamp-and-convert sequence for platforms without support), or standardize one behavior (the ARM one is arguably more useful, and still cheap to implement on x86)?

If such a thing already exists, we may want to consider pointing people to it.

I don't think we have what you're describing (looking through clang source
because I don't think there's any actual documentation...).

The intrinsic you're describing would be a wrapper for things like
_mm_cvtsd_si32() on x86 and vcvt_s32_f32 on ARM?