This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
ClosedPublic

Authored by spatel on Jun 12 2018, 10:39 AM.

Details

Summary

As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}
$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option.

Diff Detail

Event Timeline

spatel created this revision.Jun 12 2018, 10:39 AM
lebedev.ri added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11069–11070

Couldn't we do and (ftrunc %x), (-1 >> 1), i.e. unset the sign bit?
Or is that worse than not transforming to ftrunc in the first place?

spatel added inline comments.Jun 12 2018, 11:01 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11069–11070

You'd want to keep this in FP, so I think that would be:
fabs (ftrunc x)

And that could be better than the alternative in some cases. See D44909 for what this looks like without using ftrunc. If the sequence is more than 2 converts, fabs is likely a winner.

But given the fallout from the original patch, I expect this is the patch most people would prefer. :)
We'll avoid the UB headaches for most code, and code that cares about FP perf likely has some loosened FP constraints anyway.

lebedev.ri added inline comments.Jun 12 2018, 11:52 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11069–11070

You'd want to keep this in FP, so I think that would be:
fabs (ftrunc x)

Right, right. Otherwise you might need to leave fp domain, etc etc.

But given the fallout from the original patch, I expect this is the patch most people would prefer. :)

That cat is already out of the bag though, and modulo this edge-case, the transform is valid i think.
Backtracking like this, while not for no reason, may be viewed as acceptance of views of some who
think UB is bad and lack of UB [in code] should not be assumed by compilers. But this is purely my opinion.

spatel added inline comments.Jun 12 2018, 1:04 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11069–11070

Depends on your perspective. For brave (much appreciated!) projects that track LLVM trunk, yes they've already suffered/benefited. But most potential customers won't see this change until the next major release.

But this discussion is independent of this particular patch. We have a miscompile that must be fixed. Salvaging perf is secondary, and I don't think we can do it universally - ie, converting 2 casts to fabs+ftrunc or select+fcmp+fabs+ftrunc requires a TLI hook, so that's definitely a bigger patch than what we have here currently.

Should this adjust the ReleaseNotes?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11069–11070

Could you please at least add a FIXME? :)

spatel marked an inline comment as done.Jun 19 2018, 12:28 PM

Should this adjust the ReleaseNotes?

The docs provide extra warning/workaround, and this change doesn't affect that language IMO (there's less chance we're going to break code, but we don't have to disclose that).

For reference, here are links to the docs that we added with the previous patches (last updated with D46236 I think):
https://clang.llvm.org/docs/ReleaseNotes.html#new-compiler-flags
https://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
http://llvm.org/docs/ReleaseNotes.html#non-comprehensive-list-of-changes-in-this-release

Let me know if you see anything that can be improved.

spatel updated this revision to Diff 151965.Jun 19 2018, 12:30 PM

Patch updated:
Added a 'TODO' comment about using a FABS-based sequence if we can't ignore -0.0.

Should this adjust the ReleaseNotes?

The docs provide extra warning/workaround, and this change doesn't affect that language IMO (there's less chance we're going to break code, but we don't have to disclose that).

For reference, here are links to the docs that we added with the previous patches (last updated with D46236 I think):
https://clang.llvm.org/docs/ReleaseNotes.html#new-compiler-flags
https://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
http://llvm.org/docs/ReleaseNotes.html#non-comprehensive-list-of-changes-in-this-release

Let me know if you see anything that can be improved.

The text in first and last links suggests that such an optimization is always being done.
But now it will only be done in presence of no-signed-zeros-fp-math attribute.
Which is controlled by -ffast-math (or maybe some more fine-grained option, too?)

Should this adjust the ReleaseNotes?

The docs provide extra warning/workaround, and this change doesn't affect that language IMO (there's less chance we're going to break code, but we don't have to disclose that).

For reference, here are links to the docs that we added with the previous patches (last updated with D46236 I think):
https://clang.llvm.org/docs/ReleaseNotes.html#new-compiler-flags
https://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
http://llvm.org/docs/ReleaseNotes.html#non-comprehensive-list-of-changes-in-this-release

Let me know if you see anything that can be improved.

The text in first and last links suggests that such an optimization is always being done.
But now it will only be done in presence of no-signed-zeros-fp-math attribute.
Which is controlled by -ffast-math (or maybe some more fine-grained option, too?)

Right - it should still be allowed with "-fno-signed-zeros" at a minimum.

If we change the docs to provide the exact details of the potential optimization, then we have to adjust the docs anytime that changes (eg, if we implement the fabs trick for some subset of data types on some subset targets). So my preference is to not document the implementation at that level, but if you think that the text is misleading, I'll add a note.

Should this adjust the ReleaseNotes?

The docs provide extra warning/workaround, and this change doesn't affect that language IMO (there's less chance we're going to break code, but we don't have to disclose that).

For reference, here are links to the docs that we added with the previous patches (last updated with D46236 I think):
https://clang.llvm.org/docs/ReleaseNotes.html#new-compiler-flags
https://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
http://llvm.org/docs/ReleaseNotes.html#non-comprehensive-list-of-changes-in-this-release

Let me know if you see anything that can be improved.

The text in first and last links suggests that such an optimization is always being done.

One more point here: if it sounds like it's always done, then that's already wrong.

We only do the transformation when the target has a legal FTRUNC op, so some fraction of targets are never affected by this transform regardless of fast-math. Eg, x86 doesn't have roundXX before SSE4.1.

lebedev.ri accepted this revision.Jun 26 2018, 8:39 AM

I still think the docs need some adjustment.
And not for all arches the there are full set of tests (i.e. one with "no-signed-zeros-fp-math"="true", one without).
But this is a regression, so i'd guess that can be iterated later on.

It would be best if someone else could agree/disagree, but other than that, LG.

This revision is now accepted and ready to land.Jun 26 2018, 8:39 AM

I still think the docs need some adjustment.
And not for all arches the there are full set of tests (i.e. one with "no-signed-zeros-fp-math"="true", one without).
But this is a regression, so i'd guess that can be iterated later on.

It would be best if someone else could agree/disagree, but other than that, LG.

Thanks! I'm not opposed to rewording the docs, but I'm also not sure how to adjust them to make it better rather than worse.

If anyone would like to take a shot at that, I'm happy to review (and as I said, I think any text tweaking should be independent of this code change...though others may disagree with that assessment).

This revision was automatically updated to reflect the committed changes.