These simplifications were previously enabled only with isFast(), but that
is more restrictive than required. Since r317488, FMF has 'reassoc' to
control these cases at a finer level.
Details
Diff Detail
Event Timeline
The tests here demonstrate the overlap between instcombine and reassociate, and based on feedback I've gotten, I'd say that rL170471 was a mistake for instcombine. It created a mini-reassociation pass within instcombine.
I've been cataloging, testing, and updating FP binop instcombines recently (eg, rL328502, rL329501), and the remaining big blob is FAddCombine() - which is poorly named because as shown in this patch, it's also called from visitFSub()...
I don't actually know everything that can happen in this code, but if others are confident that it's just reassociation (without need for nsz/nnan/ninf), then I suppose this is ok to loosen up. I'd still like to have dedicated tests in instcombine for whatever is happening in FAddCombine() though, so add to the tests from rL170471 to demonstrate at least some of the diffs without needing -reassociate?
A couple of short comments/questions, and one longer one:
Should we rename it to FAddSubCombine() in this work (assuming this patch moves forward, given the other questions here)?
... I'd still like to have dedicated tests in instcombine for whatever is happening in FAddCombine() though, so add to the tests from rL170471 to demonstrate at least some of the diffs without needing -reassociate?
To be sure I'm understanding, you mean demonstrate some diffs in those tests using the flag 'reassoc' rather than 'fast' (without needing -reassociate), right? I have a concern that meaningful transformations based on the 'reassoc' flag alone may require both -instcombine and -reassociate.
My longer question relates to:
... but if others are confident that it's just reassociation (without need for nsz/nnan/ninf), then I suppose this is ok to loosen up.
In a sense, I do feel we should check 'nsz' in addition to 'reassoc' (and maybe others). And similarly, the tests in the other files that I modified in this patch possibly should only do the transformation if 'reassoc' and 'nsz' (and others?) are set. But when starting with C/C++ (when -ffast-math isn't specified), we only will set 'reassoc' if some other switches in addition to -fassociative-math are passed. Specifically:
-fassociative-math -fno-signed-zeros -fno-trapping-math
(details at D39812).
So in a sense, an instruction that has only 'reassoc' set on it isn't complete, in that it cannot be generated by Clang. I feel like this is an IEEE thing, rather than a C/C++ thing (so other language front-ends ought to do something similar). But from this perspective, is checking only 'reassoc' OK? Or is should we change tests that have only 'reassoc' on an instruction to also have 'nsz'? Which then leads me to the problem/question that -fno-trapping-math doesn't have a bit in FMF, so it creates a new dimension to the problem/question.
Anyway, I took the interpretation that when the 'reassoc' bit is set, it in a sense means the equivalent of at least -fassociative-math -fno-signed-zeros -fno-trapping-math from a C/C++ command-line perspective. I'm not sure how we should interpret this. As I write this comment, I feel like I've opened a can of worms.
I'd prefer more specific like reassociateFAddFSub(), but I wouldn't bother unless we want to keep that code in its current form.
... I'd still like to have dedicated tests in instcombine for whatever is happening in FAddCombine() though, so add to the tests from rL170471 to demonstrate at least some of the diffs without needing -reassociate?
To be sure I'm understanding, you mean demonstrate some diffs in those tests using the flag 'reassoc' rather than 'fast' (without needing -reassociate), right?
Yes. Example from fast-math.ll:
; (X + C1) + C2 => X + (C1 + C2) define float @fold5_reassoc(float %f1, float %f2) { ; CHECK-LABEL: @fold5( ; CHECK-NEXT: [[ADD1:%.*]] = fadd reassoc float [[F1:%.*]], 9.000000e+00 ; CHECK-NEXT: ret float [[ADD1]] ; %add = fadd float %f1, 4.000000e+00 %add1 = fadd reassoc float %add, 5.000000e+00 ret float %add1 }
So put something like that under the existing test to show the minimal FMF requirement. Or just replace the 'fast' in the existing test with 'reassoc'.
I have a concern that meaningful transformations based on the 'reassoc' flag alone may require both -instcombine and -reassociate.
There could be some kind of symbiotic behavior here, but this is really saying that we don't have a well-defined division of labor between these passes, right?
My longer question relates to:
... but if others are confident that it's just reassociation (without need for nsz/nnan/ninf), then I suppose this is ok to loosen up.
In a sense, I do feel we should check 'nsz' in addition to 'reassoc' (and maybe others). And similarly, the tests in the other files that I modified in this patch possibly should only do the transformation if 'reassoc' and 'nsz' (and others?) are set. But when starting with C/C++ (when -ffast-math isn't specified), we only will set 'reassoc' if some other switches in addition to -fassociative-math are passed. Specifically:
-fassociative-math -fno-signed-zeros -fno-trapping-math(details at D39812).
So in a sense, an instruction that has only 'reassoc' set on it isn't complete, in that it cannot be generated by Clang. I feel like this is an IEEE thing, rather than a C/C++ thing (so other language front-ends ought to do something similar). But from this perspective, is checking only 'reassoc' OK? Or is should we change tests that have only 'reassoc' on an instruction to also have 'nsz'? Which then leads me to the problem/question that -fno-trapping-math doesn't have a bit in FMF, so it creates a new dimension to the problem/question.
Anyway, I took the interpretation that when the 'reassoc' bit is set, it in a sense means the equivalent of at least -fassociative-math -fno-signed-zeros -fno-trapping-math from a C/C++ command-line perspective. I'm not sure how we should interpret this. As I write this comment, I feel like I've opened a can of worms.
Hopefully, not. :)
We need to be careful here - just because clang is behaving some way doesn't mean other front-ends are doing the same. We defined the LLVM FMF (maybe not precisely enough in the LangRef) so 'reassoc' and 'nsz' are independent bits.
I've made several recent changes to the FMF requirements in instsimplify/instcombine assuming that 'reassoc' does not imply anything beyond the ability to rearrange math ops. If the sign-of-zero doesn't match the IEEE requirements, then the transform needs 'nsz' too. Trapping isn't a concern because we assume a default FP environment (see recent edits to the LangRef).
So a few ways forward:
- Easiest: change the FMF requirements in this patch to reassoc + nsz. We're confident that this code only deals with fadd/fsub, so { nnan , ninf , arcp , afn , contract } shouldn't be in play.
- More involved: audit (and add tests for) all of the potential transforms enabled by FAddCombine and determine when only 'reassoc' is needed.
- Ideal: audit instcombine and the reassociate pass and split up the transforms, so we don't have redundancies - or at least the redundancies are justified and documented. Note that this goes beyond just the FAddCombine chunk of code. We also do reassociation in instcombine under the name SimplifyAssociativeOrCommutative(), and I think FAddCombine() has redundant code for things that are handled there too.
Thanks for the clarifications Sanjay!
Yes, a lack of a well-defined division of labor between these passes is exactly my concern. I'll look into adding or modifying tests. Depending on possible interaction, that may not be fruitful. Ultimately, that should be addressed.
To be explicit, my immediate goal is to make reassociation "work" when unrelated FMF are disabled. For example:
float foo(float arg, float x) { return (arg + x) - x; }
should simply return arg when compiled at -O2 -ffast-math -fno-reciprocal-math. But as it currently stands, since -fno-reciprocal-math makes isFast() false, disabling the reciprocal transformation incorrectly prevents the reassoication.
So although ultimately the -instcombine/-reassociate division of labor should be addressed, I'd prefer to make this small step in improving the FMF status without doing that more substantial piece of work.
So in a sense, an instruction that has only 'reassoc' set on it isn't complete, in that it cannot be generated by Clang. I feel like this is an IEEE thing, rather than a C/C++ thing (so other language front-ends ought to do something similar). But from this perspective, is checking only 'reassoc' OK? Or is should we change tests that have only 'reassoc' on an instruction to also have 'nsz'? Which then leads me to the problem/question that -fno-trapping-math doesn't have a bit in FMF, so it creates a new dimension to the problem/question.
Anyway, I took the interpretation that when the 'reassoc' bit is set, it in a sense means the equivalent of at least -fassociative-math -fno-signed-zeros -fno-trapping-math from a C/C++ command-line perspective. I'm not sure how we should interpret this. As I write this comment, I feel like I've opened a can of worms.
Hopefully, not. :)
We need to be careful here - just because clang is behaving some way doesn't mean other front-ends are doing the same. We defined the LLVM FMF (maybe not precisely enough in the LangRef) so 'reassoc' and 'nsz' are independent bits.
OK, I can certainly be on board with that. My point about it feels like an IEEE thing rather than a C/C++ thing was to imply that other front-ends really should be doing the same. But in any case, I do like that clarity of (for example) 'reassoc' and 'nsz' explicitly being understood to be independent bits (or more directly, that 'reassoc' alone isn't generally enough to enable reassoication). Further, that clarification helps to keep the can of worms closed.
...
So a few ways forward:
- Easiest: change the FMF requirements in this patch to reassoc + nsz. We're confident that this code only deals with fadd/fsub, so { nnan , ninf , arcp , afn , contract } shouldn't be in play.
- More involved: audit (and add tests for) all of the potential transforms enabled by FAddCombine and determine when only 'reassoc' is needed.
- Ideal: audit instcombine and the reassociate pass and split up the transforms, so we don't have redundancies - or at least the redundancies are justified and documented. Note that this goes beyond just the FAddCombine chunk of code. We also do reassociation in instcombine under the name SimplifyAssociativeOrCommutative(), and I think FAddCombine() has redundant code for things that are handled there too.
Short term, I'll intend to do (1) -- this approach will let me meet my goal of making reassociation work when unrelated FMF are disabled (at least it will in some cases). In doing that, I'll experiment with updating the tests in fast-math.ll, like you suggest.
Also in doing that, I'll look into (2) at least somewhat, to see if there are additional changes that clearly would go well as part of this sort of change.
Longer term, I'll look into (3). Not sure when I'll get to that, though.
I've updated the code-change in "InstCombineAddSub.cpp" to require both 'reassoc' and 'nsz' to do the transformations. (I haven't changed the name from FAddCombine to FAddSubCombine or ReassociateFAddFSub. I could do that if we move forward with this.)
There could be some kind of symbiotic behavior here, but this is really saying that we don't have a well-defined division of labor between these passes, right?
Yes, a lack of a well-defined division of labor between these passes is exactly my concern. I'll look into adding or modifying tests. Depending on possible interaction, that may not be fruitful. Ultimately, that should be addressed.
In looking this over, I was pleasantly surprised that I didn't run into the troublesome interaction I was concerned about. So I've added new tests to "InstCombine/fast-math.ll" to check for the transformations happening when 'reassoc' + 'nsz' are on (with only '-instcombine' used). I've also added tests that verify that 'reassoc' alone doesn't enable the transformation in the appropriate places. Lastly, I updated the tests I had previously modified to use both 'reassoc' and 'nsz' where appropriate when testing for the minimal set of needed flags to transform things.
Thanks - more instcombine tests in this area are needed IMO.
I think the code change is fine in that it removes the unrelated FMF from these transforms. And it may be that after this change, we're 'good enough'. Ie, it may be that nobody cares about splitting the FMF hairs further than this.
But I have pointed out some test comments that I don't think are correct. I didn't look at all tests, but I suggest softening the language in general and adding 'TODO', so if someone does decide to improve more, the tests acknowledge that what we have currently isn't complete.
test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
61–62 | x * 2.0 + x --> x * 3.0 This doesn't require 'nsz'? If x = -0.0, this preserves that the output is -0.0. In fact, this exact case doesn't need any fast-ness? (cc @scanon) | |
99–101 | (4.0 - x) + (5.0 - y) --> 9.0 - (x + y) Again, I don't think this actually requires 'nsz'. Maybe it's better to mark all similar tests with "TODO: We may be able to remove the 'nsz' requirement." | |
232–235 | See the earlier link - another potential special-case where we don't need any FMF. | |
407–414 | -x + y --> y - x This doesn't require any FMF. | |
429–436 | x + (-y) --> x - y Again, no FMF needed at all. |
Thanks Sanjay. Updated the tests to be more careful about when 'nsz' is and isn't needed, along with comments to clarify. Also simplified some tests where no FMF is needed (irrespective of this patch).
test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
61–62 | I've updated the comment about 'nsz' not being needed, and I've changed the test so as not to be exercising the special-case of 3*x (and added a comment about the special-case as a TODO, in case anyone ever wants to take advantage of that point). | |
99–101 | Thanks. In looking, I think this one definitely doesn't require 'nsz', so I've added a TODO comment that's says 'nsz' is unneeded (rather than "may be able to remove..."). Same for many of the others, so I've added similar TODO comments, where appropriate. | |
232–235 | I've updated the test to have the opportunity for more folding, so it's not the special-case of x+x+x. | |
407–414 | Good point! In fact, it does fold without any FMF, even without my patch. I've updated the test to demonstrate the folding without 'fast' (or anything else). | |
429–436 | As with 'fold14()' above, this does fold without any FMF, even without my patch. Test updated. |
LGTM - see inline for a couple of potential follow-ups.
test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
379 | Maybe independent of this patch, but the fmul shouldn't need any FMF for this fold, so it's worth checking if that actually happens. Another consideration is whether the fmul hasOneUse(). We definitely don't have enough multi-use tests to show that folds are not unintentionally creating extra instructions. | |
407–414 | Nit: it's a bit crazy that there are no existing regression tests for these always-safe fneg folds; must be from the early cowboy compiler days. :) |
test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
379 | Hmmm... You're saying that the transformation: |
test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
379 | The fsub needs FMF, but not the fmul. That's the general rule we're using in FMF-enabled folds. The non-strict final value (the result of the fsub) is what we're reassociating/factoring here, so as long as that instruction has reassoc+nsz, it doesn't matter if the fmul intermediate value is strict. |
The fsub needs FMF, but not the fmul. That's the general rule we're using in FMF-enabled folds. The non-strict final value (the result of the fsub) is what we're reassociating/factoring here, so as long as that instruction has reassoc+nsz, it doesn't matter if the fmul intermediate value is strict
Got it. Thanks for explaining,
x * 2.0 + x --> x * 3.0
This doesn't require 'nsz'? If x = -0.0, this preserves that the output is -0.0.
In fact, this exact case doesn't need any fast-ness? (cc @scanon)
https://reviews.llvm.org/D31164?id=92521#inline-270764