- User Since
- Jan 29 2015, 2:13 PM (393 w, 1 d)
Tue, Jul 26
Mon, Jul 25
Thanks for the quick reviews!
Sun, Jul 24
Fri, Jul 22
Fri, Jul 15
Thu, Jul 14
- Remove the cleanup that has been done separately (and committed as 230c8c56f21cfe4e23a24793f3137add1af1d1f0).
- Check only for reassoc and nsz (rather than isFast()).
Jul 14 2022
Committed at 230c8c56f21cfe4e23a24793f3137add1af1d1f0.
Added an integer version of a reassociation test, including a TODO of a possible further improvement.
Jul 13 2022
Addressed review comments from @spatel , except for adding an integer test.
To add some context to make it easier to review, the approach taken here of adding a ReassociatePass::OrderedSet parameter (named ToRedo) to the function LinearizeExprTree , is the same technique already used in two other functions in "Reassociate.cpp": NegateValue and BreakUpSubtract.
Jul 12 2022
I'll split this into two independent patches.
I've split off the miscellaneous cleanup into a separate patch: D129612. Once that wraps up, I'll update the patch here to just be the improved optimization with reassoc and nsz.
Thanks for the quick feedback!
Jul 11 2022
Assuming this approach is sensible, I'm raising the question of whether this patch should be split into two.
Apr 26 2022
Thanks for the review @RKSimon !
Apr 22 2021
Clear simple, safe fix. And I've verified it fixes the problem we ran into downstream.
Mar 31 2021
Dec 8 2020
This commit appears to be the root cause of a run-time crash related to the interaction of global initializers and the wrapper functions to access thread_local variables -- reported as PR48030.
Nov 18 2020
Sep 2 2020
Sep 1 2020
Jun 22 2020
Mar 11 2020
Revisiting this patch. I think that before fixing the -ffp-contract=off problem I originally raised here, there are two questions that have come up in this discussion that we should first resolve. Specifically:
Feb 26 2020
Jan 27 2020
Used the clearer '!off && !on' (rather than '!(off || on)') in the checks.
Jan 26 2020
This is a bit of an oddity in our handling.
Yes it is!
This is certainly getting more complicated than I had originally expected. I feel there isn't a clear spec on what we want in terms of whether FMA should be enabled "automatically" at (for example) '-O2', and/or whether it should be enabled by -ffast-math. I'm very willing to make whatever change is needed here, to meet whatever spec we all ultimately agree on.
So I think this patch should be put on hold until we decide on these wider aspects.
Jan 24 2020
Update a comment to remove the discussion about enabling the __FAST_MATH__ preprocessor macro. The handling of the setting of __FAST_MATH__ is done in "clang/lib/Frontend/InitPreprocessor.cpp", and once we decide on the set of conditions that enable that definition, we can add an appropriate comment there.
A separate question is the interaction of -ffast-math with -ffp-contract=. Currently, there is no such interaction whatsoever in GCC: -ffast-math does not imply any particular -ffp-contract= setting, and vice versa the -ffp-contract= setting is not considered at all when defining __FAST_MATH__. This seems at least internally consistent.
Jan 17 2020
Updated patch to correct a comment and fix a typo.
Jan 16 2020
Jinx ;) This took quite some tracking down on our side - too bad we didn't wait a couple of weeks.
Changing this to address only the Clang driver aspect, as discussed in the comments. Will spin off a separate patch for the LLVM side.
One commit for the clang changes should be ok; it's a very small diff. But I'm still not sure if the driver change induces frontend diffs that we should make visible via tests.
The only thing I can think of is that it changes whether/when __FAST_MATH__ is defined. But that'll be indirectly tested, via the updated tests in "Driver/fast-math.c", which will verify that "-ffast-math" is passed appropriately (and the __FAST_MATH__ dependency on "-ffast-math" is already tested in "Preprocessor/predefined-macros.c").
This all sounds good to me.
Jan 15 2020
Jan 14 2020
This is a follow up to https://reviews.llvm.org/D72529.
Independently, we came across this same problem, and I proposed a patch at D72469 (which has been committed) that is essentially the simple fix that is described above:
Addressed comments from @hfinkel .
Thanks for the quick feedback @hfinkel
Jan 13 2020
Jan 9 2020
Dec 19 2019
Thanks for the quick review.
Dec 13 2019
Given the performance improvements here, I'd like to develop this patch further.
Dec 12 2019
Nov 4 2019
This probably needs to be taken over by someone who cares about full LTO performance
Oct 4 2019
Jul 19 2019
Jul 17 2019
Jul 15 2019
Do we need to keep the old behavior on platforms where clang is the de facto compiler?
Jun 17 2019
Jun 13 2019
Jun 12 2019
Jun 11 2019
Thanks @fhahn. I'll update the patch to address those comments.
This patch was accepted by @andreadb about a week ago, but Andrea said he'd leave the final decision to @RKSimon. With Simon's comments from a week ago asking @fhahn if he had any preferences for keeping the current default as-is, or changing it to something that seems to reflect existing hardware better, I've been hesitant to actually commit it, even though it's marked as Accepted. With a week now gone by, I'll plan on committing this tomorrow, unless Simon or Florian raise any concerns.
Jun 4 2019
May 31 2019
Updated the patch to allow arbitrary alignment of float and double nt-stores for SSE4A.
May 28 2019
May 16 2019
May 9 2019
One point I want to explicitly raise, is that prior to this proposed change, the current vectorization implementation ignores the nontemporal hint when checking whether a memory op is vectorizable. So conceptually, it's as though the two new routines:
bool isLegalNTStore(Type *DataType, unsigned Alignment)
bool isLegalNTLoad(Type *DataType, unsigned Alignment)
return true for all types and alignments. With that in mind, I've made the (default) target-independent implementations of these functions return true -- it's only the overriding functions on X86 (which is the target PR40759 was reported against) that will ever return false here (so on non X86 targets, this would be an NFC commit). But in looking through the other backends, I believe that none of them support misaligned nontemporal mem ops. So possibly a better default would be to have these new functions return true only if the specified DataType is minimally aligned at Alignment.