- User Since
- Aug 18 2016, 4:39 AM (68 w, 4 d)
@RKSimon I saw that you committed a couple of changes fixing some GCC warnings in the Hexagon backend. Thank you very much for that! This patch here fixes the same warning as rL320254, in a different way.
Sat, Dec 9
Fri, Dec 8
Fix code to use the first debug location found, not the last! Updated the test case with multiple debug locations, to check we actually pick the first one, if there are multiple blocks with debug info.
Thu, Dec 7
Thanks! I agree with Matthias in that a target feature would be better to enable/disable aggressive FMA. In AArch64.td there are already quite a lot of similar target features, like FeatureFuseAES or FeatureSlowSTRQro that enable certain optimizations for some micro architectures.
This seems like a bug indeed! Could you add a test case without a preheader, to make sure we catch this regression in the future?
Added more test cases and fixed infinite loop/
Wed, Dec 6
Thanks for having a look Eli! Dropping all the info is not intentional, this was missed during review. I've patch that adds debug locations D40432 and will follow this up with another patch preserving other metadata & co
Accept in Phabricator after LGTM by Gerolf and Evandro, to make Arc happy.
Thanks for having a look. There are no perf regressions on Cortex-A57 on SPEC2k, the lnt test suite and SPEC2006. I'll commit this now, but will also run SPEC2017 on another board.
Thanks for all the feedback, it's been really helpful. I have updated the code to look for a valid debug location in all extracted blocks and added a test case where there is no debug loc in the header.
Thanks Evandro. I think this looks good to me now. I still think it's unfortunate that we basically duplicate the loop to add artificial dependences, but I don't see a way to share the code without making it more complicated than it is now. Please wait a bit with committing, to give Matthias a chance to voice additional concerns.
The comment next to the VecValuesToIgnore declaration explicitly states that those instructions should be skipped during cost modelling if VF > 1 (/// Values to ignore in the cost model when VF > 1.)
Tue, Dec 5
Remove lnt update instead of fixing it.
Thanks Sam, LGTM too.
Mon, Dec 4
Thanks for having a look! I'll address the assertion before committing tomorrow and also think a bit about the PGO case.
On a related note, the MachineCombiner ignores enableAggressiveFMAFusion, so at -O3, the example won't be combined. I plan to put up a patch soon that updates the machine combiner to consider combining multiple uses, if profitable.
Fri, Dec 1
Thanks for having a look! I ran clang-format now.
Ping. This change only replaces the call site if there actually are varargs to forward.
LGTM, as long as you are happy with the speedup.
Great, thanks Sam. This is a NFC, could you change the title to something like [DAGCombine] Remove unused isAndLoadExtLoad arguments (NFC)
Thu, Nov 30
Nice, thanks Eli. Looks good to me. It might be worth waiting with committing till tomorrow in case Sam has any more thoughts.
Wed, Nov 29
Thanks Sander. IMO this is a nice improvement to the error messages and not SVE specific. Could you update the title & description? The only potential issue I can think about the same assembler string being enabled by multiple target features. What would happen in that case? The worst thing that can happen is only printing 1 of the target features to enable it?
Tue, Nov 28
The change for LoadPostIdx looks good, I am not entirely sure about StorePostIdx. Could you elaborate why we should remove ReadAdrBase there? Maybe @javed.absar has some thoughts too.
Looks good to me, with some nits. However it seems that we do not use the fast math flags anywhere else in Clang codegen, so it would be good to clarify if it is OK to use it here.
Thanks Evandro for having a look! I would appreciate any feedback from Gerolf or Sebastian, as they reviewed/added the change summing up the latencies of the deleted instructions.
Mon, Nov 27
LGTM, given that happy with the slight change in behavior in llvm/test/CodeGen/AArch64/arm64-ldp-cluster.ll for exynos.
Thanks Sander, looks good to me now. I've added Eli and Eric as reviewers, please wait with committing so they can have a look too.
Addressed Evandro's comment.
Thanks Sam, looks good to me. Please wait a day or two with committing, to give others a chance to raise additional concerns.
Fri, Nov 24
Improve test slightly.
Wed, Nov 22
Tue, Nov 21
Looks like a straight-forward simplification to me, but it would be good if someone else could have a look too.
I ran into this being a problem in D40306. The patterns added there are are not profitable on Cortex-A57, but because the NewRootLatency is under-estimated, the unprofitable pattern gets applied.
Mon, Nov 20
This needs some more work. I'll update the patch and post some data once I am done.
Sat, Nov 18
Fri, Nov 17
> Can I ask why you chose to tackle this one? What motivated you to care about the memory consumption of LVI? Just curious about the background and motivating test cases.
Thanks @reames ! This patch should introduces a more compact lattice element representation using PointerIntPair, not the ConstantRange folding set.
Thu, Nov 16
just 2 nits that might be worth considering. Otherwise LGTM as a fix for the reverted D29219
Add check to ensure we only split call sites in tryToSplitOnOrPredicatedArgument if the header block has exactly 2 successors. I plan to post a patch relaxing that soonish.