Can you add a test using llvm-mc -show-inst?
Sigh. I should probably have found that problem. I hadn't considered multiple stores overriding each other like that.
I was under the impression that without a -mcpu it defaulted to the cortex-a53 schedule. It looks like it's no-schedule though, which still counts as an in-order core as it has no MicroOpBufferSize. Can we check whether ST->getSchedModel().ProcID != 0? A ProcID of 0 will be the "NoSchedModel".
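A minimal sketch of the suggested check, written against a hypothetical stand-in struct rather than the real LLVM scheduling-model class. The names SchedModelSketch and isKnownInOrder are invented for illustration. The sketch assumes, as the comment above does, that a ProcID of 0 marks the "NoSchedModel" and that a MicroOpBufferSize of 0 marks an in-order core.

```cpp
// Hypothetical stand-in for the two scheduling-model fields discussed above.
// This is not the LLVM class; it only mirrors the field names from the comment.
struct SchedModelSketch {
  unsigned ProcID;            // assumed: 0 means no schedule model was selected
  unsigned MicroOpBufferSize; // assumed: 0 means there is no reorder buffer
};

// Returns true only when a real model is present AND it describes an in-order
// core. A default (ProcID == 0) model is rejected even though its
// MicroOpBufferSize would otherwise make it look in-order.
inline bool isKnownInOrder(const SchedModelSketch &SM) {
  return SM.ProcID != 0 && SM.MicroOpBufferSize == 0;
}
```

With this shape, a target built without -mcpu would fall through to whatever the default behaviour is, instead of being treated as an in-order core.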
Mon, Apr 19
Thanks for the change. This looks very sensible to me, so long as the other SVE folks agree.
Fri, Apr 16
Rebase and move the condition logic into the start of IsGuaranteedLoopInvariant.
Nice one, thanks. This looks good to me.
Thanks. Seems like a useful step forward. LGTM.
Thanks for the patch! Given that this only shifts the invocation of LI, there shouldn't be any problems in terms of compile time.
Add a ContainsIrreducibleLoops flag, move LoopInfo and add some irreducible tests.
Thu, Apr 15
It would be good to see some extra tests for various edge cases, like offsets near to the boundaries and different pairs of instructions being combined/not.
Like D100463, could this be done in instcombine/TTI::instCombineIntrinsic?
Hi @dmgreen! This is SVE-specific, and SVEIntrinsicOpts.cpp is where such transformations are typically placed (at least for now). I did a quick grep and it seems SVE intrinsics don't currently have much of a presence in generic passes like instcombine/constant folding, perhaps because some SVE optimisations are more complex than others and I guess it makes sense to keep them all in the same place.
X86 also has llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp which houses instcombine-like optimisations for X86. 🙂
I think I managed to convince myself that this is correct. But there is a lot that could go wrong or be subtly forgotten; this code has been a bit error-prone in the past. Hopefully now that it's gained some infrastructure that's less likely.
Wed, Apr 14
Looks like a mechanical extension to the other changes.
I would have gone the other way, making the mtriple and mattr both command line args. LGTM either way though.
Why is this kind of thing not done in instcombine? Or even constant folding if that is what this is really doing.
Can you base this on top of D99723? Some of the code may be able to be shared, even if there will be differences.
Tue, Apr 13
You folks all know more about scheduling than I do, but if you are in accord that is great. I can certainly verify that this improves the accuracy of the M7 schedule for the samples I've seen.
Mon, Apr 12
This looks sensible to me, if we can get the scheduler to agree.
This LGTM too.
Yep. "cheap" -> "beneficial to convert"
Add some brackets to a comment, to help readability.
I don't have much context, but I'm just wondering if a similar optimization for csneg might be useful?
sub(0, csneg( X, Y, <cc>) ) = csinv -X, -Y-1, <cc>
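The identity above can be checked with a small sketch. It assumes the usual AArch64 semantics, where csneg yields X when the condition holds and -Y otherwise, and csinv yields X when the condition holds and ~Y otherwise. The helper names are illustrative only, and the arithmetic uses wrapping 32-bit unsigned values.

```cpp
#include <cstdint>

// Assumed semantics: csneg selects X, or the two's-complement negation of Y.
inline uint32_t csneg(uint32_t X, uint32_t Y, bool CC) {
  return CC ? X : 0u - Y;
}

// Assumed semantics: csinv selects X, or the bitwise inverse of Y.
inline uint32_t csinv(uint32_t X, uint32_t Y, bool CC) {
  return CC ? X : ~Y;
}

// Compares sub(0, csneg(X, Y, cc)) against csinv(-X, -Y-1, cc).
// The false arm works because ~(-Y-1) == -(-Y-1)-1 == Y in two's complement.
inline bool identityHolds(uint32_t X, uint32_t Y, bool CC) {
  uint32_t Lhs = 0u - csneg(X, Y, CC);
  uint32_t Rhs = csinv(0u - X, 0u - Y - 1u, CC);
  return Lhs == Rhs;
}
```

Note that -Y-1 is simply ~Y, so whether the rewrite is profitable depends on those operands being free to compute, for example when they are constants.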
Sat, Apr 10
Looks good, except for the one getValue.
Looks simple. LGTM
Seems straightforward.
Fri, Apr 9
Thu, Apr 8
I'm a little worried that WLSTP is going to cause problems, with it not used anywhere else. Let's at least add an option for disabling it if needed.
Wed, Apr 7
But the other thing I was just wondering, not that I mind these patterns here, but are we not expecting that the VDUP is sunk to its user? I think that's probably what I would expect, but don't know if that is a fair expectation.
LGTM, so long as the test is cleaned up a little
Hello. We've received reports that this is bloating the codesize of some code, quite a lot in places. There is an example in https://godbolt.org/z/66TEKa1xK. Essentially the glomming together of reads/writes into i32s (in our case) helps to reduce the total number of loads/stores needed. Splitting that up into individual i8/i16s creates a lot more load/mask/load/mask/or/store sequences.
Tue, Apr 6
Mon, Apr 5
Fri, Apr 2
Thu, Apr 1
Thanks. My tests agreed, LGTM
renamed some functions,
added comments to test (and updated)
updated some incorrect code:
- adjustBBOffsetsAfter() is now called with BBPrevious as input, since BB is moved and that changes the offsets after it.
- code checking for LE to loopExit now starts search from the MBB after loopExit. Updated the test accordingly.