- User Since
- May 24 2016, 8:35 AM (243 w, 3 d)
#pragma clang loop vectorize_predicate(disable) vectorize_width(4)
gives llvm.loop.vectorize.predicate.enable=false, llvm.loop.vectorize.width=4, llvm.loop.vectorize.scalable.enable=false, llvm.loop.vectorize.enable=true
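For context, a minimal sketch of a loop carrying that pragma. The function name addArrays and its signature are hypothetical and only serve as an illustration; the pragma clauses are the ones quoted above. The metadata listed in the comment should be visible on the loop when the emitted IR is inspected (for example with -S -emit-llvm), assuming a clang version that matches the one under review.

```cpp
#include <cstddef>

// Hypothetical helper (not from the patch): element-wise addition of two arrays.
void addArrays(const int *a, const int *b, int *out, std::size_t n) {
  // The pragma is clang-specific, so it is guarded to keep other compilers quiet.
#ifdef __clang__
#pragma clang loop vectorize_predicate(disable) vectorize_width(4)
#endif
  for (std::size_t i = 0; i < n; ++i)
    out[i] = a[i] + b[i];
}
```

The pragma only attaches hints to the loop; the result of the function is the same with or without it.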
Sounds good to me. Can you clean up the tests a little?
but someone who knows AArch should comment.
Oh, that's a good idea! Updated to use InstructionCost where it can. Thanks.
The code changes look simple enough, LGTM.
Rename to getExtendedAddReductionCost and adjust some hasOneUse early exits.
Wed, Jan 20
I could not replicate the bug in https://bugs.llvm.org/show_bug.cgi?id=48546 with the latest clang, as I mentioned in the ticket. Can you provide a more specific reproducer, and use it as a test case for what is being fixed?
Tue, Jan 19
There are some tests in llvm/unittests/Support/Host.cpp. Can you add tests for these CPUs and the multiple infos you were seeing? I suppose it now gets the info from the last one?
Fix base getExtendedReductionCost to use Extend Type for the reduction cost.
Mon, Jan 18
I only looked at the ARM equivalent. From what I remember, the sequence of events was something like:
- One of the two operands to the mul was converted from a sext to an anyext. The other was not due to having multiple uses.
- That anyext was folded into a load to produce a zextload (we don't produce a vector anyext load)
- We couldn't match anything due to one operand being a sext and the other being a zextload.
So in that case we would need to do one of the following: use demanded bits to know that the top bits are not needed when converting it to a mull, create an anyextload instead of a zextload, or handle multiple uses so that both inputs turn into anyext or zextloads.
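The demanded-bits option rests on a simple arithmetic fact: if only the low half of a widened product is used, it does not matter how each operand was extended. A small sketch of that fact in plain C++, not LLVM code; all function names here are hypothetical and the 16-to-32-bit widths are chosen only for illustration.

```cpp
#include <cstdint>

// Models a sext from 16 to 32 bits (result held as unsigned so multiplication wraps).
inline uint32_t sext16(uint16_t v) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(v)));
}

// Models a zext from 16 to 32 bits, which is what a zextload produces.
inline uint32_t zext16(uint16_t v) { return static_cast<uint32_t>(v); }

// Low 16 bits of the 32-bit product, i.e. the case where the top bits are not demanded.
inline uint16_t lowHalfOfProduct(uint32_t a, uint32_t b) {
  return static_cast<uint16_t>(a * b);
}

// True when sext*sext, sext*zext and zext*zext all agree in the low 16 bits.
inline bool lowHalfIgnoresExtension(uint16_t a, uint16_t b) {
  uint16_t ss = lowHalfOfProduct(sext16(a), sext16(b));
  uint16_t sz = lowHalfOfProduct(sext16(a), zext16(b));
  uint16_t zz = lowHalfOfProduct(zext16(a), zext16(b));
  return ss == sz && sz == zz;
}
```

The full 32-bit products do differ between the mixed and matching cases, which is why the mismatch between a sext and a zextload can only be ignored when the top bits are known to be unused.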
Thanks. Hopefully that doesn't just cause other problems :)
I believe this sentence is the important part (the width=1 is just an edge case):
For all other non-zero vectorization widths, the pragma is not ignored unless vectorization is explicitly disabled using vectorize(disable)
Sat, Jan 16
Thanks for the patch. There is a patch to make MVE consistent with the rest of MVE in D94867. This will need rebasing on top of that, with updated tests to make the two consistent again.
Fri, Jan 15
LGTM. We sometimes generate a lot of shuffles in an attempt to do lane interleaving and I know the simplification of them isn't always what it could be once all the lowering has happened. I thought more happened through simplifying buildvectors but apparently not. This looks like a good continuation to the existing code.
Wed, Jan 13
Thanks for the changes. LGTM
Mon, Jan 11
Oh yeah. Sorry. Forgot about that.
Fri, Jan 8
Rebase and ping. I also adjusted some code and better dealt with loop invariant operands.
Thanks. I made a typo in the summary, it should have said "do not seem to be correct", not "do seem...".
Hello. I tried running our downstream benchmarks with this patch and it did not appear to have any effect, either in performance or code size. (That doesn't mean that nothing is affected, but it's at least a good sign.)
Thu, Jan 7
LGTM thanks, but please try and simplify the test case if you can.
Wed, Jan 6
Thanks. LGTM with a couple of suggestions.
Added a unit test, which caught that the Parent was not set correctly.