- User Since
- Jan 26 2016, 2:17 AM (260 w, 3 d)
I guess this is a typo, and should be vectorize_predicate(enable)
Gives llvm.loop.vectorize.predicate.enable=true, llvm.loop.vectorize.width=4, llvm.loop.vectorize.scalable.enable=false, llvm.loop.vectorize.enable=true
#pragma clang loop vectorize_predicate(enable) vectorize_width(1)
Gives llvm.loop.vectorize.width=1, llvm.loop.vectorize.scalable.enable=false
#pragma clang loop vectorize_predicate(disable) vectorize_width(1)
Gives llvm.loop.vectorize.width=1, llvm.loop.vectorize.scalable.enable=falseShould we be adding llvm.loop.vectorize.predicate.enable metadata even when width=1? I would be inclined to emit the predication pragma anyway.
Thanks @shchenz, comments addressed.
Thanks, LGTM too.
Wed, Jan 20
- Ported over the only positive test test/CodeGen/X86/sink-cheap-instructions.ll to a MIR test: this is sink_add in the new MIR test file.
- Walk candidates in reverse order, to start at the bottom of def-use chains, and try using the uses first before attempting to sink a def.
Tue, Jan 19
- Added statistics NumLoopSunk, the number of instructions sunk into a loop
- Check that a use of candidate is in a loop,
- Added a bunch of negative tests.
Mon, Jan 18
Ah yes, thanks. I think we should improve the langref spec also, this should be described in the same paragraph, not floating below some example.
I am trying to remember details here, but first about this:
Thank you very much for looking at this @shchenz !
Fri, Jan 15
Looks like we get a lot of happiness from using these attributes. ;-)
I had a look at it before and the general direction also looked okay to me too, which I was happy to see confirmed. Therefore I am happy to accept this, even though I haven't checked all intrinsics because this is a lot of changes. So perhaps wait a day with committing in case anyone wants to have one more look.
Thu, Jan 14
Yep, don't think it is useful or necessary to run this at O0.
I was going to look at this today, but thanks for fixing! :-)
Wed, Jan 13
To avoid code-size bloat, this is indeed the behaviour that we want.
Nice one, thanks, LGTM.
Tue, Jan 12
Little friendly ping: how do we like this little reshuffle?
Mon, Jan 11
Adding to my previous comment:
Here's my "pen-and-paper exercise" of the problem that we are solving here, and lets start with some examples.
In these examples, empty blocks means there are no instructions in it that are relevant to this problem.
Fri, Jan 8
I have upload WIP patch D93694 to show the direction and motivation for this change here.
Thu, Jan 7
Thanks for getting to the bottom of this. Agreed, and also LGTM.
Fixed formatting and description of the command line option. Related to this, and in addition to my previous message, this is off-by-default and is currently covered by exactly 1 test:
test/CodeGen/X86/sink-cheap-instructions.ll. After this NFC change, I will look into extending this.
See updated Title and Summary.
Wed, Jan 6
Looks like a good fix to me.
LGTM, perhaps wait a day with committing in case there are more comments.
Tue, Jan 5
Nice one, LGTM.
Yep, thanks, that should have been const.
Yeah, so thanks again for looking at this. After looking more into this, I don't think this change at the moment makes much sense.
Mon, Jan 4
Dec 22 2020
Hmmmm, good points, will check. I was assuming that the legality checks were performed, but yeah, this is impressively broken if not!
Now with a MIR test.
Dec 21 2020
Dec 15 2020
Dec 14 2020
Perhaps some tests are missing: one inlined suggestion is related to AssumeITCMBankConflict, and I was wondering about other cases like Size > 4 and SP relative access?
Dec 10 2020
Dec 9 2020
Dec 8 2020
Looking good in general. I think we would need some perf numbers to see if we haven't missed something.
Some more remarks inlined.
Looks like a good change to me, and this looked reasonable:
Cheers, looks like a good fix to me.
Thanks for the feedback.
Dec 7 2020
Dec 3 2020
The short story is that this was "accidentally" getting things right.
Dec 2 2020
Yep, LGTM too
I have some high-level questions:
- Are we fighting another optimisations here, some sort of loop-rotate or is this just MBP reshulffling blocks in a way that is not good for us?
- I know we don't nest WLSTPs for profitability reasons, but in theory we could. Not sure we need to check this though. But in general my impression is that some more test can be added, but perhaps you were still working on that.
- I am wondering if some cost-modeling is required. For example, if the iteration count is very low, would that change things?
Dec 1 2020
Thanks, and I will look into InsertValue too. That may take a few days, but will certainly do that.
Nov 30 2020
Decided to use TTI.getInstructionCost because getVectorInstrCost doesn't work here; it's an aggregate type, not really a vector.
That means the cost for a ExtractValue will be Free now, which I think is actually more accurate (than 1, and also the original 4).
I haven't added InsertValue yet, because that is e.g. not supported by getInstructionCost as it seems a little bit of a different beast.
Thanks. While I do have plans to look into this more soon, I think it's time to commit this thing as it has been proven to do some good, so will do that a bit later today.