I feel that this should be a decision made inside the pass, not necessarily in the pass manager. We may have a single module with both functions that are and are not optsize. The inliner/unroller would change thresholds based on this, for example.
This looks useful as a separate function, the interface looks pretty clean and I think it lives nicely in LoopUtils.
This seems like a sensible thing to go along with D72714.
Seems good to me, if Sam is happy with the tail predication parts.
Thu, Jan 16
Wed, Jan 15
Nice one. Good to see codegen changes coming out of these intrinsics.
Thanks! This is mostly a copy of the existing Neon code, adjusted for MVE. Hopefully it's fairly straightforward.
I just copied the existing tests with post-inc versions.
Update another test.
Tue, Jan 14
I had not realised that this required LCSSA form. That will likely cause many more changes than you really want. It's probably best not to create LCSSA form here without destroying it afterwards. Do you know if LoopSimplify form is needed too? From the code it looks like it shouldn't be (it can handle multiple exits).
LGTM by the way. I hadn't meant to tread on Sjoerd's toes; I didn't see he was already taking a look. The issues were mostly unrelated, and this looks like a nice improvement on its own.
Mon, Jan 13
Sounds good. I think that if we wanted to make VPSEL work, it would be better to try to convert them to VMOVt in a VPT block. (Although if there is only one of them, creating the block may be more overhead. Hopefully the VCMP can be folded in).
This looks sensible, using the VPT blocks to know that the instructions are predicated on something that makes tail predication valid.
Sorry for the long delay here. This wasn't making things much better in the tests I was trying (it was a bit up-and-down). I've adjusted it now to include shouldFavorPostInc for MVE subtargets, not just disable shouldFavorBackedgeIndex. That should get the cost model more correct in LSR, where the AddRec is now free because it can just be the postinc. I also removed the old "containsVectors(L)" check, as adding something that is O(n) to the inner parts of LSR, which is already O(something large), was probably a bad idea.
LGTM. Nice one.
Sun, Jan 12
Yeah, thanks. The square case comes up quite a lot.
Fri, Jan 10
This I like. The structure is nice, and I think it's good to split it out like this.
Sounds good to me.
Yeah, I said these would be tough to read! LGTM
Thu, Jan 9
Nice test. LGTM
I agree that IntrNoMem makes a very sensible default.
Ah. LGTM. Thanks
Wed, Jan 8
LGTM I think
I imagine trying to read what an llvm.arm.mve.vshl means in IR would be quite difficult, but it does make lowering them simpler. LGTM
LGTM then. Thanks.
Tue, Jan 7
If UnJ needs loops in LoopSimplify form then it should request (or enforce) that loops are in that form. I think it can be done with AU.addRequiredID(LoopSimplifyID); (and maybe a DEPENDENCY on LoopSimplify?) That should hopefully mean that the tests don't need these adjustments.
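To illustrate the suggestion above, here is a minimal sketch of a legacy-pass-manager pass that requests LoopSimplify form via AU.addRequiredID(LoopSimplifyID). The pass name and body are illustrative only, not the actual UnrollAndJam code:

```cpp
// Sketch: a legacy FunctionPass that requires loops to be in
// LoopSimplify form before it runs, so tests do not need to add
// -loop-simplify (or adjusted CHECK lines) by hand.
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Pass.h"
#include "llvm/Transforms/Utils.h" // declares LoopSimplifyID

using namespace llvm;

namespace {
struct ExampleUnrollAndJam : public FunctionPass {
  static char ID;
  ExampleUnrollAndJam() : FunctionPass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    // Ask the pass manager to schedule LoopSimplify first, so every
    // loop we see has a preheader, single backedge and dedicated exits.
    AU.addRequiredID(LoopSimplifyID);
    AU.addRequired<LoopInfoWrapperPass>();
  }

  bool runOnFunction(Function &F) override {
    return false; // transformation elided in this sketch
  }
};
} // namespace

char ExampleUnrollAndJam::ID = 0;
```

With the requirement declared here, the pass manager schedules LoopSimplify automatically, rather than relying on each test's input IR already being in that form.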
Thanks. LGTM, if no-one else has any other comments.
Mon, Jan 6
OK. LGTM then.
Updated as per the comments. I may hold off on committing this until the MVE codegen is further along (it depends on the first patch at least).
This is only the new pass manager, right? It's probably best to keep the old pass manager working in the same way too, to keep them in sync. That will be the one we use downstream at the moment.
Sun, Jan 5
Fri, Jan 3
Move more of the logic into Subtarget.
Looks interesting. Thanks for splitting it out, it makes it a lot clearer.
Sounds good. We could add an arm64 triple run line to the existing tests, but I'm not sure it adds much.
Thu, Jan 2
Agreed, this is the place that gathers all symbolic strides for which runtime checks are later added by replaceSymbolicStrideSCEV().
Agreed, a gather or scatter would probably be a better option than a runtime check, certainly if the stride turns out to be other than 1, in which case the runtime check option will execute the original scalar loop.
Note that if the stride does turn out to be 1, a runtime check may be faster: the cost of a vector load/store is typically less than that of a gather/scatter, disregarding the overhead of the runtime check itself. So having a way to "manually" restore original performance for such cases may be useful (in addition to EnableMemAccessVersioning). Always preferring a gather or scatter as suggested should be good step forward, given the expected complexity of devising a cost-based preference.
Instead of teaching LAI to make such target/cost-based decisions, it would have been better to let this analysis continue to collect *all* symbolic strides potentially subject to runtime checks, and teach the planning/transform to prune/decide which strides to actually specialize; e.g., have LVP::plan() start by calling "CM.setVersioningForStrideDecisions()", analogous to InterleavedAccessInfo::analyzeInterleaving() which collects all potential interleave groups, and CM::setCostBasedWideningDecision() which decides which of the groups to materialize (per VF). However, this requires a fair amount of refactoring; worth a TODO?
The AArch64 changes look good to me. Thanks
Wed, Jan 1
Mon, Dec 30
Sounds good. Can you add a test?
Sounds interesting. My understanding is that there were several features of IT instructions that were performance-deprecated. Some of the instructions inside the blocks and accesses to PC were ruled out too. Take a look at isV8EligibleForIT for how they are disabled (I happened to be looking at it the other day for different reasons; someone wanted to try restrict-it on other CPUs).