- User Since
- May 24 2016, 8:35 AM (111 w, 6 d)
I like the idea of this, giving control to the user to figure out the best way to manipulate loops (even if they end up shooting themselves in the foot with it :) )
Sat, Jul 14
Hello. I just have some test Nits.
Thu, Jul 12
Sun, Jul 8
Fri, Jul 6
The updates looks correct to me. Can you add a test for pgo-memop-opt that preserves the DT and verifies it's correct.
Thu, Jul 5
Wed, Jul 4
Mon, Jul 2
+1. Nice work guys.
Sun, Jul 1
Thu, Jun 28
OK. The important DA patches are in. Unless anyone objects, I will commit this soon (hopefully over the weekend).
Mon, Jun 25
Fri, Jun 22
Hello. Glad to see this is moving forward! This looks like it's going to be very useful.
Thu, Jun 21
Wed, Jun 20
Very minor update as NewLoad is unused in release builds.
Tue, Jun 19
LGTM. This looks simple enough. Perhaps wait enough for anyone on a different time zone to object.
You may need to update Transforms/LoopSimplifyCFG/scev.ll now too (which is hopefully simple). Otherwise this looks sensible to me.
I have changed these to asserts, which I believe will never fire. I'm not sure if it breaks the design on DAGCombiner to work like this, just refining the alignment on the MMO.
Mon, Jun 18
Ping. This look Ok now?
Yes I can see that. I would have liked to turn this on for more in-order cores, but without scheduling enough to at least say that a load takes multiple cycles, I didn't feel I had a great justification. For the record, these were the changes I saw on a A53 with useAA returning true (units are time, so lower is better. these are more than 2%):
Yes, I agree. There are hopefully multiple improvements we can make in the future. I remember seeing one patch recently (40369) that should help delinearise when sext's are involved (which I think happens a lot on 64bit machines). One thing that would probably be good to add somehow would be versioning based on needs of the loop. For example in your case it could check that ss is more than w, or in a more general case check that arrays do not alias. I know polly is very good at that, but it's not something we have for DA yet.
Sun, Jun 17
Worth seeing D48202, which plans to remove this function in preference of MergeBlockIntoPredecessor (which does the merge the other way, so doesn't run into the same problem removing the entry block).
Sat, Jun 16
Jun 15 2018
Hello. Thanks for running the tests. I did not know that there were uses outside of loop interchange and now unroll and jam. That's good to know it will get some more use.
Jun 14 2018
OK. One DA patch is in, the other is still waiting for review.
Jun 13 2018
So we (Arm Compiler 6) don't ship/compile against compiler-rt. At least not at the moment. I'm not sure why, it's been like that from before my time, we have just survived like that for a long while. I'm guessing our c library has always just filled in the gaps (at least the parts that we need).
Jun 12 2018
Requires D48029 to survives a bootstrap, but that looks like a more generic error than having to use this option. Otherwise I believe this is safe.
I like the idea I think. Should this be guarded by some sort of gnueabi though?
Jun 11 2018
getOriginalAlignment() -> getAlignment()
https://rise4fun.com/Z3/hP6p is a slightly cleaner version of the proof for the change in GCD.ll.
Jun 6 2018
Jun 5 2018
I quite like the UnrollAndFuse naming. I'd not heard that the xlc compiler called it that. The UnrollAndJam pass was origin named that before I renamed for similar reasons (UnrollAndJam being more well known).
I noticed in the paper that you used the name "unrollandjam", minus underscores. Should I change this use that spelling here? I have no strong opinion of one over the other (was just using what I had found from the Intel docs).
Added a comment. I could change it to calculate the MaxAlignment from the alignment on the globals, if you think that's better in the long run.
Jun 4 2018
Thanks for the comments. I've moved things around as a result of your suggestions.
Jun 1 2018
May 31 2018
Thanks! I have a couple more DA patches around, but they may not be as straight forward as this one.
May 30 2018
Attempted to clean up tests and make them not rely on arm. Removed the outer loop IV check, some other small code edits and some comment improvements/formatting.
May 29 2018
May 28 2018
Sorry, I forgot to mention the important fact that those results were Arm, specifically thumbv8m.baseline on a cortex-m23 (where the pointers are 32bit, and only i32 are legal types). Try this data layout for arm code:
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
May 27 2018
Reverted in 333359 as it's failing on some of the builders. I believe the tests are dependant on the arm backend being built.
May 24 2018
So yes, I ran some quick benchmarks and I believe this will cause regressions in some circumstances. In one case I looked at (which is running under our special LTO pipeline and may be a little difficult to replicate), we start off with this:
%shr = lshr i32 %sub, 6 %arrayidx = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr
This is turned into:
%shr.lhs.trunc = trunc i32 %sub to i16 %shr.rhs.trunc = trunc i32 6 to i16 %shr = lshr i16 %shr.lhs.trunc, %shr.rhs.trunc %shr.zext = zext i16 %shr to i32 %arrayidx = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr.zext
Which gets turned right back into:
%shr = lshr i32 %sub, 6 %shr.zext = and i32 %shr, 1023 %arrayidx11 = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr.zext
This splits out the pragma clang loop unroll_and_jam handling into D47320, for if/when we need it. Which I believe is what you wanted, correct me if I'm wrong.
May 23 2018
In my experience, they are used.
May 22 2018
May 21 2018
OK thanks. I will leave this for at least a couple more days whilst I run extra tests. Any other comments from anyone are welcome/appreciated.
I also forgot to mention I took a quick look and it appears that getBestSimplifyQuery is only used in two places, here in CVP and LoopRotate.
Thanks for the info. I see the problem. I will commit this, to make it so this isn't crashing, and we can go from there.
May 18 2018
I may have sowed some confusion here by not updating commit messages as the code changed.