Thanks for the update. Just one minor comment inlined.
Nov 13 2018
Nov 9 2018
Thanks Vedant for handling this. Overall I agree with this change, but I believe it's good to break this patch down into three parts and get each of them reviewed separately :
- Mark @llvm.trap cold.
- Do not treat noreturn calls as cold unless they are actually marked cold.
- Do not treat inline asm as an outlining barrier.
- Mark @llvm.trap cold.
- Do not treat noreturn calls as cold unless they are actually marked cold, and Do not treat inline asm as an outlining barrier.
Nov 8 2018
Thanks for revisiting this.
Thanks Florian for the update.
Nov 7 2018
I guess it would be good to give some stress test on this pass to see if there is any hidden bug by relaxing conditions in unlikelyExecuted(). As we treat more blocks as cold without being limited on unlikelyExecuted(), we maybe able to expose hidden issues with it.
IMHO the right fix is to not treat noreturn calls as cold (also as a follow-up).
Nov 6 2018
Nov 5 2018
Oct 29 2018
Oct 25 2018
Kindly ping one more time.
Oct 22 2018
Oct 12 2018
With the last modification I made based on Eli's comment, I didn't see any significant changes in size / performance in my spec2000 test on AArch64. Observed minor performance improvement in some internal benchmarks.
Oct 10 2018
Tried to handle Eli's comment. Please take a look and let me know any comment.
Oct 2 2018
Oct 1 2018
Do you have perf/size numbers?
I didn't see any degradation in size/perf in my spec2000. Just observed minor performance improvement in some internal benchmarks.
Sep 28 2018
Sep 11 2018
Sep 6 2018
Compile it with clang -target hexagon -O3. The code you added eventually punts (FormInputLSRUseAndFormula returns false), and LSR proceeds to do its thing. I did some analysis, and the problem is with %add.ptr = getelementptr inbounds i32, i32* %2, i32 %Col.0. This is not an "address use", since it goes into another GEP. It exists in the original source, but LSR never looks at it. Your code does and that makes it exit early. Maybe you should restrict the uses you look at to the same ones that LSR starts with?
Sorry for the extremely long delay on this change. Now I just updated the patch. Please take a look and let me know any comment.
Aug 17 2018
May 24 2018
May 23 2018
LGTM too. DomBBUsedRegs and OptBBUsedRegs are not really used, but keeping both DefedRegs and UsedRegs seems clearer than creating extra functions in LiveRegUnits.
May 17 2018
May 16 2018
Addressed Michael's comment.
Thanks Daniel for the review.
I agree that ideally LICM should preserve the CFG, and this change could potentially impact the compile time. However, I think this is the safest fix at this moment without a mess of updating extra analysis passes. In current pipelines (-O3, -O2, -O1, -Oz, and -Os), I looked at -debug-pass=Executions, and didn't see any extra execution of analysis passes after this change.
May 11 2018
May 1 2018
Refactors this change based on current tip. Made the check for load from FI more conservative.
Apr 27 2018
Thanks Sebastian for your review!
Matthias, please let me know if this change is still okay with you.
Apr 26 2018
Apr 25 2018
Now reassociations lookup is bounded only by Depth. If there are reasonable amount of reassociations on each level ~16, the whole number would not exceed ~16^3 which is ok.
Apr 24 2018
Thanks Quentin for the review.
Just a minor change in accumulateUsedDefed(). Please let me know any comment.
Apr 19 2018
Please let me know if my change in LiveRegUnits is okay with you.
Apr 18 2018
Addressed Matthias' comment by moving TargetInstrInfo::trackRegDefsUses into LiveRegUnits::accumulateUsedDefed.
Apr 17 2018
Have you seen LiveRegUnits::accumulate() glancing over the code here (and earlier) it seems to me that TII::tracksRegDefsUses() was just invented to do the same thing: figuring out which registers are free/usable over a range of instructions...
Apr 16 2018
Apr 13 2018
Thanks for handling this. Just curious if you see any issue with the non-null attribute in a constant ?
Apr 12 2018
Addressed Francis' comment.
Even without LiveRegUnits, I think you could replace ModifiedRegs and UsedRegs to be ModifiedRegUnits and UsedRegUnits, and use MCRegUnitIterators to check if registers alias.
I'm ok to commit this and follow-up in separate patches for the rest. Thanks!
Apr 11 2018
Apr 9 2018
My initial motivation case for this was when spilling the incoming argument register. This is somewhat related with the initial motivation of PostRASink pass because we do not sink the COPY for argument register before allocating. When RA try to spill the incoming argument register, it probably don't want to change the placement of the spill during RA, since sinking it down will extend the live range of the spilled value. So I do this after RA just like we did it for Copy in PostRASink pass.
Just minor update in comments.
Apr 6 2018
What do you think about using register units throughout the pass? I found them so much clearer to use when dealing with register aliases.
Apr 4 2018
With/without this change, I collected llvm stats for spec2000/2006/2017 on AArch64. I observed +31.87% more loads from stores promoted in AArch64LoadStoreOptimizer pass, and minor improvement in shrink-wrapping in case spills are sunk from the entry.
Apr 3 2018
Added NoVRegs for ShrinkWrap as well.
Updated summary and title.
Thanks for handling this. Just minor comments inlined.
The shrink-wrap pass would already assert here. Could you add the property to ShrinkWrap as well?
Apr 2 2018
Mar 30 2018
Mar 29 2018
Sounds good to me.
Thanks for doing this. Now, you intended it to be a NFC, right? If then, it will be good to keep the original test cases as it is. Adding more tests is definitely good.
Mar 28 2018
Rebased and minor changes in comments. Please let me know if the approach I'm using here make sense.