- User Since
- May 11 2015, 7:59 AM (215 w, 19 h)
I will add some comments, but I really don't think this belongs in constant islands. This doesn't have to worry about iterative changes; the loop size may only vary by 8 bytes, which is nothing compared to the 4KB range that we need to concern ourselves with. Plus, this is a very specific pass, especially once we start having to handle tail-predicated loops!
Ok, please ignore my nonsense. LGTM
- Renamed all the things to low-overhead loops.
- Used the arm.space intrinsic and added another test for the edge case.
- Reversed the order of the search for LoopEnd and LoopDec, breaking early if possible.
- Switched the order of constant island and low-overhead loops.
I will try to reorder the final passes. I hope that I can make the size of the pseudo instructions pessimistically big enough to cover a cmp and br. I imagine that TTI will have to try to calculate the size, or at least the number of live variables, so that these loops don't cause unnecessary register pressure and actually slow things down because of spills.
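The pessimistic-size idea reduces to simple arithmetic. Here is a minimal sketch. It assumes a backward-branch limit of 4096 bytes (the 4KB figure quoted earlier) and a hypothetical worst case of 4 bytes each for the cmp and br that a pseudo instruction might be reverted to. The constants and function names are illustrative only and are not LLVM's.

```python
# Hypothetical constants: the 4KB limit comes from the discussion above;
# the 4-byte cmp/br sizes are an assumption made for illustration.
LE_RANGE_BYTES = 4096
WORST_CASE_CMP_BYTES = 4
WORST_CASE_BR_BYTES = 4


def pessimistic_pseudo_size() -> int:
    """Size reported for a loop pseudo so that reverting it never grows the loop."""
    return WORST_CASE_CMP_BYTES + WORST_CASE_BR_BYTES


def loop_fits(body_bytes: int, num_pseudos: int) -> bool:
    """True if the body, with every pseudo counted at its largest size, is in range."""
    total = body_bytes + num_pseudos * pessimistic_pseudo_size()
    return total <= LE_RANGE_BYTES
```

Each pseudo is counted at its largest possible size from the start. A later decision to revert it to a cmp and br can therefore only keep or shrink the loop, so an earlier range check stays valid.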
Tue, Jun 18
Currently using a mask to calculate the unrolled body test value; this looks like it solves the problem but causes regressions! Next I will try counting up with adds instead of down...
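The comment does not show the patch itself. As a rough model only, assume a power-of-two unroll factor and a separate remainder (epilogue) loop. Under those assumptions, the mask form of the unrolled-body bound looks like the code below, shown with both loop-control directions: counting down (the current approach) and counting up with adds (the planned alternative). All function names are hypothetical.

```python
def unrolled_bound(trip_count: int, unroll_factor: int) -> int:
    """Original iterations covered by the unrolled body, computed with a mask.

    Assumes unroll_factor is a power of two, so trip_count & (unroll_factor - 1)
    is the remainder left for the epilogue loop.
    """
    assert unroll_factor > 0 and unroll_factor & (unroll_factor - 1) == 0
    return trip_count & ~(unroll_factor - 1)


def run_counting_down(trip_count: int, uf: int) -> list[int]:
    """Unrolled body controlled by a counter that is decremented to zero."""
    visited, i = [], 0
    remaining = unrolled_bound(trip_count, uf)
    while remaining != 0:
        visited.extend(range(i, i + uf))  # uf copies of the original body
        i += uf
        remaining -= uf
    visited.extend(range(i, trip_count))  # epilogue handles the remainder
    return visited


def run_counting_up(trip_count: int, uf: int) -> list[int]:
    """Same loop, but the induction variable counts up with adds to the bound."""
    visited, i = [], 0
    bound = unrolled_bound(trip_count, uf)
    while i != bound:
        visited.extend(range(i, i + uf))
        i += uf
    visited.extend(range(i, trip_count))
    return visited
```

This only checks that both directions visit every iteration exactly once. It says nothing about which direction generates better code, which is what the regressions are about.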
Patch to enable Arm code generation for do-while loops: https://reviews.llvm.org/D63476
Mon, Jun 17
LGTM. Remember to add and remove the files in SVN too ;)
Cheers Dave. I've removed the ExitBlock check in the ARM backend. Also, currently we'd need CounterInReg == RequiresLoopLatchExit so I've just switched to querying CounterInReg instead of introducing a new parameter. If, later, there's another target which has a separate requirement, then we can re-add the option.
Do you have any benchmark numbers to show that this is generally profitable? From our downstream testing, it is not clear that this change is beneficial.
Fri, Jun 14
This is a bit confusing; which other patch(es) does this depend upon?
What's big-endian anyway..? LGTM.
Thu, Jun 13
Wed, Jun 12
Thanks both. I went for the REQUIRES option, though slightly different, and I'll also add that config file.
First part: https://reviews.llvm.org/rL363147
Thanks @reames - this is the approach I will take and I'll pay particular attention to the add/sub case. I'll link the commits back to this review as I go.
- Now checking for loop invariant trip count.
- Added tests for double and half precision floats.
- Now checking the new hasLOB target feature.
Tue, Jun 11
Mon, Jun 10
- Folded saturation into the overflow case.
- Allow scalar sqrt.
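The note only says that saturation was folded into the overflow case. The following is a minimal sketch of that idea for 32-bit signed addition, modelled in Python. The saturating result is the wrapped sum unless the overflow test fires; when it does, the result clamps to the bound indicated by the operands' shared sign. This illustrates the general technique under those assumptions and is not the patch's code.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def wrap_i32(x: int) -> int:
    """Reduce an integer to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sadd_with_overflow(a: int, b: int) -> tuple[int, bool]:
    """Wrapped sum of two in-range i32 values plus a signed-overflow flag."""
    s = wrap_i32(a + b)
    # Signed overflow: operands share a sign and the result's sign differs.
    overflow = (a < 0) == (b < 0) and (s < 0) != (a < 0)
    return s, overflow


def sadd_sat(a: int, b: int) -> int:
    """Saturating add: reuse the overflow check and clamp only when it fires."""
    s, overflow = sadd_with_overflow(a, b)
    if not overflow:
        return s
    # On overflow both operands have the same sign, which selects the bound.
    return INT32_MIN if a < 0 else INT32_MAX
```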
Sun, Jun 9
This is broken, as is the patch it depends on!
Fri, Jun 7
Thanks Philip. This has been reverted, so something is wrong. I'm finding it difficult to wrap, pun intended, my head around how adding users affects the semantics of a given expression. Are there any reviews/threads that you could point me to?
Now implemented isLoweredToCall to look at intrinsics.
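Why calls matter here: a hardware loop keeps its counter in a dedicated register (CTR on PowerPC, LR for Arm low-overhead loops), and a real call can clobber it. Intrinsics are the awkward case, because some select to plain instructions while others become library calls. The hypothetical Python model below shows that distinction. The instruction representation and the set of "inline" intrinsics are assumptions that vary by target, and this is not the TTI API.

```python
from dataclasses import dataclass

# Hypothetical, target-dependent set of intrinsics assumed to become plain
# instructions (for example, scalar sqrt on a core with a suitable FPU).
INLINE_INTRINSICS = {"llvm.sqrt.f32", "llvm.sqrt.f64", "llvm.fabs.f32", "llvm.fabs.f64"}


@dataclass
class Inst:
    opcode: str       # e.g. "add", "call"
    callee: str = ""  # callee name, only meaningful for "call"


def is_lowered_to_call(inst: Inst) -> bool:
    """Conservatively decide whether an instruction ends up as a real call."""
    if inst.opcode != "call":
        return False
    if inst.callee.startswith("llvm."):
        # Look at the intrinsic rather than assuming all intrinsics are inline.
        return inst.callee not in INLINE_INTRINSICS
    return True


def can_use_hardware_loop(body: list[Inst]) -> bool:
    """Reject the loop if anything in it might clobber the counter register."""
    return not any(is_lowered_to_call(i) for i in body)
```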
Committed in rL362774.
LGTM. Yes, a bit difficult to test atm!
Thu, Jun 6
Cheers, NumElements has been renamed to LoopDecrement.
Removed initialisation in HardwareLoopInfo constructor and, instead, added two more options to do the same.
Wed, Jun 5
Using FlagAnyWrap for lshr and udiv.
- Added expand safety check.
- Made mightUseCTR into a private method. I tried static but since it uses many PPCTTIImpl members it just makes sense to have it as part of the class.
Tue, Jun 4
Updated llvm.loop.decrement.reg comment description.
I would expect many targets to have some kind of validity check late in the pipeline. loop.decrement.reg is designed so that it can just be selected to a machine sub, as the IV chain still exists along with the icmp and br. I have assumed that, because the intrinsic behaves like a sub, any target should be able to fall back, hopefully trivially, to a machine sub late on. Is this something that would be difficult for you..? The loop.decrement, which produces an i1, would cause more problems, but this framework allows the backend to make the best decision for itself.
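The following minimal Python model contrasts the two forms. It assumes a do-while loop with a trip count of at least 1 and a decrement of 1. loop.decrement.reg returns the new counter value, so the IR keeps the IV chain plus the icmp and br, and a fallback is just a sub. loop.decrement returns only an i1 (modelled here as "iterations remain"), so the counter itself is hidden and a fallback would have to rebuild it. The Python names are hypothetical and only model the behaviour described above.

```python
def loop_decrement_reg(counter: int, decrement: int) -> int:
    """Model of loop.decrement.reg: behaves exactly like a subtraction."""
    return counter - decrement


def run_with_decrement_reg(trip_count: int) -> int:
    """Latch driven by the explicit IV chain: decrement, then icmp + br."""
    iterations = 0
    counter = trip_count  # set up in the preheader
    while True:
        iterations += 1
        counter = loop_decrement_reg(counter, 1)
        if counter == 0:  # the icmp feeding the latch branch
            break
    return iterations


class HiddenCounter:
    """Model of loop.decrement: the counter lives in a target register, not in IR."""

    def __init__(self, trip_count: int) -> None:
        self._value = trip_count

    def decrement(self, n: int) -> bool:
        self._value -= n
        return self._value != 0  # only this i1 is visible to the IR


def run_with_hidden_counter(trip_count: int) -> int:
    """Latch driven directly by the i1 result; no counter value is exposed."""
    iterations = 0
    ctr = HiddenCounter(trip_count)
    while True:
        iterations += 1
        if not ctr.decrement(1):
            break
    return iterations
```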
Mon, Jun 3
Removed RequiresNewPreheader from HardwareLoopInfo.
- Added comments describing the intrinsics.
- loop_decrement now just returns an i1, instead of being declared anyint. This allowed removing some changes from PPC backend.
Thanks both, I've moved the logic into InsertBinop.
Fri, May 31
Thu, May 30
I will continue going over this tomorrow.
Hopefully can now be handled by using generic saturating nodes.