- Fix an issue where the wrong value was used for the number of elements passed to [d|w]lstp. We checked that the value was available at LoopStart, but this did not account for the last instruction in the block also defining the register. Two helpers have been added to RDA for this.
- Add code that tries to move the element count def, or the insertion point, so that we can perform more tail predication.
- Related to the first item: the same off-by-one could prevent us from generating a low-overhead loop when a mov lr was the last instruction in the block.
- Fix up some instruction attributes so that not all the low-overhead loop instructions are labelled as branches and terminators, as this is not true for dls/dlstp.
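The off-by-one described in the first item can be illustrated with a toy model. This is only a sketch under stated assumptions: `Instr`, `Block`, `reachingDefBefore` and `liveOutDef` are hypothetical names invented for illustration and are not the helpers added to RDA or any LLVM API. It shows why asking "which def reaches the last instruction?" differs from asking "which def is live out of the block?" when the last instruction itself writes the register.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy model (hypothetical, not the LLVM API): an instruction optionally
// defines a single register, identified by an integer.
struct Instr {
  std::optional<int> DefReg; // register written by this instruction, if any
};

using Block = std::vector<Instr>;

// Index of the last def of Reg strictly before position Pos, or -1 if none.
// This models "which def reaches the instruction at Pos".
int reachingDefBefore(const Block &BB, std::size_t Pos, int Reg) {
  int Def = -1;
  for (std::size_t I = 0; I < Pos && I < BB.size(); ++I)
    if (BB[I].DefReg == Reg)
      Def = static_cast<int>(I);
  return Def;
}

// Index of the def of Reg that is live out of the block, or -1 if none.
// Unlike a query at the last instruction, this also counts a def made by
// the last instruction itself.
int liveOutDef(const Block &BB, int Reg) {
  return reachingDefBefore(BB, BB.size(), Reg);
}
```

For a block whose final instruction redefines the register, a query at the final instruction returns the earlier def, while the live-out query returns the final one. Only the latter describes the value that the loop start would actually observe.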
Details
- Reviewers: dmgreen, SjoerdMeijer
- Commits: rGacbc9aed726d: [ARM][MVE] Fixes for tail predication.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/ARM/ARMInstrThumb2.td | ||
---|---|---|
5212–5213 | Cheers. Yeah, it's not until I've now enabled evaluating terminators that we're seeing it! | |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/unsafe-cpsr-loop-use.mir | ||
145 | To keep this test exercising what it was designed for: this patch now allows this loop to be converted into a loloop by using $lr = tMOVr in the preheader, so I've put a move that uses lr inside the loop to prevent the transform from kicking in. |
OK. LGTM. Nice fixes.
One thing I did notice, we are sometimes relying on kill flags, but not updating them ourselves? Something to think about maybe.
I'm surprised we haven't needed this earlier!
Can these 2 lines be removed now?