This patch introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops, and PowerPC has been ported over to use this generic pass. The target-dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend.
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations. Takes a single operand: the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint). Takes the maximum number of elements processed in an iteration of the loop body. Returns false if the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint). Takes the number of elements remaining to be processed as well as the maximum number of elements processed in an iteration of the loop body. Returns the updated number of elements remaining.
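As a rough illustration of the semantics described in the list above, here is a minimal C++ software model of the three intrinsics. It is only a sketch and not code from the patch: LoopCounterModel, count_body_executions and count_body_executions_reg are hypothetical names, and the model assumes a trip count of at least one, a counter that the step divides evenly, and that loop_decrement reports "continue" while the counter is non-zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of a hardware loop counter (not LLVM code).
struct LoopCounterModel {
  std::uint32_t remaining = 0;

  // Models @llvm.set_loop_iterations: record the number of iterations.
  void set_loop_iterations(std::uint32_t iterations) { remaining = iterations; }

  // Models @llvm.loop_decrement: subtract `step` from the counter and
  // return false once the loop should exit.
  // Assumption: `step` divides the counter evenly, so it reaches exactly 0.
  bool loop_decrement(std::uint32_t step) {
    assert(step > 0 && remaining >= step);
    remaining -= step;
    return remaining != 0;
  }
};

// Models @llvm.loop_decrement_reg: the counter is passed in explicitly and
// the updated value is returned instead of being kept as hidden state.
inline std::uint32_t loop_decrement_reg(std::uint32_t remaining,
                                        std::uint32_t step) {
  assert(step > 0 && remaining >= step);
  return remaining - step;
}

// Runs a do-while style loop driven by the implicit counter and returns how
// many times the body executed. Assumes iterations >= 1.
inline std::uint32_t count_body_executions(std::uint32_t iterations) {
  assert(iterations > 0);
  LoopCounterModel counter;
  counter.set_loop_iterations(iterations);
  std::uint32_t executed = 0;
  do {
    ++executed;  // stand-in for the loop body
  } while (counter.loop_decrement(1));
  return executed;
}

// Same loop using the explicit-register form, processing `step` elements
// per iteration. Assumes elements >= step and step divides elements.
inline std::uint32_t count_body_executions_reg(std::uint32_t elements,
                                               std::uint32_t step) {
  std::uint32_t remaining = elements;
  std::uint32_t executed = 0;
  do {
    ++executed;  // stand-in for the loop body
    remaining = loop_decrement_reg(remaining, step);
  } while (remaining != 0);
  return executed;
}
```

In this model, the difference between the two decrement forms is only where the counter lives: loop_decrement updates state held by the model, while loop_decrement_reg threads the remaining count through an ordinary value.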
I am not sure we need to speak about elements in vectorised or non-vectorised loops here; see the comment below.