Changes outside LangRef LGTM; please commit separately.
I think each of these intrinsics needs a "semantics" section that specifies the bare semantics of each intrinsic, without mentioning anything about loops/control flow, assuming that's possible.
Signature here is wrong?
"but are not allowed to duplicate these intrinsic calls"? Run-on sentence, and I'm not sure what isn't allowed to duplicate the intrinsic calls.
Where does "llvm.loop.decrement" retrieve the loop iteration counter from? I think we need to describe this a bit more carefully. For the others, there's sort of a trivial lowering back to plain LLVM IR, but here I'm not sure what that looks like.
Thanks for catching that unclear part of the llvm.loop.decrement description, hopefully that's clearer now.
I have left the remarks about NoDuplicate in (for now), fixed that sentence, because I think the idea is that we for example don't want to a block to be cloned/copied, (partly) destroying the hardware loop structure.
The descriptions here related to llvm.loop.decrement still aren't really sufficient. The description of llvm.set.loop.iterations says "It's a hint to the backend". But llvm.loop.decrement is apparently depending on llvm.set.loop.iterations to have some sort of side-effect.
Really, I suspect the problem here is that llvm.loop.decrement is actually just unsound, and we avoid any issues with it simply by running it really late in the pass pipeline. But if that's the case, I'd prefer to state that explicitly.
"llvm.loop.decrement.i64", not "llvm.loop.decrement.reg.i64"
Yep, agreed. I was unfamiliar with llvm.loop.decrement, looked into it today, but was struggling with it. I will have a look again to see if I haven't missed anything. And llvm.set.loop.iterations is probably very dubious too, because it just sits there, floating around. As you said, we get away with this because this is run really late. Probably slightly better is to let llvm.set.loop.iterations produce a value, so that it probably kind of models a move of the iteration count to the iteration count register, which can be picked up. But I don't have the bandwidth to do this right now: first I would like to document the current state of the art here, then finish D79100 and D79783, and after that I would like to return to this. So I will add a statement about unsoundness of this.
Probably slightly better is to let llvm.set.loop.iterations produce a value, so that it probably kind of models a move of the iteration count to the iteration count register, which can be picked up.
If you were actually looking at changing this, probably simplest would be to kill off llvm.loop.decrement completely, and convert the PPC hardware loops to work more like the ARM hardwareloops. Without llvm.loop.decrement, each intrinsic has an obvious conversion: llvm.set.loop.iterations is a no-op, llvm.test.set.loop.iterations is an icmp, llvm.loop.decrement.reg is a sub.
Thinking about it a bit more, I'm not sure we really want to promise these are stable. They're sort of an internal implementation detail, and frontend and mid-level optimizations don't really have any business generating them.
Maybe explicitly state these may be modified in the future, and are not intended to be used outside the backend.