This is an archive of the discontinued LLVM Phabricator instance.

[HardwareLoops] Intrinsic LangRef descriptions
ClosedPublic

Authored by SjoerdMeijer on May 20 2020, 11:20 AM.

Details

Summary

As pointed out by @lebedev.ri in D79100, intrinsic descriptions of the HardwareLoop intrinsics were missing in LangRef. This adds these descriptions.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.May 20 2020, 11:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2020, 11:20 AM
simoll added a subscriber: simoll.May 20 2020, 11:31 AM

Changes outside LangRef LGTM; please commit separately.

llvm/docs/LangRef.rst
14900

I think each of these intrinsics needs a "semantics" section that specifies the bare semantics of each intrinsic, without mentioning anything about loops/control flow, assuming that's possible.

14919

Signature here is wrong?

samparker added inline comments.May 20 2020, 11:17 PM
llvm/docs/LangRef.rst
14899

Why the mention of vector?

14905

Maybe use 'trip count' instead to match the internal terminology the we use?

14925

vector again

14927

I think these ones are always placed in the preheader's (unique?) predecessor.

14957

Probably worth stating that it doesn't wrap.

And thanks for doing this!

Thanks for looking at this and your comments.

Changes outside LangRef have been committed in rGb0614509a0f1.

I will pick this up soon and then address the comments.

SjoerdMeijer retitled this revision from [HardwareLoops] Intrinsic descriptions to [HardwareLoops] Intrinsic LangRef descriptions.
SjoerdMeijer edited the summary of this revision. (Show Details)

I think this addresses the comments.

samparker accepted this revision.May 21 2020, 6:34 AM

With the one change, LGTM. Thanks again.

llvm/docs/LangRef.rst
14928

Missed this bit, these should return an i1.

This revision is now accepted and ready to land.May 21 2020, 6:34 AM
efriedma added inline comments.May 21 2020, 12:13 PM
llvm/docs/LangRef.rst
14990

"but are not allowed to duplicate these intrinsic calls"? Run-on sentence, and I'm not sure what isn't allowed to duplicate the intrinsic calls.

15022

Where does "llvm.loop.decrement" retrieve the loop iteration counter from? I think we need to describe this a bit more carefully. For the others, there's sort of a trivial lowering back to plain LLVM IR, but here I'm not sure what that looks like.

Thanks for catching that unclear part of the llvm.loop.decrement description, hopefully that's clearer now.

I have left the remarks about NoDuplicate in (for now), fixed that sentence, because I think the idea is that we for example don't want to a block to be cloned/copied, (partly) destroying the hardware loop structure.

The descriptions here related to llvm.loop.decrement still aren't really sufficient. The description of llvm.set.loop.iterations says "It's a hint to the backend". But llvm.loop.decrement is apparently depending on llvm.set.loop.iterations to have some sort of side-effect.

Really, I suspect the problem here is that llvm.loop.decrement is actually just unsound, and we avoid any issues with it simply by running it really late in the pass pipeline. But if that's the case, I'd prefer to state that explicitly.

llvm/docs/LangRef.rst
15005

"llvm.loop.decrement.i64", not "llvm.loop.decrement.reg.i64"

SjoerdMeijer added a comment.EditedMay 22 2020, 11:59 AM

The descriptions here related to llvm.loop.decrement still aren't really sufficient. The description of llvm.set.loop.iterations says "It's a hint to the backend". But llvm.loop.decrement is apparently depending on llvm.set.loop.iterations to have some sort of side-effect.

Really, I suspect the problem here is that llvm.loop.decrement is actually just unsound, and we avoid any issues with it simply by running it really late in the pass pipeline. But if that's the case, I'd prefer to state that explicitly.

Yep, agreed. I was unfamiliar with llvm.loop.decrement, looked into it today, but was struggling with it. I will have a look again to see if I haven't missed anything. And llvm.set.loop.iterations is probably very dubious too, because it just sits there, floating around. As you said, we get away with this because this is run really late. Probably slightly better is to let llvm.set.loop.iterations produce a value, so that it probably kind of models a move of the iteration count to the iteration count register, which can be picked up. But I don't have the bandwidth to do this right now: first I would like to document the current state of the art here, then finish D79100 and D79783, and after that I would like to return to this. So I will add a statement about unsoundness of this.

Probably slightly better is to let llvm.set.loop.iterations produce a value, so that it probably kind of models a move of the iteration count to the iteration count register, which can be picked up.

If you were actually looking at changing this, probably simplest would be to kill off llvm.loop.decrement completely, and convert the PPC hardware loops to work more like the ARM hardwareloops. Without llvm.loop.decrement, each intrinsic has an obvious conversion: llvm.set.loop.iterations is a no-op, llvm.test.set.loop.iterations is an icmp, llvm.loop.decrement.reg is a sub.

llvm/docs/LangRef.rst
14880

Thinking about it a bit more, I'm not sure we really want to promise these are stable. They're sort of an internal implementation detail, and frontend and mid-level optimizations don't really have any business generating them.

Maybe explicitly state these may be modified in the future, and are not intended to be used outside the backend.

SjoerdMeijer added inline comments.May 27 2020, 12:36 AM
llvm/docs/LangRef.rst
14880

Agreed, and thanks, will add this. For exactly these reasons I could justify to myself that we hadn't document/exposed them before, but that is not entirely correct...

Added a paragraph that these intrinsics should not be used outside the backend.

This revision was automatically updated to reflect the committed changes.