This is an archive of the discontinued LLVM Phabricator instance.

Introduce LowerGCLeafIntrinsics pass
AbandonedPublic

Authored by mkazantsev on Aug 3 2021, 5:54 AM.

Details

Summary

Some of the intrinsics (such as llvm.memcpy.element.unordered.atomic) may be
lowered differently, depending on whether or not they are considered gc leaf.
Those that are gc leaves may be easily lowered on IR level according to their
semantics.

On small data pieces, it saves time on call overhead (that may be significant if
we are about to copy a small portion of data). On big data pieces, ideally the code
gen should be able to generate code not worse than any other possible lowering
for such simple cases.

Another advantage of IR lowering is that the compiler may figure out some facts
(e.g. regarding length of the copied data) and do less job than straightforward
lowering into a library call would.

This patch introduces a pass that may lower various GC leaf intrinsics on IR level,
and implements it for llvm.memcpy.element.unordered.atomic.

Diff Detail

Event Timeline

mkazantsev created this revision.Aug 3 2021, 5:54 AM
mkazantsev requested review of this revision.Aug 3 2021, 5:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2021, 5:54 AM
nikic requested changes to this revision.Aug 3 2021, 11:51 AM
nikic added a subscriber: nikic.

Without commenting on anything else, this implementation is not compatible with opaque pointers. Please remove mentions of getPointerElementType() and instead rely on the provided element size, possibly by inserting appropriate casts. If this is not sufficient for some reason and you really do need to know the exact element type, then this is going to need a change to the intrinsic design.

This revision now requires changes to proceed.Aug 3 2021, 11:51 AM
mkazantsev updated this revision to Diff 364019.Aug 4 2021, 2:58 AM

Not sure I understand well what is and what is not compatible with opaque pointers, but I've tried to addressed the comment by removing getPointerElementType. Also added a preheader to the loop and moved number of iterations computation there.

mkazantsev updated this revision to Diff 364026.Aug 4 2021, 3:22 AM
nikic resigned from this revision.Aug 4 2021, 12:11 PM

Thanks, should be fine now. Just avoiding getPointerElementType() is usually enough.

I think the name of the pass is misleading. What you are doing here is you provide an inlined lowering for a memory builtin. Your current implementation has a limitation such that it only inlines gc-leaf element atomic memcpys. I don't think the fact that it is gc-leaf is critical here. This transform can be extended to handle non-GC leaf operations. It can also be extended to handle non-atomic operations.

Here are some high-level questions to this optimization:

  • Should this lowering be a middle-end pass or something which resides in the backend?
  • If we chose to do this as a middle-end pass, when should this pass be scheduled?
  • How does this lowering interact with backend optimizations for memcpys? Like, replacing short constant length memcpys with loads/stores.
  • Is it always profitable to inline a memcpy? Should we have some heuristic here to select hot and short memcpys?
  • In this patch you do hand-rolled vectorization. Have you considered the alternative where you rely on the existing vectorizer instead?

I suggest writing a proposal to llvm-dev and discussing it there first.

There is a target-specific hook to emit code for regular (non-atomic) memcpys: EmitTargetCodeForMemcpy. Maybe we should just implement a similar hook for element-atomic copy?

llvm/lib/Transforms/Scalar/LowerGCLeafIntrinsics.cpp
68–69

LangRef doesn't prohibit other length types.

129–132

These loads and stores must be at least ElementSizeInBytes-atomic. I'm not sure you can express this in the IR at the moment.

mkazantsev abandoned this revision.Aug 4 2021, 9:34 PM

That's a non-starter then.