This is an archive of the discontinued LLVM Phabricator instance.

[InstCombineCalls] Unfold element atomic memcpy instruction
ClosedPublic

Authored by igor-laevsky on Jan 19 2017, 9:41 AM.

Details

Summary

This change teaches instruction combiner on how to unfold element atomic memcpy intrinsic into sequence of explicit atomic memory operations.

Also I added new IntrinsicInst subclass to provide user friendly interface for the element atomic memcpy intrinsic. In the future we want to integrate this new class into MemIntrinsic hierarchy which will allow us to easily write optimizations covering both atomic and non-atomic versions. However as a first step I decided to frame it as a separate entity and refactor it later after we determine which use cases generalised interface should cover.

Diff Detail

Event Timeline

igor-laevsky created this revision.Jan 19 2017, 9:41 AM
reames requested changes to this revision.Jan 23 2017, 8:05 PM
reames added inline comments.
include/llvm/IR/IntrinsicInst.h
155

minor: instead of "class handles" use "represents"

lib/Transforms/InstCombine/InstCombineCalls.cpp
63

Units? The name should ideally reflect.

119

Please use the same convention used for the normal memcpy. Lower the atomic memcopy, set the length of the original to zero, then remove the zero length one. This also requires you to implement a currently missed optimization for zero length copies.

131

Why 16 elements as opposed to 8 bytes? (i.e. what the non-atomic form uses)

167

The source alignment may not hold for all elements. i.e. your alignment might be 1024 with your element size being 8.

This revision now requires changes to proceed.Jan 23 2017, 8:05 PM
igor-laevsky edited edge metadata.
igor-laevsky marked 4 inline comments as done.
igor-laevsky added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
131

This transformation is not exactly similar to the non-atomic unfolding. Non atomic version only unfolds into single load/store pair. Here we can generate more operations. Number 16 means that no more than 16 load/store pairs will be generated. I picked it as an arbitrary small integer but we can easily tune it afterwards.

167

Yes, thanks for the catch!

sanjoy accepted this revision.Feb 4 2017, 10:10 PM

lgtm with nits inline

include/llvm/IR/IntrinsicInst.h
171

No need to assert on isa -- that assert should implicitly happen during cast<>

lib/Transforms/InstCombine/InstCombineCalls.cpp
66

I'd rephrase this slightly as "Maximum number of elements in atomic memcpies the optimizer is allowed to unfold".

(i.e. no need to get into details -- this is a hidden option that only people who know what they're doing should use)

117

clang-format?

122

Minor: use auto * here.

132

s/ElementSize/ElementSizeInBits/

133

There is a DataLayout in InstCombiner.

165

I'd also prefer adding an assert here that checks what you said in the comment.

169

Note: you can do better here -- e.g. if the source is aligned to 16 bytes, and the element size is 4 bytes, then instead the alignment being 16, 4, 4, 4, ... it can be 16, 4, 8, 4, 16, ... etc.

For now I'll just leave a TODO.

181

Is it difficult to delete AMI right here? If not, I'd just do that -- no point in waiting for another iteration if it is obvious what will happen.

1538

This is really a separate transform -- I'd prefer splitting it out and landing it independently.

This revision was automatically updated to reflect the committed changes.
igor-laevsky marked 7 inline comments as done.Feb 8 2017, 6:56 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
165

Proper checks of this invariants are quite verbose and will clobber the actual logic. I would prefer to rely on the verifier in this case.

181

Philip suggested that I should stick to the same process as other memory intrinsic optimizations do. I don't have a strong opinion here, but I would go with Philip's suggestion since I already implemented it.