[Atomics][LoopIdiom] Recognize unordered atomic memcpy
Needs ReviewPublic

Authored by dneilson on Tue, May 16, 9:20 AM.

Details

Summary

Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy. The only difference for recognizing an unordered atomic memcpy and instead of a normal memcpy is that the loads and/or stores involved are unordered atomic operations.

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

Diff Detail

dneilson created this revision.Tue, May 16, 9:20 AM
anna added a comment.Tue, May 16, 2:22 PM

Daniel, please upload the diff with full context. You can do this manually or via the arcanist tool: http://llvm.org/docs/Phabricator.html

test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
24

Please add the CHECKs at the start of the function. That's the pattern followed usually in tests.

dneilson updated this revision to Diff 99205.Tue, May 16, 2:37 PM
  • Updating to reflect changes in prerequisite change.
  • Moved CHECKs in test case to start of procs.

Note: Test currently does not pass verification; need to propagate alignment info from the loads/stores onto the intrinsic pointer args.

dneilson marked an inline comment as done.Tue, May 16, 2:38 PM
reames requested changes to this revision.Tue, May 16, 5:53 PM

I'm pretty sure this patch is buggy in a fairly major way. isLegalStore is a helper function used when matching three types of intrinsics: memset, memcpy, and memset_patternN. It looks like you're accidentally matching unorder atomics in all three idioms without the corresponding intrinsic support. At minimum, I'd want test cases showing these transforms *not* triggering.

I suspect that the code would become much cleaner if you restructured isLegalStore to return an enum of *which* intrinsic it was a legal store for. This could be a preparatory patch before this one to make it easier to reason about.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
85

Why have this off by default?

361

This predicate shows up a lot and confuses the code. Either remove the option or think about how to factor this out more cleanly.

864

Unrelated whitespace. Please remove.

994

Can this be either a tiernary or a helper lambda to remove the potentially uninitialized variable?

996

This size limit is target specific. (Including which functions are defined.) For the moment, I'd be fine just making this a nicely commented global variable.

Can this be sunk inside isLegalStore/isLegalLoad? It feels strange to have it here.

999

We also need to worry about alignment for the atomic loads and stores don't we? Or is that guaranteed by the fact they're atomic? Either case, comment potentially warranted?

1164

unrelated whitespace

This revision now requires changes to proceed.Tue, May 16, 5:53 PM

I'm pretty sure this patch is buggy in a fairly major way. isLegalStore is a helper function used when matching three types of intrinsics: memset, memcpy, and memset_patternN. It looks like you're accidentally matching unorder atomics in all three idioms without the corresponding intrinsic support. At minimum, I'd want test cases showing these transforms *not* triggering.

I suspect that the code would become much cleaner if you restructured isLegalStore to return an enum of *which* intrinsic it was a legal store for. This could be a preparatory patch before this one to make it easier to reason about.

Good catch on isLegalStore; easy enough to clean up, and add a test case for. I'll think about the refactoring; you're probably right on that one...

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
361

My inclination is to ultimately remove the option. It's just there during initial development.

999

Implied by the load/store. ex, from the load langref: "align must be explicitly specified on atomic loads, and the load has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee."

I *do* need to propagate the align info from the load/store to the pointer arg, though, and I haven't done that yet.

It's probably worth mirroring that LangRef verbiage on alignment from atomic load/store into the atomic memcpy.

dneilson added inline comments.Wed, May 17, 8:35 AM
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
994

Could be, but I'm not convinced that it really buys anything. The value is defined in both branches of the if-else that immediately follows it.

dneilson updated this revision to Diff 99477.Thu, May 18, 11:58 AM
  • Address functional issue re: don't want to accidentally flag unordered atomic stores as okay for memset & memset_pattern.
  • Add test case to verify functional issue is non-issue.
  • Add align attribute to the pointer args on the generated intrinsic call.

TODO Remaining: Remove the command-line opt that toggles the transform.

dneilson marked 2 inline comments as done.Thu, May 18, 11:59 AM
anna added a comment.Thu, May 18, 2:38 PM

Looking at the code in LoopIdiomRecognize::collectStores, it might be better to refactor it first. Firstly, there's couple of TODOs regarding adding more patterns such as memcmp, memmove etc.

So, having something like this : isLegalStore(SI, ForMemset, ForMemsetPattern, ForMemcpy) seems messy. Adding something like an enum of patterns we are recognizing against, will be useful.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
357

Nit: extra semi colon.

361

This predicate is really confusing. isSimple = !isAtomic && !isVolatile
I'm not even sure if isSimple and ForUnorderedAtomic might negate each other.

404–405

Can you pls add a comment here stating that this is not supported for memset and memset_patternN.

431

Actually, there are 2 booleans (RecogUnorderedAtomicMemcpy and ForUnorderedAtomic ) and they are being used for different purposes throughout this function. Could you perhaps do an early bail out and state the requirements at the start of the function?

Also, add an assert at the caller of isLegalStore that unordered atomic stores was legal only for memcpy.

1005

Nit: end with period.

1016

Nit: no need of braces for single line else.

anna requested changes to this revision.Fri, May 19, 6:45 AM

Marking as needing changes, based on comments inline.

Daniel, as mentioned in previous comment: it might be better to start off with a refactoring NFC patch which would make the code which uses isLegaslStore cleaner. First, instead of bool, return the pattern for which this is valid. Creating an enum would be good - it states clearly which patterns are currently supported.
The caller of isLegalStore collects these stores based on the pattern, but there is no check that exactly one pattern is selected. Also there is an implicit ordering on which pattern is checked.
Once the NFC is reviewed and checked in, coming back to this patch and cleaning up the booleans and the various checks in isLegalStore might be easier.

This revision now requires changes to proceed.Fri, May 19, 6:45 AM
dneilson marked 6 inline comments as done.Fri, May 19, 11:03 AM
dneilson added inline comments.
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
85

I'm removing the option entirely in the next diff.

361

It's the double negation... always confusing.

!ForUnordered && !isSimple
=> !(ForUnordered || isSimple)
=> !( (isAtomic && isUnordered) || isSimple)
=> !( (isAtomic && (isSimple || unordered-atomic)) || isSimple )

Which is exactly what's wanted here.

431

One (RecogUnorderedAtomicMemcpy) was a command-line arg to turn on the idiom recognition for unordered atomic memcpy. I've just removed the option entirely.

dneilson updated this revision to Diff 99610.Fri, May 19, 12:14 PM
dneilson marked 2 inline comments as done.

Addressing most concerns.

Remaining: Find a way to query a platform-specific maximum store-size for converting into an unordered memcpy.

skatkov added inline comments.Mon, May 22, 11:18 PM
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
999

I would prefer to revert the if statement.
if SI and LI are not atomic then memcpy otherwise your code.

It is simpler to follow if the short case is handled first.

dneilson marked an inline comment as done.Tue, May 23, 8:01 AM
dneilson added inline comments.
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
999

Sure

anna requested changes to this revision.Tue, May 23, 11:56 AM

Some comments inline.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1003

We can do 2 things here: Add a hidden cl option for the size that is default to 16 for now, instead of the hardcoded value.

You could take a look if TTI has that target specific information you need. Perhaps getRegisterBitWidth?

1012

If the spec requires atomic loads and stores to have an alignment, shouldn't this be an assert, rather than a check and return?

test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
80

please add a test for memset_patternN not being recognized as well. These tests should change once support is added for both memsets.

This revision now requires changes to proceed.Tue, May 23, 11:56 AM
dneilson marked 3 inline comments as done.Wed, May 24, 7:42 AM
dneilson added inline comments.
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1003

I've changed this to query the runtime lib for whether an unordered atomic memcpy with the given store size exists. If it does, then we can create the memcpy; else we can't because we won't be able to lower it into a lib call.

1012

Spec requires atomic loads/stores to have alignment, but has no such requirement for non-atomic loads/stores. It's possible that one of source or dest will be non-atomic. I'll clarify the comment.

dneilson updated this revision to Diff 100088.Wed, May 24, 7:45 AM
dneilson marked an inline comment as done.
  • Refining logic regarding which store sizes to allow for the unordered atomic memcpy.
  • Add some additional tests -- simple loads/stores with no alignment, and memset_pattern
anna added a comment.Thu, May 25, 6:14 AM

These changes look good to me. Prefer @reames to specifically LGTM this.

dneilson updated this revision to Diff 100259.Thu, May 25, 9:39 AM
dneilson edited the summary of this revision. (Show Details)
  • Remove dependence on D33240 by expanding to the current, existing, form of atomic.memcpy.

LGTM as well.

anna added a comment.Fri, May 26, 7:05 AM
  • Remove dependence on D33240 by expanding to the current, existing, form of atomic.memcpy.

Do you mean the intrinsic that was already added? If so, could you please run make check all to confirm that all tests pass.

In D33243#765659, @anna wrote:
  • Remove dependence on D33240 by expanding to the current, existing, form of atomic.memcpy.

Do you mean the intrinsic that was already added? If so, could you please run make check all to confirm that all tests pass.

Yes. This is overtop of the intrinsic that's already in-tree. I've run both check & check-all without issue.