This is an archive of the discontinued LLVM Phabricator instance.

[Atomics][LoopIdiom] Recognize unordered atomic memcpy
ClosedPublic

Authored by dneilson on May 16 2017, 9:20 AM.

Details

Summary

Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy. The only difference for recognizing an unordered atomic memcpy and instead of a normal memcpy is that the loads and/or stores involved are unordered atomic operations.

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

Diff Detail

Event Timeline

dneilson created this revision.May 16 2017, 9:20 AM
anna edited edge metadata.May 16 2017, 2:22 PM

Daniel, please upload the diff with full context. You can do this manually or via the arcanist tool: http://llvm.org/docs/Phabricator.html

test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
24

Please add the CHECKs at the start of the function. That's the pattern followed usually in tests.

dneilson updated this revision to Diff 99205.May 16 2017, 2:37 PM
  • Updating to reflect changes in prerequisite change.
  • Moved CHECKs in test case to start of procs.

Note: Test currently does not pass verification; need to propagate alignment info from the loads/stores onto the intrinsic pointer args.

dneilson marked an inline comment as done.May 16 2017, 2:38 PM
reames requested changes to this revision.May 16 2017, 5:53 PM

I'm pretty sure this patch is buggy in a fairly major way. isLegalStore is a helper function used when matching three types of intrinsics: memset, memcpy, and memset_patternN. It looks like you're accidentally matching unorder atomics in all three idioms without the corresponding intrinsic support. At minimum, I'd want test cases showing these transforms *not* triggering.

I suspect that the code would become much cleaner if you restructured isLegalStore to return an enum of *which* intrinsic it was a legal store for. This could be a preparatory patch before this one to make it easier to reason about.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
84

Why have this off by default?

359–362

This predicate shows up a lot and confuses the code. Either remove the option or think about how to factor this out more cleanly.

870

Unrelated whitespace. Please remove.

997

Can this be either a tiernary or a helper lambda to remove the potentially uninitialized variable?

999

This size limit is target specific. (Including which functions are defined.) For the moment, I'd be fine just making this a nicely commented global variable.

Can this be sunk inside isLegalStore/isLegalLoad? It feels strange to have it here.

1002

We also need to worry about alignment for the atomic loads and stores don't we? Or is that guaranteed by the fact they're atomic? Either case, comment potentially warranted?

1169

unrelated whitespace

This revision now requires changes to proceed.May 16 2017, 5:53 PM

I'm pretty sure this patch is buggy in a fairly major way. isLegalStore is a helper function used when matching three types of intrinsics: memset, memcpy, and memset_patternN. It looks like you're accidentally matching unorder atomics in all three idioms without the corresponding intrinsic support. At minimum, I'd want test cases showing these transforms *not* triggering.

I suspect that the code would become much cleaner if you restructured isLegalStore to return an enum of *which* intrinsic it was a legal store for. This could be a preparatory patch before this one to make it easier to reason about.

Good catch on isLegalStore; easy enough to clean up, and add a test case for. I'll think about the refactoring; you're probably right on that one...

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
359–362

My inclination is to ultimately remove the option. It's just there during initial development.

1002

Implied by the load/store. ex, from the load langref: "align must be explicitly specified on atomic loads, and the load has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee."

I *do* need to propagate the align info from the load/store to the pointer arg, though, and I haven't done that yet.

It's probably worth mirroring that LangRef verbiage on alignment from atomic load/store into the atomic memcpy.

dneilson added inline comments.May 17 2017, 8:35 AM
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
997

Could be, but I'm not convinced that it really buys anything. The value is defined in both branches of the if-else that immediately follows it.

dneilson updated this revision to Diff 99477.May 18 2017, 11:58 AM
dneilson edited edge metadata.
  • Address functional issue re: don't want to accidentally flag unordered atomic stores as okay for memset & memset_pattern.
  • Add test case to verify functional issue is non-issue.
  • Add align attribute to the pointer args on the generated intrinsic call.

TODO Remaining: Remove the command-line opt that toggles the transform.

dneilson marked 2 inline comments as done.May 18 2017, 11:59 AM
anna added a comment.May 18 2017, 2:38 PM

Looking at the code in LoopIdiomRecognize::collectStores, it might be better to refactor it first. Firstly, there's couple of TODOs regarding adding more patterns such as memcmp, memmove etc.

So, having something like this : isLegalStore(SI, ForMemset, ForMemsetPattern, ForMemcpy) seems messy. Adding something like an enum of patterns we are recognizing against, will be useful.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
356

Nit: extra semi colon.

359–362

This predicate is really confusing. isSimple = !isAtomic && !isVolatile
I'm not even sure if isSimple and ForUnorderedAtomic might negate each other.

408

Can you pls add a comment here stating that this is not supported for memset and memset_patternN.

434

Actually, there are 2 booleans (RecogUnorderedAtomicMemcpy and ForUnorderedAtomic ) and they are being used for different purposes throughout this function. Could you perhaps do an early bail out and state the requirements at the start of the function?

Also, add an assert at the caller of isLegalStore that unordered atomic stores was legal only for memcpy.

1008

Nit: end with period.

1019

Nit: no need of braces for single line else.

anna requested changes to this revision.May 19 2017, 6:45 AM

Marking as needing changes, based on comments inline.

Daniel, as mentioned in previous comment: it might be better to start off with a refactoring NFC patch which would make the code which uses isLegaslStore cleaner. First, instead of bool, return the pattern for which this is valid. Creating an enum would be good - it states clearly which patterns are currently supported.
The caller of isLegalStore collects these stores based on the pattern, but there is no check that exactly one pattern is selected. Also there is an implicit ordering on which pattern is checked.
Once the NFC is reviewed and checked in, coming back to this patch and cleaning up the booleans and the various checks in isLegalStore might be easier.

This revision now requires changes to proceed.May 19 2017, 6:45 AM
dneilson marked 6 inline comments as done.May 19 2017, 11:03 AM
dneilson added inline comments.
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
84

I'm removing the option entirely in the next diff.

359–362

It's the double negation... always confusing.

!ForUnordered && !isSimple
=> !(ForUnordered || isSimple)
=> !( (isAtomic && isUnordered) || isSimple)
=> !( (isAtomic && (isSimple || unordered-atomic)) || isSimple )

Which is exactly what's wanted here.

434

One (RecogUnorderedAtomicMemcpy) was a command-line arg to turn on the idiom recognition for unordered atomic memcpy. I've just removed the option entirely.

dneilson updated this revision to Diff 99610.May 19 2017, 12:14 PM
dneilson edited edge metadata.
dneilson marked 2 inline comments as done.

Addressing most concerns.

Remaining: Find a way to query a platform-specific maximum store-size for converting into an unordered memcpy.

skatkov added inline comments.May 22 2017, 11:18 PM
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1002

I would prefer to revert the if statement.
if SI and LI are not atomic then memcpy otherwise your code.

It is simpler to follow if the short case is handled first.

dneilson marked an inline comment as done.May 23 2017, 8:01 AM
dneilson added inline comments.
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1002

Sure

anna requested changes to this revision.May 23 2017, 11:56 AM

Some comments inline.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1006

We can do 2 things here: Add a hidden cl option for the size that is default to 16 for now, instead of the hardcoded value.

You could take a look if TTI has that target specific information you need. Perhaps getRegisterBitWidth?

1015

If the spec requires atomic loads and stores to have an alignment, shouldn't this be an assert, rather than a check and return?

test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
80

please add a test for memset_patternN not being recognized as well. These tests should change once support is added for both memsets.

This revision now requires changes to proceed.May 23 2017, 11:56 AM
dneilson marked 3 inline comments as done.May 24 2017, 7:42 AM
dneilson added inline comments.
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1006

I've changed this to query the runtime lib for whether an unordered atomic memcpy with the given store size exists. If it does, then we can create the memcpy; else we can't because we won't be able to lower it into a lib call.

1015

Spec requires atomic loads/stores to have alignment, but has no such requirement for non-atomic loads/stores. It's possible that one of source or dest will be non-atomic. I'll clarify the comment.

dneilson updated this revision to Diff 100088.May 24 2017, 7:45 AM
dneilson edited edge metadata.
dneilson marked an inline comment as done.
  • Refining logic regarding which store sizes to allow for the unordered atomic memcpy.
  • Add some additional tests -- simple loads/stores with no alignment, and memset_pattern
anna added a comment.May 25 2017, 6:14 AM

These changes look good to me. Prefer @reames to specifically LGTM this.

dneilson updated this revision to Diff 100259.May 25 2017, 9:39 AM
dneilson edited the summary of this revision. (Show Details)
  • Remove dependence on D33240 by expanding to the current, existing, form of atomic.memcpy.
skatkov edited edge metadata.May 25 2017, 7:44 PM

LGTM as well.

anna added a comment.May 26 2017, 7:05 AM
  • Remove dependence on D33240 by expanding to the current, existing, form of atomic.memcpy.

Do you mean the intrinsic that was already added? If so, could you please run make check all to confirm that all tests pass.

In D33243#765659, @anna wrote:
  • Remove dependence on D33240 by expanding to the current, existing, form of atomic.memcpy.

Do you mean the intrinsic that was already added? If so, could you please run make check all to confirm that all tests pass.

Yes. This is overtop of the intrinsic that's already in-tree. I've run both check & check-all without issue.

reames requested changes to this revision.May 26 2017, 7:54 PM

Very close, but needs changes. In particular, bug around volatile access.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
358

This is missing the case where a instruction is both unordered atomic and volatile. Add fix and test case please.

359–362

I'd strongly recommend rewriting this as:
if (Volatile) return None;
if (isAtomic() && Ordering != unordered) return None;

408

add: (yet)

1019

This is incorrect. We can't allow a misaligned atomic memcpy. If we can't ensure the load and store is sufficiently aligned, we must reject the transform. Fix and add test please.

test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
129

Please add at least one test case for each element size in (2, 4, 8)

This revision now requires changes to proceed.May 26 2017, 7:54 PM
dneilson marked an inline comment as done.May 29 2017, 7:23 AM
dneilson added inline comments.
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
358

isUnordered() precludes volatile. From StoreInst:
393 bool isUnordered() const {
394 return (getOrdering() == AtomicOrdering::NotAtomic ||
395 getOrdering() == AtomicOrdering::Unordered) &&
396 !isVolatile();
397 }

So, the case isn't missing.

359–362

True enough... wouldn't hurt to make it more explicit. I'm thinking:
if (Volatile) return None;
if (not Unordered) return None;

1019

Funny enough, I caught this when updating the version of this patch that's overtop of the changed atomic memcpy intrinsic; just didn't port it to this version. Will do that...

test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
129

Will do.

dneilson marked 4 inline comments as done.May 29 2017, 9:20 AM
dneilson updated this revision to Diff 100619.May 29 2017, 9:32 AM
dneilson edited edge metadata.
  • Adding tests to check permutations of alignment validity.
  • Adding tests for element sizes beyond 1 byte.
  • Clarify some conditional rejections in isLegalStore()
reames accepted this revision.May 30 2017, 3:21 PM

LGTM

Anna, since Daniel doesn't yet have commit access would you mind landing this? I would do it myself, but can't commit the time to watch for any problems after commit.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
361

I don't understand the note? Maybe just remove

anna added a comment.May 31 2017, 7:23 AM

LGTM

Anna, since Daniel doesn't yet have commit access would you mind landing this? I would do it myself, but can't commit the time to watch for any problems after commit.

Sure. @dneilson Could you please update the diff after addressing the comment inline, run clang-format on modified lines, and required tests.. I'll run some tests and check it in once that's done. Thanks.

dneilson marked an inline comment as done.May 31 2017, 8:07 AM
dneilson added inline comments.
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
361

The test is for 'isUnordered()', and the note aims to clarify that unordered implies (=>) simple.

dneilson updated this revision to Diff 100867.May 31 2017, 8:35 AM
dneilson marked an inline comment as done.
  • Rebase to ToT.
  • clang-format (no-op)

@anna I've verified that this passes 'make check'

This revision was automatically updated to reflect the committed changes.
anna added a comment.May 31 2017, 10:29 AM

Patch was reverted in rL304315 because of undefined reference to RTLIB::getMEMCPY_ELEMENT_ATOMIC in polly (http://lab.llvm.org:8011/builders/polly-arm-linux/builds/5582) and mingw builds. Not sure whether the other builds link RTLIB. clang seemed to pass fine (http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/4373)

dneilson updated this revision to Diff 100907.May 31 2017, 12:16 PM
  • Remove use of RTLIB in loop idiom. Results in a circular dependence in libs, so we can't use it there. Instead of querying RTLIB for allowable store sizes, we query the scalar register bit width from TTI.
anna reopened this revision.May 31 2017, 12:19 PM
dneilson added inline comments.May 31 2017, 12:23 PM
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1012

This is the replacement due to the inability to use RTLIB in ScalarOpt due to a circular dependency. It's not ideal as it's theoretically possible that larger width versions of the intrinsic's lib call will exist (ex: ones that are implemented to use vector regs for the load/stores), but we won't be able to exploit any that are wider than scalar register width.

anna added inline comments.Jun 1 2017, 7:02 AM
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1012

Is it possible that even if the StoreSizeBits is within the max registerBitWidth for the target arch, we do not have the corresponding lib call?
Could you please check if TLI has the information you require or can be modified to do so - it seems to have memset, memcpy etc.

dneilson added inline comments.Jun 1 2017, 7:55 AM
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1012

Theoretically it's possible, yes. Lib calls for sizes 1, 2, 4, 8, and 16 bytes are currently defined. It a platform doesn't have one of those sizes's definition available and we create the call, then the result will be a link error -- which seems to be intentional per the discussion around the design of the element atomic memcpy. If scalar registers with more than 16 bytes are available on the platform, then we'd end up creating the intrinsic call and then blowing up when lowering it to a libcall in SelectionDAGBuilder -- this is the part I'd like to avoid.

TLI seems to be concerned with system lib calls (libc, libm, etc), and not the __llvm_ lib calls, so I don't think that's a suitable place for this.

dneilson updated this revision to Diff 101452.Jun 5 2017, 1:34 PM
  • Abstract the max element size for atomic memory intrinsics into the TTI.
reames added a comment.Jun 5 2017, 2:53 PM

LGTM again w/minor comment addressed before commit.

include/llvm/Analysis/TargetTransformInfoImpl.h
429 ↗(On Diff #101452)

Why 4? 0 would seem like a safer default.

reames accepted this revision.Jun 5 2017, 2:55 PM
dneilson added inline comments.Jun 6 2017, 6:31 AM
include/llvm/Analysis/TargetTransformInfoImpl.h
429 ↗(On Diff #101452)

Good point, zero would seem safer... I put in 4 because my original intent was to put in getRegisterBitWidth()/8 & the default reg bit width is 32 -- I couldn't do that because the call here would get bound to the getRegisterBitWidth() in this base class, and would never see the platform-specific value.

dneilson updated this revision to Diff 101563.Jun 6 2017, 7:45 AM
  • Default max element size to 0
anna added inline comments.Jun 6 2017, 7:51 AM
include/llvm/Analysis/TargetTransformInfoImpl.h
429 ↗(On Diff #101452)

So, again it maybe the case that getRegisterBitWidth()/8 may not have the corresponding libcall defined for the target. I have a mild preference to leave it to the target to confirm they have valid definitions for these libcall sizes (i.e. having 0 as the default in the base definition).

dneilson marked an inline comment as done.Jun 6 2017, 7:53 AM
anna accepted this revision.Jun 6 2017, 8:43 AM

Daniel, I had written my comment while you updated the diff :). This LGTM now. I'll land this upstream on your behalf.

This revision is now accepted and ready to land.Jun 6 2017, 8:43 AM
This revision was automatically updated to reflect the committed changes.
anna added a comment.Jun 6 2017, 10:57 AM

FYI: The committed patch (rL304806)caused failures in multiple targets (passed on X86, PPC and s390), since the unordered-atomic-memcpy.ll test was in the common LoopIdiom directory.
Discussing with echristo on IRC: Specifying target-triple as X86 is not enough. It would still be run on targets where the X86 backend is not compiled in.

I've checked in the fix (rL304809), which is to move the file to the X86 specific subdirectory, where the lit.cfg chooses that the test should not be run on any other target.