This is an archive of the discontinued LLVM Phabricator instance.

Differential D33243

[Atomics][LoopIdiom] Recognize unordered atomic memcpy
ClosedPublic

Authored by dneilson on May 16 2017, 9:20 AM.

Details

Reviewers

reames
anna
skatkov

Commits

rGb2a212c070d9: [Atomics][LoopIdiom] Recognize unordered atomic memcpy
rG056c009f1b0f: [Atomics][LoopIdiom] Recognize unordered atomic memcpy
rL304806: [Atomics][LoopIdiom] Recognize unordered atomic memcpy
rL304310: [Atomics][LoopIdiom] Recognize unordered atomic memcpy

Summary

Expand the loop idiom test for memcpy to also recognize unordered atomic memcpy. The only difference between recognizing an unordered atomic memcpy and recognizing a normal memcpy is that the loads and/or stores involved are unordered atomic operations.

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

Diff Detail

Event Timeline

dneilson created this revision. May 16 2017, 9:20 AM

Herald added a subscriber: mzolotukhin. May 16 2017, 9:20 AM

dneilson added a parent revision: D33240: [Atomics] Rename and change prototype for atomic memcpy intrinsic. May 16 2017, 11:20 AM

Daniel, please upload the diff with full context. You can do this manually or via the arcanist tool: http://llvm.org/docs/Phabricator.html

test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
24	Please add the `CHECK`s at the start of the function. That's the pattern followed usually in tests.

Updating to reflect changes in prerequisite change.
Moved CHECKs in test case to start of procs.

Note: Test currently does not pass verification; need to propagate alignment info from the loads/stores onto the intrinsic pointer args.

dneilson marked an inline comment as done. May 16 2017, 2:38 PM

I'm pretty sure this patch is buggy in a fairly major way. isLegalStore is a helper function used when matching three types of intrinsics: memset, memcpy, and memset_patternN. It looks like you're accidentally matching unordered atomics in all three idioms without the corresponding intrinsic support. At minimum, I'd want test cases showing these transforms *not* triggering.

I suspect that the code would become much cleaner if you restructured isLegalStore to return an enum of *which* intrinsic it was a legal store for. This could be a preparatory patch before this one to make it easier to reason about.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
84	Why have this off by default?
356	This predicate shows up a lot and confuses the code. Either remove the option or think about how to factor this out more cleanly.
853	Unrelated whitespace. Please remove.
984	Can this be either a ternary or a helper lambda to remove the potentially uninitialized variable?
986	This size limit is target specific. (Including which functions are defined.) For the moment, I'd be fine just making this a nicely commented global variable. Can this be sunk inside isLegalStore/isLegalLoad? It feels strange to have it here.
989	We also need to worry about alignment for the atomic loads and stores, don't we? Or is that guaranteed by the fact that they're atomic? Either way, a comment is potentially warranted.
1135	unrelated whitespace

This revision now requires changes to proceed. May 16 2017, 5:53 PM

In D33243#756942, @reames wrote:

I'm pretty sure this patch is buggy in a fairly major way. isLegalStore is a helper function used when matching three types of intrinsics: memset, memcpy, and memset_patternN. It looks like you're accidentally matching unordered atomics in all three idioms without the corresponding intrinsic support. At minimum, I'd want test cases showing these transforms *not* triggering.

I suspect that the code would become much cleaner if you restructured isLegalStore to return an enum of *which* intrinsic it was a legal store for. This could be a preparatory patch before this one to make it easier to reason about.

Good catch on isLegalStore; easy enough to clean up, and add a test case for. I'll think about the refactoring; you're probably right on that one...

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
356	My inclination is to ultimately remove the option. It's just there during initial development.
989	Implied by the load/store. ex, from the load langref: "align must be explicitly specified on atomic loads, and the load has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee." I do need to propagate the align info from the load/store to the pointer arg, though, and I haven't done that yet. It's probably worth mirroring that LangRef verbiage on alignment from atomic load/store into the atomic memcpy.

dneilson added inline comments. May 17 2017, 8:35 AM

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
984	Could be, but I'm not convinced that it really buys anything. The value is defined in both branches of the if-else that immediately follows it.

Address functional issue re: don't want to accidentally flag unordered atomic stores as okay for memset & memset_pattern.
Add test case to verify functional issue is non-issue.
Add align attribute to the pointer args on the generated intrinsic call.

TODO Remaining: Remove the command-line opt that toggles the transform.

dneilson marked 2 inline comments as done. May 18 2017, 11:59 AM

anna added a subscriber: llvm-commits. May 18 2017, 12:58 PM

Looking at the code in LoopIdiomRecognize::collectStores, it might be better to refactor it first. Firstly, there are a couple of TODOs regarding adding more patterns such as memcmp, memmove etc.

So, having something like `isLegalStore(SI, ForMemset, ForMemsetPattern, ForMemcpy)` seems messy. Adding something like an enum of the patterns we are recognizing would be useful.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
354	Nit: extra semicolon.
356	This predicate is really confusing. `isSimple = !isAtomic && !isVolatile` I'm not even sure if isSimple and ForUnorderedAtomic might negate each other.
399	Can you pls add a comment here stating that this is not supported for memset and memset_patternN.
427	Actually, there are 2 booleans (RecogUnorderedAtomicMemcpy and ForUnorderedAtomic) and they are being used for different purposes throughout this function. Could you perhaps do an early bail out and state the requirements at the start of the function? Also, add an assert at the caller of `isLegalStore` that unordered atomic stores are legal only for `memcpy`.
995	Nit: end with period.
1006	Nit: no need of braces for single line else.

Marking as needing changes, based on comments inline.

Daniel, as mentioned in the previous comment: it might be better to start off with a refactoring NFC patch that makes the code using isLegalStore cleaner. First, instead of a bool, return the pattern for which the store is valid. Creating an enum would be good - it states clearly which patterns are currently supported.
The caller of isLegalStore collects these stores based on the pattern, but there is no check that exactly one pattern is selected. Also there is an implicit ordering on which pattern is checked.
Once the NFC is reviewed and checked in, coming back to this patch and cleaning up the booleans and the various checks in isLegalStore might be easier.
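
The refactoring suggested here could look roughly like the following sketch. It is not code from any LLVM patch: `LegalStoreKind`, `StoreFacts`, and `classifyStore` are hypothetical names, and the boolean fields merely stand in for the checks isLegalStore performs on a real StoreInst. The point is that a single return value makes "exactly one pattern is selected" true by construction and makes the pattern ordering explicit.

```cpp
#include <cassert>

// Hypothetical result type: at most one idiom is reported per store.
enum class LegalStoreKind { None, Memset, MemsetPattern, Memcpy };

// Hypothetical summary of the properties isLegalStore inspects. These are
// plain flags for illustration, not LLVM types.
struct StoreFacts {
  bool IsVolatile = false;
  bool IsUnorderedAtomic = false; // atomic, ordering == unordered
  bool IsStrongerAtomic = false;  // atomic, ordering stronger than unordered
  bool IsSplatValue = false;      // stored value is usable by memset
  bool HasPatternValue = false;   // stored value is usable by memset_pattern
  bool FedByLegalLoad = false;    // stored value comes from a suitable load
};

LegalStoreKind classifyStore(const StoreFacts &S) {
  if (S.IsVolatile || S.IsStrongerAtomic)
    return LegalStoreKind::None;
  // Unordered atomic stores are not (yet) supported for memset or
  // memset_pattern, so those idioms are only tried for non-atomic stores.
  if (!S.IsUnorderedAtomic) {
    if (S.IsSplatValue)
      return LegalStoreKind::Memset;
    if (S.HasPatternValue)
      return LegalStoreKind::MemsetPattern;
  }
  if (S.FedByLegalLoad)
    return LegalStoreKind::Memcpy;
  return LegalStoreKind::None;
}
```

A caller would switch on the returned kind instead of inspecting three out-parameters, which also gives a natural place for the assert requested in the inline comment on line 427.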

This revision now requires changes to proceed. May 19 2017, 6:45 AM

dneilson marked 6 inline comments as done. May 19 2017, 11:03 AM

dneilson added inline comments.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
84	I'm removing the option entirely in the next diff.
356	It's the double negation... always confusing. !ForUnordered && !isSimple => !(ForUnordered || isSimple) => !( (isAtomic && isUnordered) || isSimple) => !( (isAtomic && (isSimple || unordered-atomic)) || isSimple ) Which is exactly what's wanted here.
427	One (RecogUnorderedAtomicMemcpy) was a command-line arg to turn on the idiom recognition for unordered atomic memcpy. I've just removed the option entirely.
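
The equivalence argued in the reply on line 356 can be checked exhaustively with a small model. This is only a sketch: `Ordering`, `Access`, and the free functions below are hypothetical stand-ins that mirror the definitions quoted in this review (`isSimple = !isAtomic && !isVolatile`, and `isUnordered()` as defined on StoreInst); they are not the LLVM classes themselves.

```cpp
#include <cassert>
#include <initializer_list>

// Stand-in for the atomic orderings a load or store can carry.
enum class Ordering {
  NotAtomic, Unordered, Monotonic, Acquire, Release,
  AcquireRelease, SequentiallyConsistent
};

// Minimal model of a memory access: just its ordering and volatility.
struct Access {
  Ordering Ord;
  bool Volatile;
};

bool isAtomic(const Access &A) { return A.Ord != Ordering::NotAtomic; }
bool isSimple(const Access &A) { return !isAtomic(A) && !A.Volatile; }
bool isUnordered(const Access &A) {
  return (A.Ord == Ordering::NotAtomic || A.Ord == Ordering::Unordered) &&
         !A.Volatile;
}

// The rejection predicate as written in the diff under review.
bool rejectedByPatch(const Access &A) {
  bool ForUnorderedAtomic = isAtomic(A) && isUnordered(A);
  return !ForUnorderedAtomic && !isSimple(A);
}

// True if the patch's predicate matches the simpler "!isUnordered()" for
// every combination of ordering and volatility.
bool predicatesAgree() {
  const Ordering All[] = {Ordering::NotAtomic, Ordering::Unordered,
                          Ordering::Monotonic, Ordering::Acquire,
                          Ordering::Release, Ordering::AcquireRelease,
                          Ordering::SequentiallyConsistent};
  for (Ordering O : All)
    for (bool V : {false, true}) {
      Access A{O, V};
      if (rejectedByPatch(A) != !isUnordered(A))
        return false;
    }
  return true;
}
```

Under this model the compound condition reduces to a plain `!isUnordered()` test, which is the direction the later comments on this line move toward.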

Addressing most concerns.

Remaining: Find a way to query a platform-specific maximum store-size for converting into an unordered memcpy.

skatkov added inline comments. May 22 2017, 11:18 PM

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
989	I would prefer to invert the if statement: if SI and LI are not atomic, then memcpy; otherwise your code. It is simpler to follow if the short case is handled first.

dneilson marked an inline comment as done. May 23 2017, 8:01 AM

dneilson added inline comments.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
989	Sure

Some comments inline.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
993	We can do 2 things here: Add a hidden `cl` option for the size that defaults to 16 for now, instead of the hardcoded value. Or you could take a look at whether `TTI` has the target-specific information you need. Perhaps `getRegisterBitWidth`?
1002	If the spec requires atomic loads and stores to have an alignment, shouldn't this be an assert, rather than a check and return?
test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
80	please add a test for memset_patternN not being recognized as well. These tests should change once support is added for both memsets.

This revision now requires changes to proceed. May 23 2017, 11:56 AM

dneilson marked 3 inline comments as done. May 24 2017, 7:42 AM

dneilson added inline comments.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
993	I've changed this to query the runtime lib for whether an unordered atomic memcpy with the given store size exists. If it does, then we can create the memcpy; else we can't because we won't be able to lower it into a lib call.
1002	Spec requires atomic loads/stores to have alignment, but has no such requirement for non-atomic loads/stores. It's possible that one of source or dest will be non-atomic. I'll clarify the comment.

Refining logic regarding which store sizes to allow for the unordered atomic memcpy.
Add some additional tests -- simple loads/stores with no alignment, and memset_pattern

These changes look good to me. Prefer @reames to specifically LGTM this.

Remove dependence on D33240 by expanding to the current, existing, form of atomic.memcpy.

dneilson removed a parent revision: D33240: [Atomics] Rename and change prototype for atomic memcpy intrinsic. May 25 2017, 9:39 AM

LGTM as well.

In D33243#764575, @dneilson wrote:

Remove dependence on D33240 by expanding to the current, existing, form of atomic.memcpy.

Do you mean the intrinsic that was already added? If so, could you please run make check all to confirm that all tests pass.

In D33243#765659, @anna wrote:

In D33243#764575, @dneilson wrote:

Remove dependence on D33240 by expanding to the current, existing, form of atomic.memcpy.

Do you mean the intrinsic that was already added? If so, could you please run make check all to confirm that all tests pass.

Yes. This is overtop of the intrinsic that's already in-tree. I've run both check & check-all without issue.

Very close, but needs changes. In particular, bug around volatile access.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
354	This is missing the case where an instruction is both unordered atomic and volatile. Add fix and test case please.
356	I'd strongly recommend rewriting this as: if (Volatile) return None; if (isAtomic() && Ordering != unordered) return None;
399	add: (yet)
1006	This is incorrect. We can't allow a misaligned atomic memcpy. If we can't ensure the load and store is sufficiently aligned, we must reject the transform. Fix and add test please.
test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
129	Please add at least one test case for each element size in (2, 4, 8)
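
The alignment requirement raised on line 1006 amounts to a simple gate, sketched below. The helper name and the convention that an alignment of 0 means "not specified" are assumptions made for illustration; this is not the check as written in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decide whether a load/store pair may be turned into an
// element-wise unordered atomic memcpy with the given element size (bytes).
// An alignment of 0 models "no alignment specified", which a non-atomic
// load or store is allowed to have.
bool alignmentAllowsAtomicMemcpy(uint32_t StoreAlign, uint32_t LoadAlign,
                                 uint32_t ElementSize) {
  if (ElementSize == 0)
    return false;
  // Both sides must be at least element-size aligned; otherwise the
  // transform has to be rejected rather than emitted misaligned.
  return StoreAlign >= ElementSize && LoadAlign >= ElementSize;
}
```

Because one of the two accesses may be non-atomic, and therefore carry no alignment at all, this has to be a check that rejects the transform rather than an assert.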

This revision now requires changes to proceed. May 26 2017, 7:54 PM

dneilson marked an inline comment as done. May 29 2017, 7:23 AM

dneilson added inline comments.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
354	isUnordered() precludes volatile. From StoreInst: `bool isUnordered() const { return (getOrdering() == AtomicOrdering::NotAtomic || getOrdering() == AtomicOrdering::Unordered) && !isVolatile(); }` So, the case isn't missing.
356	True enough... wouldn't hurt to make it more explicit. I'm thinking: if (Volatile) return None; if (not Unordered) return None;
1006	Funny enough, I caught this when updating the version of this patch that's overtop of the changed atomic memcpy intrinsic; just didn't port it to this version. Will do that...
test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
129	Will do.

dneilson marked 4 inline comments as done. May 29 2017, 9:20 AM

Adding tests to check permutations of alignment validity.
Adding tests for element sizes beyond 1 byte.
Clarify some conditional rejections in isLegalStore()

LGTM

Anna, since Daniel doesn't yet have commit access would you mind landing this? I would do it myself, but can't commit the time to watch for any problems after commit.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
358	I don't understand the note? Maybe just remove

In D33243#768222, @reames wrote:

LGTM

Anna, since Daniel doesn't yet have commit access would you mind landing this? I would do it myself, but can't commit the time to watch for any problems after commit.

Sure. @dneilson Could you please update the diff after addressing the inline comment, and run clang-format on the modified lines and the required tests. I'll run some tests and check it in once that's done. Thanks.

dneilson marked an inline comment as done. May 31 2017, 8:07 AM

dneilson added inline comments.

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
358	The test is for 'isUnordered()', and the note aims to clarify that unordered implies (=>) simple.

Rebase to ToT.
clang-format (no-op)

@anna I've verified that this passes 'make check'

Closed by commit rL304310: [Atomics][LoopIdiom] Recognize unordered atomic memcpy (authored by annat). May 31 2017, 9:40 AM

This revision was automatically updated to reflect the committed changes.

Patch was reverted in rL304315 because of undefined reference to RTLIB::getMEMCPY_ELEMENT_ATOMIC in polly (http://lab.llvm.org:8011/builders/polly-arm-linux/builds/5582) and mingw builds. Not sure whether the other builds link RTLIB. clang seemed to pass fine (http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/4373)

Remove use of RTLIB in loop idiom. Results in a circular dependence in libs, so we can't use it there. Instead of querying RTLIB for allowable store sizes, we query the scalar register bit width from TTI.

anna reopened this revision. May 31 2017, 12:19 PM

dneilson added inline comments. May 31 2017, 12:23 PM

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
999	This is the replacement due to the inability to use RTLIB in ScalarOpt due to a circular dependency. It's not ideal as it's theoretically possible that larger width versions of the intrinsic's lib call will exist (ex: ones that are implemented to use vector regs for the load/stores), but we won't be able to exploit any that are wider than scalar register width.

anna added inline comments. Jun 1 2017, 7:02 AM

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
999	Is it possible that even if the `StoreSizeBits` is within the max `registerBitWidth` for the target arch, we do not have the corresponding lib call? Could you please check if `TLI` has the information you require or can be modified to do so - it seems to have `memset`, `memcpy` etc.

dneilson added inline comments. Jun 1 2017, 7:55 AM

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
999	Theoretically it's possible, yes. Lib calls for sizes 1, 2, 4, 8, and 16 bytes are currently defined. If a platform doesn't have the definition for one of those sizes available and we create the call, then the result will be a link error -- which seems to be intentional per the discussion around the design of the element atomic memcpy. If scalar registers with more than 16 bytes are available on the platform, then we'd end up creating the intrinsic call and then blowing up when lowering it to a libcall in SelectionDAGBuilder -- this is the part I'd like to avoid. TLI seems to be concerned with system lib calls (libc, libm, etc), and not the __llvm_ lib calls, so I don't think that's a suitable place for this.

Abstract the max element size for atomic memory intrinsics into the TTI.

LGTM again w/minor comment addressed before commit.

include/llvm/Analysis/TargetTransformInfoImpl.h
429 ↗	(On Diff #101452)	Why 4? 0 would seem like a safer default.

reames accepted this revision. Jun 5 2017, 2:55 PM

dneilson added inline comments. Jun 6 2017, 6:31 AM

include/llvm/Analysis/TargetTransformInfoImpl.h
429 ↗	(On Diff #101452)	Good point, zero would seem safer... I put in 4 because my original intent was to put in getRegisterBitWidth()/8 & the default reg bit width is 32 -- I couldn't do that because the call here would get bound to the getRegisterBitWidth() in this base class, and would never see the platform-specific value.
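
The binding problem described in this reply can be reproduced in isolation. The sketch below uses hypothetical class names and does not reproduce the real TargetTransformInfoImplBase; it only shows why a non-virtual call made from inside a base class never sees a derived class's value.

```cpp
#include <cassert>

// Hypothetical base implementation with a non-virtual default.
struct BaseImplSketch {
  unsigned getRegisterBitWidth() const { return 32; }

  // This call is resolved at compile time to
  // BaseImplSketch::getRegisterBitWidth, regardless of the dynamic type.
  unsigned maxElementSizeFromRegisterWidth() const {
    return getRegisterBitWidth() / 8;
  }
};

// Hypothetical target that reports a wider register. The method hides the
// base version; it does not override it.
struct WideTargetSketch : BaseImplSketch {
  unsigned getRegisterBitWidth() const { return 64; }
};
```

Calling `maxElementSizeFromRegisterWidth()` on a `WideTargetSketch` still yields 4 in this model, which is where the original default of 4 came from.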

Default max element size to 0

anna added inline comments. Jun 6 2017, 7:51 AM

include/llvm/Analysis/TargetTransformInfoImpl.h
429 ↗	(On Diff #101452)	So, again, it may be the case that `getRegisterBitWidth()/8` does not have the corresponding libcall defined for the target. I have a mild preference to leave it to the target to confirm they have valid definitions for these libcall sizes (i.e. having 0 as the default in the base definition).
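
The design settled on here, a target hook whose base definition returns 0 so that targets must opt in, could be modelled as follows. The class and function names are hypothetical, and the 16-byte target is only an example based on the libcall sizes mentioned earlier in this review.

```cpp
#include <cassert>

// Hypothetical stand-in for the target query. The base definition reports 0,
// meaning "no element-atomic memory intrinsics are known to be lowerable".
struct TargetInfoSketch {
  virtual ~TargetInfoSketch() = default;
  virtual unsigned maxAtomicElementSize() const { return 0; }
};

// Hypothetical target that has confirmed libcalls for element sizes up to
// 16 bytes.
struct SixteenByteTargetSketch : TargetInfoSketch {
  unsigned maxAtomicElementSize() const override { return 16; }
};

// Gate used by the idiom recognizer in this model: only form the atomic
// memcpy when the store size fits within what the target advertises.
bool mayFormAtomicMemcpy(const TargetInfoSketch &TTI, unsigned StoreSize) {
  return StoreSize != 0 && StoreSize <= TTI.maxAtomicElementSize();
}
```

With a default of 0, a target that has not implemented the hook never gets the transform, so it cannot produce a call that later fails to lower.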

dneilson marked an inline comment as done. Jun 6 2017, 7:53 AM

Daniel, I had written my comment while you updated the diff :). This LGTM now. I'll land this upstream on your behalf.

This revision is now accepted and ready to land. Jun 6 2017, 8:43 AM

Closed by commit rL304806: [Atomics][LoopIdiom] Recognize unordered atomic memcpy (authored by annat). Jun 6 2017, 9:45 AM

This revision was automatically updated to reflect the committed changes.

FYI: The committed patch (rL304806) caused failures on multiple targets (it passed on X86, PPC and s390), since the unordered-atomic-memcpy.ll test was in the common LoopIdiom directory.
Discussing with echristo on IRC: Specifying target-triple as X86 is not enough. It would still be run on targets where the X86 backend is not compiled in.

I've checked in the fix (rL304809), which is to move the file to the X86-specific subdirectory, where the lit.cfg ensures that the test is not run on any other target.

Revision Contents

Path	Size
include/llvm/IR/IRBuilder.h	22 lines
lib/IR/IRBuilder.cpp	39 lines
lib/Transforms/Scalar/LoopIdiomRecognize.cpp	57 lines
test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll	98 lines

Diff 99477

include/llvm/IR/IRBuilder.h

[429 lines hidden] public:
   }

   CallInst *CreateMemCpy(Value *Dst, Value *Src, Value *Size, unsigned Align,
                          bool isVolatile = false, MDNode *TBAATag = nullptr,
                          MDNode *TBAAStructTag = nullptr,
                          MDNode *ScopeTag = nullptr,
                          MDNode *NoAliasTag = nullptr);

+  /// \brief Create and insert an unordered-atomic memcpy between the specified
+  /// pointers.
+  ///
+  /// If the pointers aren't i8*, they will be converted. If a TBAA tag is
+  /// specified, it will be added to the instruction. Likewise with alias.scope
+  /// and noalias tags.
+  CallInst *CreateElementUnorderedAtomicMemCpy(
+      Value *Dst, Value *Src, uint64_t Size, unsigned Align, bool dst_unordered,
+      bool src_unordered, uint8_t elementsize, MDNode *TBAATag = nullptr,
+      MDNode *TBAAStructTag = nullptr, MDNode *ScopeTag = nullptr,
+      MDNode *NoAliasTag = nullptr) {
+    return CreateElementUnorderedAtomicMemCpy(
+        Dst, Src, getInt64(Size), Align, dst_unordered, src_unordered,
+        elementsize, TBAATag, TBAAStructTag, ScopeTag, NoAliasTag);
+  }
+
+  CallInst *CreateElementUnorderedAtomicMemCpy(
+      Value *Dst, Value *Src, Value *Size, unsigned Align, bool dst_unordered,
+      bool src_isunordered, uint8_t elementsize, MDNode *TBAATag = nullptr,
+      MDNode *TBAAStructTag = nullptr, MDNode *ScopeTag = nullptr,
+      MDNode *NoAliasTag = nullptr);
+
   /// \brief Create and insert a memmove between the specified
   /// pointers.
   ///
   /// If the pointers aren't i8*, they will be converted. If a TBAA tag is
   /// specified, it will be added to the instruction. Likewise with alias.scope
   /// and noalias tags.
   CallInst *CreateMemMove(Value *Dst, Value *Src, uint64_t Size, unsigned Align,
                           bool isVolatile = false, MDNode *TBAATag = nullptr,
[1,497 lines hidden]

lib/IR/IRBuilder.cpp

[128 lines hidden] if (ScopeTag)
     CI->setMetadata(LLVMContext::MD_alias_scope, ScopeTag);

   if (NoAliasTag)
     CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);

   return CI;
 }

+CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemCpy(
+    Value *Dst, Value *Src, Value *Size, unsigned Align, bool dst_unordered,
+    bool src_unordered, uint8_t elementsize, MDNode *TBAATag,
+    MDNode *TBAAStructTag, MDNode *ScopeTag, MDNode *NoAliasTag) {
+  Dst = getCastedInt8PtrValue(Dst);
+  Src = getCastedInt8PtrValue(Src);
+
+  Value *Ops[] = {Dst,
+                  Src,
+                  Size,
+                  getInt32(Align),
+                  getInt1(0),
+                  getInt1(dst_unordered),
+                  getInt1(src_unordered),
+                  getInt8(elementsize)};
+  Type *Tys[] = {Dst->getType(), Src->getType(), Size->getType()};
+  Module *M = BB->getParent()->getParent();
+  Value *TheFn = Intrinsic::getDeclaration(
+      M, Intrinsic::memcpy_element_unordered_atomic, Tys);
+
+  CallInst *CI = createCallHelper(TheFn, Ops, this);
+
+  // Set the TBAA info if present.
+  if (TBAATag)
+    CI->setMetadata(LLVMContext::MD_tbaa, TBAATag);
+
+  // Set the TBAA Struct info if present.
+  if (TBAAStructTag)
+    CI->setMetadata(LLVMContext::MD_tbaa_struct, TBAAStructTag);
+
+  if (ScopeTag)
+    CI->setMetadata(LLVMContext::MD_alias_scope, ScopeTag);
+
+  if (NoAliasTag)
+    CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);
+
+  return CI;
+}
+
 CallInst *IRBuilderBase::
 CreateMemMove(Value *Dst, Value *Src, Value *Size, unsigned Align,
               bool isVolatile, MDNode *TBAATag, MDNode *ScopeTag,
               MDNode *NoAliasTag) {
   Dst = getCastedInt8PtrValue(Dst);
   Src = getCastedInt8PtrValue(Src);

   Value *Ops[] = { Dst, Src, Size, getInt32(Align), getInt1(isVolatile) };
[442 lines hidden]

lib/Transforms/Scalar/LoopIdiomRecognize.cpp

Show First 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
STATISTIC(NumMemCpy, "Number of memcpy's formed from loop load+stores");		STATISTIC(NumMemCpy, "Number of memcpy's formed from loop load+stores");

static cl::opt<bool> UseLIRCodeSizeHeurs(		static cl::opt<bool> UseLIRCodeSizeHeurs(
"use-lir-code-size-heurs",		"use-lir-code-size-heurs",
cl::desc("Use loop idiom recognition code size heuristics when compiling"		cl::desc("Use loop idiom recognition code size heuristics when compiling"
"with -Os/-Oz"),		"with -Os/-Oz"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

		static cl::opt<bool> RecogUnorderedAtomicMemcpy(
		"loop-idiom-unordered-atomic-memcpy",
		cl::desc("Allow loop idiom recognition to find and insert unordered-atomic "
		"memcpy intrinsics"),
		cl::init(false), cl::Hidden);
		reamesUnsubmitted Done Reply Inline Actions Why have this off by default? reames: Why have this off by default?
		dneilsonAuthorUnsubmitted Not Done Reply Inline Actions I'm removing the option entirely in the next diff. dneilson: I'm removing the option entirely in the next diff.

namespace {		namespace {

class LoopIdiomRecognize {		class LoopIdiomRecognize {
Loop *CurLoop;		Loop *CurLoop;
AliasAnalysis *AA;		AliasAnalysis *AA;
DominatorTree *DT;		DominatorTree *DT;
LoopInfo *LI;		LoopInfo *LI;
ScalarEvolution *SE;		ScalarEvolution *SE;
▲ Show 20 Lines • Show All 252 Lines • ▼ Show 20 Lines	static Constant getMemSetPatternValue(Value V, const DataLayout *DL) {
// Otherwise, we'll use an array of the constants.		// Otherwise, we'll use an array of the constants.
unsigned ArraySize = 16 / Size;		unsigned ArraySize = 16 / Size;
ArrayType *AT = ArrayType::get(V->getType(), ArraySize);		ArrayType *AT = ArrayType::get(V->getType(), ArraySize);
return ConstantArray::get(AT, std::vector<Constant *>(ArraySize, C));		return ConstantArray::get(AT, std::vector<Constant *>(ArraySize, C));
}		}

bool LoopIdiomRecognize::isLegalStore(StoreInst *SI, bool &ForMemset,		bool LoopIdiomRecognize::isLegalStore(StoreInst *SI, bool &ForMemset,
bool &ForMemsetPattern, bool &ForMemcpy) {		bool &ForMemsetPattern, bool &ForMemcpy) {
		bool ForUnorderedAtomic = SI->isAtomic() && SI->isUnordered();;
		annaUnsubmitted Done Reply Inline Actions Nit: extra semi colon. anna: Nit: extra semi colon.
		reamesUnsubmitted Done Reply Inline Actions This is missing the case where a instruction is both unordered atomic and volatile. Add fix and test case please. reames: This is missing the case where a instruction is both unordered atomic and volatile. Add fix…
		dneilsonAuthorUnsubmitted Not Done Reply Inline Actions isUnordered() precludes volatile. From StoreInst: 393 bool isUnordered() const { 394 return (getOrdering() == AtomicOrdering::NotAtomic \|\| 395 getOrdering() == AtomicOrdering::Unordered) && 396 !isVolatile(); 397 } So, the case isn't missing. dneilson: isUnordered() precludes volatile. From StoreInst: 393 bool isUnordered() const { 394…
// Don't touch volatile stores.		// Don't touch volatile stores.
if (!SI->isSimple())		if (!ForUnorderedAtomic && !SI->isSimple())
		reamesUnsubmitted Not Done Reply Inline Actions This predicate shows up a lot and confuses the code. Either remove the option or think about how to factor this out more cleanly. reames: This predicate shows up a lot and confuses the code. Either remove the option or think about…
		dneilsonAuthorUnsubmitted Not Done Reply Inline Actions My inclination is to ultimately remove the option. It's just there during initial development. dneilson: My inclination is to ultimately remove the option. It's just there during initial development.
		annaUnsubmitted Not Done Reply Inline Actions This predicate is really confusing. `isSimple = !isAtomic && !isVolatile` I'm not even sure if isSimple and ForUnorderedAtomic might negate each other. anna: This predicate is really confusing. `isSimple = !isAtomic && !isVolatile` I'm not even sure if…
		dneilsonAuthorUnsubmitted Not Done Reply Inline Actions It's the double negation... always confusing. !ForUnordered && !isSimple => !(ForUnordered \|\| isSimple) => !( (isAtomic && isUnordered) \|\| isSimple) => !( (isAtomic && (isSimple \|\| unordered-atomic)) \|\| isSimple ) Which is exactly what's wanted here. dneilson: It's the double negation... always confusing. !ForUnordered && !isSimple => !(ForUnordered \|\|…
		reamesUnsubmitted Not Done Reply Inline Actions I'd strongly recommend rewriting this as: if (Volatile) return None; if (isAtomic() && Ordering != unordered) return None; reames: I'd strongly recommend rewriting this as: if (Volatile) return None; if (isAtomic() &&…
		dneilsonAuthorUnsubmitted Not Done Reply Inline Actions True enough... wouldn't hurt to make it more explicit. I'm thinking: if (Volatile) return None; if (not Unordered) return None; dneilson: True enough... wouldn't hurt to make it more explicit. I'm thinking: if (Volatile) return None…
return false;		return false;

		reamesUnsubmitted Done Reply Inline Actions I don't understand the note? Maybe just remove reames: I don't understand the note? Maybe just remove
		dneilsonAuthorUnsubmitted Not Done Reply Inline Actions The test is for 'isUnordered()', and the note aims to clarify that unordered implies (=>) simple. dneilson: The test is for 'isUnordered()', and the note aims to clarify that unordered implies (=>)…
// Don't convert stores of non-integral pointer types to memsets (which stores		// Don't convert stores of non-integral pointer types to memsets (which stores
// integers).		// integers).
if (DL->isNonIntegralPointerType(SI->getValueOperand()->getType()))		if (DL->isNonIntegralPointerType(SI->getValueOperand()->getType()))
return false;		return false;

// Avoid merging nontemporal stores.		// Avoid merging nontemporal stores.
if (SI->getMetadata(LLVMContext::MD_nontemporal))		if (SI->getMetadata(LLVMContext::MD_nontemporal))
return false;		return false;
Show All 24 Lines	bool LoopIdiomRecognize::isLegalStore(StoreInst *SI, bool &ForMemset,
// turned into a memset of i8 -1, assuming that all the consecutive bytes		// turned into a memset of i8 -1, assuming that all the consecutive bytes
// are stored. A store of i32 0x01020304 can never be turned into a memset,		// are stored. A store of i32 0x01020304 can never be turned into a memset,
// but it can be turned into memset_pattern if the target supports it.		// but it can be turned into memset_pattern if the target supports it.
Value *SplatValue = isBytewiseValue(StoredVal);		Value *SplatValue = isBytewiseValue(StoredVal);
Constant *PatternValue = nullptr;		Constant *PatternValue = nullptr;

// If we're allowed to form a memset, and the stored value would be		// If we're allowed to form a memset, and the stored value would be
// acceptable for memset, use it.		// acceptable for memset, use it.
if (HasMemset && SplatValue &&		if (!ForUnorderedAtomic && HasMemset && SplatValue &&
		annaUnsubmitted Done Reply Inline Actions Can you pls add a comment here stating that this is not supported for memset and memset_patternN. anna: Can you pls add a comment here stating that this is not supported for memset and…
		reamesUnsubmitted Done Reply Inline Actions add: (yet) reames: add: (yet)
// Verify that the stored value is loop invariant. If not, we can't		// Verify that the stored value is loop invariant. If not, we can't
// promote the memset.		// promote the memset.
CurLoop->isLoopInvariant(SplatValue)) {		CurLoop->isLoopInvariant(SplatValue)) {
// It looks like we can use SplatValue.		// It looks like we can use SplatValue.
ForMemset = true;		ForMemset = true;
return true;		return true;
} else if (HasMemsetPattern &&		} else if (!ForUnorderedAtomic && HasMemsetPattern &&
// Don't create memset_pattern16s with address spaces.		// Don't create memset_pattern16s with address spaces.
StorePtr->getType()->getPointerAddressSpace() == 0 &&		StorePtr->getType()->getPointerAddressSpace() == 0 &&
(PatternValue = getMemSetPatternValue(StoredVal, DL))) {		(PatternValue = getMemSetPatternValue(StoredVal, DL))) {
// It looks like we can use PatternValue!		// It looks like we can use PatternValue!
ForMemsetPattern = true;		ForMemsetPattern = true;
return true;		return true;
}		}

// Otherwise, see if the store can be turned into a memcpy.		// Otherwise, see if the store can be turned into a memcpy.
if (HasMemcpy) {		if (HasMemcpy) {
// Check to see if the stride matches the size of the store. If so, then we		// Check to see if the stride matches the size of the store. If so, then we
// know that every byte is touched in the loop.		// know that every byte is touched in the loop.
APInt Stride = getStoreStride(StoreEv);		APInt Stride = getStoreStride(StoreEv);
unsigned StoreSize = getStoreSizeInBytes(SI, DL);		unsigned StoreSize = getStoreSizeInBytes(SI, DL);
if (StoreSize != Stride && StoreSize != -Stride)		if (StoreSize != Stride && StoreSize != -Stride)
return false;		return false;

// The store must be feeding a non-volatile load.		// The store must be feeding a non-volatile load.
LoadInst *LI = dyn_cast<LoadInst>(SI->getValueOperand());		LoadInst *LI = dyn_cast<LoadInst>(SI->getValueOperand());
if (!LI || !LI->isSimple())		if (!LI || (RecogUnorderedAtomicMemcpy && !LI->isUnordered()) ||
		(!RecogUnorderedAtomicMemcpy && !LI->isSimple()))
		anna (Done): Actually, there are 2 booleans (RecogUnorderedAtomicMemcpy and ForUnorderedAtomic) and they are being used for different purposes throughout this function. Could you perhaps do an early bail out and state the requirements at the start of the function? Also, add an assert at the caller of `isLegalStore` that unordered atomic stores are legal only for `memcpy`.
		dneilson (Author, Not Done): One (RecogUnorderedAtomicMemcpy) was a command-line arg to turn on the idiom recognition for unordered atomic memcpy. I've just removed the option entirely.
return false;		return false;

// See if the pointer expression is an AddRec like {base,+,1} on the current		// See if the pointer expression is an AddRec like {base,+,1} on the current
// loop, which indicates a strided load. If we have something else, it's a		// loop, which indicates a strided load. If we have something else, it's a
// random load we can't handle.		// random load we can't handle.
const SCEVAddRecExpr *LoadEv =		const SCEVAddRecExpr *LoadEv =
dyn_cast<SCEVAddRecExpr>(SE->getSCEV(LI->getPointerOperand()));		dyn_cast<SCEVAddRecExpr>(SE->getSCEV(LI->getPointerOperand()));
if (!LoadEv || LoadEv->getLoop() != CurLoop || !LoadEv->isAffine())		if (!LoadEv || LoadEv->getLoop() != CurLoop || !LoadEv->isAffine())
[... 409 lines not shown ...]	NewCall =
Builder.CreateMemSet(BasePtr, SplatValue, NumBytes, StoreAlignment);		Builder.CreateMemSet(BasePtr, SplatValue, NumBytes, StoreAlignment);
} else {		} else {
// Everything is emitted in default address space		// Everything is emitted in default address space
Type *Int8PtrTy = DestInt8PtrTy;		Type *Int8PtrTy = DestInt8PtrTy;

Module *M = TheStore->getModule();		Module *M = TheStore->getModule();
Value *MSP =		Value *MSP =
M->getOrInsertFunction("memset_pattern16", Builder.getVoidTy(),		M->getOrInsertFunction("memset_pattern16", Builder.getVoidTy(),
Int8PtrTy, Int8PtrTy, IntPtr);		Int8PtrTy, Int8PtrTy, IntPtr);
		reames (Done): Unrelated whitespace. Please remove.
inferLibFuncAttributes(M->getFunction("memset_pattern16"), TLI);		inferLibFuncAttributes(M->getFunction("memset_pattern16"), TLI);

// Otherwise we should form a memset_pattern16. PatternValue is known to be		// Otherwise we should form a memset_pattern16. PatternValue is known to be
// an constant array of 16-bytes. Plop the value into a mergable global.		// an constant array of 16-bytes. Plop the value into a mergable global.
GlobalVariable *GV = new GlobalVariable(*M, PatternValue->getType(), true,		GlobalVariable *GV = new GlobalVariable(*M, PatternValue->getType(), true,
GlobalValue::PrivateLinkage,		GlobalValue::PrivateLinkage,
PatternValue, ".memset_pattern");		PatternValue, ".memset_pattern");
GV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global); // Ok to merge these.		GV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global); // Ok to merge these.
[... 14 lines not shown ...]	bool LoopIdiomRecognize::processLoopStridedStore(
return true;		return true;
}		}

/// If the stored value is a strided load in the same loop with the same stride		/// If the stored value is a strided load in the same loop with the same stride
/// this may be transformable into a memcpy. This kicks in for stuff like		/// this may be transformable into a memcpy. This kicks in for stuff like
/// for (i) A[i] = B[i];		/// for (i) A[i] = B[i];
bool LoopIdiomRecognize::processLoopStoreOfLoopLoad(StoreInst *SI,		bool LoopIdiomRecognize::processLoopStoreOfLoopLoad(StoreInst *SI,
const SCEV *BECount) {		const SCEV *BECount) {
		if (!RecogUnorderedAtomicMemcpy)
assert(SI->isSimple() && "Expected only non-volatile stores.");		assert(SI->isSimple() && "Expected only non-volatile stores.");
		else
		assert(SI->isUnordered() &&
		"Expected only non-volatile non-ordered stores.");

Value *StorePtr = SI->getPointerOperand();		Value *StorePtr = SI->getPointerOperand();
const SCEVAddRecExpr *StoreEv = cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));		const SCEVAddRecExpr *StoreEv = cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));
APInt Stride = getStoreStride(StoreEv);		APInt Stride = getStoreStride(StoreEv);
unsigned StoreSize = getStoreSizeInBytes(SI, DL);		unsigned StoreSize = getStoreSizeInBytes(SI, DL);
bool NegStride = StoreSize == -Stride;		bool NegStride = StoreSize == -Stride;

// The store must be feeding a non-volatile load.		// The store must be feeding a non-volatile load.
LoadInst *LI = cast<LoadInst>(SI->getValueOperand());		LoadInst *LI = cast<LoadInst>(SI->getValueOperand());
assert(LI->isSimple() && "Expected only non-volatile stores.");		if (!RecogUnorderedAtomicMemcpy)
		assert(LI->isSimple() && "Expected only non-volatile loads.");
		else
		assert(LI->isUnordered() &&
		"Expected only non-volatile non-ordered loads.");

// See if the pointer expression is an AddRec like {base,+,1} on the current		// See if the pointer expression is an AddRec like {base,+,1} on the current
// loop, which indicates a strided load. If we have something else, it's a		// loop, which indicates a strided load. If we have something else, it's a
// random load we can't handle.		// random load we can't handle.
const SCEVAddRecExpr *LoadEv =		const SCEVAddRecExpr *LoadEv =
cast<SCEVAddRecExpr>(SE->getSCEV(LI->getPointerOperand()));		cast<SCEVAddRecExpr>(SE->getSCEV(LI->getPointerOperand()));

// The trip count of the loop and the base pointer of the addrec SCEV is		// The trip count of the loop and the base pointer of the addrec SCEV is
[... 50 lines not shown ...]	if (mayLoopAccessLocation(LoadBasePtr, MRI_Mod, CurLoop, BECount, StoreSize,
RecursivelyDeleteTriviallyDeadInstructions(StoreBasePtr, TLI);		RecursivelyDeleteTriviallyDeadInstructions(StoreBasePtr, TLI);
return false;		return false;
}		}

if (avoidLIRForMultiBlockLoop())		if (avoidLIRForMultiBlockLoop())
return false;		return false;

// Okay, everything is safe, we can transform this!		// Okay, everything is safe, we can transform this!

// The # stored bytes is (BECount+1)*Size. Expand the trip count out to		// The # stored bytes is (BECount+1)*Size. Expand the trip count out to
// pointer size if it isn't already.		// pointer size if it isn't already.
BECount = SE->getTruncateOrZeroExtend(BECount, IntPtrTy);		BECount = SE->getTruncateOrZeroExtend(BECount, IntPtrTy);

const SCEV *NumBytesS =		const SCEV *NumBytesS =
SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);		SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);
if (StoreSize != 1)		if (StoreSize != 1)
NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize),		NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize),
SCEV::FlagNUW);		SCEV::FlagNUW);

Value *NumBytes =		Value *NumBytes =
Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());		Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
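The stride test and the byte count used in this function reduce to plain integer arithmetic. The following is a minimal standalone sketch of that arithmetic, assuming 64-bit integers in place of `APInt` and SCEV expressions. The helper names `strideMatchesStoreSize` and `copiedBytes` are hypothetical and do not appear in the patch.

```cpp
#include <cstdint>

// Hypothetical helper: true when the pointer advances by exactly one element
// per iteration, forwards or backwards, so every byte in the range is touched.
constexpr bool strideMatchesStoreSize(std::int64_t StrideBytes,
                                      std::uint64_t StoreSize) {
  return StrideBytes == static_cast<std::int64_t>(StoreSize) ||
         StrideBytes == -static_cast<std::int64_t>(StoreSize);
}

// Hypothetical helper: bytes copied by a loop whose backedge is taken BECount
// times, i.e. (BECount + 1) iterations of StoreSize bytes each. Assumes the
// arithmetic does not wrap, which the diff expresses with SCEV::FlagNUW.
constexpr std::uint64_t copiedBytes(std::uint64_t BECount,
                                    std::uint64_t StoreSize) {
  return (BECount + 1) * StoreSize;
}
```

For instance, a loop over 4-byte elements whose backedge is taken 9 times runs 10 iterations and copies 40 bytes under this model.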

CallInst *NewCall =		unsigned Align = std::min(SI->getAlignment(), LI->getAlignment());
Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes,		CallInst *NewCall = nullptr;
		reames (Not Done): Can this be either a ternary or a helper lambda to remove the potentially uninitialized variable?
		dneilson (Author, Not Done): Could be, but I'm not convinced that it really buys anything. The value is defined in both branches of the if-else that immediately follows it.
std::min(SI->getAlignment(), LI->getAlignment()));		if (RecogUnorderedAtomicMemcpy && (SI->isAtomic() || LI->isAtomic())) {
		// element.unordered.atomic is limited to 16-byte element-size because
		reames (Not Done): This size limit is target specific. (Including which functions are defined.) For the moment, I'd be fine just making this a nicely commented global variable. Can this be sunk inside isLegalStore/isLegalLoad? It feels strange to have it here.
		// 1,2,4,8, and 16 are the only lib functions that are defined. Should this
		// be a limit to min(platform register size, 16) ?
		if (StoreSize > 16)
		reames (Not Done): We also need to worry about alignment for the atomic loads and stores, don't we? Or is that guaranteed by the fact they're atomic? In either case, a comment is potentially warranted.
		dneilson (Author, Not Done): Implied by the load/store. E.g., from the load LangRef: "align must be explicitly specified on atomic loads, and the load has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee." I do need to propagate the align info from the load/store to the pointer arg, though, and I haven't done that yet. It's probably worth mirroring that LangRef verbiage on alignment from atomic load/store into the atomic memcpy.
		skatkov (Done): I would prefer to invert the if statement: if SI and LI are not atomic, then memcpy; otherwise your code. It is simpler to follow if the short case is handled first.
		dneilson (Author, Not Done): Sure
		return false;
		NewCall = Builder.CreateElementUnorderedAtomicMemCpy(
		StoreBasePtr, LoadBasePtr, NumBytes, Align, SI->isAtomic(),
		LI->isAtomic(), StoreSize);
		anna (Done): We can do 2 things here: (1) Add a hidden `cl` option for the size that is default to 16 for now, instead of the hardcoded value. (2) You could take a look if `TTI` has that target specific information you need. Perhaps `getRegisterBitWidth`?
		dneilson (Author, Not Done): I've changed this to query the runtime lib for whether an unordered atomic memcpy with the given store size exists. If it does, then we can create the memcpy; else we can't because we won't be able to lower it into a lib call.
		// Propagate alignment info onto the pointer args. Note that unordered
		// atomic loads/stores are required by the spec to have an alignment
		anna (Done): Nit: end with period.
		auto setAlignment = [NewCall](unsigned argNo, unsigned alignment) {
		// Don't set alignment of 0
		if (!alignment)
		return;
		dneilson (Author, Not Done): This is the replacement due to the inability to use RTLIB in ScalarOpt due to a circular dependency. It's not ideal as it's theoretically possible that larger width versions of the intrinsic's lib call will exist (ex: ones that are implemented to use vector regs for the load/stores), but we won't be able to exploit any that are wider than scalar register width.
		anna (Not Done): Is it possible that even if the `StoreSizeBits` is within the max `registerBitWidth` for the target arch, we do not have the corresponding lib call? Could you please check if `TLI` has the information you require or can be modified to do so - it seems to have `memset`, `memcpy` etc.
		dneilson (Author, Not Done): Theoretically it's possible, yes. Lib calls for sizes 1, 2, 4, 8, and 16 bytes are currently defined. If a platform doesn't have one of those sizes' definitions available and we create the call, then the result will be a link error -- which seems to be intentional per the discussion around the design of the element atomic memcpy. If scalar registers with more than 16 bytes are available on the platform, then we'd end up creating the intrinsic call and then blowing up when lowering it to a libcall in SelectionDAGBuilder -- this is the part I'd like to avoid. TLI seems to be concerned with system lib calls (libc, libm, etc), and not the __llvm_ lib calls, so I don't think that's a suitable place for this.
		NewCall->addParamAttr(argNo, Attribute::getWithAlignment(NewCall->getContext(), alignment));
		};
		setAlignment(0, SI->getAlignment());
		anna (Not Done): If the spec requires atomic loads and stores to have an alignment, shouldn't this be an assert, rather than a check and return?
		dneilson (Author, Not Done): Spec requires atomic loads/stores to have alignment, but has no such requirement for non-atomic loads/stores. It's possible that one of source or dest will be non-atomic. I'll clarify the comment.
		setAlignment(1, LI->getAlignment());
		} else {
		NewCall = Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes, Align);
		}
		anna (Done): Nit: no need of braces for single line else.
		reames (Done): This is incorrect. We can't allow a misaligned atomic memcpy. If we can't ensure the load and store are sufficiently aligned, we must reject the transform. Fix and add test please.
		dneilson (Author, Not Done): Funny enough, I caught this when updating the version of this patch that's on top of the changed atomic memcpy intrinsic; just didn't port it to this version. Will do that...
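The constraints raised in this thread (an element size with a defined library function, and both pointers aligned to at least the element size) can be combined into one predicate. This is a minimal standalone sketch, not the code that was committed. `hasElementAtomicLibcall` and `canFormElementAtomicMemCpy` are hypothetical names, and the size list is taken from the comments above rather than from any target query.

```cpp
#include <cstdint>

// Element sizes (in bytes) for which the review thread says library functions
// are currently defined: 1, 2, 4, 8 and 16. Hypothetical helper.
constexpr bool hasElementAtomicLibcall(std::uint64_t ElementSize) {
  return ElementSize == 1 || ElementSize == 2 || ElementSize == 4 ||
         ElementSize == 8 || ElementSize == 16;
}

// Hypothetical legality predicate: form the element-wise atomic memcpy only
// when the element size is supported and both the store and the load are
// aligned to at least the element size. An alignment of 0 (unspecified) can
// never satisfy the comparison, so it is rejected.
constexpr bool canFormElementAtomicMemCpy(std::uint64_t ElementSize,
                                          unsigned StoreAlign,
                                          unsigned LoadAlign) {
  return hasElementAtomicLibcall(ElementSize) && StoreAlign >= ElementSize &&
         LoadAlign >= ElementSize;
}
```

Rejecting unknown alignment is the conservative choice in this sketch: a non-atomic load or store may carry no explicit alignment, and in that case nothing guarantees that element-sized accesses are naturally aligned.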
NewCall->setDebugLoc(SI->getDebugLoc());		NewCall->setDebugLoc(SI->getDebugLoc());

DEBUG(dbgs() << " Formed memcpy: " << *NewCall << "\n"		DEBUG(dbgs() << " Formed memcpy: " << *NewCall << "\n"
<< " from load ptr=" << *LoadEv << " at: " << *LI << "\n"		<< " from load ptr=" << *LoadEv << " at: " << *LI << "\n"
<< " from store ptr=" << *StoreEv << " at: " << *SI << "\n");		<< " from store ptr=" << *StoreEv << " at: " << *SI << "\n");

// Okay, the memcpy has been formed. Zap the original store and anything that		// Okay, the memcpy has been formed. Zap the original store and anything that
// feeds into it.		// feeds into it.
[... 112 lines not shown ...]	// step 2: detect instructions corresponding to "x2 = x1 & (x1 - 1)"
if (!SubOneOp)		if (!SubOneOp)
return false;		return false;

Instruction *SubInst = cast<Instruction>(SubOneOp);		Instruction *SubInst = cast<Instruction>(SubOneOp);
ConstantInt *Dec = dyn_cast<ConstantInt>(SubInst->getOperand(1));		ConstantInt *Dec = dyn_cast<ConstantInt>(SubInst->getOperand(1));
if (!Dec ||		if (!Dec ||
!((SubInst->getOpcode() == Instruction::Sub && Dec->isOne()) ||		!((SubInst->getOpcode() == Instruction::Sub && Dec->isOne()) ||
(SubInst->getOpcode() == Instruction::Add &&		(SubInst->getOpcode() == Instruction::Add &&
Dec->isAllOnesValue()))) {		Dec->isAllOnesValue()))) {
		reames (Done): unrelated whitespace
return false;		return false;
}		}
}		}

// step 3: Check the recurrence of variable X		// step 3: Check the recurrence of variable X
{		{
PhiX = dyn_cast<PHINode>(VarX1);		PhiX = dyn_cast<PHINode>(VarX1);
if (!PhiX ||		if (!PhiX ||
[... 518 lines not shown ...]
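The patch repeatedly swaps `isSimple()` for `isUnordered()` when it checks loads and stores. A simplified standalone model of the two predicates is sketched below. `MemAccess` and `Ordering` are illustrative types, not the LLVM classes, and the definitions reflect the usual meaning of these queries (simple: neither atomic nor volatile; unordered: non-volatile with ordering no stronger than unordered).

```cpp
// Simplified model of a load or store; not the LLVM LoadInst/StoreInst API.
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire, Release, SeqCst };

struct MemAccess {
  Ordering Order;
  bool Volatile;
};

// Any ordering other than NotAtomic makes the access atomic.
constexpr bool isAtomic(MemAccess A) { return A.Order != Ordering::NotAtomic; }

// "Simple" accesses are neither atomic nor volatile.
constexpr bool isSimple(MemAccess A) { return !isAtomic(A) && !A.Volatile; }

// "Unordered" accepts non-atomic and unordered-atomic accesses, but still
// rejects volatile ones and anything with a stronger ordering.
constexpr bool isUnordered(MemAccess A) {
  return (A.Order == Ordering::NotAtomic || A.Order == Ordering::Unordered) &&
         !A.Volatile;
}
```

Under this model every simple access is also unordered, which is why relaxing the check from `isSimple()` to `isUnordered()` only widens the set of loops that are accepted.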

test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll

This file was added.

				; RUN: opt -basicaa -loop-idiom -loop-idiom-unordered-atomic-memcpy < %s -S | FileCheck %s
				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"

				;; memcpy.unordered.atomic formation (atomic load & store)
				define void @test1(i64 %Size) nounwind ssp {
				; CHECK-LABEL: @test1(
				; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1, i1 false, i1 true, i1 true, i8 1)
				; CHECK-NOT: store
				; CHECK: ret void
				bb.nph:
				%Base = alloca i8, i32 10000
				%Dest = alloca i8, i32 10000
				br label %for.body

				for.body: ; preds = %bb.nph, %for.body
				%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
				%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
				%DestI = getelementptr i8, i8* %Dest, i64 %indvar
				%V = load atomic i8, i8* %I.0.014 unordered, align 1
				store atomic i8 %V, i8* %DestI unordered, align 1
				%indvar.next = add i64 %indvar, 1
				%exitcond = icmp eq i64 %indvar.next, %Size
				br i1 %exitcond, label %for.end, label %for.body

				anna (Done): Please add the `CHECK`s at the start of the function. That's the pattern followed usually in tests.
				for.end: ; preds = %for.body, %entry
				ret void
				}

				;; memcpy.unordered.atomic formation (atomic store, normal load)
				define void @test2(i64 %Size) nounwind ssp {
				; CHECK-LABEL: @test2(
				; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1, i1 false, i1 true, i1 false, i8 1)
				; CHECK-NOT: store
				; CHECK: ret void
				bb.nph:
				%Base = alloca i8, i32 10000
				%Dest = alloca i8, i32 10000
				br label %for.body

				for.body: ; preds = %bb.nph, %for.body
				%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
				%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
				%DestI = getelementptr i8, i8* %Dest, i64 %indvar
				%V = load i8, i8* %I.0.014, align 1
				store atomic i8 %V, i8* %DestI unordered, align 1
				%indvar.next = add i64 %indvar, 1
				%exitcond = icmp eq i64 %indvar.next, %Size
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				;; memcpy.unordered.atomic formation (normal store, atomic load)
				define void @test3(i64 %Size) nounwind ssp {
				; CHECK-LABEL: @test3(
				; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1, i1 false, i1 false, i1 true, i8 1)
				; CHECK-NOT: store
				; CHECK: ret void
				bb.nph:
				%Base = alloca i8, i32 10000
				%Dest = alloca i8, i32 10000
				br label %for.body

				for.body: ; preds = %bb.nph, %for.body
				%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
				%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
				%DestI = getelementptr i8, i8* %Dest, i64 %indvar
				%V = load atomic i8, i8* %I.0.014 unordered, align 1
				store i8 %V, i8* %DestI, align 1
				%indvar.next = add i64 %indvar, 1
				%exitcond = icmp eq i64 %indvar.next, %Size
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}

				; Make sure that atomic memset doesn't get recognized by mistake
				define void @test_nomemset(i8* %Base, i64 %Size) nounwind ssp {
				anna (Done): Please add a test for memset_patternN not being recognized as well. These tests should change once support is added for both memsets.
				; CHECK-LABEL: @test_nomemset(
				; CHECK-NOT: call void @llvm.memset
				; CHECK: store
				; CHECK: ret void
				bb.nph: ; preds = %entry
				br label %for.body

				for.body: ; preds = %bb.nph, %for.body
				%indvar = phi i64 [ 0, %bb.nph ], [ %indvar.next, %for.body ]
				%I.0.014 = getelementptr i8, i8* %Base, i64 %indvar
				store atomic i8 0, i8* %I.0.014 unordered, align 1
				%indvar.next = add i64 %indvar, 1
				%exitcond = icmp eq i64 %indvar.next, %Size
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				ret void
				}
				reames (Done): Please add at least one test case for each element size in (2, 4, 8)
				dneilson (Author, Not Done): Will do.


Diff 99477

include/llvm/IR/IRBuilder.h

lib/IR/IRBuilder.cpp

lib/Transforms/Scalar/LoopIdiomRecognize.cpp

test/Transforms/LoopIdiom/unordered-atomic-memcpy.ll
