Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy. The only difference for recognizing an unordered atomic memcpy and instead of a normal memcpy is that the loads and/or stores involved are unordered atomic operations.
- Updating to reflect changes in prerequisite change.
- Moved CHECKs in test case to start of procs.
Note: Test currently does not pass verification; need to propagate alignment info from the loads/stores onto the intrinsic pointer args.
I'm pretty sure this patch is buggy in a fairly major way. isLegalStore is a helper function used when matching three types of intrinsics: memset, memcpy, and memset_patternN. It looks like you're accidentally matching unorder atomics in all three idioms without the corresponding intrinsic support. At minimum, I'd want test cases showing these transforms *not* triggering.
I suspect that the code would become much cleaner if you restructured isLegalStore to return an enum of *which* intrinsic it was a legal store for. This could be a preparatory patch before this one to make it easier to reason about.
Why have this off by default?
This predicate shows up a lot and confuses the code. Either remove the option or think about how to factor this out more cleanly.
Unrelated whitespace. Please remove.
Can this be either a tiernary or a helper lambda to remove the potentially uninitialized variable?
This size limit is target specific. (Including which functions are defined.) For the moment, I'd be fine just making this a nicely commented global variable.
Can this be sunk inside isLegalStore/isLegalLoad? It feels strange to have it here.
We also need to worry about alignment for the atomic loads and stores don't we? Or is that guaranteed by the fact they're atomic? Either case, comment potentially warranted?
Good catch on isLegalStore; easy enough to clean up, and add a test case for. I'll think about the refactoring; you're probably right on that one...
My inclination is to ultimately remove the option. It's just there during initial development.
Implied by the load/store. ex, from the load langref: "align must be explicitly specified on atomic loads, and the load has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee."
I *do* need to propagate the align info from the load/store to the pointer arg, though, and I haven't done that yet.
It's probably worth mirroring that LangRef verbiage on alignment from atomic load/store into the atomic memcpy.
- Address functional issue re: don't want to accidentally flag unordered atomic stores as okay for memset & memset_pattern.
- Add test case to verify functional issue is non-issue.
- Add align attribute to the pointer args on the generated intrinsic call.
TODO Remaining: Remove the command-line opt that toggles the transform.
Looking at the code in LoopIdiomRecognize::collectStores, it might be better to refactor it first. Firstly, there's couple of TODOs regarding adding more patterns such as memcmp, memmove etc.
So, having something like this : isLegalStore(SI, ForMemset, ForMemsetPattern, ForMemcpy) seems messy. Adding something like an enum of patterns we are recognizing against, will be useful.
Nit: extra semi colon.
This predicate is really confusing. isSimple = !isAtomic && !isVolatile
Can you pls add a comment here stating that this is not supported for memset and memset_patternN.
Actually, there are 2 booleans (RecogUnorderedAtomicMemcpy and ForUnorderedAtomic ) and they are being used for different purposes throughout this function. Could you perhaps do an early bail out and state the requirements at the start of the function?
Also, add an assert at the caller of isLegalStore that unordered atomic stores was legal only for memcpy.
Nit: end with period.
Nit: no need of braces for single line else.
Marking as needing changes, based on comments inline.
Daniel, as mentioned in previous comment: it might be better to start off with a refactoring NFC patch which would make the code which uses isLegaslStore cleaner. First, instead of bool, return the pattern for which this is valid. Creating an enum would be good - it states clearly which patterns are currently supported.
The caller of isLegalStore collects these stores based on the pattern, but there is no check that exactly one pattern is selected. Also there is an implicit ordering on which pattern is checked.
Once the NFC is reviewed and checked in, coming back to this patch and cleaning up the booleans and the various checks in isLegalStore might be easier.
I'm removing the option entirely in the next diff.
It's the double negation... always confusing.
!ForUnordered && !isSimple
Which is exactly what's wanted here.
One (RecogUnorderedAtomicMemcpy) was a command-line arg to turn on the idiom recognition for unordered atomic memcpy. I've just removed the option entirely.
I would prefer to revert the if statement.
It is simpler to follow if the short case is handled first.
Some comments inline.
We can do 2 things here: Add a hidden cl option for the size that is default to 16 for now, instead of the hardcoded value.
You could take a look if TTI has that target specific information you need. Perhaps getRegisterBitWidth?
If the spec requires atomic loads and stores to have an alignment, shouldn't this be an assert, rather than a check and return?
please add a test for memset_patternN not being recognized as well. These tests should change once support is added for both memsets.
I've changed this to query the runtime lib for whether an unordered atomic memcpy with the given store size exists. If it does, then we can create the memcpy; else we can't because we won't be able to lower it into a lib call.
Spec requires atomic loads/stores to have alignment, but has no such requirement for non-atomic loads/stores. It's possible that one of source or dest will be non-atomic. I'll clarify the comment.