Introduce LoadSite and StoreSite to enable us to deal with target specific load/store intrinsics with characteristics similar to those of regular load and store instructions. Remove ParseMemoryInst from EarlyCSE, and replace it with LoadSite and StoreSite.
Diff Detail
Event Timeline
Overall, heading in the right direction. I'm still not particularly happy about the complexity in EarlyCSE to handle target intrinsics at all, but this is definitely looking better than what's there currently.
Please update with context. The current diff is too hard to review in full. Some comments given, but they're really incomplete.
include/llvm/IR/LoadStoreSite.h | ||
---|---|---|
1 ↗ | (On Diff #21905) | I might call this something like MemoryAccessSite.h or the like. Mild preference, don't bother changing this until other things are settled. |
33 ↗ | (On Diff #21905) | I might call this MemoryAccessBase, but that's a minor preference. My thought process is that it allows a logic extension for an AtomicRMWSite at some point in the future. |
69 ↗ | (On Diff #21905) | I'd wrap getInt in a helper function which makes the purpose clearer to the reader. The following seems cleaner and easier to read: if (I.isTargetIntrinsic()) TTI->getTargetMemoryIntrinsicPointerOperand( cast<IntrinsicInstTy>(Inst)); |
76 ↗ | (On Diff #21905) | You've got this logic duplicated in a few places. Please common. |
99 ↗ | (On Diff #21905) | Having this here is sorta okay, but it really feels like this needs to live somewhere that applies easily to all intrinsics. We have the readonly/readnone attributes, why not just mark the underling intrinsics appropriately? |
110 ↗ | (On Diff #21905) | This shouldn't be necessary. If EarlyCSE doesn't know how to insert bitcasts, it really, really should. |
146 ↗ | (On Diff #21905) | This should probably be private. |
lib/Analysis/TargetTransformInfo.cpp | ||
269 | What do you mean by "Atomic" here? AtomicRMW (the instruction?). Atomic (as in memory ordering?) The naming is very unclear. | |
lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
539 | Surely there are other intrinsics which read from memory? I really doubt the correctness of this code. | |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
567 | a) naming, b) inverted conditional (bug!) |
include/llvm/Analysis/TargetTransformInfo.h | ||
---|---|---|
471 | The naming of these functions is inconsistent. I'm fine with either scheme, but please be consistent. | |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
469 | Volatile != Simple. Please use appropriate naming to match underling instruction. | |
482 | Er, is this a separate bugfix? I don't understand where this is coming from w.r.t. the requested refactor. | |
510 | It really feels like the functionality for whether an intrinsic reads from memory should live inside mayReadFromMemory. | |
545 | Given this is testing whether a location matches not whether the instruction matches, I would like to see something which made that explicit. |
Hi Philip,
Thanks for reviewing these changes! I'll upload a new patch with context.
include/llvm/Analysis/TargetTransformInfo.h | ||
---|---|---|
471 | The inconsistency in naming is to indicate that isTargetIntrinsicLikeLoad may be invoked on any intrinsic instruction, whereas isTargetMemoryIntrinsic..... may be invoked on intrinsics that are either like loads or like stores. The asserts in TargetTransformInfo.cpp check for these properties. I renamed these from isTargetMemoryIntrinsic* to isTargetIntrinsic* in the new patch. I kept the asserts for now. Maybe they should be renamed to something like isTargetLoadOrStoreIntrinsicAtomic instead? Alternatively, we could do away with different query functions, and just have something like MemoryIntrinsicInfo getTargetMemoryIntrinsic(IntrinsicInst*); with overloaded operator bool(), which would return true if the MemoryIntrinsicInfo contains a supported intrinsic. The overhead would be that we would probably need to keep one MemoryIntrinsicInfo struct in LoadStoreSiteBase. This could perhaps avoid any inconsistencies due to having different query functions. | |
include/llvm/IR/LoadStoreSite.h | ||
1 ↗ | (On Diff #21905) | Yes, let's deal with renaming once other issues are addressed. |
33 ↗ | (On Diff #21905) | As per your previous suggestion, let's deal with renaming once other issues are addressed. |
69 ↗ | (On Diff #21905) | Done in the updated patch. |
76 ↗ | (On Diff #21905) | Commoned in the updated patch. |
99 ↗ | (On Diff #21905) | On AArch64 and ARM at least, store intrinsics are not marked as readnone by the front-end, and so mayReadFromMemory would always return true. For regular store instructions, mayReadFromMemory always returns false for non-atomic stores. This change allows the back-end to treat store intrinsics like regular stores. I don't think it's legal to mark store intrinsics with readnone. When readnone is used on a function, then the function "does not write through any pointer arguments", according to the LLVM Language Reference Manual. |
110 ↗ | (On Diff #21905) | I put this here for completeness, in case some other optimization makes use of this API. For example, it would return false when checking "store i32 %val, i32* %ptr" and "load %val2 = load i16, i16* %ptr2". This mirrors the check for %0 = bitcast i32* %a to i8* and %5 = bitcast i32* %a to i8* While the check for non-intrinsic loads and stores is not necessary when using this API in EaryCSE, I don't think I should just return "true" unconditionally for regular loads and stores. |
146 ↗ | (On Diff #21905) | Changed in the updated patch. |
lib/Analysis/TargetTransformInfo.cpp | ||
269 | This is supposed to check for memory ordering, similar to isAtomic() when invoked on regular loads and stores. The new name, isTargetIntrinsicAtomic(), is probably not any better. Is the comment in TargetTransformInfo.h enough? | |
lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
539 | There are, but only a few intrinsics are supported for now (the ones in isTargetIntrinsicLikeLoad(...) and isTargetIntrinsicLikeStore(...)). The assert makes sure that we are dealing with supported load/store intrinsics. Perhaps, a better way would be to just have one query function for load/store intrinsics. Something like MemoryIntrinsicInfo getTargetMemoryIntrinsic(IntrinsicInst *Inst); which would return all the needed information inside the MemoryIntrinsicInfo struct? Then we could do away with separate APIs, which may be confusing. The overhead would be the inclusion of this struct in LoadStoreSiteBase. | |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
469 | Originally, the check was // Ignore volatile loads. The check "if (MemInst.isVolatile())" does the same check underneath, at least for regular loads and stores. The name of MemInst.isVolatile() is wrong - it should have been MemInst.isSimple()). The new check if (!LS.isSimple()) restores the original check that was in place. The comment , " Ignore volatile loads." should probably be " Ignore volatile and atomic loads". | |
482 | Yes, this bug was pointed out by Hal, and could affect Power intrinsics from what I understand. I thought I'd include it here. If you prefer, I can do this under a seperate bug fix once the refactor is sorted out. | |
510 | AArch64 store intrinsics are not marked as readnone, and mayReadFromMemory will always return true (unlike for regular non-atomic stores). So, we wouldn't get a chance to common intrinsic stores. It doesn't look to be legal to mark store intrinsics as readnone, according to the LLVM Language Reference Manual. | |
545 | The previous change, isMatchingMemLoc, has a misleading name. It checks for both the memory operand and the matching id. This is what LoadStoreSiteBase::Match does as well. I renamed LoadStoreSiteBase::Match to LoadStoreSiteBase::MatchInstructionAndLocation in the new patch. | |
567 | MemInst.isVolatile() is not correctly named - it checks for !isSimple() for regular loads and stores. Previous to MemInst change, the check was if (SI->isSimple()) The new change, if (SS.isSimple()), restores the original check. |
A few changes to address Philip's comments for LoadSite, StoreSite and the target query functions. Uploading with full context.
include/llvm/Analysis/TargetTransformInfo.h | ||
---|---|---|
501 | From what I understand, IntrReadArgMem results in an intrinsic being annotated with the readonly attribute. However, the readonly attribute doesn't imply that the intrinsic has only one pointer argument. | |
532 | I think I need to provide a better comment here. For regular, non-atomic stores, Instruction::mayReadFromMemory() returns false. This is not the case for calls unless they are marked with readnone, and store intrinsics are not marked with readnone. |
A few standalone comments before I go through the entire patch again.
lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
539 | I believe you've changed the name since the last version. Make sure the comments indicate that this only applies to target load and store intrinsics. I'm then fine with that. | |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
469 | Update the comment please. | |
482 | I would strongly prefer this be done separately. | |
510 | You misunderstood my comment. I was suggesting that the *function* mayReadFromMemory should check whether the intrinsic is a target specific store and return false. I'm fine with this being a separate refactor, but this type of special case doesn't belong in a particular transform pass when the answer is the same for anyone who asks. |
Ok, I'm running out of comments to make. The EarlyCSE code is much improved here; thank you for taking the time to do this.
I'm not entirely thrilled with how complex the target interface is, but that can be addressed incrementally. What you proposing appears to be correct and is now clearly documented enough to not be confusing.
If you do last one last update with all the naming and small things addressed, I'll do one more run through for a final LGTM. I considered giving a conditional LGTM now, but there a few too many issues that were deferred. (In particular, separation of unrelated bug fixes and naming.) The next round should be fast though.
include/llvm/IR/LoadStoreSite.h | ||
---|---|---|
41 ↗ | (On Diff #23125) | Leaving the member state unchanged after a call to setInstruction seems bug prone at best. Please either assert or clear the underlying data. (Looking through the calling code, I don't believe this is currently an issue. This is purely future proofing.) |
lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
544 | I would prefer you not make the variable const rather than need a const_cast here. |
Sorry for taking so long to get back to this. This update addresses Philip's comments.
Sorry for taking so long to get back to this. The updated patch is revised to address previous comments. Also moved LoadStoreSite.h to include/llvm/Analysis, since it makes use of TargetTransformInfo.
include/llvm/IR/LoadStoreSite.h | ||
---|---|---|
41 ↗ | (On Diff #23125) | Addressed in the updated patch - the state is always updated. |
lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
544 | Done in the updated patch. | |
588 | I changed this a little, so that mayTargetIntrinsicReadFromMemory and mayTargetIntrinsicWriteToMemory apply to all intrinsics that are supported by the target (if they choose to do so). The default implementation in TargetTransformInfoImplBase is: bool mayTargetIntrinsicReadFromMemory(const IntrinsicInst *II) { return !cast<CallInst>(II)->doesNotAccessMemory(); } I also introduced bool mayReadFromMemory(const Instruction *I) and bool mayWriteToMemory(const Instruction *I) to TTI that may invoke mayTargetIntrinsicReadFromMemory and mayTargetIntrinsicWriteToMemory, respectively. This lets us do away with the kludgy check in EarlyCSE. | |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
511–513 | Updated in the new patch. | |
524 | Will be done in the updated patch - the bug will be addressed separately. The bug can only affect non-AArch64 users, of which there are none as far I can see. | |
550 | Thanks for clarifying. Reworked in the updated patch - the checks are now done using TTI.mayReadFromMemory(Inst). |
Please post this patch with full context, see: http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface
One final round of comments. This looks really really close to ready to check in. Please make the requested changes (and nothing else) and I'll give it a LGTM once the patch is updated. Any further enhancements should go in separate follow on patches.
include/llvm/Analysis/LoadStoreSite.h | ||
---|---|---|
48 | auto | |
100 | This API feels really awkward. For now, I'll ask that you pull this function out of the helper class and make it a static function in EarlyCSE. That at least localizes the awkwardness to the place it's used. | |
123 | Same as previous. | |
include/llvm/Analysis/TargetTransformInfo.h | ||
501 | It looks like you may have merged two patches by accident here? | |
504 | Important detail that needs to be in this comment: not all load like target instructions will return true. This is correct because the optimizer must treat intrinsics calls conservatively. i.e. this is an optimization hook the target can hook into, not a correctness hook. | |
551 | Why not just mark the intrinsics with readonly or readnone? Unless you have a very good reason, please do that. | |
561 | This is the wrong place for this API. It should probably be on LoadSiteBase. It doesn't need to be part of TTI. | |
568 | Same as above. | |
include/llvm/Analysis/TargetTransformInfoImpl.h | ||
342 | A slightly clearer way to phrase these assertions would be that the intrinsic passes either the LikeLoad or LikeStore predicate. This allows a target to override those two and not have to update the others to get a conservatively correct answer. | |
lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
518 | You should assert that the intrinsic is in either the load or store list. Otherwise, this could return an incorrect result for an unknown intrinsic. Alternatively, it could be: | |
523 | Same. | |
534 | Submit separately please. | |
596 | If you rephrase the assertions above, you can just call the existing check functions and remove all these comments. Please do so. | |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
312–315 | Separable change. Please submit separately. | |
549 | Ah, given where this is used, ignore my comment about putting this on LoadStoreBase. Making it a static function in EarlyCSE is better for the moment. Still not on TTI, but LSB isn't a good place either. | |
592 | Use the assignment idiom as above. |
I agree with Philip's comments (except, perhaps, for the one on which I've commented below).
include/llvm/Analysis/LoadStoreSite.h | ||
---|---|---|
100 | Philip, why do you feel this is awkward? It seems that if the point of the abstraction is to hide the differentiation between regular load/stores and related target intrinsics. As such, it seems that this is an important part of the abstraction. | |
126 | MatchInstructionAndLocation -> matchInstructionAndLocation |
include/llvm/Analysis/LoadStoreSite.h | ||
---|---|---|
100 | Er, I've forgotten it's been so long since I looked at this. This was a minor comment and can be safely ignored. This can be addressed post commit if needed; I really don't want to block submission of this given how long it's been outstanding. |
include/llvm/Analysis/LoadStoreSite.h | ||
---|---|---|
48 | Will change to auto throughout in the updated patch. | |
100 | I'll keep isCompatible in LoadStoreSite for now. I'll move matchInstructionAndLocation to EarlyCSE. | |
123 | Will move it to EarlyCSE. | |
126 | Will rename and move to EarlyCSE. | |
include/llvm/Analysis/TargetTransformInfo.h | ||
501 | Git format-patch added a bit of additional context - removing areInlineCompatible, and then adding it back in. The function areInlineCompatible is part of TTI already. | |
504 | Will update in the new patch. | |
551 | I did this mainly for consistency. I need mayTargetIntrinsicReadFromMemory to return false for store-like intrinsics (which can't have the readnone attribute set). So, I thought I'd introduce mayTargetIntrinsicWriteToMemory. I'll remove mayTargetIntrinsicWriteToMemory in the updated patch, as load intrinsics coming from clang are already marked with the "readonly" attribute. (At least AArch64 ones are). | |
561 | I put this function (and mayWriteToMemory) here so that we can use a single check in EarlyCSE, as in: if (TTI.mayReadFromMemory(Inst)) ... Otherwise, if it's moved to LoadStoreBase, we have to check along the lines: StoreSite S(Inst, TTI); I can move mayReadFromMemory(cons Instruction *I) to EarlyCSE? | |
568 | Will move it to EarlyCSE. | |
include/llvm/Analysis/TargetTransformInfoImpl.h | ||
342 | I will remove the assert for isTargetIntrinsicAtomic, and call II->isAtomic() for the default implementation. For isTargetIntrinsicVolatile, I'll add the asserts if an intrinsic isn't like a load or a store, and return true otherwise to err on the conservative side. I will keep asserts for getTargetIntrinsicPointerOperand, getOrCreateResultFromTargetIntrinsic and getTargetIntrinsicMatchingId, as these must be implemented by the target if they are called. | |
lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
518 | I will add a check for load/store intrinsic for isTargetIntrinsicVolatile. For isTargetIntrinsicAtomic, I'll return Inst->isAtomic() in the base TTI implementation. | |
523 | Will change this in the updated patch. | |
534 | Will change in the updated patch. | |
596 | I'll add a check for volatility and atomicity in mayTargetIntrinsicReadFromMemory in the updated patch, and remove the comments. I'll remove mayTargetIntrinsicWriteToMemory altogether as load intrinsics are marked as readonly. | |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
312–315 | Will do in the updated patch. | |
549 | Will make it a static function in the updated patch. | |
592 | Will fix in the updated patch. |
include/llvm/Analysis/TargetTransformInfo.h | ||
---|---|---|
501 |
Not exactly. Such an intrinsic is read-only, but BasicAA also (automatically) learns the "arg mem" part of the property so that AA->getModRefBehavior can return FMRB_OnlyReadsArgumentPointees. Also, argmemonly is now a generic IR attribute that can be applied to any function. | |
518 | If this is important, why not add function attributes for these? |
auto