This is a simplified version of https://reviews.llvm.org/D70223. Since we can at least be confident that single element store won't be worse than vector store, it's clearer to put them into InstCombine. And there's already some logic about transforming a shufflevector affecting one element into insertelement, we can take care of the only case.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
@spatel @jdoerfert Should/could we use attributor to keep track of the 'storeable' memory range?
llvm/test/Transforms/InstCombine/single-element-store.ll | ||
---|---|---|
1 ↗ | (On Diff #235102) | Regenerate with update_test_checks.py ? |
InstCombine has some minimal load/store transforms, so this might be ok to add here, but InstCombine is probably not a pass that should try to do anything more complicated with memory ops.
This patch has the same requirements that were requested here:
https://reviews.llvm.org/D70223#1755719
...but it does not implement those.
In other words, the following tests will be miscompiled (and so they should be added to trunk independently before this patch tries to go any further):
define void @insert_store_addr(<16 x i8>* %p, <16 x i8>* %q, i8 %s) { %ld = load <16 x i8>, <16 x i8>* %p %ins = insertelement <16 x i8> %ld, i8 %s, i32 3 store <16 x i8> %ins, <16 x i8>* %q ; store to different address ret void } define void @insert_store_mem_mod(<16 x i8>* %p, <16 x i8>* %q, i8 %s) { %ld = load <16 x i8>, <16 x i8>* %p store <16 x i8> zeroinitializer, <16 x i8>* %q ; do pointers alias? %ins = insertelement <16 x i8> %ld, i8 %s, i32 3 store <16 x i8> %ins, <16 x i8>* %p ret void }
Addressed comments:
- Add check for address of load and store.
- Add check for any memory write instructions between load and store.
- Add more test cases for cases above. (Thanks to spatel)
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
1202 | Not an expert of opt, I just wonder is it more proper to check if Load->getPointerOperand() MustAlias SI.getPointerOperand(), since there might be bitcasts of these pointers? |
For simple examples like this we already do. Run the Attributor on them and you get dereferenceable annotations for the argument pointers.
Nevertheless, we can lose information. My suggestions to keep it is to emit something like this:
`call void @llvm.assume(i1 true) ["derefereceable"(%ptr, sizeof(access-we-just-removed))]`
I do have a RFC on the mailing list on this topic but no conclusion yet. If we decide to go forward with it we start emitting them and I
we write a combiner pass as well as integration into things like the Attributor to use the information.
Long story short, I think doing this is fine, once we have better assume capabilities we should emit one to keep the information about dereferenceability.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
1202 | I'd just call stripPointerCastSameRepresentation() on both pointers. That should give you all cases you care about (wrt. casts). Though, the match above might take care of it actually. | |
1222 | The GEP should be inbounds and the store should keep the nontemporal flag if present on the original. |
- Create inbounds store
- Strip casted pointer before comparison
- Make sure load and store belong to the same BB
- Keep nontemporal metadata of store
I think this is safe now (although limited to a single basic block), but I don't have enough experience with memop transforms to confidently approve. Please get a 2nd look from @lebedev.ri @jdoerfert @efriedma or someone else.
I'm also not sure if there's a better way to preserve metadata. For example, we have "copyMetadataForLoad()" - is there something like that that we can use here?
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
1199 | Using 'auto *' is recommended with dyn_cast. | |
1219–1223 | Using 'auto' on these lines is not recommended. | |
llvm/test/Transforms/InstCombine/load-insert-store.ll | ||
7–8 | Need to regenerate the CHECK lines? All tests should have 'inbounds' on the GEP, right? | |
74 | Please add a test comment to explain the 2 transforms. Something like: ; p and q may alias, so it is not safe to remove the first load. ; r is known to not alias the other pointers, so it is safe to remove the second load. |
I think you need to check the elements are byte-sized?
I'm a little concerned we're doing this too early, and it will end up blocking optimizations like GVN (since it can't reason about the partially overlapping store). Not sure how likely that is to trigger in practice, though.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
1210 | AliasAnalysis has a dedicated method getModRefInfo() to compute the exact property you want without caring about the specific kind of instruction. |
Address some comments.
- Change some use of auto.
- Update test cases with comments.
- Use getModRefInfo.
The merge check bot should have some problems in resolving parent-child revision with some already committed. Currently, they have no problem applying into master and tests are passed.
llvm/test/Transforms/InstCombine/load-insert-store.ll | ||
---|---|---|
7–8 | Sure. In some cases inbounds might be simplified, now we have them :-) |
- Use dedicated method to check ModRefInfo.
- Add tests about calls.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
1195 | I think m_Load() is used for matching a load to some specific address. m_Load(Src) set Src as address of the load. But here we need the load instr to do more checks. Is that necessary? |
Unit tests: pass. 61802 tests passed, 0 failed and 781 were skipped.
clang-tidy: unknown.
clang-format: pass.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
1207–1209 | Hm, how do we know both the load and the store are in the same basic block? | |
1210 | This looks better, but i don't have prior expirience with ModRefInfo, |
(marking as reviewed)
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
1207–1209 | (I would expect that there should either already be a function to do this check, |
- Wrap memory modified check into a single method (which can help other methods)
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
1207–1209 | We checked parent of Load and SI above, if they aren't the same, exit. | |
1210 | hmm.. As the doc/comment says, I think this bit indicates that the mem location may/must be modified, and there's another bit about it's a 'must' or a 'may'. So the three bit should each mean May Mod Ref. So Mod means it may write and mustAlias not found, while MustMod means it may write and mustAlias found. Here we only need to care if it may write or not (the bit), regardless of alias. Maybe I'm not correct. I found a related commit, rG50db8a, for a look :) |
Unit tests: pass. 61912 tests passed, 0 failed and 783 were skipped.
clang-tidy: unknown.
clang-format: pass.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
Hm, i don't fully recall the story here, but @qiucf, are you still interested in this?
I think this looks about right now, with some nits.
@spatel, @jdoerfert: any other thoughts?
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
1206 | We indeed can't easily (if at all?) handle non-constant case. Likewise i suspect constantexprs will give is trouble here, so i strongly recommend using m_ConstantInt(). | |
1209 | I think we are better-off early-returning instead. | |
1224 | Needs explicit Align(1) now. commonAlignment( SI.getAlign(), Idx->getUniqueInteger().getLimitedValue() * (NewElement->getType()->getScalarSizeInBits() / 8)) But that can be done in a followup patch, | |
llvm/test/Transforms/InstCombine/load-insert-store.ll | ||
2 | I wasn't sure whether or not this is endian-sensitive, alive2 says it's not, but maybe we should be proactive and do ; RUN: opt < %s -instcombine -S -data-layout=e | FileCheck %s --check-prefix=ALL --check-prefixes=CHECK,LE ; RUN: opt < %s -instcombine -S -data-layout=E | FileCheck %s --check-prefix=ALL --check-prefixes=CHECK,BE | |
4 | Add same test but with <8 x i16> (so it's not byte-sized)? |
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
1190–1191 | Do we care whether these are single-use? |
One thing I've realized reading this thread again is that it's not only the compiler that can get confused by a wrong-width store; the CPU itself can also run into issues with store->load forwarding. See recent discussion http://lists.llvm.org/pipermail/llvm-dev/2020-May/141837.html . So whether this transform is worthwhile might depend on the context.
Right. And if we know problematic context/conditions, we can always perform reverse fold later on, in backend. Correct?
The C memory model doesn't allow reversing the fold in a lot of cases without encoding more information in the IR. Also, I'm not sure what the heuristic looks like, so I'm not sure how hard it is to compute in the backend.
I feel there are opportunities here as our vector handling needs to improve a lot.
Though, without more data to decide (1) where to put this (in the pipeline) and (2) when to enable this (platform, context,...), it is hard to say for sure.
Generally, I would like to have this in-tree (potentially under a flag).
I think the code looks good, though we can add FIXMEs for missing parts, e.g, better alignment, the load version of this, ...
We have -vector-combine now, and it only runs late in the pipeline. If we put this transform in there, that should bypass concerns about interfering with GVN. That pass also has access to the cost model, so we could try to limit the transform if there are known cases where it would not be profitable (ie, no need to try to implement a codegen reversal).
Thanks for comments!
I'll try moving this into vector combine. Maybe we can do/know more by help from TTI.
llvm/test/Transforms/InstCombine/load-insert-store.ll | ||
---|---|---|
2 | We have few docs explicitly talked about this, and I remember endianness doesn't matter. But adding BE/LE check is fine. Thanks. |
Do we care whether these are single-use?