Page MenuHomePhabricator

[InstCombine] Convert vector store to scalar store if only one element updated
Needs ReviewPublic

Authored by qiucf on Dec 23 2019, 12:23 AM.

Details

Reviewers
spatel
nemanjai
andrewrk
fhahn
efriedma
bogner
jdoerfert
lebedev.ri
Group Reviewers
Restricted Project
Summary

This is a simplified version of https://reviews.llvm.org/D70223. Since we can at least be confident that single element store won't be worse than vector store, it's clearer to put them into InstCombine. And there's already some logic about transforming a shufflevector affecting one element into insertelement, we can take care of the only case.

Diff Detail

Event Timeline

qiucf created this revision.Dec 23 2019, 12:23 AM
RKSimon added subscribers: jdoerfert, RKSimon.

@spatel @jdoerfert Should/could we use attributor to keep track of the 'storeable' memory range?

llvm/test/Transforms/InstCombine/single-element-store.ll
1 ↗(On Diff #235102)

Regenerate with update_test_checks.py ?

qiucf updated this revision to Diff 235104.Dec 23 2019, 12:52 AM

Update test using auto-genearate tool.

qiucf marked an inline comment as done.Dec 23 2019, 12:52 AM
lebedev.ri added inline comments.Dec 23 2019, 12:55 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1206

I'd strongly suggest to simply create a new store.

llvm/test/Transforms/InstCombine/single-element-store.ll
8 ↗(On Diff #235104)

The alignment doesn't seem correct to me?

spatel requested changes to this revision.Dec 23 2019, 5:47 AM

InstCombine has some minimal load/store transforms, so this might be ok to add here, but InstCombine is probably not a pass that should try to do anything more complicated with memory ops.

This patch has the same requirements that were requested here:
https://reviews.llvm.org/D70223#1755719
...but it does not implement those.

In other words, the following tests will be miscompiled (and so they should be added to trunk independently before this patch tries to go any further):

define void @insert_store_addr(<16 x i8>* %p, <16 x i8>* %q, i8 %s) {
  %ld = load <16 x i8>, <16 x i8>* %p
  %ins = insertelement <16 x i8> %ld, i8 %s, i32 3
  store <16 x i8> %ins, <16 x i8>* %q ; store to different address
  ret void
}

define void @insert_store_mem_mod(<16 x i8>* %p, <16 x i8>* %q, i8 %s) {
  %ld = load <16 x i8>, <16 x i8>* %p
  store <16 x i8> zeroinitializer, <16 x i8>* %q ; do pointers alias?
  %ins = insertelement <16 x i8> %ld, i8 %s, i32 3
  store <16 x i8> %ins, <16 x i8>* %p
  ret void
}
This revision now requires changes to proceed.Dec 23 2019, 5:47 AM
qiucf updated this revision to Diff 235318.Dec 25 2019, 11:18 PM

Addressed comments:

  • Add check for address of load and store.
  • Add check for any memory write instructions between load and store.
  • Add more test cases for cases above. (Thanks to spatel)
qiucf marked an inline comment as done.Dec 25 2019, 11:18 PM
lkail added a subscriber: lkail.Dec 26 2019, 12:01 AM
lkail added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1202

Not an expert of opt, I just wonder is it more proper to check if Load->getPointerOperand() MustAlias SI.getPointerOperand(), since there might be bitcasts of these pointers?

@spatel @jdoerfert Should/could we use attributor to keep track of the 'storeable' memory range?

For simple examples like this we already do. Run the Attributor on them and you get dereferenceable annotations for the argument pointers.
Nevertheless, we can lose information. My suggestions to keep it is to emit something like this:

`call void @llvm.assume(i1 true) ["derefereceable"(%ptr, sizeof(access-we-just-removed))]`

I do have a RFC on the mailing list on this topic but no conclusion yet. If we decide to go forward with it we start emitting them and I
we write a combiner pass as well as integration into things like the Attributor to use the information.

Long story short, I think doing this is fine, once we have better assume capabilities we should emit one to keep the information about dereferenceability.

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1202

I'd just call stripPointerCastSameRepresentation() on both pointers. That should give you all cases you care about (wrt. casts). Though, the match above might take care of it actually.

1222

The GEP should be inbounds and the store should keep the nontemporal flag if present on the original.

qiucf updated this revision to Diff 235556.Dec 29 2019, 11:24 PM
  • Create inbounds store
  • Strip casted pointer before comparison
  • Make sure load and store belong to the same BB
  • Keep nontemporal metadata of store
jdoerfert resigned from this revision.Dec 29 2019, 11:32 PM

I am generally fine with this but others are reviewing this change already.

qiucf marked 3 inline comments as done.Jan 4 2020, 8:23 AM

Ping..

spatel added a comment.Jan 6 2020, 9:02 AM

I think this is safe now (although limited to a single basic block), but I don't have enough experience with memop transforms to confidently approve. Please get a 2nd look from @lebedev.ri @jdoerfert @efriedma or someone else.

I'm also not sure if there's a better way to preserve metadata. For example, we have "copyMetadataForLoad()" - is there something like that that we can use here?

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1199

Using 'auto *' is recommended with dyn_cast.

1219–1223
llvm/test/Transforms/InstCombine/load-insert-store.ll
7–8

Need to regenerate the CHECK lines? All tests should have 'inbounds' on the GEP, right?

74

Please add a test comment to explain the 2 transforms. Something like:

; p and q may alias, so it is not safe to remove the first load.
; r is known to not alias the other pointers, so it is safe to remove the second load.

I think you need to check the elements are byte-sized?

I'm a little concerned we're doing this too early, and it will end up blocking optimizations like GVN (since it can't reason about the partially overlapping store). Not sure how likely that is to trigger in practice, though.

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1210

AliasAnalysis has a dedicated method getModRefInfo() to compute the exact property you want without caring about the specific kind of instruction.

qiucf updated this revision to Diff 236816.Jan 8 2020, 6:59 AM
qiucf marked 5 inline comments as done.

Address some comments.

  • Change some use of auto.
  • Update test cases with comments.
  • Use getModRefInfo.

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

qiucf marked an inline comment as done.Jan 8 2020, 7:56 AM

The merge check bot should have some problems in resolving parent-child revision with some already committed. Currently, they have no problem applying into master and tests are passed.

llvm/test/Transforms/InstCombine/load-insert-store.ll
7–8

Sure. In some cases inbounds might be simplified, now we have them :-)

lebedev.ri requested changes to this revision.Jan 8 2020, 8:07 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1195

Match m_Load() specifically?

1210
  1. Is it bitfield? Should we be checking if (AA->getModRefInfo(&*BBI, MemoryLocation::get(&SI) & ModRefInfo::Mod) ?
  2. This does correctly work for calls right, not just instructions?
This revision now requires changes to proceed.Jan 8 2020, 8:07 AM
qiucf updated this revision to Diff 237891.Jan 14 2020, 2:18 AM
qiucf marked 2 inline comments as done.
  • Use dedicated method to check ModRefInfo.
  • Add tests about calls.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1195

I think m_Load() is used for matching a load to some specific address. m_Load(Src) set Src as address of the load. But here we need the load instr to do more checks. Is that necessary?

Unit tests: pass. 61802 tests passed, 0 failed and 781 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

lebedev.ri added inline comments.Jan 14 2020, 2:39 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1207–1209

Hm, how do we know both the load and the store are in the same basic block?

1210

This looks better, but i don't have prior expirience with ModRefInfo,
so this still looks suspicious to me - is this conservatively correct?
isModSet() looks for MustMod, which seems stronger than Mod?

lebedev.ri requested changes to this revision.Jan 14 2020, 8:40 AM

(marking as reviewed)

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1207–1209

(I would expect that there should either already be a function to do this check,
or this should be refactored into a helper function from the getgo.)

This revision now requires changes to proceed.Jan 14 2020, 8:40 AM
qiucf updated this revision to Diff 238436.Jan 16 2020, 2:40 AM
qiucf marked 4 inline comments as done.
  • Wrap memory modified check into a single method (which can help other methods)
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1207–1209

We checked parent of Load and SI above, if they aren't the same, exit.

1210

hmm.. As the doc/comment says, I think this bit indicates that the mem location may/must be modified, and there's another bit about it's a 'must' or a 'may'. So the three bit should each mean May Mod Ref.

So Mod means it may write and mustAlias not found, while MustMod means it may write and mustAlias found. Here we only need to care if it may write or not (the bit), regardless of alias.

Maybe I'm not correct. I found a related commit, rG50db8a, for a look :)

Unit tests: pass. 61912 tests passed, 0 failed and 783 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

lebedev.ri resigned from this revision.Jan 25 2020, 3:19 AM