This is an archive of the discontinued LLVM Phabricator instance.

Preserve nonnull metadata on Loads through SROA & mem2reg.
ClosedPublic

Authored by luqmana on Nov 24 2016, 2:23 PM.

Details

Summary

https://llvm.org/bugs/show_bug.cgi?id=31142 :

SROA was dropping the nonnull metadata on loads from allocas that got optimized out. This patch simply preserves nonnull metadata on loads through SROA and mem2reg.

Event Timeline

luqmana updated this revision to Diff 79258.Nov 24 2016, 2:23 PM
luqmana retitled this revision from to Preserve nonnull metadata on Loads through SROA & mem2reg..
luqmana updated this object.
luqmana added a reviewer: chandlerc.
luqmana added a subscriber: llvm-commits.
chandlerc edited edge metadata.Nov 24 2016, 3:37 PM

I don't get it, this patch looks like it is *removing* SROA's existing logic to propagate metadata? And it doesn't add any new test cases, just deletes an existing test?

luqmana updated this revision to Diff 79266.EditedNov 24 2016, 4:16 PM
luqmana edited edge metadata.

Whoops, that was my mistake. The diff was inverted and should be corrected now.

davide added a subscriber: davide.Nov 24 2016, 4:35 PM

Are you always guaranteed that the range doesn't contain zero?

luqmana added a comment.EditedNov 24 2016, 4:46 PM

Are you always guaranteed that the range doesn't contain zero?

Is it valid for the range to contain 0 if the load was already marked as nonnull?

arielb1 added a subscriber: arielb1.Dec 1 2016, 1:47 PM

Could you also preserve !range metadata? It behaves similarly to nonzero, and inttoptr/ptrtoint interchanges them.

efriedma added inline comments.
lib/Transforms/Utils/PromoteMemoryToRegister.cpp
405

This isn't a legal transform: you're assigning "nonnull" to a completely unrelated instruction (which could have other users) just because it happens to be a LoadInst. I guess you could get away with this if you can prove there aren't any other uses of the value, though.

luqmana added inline comments.Dec 16 2016, 9:39 PM
lib/Transforms/Utils/PromoteMemoryToRegister.cpp
405

Hmmm, I could just add a check for it being the only use but is it illegal? Just walking through this code:

  • We have an alloca with a single store (OnlyStore)
  • The value stored was from a Load (ReplVal)
  • Now, we have a Load (LI) from this alloca that's dominated by this store and it has !nonnull.

Adding !nonnull to ReplVal seems reasonable since (assuming) the !nonnull on LI was valid, we know that value must be non null. So, can we not just propagate it back to ReplVal?

efriedma added inline comments.Dec 19 2016, 10:31 AM
lib/Transforms/Utils/PromoteMemoryToRegister.cpp
405

(assuming) the !nonnull on LI was valid, we know that value must be non null

!nonnull basically means "if this load would return null, we can treat it as poison". So you can't propagate backwards unless you can prove that the load will actually execute (LI post-dominates ReplVal), and that the loaded value will be used in a way that causes undefined behavior (a branch or call which uses LI post-dominates ReplVal). In practice, this is essentially impossible to check in LLVM.

arielb1 added inline comments.Dec 19 2016, 11:40 AM
lib/Transforms/Utils/PromoteMemoryToRegister.cpp
405

I am quite sure I was told that !nonnull causes UB rather than poison. Is there some real documentation for that?

I am quite sure I was told that !nonnull causes UB rather than poison. Is there some real documentation for that?

Err, no, you're probably right; I haven't really worked with it, and LangRef doesn't say anything, so "must" defaults to undefined behavior.

You still can't propagate backwards without proving post-dominance.

I am quite sure I was told that !nonnull causes UB rather than poison. Is there some real documentation for that?

Err, no, you're probably right; I haven't really worked with it, and LangRef doesn't say anything, so "must" defaults to undefined behavior.

You still can't propagate backwards without proving post-dominance.

That should hold in this case no? Lines 361-381 assert that LI is dominated by the Store and the Store trivially dominates ReplVal?

Lines 361-381 assert that LI is dominated by the Store and the Store trivially dominates ReplVal?

Post-dominates, not dominates... they're sort of similar, but post-dominance involves following the control flow edges the other way. A compiler textbook should cover this.

luqmana updated this revision to Diff 82032.Dec 19 2016, 4:26 PM

Address comments from review.

  • Preserve nonnull metadata on Loads through SROA & mem2reg.
  • Pass PostDominatorTree to PromoteMemoryToRegister.
  • Assert Post-Dominance of Load from alloca before propogating metadata.

Lines 361-381 assert that LI is dominated by the Store and the Store trivially dominates ReplVal?

Post-dominates, not dominates... they're sort of similar, but post-dominance involves following the control flow edges the other way. A compiler textbook should cover this.

Ah yes, missed that detail. Updated to also consider post-dominance.

luqmana updated this revision to Diff 82033.Dec 19 2016, 4:41 PM

Remove unnecessary changes.

PostDominatorTree doesn't provide the kind of post-dominance you need for this transformation: LLVM basic blocks have implicit edges which exit the function early, and those edges aren't reflected in the tree. For example, execution doesn't actually continue after a call to exit(). (See also the llvm::isGuaranteedToTransferExecutionToSuccessor() utility.)

PostDominatorTree doesn't provide the kind of post-dominance you need for this transformation: LLVM basic blocks have implicit edges which exit the function early, and those edges aren't reflected in the tree. For example, execution doesn't actually continue after a call to exit(). (See also the llvm::isGuaranteedToTransferExecutionToSuccessor() utility.)

Hmm, is it possible then to do this sort of analysis in LLVM in this context? Or is this optimization just not possible currently?

I don't think there's any existing analysis which computes the sort of post-dominator tree you want.

Anyway, I'm not sure this is really what you want... have you experimented with inserting llvm.assume calls?

I don't think there's any existing analysis which computes the sort of post-dominator tree you want.

Anyway, I'm not sure this is really what you want... have you experimented with inserting llvm.assume calls?

The problem wasn't that. We do have llvm.assume which InstCombine turns assume( (load addr) != null ) to a load addr !nonnull. Then, SROA/mem2reg end up eating the !nonnull metadata.

Some more details at https://github.com/rust-lang/rust/issues/37945 which was linked in the bug link in the description.

The problem wasn't that. We do have llvm.assume which InstCombine turns assume( (load addr) != null ) to a load addr !nonnull. Then, SROA/mem2reg end up eating the !nonnull metadata.

You can do the reverse transform in mem2reg: load addr !nonnull -> assume( (load addr) != null ), then when the load gets erased the nonnull assumption is preserved.

The problem wasn't that. We do have llvm.assume which InstCombine turns assume( (load addr) != null ) to a load addr !nonnull. Then, SROA/mem2reg end up eating the !nonnull metadata.

You can do the reverse transform in mem2reg: load addr !nonnull -> assume( (load addr) != null ), then when the load gets erased the nonnull assumption is preserved.

Or we could stop instcombine from doing this in the first place.

I think we should canonicalize on the assume formulation as it is strictly more general than the nonnull metadata formulation.

luqmana updated this revision to Diff 82692.Dec 29 2016, 12:51 PM

Use assume to preserve nonnull-ness of Load.

The problem wasn't that. We do have llvm.assume which InstCombine turns assume( (load addr) != null ) to a load addr !nonnull. Then, SROA/mem2reg end up eating the !nonnull metadata.

You can do the reverse transform in mem2reg: load addr !nonnull -> assume( (load addr) != null ), then when the load gets erased the nonnull assumption is preserved.

Or we could stop instcombine from doing this in the first place.

I think we should canonicalize on the assume formulation as it is strictly more general than the nonnull metadata formulation.

For this change I just implemented @efriedma's suggestion. I think changing how nonnull-ness is canonicalized is outside the scope of this.

Could you clean up the formatting to match LLVM coding standards? At the very least clang-format would help here (lots of lines over 80 columns, indent looks all wrong).

luqmana updated this revision to Diff 82698.Dec 29 2016, 2:39 PM

Run clang-format.

spatel added a subscriber: spatel.Jan 3 2017, 2:37 PM
spatel added inline comments.
lib/Transforms/Utils/PromoteMemoryToRegister.cpp
964–977

Please make a helper function so we don't have duplicated code.

luqmana updated this revision to Diff 82982.Jan 3 2017, 5:44 PM
Move duplicate code to helper function.
spatel added inline comments.Jan 4 2017, 8:11 AM
test/Transforms/SROA/preserve-nonnull.ll
2

I don't think you want to include -instcombine here and create a dependency on its canonicalization of the assume to metadata - especially since that may change based on the comments in this review and PR31518 ( https://llvm.org/bugs/show_bug.cgi?id=31518 ).

You can use "utils/update_test_checks.py" to auto-generate exact CHECK lines for your test.

luqmana updated this revision to Diff 83090.Jan 4 2017, 11:20 AM

Simplify test and remove instcombine dependency.

spatel added a comment.Jan 4 2017, 5:48 PM

Thanks for the changes.

Someone else can correct me if I'm wrong, but I think we're delaying improvements in nonnull optimization until we have a clang hack to avoid disastrous optimizations of libc functions:
http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html

Thanks for the changes.

Someone else can correct me if I'm wrong, but I think we're delaying improvements in nonnull optimization until we have a clang hack to avoid disastrous optimizations of libc functions:
http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html

Yes, that thread seems largely in favor of getting something in place in Clang first.

If no one else works on a patch to Clang I may in a week or so, but sadly my queue is very full at the moment.

hfinkel added a subscriber: hfinkel.Jan 5 2017, 5:46 AM

Thanks for the changes.

Someone else can correct me if I'm wrong, but I think we're delaying improvements in nonnull optimization until we have a clang hack to avoid disastrous optimizations of libc functions:
http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html

Yes, that thread seems largely in favor of getting something in place in Clang first.

If no one else works on a patch to Clang I may in a week or so, but sadly my queue is very full at the moment.

I don't think that's relevant to this patch. Once we have the nonnull in metadata form, we should preserve it. The particular problem with optimizing based on nonnull function-argument annotations is propagating the nonnull annotation from a callsite back into the caller. For that, we should wait on associated Clang updates (so that we can avoid doing it for memcpy and friends based on header-file annotations). This change should proceed.

Bunch of little comments below.

I'd like to see two high level thinsg before this goes in though:

  1. I want to make sure Eli is happy with this approach w.r.t. any semantic differences between !nonnull metadata and the assume. Haven't seen an update from him since the switch of formulation.
  1. I'd like at least a test case and a FIXME around the case where SROA rewrites the load and store to be integer loads and stores. This will happen fairly often due to things like unions, memcpy and other "bag of bits" interpretations of memory operations. We should work to not destroy !nonnull in the process by translating it to an assume. I'm happy for this to be handled in a subsequent patch, but I'd like to at least document the issue for anyone who ends up working on it.
lib/Transforms/Utils/PromoteMemoryToRegister.cpp
41

clang-format's header sort is wrong here (because PromoteMemToReg != PromoteMemoryToRegister).

306–313

This deosn't look like the right formatting... was clang-format misconfigured?

967–968

Here as well.

test/Transforms/SROA/preserve-nonnull.ll
7–8

Do you need these?

10

I would at least add the @ and maybe the define to this so that random other thinsg containing this string don't match confusingly down the road.

16

it's good to not use unnamed values in tests as they make future edits to the test brittle.

luqmana updated this revision to Diff 83271.Jan 5 2017, 10:43 AM
Fix some formatting and update test.

Bunch of little comments below.

I'd like to see two high level thinsg before this goes in though:

  1. I want to make sure Eli is happy with this approach w.r.t. any semantic differences between !nonnull metadata and the assume. Haven't seen an update from him since the switch of formulation.
  1. I'd like at least a test case and a FIXME around the case where SROA rewrites the load and store to be integer loads and stores. This will happen fairly often due to things like unions, memcpy and other "bag of bits" interpretations of memory operations. We should work to not destroy !nonnull in the process by translating it to an assume. I'm happy for this to be handled in a subsequent patch, but I'd like to at least document the issue for anyone who ends up working on it.

Updated the patch to address the little comments. As for the second point, I think that's better handled in a separate patch. If Eli is happy with the current approach I'd like to move forward with just this patch.

I want to make sure Eli is happy with this approach w.r.t. any semantic differences between !nonnull metadata and the assume. Haven't seen an update from him since the switch of formulation.

Semantically, this seems consistent with existing code dealing with !nonnull. That said, creating calls to @llvm.assume has the potential to cause performance regressions; there are a bunch of places in the optimizer which don't handle them well, so it's probably worth checking that this change doesn't create more problems than it solves.

I want to make sure Eli is happy with this approach w.r.t. any semantic differences between !nonnull metadata and the assume. Haven't seen an update from him since the switch of formulation.

Semantically, this seems consistent with existing code dealing with !nonnull. That said, creating calls to @llvm.assume has the potential to cause performance regressions; there are a bunch of places in the optimizer which don't handle them well, so it's probably worth checking that this change doesn't create more problems than it solves.

Are there any existing performance tests so I can determine if this might cause a regression?

Well, you can run the LLVM testsuite (http://llvm.org/docs/lnt/quickstart.html)... but that's written entirely written in C and C++, and I don't think clang produces nonnull metadata at all. Maybe you have some Rust performance tests you could use?

Semantically, this seems consistent with existing code dealing with !nonnull. That said, creating calls to @llvm.assume has the potential to cause performance regressions; there are a bunch of places in the optimizer which don't handle them well, so it's probably worth checking that this change doesn't create more problems than it solves.

Apologies if this isn't directly related to this patch, but one thing I noticed and mentioned in D28337: InstCombine is not calling into InstSimplify (and hence value tracking) with a context instruction in many cases. And so computeKnownBitsFromAssume() is not triggered in those cases. If we fix that (assuming that the lack of context instruction optional parameter is an oversight), we should make better use of the assumption machinery.

Apologies if this isn't directly related to this patch, but one thing I noticed and mentioned in D28337: InstCombine is not calling into InstSimplify (and hence value tracking) with a context instruction in many cases. And so computeKnownBitsFromAssume() is not triggered in those cases. If we fix that (assuming that the lack of context instruction optional parameter is an oversight), we should make better use of the assumption machinery.

I was more referring to the fact than m_OneUse etc. doesn't work the way you want it to in the presence of llvm.assume calls.

Well, you can run the LLVM testsuite (http://llvm.org/docs/lnt/quickstart.html)... but that's written entirely written in C and C++, and I don't think clang produces nonnull metadata at all. Maybe you have some Rust performance tests you could use?

Hmmm, I recall us not using as much assume due to supposed LLVM slowdowns but don't have any on hand.

So at least running the test suite I didn't come across any performance regressions. Could we land this and observe the buildbots or something for any regressions and back out as necessary?

Chandler, are you planning to continue reviewing this?

hfinkel added inline comments.Feb 9 2017, 4:24 PM
lib/Transforms/Utils/PromoteMemoryToRegister.cpp
408

Even if we're going to do this in general, we shouldn't do this when we can otherwise prove that the address was nonnull. Can you call isKnownNonNullAt and only add the intrinsic when we need it?

The same applies to other places where you call addAssumeNonNull.

luqmana updated this revision to Diff 87957.Feb 9 2017, 8:02 PM

Don't add assume if the value is already known to be nonnull.

I'm not familiar with this code, but I'd like to see this functionality, so any assistance in reviewing is appreciated.
Other than the inline comment, this is good now?

lib/Transforms/Scalar/SROA.cpp
2391–2404

Is the intent to extend this for some subset of metadata types beyond just MD_nonnull?

Either way, just make this one line for now?
NewLI->copyMetadata(LI, LLVMContext::MD_nonnull);

luqmana updated this revision to Diff 90035.Feb 28 2017, 7:57 AM

Use copyMetadata method instead of manually copying metadata.

Hal's earlier comment said we don't need to hold this up for clang, but for reference that fix is proposed now:
D30806

@chandlerc / @davide / @efriedma / @hfinkel - can you continue your review? I don't know enough to approve.

I was hoping Chandler would continue to review, but I guess I can do it.

The test here is really bare-bones; it doesn't really cover all the relevant codepaths. It doesn't check the effect of the isKnownNonNullAt, and it doesn't cover one of the codepaths in PromoteMemoryToRegister.cpp at all.

Otherwise looks fine per earlier discussion.

I was hoping Chandler would continue to review, but I guess I can do it.

FWIW, I'm happy to continue it or for you to continue it. I was planning to GC these threads when the clang patch actually landed, but if you can get to them sooner, so much the better.

The test here is really bare-bones; it doesn't really cover all the relevant codepaths. It doesn't check the effect of the isKnownNonNullAt, and it doesn't cover one of the codepaths in PromoteMemoryToRegister.cpp at all.

FWIW, I think this really needs to be addressed before it lands.

luqmana updated this revision to Diff 92597.Mar 21 2017, 9:57 PM

I added another test for all the relevant codepaths in PromoteMemoryToRegister.
I also addressed one case I missed earlier: when the alloca in question has
multiple stores but all within the same basic block.

efriedma accepted this revision.Mar 22 2017, 10:30 AM

LGTM.

Luqman, thanks for putting up with the long delay in reviewing this.

Do you have commit access, or should I commit this for you?

This revision is now accepted and ready to land.Mar 22 2017, 10:30 AM

LGTM.

Luqman, thanks for putting up with the long delay in reviewing this.

Do you have commit access, or should I commit this for you?

Thanks! No worries and I can commit it.

This revision was automatically updated to reflect the committed changes.