Page MenuHomePhabricator

Add support for !noundef metatdata on loads
ClosedPublic

Authored by aqjune on Thu, Oct 8, 8:40 AM.

Details

Summary

This patch adds metadata !noundef and makes load instructions can optionally have it.
A load with !noundef always return a well-defined value (has no undef bit or isn't poison).
If the loaded value isn't well defined, the behavior is undefined.

This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise.

The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well.

Diff Detail

Event Timeline

aqjune created this revision.Thu, Oct 8, 8:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptThu, Oct 8, 8:40 AM
aqjune requested review of this revision.Thu, Oct 8, 8:40 AM

While this is certainly in line with the prior art, I would like to revisit the trade-offs of metadata on loads vs assume. While we need to make assume side-effect free (but potentially UB so we keep control dependences), it would allow us to reduce the number of encodings. Let's say you promote a load with noundef, you might introduce an "assume nodundef" for the promoted value anyway. That begs the question why not use assume anyway. (FWIW, when metadata was introduced for these things assume did not allow non-boolean properties.)

I'd be interested what others think :)

aqjune added a comment.Thu, Oct 8, 9:16 AM

Oops, just saw the reply after sending the mail to llvm-dev.
Yes, I agree that llvm.assume can encode the noundef information more precisely and it's a benefit. I'm also happy to hear other's opinions.

I created D89054 for discussion purposes.

nikic added a comment.Thu, Oct 8, 10:01 AM

While this is certainly in line with the prior art, I would like to revisit the trade-offs of metadata on loads vs assume. While we need to make assume side-effect free (but potentially UB so we keep control dependences), it would allow us to reduce the number of encodings. Let's say you promote a load with noundef, you might introduce an "assume nodundef" for the promoted value anyway. That begs the question why not use assume anyway. (FWIW, when metadata was introduced for these things assume did not allow non-boolean properties.)

For some frontends (that model safe languages) it might be that *all* loads are noundef. While the work on assume bundles has improved the interaction of assumes with other optimizations, I think there are sufficient remaining issues that this kind of mass usage would probably result in regressions, especially as the optimization benefit of this metadata is currently rather minor. Additionally the sheer quantity of necessary assumes would likely have a non-trivial adverse effect on compile-time. If your original IR had 15% loads, and now you add one extra assume for every load, that will be a significant increase in instructions (not that metadata is completely free either).

15% loads, and now you add one extra assume for every load, that will be a significant increase in instructions (not that metadata is completely free either).

llvm.assume(i1 true) ["noundef"(%a), "noundef"(%b)] works right now just fine, we even merge assumes already with a pass in-tree (IIRC, @Tyker). The operand bundles could/should also take more operands to make it even simpler.

That said, I think @nikic is right that there is a trade off we have to consider and actually measure. I don't know what is best. What I know is that I would like us to employ assume aggressively to "retain knowledge". That will mean we have to deal with many llvm.assumes either way and I'd much rather have one encoding and smarts to deal with it than multiple ones.

For some frontends (that model safe languages) it might be that *all* loads are noundef.

I would consider a function/module attribute for this.

nikic added a comment.Thu, Oct 8, 1:06 PM

For some frontends (that model safe languages) it might be that *all* loads are noundef.

I would consider a function/module attribute for this.

Only loads emitted by the frontend would be noundef, optimization may introduce or move loads in a way that they are no longer noundef. I think such an attribute would violate LLVMs usual attribute design, as optimizations would now have to worry about dropping function attributes when performing speculation etc (is there any precedent for something like this?) We'd also lose the noundef property for all loads in a function at once, if a single one of them is speculated.

For some frontends (that model safe languages) it might be that *all* loads are noundef.

I would consider a function/module attribute for this.

Only loads emitted by the frontend would be noundef, optimization may introduce or move loads in a way that they are no longer noundef. I think such an attribute would violate LLVMs usual attribute design, as optimizations would now have to worry about dropping function attributes when performing speculation etc (is there any precedent for something like this?) We'd also lose the noundef property for all loads in a function at once, if a single one of them is speculated.

Yeah, good point ;). Bundling noundef in a single assume still works though ;)

I made D89219 that updates relevant functions to exploit llvm.assume's noundef operand bundle.

BTW, given v = op; call llvm.assume()["noundef"(v)], if we want to move v across the assume call, the call should be appropriately moved with v. I wonder whether there is a handy way to do this, since things get complicated when the assume has several operand bundles that are not using v.

What about leaving call llvm.assume whenever the load !noundef is moved to somewhere else? I guess this is what LLVM is doing as well.

!noundef is a syntactic sugar of llvm.assume()["noundef"(..)], but seems helpful for the brevity and size of the bitcode.

uenoku added a subscriber: uenoku.Mon, Oct 12, 7:25 AM

I made D89219 that updates relevant functions to exploit llvm.assume's noundef operand bundle.

Thx!

BTW, given v = op; call llvm.assume()["noundef"(v)], if we want to move v across the assume call, the call should be appropriately moved with v. I wonder whether there is a handy way to do this, since things get complicated when the assume has several operand bundles that are not using v.

Operand bundles are unknown uses. You should not be able to break things. Hoisting v does and should not impact the operand bundles. If you want to sink v you need to know about "droppable uses" and "drop them", SROA knows about them nowadays for example.

What about leaving call llvm.assume whenever the load !noundef is moved to somewhere else? I guess this is what LLVM is doing as well.

If --enable-knowledge-retention is set, we should certainly do that if we would otherwise strip the metadata (not only for noundef).

!noundef is a syntactic sugar of llvm.assume()["noundef"(..)], but seems helpful for the brevity and size of the bitcode.

I agree, especially if there is no llvm.assume available in the context to attach the information. If there is, I doubt there is overhead. My idea for the future is that we will have enough llvm.assumes such that this would not be a problem, though I won't block the metadata because of this. (Maybe it was wrong to start the discussion in this review then)

Tyker added a comment.Mon, Oct 12, 1:41 PM

What about leaving call llvm.assume whenever the load !noundef is moved to somewhere else? I guess this is what LLVM is doing as well.

If --enable-knowledge-retention is set, we should certainly do that if we would otherwise strip the metadata (not only for noundef).

this is not implemented the assume building currently don't look at any metadata.

Sorry for my change in stance.
May I proceed with this patch?

If --enable-knowledge-retention is set, we should certainly do that if we would otherwise strip the metadata (not only for noundef).

Yes, I think so because metadata is still common.

OK, let's go with this for now. All looks good but I don't get one thing, see below.

llvm/docs/LangRef.rst
9279

Why do we need index? and what is index?

aqjune added inline comments.Thu, Oct 15, 4:10 PM
llvm/docs/LangRef.rst
9279

An index is the name of the metadata node, and it is used as a placeholder that represents the existence of !noundef key. This is analogous to what !nonnull does.

define void @f() {
  %v = load i32, i32* [[P:%.*]], align 4, !noundef !0
}
!0 = !{}

To be precise, the first sentence should end with '... {} or a single metadata name <index> ...'. I'll update this sentence as well as !nonnull's sentence.


About the verbosity of this empty metadata node:
The empty node is necessary because metadata is stored as a key-value store (MDAttachmentMap).
This makes textual representation slightly more verbose compared to attaching !noundef without value. To remove these, two workarounds are here:

  1. When parsing metadata attached to instructions, if there is no value, automatically add empty node
  2. Add a special state 'has a key but not value' to MDAttachmentMap's each slot

The first approach causes a discrepancy between IR and bitcode. The second one might be a more viable option, but it breaks invariants that currently holds (e.g., MDAttachmentMap::getAll's returned values are all valid metadata)

aqjune updated this revision to Diff 298513.Thu, Oct 15, 4:29 PM

Update wording of LangRef

aqjune updated this revision to Diff 298518.Thu, Oct 15, 4:43 PM

Undo unnecessary changes

Use 'metadata node' instead of 'metadata name', because metadata node's contents can be written in place
e.g., load %ptr, !range !{i32 0, i32 3}

Use <empty_node> instead of <index> for nonnull/invariant.start/etc

Seems like the word index isn't conveying helpful information, so I renamed it into empty_node

jdoerfert accepted this revision.Fri, Oct 16, 7:27 AM

LGTM. Would be best to split it now. Also one nit below.

llvm/docs/LangRef.rst
9184

The rewrite could be commited as NFC cleanup first. That part really makes things better :)

9278

reference to well defined (IIRC we have a section, right?)

This revision is now accepted and ready to land.Fri, Oct 16, 7:27 AM
aqjune updated this revision to Diff 298802.Fri, Oct 16, 9:45 PM

Remove .rej

This revision was landed with ongoing or failed builds.Fri, Oct 16, 9:50 PM
This revision was automatically updated to reflect the committed changes.
aqjune marked an inline comment as done.Fri, Oct 16, 9:57 PM

Changes in LangRef other than the name rewrite didn't seem very critical, so undid them.

Thanks!

llvm/docs/LangRef.rst
9179

The last !noundef !<empty_node> was accidentally added by 701cf4b5a59c .