This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] Add `noundef` attribute to documentation
ClosedPublic

Authored by guiand on Jun 22 2020, 10:03 AM.

Details

Summary

[LangRef] Introduce noundef attribute for fully-defined function params

LLVM currently does not require function parameters or return values
to be fully initialized, and does not care if they are poison. This can
be useful if the frontend ABI makes no such demands, but may prevent
helpful backend transformations in case they do. Specifically, the C
and C++ languages require all scalar function operands to be fully
determined.

Introducing this attribute is of particular use to MemorySanitizer
today, although other transformations may benefit from it as well.
We can modify MemorySanitizer instrumentation to provide modest (17%)
space savings where frozen is present.

This commit only adds the attribute to the Language Reference, and
the actual implementation of the attribute will follow in a separate
commit.

Diff Detail

Event Timeline

guiand created this revision.Jun 22 2020, 10:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2020, 10:03 AM
nikic added a subscriber: nikic.Jun 22 2020, 10:34 AM
nikic added inline comments.Jun 22 2020, 1:59 PM
llvm/docs/LangRef.rst
1255

Should be just frozen, without the quotes. Quotes indicate a string attribute.

1259

This description is a bit too frontend-centric. I'd suggest to at least add

If the parameter is poison or contains undefined bits, the behavior is undefined.

to describe the actual IR semantic this has.

Also, can this be applied to return values? If so, it should say so.

Thanks for splitting this off!

I'm still in favor of nopoison as a name but so far no one else supports that name it seems. At least I would prefer that name if this means "not poison or instant UB".
Now, the documentation talks about undefined bits, which is something different. Maybe frozen makes sense after all.


As mentioned by @nikic, we should not talk about ABI, frontends, backends, etc. in the lang ref if not strictly necessary. I think describing the attribute as the absence of undefined or poison bits in the value representation is what we want :)

llvm/docs/LangRef.rst
3666

Can we make this two bullet points.
Return values are not really operands so maybe: "The operand of a return instruction if the function or invoking call site has a `frozen` attribute in the return value position".
I would also not talk about Arguments here as we describe instructions. So maybe say "The operand of a :ref:call <i_call> or :ref:invoke <i_invoke> instruction for which the call site has a `frozen` attribute at the respective position."

How about "noundef" for a name? I'm not a fan of names that start with "no", but "frozen" is not so obvious at a glance.

aqjune added a comment.EditedJun 22 2020, 4:26 PM

noundef seems better to me, as it means the value can't be poison as well (because a poison value can change to undef).
I'm fine with the suggestion as far as it is added to LangRef that the value is frozen if noundef is attached and it isn't an aggregate type with padding bits (if definition is changed as I suggested).

llvm/docs/LangRef.rst
1259

For aggregate types, I think it is okay to relax this constraint to allowing aggregates with undef padding bits. It will be great if an aggregate value that is constructed with a chain of v = insertvalue (insertvalue undef, const1, idx), const2, idx2 can have this attribute attached regardless of whether the aggregate type has paddings or not. Inspecting padding bits which are not accessible without using memory operation may be overly restrictive.
For my previous concern (giving a guarantee that it has no undef bit in memory representation), this should be needed in certain future cases, but not now maybe.

eugenis added inline comments.Jun 22 2020, 5:01 PM
llvm/docs/LangRef.rst
1259

Yes, that's a good point. From MSan point of view, if {i8, i32} is passed as a call argument, we don't care about the padding in the middle - the type definition has all we need to understand which bits must be checked for definedness, and we'd like the attribute to reflect that - i.e. the property of the IR value itself and not its storage representation.

(if it's coerced to i64, then we care).

bbn added a subscriber: bbn.Jun 22 2020, 7:46 PM
uenoku added a subscriber: uenoku.Jun 22 2020, 11:18 PM
nlopes added a subscriber: nlopes.Jun 23 2020, 7:07 AM
nlopes added inline comments.
llvm/docs/LangRef.rst
1260

What happens when this condition is violated? e.g., when one passes a poison value as argument? Do we get UB? (I think so, but it's worth to make it clear)

jdoerfert added inline comments.Jun 23 2020, 7:57 AM
llvm/docs/LangRef.rst
1259

Does this mean we still need more than nopoison?

1260

Yes, this should be one of the only "value attribute" that will cause UB. The others should poison the value on violation and combined with this one you get UB.

eugenis added inline comments.Jun 23 2020, 12:31 PM
llvm/docs/LangRef.rst
1259

Sorry, I don't understand - are saying the attribute can be specified in terms of poison values and not undef? I don't think it would work, we need it to cover something that's loaded from an uninitialized memory location and passed to a function call as is, and those are undef, not poison.

aqjune added inline comments.Jun 24 2020, 12:18 AM
llvm/docs/LangRef.rst
1259

Yes, we need an attribute for undef bits; a constraint for absence of poison may not be enough.
In the long term, I'd prefer merging the notion of undef value poison value into one (by leaving poison only), then nopoison should be enough. But seems like it is a pretty tough way to go.

noundef seems better to me, as it means the value can't be poison as well (because a poison value can change to undef).

+1

Other comments in this patch still need to be addressed as well.

guiand updated this revision to Diff 273427.Jun 25 2020, 10:12 AM
guiand retitled this revision from [LangRef] Add `frozen` attribute to documentation to [LangRef] Add `noundef` attribute to documentation.
guiand edited the summary of this revision. (Show Details)

Changed attribute name from frozen to noundef, addressed comments

guiand marked 5 inline comments as done.Jun 25 2020, 10:13 AM
arsenm added a subscriber: arsenm.Jun 25 2020, 10:17 AM
arsenm added inline comments.
llvm/docs/BitCodeFormat.rst
1061

Can you commit this documentation change for the existing attributes separately

nlopes added inline comments.Jun 25 2020, 11:36 AM
llvm/docs/LangRef.rst
1257

can we write undef instead of uninitialized pls? LLVM doesn't really have uninitialized values.
Otherwise LGTM.

nikic added inline comments.Jun 25 2020, 12:54 PM
llvm/docs/LangRef.rst
1258

I don't understand what "inferable" is intended to mean in this context.

guiand marked an inline comment as done.Jun 25 2020, 1:15 PM
guiand added inline comments.
llvm/docs/LangRef.rst
1258

Quoting from other comments here:

For aggregate types, I think it is okay to relax this constraint to allowing aggregates with undef padding bits. It will be great if an aggregate value that is constructed with a chain of v = insertvalue (insertvalue undef, const1, idx), const2, idx2 can have this attribute attached regardless of whether the aggregate type has paddings or not. Inspecting padding bits which are not accessible without using memory operation may be overly restrictive.

From MSan point of view, if {i8, i32} is passed as a call argument, we don't care about the padding in the middle - the type definition has all we need to understand which bits must be checked for definedness, and we'd like the attribute to reflect that - i.e. the property of the IR value itself and not its storage representation.

This is what I tried to capture with this description. Is there some better way I might express this succinctly?

efriedma added inline comments.Jun 25 2020, 2:24 PM
llvm/docs/LangRef.rst
1258

Not sure we have any existing terminology to describe this concept.

C++ calls these bits "pading bits" (vs. the "value representation"). Maybe we need a section in LangRef to describe this. Probably doesn't belong specifically in the discussion of the meaning of noundef, though, since it applies more generally. Maybe add some description in the definition of struct/array types, and link to it from here.

nikic added inline comments.Jun 25 2020, 11:59 PM
llvm/docs/LangRef.rst
1258

I think if you replace "inferable" with "inaccessible" or "unobservable", that would be more obvious. At least, I don't think there is any way to access/observe these padding bits or even their existence from LLVM IR (without a pre-existing pointer indirection).

I think, we could even remove the padding sentence and say "undefined or poison" bits. The latter seems to match our existing language. The former should be OK because a value of type { i8, i32 } has 8 + 32 bits. The fact that the store and alloca size is different does not mean the value has undefined bits. If people agree with this interpretation we could also add a sentence explaining this difference.

I think, we could even remove the padding sentence and say "undefined or poison" bits. The latter seems to match our existing language. The former should be OK because a value of type { i8, i32 } has 8 + 32 bits. The fact that the store and alloca size is different does not mean the value has undefined bits. If people agree with this interpretation we could also add a sentence explaining this difference.

That matches my understanding as well.

We could also add a note that value bits here explicitly do NOT mean storage representation to avoid any misunderstanding.

guiand updated this revision to Diff 273869.Jun 26 2020, 5:30 PM

Addressed comments

guiand marked 6 inline comments as done.Jun 26 2020, 5:33 PM
nlopes accepted this revision.Jun 27 2020, 1:21 AM
This revision is now accepted and ready to land.Jun 27 2020, 1:21 AM
jdoerfert accepted this revision.Jun 28 2020, 10:46 AM

LGTM. Thanks for working on this. This was needed for a long while and I think the new wording is proper.

reverse ping. Let's land this :)

guiand added a comment.Jul 7 2020, 5:44 PM

Should I land this without waiting for the other two patches?

Should I land this without waiting for the other two patches?

Either way works. Don't want us to forget to land it ;)

nikic added a comment.Jul 8 2020, 9:59 AM

Should I land this without waiting for the other two patches?

I'd suggest to land this together with the LLVM part of D81678 (which LGTM) and land the clang part that actually starts using this and requires the massive test changes separately.

This revision was automatically updated to reflect the committed changes.