This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] Clarify the behavior of memory access instructions when pointers/sizes aren't well-defined
ClosedPublic

Authored by aqjune on Sep 20 2020, 3:27 PM.

Details

Summary

This is a patch to LangRef that clarifies the behavior of load/store/memset/memcpy/memmove when the pointers or sizes are not well-defined
as well.

MSan detects a case when e.g., only lower bits of address are garbage when -msan-check-access-address is enabled, and it does not directly conflict with this patch because a C program should not use a pointer with undef bits and reasonable optimizations do not convert a well-defined pointer into a pointer with undef bits.

This patch contains a definition of a well-defined value as well.

Diff Detail

Event Timeline

aqjune created this revision.Sep 20 2020, 3:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2020, 3:27 PM
aqjune requested review of this revision.Sep 20 2020, 3:27 PM
aqjune edited the summary of this revision. (Show Details)Sep 20 2020, 3:28 PM
aqjune updated this revision to Diff 293041.Sep 20 2020, 3:52 PM

Fix typo

aqjune edited the summary of this revision. (Show Details)Sep 20 2020, 3:52 PM
eugenis added inline comments.Sep 21 2020, 3:33 PM
llvm/docs/LangRef.rst
3732

This is a definition of "well-defined" - make it bold and perhaps make sure it appears in the ToC. Searching through the document, this terms appears with different meaning in several places.

Otherwise LGTM

jdoerfert requested changes to this revision.Sep 21 2020, 3:36 PM

I block this until the llvm-dev discussion concludes. From what I can tell, people there seam to suggest this is not the right approach.

This revision now requires changes to proceed.Sep 21 2020, 3:36 PM
aqjune updated this revision to Diff 293347.Sep 21 2020, 11:18 PM

Update the contents to follow the conclusion of the discussion at llvm-dev.

I think the behavior of the sanitizer is still valid.
The logic couldn't be applied to IR because existing dereferenceability checking analyses accept pointers with undef bits.

Made the definition of a well-defined value clearer.
A definition of 'frozen value' is omitted because contexts of their current uses say the meaning straightforwardly.

Memset/memcpy/memmove's len are allowed to have undef bits as well.

aqjune retitled this revision from [LangRef] State that pointers and/or sizes of memory access instructions are well-defined to [LangRef] Clarify the behavior of memory access instructions when pointers/sizes aren't well-defined.Sep 21 2020, 11:27 PM
aqjune edited the summary of this revision. (Show Details)
jdoerfert added inline comments.Sep 23 2020, 10:17 AM
llvm/docs/LangRef.rst
3747

I'm not too happy with the word "never". It implies there can be change and I much rather talk about values with the implicit assumption that it is the dynamic value.

aqjune updated this revision to Diff 293813.Sep 23 2020, 11:30 AM
aqjune marked an inline comment as done.
aqjune edited the summary of this revision. (Show Details)

Update the definition of a well-defined value to show that it is defined per a program execution

jdoerfert added inline comments.Sep 24 2020, 6:50 AM
llvm/docs/LangRef.rst
3747

A constant of a single value, non-vector type is well defined if it is a non-undef constant.

Do we need "single value" at all? When are aggregates different? If so, we need a different name.

The sentence about integer constants should then be removed from the first paragraph.

aqjune added inline comments.Sep 24 2020, 11:57 AM
llvm/docs/LangRef.rst
3747

It is a single value type that is not a vector type - I followed the categorization of a single value type described at LangRef. When it is an aggregate type, it follows the description in the first paragraph.

I think the sentence about integer constants in the first paragraph helps emphasize that a well-defined value is a dynamic concept, but not a strong opinion.

jdoerfert added inline comments.Sep 24 2020, 1:48 PM
llvm/docs/LangRef.rst
3747

Fair. can we link https://llvm.org/docs/LangRef.html#single-value-types ?

I don't see how separating integer constants makes it less confusing, or I don't understand the problem.
When I read it I asked myself why integer constants are different, or if they were not.

aqjune updated this revision to Diff 294208.Sep 24 2020, 5:19 PM

Add a link, remove a sentence

aqjune added inline comments.Sep 24 2020, 5:22 PM
llvm/docs/LangRef.rst
3747

Since it can be interpreted as that an integer constant is special, simply removed it

jdoerfert accepted this revision.Sep 24 2020, 5:37 PM

LGTM, I'm no MSAN expert and can't speak to the interaction, this makes sense to me and matches our llvm-dev discussion. Feel free to add the discussion link in the commit message too.

This revision is now accepted and ready to land.Sep 24 2020, 5:37 PM

MSan will complain about any undefined bits in a pointer.
We have not seen a false positive caused by this so far, but it sounds like we should relax the requirement. Does it make sense for MSan to allow undef in the lower bits up to the known dereferenceable range of the pointer?

MSan will complain about any undefined bits in a pointer.
We have not seen a false positive caused by this so far, but it sounds like we should relax the requirement. Does it make sense for MSan to allow undef in the lower bits up to the known dereferenceable range of the pointer?

I think it depends on whether the frontend languages support nondeterministic accesses.
Optimizations cannot introduce undef bits to the pointers in general, because it can change a program that deterministically accesses memory locations to a program that nondeterministically does it.
if it is illegal for the frontend languages (that MSan is targetting) to generate such pointers, I think the check is still valid.
Also, even if it is not UB, it might be still useful - D86000 detects behavior that isn't undefined but is practically problematic as well, so it might be okay.