This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] Make !range metadata imply noundef for load & call results
AbandonedPublic

Authored by mkazantsev on Oct 16 2022, 10:55 PM.

Details

Summary

According to LangRef:

If the loaded or returned value is not in the specified range, the behavior is undefined. 
...
The range should not represent the full or empty set. That is, a!=b.

These statements together mean that the result of load which contains !range metadata
is immediate of UB if undef is loaded, because undef may take the value missing in the
range. It automatically implies it's not poison either.

Diff Detail

Event Timeline

mkazantsev created this revision.Oct 16 2022, 10:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 16 2022, 10:55 PM
mkazantsev requested review of this revision.Oct 16 2022, 10:55 PM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript

This is correct as specified, but I think there were plans to change !range and !nonnull to return poison, and require an additional !noundef to make them immediate UB, which is the way it works for attributes. I'm not sure what the state of that is though.

This is correct as specified, but I think there were plans to change !range and !nonnull to return poison, and require an additional !noundef to make them immediate UB, which is the way it works for attributes. I'm not sure what the state of that is though.

Do you want me to follow-up with inference of !noundef for loads with such attributes in some pass, or?..

This is correct as specified, but I think there were plans to change !range and !nonnull to return poison, and require an additional !noundef to make them immediate UB, which is the way it works for attributes. I'm not sure what the state of that is though.

FWIW, Alive2 currently models !range as being UB iff !poison && range_is_violated.
This patch proposes to use poison || range_is_violated.
The Alive2 version is bogus in terms of composition of refinement (IPO) as it prevents changing poison to a value that violates the range. But in the past we realized that the proposed (strong UB) semantics didn't work (verification failed for a few optimizations; I don't remember the details).

There's an advantage of keeping poison disentangled, which is that it enables hoisting by dropping just !noundef. I think it's always best to yield poison whenever we can to make movement easier (of loads, calls is less important).

So I agree the best thing to do is to change the semantics, so that a range violation yields poison.

FWIW, there's already a patch for clang to add !noundef to loads by my GSoC student: https://reviews.llvm.org/D134410

This is correct as specified, but I think there were plans to change !range and !nonnull to return poison, and require an additional !noundef to make them immediate UB, which is the way it works for attributes. I'm not sure what the state of that is though.

+1

In some sense, "range implies noundef" is the same thing as "dereferenceable implies noundef". There is no conceptual difference between "integer in specified range" and "pointer in a set of valid pointers". If we do it for dereferenceable pointers, why impede this for ranges?

nikic added a comment.Oct 18 2022, 8:55 AM

In some sense, "range implies noundef" is the same thing as "dereferenceable implies noundef". There is no conceptual difference between "integer in specified range" and "pointer in a set of valid pointers". If we do it for dereferenceable pointers, why impede this for ranges?

The relevant difference in how this metadata gets used: APIs using range metadata generally only prove facts modulo poison. If computeKnownBits() says that the top bits of the value are zero, what this really means is that either the top bits are zero or the value is poison. The same is not true for dereferenceable metadata: This metadata is used to speculate loads, and having a "dereferenceable or poison" pointer would be useless in that context.

In some sense, "range implies noundef" is the same thing as "dereferenceable implies noundef". There is no conceptual difference between "integer in specified range" and "pointer in a set of valid pointers". If we do it for dereferenceable pointers, why impede this for ranges?

The relevant difference in how this metadata gets used: APIs using range metadata generally only prove facts modulo poison. If computeKnownBits() says that the top bits of the value are zero, what this really means is that either the top bits are zero or the value is poison. The same is not true for dereferenceable metadata: This metadata is used to speculate loads, and having a "dereferenceable or poison" pointer would be useless in that context.

I'm still not getting this. The code I'm modifying says "dereferenceable pointers don't have to be marked as noundef, but we can still infer noundef from the fact they are dereferenceable". But at the same time we say "loads with range *have* to be marked as noundef explicitly, and we don't want to imply it". This approach is self-contradictory. If you insist that frontend must mark every load with range as !noundef, I'm fine with that. But in this case let's delete

  I->hasMetadata(LLVMContext::MD_dereferenceable) ||
  I->hasMetadata(LLVMContext::MD_dereferenceable_or_null))
return true;

and demand that they are also marked as !noundef explicitly. It should be same for them.

mkazantsev abandoned this revision.Dec 20 2022, 11:04 PM