This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] Inbounds does not imply nsw
ClosedPublic

Authored by nikic on Nov 2 2020, 12:05 PM.

Details

Summary

The more precise known bits analysis for GEPs introduced in D86364 assumes that inbounds implies nsw for the additions. This is not the case, as the base pointer is an unsigned value.

I was not able to come up with a test case where this actually makes a difference, because KnownBits::computeMul() is too imprecise (doing something like adding a non-negative base and a non-negative offset fails because multiplication by 1 loses the non-negativity information.)

Diff Detail

Event Timeline

nikic created this revision.Nov 2 2020, 12:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 2 2020, 12:05 PM
nikic requested review of this revision.Nov 2 2020, 12:05 PM
nikic retitled this revision from [ValueTracking] Inbounds goes not imply nsw to [ValueTracking] Inbounds does not imply nsw.
aqjune added a comment.Nov 2 2020, 1:07 PM

The change makes sense to me; inbounds doesn't guarantee that the base address + offset is less than the maximum of a signed integer value.

lebedev.ri added a subscriber: lebedev.ri.

Please can you explain what problem is this solving?
Either this is missing words "however if we'd add all offsets together, and then add the final offset to the base pointer, that addition is NSW",
or this is inconsistent with langref/D68342.

nikic added a comment.Nov 2 2020, 1:38 PM

Please can you explain what problem is this solving?
Either this is missing words "however if we'd add all offsets together, and then add the final offset to the base pointer, that addition is NSW",
or this is inconsistent with langref/D68342.

Other way around. The offset multiplications/additions are all nsw, but the final addition to the base is not. If %base = 0x7fffffff and %offset = 1, then gep inbounds %base, %offset is perfectly legal (assuming an allocated object at that location), despite wrapping the signed space. In terms of wrapping behavior of the base addition, "inbounds" implies "nusw" which (in the general case) implies neither nsw nor nuw.

After actually re-reading D68342, i agree.
Does accumulating the offset and then applying the total offset lead to precision loss?

llvm/lib/Analysis/ValueTracking.cpp
1349

Presumably NSW should be here?

nikic added inline comments.Nov 2 2020, 2:09 PM
llvm/lib/Analysis/ValueTracking.cpp
1349

It would be legal to use here, yes, but this API does not accept nowrap flags.

nlopes added a comment.Nov 2 2020, 2:28 PM

FWIW, here's a related bug (fixed already): https://bugs.llvm.org/show_bug.cgi?id=42699

nikic added a comment.Nov 2 2020, 3:16 PM

Hm, I think we need to clarify this in LangRef. We definitely assume this interpretation (unsigned base and signed offset) in some places (e.g. https://github.com/llvm/llvm-project/blob/c938b4a1ed43f3075155e16a7c2792ca8c122258/llvm/lib/Analysis/ScalarEvolution.cpp#L5061-L5072 and I'm pretty sure I've seen it elsewhere as well), but LangRef is really not clear on this point. It's also not completely obvious where the assumption that the pointer address space is unsigned comes from. E.g. on x86-64 the canonical address space is signed (but I don't know about other architectures). We need to clarify whether having an allocated object at [0xffffffff, 0x00000001] is legal (signed address space), [0x7fffffff, 0x80000001] is legal (unsigned address space) or both.

nlopes added a comment.Nov 3 2020, 5:31 AM

Hm, I think we need to clarify this in LangRef. We definitely assume this interpretation (unsigned base and signed offset) in some places (e.g. https://github.com/llvm/llvm-project/blob/c938b4a1ed43f3075155e16a7c2792ca8c122258/llvm/lib/Analysis/ScalarEvolution.cpp#L5061-L5072 and I'm pretty sure I've seen it elsewhere as well), but LangRef is really not clear on this point. It's also not completely obvious where the assumption that the pointer address space is unsigned comes from. E.g. on x86-64 the canonical address space is signed (but I don't know about other architectures). We need to clarify whether having an allocated object at [0xffffffff, 0x00000001] is legal (signed address space), [0x7fffffff, 0x80000001] is legal (unsigned address space) or both.

Agreed. One thing that seems certain (though not mentioned in LangRef) is that a single allocation can only use up to half of the address space. Otherwise can't have positive and negative offsets; we need 1 bit for the sign.
For the issue you mention, we need a little survey of the hardware out there to understand if there's commonality or not (I've no clue).

qcolombet accepted this revision.Nov 4 2020, 9:47 AM
This revision is now accepted and ready to land.Nov 4 2020, 9:47 AM
This revision was landed with ongoing or failed builds.Nov 13 2020, 8:58 AM
This revision was automatically updated to reflect the committed changes.