This is an archive of the discontinued LLVM Phabricator instance.

[AA] Split up LocationSize::unknown()
ClosedPublic

Authored by nikic on Nov 17 2020, 11:20 AM.

Details

Summary

Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way.

Diff Detail

Event Timeline

nikic created this revision.Nov 17 2020, 11:20 AM
nikic requested review of this revision.Nov 17 2020, 11:20 AM
  • LocationSize::unknownMaybeNegative() seems somewhat ambiguous in isolation; unless I see the other name or have the context from this patch. I'm either keep LocationSize::unknown or make it even more verbose LocationSize::unknownPositiveOrNegative(). I think the most restrictive case ( LocationSize::unknownMaybeNegative() or renamed version) should be the default ~uint64_t(0), but don't feel strongly about it.
  • clang-format
llvm/include/llvm/Analysis/MemoryLocation.h
284

Default to the most generic? unknownMaybeNegative

llvm/lib/Analysis/BasicAliasAnalysis.cpp
1350

Could you clarify the TODO?

1601

This is not unknownMaybeNegative() because of the above check !cast<ConstantInt>(PVGEP->idx_begin())->isNegative() ?

I would set this to unknownMaybeNegative() and leave the TODO.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
521

For checks in loops using AST, I'd use LocationSize::unknownMaybeNegative().

The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way.

I haven't looked at the patch yet but I agree with this.

  • LocationSize::unknownMaybeNegative() seems somewhat ambiguous in isolation; unless I see the other name or have the context from this patch. I'm either keep LocationSize::unknown or make it even more verbose LocationSize::unknownPositiveOrNegative().

I'd like to not reuse the existing unknown() name, to force all users (including out of tree users) to make a decision one way or another. Not particularly attached to the names though. We could also drop the "unknown" prefix and make them LocationSize::nonNegative() and LocationSize::positiveOrNegative() for example.

Generally I'm not really fond of "negative" sizes as a concept, it just seemed like the most concise way to explain it. It would be more accurate to say that the size is always positive, but there may be a negative offset. So maybe LocationSize::nonNegative(), LocationSize::withNonNegativeOffset() and LocationSize::withAnyOffset(), where the first two are aliases but can be used according to desired semantics. Can't say I really like this either though.

Or maybe LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer().

  • LocationSize::unknownMaybeNegative() seems somewhat ambiguous in isolation; unless I see the other name or have the context from this patch. I'm either keep LocationSize::unknown or make it even more verbose LocationSize::unknownPositiveOrNegative().

I'd like to not reuse the existing unknown() name, to force all users (including out of tree users) to make a decision one way or another. Not particularly attached to the names though. We could also drop the "unknown" prefix and make them LocationSize::nonNegative() and LocationSize::positiveOrNegative() for example.

Generally I'm not really fond of "negative" sizes as a concept, it just seemed like the most concise way to explain it. It would be more accurate to say that the size is always positive, but there may be a negative offset. So maybe LocationSize::nonNegative(), LocationSize::withNonNegativeOffset() and LocationSize::withAnyOffset(), where the first two are aliases but can be used according to desired semantics. Can't say I really like this either though.

Or maybe LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer().

Either of the options: LocationSize::withNonNegativeOffset() and LocationSize::withAnyOffset(), or LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()` are better IMO.
I don't have strong opinions either way as long as they're reasonably self explanatory, and these meet that criteria.

nikic updated this revision to Diff 306526.Nov 19 2020, 1:48 PM

Rebase, address review.

Add MemoryLocation::getBasedOn() and MemoryLocation::getStartingFrom() helpers.

Still unsure about the naming for the LocationSize method.

nikic added inline comments.Nov 19 2020, 1:53 PM
llvm/include/llvm/Analysis/MemoryLocation.h
284

I have dropped this default entirely in https://github.com/llvm/llvm-project/commit/393b9e9db31a3f83bc8b813ee24b56bc8ed93a49. We should not assume one way or another.

llvm/lib/Analysis/BasicAliasAnalysis.cpp
1350

This TODO is for D91482. I've just dropped the TODO entirely as I committed the hasValue() usage separately.

1601

That's right, I've expanded the TODO a bit. I think we should leave this at unknownNonNegative() for now, as changing it to unknownMaybeNegative() might technically regress results, so it seems like something to evaluate separately.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
521

From cursory inspection, I didn't see any obvious requirement that the loads be increasing, so I agree that this should use unknownMaybeNegative().

nikic updated this revision to Diff 306534.Nov 19 2020, 2:13 PM

Rename to LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer().

nikic updated this revision to Diff 306544.Nov 19 2020, 2:58 PM
nikic edited the summary of this revision. (Show Details)

Rename the MemoryLocation helpers as well. With the new LocationSize naming, it makes sense to call these MemoryLocation::getAfter() and MemoryLocation::getBeforeOrAfter().

nikic updated this revision to Diff 306719.Nov 20 2020, 9:50 AM

Accept AATags in MemoryLocation::getAfter/getBeforeOrAfter helpers, which makes them usable in more cases.

One more comment, the others look good.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
290

getBeforeOrAfter?

nikic added inline comments.Nov 24 2020, 12:31 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
290

If I'm understanding the context here right, this function only deals with accesses for which hasAnalyzableMemoryWrite() returns true, i.e. strcpy, strncpy, strcat and strncat in this branch. These functions don't access memory before the passed pointer.

asbirlea accepted this revision.Nov 24 2020, 3:03 PM

lgtm.
Please wait in case fhahn@ has additional comments.

This revision is now accepted and ready to land.Nov 24 2020, 3:03 PM
fhahn accepted this revision.Nov 25 2020, 3:36 AM

This looks like a great improvement, thanks!

llvm/lib/Analysis/BasicAliasAnalysis.cpp
1742

nit: may be before instead of may be below, as the function is called mayBeBeforePoitner?

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
290

yes, only known TLI functions should reach here.

This revision was automatically updated to reflect the committed changes.