This is an archive of the discontinued LLVM Phabricator instance.

[ptr_provenance] Introduce optional ptr_provenance operand to load/store
AcceptedPublic

Authored by jeroen.dobbelaere on Jun 14 2021, 2:53 PM.

Details

Summary

This patch introduces an optional ptr_provenance operand to LoadInst and the StoreInst.

This allows to separate the provenance of the pointer operand from the value of the pointer operand.

This is needed by the full restrict patched (D68484), but will be also useful to resolve problems with
optimizations that, by replacing the pointer operand with an equivalent computation, also happen to
replace its provenance. At the same time, it can speed up some alias analysis phases, as, for the
ptr_provenance path, we can skip most of the computations.

Notes:

  • This patch corresponds to D68488 from the original Full Restrict series.
  • For the specification, a UnknownProvenance provenance can be treated as indicating that it can correspond to any object.
  • For the Load and Store instructions, the absence of the ptr_provenance operand is an indication the we should look at the pointer operand to track the provenance.
  • When cloning a load/store instruction that has a separate ptr_provenance, the ptr_provenance of the clone is set to 'UnknownProvenance'. When that is not wanted, it must be modified explicitely.
    • Note: this results in a different behavior when the original instruction has a ptr_provenance that is identical to the ptr, vs an instruction without the ptr_provenance.

Diff Detail

Event Timeline

jeroen.dobbelaere requested review of this revision.Jun 14 2021, 2:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2021, 2:53 PM

I also plan to add patches for:

  • fixing the handling of load/store instructions in various places where a single (for load) or double operands (for store) are assumed.
  • introduction of an intrinsic that combines the pointer provenance with the pointer value (Corresponding to the llvm.noalias.arg.guard in D68487)
  • Rebased
  • Use a ConstantPointerNull when cloning a load/store instruction with a ptr_provenance.
jeroen.dobbelaere edited the summary of this revision. (Show Details)
  • rebase
  • make use of UnknownProvenance
jeroen.dobbelaere edited reviewers, added: asbirlea; removed: nikic.

LangRef update. Refer to `unknown_provenance` now.

Adding a few items/questions discussed in the open meeting:

  • Clarified item: If a load/store has 2/3 operands out of which only 1/2 are used, these operands are stored in the last position(s). So when adding provenance, the current operands need to be shifted to the first position(s) and the provenance added as the last operand.
  • Discussion on why is cloning a load/store setting the ptr_provenance to unknown_provenance? When cloning a load/store instruction, the cloned instruction needs to have the same number of operands. Otherwise crashes were observed in some passes (example given: LoopVectorization). Why can't the number of operands remain the same, but set the ptr_provenance to the (otherwise default) ptr_operand?

@jeroen.dobbelaere: Could you describe using unknown for the cloning question for future reference?

  • Clarified item: If a load/store has 2/3 operands out of which only 1/2 are used, these operands are stored in the last position(s). So when adding provenance, the current operands need to be shifted to the first position(s) and the provenance added as the last operand.

There are a number of implementations possible for adding an optional argument. They all share that at some moment, they need to know the actual
number of arguments in use. Depending on the implementation, those moments can differ.

It is also important to know that the operands are stored _before_ the this pointer. (See implementation of getIntrusiveOperands())
In order to get to the location of the first operand, we subtract NumUserOperands from this.

The original (obsolete) implementation (variant 0) always reserves the optional argument. The reserved amount of operands is stored in NumUserOperands,
and an extra bit 'NumUserOperandsDelta' was used when we need to know the actual number of operands.

The main issue with variant 0, is that it introduces and extra bit extract + addition in a number of common operations. Not just for load/store instructions, but for all instructions.
This has a measurable compile time impact :(

The split-up compile time impact (and patches) of variant0 can be found here:
https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&branch=dobbelaj-snps/perf/test_instructions_20200907-01

So, I came up with 2 other variants that can be seen here:
https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&branch=dobbelaj-snps/perf/test_instructions_20200907-02

variant 1: This variant also always reserves the optional argument. But now, NumUserOperands tracks the actual number of operands in use.
As getIntrusiveOperands() uses NumUserOperands() to find back the location of the first operand, we now have a problem when introducing (or removing)
the ptr_provenance operand: when changing NumUserOperands, suddenly all operands are shifted. Because of this behavior, the code for
adding (removing) the operand is slightly more complex, as it is also moving around the operands to compensate for the shifting behavior.

The big benefit is that we now only pay this penalty when using the new operand. So, variant 1 only has a minor compile time impact.

Then I also tried out variant 2. This is similar to variant 1, but they introduce a separate version of Load (and Store) with or without memory
space for the optional argument. The main benefit is that, if the ptr_provenance is never used, no space is reserved for it. But, the initial switching from
a 'standard' Load/Store to a 'Load/Store with optional ptr_provance support' is a lot higher and maintaining the correct usage of the instructions is also
more complex, which tend to result in more subtle errors.

Looking at the compile time and heap impact, and the potential maintenance issues, I settled on using variant 1, which is also the version implemented in this review.

  • Discussion on why is cloning a load/store setting the ptr_provenance to unknown_provenance? When cloning a load/store instruction, the cloned instruction needs to have the same number of operands. Otherwise crashes were observed in some passes (example given: LoopVectorization). Why can't the number of operands remain the same, but set the ptr_provenance to the (otherwise default) ptr_operand?

@jeroen.dobbelaere: Could you describe using unknown for the cloning question for future reference?

One of the goals when introducing the optional parameter, was to reduce the impact on existing code/optimizations as much as possible. When cloning the instruction there are 5 possible cases to be considered:

  1. omit (=do not clone) the ptr_provenance operand. This resulted in problems, as now, the number of operands between the original and the cloned instruction can differ.
  2. clone the ptr_provenance operand when available. This resulted in dominance violations for some optimization passes that are not aware of the provenance operand.
  3. When available, replace the ptr_provenance with the 'pointer' operand in the clone. This results in an extra use of the pointer operand, potentially resulting in worse code for optimizations depending on 'hasOneUse()'.
  4. When available, replace the ptr_provenance with undef in the clone. (Special value for the original Full Restrict)
  5. When available, replace the ptr_provenance with unknown_provenance in the clone.

In the original full restrict patches, (4) is done, as this (together with not copying !noalias metadata) resulted in the safest approach. A undef ptr_provenance is treated there as: look at the pointer operand for the provenance, aka similar to omitting the ptr_provenance, without actually doing it (1), and also similar to (3), but without introducing an extra 'use' on the pointer operand.

When preparing this part of the infrastructure, I wanted to take into account a more generic usage of the ptr_provenance infrastructure. For that, a special constant unknown_provenance is introduced. This indicates that, for the cloned access, we don't know the ptr_provenance (and associated noalias annotations). We can also not depend on the pointer operand for the provenance, as the pointer value can have a different provenance (!). Only when the optimization pass that is introducing the clone is aware of the ptr_provenance operand, the right decision can be taken, which can result in better code. So, (5) is the safe approach with minimal impact on existing code.

Thank you for all the clarifications. I'm convinced this is the right way to move forward at this point.

To add a note from offline discussions: the correct options when cloning are either unknown provenance, like in this patch, where it's possible for some optimizations to be missed, or cloning the ptr_provenance and resolving the dominance (and other?) issues in all passes (that move instructions, but probably not limited to these).
It is not correct to use the ptr_operand: with ptr_provenance, the issue described in https://bugs.llvm.org/show_bug.cgi?id=34548#c93 would be resolved, but not if a clone would go back to using the ptr_operand.

nikic added a subscriber: nikic.Jan 24 2022, 10:09 AM

It's been a while since I looked at this, so please excuse the stupid question...

Why do we model provenance as an additional value on the load/store? Naively, I would think that it would be preferable to have an intrinsic like p' = llvm.with.provenance(p, provenance), because that would work with any use of the pointer -- the ptr_provenance operand is limited to loads and stores, but not with memory-accessing calls or captured pointers, and I don't think it can really be extended in that direction.

It's been a while since I looked at this, so please excuse the stupid question...

Why do we model provenance as an additional value on the load/store? Naively, I would think that it would be preferable to have an intrinsic like p' = llvm.with.provenance(p, provenance), because that would work with any use of the pointer -- the ptr_provenance operand is limited to loads and stores, but not with memory-accessing calls or captured pointers, and I don't think it can really be extended in that direction.

Hi Nikita,

my talk on the 2021 LLVM Dev Conference explains this. See https://www.youtube.com/watch?v=08XwXB3GHck 2021 LLVM Dev Mtg “ptr_provenance and @llvm.noalias: The Tale of Full Restrict”

In short:

  • there is the llvm.with.provenance(p, provenance) equivalent intrinsic: it is called llvm.experimental.ptr.provenance (see D107355)
  • load/store instructions are handled differently, mostly to make the process more efficient and to make the new support easier to integrate with existing optimization passes. Those that just replace the pointer computation won't influence the provenance (once that is set).
nikic added a comment.Jan 25 2022, 1:29 AM

It's been a while since I looked at this, so please excuse the stupid question...

Why do we model provenance as an additional value on the load/store? Naively, I would think that it would be preferable to have an intrinsic like p' = llvm.with.provenance(p, provenance), because that would work with any use of the pointer -- the ptr_provenance operand is limited to loads and stores, but not with memory-accessing calls or captured pointers, and I don't think it can really be extended in that direction.

Hi Nikita,

my talk on the 2021 LLVM Dev Conference explains this. See https://www.youtube.com/watch?v=08XwXB3GHck 2021 LLVM Dev Mtg “ptr_provenance and @llvm.noalias: The Tale of Full Restrict”

In short:

  • there is the llvm.with.provenance(p, provenance) equivalent intrinsic: it is called llvm.experimental.ptr.provenance (see D107355)
  • load/store instructions are handled differently, mostly to make the process more efficient and to make the new support easier to integrate with existing optimization passes. Those that just replace the pointer computation won't influence the provenance (once that is set).

Thanks! So the ptr_provenance is just an optimized representation of the intrinsic, that makes sense.

Have you considered inverting this patch stack and landing the intrinsic first, as that part seems more straightforward than the IR change? I have a suspicion that actually supporting independent address + provenance will require some significant work for various users of getUnderlyingObject(), which may not account for the possibility of address and provenance being different. Updating all users seems like something we can do using just the intrinsic.

Have you considered inverting this patch stack and landing the intrinsic first, as that part seems more straightforward than the IR change? I have a suspicion that actually supporting independent address + provenance will require some significant work for various users of getUnderlyingObject(), which may not account for the possibility of address and provenance being different. Updating all users seems like something we can do using just the intrinsic.

It should be possible, but I am not sure it makes a lot of sense: Looking at the big picture, this proposed set of patches for providing ptr_provenance is a logical next step. My intention is that these patches only go in once all of them have been reviewed and accepted. At that time the basic infrastructure is available for the full restrict patches, which means that the next steps will start making use of this basic infrastructure.
The current users of the 'GetUnderlyingObject' can keep their behavior (for now), being: look for the underlying object following the computational path of the pointer. With more actual usage of the ptr_provenance, some of those GetUnderlyingObject users can switch to the provenance path, which should result in a speedup.
The more tricky changes are related to some of the bugs where the provenance of pointers gets switched and to N2676. Once the base infrastructure is there, we can also fix those.

Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2022, 7:00 AM

Ping - Anyone who can help reviewing this ?

jdoerfert accepted this revision.Dec 20 2022, 12:00 PM

LG overall. Comments seem to be fine with this as well. Some nits on the style and order, can be addressed before the merge.

llvm/docs/LangRef.rst
10308

ptr_provenance is after syncscope in the ASM printer, also for loads. I would put ptr_provenance in the end honestly.

llvm/include/llvm/IR/Instructions.h
305

No else after return

465–466

Nit: No else after return.

llvm/lib/IR/Instructions.cpp
1611
1705
This revision is now accepted and ready to land.Dec 20 2022, 12:00 PM
RalfJung added inline comments.
llvm/docs/LangRef.rst
10196–10199

Rust and C also have a notion of "pointer provenance", but it is subtly difference: provenance in those languages is something that flows with pointer values. IOW, a value of pointer type consists of some address in memory, plus some "provenance" metadata. LLVM also has that kind of provenance, it is needed e.g. to explain some of the behavior of getelementptr. (A pointer returned by getelementptr without inbounds can go out-of-bounds but must not be used to access memory outside the bounds of the allocation it started with. In other words, the pointer "remembers" the allocation it is associated with in some way that is separate from the integer address it points to.)

Is it a good idea to use the same term for this slightly different concept? A load operation already receives provenance from its pointer argument, and now with this patchset it *also* receives something else, but also called provenance, via a separate argument.

RalfJung added inline comments.Dec 21 2022, 1:41 PM
llvm/docs/LangRef.rst
646–648

This is confusing, what does it mean that "computations can be omitted"? Computations *that do not alter provenance* can be omitted, sure -- for example getelementptr. But other operations could affect provenance and those are still relevant. (E.g., passing a pointer to a function as a noalias argument gives it a fresh distinct provenance, so such a computation/operation cannot be omitted even on the provenance path.)

Provenance is an inherent part of a value of pointer type, as far as correctness arguments for LLVM analyses and transformations go. Trying to treat it as any less real than the directly observable bits and bytes will only lead to trouble such as https://github.com/llvm/llvm-project/issues/34577.

652–653

In particular this sounds just wrong, if ptr_provenance is merely sugar for a "combine address with provenance" intrinsic: if ptr_provenance is present, it would be a bug for alias analysis to make any conclusions based on the provenance of the pointer argument.

10196–10199

(replying to my own comment after realizing there was prior discussion on that topic here -- sorry, I am completely lost in all this "patch stack" business)

So the ptr_provenance is just an optimized representation of the intrinsic, that makes sense.

Okay so sounds like this is supposed to completely override the provenance that comes with the pointer itself? Does that mean that it is now legal to do something like

  • use getelementptr %ptr1 to compute some %ptr1a that is address-equal to some other %ptr2
  • load %ptr1a ptr_provenance %ptr2_provenance

This would usually be UB if ptr2 points to a different allocation, since ptr1a is "based on" the wrong allocation.

In other words, when ptr_provenance is present, only the address of the pointer argument matters, and for everything else (including things like which allocation this pointer was "derived from"), only ptr_provenance matters?

Hi Ralph,

thanks for the feedback !

Greetings,

Jeroen

llvm/docs/LangRef.rst
646–648

This is confusing, what does it mean that "computations can be omitted"?

This should indeed be: Computations that **do not change provenance** can be omitted.

652–653

Hmm. The wording seems to allow a misinterpretation. You cannot choose if the 'provenance' is that of the pointer computation or if it is the decoupled one. If the provenance is decoupled, that is what you need to use.

10196–10199

In other words, when ptr_provenance is present, only the address of the pointer argument matters, and for everything else (including things like which allocation this pointer was "derived from"), only ptr_provenance matters?

Yes. That is indeed the case.

Note that this is an extension from the original goal of the ptr_provenance. The original goal was to provide a path where restrict annotations were added so that those would not clobber the pointer value. In that world, the pointer value and ptr_provenance would come together again at some point. The extension allows to completely separate the pointer value and the ptr_provenance. This provides a solution to handle this kind of optimizations in a correct way.

RalfJung added inline comments.Dec 28 2022, 8:14 AM
llvm/docs/LangRef.rst
652–653

Okay, that makes sense. So the documentation should then clearly state that in the presence of ptr_provenance, alias analysis *must not* make any conclusions by checking that the ptr argument is "based on" -- only the integer address of that pointer must be used, no other attached information (such as noalias/getelementptr-induced restrictions).

It is hard to even precisely discuss this without documenting in more detail what the provenance model of LLVM is: that every value of pointer type comes with some extra "ghost state" attached to it, which is how restrictions such as "getelementptr stays in the bounds of its allocations" and noalias flow through the program; that that "ghost state" is preserved when the pointer is itself stored to memory and loaded back out of memory; that alias analysis uses this "ghost state" to determine that a certain load/store cannot alias with another. I think documenting this better will also help a lot to avoid issues that the lack of precision around pointer provenance has caused in the past. (This immediately raises some new questions, such as how that provenance of a pointer stored in memory interacts with low-level byte-manipulating memory accesses that read or write some or all of the bytes of that pointer. Over time the LangRef should give answers to all these questions, but even acknowledging these issues would already be a great step forward. I'm happy to share the state of that discussion in Rust in case people are interested; we thought about this quite a bit.)

Rebased to ef545ef62a833152d8975ff16333b57cc41befcc (Jan 9, 2023)

NOTE: still need to update to recent comments.