Page MenuHomePhabricator

[ptr_provenance] Introduce optional ptr_provenance operand to load/store
Needs ReviewPublic

Authored by jeroen.dobbelaere on Jun 14 2021, 2:53 PM.

Details

Summary

This patch introduces an optional ptr_provenance operand to LoadInst and the StoreInst.

This allows to separate the provenance of the pointer operand from the value of the pointer operand.

This is needed by the full restrict patched (D68484), but will be also useful to resolve problems with
optimizations that, by replacing the pointer operand with an equivalent computation, also happen to
replace its provenance. At the same time, it can speed up some alias analysis phases, as, for the
ptr_provenance path, we can skip most of the computations.

Notes:

  • This patch corresponds to D68488 from the original Full Restrict series.
  • For the specification, a UnknownProvenance provenance can be treated as indicating that it can correspond to any object.
  • For the Load and Store instructions, the absence of the ptr_provenance operand is an indication the we should look at the pointer operand to track the provenance.
  • When cloning a load/store instruction that has a separate ptr_provenance, the ptr_provenance of the clone is set to 'UnknownProvenance'. When that is not wanted, it must be modified explicitely.
    • Note: this results in a different behavior when the original instruction has a ptr_provenance that is identical to the ptr, vs an instruction without the ptr_provenance.

Diff Detail

Event Timeline

jeroen.dobbelaere requested review of this revision.Jun 14 2021, 2:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2021, 2:53 PM

I also plan to add patches for:

  • fixing the handling of load/store instructions in various places where a single (for load) or double operands (for store) are assumed.
  • introduction of an intrinsic that combines the pointer provenance with the pointer value (Corresponding to the llvm.noalias.arg.guard in D68487)
  • Rebased
  • Use a ConstantPointerNull when cloning a load/store instruction with a ptr_provenance.
jeroen.dobbelaere edited the summary of this revision. (Show Details)
  • rebase
  • make use of UnknownProvenance
jeroen.dobbelaere edited reviewers, added: asbirlea; removed: nikic.

LangRef update. Refer to `unknown_provenance` now.

Adding a few items/questions discussed in the open meeting:

  • Clarified item: If a load/store has 2/3 operands out of which only 1/2 are used, these operands are stored in the last position(s). So when adding provenance, the current operands need to be shifted to the first position(s) and the provenance added as the last operand.
  • Discussion on why is cloning a load/store setting the ptr_provenance to unknown_provenance? When cloning a load/store instruction, the cloned instruction needs to have the same number of operands. Otherwise crashes were observed in some passes (example given: LoopVectorization). Why can't the number of operands remain the same, but set the ptr_provenance to the (otherwise default) ptr_operand?

@jeroen.dobbelaere: Could you describe using unknown for the cloning question for future reference?

  • Clarified item: If a load/store has 2/3 operands out of which only 1/2 are used, these operands are stored in the last position(s). So when adding provenance, the current operands need to be shifted to the first position(s) and the provenance added as the last operand.

There are a number of implementations possible for adding an optional argument. They all share that at some moment, they need to know the actual
number of arguments in use. Depending on the implementation, those moments can differ.

It is also important to know that the operands are stored _before_ the this pointer. (See implementation of getIntrusiveOperands())
In order to get to the location of the first operand, we subtract NumUserOperands from this.

The original (obsolete) implementation (variant 0) always reserves the optional argument. The reserved amount of operands is stored in NumUserOperands,
and an extra bit 'NumUserOperandsDelta' was used when we need to know the actual number of operands.

The main issue with variant 0, is that it introduces and extra bit extract + addition in a number of common operations. Not just for load/store instructions, but for all instructions.
This has a measurable compile time impact :(

The split-up compile time impact (and patches) of variant0 can be found here:
https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&branch=dobbelaj-snps/perf/test_instructions_20200907-01

So, I came up with 2 other variants that can be seen here:
https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&branch=dobbelaj-snps/perf/test_instructions_20200907-02

variant 1: This variant also always reserves the optional argument. But now, NumUserOperands tracks the actual number of operands in use.
As getIntrusiveOperands() uses NumUserOperands() to find back the location of the first operand, we now have a problem when introducing (or removing)
the ptr_provenance operand: when changing NumUserOperands, suddenly all operands are shifted. Because of this behavior, the code for
adding (removing) the operand is slightly more complex, as it is also moving around the operands to compensate for the shifting behavior.

The big benefit is that we now only pay this penalty when using the new operand. So, variant 1 only has a minor compile time impact.

Then I also tried out variant 2. This is similar to variant 1, but they introduce a separate version of Load (and Store) with or without memory
space for the optional argument. The main benefit is that, if the ptr_provenance is never used, no space is reserved for it. But, the initial switching from
a 'standard' Load/Store to a 'Load/Store with optional ptr_provance support' is a lot higher and maintaining the correct usage of the instructions is also
more complex, which tend to result in more subtle errors.

Looking at the compile time and heap impact, and the potential maintenance issues, I settled on using variant 1, which is also the version implemented in this review.

  • Discussion on why is cloning a load/store setting the ptr_provenance to unknown_provenance? When cloning a load/store instruction, the cloned instruction needs to have the same number of operands. Otherwise crashes were observed in some passes (example given: LoopVectorization). Why can't the number of operands remain the same, but set the ptr_provenance to the (otherwise default) ptr_operand?

@jeroen.dobbelaere: Could you describe using unknown for the cloning question for future reference?

One of the goals when introducing the optional parameter, was to reduce the impact on existing code/optimizations as much as possible. When cloning the instruction there are 5 possible cases to be considered:

  1. omit (=do not clone) the ptr_provenance operand. This resulted in problems, as now, the number of operands between the original and the cloned instruction can differ.
  2. clone the ptr_provenance operand when available. This resulted in dominance violations for some optimization passes that are not aware of the provenance operand.
  3. When available, replace the ptr_provenance with the 'pointer' operand in the clone. This results in an extra use of the pointer operand, potentially resulting in worse code for optimizations depending on 'hasOneUse()'.
  4. When available, replace the ptr_provenance with undef in the clone. (Special value for the original Full Restrict)
  5. When available, replace the ptr_provenance with unknown_provenance in the clone.

In the original full restrict patches, (4) is done, as this (together with not copying !noalias metadata) resulted in the safest approach. A undef ptr_provenance is treated there as: look at the pointer operand for the provenance, aka similar to omitting the ptr_provenance, without actually doing it (1), and also similar to (3), but without introducing an extra 'use' on the pointer operand.

When preparing this part of the infrastructure, I wanted to take into account a more generic usage of the ptr_provenance infrastructure. For that, a special constant unknown_provenance is introduced. This indicates that, for the cloned access, we don't know the ptr_provenance (and associated noalias annotations). We can also not depend on the pointer operand for the provenance, as the pointer value can have a different provenance (!). Only when the optimization pass that is introducing the clone is aware of the ptr_provenance operand, the right decision can be taken, which can result in better code. So, (5) is the safe approach with minimal impact on existing code.

Thank you for all the clarifications. I'm convinced this is the right way to move forward at this point.

To add a note from offline discussions: the correct options when cloning are either unknown provenance, like in this patch, where it's possible for some optimizations to be missed, or cloning the ptr_provenance and resolving the dominance (and other?) issues in all passes (that move instructions, but probably not limited to these).
It is not correct to use the ptr_operand: with ptr_provenance, the issue described in https://bugs.llvm.org/show_bug.cgi?id=34548#c93 would be resolved, but not if a clone would go back to using the ptr_operand.

nikic added a subscriber: nikic.Jan 24 2022, 10:09 AM

It's been a while since I looked at this, so please excuse the stupid question...

Why do we model provenance as an additional value on the load/store? Naively, I would think that it would be preferable to have an intrinsic like p' = llvm.with.provenance(p, provenance), because that would work with any use of the pointer -- the ptr_provenance operand is limited to loads and stores, but not with memory-accessing calls or captured pointers, and I don't think it can really be extended in that direction.

It's been a while since I looked at this, so please excuse the stupid question...

Why do we model provenance as an additional value on the load/store? Naively, I would think that it would be preferable to have an intrinsic like p' = llvm.with.provenance(p, provenance), because that would work with any use of the pointer -- the ptr_provenance operand is limited to loads and stores, but not with memory-accessing calls or captured pointers, and I don't think it can really be extended in that direction.

Hi Nikita,

my talk on the 2021 LLVM Dev Conference explains this. See https://www.youtube.com/watch?v=08XwXB3GHck 2021 LLVM Dev Mtg “ptr_provenance and @llvm.noalias: The Tale of Full Restrict”

In short:

  • there is the llvm.with.provenance(p, provenance) equivalent intrinsic: it is called llvm.experimental.ptr.provenance (see D107355)
  • load/store instructions are handled differently, mostly to make the process more efficient and to make the new support easier to integrate with existing optimization passes. Those that just replace the pointer computation won't influence the provenance (once that is set).
nikic added a comment.Jan 25 2022, 1:29 AM

It's been a while since I looked at this, so please excuse the stupid question...

Why do we model provenance as an additional value on the load/store? Naively, I would think that it would be preferable to have an intrinsic like p' = llvm.with.provenance(p, provenance), because that would work with any use of the pointer -- the ptr_provenance operand is limited to loads and stores, but not with memory-accessing calls or captured pointers, and I don't think it can really be extended in that direction.

Hi Nikita,

my talk on the 2021 LLVM Dev Conference explains this. See https://www.youtube.com/watch?v=08XwXB3GHck 2021 LLVM Dev Mtg “ptr_provenance and @llvm.noalias: The Tale of Full Restrict”

In short:

  • there is the llvm.with.provenance(p, provenance) equivalent intrinsic: it is called llvm.experimental.ptr.provenance (see D107355)
  • load/store instructions are handled differently, mostly to make the process more efficient and to make the new support easier to integrate with existing optimization passes. Those that just replace the pointer computation won't influence the provenance (once that is set).

Thanks! So the ptr_provenance is just an optimized representation of the intrinsic, that makes sense.

Have you considered inverting this patch stack and landing the intrinsic first, as that part seems more straightforward than the IR change? I have a suspicion that actually supporting independent address + provenance will require some significant work for various users of getUnderlyingObject(), which may not account for the possibility of address and provenance being different. Updating all users seems like something we can do using just the intrinsic.

Have you considered inverting this patch stack and landing the intrinsic first, as that part seems more straightforward than the IR change? I have a suspicion that actually supporting independent address + provenance will require some significant work for various users of getUnderlyingObject(), which may not account for the possibility of address and provenance being different. Updating all users seems like something we can do using just the intrinsic.

It should be possible, but I am not sure it makes a lot of sense: Looking at the big picture, this proposed set of patches for providing ptr_provenance is a logical next step. My intention is that these patches only go in once all of them have been reviewed and accepted. At that time the basic infrastructure is available for the full restrict patches, which means that the next steps will start making use of this basic infrastructure.
The current users of the 'GetUnderlyingObject' can keep their behavior (for now), being: look for the underlying object following the computational path of the pointer. With more actual usage of the ptr_provenance, some of those GetUnderlyingObject users can switch to the provenance path, which should result in a speedup.
The more tricky changes are related to some of the bugs where the provenance of pointers gets switched and to N2676. Once the base infrastructure is there, we can also fix those.

Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2022, 7:00 AM