This is an archive of the discontinued LLVM Phabricator instance.

[PATCH 03/27] [noalias] [IR] Introduce ptr_provenance for LoadInst/StoreInst
Needs ReviewPublic

Authored by jeroen.dobbelaere on Oct 4 2019, 2:17 PM.

Details

Reviewers
hfinkel
jdoerfert
Summary

This is part of the series started by D68484.

The ptr_provenance parameter tracks dependencies on noalias intrinsics without
blocking optimizations on the pointer value.

Note: the space for the optional ptr_provenance argument will always be reserved for each LoadInst/StoreInst.

Note: this is a stable point and tests should run fine with the patches applied up to this point.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2019, 2:17 PM
jeroen.dobbelaere edited the summary of this revision. (Show Details)Oct 4 2019, 2:19 PM

All that subtle magic about "we may have 2 operands but in some places we need to pretend that we have 3 operands" seems suboptimal to me.
Semantically, can we just always allocate space for 3 operands, but use either nullptr or undef as a magic to deduce whether we actually have the third operand?

llvm/include/llvm/IR/Instructions.h
313

assertion message missing

471

assertion message missing

All that subtle magic about "we may have 2 operands but in some places we need to pretend that we have 3 operands" seems suboptimal to me.
Semantically, can we just always allocate space for 3 operands, but use either nullptr or undef as a magic to deduce whether we actually have the third operand?

It is kind of magic and it took some time to get it right. The technique comes from GlobalVariable where there is either 0 or 1 operands.
There are 2 operands allocated for LoadInst, 3 for StoreInst. The challenge is that the operands are allocated before the instruction.
When the NumUserOperands changes, suddenly all operands are shifted. That is not convenient. That's why I choose to keep NumUserOperands constant, and to
add an extra 'delta'. Any better solution that works is welcome though.

Added assert message; clang-format

jeroen.dobbelaere marked 2 inline comments as done.Oct 7 2019, 7:47 AM
a.elovikov added inline comments.
llvm/lib/IR/Instructions.cpp
4150

Why is it safe? Consider we have aliasing load and store, we clone them so load.clone and load.store now have undef in the sidechannels, so optimizations are free to choose the values for them that would mean noalias.

jeroen.dobbelaere marked an inline comment as done.Oct 10 2019, 1:49 PM
jeroen.dobbelaere added inline comments.
llvm/lib/IR/Instructions.cpp
4150
  • Undef is maybe the wrong value and could indeed trigger problems in future.
  • null is not safe, as that is used to indicate a converted load/store instruction, when it does not depend on a restrict pointer (still in combination with the !noalias metadata, which must be present and indicate what scopes are visible)..
  • Copying the noalias_sidechannel from the original is also not safe. It might not be valid in the new context.
  • Removing the noalias_sidechannel (as was done in some earlier version) also does not work: some passes rightfully expect the clone to have the same number of arguments as the original.
jeroen.dobbelaere retitled this revision from [PATCH 05/38] [noalias] [IR] Introduce noalias_sidechannel for LoadInst/StoreInst to [PATCH 03/26] [noalias] [IR] Introduce ptr_provenance for LoadInst/StoreInst.
jeroen.dobbelaere edited the summary of this revision. (Show Details)
jeroen.dobbelaere retitled this revision from [PATCH 03/26] [noalias] [IR] Introduce ptr_provenance for LoadInst/StoreInst to [PATCH 03/27] [noalias] [IR] Introduce ptr_provenance for LoadInst/StoreInst.Sep 7 2020, 2:36 PM

The performance impact of different parts of this patch can be seen here: https://llvm-compile-time-tracker.com/index.php?branch=dobbelaj-snps/perf/test_instructions_20200907-01

  • Introducing the 'NumUserOperandsDelta' has a measurable impact on the number of instructions as it affects every instruction. Maybe that can be avoided ?
  • Introducing the optional extra argument for LoadInst/StoreInst has (as expected) some effect on the RSS.

I have 3 variants of this patch:

  • variant 0 : always reserve place for the ptr_provenance; Introduce NumOfOperandsDelta to get the correct number of operands
    • extra space overhead for load/store instructions;
    • extra instruction overhead for ALL instructions ('getNumOperand()');
    • low overhead when adding/removing ptr_provenance
    • NOTE: this is the current provided patch
  • variant 1 : always reserve place for the ptr_provenance; Move arguments around when adding/removing ptr_provenance
    • extra space overhead for load/store instructions;
    • NO impact on other instructions;
    • slightly more overhead when accessing operands of load/store instructions;
    • medium overhead when adding/removing ptr_provenance
  • variant 2 : Have a LoadInst, LoadWithProvInst and StoreInst, StoreWithProvInst. Only the 'WithProv' variant reserves place for optional argument. When the ptr_provenance is added, it can involve creating a Load/StoreWithProv based on the original Load/Store. Adding/removing the ptr_provenance argument involves moving arguments around.
    • extra space only needed when we know we will need it.
    • NO impact on other instructions;
    • slightly more overhead when accessing operands of load/store instructions;
    • medium to high overhead when adding ptr_provenance. medium when removing it.
    • higher risk in keeping things correct in optimization: adding the ptr_provenance can require recreating the Load/Store instruction. This might be difficult if the optimization pass is tracking pointers to the original Load/Store instruction.
    • main benefit: almost no impact from this change if the feature is not used (aka -fno-full-restrict)

Any preference in the way forward ? variant 1 is the easy one, as it is just a drop-in. Variant 2 is at the moment less-stable as it requires changes in all optimization passes that need to track the ptr_provenance.

I have 3 variants of this patch:

  • variant 0 : always reserve place for the ptr_provenance; Introduce NumOfOperandsDelta to get the correct number of operands
    • extra space overhead for load/store instructions;
    • extra instruction overhead for ALL instructions ('getNumOperand()');
    • low overhead when adding/removing ptr_provenance
    • NOTE: this is the current provided patch
  • variant 1 : always reserve place for the ptr_provenance; Move arguments around when adding/removing ptr_provenance
    • extra space overhead for load/store instructions;
    • NO impact on other instructions;
    • slightly more overhead when accessing operands of load/store instructions;
    • medium overhead when adding/removing ptr_provenance
  • variant 2 : Have a LoadInst, LoadWithProvInst and StoreInst, StoreWithProvInst. Only the 'WithProv' variant reserves place for optional argument. When the ptr_provenance is added, it can involve creating a Load/StoreWithProv based on the original Load/Store. Adding/removing the ptr_provenance argument involves moving arguments around.
    • extra space only needed when we know we will need it.
    • NO impact on other instructions;
    • slightly more overhead when accessing operands of load/store instructions;
    • medium to high overhead when adding ptr_provenance. medium when removing it.
    • higher risk in keeping things correct in optimization: adding the ptr_provenance can require recreating the Load/Store instruction. This might be difficult if the optimization pass is tracking pointers to the original Load/Store instruction.
    • main benefit: almost no impact from this change if the feature is not used (aka -fno-full-restrict)

Any preference in the way forward ? variant 1 is the easy one, as it is just a drop-in. Variant 2 is at the moment less-stable as it requires changes in all optimization passes that need to track the ptr_provenance.

I think variant 1 is the obvious winner.
Though while i'm not familiar with the code, i'm not sure why you'd need to move args around - can't ptr_provenance be added as a new last 'argument'?

I think variant 1 is the obvious winner.
Though while i'm not familiar with the code, i'm not sure why you'd need to move args around - can't ptr_provenance be added as a new last 'argument'?

This is what is done in the current patch (variant 0). I had to introduce a NumUserOperandsDelta to compensate, as now getNumUserOperands() == NumUserOperands - NumUserOperandsDelta.
This had impact on op_end() and getNumUserOperands():

  • op_begin(): static_cast<Use*>(this)-NumUserOperands
  • op_end(): op_begin() + NumUserOperands - NumUserOperandsDelta
  • getNumUserOperands(): NumUserOperands - NumUserOperandsDelta

The way how variant 1 works is the following:

  • Operands are allocated before the 'this' that points to the instruction
  • LoadInst/StoreInst become associated to the 'VariadicOperandTraits' (instead of the OptionalOperandTraits)
  • op_begin(): static_cast<Use*>(this)-NumUserOperands
  • op_end(): op_begin() + NumUserOperands
  • getNumUserOperands(): NumUserOperands
  • When we switch from 2 to 3 arguments, the operands suddenly switch place (op_begin() changes). The shuffling can be done like:
void setPtrProvenance(Value *PtrProvenance) {
  if (getNumOperands() == 2) {
    setNumOperands(3);
    setOperand(0, getOperand(1));
    setOperand(1, getOperand(2));
  }
  setOperand(2, PtrProvenance);
}

It is even likely that a more efficient mechanism for the shuffling is possible.

lebedev.ri added inline comments.Sep 17 2020, 8:48 AM
llvm/include/llvm/IR/Instructions.h
198

Are there Instruction/Value subclasses already, that define non-trivial destructor?
I think this may be a problem for bumpptr allocators.

jeroen.dobbelaere marked an inline comment as done.Sep 17 2020, 11:58 PM
jeroen.dobbelaere added inline comments.
llvm/include/llvm/IR/Instructions.h
198

Are there Instruction/Value subclasses already, that define non-trivial destructor?
I think this may be a problem for bumpptr allocators.

How would that interfere ?

The non-trivial destructor is is handled in Value::deleteValue() (https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/Value.cpp#L123).

I got the base idea by looking at GlobalVariable which uses the OptionalOperandTraits.
(See https://github.com/llvm/llvm-project/blob/3d65030c45971bf041b7cdeb4b3b1378cadc4abe/llvm/include/llvm/IR/GlobalVariable.h#L73 ; But it seems that in the mean time part of the implementation has moved to operator delete)

Of course GlobalVariable is a Constant, but in retrospect, it seems that an Instruction is handled in a similar way.

jeroen.dobbelaere marked an inline comment as done.
jeroen.dobbelaere edited the summary of this revision. (Show Details)

Switch to 'Variant 1': spend more time when a ptr_provenance argument is added or removed. This removes the impact on 'all instructions' in the previous version.

https://llvm-compile-time-tracker.com/index.php?branch=dobbelaj-snps/perf/test_instructions_20200907-02 shows the effect of 'variant 1' and 'variant 2', on a pristine version of llvm (= without other changes related to full restrict).