This is an archive of the discontinued LLVM Phabricator instance.

[InstSimplify] Treat invariant group insts as bitcasts for load operands
ClosedPublic

Authored by aeubanks on Apr 22 2021, 1:14 PM.

Details

Summary

We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.

Since intrinsic calls can't appear inside constant expressions, but we
also don't want to completely rewrite load constant folding, we convert
the load operand into an equivalent constant. GEPs and bitcasts map
directly onto their constant-expression counterparts; invariant group
intrinsics are treated as bitcasts, since they don't change the pointer
value.
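As a rough sketch of that mapping (illustrative only, not the committed diff; it assumes the IntrinsicInst::isLaunderOrStripInvariantGroup() helper mentioned later in this review):

static Constant *mapStepToConstant(Value *Step, Constant *PtrOpAsConstant) {
  // An invariant.group launder/strip returns the same pointer value, so it
  // can be modeled as a plain bitcast to the call's result type.
  if (auto *II = dyn_cast<IntrinsicInst>(Step))
    if (II->isLaunderOrStripInvariantGroup())
      return ConstantExpr::getBitCast(PtrOpAsConstant, Step->getType());
  // Bitcasts map directly onto their constant-expression counterpart.
  if (auto *BC = dyn_cast<BitCastOperator>(Step))
    return ConstantExpr::getBitCast(PtrOpAsConstant, BC->getType());
  // GEPs (with constant indices) are handled analogously via
  // ConstantExpr::getGetElementPtr in the actual patch.
  return nullptr;
}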

Diff Detail

Event Timeline

aeubanks created this revision.Apr 22 2021, 1:14 PM
aeubanks requested review of this revision.Apr 22 2021, 1:14 PM
Herald added a project: Restricted Project.Apr 22 2021, 1:14 PM
rnk added a comment.Apr 26 2021, 6:47 PM

I added @lebedev.ri as a reviewer, since Roman has done significant work on instcombine/instsimplify.

llvm/lib/Analysis/InstructionSimplify.cpp
5831

We already discarded constants above, so these don't have to be Operators; they could be plain Instructions (BitCastInst, etc.).

I think the recursion is unfortunate there. How about something like

static Constant *ConstructLoadOperandConstant(Value *Op) {
  SmallVector<Value *, 16> Worklist;

  Worklist.push_back(Op);
  while (true) {
    Value *CurOp = Worklist.back();

    if (isa<Constant>(CurOp))
      break; // YAY!

    if (auto *BC = dyn_cast<BitCastOperator>(CurOp)) {
      Worklist.push_back(BC->getOperand(0)); // Recurse.
      continue;
    }

    if (auto *GEP = dyn_cast<GEPOperator>(CurOp)) {
      if (all_of(drop_begin(GEP->operands()),
                 [](Value *Idx) { return isa<Constant>(Idx); })) {
        Worklist.push_back(GEP->getOperand(0)); // Recurse.
        continue;
      }
    }

    if (auto *II = dyn_cast<IntrinsicInst>(CurOp)) {
      if (II->getIntrinsicID() == Intrinsic::strip_invariant_group ||
          II->getIntrinsicID() == Intrinsic::launder_invariant_group) {
        Worklist.push_back(II->getOperand(0)); // Recurse.
        continue;
      }
    }

    return nullptr; // Can't look past this instruction. Give up.
  }

  // Alright! Reconstruct this chain as constant expressions.
  Constant *NewOp = cast<Constant>(Worklist.pop_back_val());
  while (!Worklist.empty()) {
    Value *CurOp = Worklist.pop_back_val();

    if (isa<BitCastOperator>(CurOp)) {
      NewOp = ConstantExpr::getBitCast(NewOp, CurOp->getType());
    } else if (auto *GEP = dyn_cast<GEPOperator>(CurOp)) {
      SmallVector<Constant *> Idxs;
      Idxs.reserve(GEP->getNumOperands() - 1);
      for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I)
        Idxs.push_back(cast<Constant>(GEP->getOperand(I)));
      NewOp = ConstantExpr::getGetElementPtr(GEP->getSourceElementType(),
                                             NewOp, Idxs, GEP->isInBounds(),
                                             GEP->getInRangeIndex());
    } else if (auto *II = dyn_cast<IntrinsicInst>(CurOp)) {
      assert((II->getIntrinsicID() == Intrinsic::strip_invariant_group ||
              II->getIntrinsicID() == Intrinsic::launder_invariant_group) &&
             "Instruction handled in first loop but not here?");
      NewOp = ConstantExpr::getBitCast(NewOp, CurOp->getType());
    } else {
      llvm_unreachable("Instruction handled in first loop but not here?");
    }
  }

  return NewOp;
}
llvm/lib/Analysis/InstructionSimplify.cpp
5827

Given a load from \p Op, see if we can peel off certain pointer adjustments
that are no-ops for the purpose of performing the load,
in the hope of eventually discovering that we are
actually loading from a constant.

5865–5866

As per D33619 and the default case of the switch in SimplifyInstruction(),
we'll no longer call ConstantFoldInstruction(), but just ConstantFoldLoadFromConstPtr().
Will we lose anything important because of that?

I would guess that if we actually want to change that, it should be done afterwards,
and this should just continue to use ConstantFoldInstruction()?
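For context, a sketch of how each entry point would be invoked from the load path (signatures as I recall them from this era; treat as approximate):

static Constant *foldLoad(LoadInst *LI, const DataLayout &DL) {
  // General path: folds any instruction whose operands are constants.
  if (Constant *C = ConstantFoldInstruction(LI, DL))
    return C;
  // Load-only path: needs the pointer operand as a Constant first.
  if (auto *PtrC = dyn_cast<Constant>(LI->getPointerOperand()))
    return ConstantFoldLoadFromConstPtr(PtrC, LI->getType(), DL);
  return nullptr;
}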

llvm/test/Transforms/InstSimplify/invariant.group-load.ll
2

I'd like to see a few more tests.
For example, this really needs negative tests.
What should happen if @A isn't a constant?
One test with a pattern we can't handle would also be good (e.g. non-constant GEP indices).


Also if you want to traverse bitcasts and invariant.group intrinsics, you could perhaps use `stripPointerCastsAndOffsets<PSK_ForAliasAnalysis>`
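Roughly (a sketch assuming the public wrapper Value::stripPointerCastsForAliasAnalysis(), which instantiates that template internally):

static bool loadsFromConstantGlobal(const LoadInst *LI) {
  // Strips bitcasts, zero-index GEPs, and invariant.group intrinsics.
  const Value *Underlying =
      LI->getPointerOperand()->stripPointerCastsForAliasAnalysis();
  if (auto *GV = dyn_cast<GlobalVariable>(Underlying))
    return GV->isConstant() && GV->hasDefinitiveInitializer();
  return false;
}

Note that this drops any GEP offsets instead of reconstructing them, so it only answers whether we are ultimately loading from a constant global; the reconstruction above is still needed to actually fold the load.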

llvm/test/Transforms/InstSimplify/invariant.group-load.ll
13

Could you add a similar test for launder.invariant.group as well?

Out of curiosity: have you seen this pattern happening in the wild with -fstrict-vtable-pointers?

Out of curiosity: have you seen this pattern happening in the wild with -fstrict-vtable-pointers?

Yes, it was causing a major size regression in one specific file (with rtti). I've managed to reduce it down to

struct A {
  int i;
  virtual void f();
  A(): i(4) {}
};

const A g;

int foo() {
  const A* a = &g;
  return a->i;
}

$ ./build/rel/bin/clang++ /tmp/a.cc -o - -S -emit-llvm -O2 -fstrict-vtable-pointers

%struct.A = type <{ i32 (...)**, i32, [4 x i8] }>

@_ZL1g = internal constant %struct.A <{ i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV1A, i32 0, inrange i32 0, i32 2) to i32 (...)**), i32 4, [4 x i8] zeroinitializer }>, align 8
@_ZTV1A = external dso_local unnamed_addr constant { [3 x i8*] }, align 8
@llvm.global_ctors = appending global [0 x { i32, void ()*, i8* }] zeroinitializer

; Function Attrs: nofree nosync nounwind readnone uwtable willreturn mustprogress
define dso_local i32 @_Z8hahahahav() local_unnamed_addr #0 {
entry:
  %0 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* bitcast (%struct.A* @_ZL1g to i8*))
  %i = getelementptr inbounds i8, i8* %0, i64 8
  %1 = bitcast i8* %i to i32*
  %2 = load i32, i32* %1, align 8, !tbaa !6
  ret i32 %2
}

$ ./build/rel/bin/clang++ /tmp/a.cc -o - -S -emit-llvm -O2

; Function Attrs: nofree norecurse nosync nounwind readnone uwtable willreturn mustprogress
define dso_local i32 @_Z8hahahahav() local_unnamed_addr #0 {
entry:
  ret i32 4
}

looks like the strip is coming from CodeGenFunction::EmitLValueForField():

if (auto *ClassDef = dyn_cast<CXXRecordDecl>(rec)) {
  if (CGM.getCodeGenOpts().StrictVTablePointers &&
      ClassDef->isDynamicClass()) {
    // Getting to any field of dynamic object requires stripping dynamic
    // information provided by invariant.group.  This is because accessing
    // fields may leak the real address of dynamic object, which could result
    // in miscompilation when leaked pointer would be compared.
    auto *stripped = Builder.CreateStripInvariantGroup(addr.getPointer());
    addr = Address(stripped, addr.getAlignment());
  }
}

is this actually true?

Prazek added a comment (edited).Apr 28 2021, 2:51 AM

In reply to aeubanks' question above about the strip in CodeGenFunction::EmitLValueForField() ("is this actually true?"):

It is essentially a more complex case of leaking the value of the pointer through a comparison (documented in the invariant.group docs):

#include <new>

struct A {
  virtual void foo();
  int field;
};

struct B : A {
  void foo() override;
};

A* clobber(A* a) {
  return new(a) B; // placement-new a B into A's storage
}

void bar(A *a) {
  int* addr = &a->field;
  A* new_a = clobber(a);
  if (addr == &new_a->field) {
    new_a->foo();
  }
}

Here, if you don't strip the "virtual information" from the pointer before taking the address of the field, the compiler would be able to figure out that the addresses
of a and new_a are the same inside the branch, which would result in RAUW of new_a with a, resulting in incorrect devirtualization.

In reply to aeubanks' reduced test case above:

Nice finding!

aeubanks updated this revision to Diff 345276.May 13 2021, 1:53 PM

rebase
add more tests
make operand constant construction not recursive
skip if load operand was initially constant

Prazek added inline comments.May 17 2021, 6:30 AM
llvm/lib/Analysis/InstructionSimplify.cpp
5843–5846

nit: personally I would reverse the logic:

if (II->getIntrinsicID() == Intrinsic::strip_invariant_group ||
    II->getIntrinsicID() == Intrinsic::launder_invariant_group)
  Worklist.push_back(II->getOperand(0));
else
  return nullptr;

I think it is more canonical and easier to read.
Then you could also simplify the assert you have later on and share some code (e.g. an isInvariantGroupIntrinsic() helper).
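A sketch of that helper (hypothetical; the updated diff ends up reusing IntrinsicInst::isLaunderOrStripInvariantGroup() instead, see below):

static bool isInvariantGroupIntrinsic(const Value *V) {
  // Shared between the worklist walk and the later assert.
  auto *II = dyn_cast<IntrinsicInst>(V);
  return II && (II->getIntrinsicID() == Intrinsic::strip_invariant_group ||
                II->getIntrinsicID() == Intrinsic::launder_invariant_group);
}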

aeubanks updated this revision to Diff 345923.May 17 2021, 10:16 AM

use isLaunderOrStripInvariantGroup(); I forgot I had already made that

lebedev.ri accepted this revision.Jun 1 2021, 9:16 AM

LGTM

llvm/test/Transforms/InstSimplify/invariant.group-load.ll
1

Please precommit these tests.

This revision is now accepted and ready to land.Jun 1 2021, 9:16 AM
aeubanks updated this revision to Diff 349130.Jun 1 2021, 4:33 PM

rebase after precommitted test

This revision was landed with ongoing or failed builds.Jun 1 2021, 4:33 PM
This revision was automatically updated to reflect the committed changes.
clin1 added a subscriber: clin1.Jun 9 2021, 11:42 AM

Hi @aeubanks, we turned up a corner case where jump threading is creating a self-referential GEP:

%tmp27 = getelementptr inbounds i8, i8* %tmp27, i64 1

Surprisingly, this is allowed in unreachable code:
https://lists.llvm.org/pipermail/llvm-dev/2012-October/054719.html

The self reference causes InstSimplify to hang when it's called from JT.
Test case "reduced.ll" attached. Would you mind checking it out? Thank you!

In reply to clin1's report above about the self-referential GEP hang:

The fix should lie in JumpThreading - it should not deal with unreachable blocks.

nikic added a comment.Jun 9 2021, 1:08 PM

While it may make sense to address this in JumpThreading in addition, I believe InstSimplify is intended to be robust against unreachable code. As InstSimplify can thread over phis, I don't think it's even possible for a caller to reliably prevent unreachable code from reaching InstSimplify.

lebedev.ri added inline comments.Jun 10 2021, 2:09 AM
llvm/lib/Analysis/InstructionSimplify.cpp
5830–5831

Then I guess you want to add a SmallPtrSet<Value *> Visited; outside of the loop,
insert into it at the beginning of the loop, and if insertion fails, return nullptr;.
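A minimal sketch of that guard, assuming the worklist walk from earlier (the helper name and the bitcast-only body are illustrative):

static bool chainReachesConstant(Value *Op) {
  // A self-referential instruction in unreachable code, e.g.
  //   %tmp27 = getelementptr inbounds i8, i8* %tmp27, i64 1
  // makes the backward walk cycle; seeing a value twice means the chain
  // can never bottom out at a constant.
  SmallPtrSet<Value *, 8> Visited;
  while (!isa<Constant>(Op)) {
    if (!Visited.insert(Op).second)
      return false; // Cycle detected; give up.
    auto *BC = dyn_cast<BitCastOperator>(Op);
    if (!BC)
      return false; // The real walk also handles GEPs and intrinsics.
    Op = BC->getOperand(0);
  }
  return true;
}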

aeubanks reopened this revision.Jun 11 2021, 11:49 AM
This revision is now accepted and ready to land.Jun 11 2021, 11:49 AM
aeubanks updated this revision to Diff 351520.Jun 11 2021, 11:49 AM

check for self referential values

This revision was landed with ongoing or failed builds.Jun 15 2021, 12:59 PM
This revision was automatically updated to reflect the committed changes.