
[LICM] Hoist loads with invariant.group metadata
Abandoned · Public

Authored by aeubanks on Apr 1 2021, 8:35 PM.

Details

Summary

We want to hoist loads when possible.

However, we can only safely retain metadata if the instruction is
guaranteed to run in the loop. For the purposes of the invariant.group
metadata, keeping it is more important than hoisting it, so do not hoist
a load if it is not guaranteed to run in the loop.

Diff Detail

Event Timeline

aeubanks created this revision. · Apr 1 2021, 8:35 PM
aeubanks requested review of this revision. · Apr 1 2021, 8:35 PM
Herald added a project: Restricted Project. · View Herald Transcript · Apr 1 2021, 8:35 PM
rnk accepted this revision. · Apr 5 2021, 2:42 PM
rnk added a reviewer: rsmith.

+@rsmith who also helped with the design

I think this is correct; this is how it's supposed to work.

This revision is now accepted and ready to land. · Apr 5 2021, 2:43 PM
This revision was landed with ongoing or failed builds. · Apr 8 2021, 9:57 PM
This revision was automatically updated to reflect the committed changes.
lebedev.ri reopened this revision. (Edited) · May 8 2021, 5:46 AM
lebedev.ri added a subscriber: lebedev.ri.

Hello. I have just reverted this in rG1acd9a1a29ac30044ecefb6613485d5d168f66ca,
as bisection identified it as the first bad commit for a crash.

This appears to miscompile Google Benchmark's GetCacheSizesFromKVFS()
when compiling with -fstrict-vtable-pointers.
Runnable reproducer: https://godbolt.org/z/f9ovKqTzb

$ ./bin/clang++ -O3 -fstrict-vtable-pointers /tmp/test.cpp -g0
$ ./a.out 
Segmentation fault

The "f.fail()" call crashes with a BUS error; it is compiled into a testb
instruction, and the address it is testing is nonsensical.

Please let me know if you are having issues reproducing.
I don't believe this is UB in the source code; at least I don't see it.

This revision is now accepted and ready to land. · May 8 2021, 5:46 AM
lebedev.ri requested changes to this revision. · May 8 2021, 5:47 AM
This revision now requires changes to proceed. · May 8 2021, 5:47 AM
rnk added a comment. · May 10 2021, 5:57 PM

Thanks for the repro!

while (true) {
  std::string FPath = StrCat(dir, "index", Idx++, "/");
  std::ifstream f(StrCat(FPath, "size"));
  if (!f.is_open()) break;
  std::string suffix;
  int size;
  f >> size;
  if (f.fail()) // BUS error
    return;
  if (f.good()) {
    std::cout << "read " << size << "\n";
  }
}

Clang is able to devirtualize the fail call on its own. However, it still has IR to perform the virtual base class adjustment:

%13 = bitcast %"class.std::basic_ifstream"* %f to %"class.std::basic_istream"*
%call3 = call nonnull align 8 dereferenceable(16) %"class.std::basic_istream"* @_ZNSirsERi(%"class.std::basic_istream"* nonnull dereferenceable(16) %13, i32* nonnull align 4 dereferenceable(4) %size)
%14 = bitcast %"class.std::basic_ifstream"* %f to i8**
%vtable = load i8*, i8** %14, align 8, !tbaa !10, !invariant.group !12
%vbase.offset.ptr = getelementptr i8, i8* %vtable, i64 -24
%15 = bitcast i8* %vbase.offset.ptr to i64*
%vbase.offset = load i64, i64* %15, align 8
%16 = bitcast %"class.std::basic_ifstream"* %f to i8*
%add.ptr = getelementptr inbounds i8, i8* %16, i64 %vbase.offset
%17 = bitcast i8* %add.ptr to %"class.std::basic_ios"*
%call4 = call zeroext i1 @_ZNKSt9basic_iosIcSt11char_traitsIcEE4failEv(%"class.std::basic_ios"* nonnull dereferenceable(264) %17)

This vtable load is based directly on the %f alloca, and not the result of a launder operation. Because there is no launder operation, LICM hoists this vptr load to the entry block, so it loads uninit stack memory. The virtual base class offset load happens before calling fail, which is where the crash happens.

So: this is a frontend bug, and perhaps a frontend missed optimization. In general, clang does not launder local variable allocas, and that seems like it could be a problem. However, always laundering and stripping every dynamic local variable is probably way overkill, and would block SROA. Maybe we could only use laundered object pointers for the vptr loads.

Prazek requested changes to this revision. · May 11 2021, 6:14 AM

I had some time to look into it now; sorry that I missed it on the first review.
If I understand the LICM code correctly, it needs to drop all instruction metadata when hoisting an instruction that will not be executed unconditionally. This means that code like:
https://godbolt.org/z/bjP64PbhY

struct A {
  virtual void foo();
};

bool p(int, int);

void loop(A& a, int n) {

  int i = 0;
  for (int i = 0 ; i < n; i++) {
      if (p(i, n))
          a.foo();
  }
}

When the vtable load is hoisted, it will be stripped of its invariant.group metadata. This means we won't be able to devirtualize it further after, e.g., inlining this function somewhere (exposing another virtual call or the construction of a).
Assuming my understanding is right (please run this example with your patch), I would oppose always hoisting loads with !invariant.group metadata. It still might be beneficial for the default optimization pipeline, but I worry that it would limit the optimizer across modules (LTO, ThinLTO, etc.).
I think it would be better to be on the safe side and hoist loads with !invariant.group only if they are executed unconditionally. I tried to prototype it a long time ago here: https://reviews.llvm.org/D45151

aeubanks updated this revision to Diff 345052. · May 12 2021, 10:42 PM

only hoist when load is guaranteed to run

aeubanks edited the summary of this revision. (Show Details) · May 12 2021, 10:45 PM

I'm wondering if we might be dropping invariant.group metadata on loads we were already hoisting

In D99784#2749459, @rnk wrote:

Thanks for the repro!

while (true) {
  std::string FPath = StrCat(dir, "index", Idx++, "/");
  std::ifstream f(StrCat(FPath, "size"));
  if (!f.is_open()) break;
  std::string suffix;
  int size;
  f >> size;
  if (f.fail()) // BUS error
    return;
  if (f.good()) {
    std::cout << "read " << size << "\n";
  }
}

Clang is able to devirtualize the fail call on its own. However, it still has IR to perform the virtual base class adjustment:

%13 = bitcast %"class.std::basic_ifstream"* %f to %"class.std::basic_istream"*
%call3 = call nonnull align 8 dereferenceable(16) %"class.std::basic_istream"* @_ZNSirsERi(%"class.std::basic_istream"* nonnull dereferenceable(16) %13, i32* nonnull align 4 dereferenceable(4) %size)
%14 = bitcast %"class.std::basic_ifstream"* %f to i8**
%vtable = load i8*, i8** %14, align 8, !tbaa !10, !invariant.group !12
%vbase.offset.ptr = getelementptr i8, i8* %vtable, i64 -24
%15 = bitcast i8* %vbase.offset.ptr to i64*
%vbase.offset = load i64, i64* %15, align 8
%16 = bitcast %"class.std::basic_ifstream"* %f to i8*
%add.ptr = getelementptr inbounds i8, i8* %16, i64 %vbase.offset
%17 = bitcast i8* %add.ptr to %"class.std::basic_ios"*
%call4 = call zeroext i1 @_ZNKSt9basic_iosIcSt11char_traitsIcEE4failEv(%"class.std::basic_ios"* nonnull dereferenceable(264) %17)

This vtable load is based directly on the %f alloca, and not the result of a launder operation. Because there is no launder operation, LICM hoists this vptr load to the entry block, so it loads uninit stack memory. The virtual base class offset load happens before calling fail, which is where the crash happens.

So: this is a frontend bug, and perhaps a frontend missed optimization. In general, clang does not launder local variable allocas, and that seems like it could be a problem. However, always laundering and stripping every dynamic local variable is probably way overkill, and would block SROA. Maybe we could only use laundered object pointers for the vptr loads.

I think the current frontend implementation is correct, at least in this instance: the load from the alloca should always result in the same value. Although I'm wondering if it's possible that we end up using one alloca for different dynamic types with different lifetimes; if two different types have invariant.group metadata on their vtable loads, that would be bad.

The problem is that we're moving the load before the store (or the call to the constructor).
Currently, LICM hoists loads only if it can prove the memory they point to is not modified in the loop. This patch bypasses that check for loads with invariant.group metadata, so the current patch is still wrong: we can only hoist if the corresponding store is outside the loop. With invariant.group metadata we're not worried about the loop changing the value, only that the value was stored in the first place.

I'm confused. Was that an outdated comment you posted?
It seems like this current diff (which you updated before commenting!)
gets it right?

No, the current patch is still wrong. We can end up hoisting a load before the corresponding store; we currently don't check for that condition when invariant.group metadata is present.

lebedev.ri requested changes to this revision. · Mon, May 31, 9:20 AM
This revision now requires changes to proceed. · Mon, May 31, 9:20 AM
Matt added a subscriber: Matt. · Mon, May 31, 9:31 AM

// Loads/stores with an invariant.group metadata are ok to hoist/sink.

I don't see how this follows from the metadata. Is there documentation I missed?
Even if there is, we might want to avoid conflating "invariant" and "speculatable". The latter
would be useful on its own; see https://reviews.llvm.org/D103907#2811455 .

aeubanks abandoned this revision. · Thu, Jun 10, 3:51 PM

Yeah, this is not feasible with the current implementation of -fstrict-vtable-pointers, and I'm not sure how beneficial it would be anyway.