The purpose of the custom linked list was to optimize for the case
of a single-element list. It turns out that TinyPtrVector handles
the same basic scenario even better, reducing the size of
LeaderTableEntry by 33%, and requiring only log2(N) allocations
as the size of the list grows. The only downside is that we have
to store the Value's and BasicBlock's in separate vectors, which
is slightly awkward in a few cases. Fortunately that ends up being
entirely encapsulated inside helper functions.
Diff Detail

- Repository: rG LLVM Github Monorepo
Event Timeline
It looks like this patch causes a crash when running llvm-test-suite: http://llvm-compile-time-tracker.com/show_error.php?commit=26fd1e7f5ce498fa4d9c28a8dd3e7235466fc03a
llvm/include/llvm/Transforms/Scalar/GVN.h, line 294:

This looks incorrect in multiple ways. If we're at the first instruction, this will step before the begin iterator. And the end iterator is invalidated in either case. I think you need something like this:

    VI = entry.Val.erase(VI);
    BI = entry.BB.erase(BI);
    VE = entry.Val.end();
    continue;
llvm/lib/Transforms/Scalar/GVN.cpp, line 2113:

Can this just be return is_contained(entry.BB, BB)?
This still crashes when building llvm-test-suite: http://llvm-compile-time-tracker.com/show_error.php?commit=77bc69cedc32296847573390cf393827b6196d1f There must be something else wrong...
llvm/lib/Transforms/Scalar/GVN.cpp, line 2109:

Hm, I think this is inverted. The old code is a bit hard to understand, but what it's checking is that all BBs are BB, not that any is. So this should be:

    return all_of(entry.BB,
                  [BB](BasicBlock *EntryBB) { return EntryBB == BB; });
llvm/lib/Transforms/Scalar/GVN.cpp, line 2109:

You had that right in your initial version and my is_contained suggestion was wrong :)
It looks like this change ends up being slightly negative in terms of instruction count: http://llvm-compile-time-tracker.com/compare.php?from=bf1b81d076f89bd56e86189b013f27dcf4d73ae8&to=efa882a79c027d27f6deff14e75bd9f558dd95d0&stat=instructions
As far as I can tell, the only behavioral difference with the new version is the order of leaders in the table, which biases the candidates returned by findLeader. The hand-rolled linked list used an unprincipled ordering: the first candidate found was always at the front of the list, but subsequent candidates were in reverse order of discovery. The new version currently stores them strictly in order of discovery. I can try doing a backwards search in findLeader, which would be more similar to the old approach on average. Really, we should probably have a more principled heuristic for choosing amongst the leaders, but that seems out of scope for this change.
Compile-time still looks the same: https://llvm-compile-time-tracker.com/compare.php?from=c2eccc67ce07e9cb374eb0ecdb3038fcb8be08cd&to=1e9114984490b83d4665f12a11f84c83f50ca8f0&stat=instructions That is, mildly negative.
I think that this analysis isn't really correct:
> It turns out that TinyPtrVector handles the same basic scenario even better, reducing the size of LeaderTableEntry by 33%, and requiring only log2(N) allocations as the size of the list grows.
For all practical purposes, the number of allocations in the previous implementation was zero, because a bump pointer allocator was used, which is essentially free. In the new scheme, we get real (global allocator) allocations whenever we have more than one entry.
It is true that for a single entry, the size is reduced by 33%. However, if we go to two elements, then we'll end up separately allocating two SmallVectors with 4 elements each, which means we use a total of 14 pointer-sized values instead of 6 (not counting any additional overhead the global allocator adds). So for 2 entries, we actually use more than twice as much memory. This gets better for 3 and 4 entries, and then again much worse at 5 entries, where we leave behind unused inline SmallVector storage.
I don't know what the actual distribution of the number of entries is, but it's not really clear whether the new implementation uses less memory or is more performant in practice.
I assume that there was some kind of motivation for making this change and you observed better resource utilization for some workload?