This is an archive of the discontinued LLVM Phabricator instance.

Differential D84609

[MemDepAnalysis] Cut-off threshold reshuffling
Abandoned, Public

Authored by lebedev.ri on Jul 26 2020, 3:31 PM.

Details

Reviewers

nikic
fhahn
void
joerg
hfinkel
asbirlea
ebrevnov
jdoerfert

Summary

As we have seen/established in D84108, the cut-off thresholds for Memory Dependence Analysis may need tuning.
In particular, currently they are:

Scan up to 100 instructions per block. IMO this may be a bit low. (rL179713, changed from 500)
Scan up to 1000 (*sic*) blocks. That seems somewhat high. (D16123)
There is no cumulative limit on scanned instructions, so one query can end up scanning up to 100 instructions/block * 1k blocks = 100k instructions. As @nikic noted in https://reviews.llvm.org/D84108#2174113, that is indeed kinda insane.

I've collected some numbers (with the limits unset, on test-suite + RawSpeed + darktable).
Each table shows the distribution, across TUs, of the maximal value of a statistic for a single query:

Number of instructions scanned in block

  50%     75%     80%     90%     91%     92%     93%     94%     95%     96%     97%     98%     99%   99.5%   99.9%  99.95%    100%
30.00   65.00  114.00  176.30  209.00  246.24  293.21  358.18  388.60  430.00  480.00  593.00  920.00 2077.23 6087.00 6222.00 6222.00

Proposed action: no change, keep it at 100, which is ~80'th percentile - the cut-off is actively helpful.

Number of blocks scanned

 50%    75%    80%    90%    91%    92%    93%    94%    95%    96%    97%    98%    99%  99.5%  99.9% 99.95%   100%
10.0   39.0   55.0   61.0   67.0   77.0   88.0   94.0  104.0  125.0  145.0  179.0  278.0  355.5  670.5 1045.4 1908.0

The current limit is 1000, which is around the 99.95th percentile,
i.e. we aren't really enforcing anything with it.
I propose to lower it to roughly the 99th percentile (250).
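
For readers who want to repeat this kind of percentile-to-threshold exercise on their own samples, here is a minimal sketch. The helper name `quantile`, the interpolation rule, and any input data are assumptions made for illustration; the review does not say how the tables above were computed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of LLVM): returns the P-th quantile
// (0 <= P <= 1) of Samples, linearly interpolating between the two
// nearest order statistics.
double quantile(std::vector<double> Samples, double P) {
  assert(!Samples.empty() && P >= 0.0 && P <= 1.0);
  std::sort(Samples.begin(), Samples.end());
  // Fractional position of the requested quantile in the sorted samples.
  double Pos = P * static_cast<double>(Samples.size() - 1);
  std::size_t Lo = static_cast<std::size_t>(std::floor(Pos));
  std::size_t Hi = std::min(Lo + 1, Samples.size() - 1);
  double Frac = Pos - static_cast<double>(Lo);
  return Samples[Lo] + Frac * (Samples[Hi] - Samples[Lo]);
}
```

A candidate cut-off would then be something like `quantile(PerTUMaxima, 0.99)`, rounded to a convenient value, where `PerTUMaxima` stands for whatever per-TU statistic was collected.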

Instructions scanned total in a single query

 50%     75%     80%     90%     91%     92%     93%     94%     95%     96%     97%     98%     99%   99.5%   99.9%  99.95%    100%
66.0   318.0   336.0   463.0   498.0   518.0   571.0   683.0   791.0   986.0  1321.0  2241.0  3872.0  4847.5  8744.3  8828.0 16250.0

I propose to go with the 96th-97th percentile and establish a new limit of 1000,
which is 100 times lower than the current implicit limit (100k)
and 25 times lower than the new implied limit (100 instructions/block * 250 blocks = 25k).

So IOW, we will now be willing to process 4x fewer blocks per query,
and 100x fewer instructions per query in total.
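
The interaction of the three cut-offs can be checked with a small model. This is only a sketch under stated assumptions: the `Limits` struct and `scannedInstructions` function are hypothetical names, and the walk is a simplification that visits blocks in order; it is not how MemoryDependenceAnalysis is implemented.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical model of the three cut-offs discussed in the summary.
struct Limits {
  std::uint64_t PerBlock; // max instructions scanned in any one block
  std::uint64_t Blocks;   // max blocks visited per query
  std::uint64_t Total;    // max instructions scanned per query, all blocks
};

// Returns how many instructions a single query scans when it walks the given
// blocks in order (each entry is one block's instruction count).
std::uint64_t scannedInstructions(const std::vector<std::uint64_t> &BlockSizes,
                                  const Limits &L) {
  std::uint64_t Scanned = 0, Visited = 0;
  for (std::uint64_t Size : BlockSizes) {
    if (Visited == L.Blocks || Scanned == L.Total)
      break; // block budget or total budget exhausted
    ++Visited;
    std::uint64_t InBlock = std::min(Size, L.PerBlock);
    Scanned += std::min(InBlock, L.Total - Scanned);
  }
  return Scanned;
}
```

Feeding this model 2000 blocks of 10000 instructions each gives 100000 scanned instructions for `{100, 1000, UINT64_MAX}` (the old thresholds, with no cumulative cap), 25000 for `{100, 250, UINT64_MAX}`, and 1000 for `{100, 250, 1000}`, matching the 100x and 25x ratios quoted above.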

This does help somewhat with D84108's compile-time regression on lencod's context_ini.c.

That being said, these are rather arbitrary cut-offs that happen to seem sane
on the code bases viewed, so inevitably there's some code somewhere that doesn't
fit within them, and it may therefore be penalized by lower thresholds.
But compile-time budget is finite, so we have to make some decisions, sometimes.

llvm-compile-time-tracker is unsurprisingly happy about this
http://llvm-compile-time-tracker.com/compare.php?from=b1210c059d1ef084ecd275ed0ffb8343ac3cdfad&to=e3c7a0b96f8800ab21e2d5a8be414e5f83fa1aed&stat=instructions
Geomeans: -O3 -0.17%, ReleaseLTO -0.5% to -1.0%.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	40 ms	linux > LLVM.Transforms/DeadStoreElimination::inst-limits.ll
	190 ms	linux > Polly.ScopInfo::memcpy-raw-source.ll
	80 ms	windows > LLVM.Transforms/DeadStoreElimination::inst-limits.ll

Event Timeline

lebedev.ri created this revision. Jul 26 2020, 3:31 PM

Herald added subscribers: bmahjour, hiraditya. Jul 26 2020, 3:31 PM

lebedev.ri mentioned this in D84108: [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline. Jul 26 2020, 3:31 PM

lebedev.ri edited the summary of this revision. Jul 26 2020, 3:33 PM

Herald added a subscriber: dexonsmith. Jul 26 2020, 3:33 PM

Harbormaster failed remote builds in B65740: Diff 280757! Jul 26 2020, 4:01 PM

lebedev.ri edited the summary of this revision. Jul 27 2020, 12:20 AM

lebedev.ri edited the summary of this revision.

I would suggest splitting the introduction of an additional limit on the total number of scanned instructions and the reduction of the current limit on the number of scanned blocks into separate change sets. Also, besides compile-time numbers, we should evaluate the overall impact on performance before making such changes.

Proposed action: no change, keep it at 100, which is ~80'th percentile - the cut-off is actively helpful.

The 80th percentile doesn't look great to me. Once we introduce a total limit for scanned instructions, I would suggest increasing it to at least the 90th percentile. What do you think?

llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
457 ↗	(On Diff #280757)	I think we'd better avoid adding one more limit here, since it complicates the API. One limit is enough to achieve the desired behavior: the caller should pass the minimum of the local and global limits.
llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
217–218	I would suggest changing the order of the decrement and the check (as done in getSimplePointerDependencyFrom) to get more consistent behavior in the case of a zero default limit.
390	I guess this should be retrieving the total instruction limit.
396	Minor note. It's recommended to avoid default capture modes. It would be better to capture LocalLimit explicitly. Feel free to keep as is though.
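
To make the ordering concern on lines 217–218 concrete, here is a minimal model of the two loop shapes. Both functions are hypothetical stand-ins written for this illustration, not code from the patch; they assume an unsigned counter, as in the diff.

```cpp
// Each function returns how many of NumInsts instructions are examined
// before the scan gives up or runs out of instructions.

// Shape 1: decrement first, then test (as in the reviewed hunk).
unsigned scanDecrementThenCheck(unsigned Limit, unsigned NumInsts) {
  unsigned Examined = 0;
  for (unsigned I = 0; I != NumInsts; ++I) {
    --Limit; // unsigned arithmetic: a zero limit wraps around to UINT_MAX
    if (!Limit)
      return Examined; // budget exhausted
    ++Examined;
  }
  return Examined;
}

// Shape 2: test first, then decrement (the order suggested in the comment).
unsigned scanCheckThenDecrement(unsigned Limit, unsigned NumInsts) {
  unsigned Examined = 0;
  for (unsigned I = 0; I != NumInsts; ++I) {
    if (!Limit)
      return Examined; // budget exhausted
    --Limit;
    ++Examined;
  }
  return Examined;
}
```

In this model a limit of zero makes the first shape scan everything, because the counter wraps before it is tested, while the second shape stops immediately. The two shapes also differ by one for non-zero limits: a limit of 10 examines 9 instructions in the first shape and 10 in the second.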

Split patch in two, keep only the threshold changes here.

lebedev.ri added a parent revision: D84742: [NFCI][MemDepAnalysis] Introduce global limit on a number of instructions to be traversed during single query. Jul 28 2020, 3:37 AM

Harbormaster failed remote builds in B65989: Diff 281175! Jul 28 2020, 4:45 AM

bmahjour removed a subscriber: bmahjour. Jul 28 2020, 6:44 AM

Some general comments:

The MemDepAnalysis has been known to be problematic for compile-time, so reducing the 100k implicit threshold seems reasonable.
It's natural the compiler tracker will be happy, but can we consider runtime implications due to potential missed optimizations?
Could we evaluate which optimizations are missed, in which passes using MemDepAnalysis along with their run-time impact? (in particular for the benchmarks where we see the large compile-time benefits)

AFAIK, the main passes using MemDepAnalysis are DSE, MemCpyOpt and GVN and there is active work in porting these to MemorySSA. The same analysis of compile-time vs run-time benefits is needed for that switch, so having data from reducing thresholds here will be very valuable for the short term (current patch) on deciding how much to reduce it to, and for the long-term switch to MemorySSA.

lebedev.ri marked an inline comment as done. Jul 28 2020, 1:00 PM

lebedev.ri added inline comments.

llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
457 ↗	(On Diff #280757)	So you're suggesting that the caller should also take care of updating the global limit by how much the local limit got consumed?

In D84609#2179323, @asbirlea wrote:

Some general comments:

The MemDepAnalysis has been known to be problematic for compile-time, so reducing the 100k implicit threshold seems reasonable.

It's natural the compiler tracker will be happy, but can we consider runtime implications due to potential missed optimizations?

Could we evaluate which optimizations are missed, in which passes using MemDepAnalysis along with their run-time impact? (in particular for the benchmarks where we see the large compile-time benefits)

AFAIK, the main passes using MemDepAnalysis are DSE, MemCpyOpt and GVN and there is active work in porting these to MemorySSA. The same analysis of compile-time vs run-time benefits is needed for that switch, so having data from reducing thresholds here will be very valuable for the short term (current patch) on deciding how much to reduce it to, and for the long-term switch to MemorySSA.

Background: the motivation for this patch is the fact that D84108 significantly
regresses compile-time of lencod, especially context_ini.c (+79.97%).
It has been traced to GVN spending most of the time in GVN::processNonLocalLoad(),
in MemoryDependenceResults::getNonLocalPointerDependency().

Since a compile-time and run-time performance assessment will be needed both here
and for the MemorySSA switch, would it be more productive to proceed directly to the latter?
I don't have much (any) prior experience with MemorySSA; would that be too complicated?
How long-term is the switch?

In D84609#2179889, @lebedev.ri wrote:

Background: the motivation for this patch is the fact that D84108 significantly
regresses compile-time of lencod, especially context_ini.c (+79.97%).
It has been traced to GVN spending most of the time in GVN::processNonLocalLoad(),
in MemoryDependenceResults::getNonLocalPointerDependency().

Since a compile-time and run-time performance assessment will be needed both here
and for the MemorySSA switch, would it be more productive to proceed directly to the latter?

I think the two are orthogonal. The current GVN will need replacing with NewGVN which has been bitrotting for a few years now. It's a pass taking a different work-flow, so I don't know if there will be a call-path matching the GVN::processNonLocalLoad() call for example.
It will be very important for the subsequent switch to have the data point of "GVN performs this processing which has relevant run-time impact, regardless of the compile-time spent, hence NewGVN needs to match this" vs "GVN spends a lot of time compiling this without much or any run-time benefit, hence NewGVN need not match it".
I understand it's time-consuming to have this analysis now, but the problem is more complex than the lowering of a cap, and it seems important for future progress to distinguish between such cases, and document the results accordingly (in MemDepAnalysis, but more importantly in GVN).

I don't have much (any) prior experience with MemorySSA; would that be too complicated?
How long-term is the switch?

I don't have a set timeline for this unfortunately, due to other priorities; order of months at this point.

In D84609#2180117, @asbirlea wrote:

I think the two are orthogonal. The current GVN will need replacing with NewGVN which has been bitrotting for a few years now. It's a pass taking a different work-flow, so I don't know if there will be a call-path matching the GVN::processNonLocalLoad() call for example.
It will be very important for the subsequent switch to have the data point of "GVN performs this processing which has relevant run-time impact, regardless of the compile-time spent, hence NewGVN needs to match this" vs "GVN spends a lot of time compiling this without much or any run-time benefit, hence NewGVN need not match it".
I understand it's time-consuming to have this analysis now, but the problem is more complex than the lowering of a cap, and it seems important for future progress to distinguish between such cases, and document the results accordingly (in MemDepAnalysis, but more importantly in GVN).

FWIW I think current GVN does too many things not directly related to value numbering, which makes the comparison with NewGVN a bit harder. I think it might make sense to de-couple some of the memory optimizations that do not really interact with the value number from GVN.

In D84609#2180216, @fhahn wrote:

FWIW I think current GVN does too many things not directly related to value numbering, which makes the comparison with NewGVN a bit harder. I think it might make sense to de-couple some of the memory optimizations that do not really interact with the value number from GVN.

I wasn't really asking about the NewGVN story; I know it's somewhat stagnant.
I was only asking whether it would be better to instead look into porting
GVN::processNonLocalLoad() to be MemorySSA-driven.

In D84609#2180222, @lebedev.ri wrote:

I wasn't really asking about the NewGVN story; I know it's somewhat stagnant.
I was only asking whether it would be better to instead look into porting
GVN::processNonLocalLoad() to be MemorySSA-driven.

If we're building two analyses (MSSA & MemDepAnalysis) instead of one, I expect we'll see a spike in compile-time. Additionally, MemDepAnalysis is a piece of technical debt that would be nice to replace altogether.
IMO it makes sense to make the transition for the whole pass, or for the pipeline of 3 passes in one go.

ebrevnov added inline comments. Jul 28 2020, 9:58 PM

llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
457 ↗	(On Diff #280757)	That's right.
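
The single-limit arrangement agreed on in this exchange could look roughly like the sketch below. `scanBlock` and `scanWithBudgets` are hypothetical names invented for this illustration, and the callee is reduced to pure budget accounting; the real API takes more parameters.

```cpp
#include <algorithm>

// Hypothetical callee with a single in/out limit: looks at up to *Limit
// instructions of a block holding BlockSize instructions and decrements
// *Limit by the number it looked at.
void scanBlock(unsigned BlockSize, unsigned *Limit) {
  unsigned Scanned = std::min(BlockSize, *Limit);
  *Limit -= Scanned;
}

// Hypothetical caller that owns both budgets. It passes the minimum of the
// local and global limits, then charges the global budget with whatever the
// callee consumed. Returns the number of instructions consumed.
unsigned scanWithBudgets(unsigned BlockSize, unsigned LocalLimit,
                         unsigned &GlobalLimit) {
  unsigned Limit = std::min(LocalLimit, GlobalLimit);
  unsigned Before = Limit;
  scanBlock(BlockSize, &Limit);
  unsigned Consumed = Before - Limit;
  GlobalLimit -= Consumed;
  return Consumed;
}
```

The trade-off is the one raised in the question above: the callee's interface stays at one limit, but every caller that cares about the global budget has to do this bookkeeping itself.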

One more general comment. If we introduce a budget for the total number of scanned instructions, do we still need to limit the number of visited blocks?

In D84609#2180272, @asbirlea wrote:

If we're building two analyses (MSSA & MemDepAnalysis) instead of one, I expect we'll see a spike in compile-time. Additionally, MemDepAnalysis is a piece of technical debt that would be nice to replace altogether.
IMO it makes sense to make the transition for the whole pass, or for the pipeline of 3 passes in one go.

I recently ran some numbers for GVN compile-time impact: http://llvm-compile-time-tracker.com/index.php?branch=nikic/perf/new-gvn-3 From the bottom to the top, the first commit disables LoadPRE, the second disables the entirety of processNonLocalLoad and the last one enables NewGVN (with the corresponding MemorySSA run).

I think the key takeaway here is that the non-local load analysis in GVN is really, really expensive. Per-block MemDep analysis is fairly cheap, but the non-local one is not, and I think GVN is the only MemDep-based pass that uses it.

So, if it is feasible to replace the non-local load analysis (and load PRE) in GVN with something MemorySSA based, I think it's plausible that it will be a compile-time improvement despite the need to construct MemorySSA. And it would become free for a following MemCpyOpt/DSE :)

Of course, this is on the assumption that MemorySSA can actually perform a similar degree of optimization with better compile-time. As these optimizations require walking past MemoryPhis, it's not going to be free, but I would still expect it to be cheaper and have better cutoff control.

lebedev.ri mentioned this in rG1d51dc38d89b: [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by…. Jul 29 2020, 10:07 AM

In D84609#2182442, @nikic wrote:

I recently ran some numbers for GVN compile-time impact: http://llvm-compile-time-tracker.com/index.php?branch=nikic/perf/new-gvn-3 From the bottom to the top, the first commit disables LoadPRE, the second disables the entirety of processNonLocalLoad and the last one enables NewGVN (with the corresponding MemorySSA run).

I think the key takeaway here is that the non-local load analysis in GVN is really, really expensive. Per-block MemDep analysis is fairly cheap, but the non-local one is not, and I think GVN is the only MemDep-based pass that uses it.

So, if it is feasible to replace the non-local load analysis (and load PRE) in GVN with something MemorySSA based, I think it's plausible that it will be a compile-time improvement despite the need to construct MemorySSA. And it would become free for a following MemCpyOpt/DSE :)

Of course, this is on the assumption that MemorySSA can actually perform a similar degree of optimization with better compile-time. As these optimizations require walking past MemoryPhis, it's not going to be free, but I would still expect it to be cheaper and have better cutoff control.

Thank you for this analysis, this is excellent information!
I'll need to look in detail into what processNonLocalLoad does to see if MemorySSA can do better there; noted as the next priority.

As per @asbirlea's comments.

lebedev.ri mentioned this in rGbb7d3af1139c: Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction…. Sep 7 2020, 2:24 PM

lebedev.ri abandoned this revision. Jan 17 2022, 2:36 PM

Revision Contents

Path: llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
Size: 8 lines

Diff 281175

llvm/lib/Analysis/MemoryDependenceAnalysis.cpp

[87 lines not shown]
 // Limit for the number of instructions to scan in a block.

 static cl::opt<unsigned> BlockScanLimit(
     "memdep-block-scan-limit", cl::Hidden, cl::init(100),
     cl::desc("The number of instructions to scan in a block in memory "
              "dependency analysis (default = 100)"));

 static cl::opt<unsigned>
-    BlockNumberLimit("memdep-block-number-limit", cl::Hidden, cl::init(1000),
+    BlockNumberLimit("memdep-block-number-limit", cl::Hidden, cl::init(250),
                      cl::desc("The number of blocks to scan during memory "
-                              "dependency analysis (default = 1000)"));
+                              "dependency analysis (default = 250)"));

 // In each of BlockNumberLimit block we are willing to scan up to BlockScanLimit
 // instructions, but the total count of instructions scanned in all blocks is
 // at most TotalInstructionCountLimit.
 // This is related to the NumInstructionsScannedMax statistic.
 static cl::opt<unsigned> TotalInstructionCountLimit(
-    "memdep-total-instruction-count-limit", cl::Hidden, cl::init(100000),
+    "memdep-total-instruction-count-limit", cl::Hidden, cl::init(1000),
     cl::desc("The number of instructions we are allowed to scan during memory "
-             "dependency analysis (default = 100000)"));
+             "dependency analysis (default = 1000)"));

 static cl::opt<unsigned> NumResultsLimit(
     "memdep-num-results-limit", cl::Hidden, cl::init(100),
     cl::desc(
         "Limit on the number of memdep results to process (default = 100)"));

 /// This is a helper function that removes Val from 'Inst's set in ReverseMap.
 ///
[93 lines not shown] MemDepResult MemoryDependenceResults::getCallDependencyFrom(
   while (ScanIt != BB->begin()) {
     Instruction *Inst = &*--ScanIt;
     // Debug intrinsics don't cause dependences and should not affect Limit
     if (isa<DbgInfoIntrinsic>(Inst))
       continue;

     // Limit the amount of scanning we do so we don't end up with quadratic
     // running time on extreme testcases.
     --LocalLimit;
     if (!LocalLimit)
[Inline comment, ebrevnov, Done: I would suggest changing order of decrement and the check (as done in getSimplePointerDependencyFrom) to get more consistent behavior in case of zero default limit.]
       return MemDepResult::getUnknown();

     // If this inst is a memory op, get the pointer it accessed
     MemoryLocation Loc;
     ModRefInfo MR = GetLocation(Inst, Loc, TLI);
     if (Loc.Ptr) {
       // A simple instruction.
       if (isModOrRefSet(AA.getModRefInfo(Call, Loc)))
[155 lines not shown]

 MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
     const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
     BasicBlock *BB, Instruction *QueryInst, unsigned *LocalLimit,
     unsigned *GlobalLimit) {
   bool isInvariantLoad = false;

   unsigned DefaultLocalLimit = getDefaultBlockScanLimit();
   unsigned DefaultGlobalLimit = getDefaultBlockScanLimit();
[Inline comment, ebrevnov, Not Done: I guess this should be retrieving total instruction limit.]
   if (!LocalLimit)
     LocalLimit = &DefaultLocalLimit;
   if (!GlobalLimit)
     GlobalLimit = &DefaultGlobalLimit;

   auto _ = make_scope_exit([&, Orig = *LocalLimit]() {
[Inline comment, ebrevnov, Not Done: Minor note. It's recommended to avoid default capture modes. Would be better if you capture LocalLimit explicitly. Feel free to keep as is though.]
     NumBlockInstructionsScannedMax.updateMax(Orig - *LocalLimit);
   });

   // We must be careful with atomic accesses, as they may allow another thread
   // to touch this location, clobbering it. We are conservative: if the
   // QueryInst is not a simple (non-atomic) memory access, we automatically
   // return getClobber.
   // If it is simple, we know based on the results of
[1,428 lines not shown]
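
For the inline note about capture modes, the following sketch shows the scope-exit bookkeeping with explicit captures. It is an illustration only: `ScopeExit`, `MaxScanned`, and `scan` are hypothetical stand-ins, with `ScopeExit` playing the role of `make_scope_exit` and `MaxScanned` the role of the `NumBlockInstructionsScannedMax` statistic.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical minimal scope-exit guard: runs Fn when it goes out of scope.
template <typename Callable> class ScopeExit {
  Callable Fn;

public:
  explicit ScopeExit(Callable F) : Fn(std::move(F)) {}
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;
  ~ScopeExit() { Fn(); }
};

// Hypothetical stand-in for a "maximum seen so far" statistic.
unsigned MaxScanned = 0;

// Consumes up to NumInsts units of *LocalLimit and records, on every exit
// path, how much of the budget this call used.
void scan(unsigned NumInsts, unsigned *LocalLimit) {
  // Explicit captures instead of [&]: the pointer and the starting budget
  // are both captured by value.
  auto Record = [LocalLimit, Orig = *LocalLimit] {
    MaxScanned = std::max(MaxScanned, Orig - *LocalLimit);
  };
  ScopeExit<decltype(Record)> Guard(Record);

  for (unsigned I = 0; I != NumInsts && *LocalLimit; ++I)
    --*LocalLimit;
}
```

Listing the captures makes it visible at the call site that the guard depends only on the limit pointer and its initial value, which is the readability point behind the reviewer's note.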