This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/include/llvm/CodeGen/LiveIntervalUnion.h
llvm/lib/CodeGen/LiveIntervalUnion.cpp
llvm/lib/CodeGen/LiveRegMatrix.cpp
llvm/lib/CodeGen/RegAllocGreedy.cpp
llvm/lib/Target/AArch64/AArch64Subtarget.h
llvm/lib/Target/X86/X86Subtarget.h
llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
llvm/test/CodeGen/X86/bug26810.ll
llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll
llvm/test/CodeGen/X86/i128-mul.ll
llvm/test/CodeGen/X86/mmx-arith.ll
llvm/test/CodeGen/X86/optimize-max-0.ll

Differential D98232

[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration
Closed · Public

Authored by mtrofin on Mar 8 2021, 8:56 PM.


Details

Reviewers

qcolombet
myatsina
wmi
nikic
sanwou01
dmgreen
xbolva00

Commits

rGce61def529e2: [regalloc] Ensure Query::collectInterferringVregs is called before interval…
rGd40b4911bd9a: [regalloc] Ensure Query::collectInterferringVregs is called before interval…

Summary

The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferingVRegs() needs to be called before iterating the interfering live ranges.

The rest of the patch helps enforce that: instead of clearing the query's InterferingVRegs field, we invalidate it. This happens when the live reg matrix is invalidated (the existing triggering mechanism).

Without the change in RegAllocGreedy.cpp, the compiler ICEs.

This patch should make it easier for developers to discover that collectInterferingVRegs needs to be called before iterating.
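The invalidate-instead-of-clear idea can be illustrated with a small, self-contained C++ sketch. It is not LLVM's LiveIntervalUnion::Query: the class InterferenceQuery and its members are hypothetical, and std::optional stands in for the Optional used in the patch. Reading the results before collecting them trips an assert rather than silently yielding an empty list.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified query whose results must be collected before they
// are read. Not LLVM's LiveIntervalUnion::Query; names are illustrative.
class InterferenceQuery {
public:
  explicit InterferenceQuery(std::vector<unsigned> Cands)
      : Candidates(std::move(Cands)) {}

  // (Re)compute the cached list of interfering vregs. In this sketch the
  // "computation" just copies a fixed candidate list.
  void collectInterferingVRegs() { InterferingVRegs = Candidates; }

  // Reading before collecting is a bug: the optional is empty, so the assert
  // fires instead of silently returning an empty (or stale) list.
  const std::vector<unsigned> &interferingVRegs() const {
    assert(InterferingVRegs.has_value() &&
           "call collectInterferingVRegs() first");
    return *InterferingVRegs;
  }

  // Invalidate (rather than clear) when the underlying state changes, so a
  // later read without re-collecting is detected.
  void invalidate() { InterferingVRegs.reset(); }

  bool isCollected() const { return InterferingVRegs.has_value(); }

private:
  std::vector<unsigned> Candidates;
  std::optional<std::vector<unsigned>> InterferingVRegs;
};
```

In a debug build, calling interferingVRegs() on a fresh or invalidated query aborts, which is the kind of loud, early failure the summary describes. With NDEBUG the assert compiles away and the dereference becomes undefined behavior, so this check is a development aid rather than a release-mode guarantee.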

I will follow up with a subsequent patch to improve the usability and maintainability of Query.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mtrofin created this revision. Mar 8 2021, 8:56 PM

Herald added subscribers: hiraditya, qcolombet, MatzeB. Mar 8 2021, 8:56 PM

mtrofin requested review of this revision. Mar 8 2021, 8:56 PM

Herald added a project: Restricted Project. Mar 8 2021, 8:56 PM

Herald added a subscriber: llvm-commits.

mtrofin retitled this revision from WIP - don't submit (yet) to [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration. Mar 8 2021, 9:39 PM

mtrofin edited the summary of this revision.

mtrofin added reviewers: qcolombet, myatsina.

mtrofin edited the summary of this revision.

fixed tests

Herald added a subscriber: pengfei. Mar 8 2021, 9:40 PM

Harbormaster completed remote builds in B92787: Diff 329200. Mar 9 2021, 3:36 AM

Harbormaster completed remote builds in B92795: Diff 329210. Mar 9 2021, 4:40 AM

mtrofin added a reviewer: wmi. Mar 9 2021, 12:10 PM

Gentle reminder, thanks!

dmgreen added a subscriber: dmgreen. Mar 10 2021, 10:52 AM

dmgreen added inline comments.

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
66	This is quite a large regression. My understanding is that for this test enableAdvancedRASplitCost/consider-local-interval-cost was enabled specifically to prevent this kind of recursive spilling from happening.

Nice catch!

Question below regarding the use of Optional.

llvm/include/llvm/CodeGen/LiveIntervalUnion.h
117	That part of the patch is not strictly needed. I am guessing we want this because that way accessing `InterferingVRegs` without calling `collectInterferingVRegs` first will produce a runtime crash instead of silently checking against something empty. Am I understanding correctly?

This revision is now accepted and ready to land. Mar 10 2021, 10:59 AM

qcolombet added a subscriber: aditya_nandakumar. Mar 10 2021, 11:03 AM

qcolombet added inline comments.

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
66	The problem is that the previous version was relying on stale information. This is a correctness fix and should get in ASAP IMHO. As far as eviction chains go, we could actually clean them up later in the pipeline. @aditya_nandakumar worked on a prototype offline that solves that as a post-RA pass. We're hoping to open source it soon-ish.

dmgreen added inline comments. Mar 10 2021, 11:07 AM

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
66	If this is a correctness fix, presumably there should be a new test case for what it is fixing?

qcolombet added inline comments. Mar 10 2021, 11:14 AM

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
66	Good point! As I understand it, this is a subtle issue: I don't think we can generate wrong code from it; it will just fail to detect eviction chains properly, or make the compiler spin forever (IIUC what @mtrofin means by "the compiler ices"). I'll let @mtrofin comment on this.

dmgreen added inline comments. Mar 10 2021, 11:36 AM

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
66	My understanding is that this test comes from a fairly standard matrix multiply, by the way (https://godbolt.org/z/Ej5sb8). It is a shame to break such an obvious case. I have seen some other cases on Arm where we hit cascading eviction chains like this too, something that consider-local-interval-cost didn't help with. It would be great to see a more reliable fix for that; I look forward to seeing it.

qcolombet added inline comments. Mar 10 2021, 11:54 AM

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
66	I agree the eviction chain detection is not foolproof. What Aditya (and I, to some extent) worked on showed that we should be able to eliminate the need for eviction chain detection altogether. The approach we took is essentially a form of copy rewriting as described in https://dl.acm.org/doi/10.1145/1811212.1811214, though our implementation is less thorough than the paper :).

mtrofin marked an inline comment as done. Mar 10 2021, 12:03 PM

mtrofin added inline comments.

llvm/include/llvm/CodeGen/LiveIntervalUnion.h
117	That's correct. This patch could land without the Optional; either way, I intend to follow up with a change that reduces this level of API statefulness, so that others don't fall into the same pitfall.
llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
66	@qcolombet: by "the compiler ices" I meant that with the current patch (using Optional) but without the Q.collectInterferingVRegs() call it adds, the compiler crashes. That supports the claim that the API should have been called, i.e. we were previously operating on stale information, at least in some cases (removing the loop right under it makes a set of tests fail, disjoint from the ones patched here). @dmgreen, all: my hope was that someone could help fix the regression (I think that was the goal of https://reviews.llvm.org/D35816); I have little context on that. Otherwise, it'll take me a while to ramp up on the issue and propose a solution. The question is: are we OK with the bug (i.e. the code sometimes works off stale info) in the meantime?

avoid regression

Harbormaster completed remote builds in B93740: Diff 330556. Mar 14 2021, 11:04 PM

mtrofin added inline comments. Mar 14 2021, 11:37 PM

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
66	@dmgreen, all: I dug a bit deeper into the regression. PTAL at the diff in RegAllocGreedy.cpp, the deleted lines at line 1559. The call to checkInterference on line 1555 resets the earlier-calculated interference collections, rendering the deleted code almost dead. Almost, because calling query in canEvictInterferenceInRange seems to have an effect; see the i128-mul.ll diff. I'll track that down.
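The staleness problem described above (a later query resetting results collected earlier) can be modeled with a toy cache. This is a hypothetical C++ sketch, not LLVM code: ToyMatrix, CachedQuery, and the placeholder interference computation are invented for illustration, under the assumption that query slots are shared per register unit and re-targeted to whichever virtual register was queried last. The method names merely echo the ones mentioned in the comment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not LLVM code: one cached query slot per register
// unit, re-targeted to whichever virtual register was queried last.
struct CachedQuery {
  unsigned VirtReg = 0;              // vreg this slot currently describes
  std::vector<unsigned> Interfering; // results collected for that vreg
  bool Collected = false;
};

class ToyMatrix {
public:
  // Hand out the shared slot for Unit. If it was last used for a different
  // vreg, re-target it and drop the previously collected results.
  CachedQuery &query(std::size_t Unit, unsigned VirtReg) {
    assert(Unit < Slots.size() && "unit out of range");
    CachedQuery &Q = Slots[Unit];
    if (Q.VirtReg != VirtReg) {
      Q.VirtReg = VirtReg;
      Q.Interfering.clear();
      Q.Collected = false;
    }
    return Q;
  }

  // Placeholder "collection": records a fake interfering vreg.
  static void collect(CachedQuery &Q) {
    Q.Interfering = {Q.VirtReg + 100};
    Q.Collected = true;
  }

  // A checkInterference-like helper that goes through query() itself, so
  // using it for another vreg silently resets the shared slot.
  bool checkInterference(std::size_t Unit, unsigned VirtReg) {
    CachedQuery &Q = query(Unit, VirtReg);
    collect(Q);
    return !Q.Interfering.empty();
  }

private:
  std::array<CachedQuery, 4> Slots{};
};
```

After collecting for vreg 1 through slot 0, a checkInterference call for vreg 2 on the same unit re-targets that slot, so a reference obtained earlier now describes vreg 2, and re-querying vreg 1 shows its results are no longer collected.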

lkail added a subscriber: lkail. Mar 14 2021, 11:58 PM

dmgreen added inline comments. Mar 15 2021, 5:54 AM

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
66	I've never looked very deeply into the register allocator, I'm afraid; it's still a bit of a mystery to me. I did run some benchmarks on the new code, though, and none of them changed very much, which is a good sign.

no regressions

mtrofin marked an inline comment as done. Mar 15 2021, 5:09 PM

mtrofin added inline comments.

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
66	Tracked down the source of the last difference. @dmgreen @qcolombet, PTAL. I will work on simplifying the Query API next, to help avoid the kinds of pitfalls identified in this patch.

reworded the comment in LiveRegMatrix.cpp for more clarity (hopefully)

Harbormaster completed remote builds in B93953: Diff 330852. Mar 15 2021, 7:14 PM

Harbormaster completed remote builds in B93951: Diff 330850.

If there's no pushback, given the previous LGTM (and the subsequent removal of regressions), I'll go ahead and submit this.

I don't feel like I know this code well enough to be confident LGTM'ing it, but the testing I ran still looks good. Almost no changes.

This revision was landed with ongoing or failed builds. Mar 16 2021, 12:19 PM

Closed by commit rGd40b4911bd9a: [regalloc] Ensure Query::collectInterferringVregs is called before interval… (authored by mtrofin).

This revision was automatically updated to reflect the committed changes.

mtrofin added a commit: rGd40b4911bd9a: [regalloc] Ensure Query::collectInterferringVregs is called before interval….

nikic added a reverting change: rG40bc309911f0: Revert "[regalloc] Ensure Query::collectInterferringVregs is called before…. Mar 16 2021, 12:42 PM

I've reverted this change because it causes significant compile-time regressions, e.g. >5% on sqlite (https://llvm-compile-time-tracker.com/compare.php?from=0aa637b2037d882ddf7861284169abf63f524677&to=d40b4911bd9aca0573752e065f29ddd9aff280e1&stat=instructions). I'm assuming a regression of that size wasn't intentional here.

This revision is now accepted and ready to land. Mar 16 2021, 12:45 PM

In D98232#2630006, @nikic wrote:

I've reverted this change because it causes significant compile-time regressions, e.g. >5% on sqlite (https://llvm-compile-time-tracker.com/compare.php?from=0aa637b2037d882ddf7861284169abf63f524677&to=d40b4911bd9aca0573752e065f29ddd9aff280e1&stat=instructions). I'm assuming a regression of that size wasn't intentional here.

It may be an effect of the code actually performing correctly (see the CL description). I'll confirm that is the case.

In D98232#2630027, @mtrofin wrote:

It may be an effect of the code actually performing correctly (see the CL description). I'll confirm that is the case.

Yup: the increased time in regalloc is due to the added call to collectInterferingVRegs. A quick way to verify this is to add just that call to the 'old' code, which produces a similar timing effect.

This is intentional: the fix addresses a correctness issue (without it, the query is actually stale).

@nikic, any pushback on reapplying the patch?

Unfortunately I'm not familiar with RegAlloc, but I somehow doubt that this kind of compile-time hit is intrinsically necessary to address this issue.

This is also missing a test case.

This revision now requires changes to proceed. Mar 16 2021, 2:16 PM

In D98232#2630273, @nikic wrote:

Unfortunately I'm not familiar with RegAlloc, but I somehow doubt that this kind of compile-time hit is intrinsically necessary to address this issue.

The problem is that the previous use of the APIs was incorrect - the code before was sometimes using stale data, because it was incorrectly not calling collectInterferingVRegs. The compile time regression is reflective of the effort required to refresh the data.

I think we have a situation like this:

1. (status quo) no compile-time regression, but incorrect functionality
2. (patch) compile-time regression, correct functionality
3. remove the patch that originally introduced this functionality (https://reviews.llvm.org/D35816), but then there are cases where we hit code-quality regressions
4. <options that I am not aware of; actionable suggestions very welcome!>

What are the guidelines regarding compile-time regressions (e.g. what is acceptable, and how should it be weighed against other concerns, such as buggy code in this case)?

This is also missing a test case.

That's reasonable. I'll craft one up while we explore the options here.

I agree with @mtrofin's assessment: the previous code was incorrect, and even if we take a compile-time hit, at least short term, I think this is the right thing to do.

In D98232#2630290, @mtrofin wrote:

The problem is that the previous use of the APIs was incorrect - the code before was sometimes using stale data, because it was incorrectly not calling collectInterferingVRegs. The compile time regression is reflective of the effort required to refresh the data.

I think we have a situation like this:

1. (status quo) no compile-time regression, but incorrect functionality
2. (patch) compile-time regression, correct functionality
3. remove the patch that originally introduced this functionality (https://reviews.llvm.org/D35816), but then there are cases where we hit code-quality regressions
4. <options that I am not aware of; actionable suggestions very welcome!>

The best option would be to implement this with no or low compile-time impact; it's probably possible, but may require a larger time investment. If you don't have the resources for that, then please evaluate the third option. Assuming that disabling this code only causes a minor code-quality regression, that would be preferable to a large compile-time regression. Of course, if disabling this causes a large code-quality regression, this becomes harder to answer.

In D98232#2630344, @nikic wrote:

The best option would be to implement this with no or low compile-time impact; it's probably possible, but may require a larger time investment. If you don't have the resources for that, then please evaluate the third option. Assuming that disabling this code only causes a minor code-quality regression, that would be preferable to a large compile-time regression. Of course, if disabling this causes a large code-quality regression, this becomes harder to answer.

Looking at the original patch (https://reviews.llvm.org/D35816), it mentions https://bugs.llvm.org/show_bug.cgi?id=26810, which quotes a 25% regression.

@mtrofin I'm not really concerned about regressions on individual test cases, but about the average impact. I found https://github.com/llvm/llvm-project/commit/f649f24d388c745d20fab5573d27b822b92818ed with some data from when the option was enabled for AArch64, and it looks like this option has essentially zero average impact and only improves a few rare outliers. This was totally fine at the time, because enabling the option was considered essentially free compile-time-wise. But now it turns out that this is actually not the case, and the small compile-time impact was actually the result of an implementation bug. The change would never have landed at the time if this had been known. A zero geomean improvement in run time is a terrible trade-off for a 2-3% geomean regression in compile time.

Unless the situation on X86 is significantly different than for AArch64, it seems pretty clear to me that we should just disable the option until it can be implemented in a way that is both correct and sufficiently cheap.
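To make the geomean argument concrete, here is a minimal C++ helper computing the geometric mean of per-benchmark run-time ratios (new/old). Any inputs are illustrative assumptions, not data from this review. For example, if one benchmark drops to 0.80 of its baseline time and 99 others are unchanged, the geomean is about 0.9978, i.e. roughly a 0.2% improvement, which is why a few large outliers barely move the average.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark ratios (e.g. new run time / old run time).
// A ratio below 1.0 means faster; the geomean averages ratios multiplicatively,
// so one large outlier among many unchanged benchmarks moves it only slightly.
double geomean(const std::vector<double> &Ratios) {
  if (Ratios.empty())
    return 1.0; // neutral value for "no data"
  double LogSum = 0.0;
  for (double R : Ratios) {
    assert(R > 0.0 && "ratios must be positive");
    LogSum += std::log(R);
  }
  return std::exp(LogSum / static_cast<double>(Ratios.size()));
}
```

Averaging in log space is the usual way to summarize ratios: a 2x slowdown and a 2x speedup cancel out exactly, which an arithmetic mean of ratios would not do.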

An update on this: I spent a bit of time looking for ways to avoid the compile-time penalty, and the short of it is that I couldn't find one.

Unless someone has an idea of how to do that, I propose we go ahead with a "staged option 3":

step 1: fix the code as per this patch, but add a flag that disables the optimization by default. This means no compile-time regression, and possible corner-case regressions in code quality, with (what should be) an easy way to avoid them for those affected.

step 2: after some period (not sure what's reasonable here: 1-2 weeks? 1-2 months? 1 release?), assuming the regressions aren't hurting anyone, we remove the optimization code and the flag.

What do folks think?

Thanks!

In D98232#2630459, @nikic wrote:

@mtrofin I'm not really concerned about regressions on individual test cases, but about the average impact. I found https://github.com/llvm/llvm-project/commit/f649f24d388c745d20fab5573d27b822b92818ed with some data from when the option was enabled for AArch64, and it looks like this option has essentially zero average impact and only improves a few rare outliers. This was totally fine at the time, because enabling the option was considered essentially free compile-time-wise. But now it turns out that this is actually not the case, and the small compile-time impact was actually the result of an implementation bug. The change would never have landed at the time if this had been known. A zero geomean improvement in run time is a terrible trade-off for a 2-3% geomean regression in compile time.

Unless the situation on X86 is significantly different than for AArch64, it seems pretty clear to me that we should just disable the option until it can be implemented in a way that is both correct and sufficiently cheap.

I'll let the others weigh in on whether the corner cases are important to them. My goal here is to improve the evolvability and maintainability of the code: the first patch would canonicalize the uses of Query, and the next one would remove the need for the multi-step usage that caused the bug discussed here. That's a long way of saying "option 3 is actually *very* attractive to me", but I have no skin in the game.

Another option may be to hide the current behavior (as fixed here) behind a flag. Then, if anyone actually hits an impactful regression, we can dig deeper, while also giving them a way to unblock (by flipping the flag in their build). Otherwise, after some reasonable time (TBD what that means), we can take silence as an indication that the regressions don't matter, and do option 3.

I'll also spend a bit more time today to look into whether there's anything that can be salvaged easily (i.e. avoid option 3, and avoid compile time regression).

mtrofin added a reviewer: sanwou01. Mar 22 2021, 8:55 PM

Disabling 'advancedRASplitCost' on x86.

Harbormaster completed remote builds in B95157: Diff 332517. Mar 22 2021, 10:34 PM

Gentle reminder: I disabled the feature by default on x86 (assuming we're OK with the staged approach).

@sanwou01, given that the compile-time overhead seen on x86 may also affect AArch64, should we disable it by default on AArch64 too?

(sorry if the context requires some digging into this patch's log, happy to provide a summary if that's too daunting)

The proposed approach is fine from my side at least. Flipping the switch for AArch64 as well would be good.

llvm/lib/CodeGen/RegAllocGreedy.cpp
1063	nit: A 5 snuck in.
llvm/test/CodeGen/X86/i128-mul.ll
2	Don't think regalloc details are important for this test and the two below. Might want to regenerate the output rather than adding the flag for these. (The two tests above specifically test for regalloc behavior.)

This revision is now accepted and ready to land. Mar 24 2021, 1:21 PM

I'm worried that this comes up in a lot of places: perhaps still rare, but important cases. The AArch64 example we have is just a matrix multiply, and it is 25% slower with all the cascading spills; https://bugs.llvm.org/show_bug.cgi?id=26810 quotes the same. Like I said before, though, the option didn't fix some examples of the same thing that we were seeing on Arm, so I'm not sure how reliably better it is.

@aditya_nandakumar @qcolombet any idea when a better fix for that issue might be available?

Could we just make consider-local-interval-cost -O3 only in the meantime? That should alleviate some of the compile time worries, as we have genuine examples of where it is hurting performance.

llvm/test/CodeGen/X86/bug26810.ll
1	If we are enabling this flag by default, we should probably update the tests, not hide them behind a flag.

In D98232#2650231, @dmgreen wrote:

I'm worried that this comes up in a lot of places: perhaps still rare, but important cases. The AArch64 example we have is just a matrix multiply, and it is 25% slower with all the cascading spills; https://bugs.llvm.org/show_bug.cgi?id=26810 quotes the same. Like I said before, though, the option didn't fix some examples of the same thing that we were seeing on Arm, so I'm not sure how reliably better it is.

@aditya_nandakumar @qcolombet any idea when a better fix for that issue might be available?

Could we just make consider-local-interval-cost -O3 only in the meantime? That should alleviate some of the compile time worries, as we have genuine examples of where it is hurting performance.

This would be really bad for us, because Rust effectively always uses O3, and we expect reasonable compile-time trade-offs to be made for it as well.

Worth noting that D35816 discussed a number of alternatives to this, one being to handle this in machine copy propagation instead, which should be both much less complex and free of compile-time concerns. The cited disadvantage is that it would only work inside a block. From the examples I've seen, long eviction chains inside a single BB seem to be the main problem, so maybe it would be worthwhile to go back to that option. I don't really have a good view on this topic, though.

It sounds like -O2 might be a better default for you - as a balance between optimizations and compile time. We have users that care deeply about performance and would be happy to spend extra compile time for it.

I'm not particularly interested in the exact test case here, and the compile-time trade-off does seem high. I would love to see an alternative. But it is just a pretty boring matrix multiply kernel that's going very wrong. If that happens in the inner loop of an ML kernel, it can have a large effect on a lot of people.

Extra compile time is fine for -O3 if it improves the runtime performance of various benchmarks.

3-5% extra compile time for almost no visible perf gain is a bad trade-off, and not only Rust folks would be disappointed.

xbolva00 requested changes to this revision. Mar 28 2021, 12:39 PM

This revision now requires changes to proceed. Mar 28 2021, 12:39 PM

In D98232#2654676, @xbolva00 wrote:

Extra compile time is fine for -O3 if it improves the runtime performance of various benchmarks.

3-5% extra compile time for almost no visible perf gain is a bad trade-off, and not only Rust folks would be disappointed.

@xbolva00, sorry if I'm missing it, but what changes would you like made?

All: note that this patch just changes defaults (well, and fixes an underlying problem).

It sounds like we don't have data on how often the potential code-quality regression occurs. Would looking at a suite of benchmarks, including compression and Eigen, on x86 (in a ThinLTO + FDO build, in an isolated environment) help advance the discussion, or is there a better alternative?

Thanks!

While I'm personally not very neurotic about compile time,
and agree with @dmgreen that -O3 is by design allowed to consume more time,
here I have to agree with @nikic.

A 5% compile-time regression is a rather significant time investment.
What does it bring? If it's less than 1% in some obscure proprietary benchmark,
then it's one thing; if it's a few percent here and there in SPEC, it's another.

In fact, I would strongly insist that we follow LLVM best practices:
revert the original patch that this patch is trying to fix,
fold this fix into the original change, and post that as a new review.

Yeah, I agree. 5% is too much to pay. https://reviews.llvm.org/D69437 measured this option as a 25% speed increase in something that was important enough to fix, with a 0.1-0.2% compile-time effect. That's a very different question.

(What I would really like to see is someone in the LLVM community who cares about O2: trying to get the best performance for a low compile time, and optimizing that trade-off. That could benefit a lot of people. I have less interest in this particular issue; it's just a shame to see the compiler get things so embarrassingly wrong!)

But this patch has been in tree for a long time. I believe best practice is to fix things as-is, not revert and attempt to re-apply after so long.

In D98232#2657991, @dmgreen wrote:

Yeah, I agree. 5% is too much to pay. https://reviews.llvm.org/D69437 measured this option as a 25% speed increase in something that was important enough to fix, with a 0.1-0.2% compile-time effect. That's a very different question.

(What I would really like to see is someone in the LLVM community who cares about O2: trying to get the best performance for a low compile time, and optimizing that trade-off. That could benefit a lot of people. I have less interest in this particular issue; it's just a shame to see the compiler get things so embarrassingly wrong!)

But this patch has been in tree for a long time. I believe best practice is to fix things as-is, not revert and attempt to re-apply after so long.

Quick update on the code-quality side: I ran SPEC2006, the LLVM benchmarks, the Eigen benchmarks, and a few others on x86 (FDO, ThinLTO) with this patch, with and without consider-local-interval-cost enabled. No significant effect.

On how to proceed, adding to the "revert the original or not" question: the additional complication is that the original patch has some changes to weight calculation that aren't controlled by either consider-local-interval-cost or enableAdvancedRASplitCost().

Any pushback, then, on proceeding as planned (fixing the code, but disabling it by default everywhere)? If anything "breaks", there's a quick unblocker for those affected, and meanwhile there's time to investigate the alternatives others mentioned here.

In D98232#2660309, @mtrofin wrote:

Quick update on the code-quality side: I ran SPEC2006, the LLVM benchmarks, the Eigen benchmarks, and a few others on x86 (FDO, ThinLTO) with this patch, with and without consider-local-interval-cost enabled. No significant effect.

On how to proceed, adding to the "revert the original or not" question: the additional complication is that the original patch has some changes to weight calculation that aren't controlled by either consider-local-interval-cost or enableAdvancedRASplitCost().

Any pushback, then, on proceeding as planned (fixing the code, but disabling it by default everywhere)? If anything "breaks", there's a quick unblocker for those affected, and meanwhile there's time to investigate the alternatives others mentioned here.

If it's not going to be enabled *anywhere*, not even on AArch64, it sounds like dead code to me, which shouldn't be there.

In D98232#2660803, @lebedev.ri wrote:

If it's not going to be enabled *anywhere*, not even on AArch64, it sounds like dead code to me, which shouldn't be there.

It's not enabled by default, but folks who hit regressions would still have a way to unblock themselves by using the flag. We can then remove the flag-controlled portion once we have the alternative others mentioned, or once we feel it's safe to remove because no one has reported regressions.

In D98232#2661406, @mtrofin wrote:

Not enabled by default. By using the flag, folks would still have a chance to unblock themselves if they hit regressions. We can then remove the flag-controlled portion once we have the alternative others mentioned, or once we feel that, since no one has reported regressions, it's safe to remove.

I understand that. The question I'm asking is whether it's useful to have that flag in the first place.
Do we know that people will actually use it?

In D98232#2661432, @lebedev.ri wrote:

I understand that. The question I'm asking is whether it's useful to have that flag in the first place.
Do we know that people will actually use it?

Ah, I see - we don't know, since right now people don't know if they'd have a regression if we flipped the default behavior. If they do, they'd easily unblock themselves, and hopefully report back. The alternative - removing the code outright - is harder to revert.

Quick update on the code quality side, I ran spec2006, the llvm benchmarks, the eigen benchmarks, and a few others, on x86 (FDO, thinlto) with this patch, and with/without enabling consider-local-interval-cost. No significant real effect.

I would expect any large out-of-order CPU to chew through register movs as if they weren't even there; they just end up as register renames. It's the in-order "little" cores where this hurts the most.

Can you remove -consider-local-interval-cost from the tests?

In D98232#2661456, @mtrofin wrote:

Ah, I see - we don't know, since right now people don't know if they'd have a regression if we flipped the default behavior. If they do, they'd easily unblock themselves, and hopefully report back. The alternative - removing the code outright - is harder to revert.

Ah, I see.

In D98232#2661457, @dmgreen wrote:

Quick update on the code quality side, I ran spec2006, the llvm benchmarks, the eigen benchmarks, and a few others, on x86 (FDO, thinlto) with this patch, and with/without enabling consider-local-interval-cost. No significant real effect.

I would expect any large out of order cpu to chew through register mov's like they aren't even there. They just end up as register renames. It will be in-order "little" cores where this hurts the most.

Right, unfortunately I don't have access to those.

Can you remove -consider-local-interval-cost from the tests?

It's disabled by default (I'm assuming it's about your comment on bug26810.ll - I'll reply there)

feedback

llvm/test/CodeGen/X86/bug26810.ll
Line 1: We're not enabling it by default, though; that would regress compile time.
llvm/test/CodeGen/X86/i128-mul.ll
Line 2: SGTM, I'll do that.

Harbormaster completed remote builds in B96541: Diff 334452. · Mar 31 2021, 9:38 AM

@aditya_nandakumar @qcolombet any idea when a better fix for that issue might be available?

Given Aditya did the prototype, I'll let him comment on that, but I don't expect us to land something anytime soon (i.e., not for at least a couple of months).

In D98232#2661823, @qcolombet wrote:

@aditya_nandakumar @qcolombet any idea when a better fix for that issue might be available?

Given Aditya did the prototype, I'll let him comment on that, but I don't expect us to land something anytime soon (i.e., not for at least a couple of months).

That sounds like a reasonable estimate of when I can get it upstreamed.

Thanks for the test update. Reluctantly, this LGTM.

llvm/test/CodeGen/X86/bug26810.ll
Line 1: Ah, yes. I meant "If we are disabling this flag..."

In D98232#2661843, @aditya_nandakumar wrote:

That sounds like a reasonable estimate of when I can get it upstreamed.

llvm/test/CodeGen/X86/bug26810.ll
Line 1: Ah, I wanted to keep some regression testing for the behavior behind the flag.

@xbolva00 - is there anything you'd like to see done to this patch?

No, nothing from me.

This revision is now accepted and ready to land. · Apr 1 2021, 7:58 AM

Closed by commit rGce61def529e2: [regalloc] Ensure Query::collectInterferringVregs is called before interval… (authored by mtrofin). · Apr 1 2021, 8:33 AM

This revision was automatically updated to reflect the committed changes.

mtrofin added a commit: rGce61def529e2: [regalloc] Ensure Query::collectInterferringVregs is called before interval….

Resurrecting the topic here: @nikic @lebedev.ri @dmgreen @qcolombet @aditya_nandakumar

Should we go ahead and revert (most of*) https://reviews.llvm.org/D35816? IIRC, @aditya_nandakumar's and @qcolombet's work would make the eviction chains introduced in https://reviews.llvm.org/D35816 unnecessary. We disabled it, too, with the change here, and haven't heard any complaints.

*The reason I'm saying "most of" is that https://reviews.llvm.org/D35816 also had some changes to weight calculation, which I'd prefer to keep. The rest (the regalloc code and tests) would go.

mtrofin mentioned this in D112882: [NFC][Regalloc] Ensure Query::interferingVRegs is accurate. · Oct 30 2021, 8:52 PM

mtrofin mentioned this in rG34f4fe3a9009: [NFC][Regalloc] Ensure Query::interferingVRegs is accurate. · Nov 2 2021, 6:27 PM

should we go ahead and revert (most of*) https://reviews.llvm.org/D35816?

Although I think that's the right approach, neither @aditya_nandakumar nor I have had time to clean up and push the fix for eviction chains yet.

@aditya_nandakumar do you have an ETA for when you believe you can land that?
Should I just take over at this point?

Cheers,
-Quentin

In D98232#3132500, @qcolombet wrote:

should we go ahead and revert (most of*) https://reviews.llvm.org/D35816?

Although I think that's the right approach, neither @aditya_nandakumar nor I have had time to clean up and push the fix for eviction chains yet.

@aditya_nandakumar do you have an ETA for when you believe you can land that?
Should I just take over at this point?

Cheers,
-Quentin

Ack - but I think the 2 are somewhat orthogonal; at this point, the old functionality is basically dead code (only exercised by a few tests), unless someone explicitly opts into it. So what I was thinking was to send an email to llvm-dev and see if anyone explicitly passes -mllvm -consider-local-interval-cost. If not (the argument goes), let's yank out the old code. Then, later, Aditya's (or your) patch would come in on a clean slate.

Or does the envisioned patch require most of the stuff that's currently 'dead'?

In D98232#3132500, @qcolombet wrote:

should we go ahead and revert (most of*) https://reviews.llvm.org/D35816?

Although I think that's the right approach, neither @aditya_nandakumar nor I have had time to clean up and push the fix for eviction chains yet.

@aditya_nandakumar do you have an ETA for when you believe you can land that?

Hi Quentin, unfortunately I haven't had the time to clean this up and upstream it.

Should I just take over at this point?

If you have the cycles, please go for it. I don't want to hold up progress here. Thanks.

Cheers,
-Quentin

Hi Mircea,

If not (the argument goes), let's yank out the old code.

I thought x86 was using that code, but I see that you actually disabled that back in March:

commit ce61def529e2d9ef46b79c9d1f489d69b45b95bf
Author: Mircea Trofin <mtrofin@google.com>
Date:   Mon Mar 8 20:55:53 2021 -0800

    [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration

If not (the argument goes), let's yank out the old code.

Sounds like a plan.

Or does the envisioned patch require most of the stuff that's currently 'dead'?

No, the "envisioned patch" actually runs a post regalloc pass that does the clean up.

Cheers,
-Quentin

Created a topic asking whether anyone is using the flag:
https://discourse.llvm.org/t/are-you-using-mllvm-consider-local-interval-cost/60744

Patch yanking the code: D121128

In D98232#3135324, @qcolombet wrote:
Hi Mircea,

If not (the argument goes), let's yank out the old code.

I thought x86 was using that code, but I see that you actually disabled that back in March:
commit ce61def529e2d9ef46b79c9d1f489d69b45b95bf
Author: Mircea Trofin <mtrofin@google.com>
Date:   Mon Mar 8 20:55:53 2021 -0800

    [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration
If not (the argument goes), let's yank out the old code.

Sounds like a plan.

Or does the envisioned patch require most of the stuff that's currently 'dead'?

No, the "envisioned patch" actually runs a post regalloc pass that does the clean up.

Cheers,
-Quentin

Perfect!

The part of the original patch that gives me a bit of pause is the changes to the weights calculation: that part of the code isn't hidden behind a flag, and I want to double-check whether it's effectively dead (my suspicion is that it's not; if folks have insight there, please let me know).

So the plan would be to yank the code behind a flag, and *maybe* the weights part, depending on the above.

mtrofin mentioned this in D121128: [regalloc] Remove -consider-local-interval-cost. · Mar 7 2022, 8:30 AM

Herald added a project: Restricted Project. · Mar 7 2022, 8:35 AM

mtrofin mentioned this in rG294eca35a00f: [regalloc] Remove -consider-local-interval-cost. · Mar 14 2022, 10:49 AM

Revision Contents

Path | Size
llvm/include/llvm/CodeGen/LiveIntervalUnion.h | 20 lines
llvm/lib/CodeGen/LiveIntervalUnion.cpp | 19 lines
llvm/lib/CodeGen/LiveRegMatrix.cpp | 16 lines
llvm/lib/CodeGen/RegAllocGreedy.cpp | 40 lines
llvm/lib/Target/AArch64/AArch64Subtarget.h | 2 lines
llvm/lib/Target/X86/X86Subtarget.h | 2 lines
llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll | 2 lines
llvm/test/CodeGen/X86/bug26810.ll | 2 lines
llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll | 2 lines
llvm/test/CodeGen/X86/i128-mul.ll | 4 lines
llvm/test/CodeGen/X86/mmx-arith.ll | 15 lines
llvm/test/CodeGen/X86/optimize-max-0.ll | 98 lines

Diff 334708

llvm/include/llvm/CodeGen/LiveIntervalUnion.h

@@ ... @@ #endif

   /// Query interferences between a single live virtual register and a live
   /// interval union.
   class Query {
     const LiveIntervalUnion *LiveUnion = nullptr;
     const LiveRange *LR = nullptr;
     LiveRange::const_iterator LRI; ///< current position in LR
     ConstSegmentIter LiveUnionI; ///< current position in LiveUnion
-    SmallVector<LiveInterval*,4> InterferingVRegs;
+    Optional<SmallVector<LiveInterval *, 4>> InterferingVRegs;

qcolombet (inline comment): That part of the patch is not strictly needed. I am guessing we want this because that way accessing `InterferingVRegs` without calling `collectInterferingVRegs` first will produce a runtime crash instead of silently checking against something empty. Am I understanding correctly?

mtrofin (author, inline reply): That's correct. This patch could land without the Optional; either way, I intend to follow up with a change that reduces this level of API statefulness, so others don't fall into the same pitfall.

     bool CheckedFirstInterference = false;
     bool SeenAllInterferences = false;
     unsigned Tag = 0;
     unsigned UserTag = 0;

+  public:
+    Query() = default;
+    Query(const LiveRange &LR, const LiveIntervalUnion &LIU)
+        : LiveUnion(&LIU), LR(&LR) {}
+    Query(const Query &) = delete;
+    Query &operator=(const Query &) = delete;
+
     void reset(unsigned NewUserTag, const LiveRange &NewLR,
                const LiveIntervalUnion &NewLiveUnion) {
       LiveUnion = &NewLiveUnion;
       LR = &NewLR;
-      InterferingVRegs.clear();
+      InterferingVRegs = None;
       CheckedFirstInterference = false;
       SeenAllInterferences = false;
       Tag = NewLiveUnion.getTag();
       UserTag = NewUserTag;
     }

-  public:
-    Query() = default;
-    Query(const LiveRange &LR, const LiveIntervalUnion &LIU):
-      LiveUnion(&LIU), LR(&LR) {}
-    Query(const Query &) = delete;
-    Query &operator=(const Query &) = delete;
-
     void init(unsigned NewUserTag, const LiveRange &NewLR,
               const LiveIntervalUnion &NewLiveUnion) {
       if (UserTag == NewUserTag && LR == &NewLR && LiveUnion == &NewLiveUnion &&
           !NewLiveUnion.changedSince(Tag)) {
         // Retain cached results, e.g. firstInterference.
         return;
       }
       reset(NewUserTag, NewLR, NewLiveUnion);
@@ ... @@ public:
     // Was this virtual register visited during collectInterferingVRegs?
     bool isSeenInterference(LiveInterval *VirtReg) const;

     // Did collectInterferingVRegs collect all interferences?
     bool seenAllInterferences() const { return SeenAllInterferences; }

     // Vector generated by collectInterferingVRegs.
     const SmallVectorImpl<LiveInterval*> &interferingVRegs() const {
-      return InterferingVRegs;
+      return *InterferingVRegs;
     }
   };

   // Array of LiveIntervalUnions.
   class Array {
     unsigned Size = 0;
     LiveIntervalUnion *LIUs = nullptr;

@@ ... @@
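The inline thread above asks why `InterferingVRegs` becomes an `Optional`. A minimal standalone sketch of that idea follows, using `std::optional` and a hypothetical `ToyQuery` class rather than LLVM's actual `Query`: reading the results before the collect step fails loudly instead of looking like "no interference". In this sketch `.value()` throws `std::bad_optional_access`; LLVM's `Optional` instead asserts on an empty dereference in assertion-enabled builds.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for LiveIntervalUnion::Query. "Interfering registers"
// are modeled as plain ints; nothing here is LLVM API.
class ToyQuery {
  std::vector<int> Candidates;                 // what a real query would scan
  std::optional<std::vector<int>> Interfering; // empty until collect() runs

public:
  explicit ToyQuery(std::vector<int> C) : Candidates(std::move(C)) {}

  // Like reset() after the patch: invalidate the cached results instead of
  // leaving behind an empty-but-valid vector.
  void reset(std::vector<int> C) {
    Candidates = std::move(C);
    Interfering.reset();
  }

  // Like collectInterferingVRegs(): materialize the results on first use.
  unsigned collect() {
    if (!Interfering)
      Interfering.emplace(Candidates);
    return static_cast<unsigned>(Interfering->size());
  }

  // Like interferingVRegs(): .value() throws std::bad_optional_access if
  // collect() has not run since construction or the last reset().
  const std::vector<int> &interfering() const { return Interfering.value(); }
};
```

The point of the design is that an empty vector is a valid answer ("nothing interferes"), so only a distinct "not computed" state can tell a skipped collect step apart from a genuinely interference-free query.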

llvm/lib/CodeGen/LiveIntervalUnion.cpp

@@ ... @@ for (LiveSegments::const_iterator SI = Segments.begin(); SI.valid(); ++SI) {
     return SI.value();
   }
   return nullptr;
 }

 // Scan the vector of interfering virtual registers in this union. Assume it's
 // quite small.
 bool LiveIntervalUnion::Query::isSeenInterference(LiveInterval *VirtReg) const {
-  return is_contained(InterferingVRegs, VirtReg);
+  return is_contained(*InterferingVRegs, VirtReg);
 }

 // Collect virtual registers in this union that interfere with this
 // query's live virtual register.
 //
 // The query state is one of:
 //
 // 1. CheckedFirstInterference == false: Iterators are uninitialized.
 // 2. SeenAllInterferences == true: InterferingVRegs complete, iterators unused.
 // 3. Iterators left at the last seen intersection.
 //
 unsigned LiveIntervalUnion::Query::
 collectInterferingVRegs(unsigned MaxInterferingRegs) {
+  if (!InterferingVRegs)
+    InterferingVRegs.emplace();
+
   // Fast path return if we already have the desired information.
-  if (SeenAllInterferences || InterferingVRegs.size() >= MaxInterferingRegs)
-    return InterferingVRegs.size();
+  if (SeenAllInterferences || InterferingVRegs->size() >= MaxInterferingRegs)
+    return InterferingVRegs->size();

   // Set up iterators on the first call.
   if (!CheckedFirstInterference) {
     CheckedFirstInterference = true;

     // Quickly skip interference check for empty sets.
     if (LR->empty() || LiveUnion->empty()) {
       SeenAllInterferences = true;
@@ ... @@ while (LiveUnionI.valid()) {
     assert(LRI != LREnd && "Reached end of LR");

     // Check for overlapping interference.
     while (LRI->start < LiveUnionI.stop() && LRI->end > LiveUnionI.start()) {
       // This is an overlap, record the interfering register.
       LiveInterval *VReg = LiveUnionI.value();
       if (VReg != RecentReg && !isSeenInterference(VReg)) {
         RecentReg = VReg;
-        InterferingVRegs.push_back(VReg);
-        if (InterferingVRegs.size() >= MaxInterferingRegs)
-          return InterferingVRegs.size();
+        InterferingVRegs->push_back(VReg);
+        if (InterferingVRegs->size() >= MaxInterferingRegs)
+          return InterferingVRegs->size();
       }
       // This LiveUnion segment is no longer interesting.
       if (!(++LiveUnionI).valid()) {
         SeenAllInterferences = true;
-        return InterferingVRegs.size();
+        return InterferingVRegs->size();
       }
     }

     // The iterators are now not overlapping, LiveUnionI has been advanced
     // beyond LRI.
     assert(LRI->end <= LiveUnionI.start() && "Expected non-overlap");

     // Advance the iterator that ends first.
     LRI = LR->advanceTo(LRI, LiveUnionI.start());
     if (LRI == LREnd)
       break;

     // Detect overlap, handle above.
     if (LRI->start < LiveUnionI.stop())
       continue;

     // Still not overlapping. Catch up LiveUnionI.
     LiveUnionI.advanceTo(LRI->start);
   }
   SeenAllInterferences = true;
-  return InterferingVRegs.size();
+  return InterferingVRegs->size();
 }

 void LiveIntervalUnion::Array::init(LiveIntervalUnion::Allocator &Alloc,
                                     unsigned NSize) {
   // Reuse existing allocation.
   if (NSize == Size)
     return;
   clear();
@@ ... @@
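The sweep in `collectInterferingVRegs` can be restated with plain vectors for readers less familiar with this loop. This is a simplified sketch, not LLVM code: `Seg` and `collectOverlaps` are hypothetical names, it drops the cached query state (`CheckedFirstInterference`, `SeenAllInterferences`) and the `RecentReg` shortcut, and it steps one segment at a time where the real code jumps ahead with `advanceTo`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical half-open segment [Start, End). Owner identifies the virtual
// register a union segment belongs to (unused for the query's own range).
struct Seg {
  unsigned Start, End;
  int Owner;
};

// Both lists must be sorted by Start and internally non-overlapping.
// Returns the distinct owners of Union segments that overlap LR, stopping
// once Max owners have been found.
std::vector<int> collectOverlaps(const std::vector<Seg> &LR,
                                 const std::vector<Seg> &Union,
                                 std::size_t Max) {
  std::vector<int> Found;
  auto AlreadySeen = [&Found](int Owner) {
    for (int F : Found) // linear scan: the list is expected to stay small
      if (F == Owner)
        return true;
    return false;
  };
  std::size_t I = 0, J = 0;
  while (I < LR.size() && J < Union.size() && Found.size() < Max) {
    const Seg &A = LR[I];
    const Seg &B = Union[J];
    if (A.Start < B.End && A.End > B.Start) {
      // Overlap: record the owner once, then move past this union segment.
      if (!AlreadySeen(B.Owner))
        Found.push_back(B.Owner);
      ++J;
    } else if (A.End <= B.Start) {
      ++I; // the LR segment lies entirely before B: advance LR
    } else {
      ++J; // B lies entirely before A: catch up the union side
    }
  }
  return Found;
}
```

Once a union segment's owner is recorded, the sketch skips that segment even if it also overlaps the next LR segment, much as the real loop treats it as "no longer interesting"; when the goal is the set of distinct owners, nothing is lost.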

llvm/lib/CodeGen/LiveRegMatrix.cpp

@@ ... @@ bool LiveRegMatrix::checkInterference(SlotIndex Start, SlotIndex End,
   // Construct artificial live range containing only one segment [Start, End).
   VNInfo valno(0, Start);
   LiveRange::Segment Seg(Start, End, &valno);
   LiveRange LR;
   LR.addSegment(Seg);

   // Check for interference with that segment
   for (MCRegUnitIterator Units(PhysReg, TRI); Units.isValid(); ++Units) {
-    if (query(LR, *Units).checkInterference())
+    // LR is stack-allocated. LiveRegMatrix caches queries by a key that
+    // includes the address of the live range. If (for the same reg unit) this
+    // checkInterference overload is called twice, without any other query()
+    // calls in between (on heap-allocated LiveRanges) - which would invalidate
+    // the cached query - the LR address seen the second time may well be the
+    // same as that seen the first time, while the Start/End/valno may not - yet
+    // the same cached result would be fetched. To avoid that, we don't cache
+    // this query.
+    //
+    // FIXME: the usability of the Query API needs to be improved to avoid
+    // subtle bugs due to query identity. Avoiding caching, for example, would
+    // greatly simplify things.
+    LiveIntervalUnion::Query Q;
+    Q.reset(UserTag, LR, Matrix[*Units]);
+    if (Q.checkInterference())
       return true;
   }
   return false;
 }

 Register LiveRegMatrix::getOneVReg(unsigned PhysReg) const {
   LiveInterval *VRegInterval = nullptr;
   for (MCRegUnitIterator Unit(PhysReg, TRI); Unit.isValid(); ++Unit) {
     if ((VRegInterval = Matrix[*Unit].getOneVReg()))
       return VRegInterval->reg();
   }

   return MCRegister::NoRegister;
 }
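The comment added to `checkInterference` describes a cache hit that is decided by the live range's address rather than its contents. A toy model of that failure mode, assuming a cache keyed only by (range address, register unit): `ToyMatrix` and `Range` are hypothetical, and the real `Query::init` additionally compares the user tag, the union pointer, and the union's change tag. Reassigning the same variable stands in for a later call whose stack frame happens to place the new `LiveRange` at the old address.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical query range; only its address goes into the cache key.
struct Range {
  unsigned Start, End;
};

// Toy query cache keyed by (range address, register unit).
struct ToyMatrix {
  unsigned BusySlot = 5; // the single slot at which unit 0 is occupied
  std::map<std::pair<const Range *, unsigned>, bool> Cache;

  bool compute(const Range &R) const {
    return R.Start <= BusySlot && BusySlot < R.End;
  }

  // Cached: a hit is decided by the address alone, not by R's contents.
  bool cachedQuery(const Range &R, unsigned Unit) {
    auto Key = std::make_pair(&R, Unit);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second;
    bool Result = compute(R);
    Cache[Key] = Result;
    return Result;
  }

  // Uncached: analogous to building a fresh Query for the stack range.
  bool freshQuery(const Range &R) const { return compute(R); }
};
```

In the usage check, the second cached lookup still reports interference after the range moved to [20, 30), while the uncached lookup correctly reports none.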

llvm/lib/CodeGen/RegAllocGreedy.cpp

@@ ... @@ BlockFrequency calcGlobalSplitCost(GlobalSplitCandidate &,
                                      bool *CanCauseEvictionChain);
   bool calcCompactRegion(GlobalSplitCandidate&);
   void splitAroundRegion(LiveRangeEdit&, ArrayRef<unsigned>);
   void calcGapWeights(MCRegister, SmallVectorImpl<float> &);
   Register canReassign(LiveInterval &VirtReg, Register PrevReg) const;
   bool shouldEvict(LiveInterval &A, bool, LiveInterval &B, bool) const;
   bool canEvictInterference(LiveInterval &, MCRegister, bool, EvictionCost &,
                             const SmallVirtRegSet &) const;
-  bool canEvictInterferenceInRange(LiveInterval &VirtReg, MCRegister PhysReg,
-                                   SlotIndex Start, SlotIndex End,
-                                   EvictionCost &MaxCost) const;
+  bool canEvictInterferenceInRange(const LiveInterval &VirtReg,
+                                   MCRegister PhysReg, SlotIndex Start,
+                                   SlotIndex End, EvictionCost &MaxCost) const;
   MCRegister getCheapestEvicteeWeight(const AllocationOrder &Order,
-                                      LiveInterval &VirtReg, SlotIndex Start,
-                                      SlotIndex End, float *BestEvictWeight);
+                                      const LiveInterval &VirtReg,
+                                      SlotIndex Start, SlotIndex End,
+                                      float *BestEvictWeight) const;
   void evictInterference(LiveInterval &, MCRegister,
                          SmallVectorImpl<Register> &);
   bool mayRecolorAllInterferences(MCRegister PhysReg, LiveInterval &VirtReg,
                                   SmallLISet &RecoloringCandidates,
                                   const SmallVirtRegSet &FixedRegisters);

   MCRegister tryAssign(LiveInterval&, AllocationOrder&,
                        SmallVectorImpl<Register>&,
@@ ... @@
 ///
 /// \param VirtReg Live range that is about to be assigned.
 /// \param PhysReg Desired register for assignment.
 /// \param Start Start of range to look for interferences.
 /// \param End End of range to look for interferences.
 /// \param MaxCost Only look for cheaper candidates and update with new cost
 /// when returning true.
 /// \return True when interference can be evicted cheaper than MaxCost.
-bool RAGreedy::canEvictInterferenceInRange(LiveInterval &VirtReg,
+bool RAGreedy::canEvictInterferenceInRange(const LiveInterval &VirtReg,
                                             MCRegister PhysReg, SlotIndex Start,
                                             SlotIndex End,
                                             EvictionCost &MaxCost) const {
   EvictionCost Cost;

   for (MCRegUnitIterator Units(PhysReg, TRI); Units.isValid(); ++Units) {
     LiveIntervalUnion::Query &Q = Matrix->query(VirtReg, *Units);
+    Q.collectInterferingVRegs();

     // Check if any interfering live range is heavier than MaxWeight.
     for (const LiveInterval *Intf : reverse(Q.interferingVRegs())) {
       // Check if interference overlast the segment in interest.
       if (!Intf->overlaps(Start, End))
         continue;

       // Cannot evict non virtual reg interference.
@@ ... @@
 /// \param Order The allocation order
 /// \param VirtReg Live range that is about to be assigned.
 /// \param Start Start of range to look for interferences
 /// \param End End of range to look for interferences
 /// \param BestEvictweight The eviction cost of that eviction
 /// \return The PhysReg which is the best candidate for eviction and the
 /// eviction cost in BestEvictweight
 MCRegister RAGreedy::getCheapestEvicteeWeight(const AllocationOrder &Order,
-                                              LiveInterval &VirtReg,
+                                              const LiveInterval &VirtReg,
                                               SlotIndex Start, SlotIndex End,
-                                              float *BestEvictweight) {
+                                              float *BestEvictweight) const {
   EvictionCost BestEvictCost;
   BestEvictCost.setMax();
   BestEvictCost.MaxWeight = VirtReg.weight();
   MCRegister BestEvicteePhys;

   // Go over all physical registers and find the best candidate for eviction
   for (MCRegister PhysReg : Order.getOrder()) {

     if (!canEvictInterferenceInRange(VirtReg, PhysReg, Start, End,
                                      BestEvictCost))
       continue;

     // Best so far.
     BestEvicteePhys = PhysReg;
   }
   *BestEvictweight = BestEvictCost.MaxWeight;
   return BestEvicteePhys;
 }

/// evictInterference - Evict any interferring registers that prevent VirtReg		/// evictInterference - Evict any interferring registers that prevent VirtReg
/// from being assigned to Physreg. This assumes that canEvictInterference		/// from being assigned to Physreg. This assumes that canEvictInterference
/// returned true.		/// returned true.
void RAGreedy::evictInterference(LiveInterval &VirtReg, MCRegister PhysReg,		void RAGreedy::evictInterference(LiveInterval &VirtReg, MCRegister PhysReg,
SmallVectorImpl<Register> &NewVRegs) {		SmallVectorImpl<Register> &NewVRegs) {
// Make sure that VirtReg has a cascade number, and assign that cascade		// Make sure that VirtReg has a cascade number, and assign that cascade
		nikic: nit: A 5 snuck in.
// number to every evicted register. These live ranges than then only be		// number to every evicted register. These live ranges than then only be
// evicted by a newer cascade, preventing infinite loops.		// evicted by a newer cascade, preventing infinite loops.
unsigned Cascade = ExtraRegInfo[VirtReg.reg()].Cascade;		unsigned Cascade = ExtraRegInfo[VirtReg.reg()].Cascade;
if (!Cascade)		if (!Cascade)
Cascade = ExtraRegInfo[VirtReg.reg()].Cascade = NextCascade++;		Cascade = ExtraRegInfo[VirtReg.reg()].Cascade = NextCascade++;

LLVM_DEBUG(dbgs() << "evicting " << printReg(PhysReg, TRI)		LLVM_DEBUG(dbgs() << "evicting " << printReg(PhysReg, TRI)
<< " interference: Cascade " << Cascade << '\n');		<< " interference: Cascade " << Cascade << '\n');
(481 lines not shown)	bool RAGreedy::splitCanCauseLocalSpill(unsigned VirtRegToSplit,

// Check if the local interval will find a non interfereing assignment.		// Check if the local interval will find a non interfereing assignment.
for (auto PhysReg : Order.getOrder()) {		for (auto PhysReg : Order.getOrder()) {
if (!Matrix->checkInterference(Cand.Intf.first().getPrevIndex(),		if (!Matrix->checkInterference(Cand.Intf.first().getPrevIndex(),
Cand.Intf.last(), PhysReg))		Cand.Intf.last(), PhysReg))
return false;		return false;
}		}

// Check if the local interval will evict a cheaper interval.		// The local interval is not able to find non interferencing assignment
float CheapestEvictWeight = 0;		// and not able to evict a less worthy interval, therfore, it can cause a
MCRegister FutureEvictedPhysReg = getCheapestEvicteeWeight(		// spill.
Order, LIS->getInterval(VirtRegToSplit), Cand.Intf.first(),
Cand.Intf.last(), &CheapestEvictWeight);

// Have we found an interval that can be evicted?
if (FutureEvictedPhysReg) {
float splitArtifactWeight =
VRAI->futureWeight(LIS->getInterval(VirtRegToSplit),
Cand.Intf.first().getPrevIndex(), Cand.Intf.last());
// Will the weight of the local interval be higher than the cheapest evictee
// weight? If so it will evict it and will not cause a spill.
if (splitArtifactWeight >= 0 && splitArtifactWeight > CheapestEvictWeight)
return false;
}

// The local interval is not able to find non interferencing assignment and
// not able to evict a less worthy interval, therfore, it can cause a spill.
return true;		return true;
}		}

/// calcGlobalSplitCost - Return the global split cost of following the split		/// calcGlobalSplitCost - Return the global split cost of following the split
/// pattern in LiveBundles. This cost should be added to the local cost of the		/// pattern in LiveBundles. This cost should be added to the local cost of the
/// interference pattern in SplitConstraints.		/// interference pattern in SplitConstraints.
///		///
BlockFrequency RAGreedy::calcGlobalSplitCost(GlobalSplitCandidate &Cand,		BlockFrequency RAGreedy::calcGlobalSplitCost(GlobalSplitCandidate &Cand,
(1,680 lines not shown)

llvm/lib/Target/AArch64/AArch64Subtarget.h

(551 lines not shown)	public:
unsigned classifyGlobalFunctionReference(const GlobalValue *GV,		unsigned classifyGlobalFunctionReference(const GlobalValue *GV,
const TargetMachine &TM) const;		const TargetMachine &TM) const;

void overrideSchedPolicy(MachineSchedPolicy &Policy,		void overrideSchedPolicy(MachineSchedPolicy &Policy,
unsigned NumRegionInstrs) const override;		unsigned NumRegionInstrs) const override;

bool enableEarlyIfConversion() const override;		bool enableEarlyIfConversion() const override;

bool enableAdvancedRASplitCost() const override { return true; }		bool enableAdvancedRASplitCost() const override { return false; }

std::unique_ptr<PBQPRAConstraint> getCustomPBQPConstraints() const override;		std::unique_ptr<PBQPRAConstraint> getCustomPBQPConstraints() const override;

bool isCallingConvWin64(CallingConv::ID CC) const {		bool isCallingConvWin64(CallingConv::ID CC) const {
switch (CC) {		switch (CC) {
case CallingConv::C:		case CallingConv::C:
case CallingConv::Fast:		case CallingConv::Fast:
case CallingConv::Swift:		case CallingConv::Swift:
(20 lines not shown)

llvm/lib/Target/X86/X86Subtarget.h

(935 lines not shown)	public:

void getPostRAMutations(std::vector<std::unique_ptr<ScheduleDAGMutation>>		void getPostRAMutations(std::vector<std::unique_ptr<ScheduleDAGMutation>>
&Mutations) const override;		&Mutations) const override;

AntiDepBreakMode getAntiDepBreakMode() const override {		AntiDepBreakMode getAntiDepBreakMode() const override {
return TargetSubtargetInfo::ANTIDEP_CRITICAL;		return TargetSubtargetInfo::ANTIDEP_CRITICAL;
}		}

bool enableAdvancedRASplitCost() const override { return true; }		bool enableAdvancedRASplitCost() const override { return false; }
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_X86_X86SUBTARGET_H		#endif // LLVM_LIB_TARGET_X86_X86SUBTARGET_H

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=aarch64-arm-none-eabi < %s | FileCheck %s			; RUN: llc -consider-local-interval-cost -mtriple=aarch64-arm-none-eabi < %s | FileCheck %s

	@A = external dso_local local_unnamed_addr global [8 x [8 x i64]], align 8			@A = external dso_local local_unnamed_addr global [8 x [8 x i64]], align 8
	@B = external dso_local local_unnamed_addr global [8 x [8 x i64]], align 8			@B = external dso_local local_unnamed_addr global [8 x [8 x i64]], align 8
	@C = external dso_local local_unnamed_addr global [8 x [8 x i64]], align 8			@C = external dso_local local_unnamed_addr global [8 x [8 x i64]], align 8

	define dso_local void @run_test() local_unnamed_addr #0 {			define dso_local void @run_test() local_unnamed_addr #0 {
	; CHECK-LABEL: run_test:			; CHECK-LABEL: run_test:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	(47 lines not shown)
	; CHECK-NEXT: // implicit-def: $q10			; CHECK-NEXT: // implicit-def: $q10
	; CHECK-NEXT: // implicit-def: $q11			; CHECK-NEXT: // implicit-def: $q11
	; CHECK-NEXT: // implicit-def: $q12			; CHECK-NEXT: // implicit-def: $q12
	; CHECK-NEXT: // implicit-def: $q13			; CHECK-NEXT: // implicit-def: $q13
	; CHECK-NEXT: .LBB0_1: // %for.cond1.preheader			; CHECK-NEXT: .LBB0_1: // %for.cond1.preheader
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: mov x12, xzr			; CHECK-NEXT: mov x12, xzr
	; CHECK-NEXT: ldr q15, [x8]			; CHECK-NEXT: ldr q15, [x8]
	; CHECK-NEXT: ldr q14, [x12]			; CHECK-NEXT: ldr q14, [x12]
				dmgreen: This is quite a large regression. My understanding is that for this test enableAdvancedRASplitCost/consider-local-interval-cost was enabled specifically to prevent this kind of recursive spilling from happening.
				qcolombet: The problem is that the previous version was relying on stale information. This is a correctness fix and should get in ASAP IMHO. As far as eviction chains go, we could actually clean them up later in the pipeline. @aditya_nandakumar worked on a prototype offline that solves that as a post-RA pass. We're hoping to open source it soon-ish.
				dmgreen: If this is a correctness fix, presumably there should be a new test case for what it is fixing?
				qcolombet: Good point! From what I understand this is a subtle issue: I don't think we can generate wrong code out of it; it will just fail to detect eviction chains properly, or make the compiler spin forever (IIUC what @mtrofin means by "the compiler ices"). I'll let @mtrofin comment on this.
				dmgreen: My understanding is that this test comes from a fairly standard matrix multiply, by the way (https://godbolt.org/z/Ej5sb8). It is a shame to break such an obvious case. I have seen some other cases under Arm where we hit cascading eviction chains like this too - something that consider-local-interval-cost didn't help with. It would be great to see a more reliable fix for that; I look forward to seeing it.
				qcolombet: I agree the eviction chain detection is not foolproof. What Aditya (and I, to some extent) worked on showed that we should be able to eliminate the need for eviction chain detection altogether. The approach we took is a form of copy rewriting as described in https://dl.acm.org/doi/10.1145/1811212.1811214, though our implementation is less thorough than the paper :).
				mtrofin (author): @qcolombet: by "the compiler ices" I meant that, with the current patch (use of Optional) but without the Q.collectInterferingVRegs() call in the patch, the compiler crashes. That supports the claim that "the API should have been called" (i.e. in the past we were operating on stale information, at least in some cases, because removing the loop right under it leads to a set of failing tests, disjoint from the ones patched here). @dmgreen, all - my hope was that someone could help with fixing the regression (I think that was the goal of https://reviews.llvm.org/D35816). I have little context on that; otherwise it'll take me a bit of time to ramp up on that issue and propose a solution. The question is: are we OK with the bug (== the code sometimes works off stale info) in the meantime?
				mtrofin (author): @dmgreen, all - I dug a bit deeper into the regression. PTAL at the diff in RegAllocGreedy.cpp, the deleted lines at line 1559. The call to checkInterference on line 1555 resets the earlier-calculated interference collections, rendering the deleted code almost dead. Almost, because calling query in canEvictInterferenceInRange seems to have an effect - see the i128-mul.ll diff. I'll track that down.
				dmgreen: I've never looked very deeply into the register allocator, I'm afraid. It's still a bit of a mystery to me. I ran some benchmarks on the new code, though - none of them changed very much, which is a good sign.
				mtrofin (author): Tracked down the source of the last difference. @dmgreen @qcolombet - PTAL. I will work on simplifying the Query API next, to help avoid the types of pitfalls identified in this patch.
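To make the "invalidate instead of clear" point from the summary and from mtrofin's comment concrete, here is a minimal C++17 sketch of the pattern. It is a simplified model under stated assumptions, not the code in this patch: SketchQuery and staleAccessIsCaught are hypothetical names, the "union" is a plain std::vector of register ids, and std::optional stands in for the Optional mentioned in the thread. The accessor throws on a missing result only so the failure can be observed in a test; an LLVM-style implementation would more likely assert.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical, heavily simplified model of a query that caches its
// interference list. NOT llvm::LiveIntervalUnion::Query.
class SketchQuery {
public:
  explicit SketchQuery(const std::vector<int> &Union) : Union(Union) {}

  // Recompute the cached list from the current union contents.
  // (A real query would walk overlapping live segments instead of copying.)
  void collectInterferingVRegs() { Cached = Union; }

  // Fails loudly if the cache was never filled or has been invalidated.
  const std::vector<int> &interferingVRegs() const {
    if (!Cached)
      throw std::logic_error("collectInterferingVRegs() was not called");
    return *Cached;
  }

  // Called when the underlying union changes: mark the result as unknown.
  void invalidate() { Cached.reset(); }

private:
  const std::vector<int> &Union;
  std::optional<std::vector<int>> Cached;
};

// Collect, mutate + invalidate, then read without re-collecting.
// Returns true if the stale read was detected.
bool staleAccessIsCaught() {
  std::vector<int> Union = {1, 2};
  SketchQuery Q(Union);
  Q.collectInterferingVRegs();
  assert(Q.interferingVRegs().size() == 2); // fresh result is correct
  Union.push_back(3); // the union changes after the query was computed...
  Q.invalidate();     // ...so the cached result is invalidated
  try {
    (void)Q.interferingVRegs(); // bug: iterating without re-collecting
  } catch (const std::logic_error &) {
    return true;
  }
  return false;
}
```

Clearing the cached vector would make a forgotten collectInterferingVRegs() call look like "no interference", which is a legal answer; keeping the result in an optional lets the accessor tell "not computed" apart from "computed and empty".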
	; CHECK-NEXT: ldr q0, [x10], #64			; CHECK-NEXT: ldr q0, [x10], #64
	; CHECK-NEXT: ldr x18, [x12]			; CHECK-NEXT: ldr x18, [x12]
	; CHECK-NEXT: fmov x15, d15			; CHECK-NEXT: fmov x15, d15
	; CHECK-NEXT: mov x14, v15.d[1]			; CHECK-NEXT: mov x14, v15.d[1]
	; CHECK-NEXT: fmov x13, d14			; CHECK-NEXT: fmov x13, d14
	; CHECK-NEXT: mul x1, x15, x18			; CHECK-NEXT: mul x1, x15, x18
	; CHECK-NEXT: mov x16, v0.d[1]			; CHECK-NEXT: mov x16, v0.d[1]
	; CHECK-NEXT: fmov x17, d0			; CHECK-NEXT: fmov x17, d0
	(265 lines not shown)

llvm/test/CodeGen/X86/bug26810.ll

	; RUN: llc < %s -march=x86 -regalloc=greedy -stop-after=greedy | FileCheck %s			; RUN: llc -consider-local-interval-cost < %s -march=x86 -regalloc=greedy -stop-after=greedy | FileCheck %s
				dmgreen: If we are enabling this flag by default, we should probably update the tests, not hide them behind a flag.
				mtrofin (author): We're not enabling it by default though, that'd regress compile time.
				dmgreen: Ah, yes. I meant "If we are disabling this flag.."
				mtrofin (author): Ah, I wanted to keep some regression testing for the behavior behind the flag.
	; Make sure bad eviction sequence doesnt occur			; Make sure bad eviction sequence doesnt occur

	; Fix for bugzilla 26810.			; Fix for bugzilla 26810.
	; This test is meant to make sure bad eviction sequence like the one described			; This test is meant to make sure bad eviction sequence like the one described
	; below does not occur			; below does not occur
	;			;
	; movapd %xmm7, 160(%esp) # 16-byte Spill			; movapd %xmm7, 160(%esp) # 16-byte Spill
	; movapd %xmm5, %xmm7			; movapd %xmm5, %xmm7
	(304 lines not shown)

llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

	; RUN: llc < %s -march=x86 -regalloc=greedy -stop-after=greedy | FileCheck %s			; RUN: llc -consider-local-interval-cost < %s -march=x86 -regalloc=greedy -stop-after=greedy | FileCheck %s
	; Make sure bad eviction sequence doesnt occur			; Make sure bad eviction sequence doesnt occur

	; Part of the fix for bugzilla 26810.			; Part of the fix for bugzilla 26810.
	; This test is meant to make sure bad eviction sequence like the one described			; This test is meant to make sure bad eviction sequence like the one described
	; below does not occur			; below does not occur
	;			;
	; movl %ebp, 8($esp) # 4-byte Spill			; movl %ebp, 8($esp) # 4-byte Spill
	; movl %ecx, %ebp			; movl %ecx, %ebp
	(107 lines not shown)

llvm/test/CodeGen/X86/i128-mul.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s --check-prefix=X86-NOBMI			; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s --check-prefix=X86-NOBMI
				nikic: Don't think regalloc details are important for this test and the two below. Might want to regenerate the output rather than adding the flag for these. (The two tests above specifically test for regalloc behavior.)
				mtrofin (author): sgtm, I'll do that.
	; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+bmi2 | FileCheck %s --check-prefix=X86-BMI			; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+bmi2 | FileCheck %s --check-prefix=X86-BMI
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=X64-NOBMI			; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=X64-NOBMI
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi2 | FileCheck %s --check-prefix=X64-BMI			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+bmi2 | FileCheck %s --check-prefix=X64-BMI

	; PR1198			; PR1198

	define i64 @foo(i64 %x, i64 %y) nounwind {			define i64 @foo(i64 %x, i64 %y) nounwind {
	; X86-NOBMI-LABEL: foo:			; X86-NOBMI-LABEL: foo:
	(146 lines not shown)
	; X86-NOBMI-NEXT: adcl $0, %edx			; X86-NOBMI-NEXT: adcl $0, %edx
	; X86-NOBMI-NEXT: movl {{[0-9]+}}(%esp), %esi			; X86-NOBMI-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X86-NOBMI-NEXT: movl %ecx, (%esi,%ebp,8)			; X86-NOBMI-NEXT: movl %ecx, (%esi,%ebp,8)
	; X86-NOBMI-NEXT: movl %edi, 4(%esi,%ebp,8)			; X86-NOBMI-NEXT: movl %edi, 4(%esi,%ebp,8)
	; X86-NOBMI-NEXT: addl $1, %ebp			; X86-NOBMI-NEXT: addl $1, %ebp
	; X86-NOBMI-NEXT: movl (%esp), %edi # 4-byte Reload			; X86-NOBMI-NEXT: movl (%esp), %edi # 4-byte Reload
	; X86-NOBMI-NEXT: adcl $0, %edi			; X86-NOBMI-NEXT: adcl $0, %edi
	; X86-NOBMI-NEXT: movl %ebp, %esi			; X86-NOBMI-NEXT: movl %ebp, %esi
	; X86-NOBMI-NEXT: xorl %ebx, %esi			; X86-NOBMI-NEXT: xorl {{[0-9]+}}(%esp), %esi
	; X86-NOBMI-NEXT: movl %edi, (%esp) # 4-byte Spill			; X86-NOBMI-NEXT: movl %edi, (%esp) # 4-byte Spill
	; X86-NOBMI-NEXT: xorl {{[0-9]+}}(%esp), %edi			; X86-NOBMI-NEXT: xorl %ebx, %edi
	; X86-NOBMI-NEXT: orl %esi, %edi			; X86-NOBMI-NEXT: orl %esi, %edi
	; X86-NOBMI-NEXT: jne .LBB1_2			; X86-NOBMI-NEXT: jne .LBB1_2
	; X86-NOBMI-NEXT: .LBB1_3: # %for.end			; X86-NOBMI-NEXT: .LBB1_3: # %for.end
	; X86-NOBMI-NEXT: xorl %eax, %eax			; X86-NOBMI-NEXT: xorl %eax, %eax
	; X86-NOBMI-NEXT: xorl %edx, %edx			; X86-NOBMI-NEXT: xorl %edx, %edx
	; X86-NOBMI-NEXT: addl $24, %esp			; X86-NOBMI-NEXT: addl $24, %esp
	; X86-NOBMI-NEXT: popl %esi			; X86-NOBMI-NEXT: popl %esi
	; X86-NOBMI-NEXT: popl %edi			; X86-NOBMI-NEXT: popl %edi
	(149 lines not shown)

llvm/test/CodeGen/X86/mmx-arith.ll

	(384 lines not shown)

	define <1 x i64> @test3(<1 x i64>* %a, <1 x i64>* %b, i32 %count) nounwind {			define <1 x i64> @test3(<1 x i64>* %a, <1 x i64>* %b, i32 %count) nounwind {
	; X32-LABEL: test3:			; X32-LABEL: test3:
	; X32: # %bb.0: # %entry			; X32: # %bb.0: # %entry
	; X32-NEXT: pushl %ebp			; X32-NEXT: pushl %ebp
	; X32-NEXT: pushl %ebx			; X32-NEXT: pushl %ebx
	; X32-NEXT: pushl %edi			; X32-NEXT: pushl %edi
	; X32-NEXT: pushl %esi			; X32-NEXT: pushl %esi
	; X32-NEXT: cmpl $0, {{[0-9]+}}(%esp)			; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X32-NEXT: testl %ecx, %ecx
	; X32-NEXT: je .LBB3_1			; X32-NEXT: je .LBB3_1
	; X32-NEXT: # %bb.2: # %bb26.preheader			; X32-NEXT: # %bb.2: # %bb26.preheader
	; X32-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X32-NEXT: movl {{[0-9]+}}(%esp), %edi
	; X32-NEXT: xorl %ebx, %ebx			; X32-NEXT: xorl %ebx, %ebx
	; X32-NEXT: xorl %eax, %eax			; X32-NEXT: xorl %eax, %eax
	; X32-NEXT: xorl %edx, %edx			; X32-NEXT: xorl %edx, %edx
	; X32-NEXT: .p2align 4, 0x90			; X32-NEXT: .p2align 4, 0x90
	; X32-NEXT: .LBB3_3: # %bb26			; X32-NEXT: .LBB3_3: # %bb26
	; X32-NEXT: # =>This Inner Loop Header: Depth=1			; X32-NEXT: # =>This Inner Loop Header: Depth=1
				; X32-NEXT: movl {{[0-9]+}}(%esp), %edi
	; X32-NEXT: movl (%edi,%ebx,8), %ebp			; X32-NEXT: movl (%edi,%ebx,8), %ebp
				; X32-NEXT: movl %ecx, %esi
	; X32-NEXT: movl 4(%edi,%ebx,8), %ecx			; X32-NEXT: movl 4(%edi,%ebx,8), %ecx
	; X32-NEXT: addl (%esi,%ebx,8), %ebp			; X32-NEXT: movl {{[0-9]+}}(%esp), %edi
	; X32-NEXT: adcl 4(%esi,%ebx,8), %ecx			; X32-NEXT: addl (%edi,%ebx,8), %ebp
				; X32-NEXT: adcl 4(%edi,%ebx,8), %ecx
	; X32-NEXT: addl %ebp, %eax			; X32-NEXT: addl %ebp, %eax
	; X32-NEXT: adcl %ecx, %edx			; X32-NEXT: adcl %ecx, %edx
				; X32-NEXT: movl %esi, %ecx
	; X32-NEXT: incl %ebx			; X32-NEXT: incl %ebx
	; X32-NEXT: cmpl {{[0-9]+}}(%esp), %ebx			; X32-NEXT: cmpl %esi, %ebx
	; X32-NEXT: jb .LBB3_3			; X32-NEXT: jb .LBB3_3
	; X32-NEXT: jmp .LBB3_4			; X32-NEXT: jmp .LBB3_4
	; X32-NEXT: .LBB3_1:			; X32-NEXT: .LBB3_1:
	; X32-NEXT: xorl %eax, %eax			; X32-NEXT: xorl %eax, %eax
	; X32-NEXT: xorl %edx, %edx			; X32-NEXT: xorl %edx, %edx
	; X32-NEXT: .LBB3_4: # %bb31			; X32-NEXT: .LBB3_4: # %bb31
	; X32-NEXT: popl %esi			; X32-NEXT: popl %esi
	; X32-NEXT: popl %edi			; X32-NEXT: popl %edi
	(283 lines not shown)

llvm/test/CodeGen/X86/optimize-max-0.ll

	(444 lines not shown)
	define void @bar(i8* %r, i32 %s, i32 %w, i32 %x, i8* %j, i32 %d) nounwind {			define void @bar(i8* %r, i32 %s, i32 %w, i32 %x, i8* %j, i32 %d) nounwind {
	; CHECK-LABEL: bar:			; CHECK-LABEL: bar:
	; CHECK: ## %bb.0: ## %entry			; CHECK: ## %bb.0: ## %entry
	; CHECK-NEXT: pushl %ebp			; CHECK-NEXT: pushl %ebp
	; CHECK-NEXT: pushl %ebx			; CHECK-NEXT: pushl %ebx
	; CHECK-NEXT: pushl %edi			; CHECK-NEXT: pushl %edi
	; CHECK-NEXT: pushl %esi			; CHECK-NEXT: pushl %esi
	; CHECK-NEXT: subl $28, %esp			; CHECK-NEXT: subl $28, %esp
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebp			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebp
				; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi			; CHECK-NEXT: movl %ebp, %edx
	; CHECK-NEXT: movl %ebp, %eax			; CHECK-NEXT: imull %eax, %edx
	; CHECK-NEXT: imull %ecx, %eax
	; CHECK-NEXT: cmpl $1, {{[0-9]+}}(%esp)			; CHECK-NEXT: cmpl $1, {{[0-9]+}}(%esp)
	; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill			; CHECK-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill
	; CHECK-NEXT: je LBB1_19			; CHECK-NEXT: je LBB1_19
	; CHECK-NEXT: ## %bb.1: ## %bb10.preheader			; CHECK-NEXT: ## %bb.1: ## %bb10.preheader
	; CHECK-NEXT: shrl $2, %eax			; CHECK-NEXT: movl %edx, %ecx
	; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill			; CHECK-NEXT: shrl $2, %ecx
				; CHECK-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill
	; CHECK-NEXT: testl %ebp, %ebp			; CHECK-NEXT: testl %ebp, %ebp
				; CHECK-NEXT: movl %eax, %edi
	; CHECK-NEXT: je LBB1_12			; CHECK-NEXT: je LBB1_12
	; CHECK-NEXT: ## %bb.2: ## %bb.nph9			; CHECK-NEXT: ## %bb.2: ## %bb.nph9
	; CHECK-NEXT: cmpl $0, {{[0-9]+}}(%esp)			; CHECK-NEXT: testl %eax, %eax
	; CHECK-NEXT: je LBB1_12			; CHECK-NEXT: je LBB1_12
	; CHECK-NEXT: ## %bb.3: ## %bb.nph9.split			; CHECK-NEXT: ## %bb.3: ## %bb.nph9.split
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: incl %eax			; CHECK-NEXT: incl %eax
	; CHECK-NEXT: xorl %ecx, %ecx			; CHECK-NEXT: xorl %ecx, %ecx
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: movl %esi, %edx
	; CHECK-NEXT: LBB1_6: ## %bb7.preheader
	; CHECK-NEXT: ## =>This Loop Header: Depth=1
	; CHECK-NEXT: ## Child Loop BB1_4 Depth 2
	; CHECK-NEXT: xorl %esi, %esi			; CHECK-NEXT: xorl %esi, %esi
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: LBB1_4: ## %bb6			; CHECK-NEXT: LBB1_4: ## %bb6
	; CHECK-NEXT: ## Parent Loop BB1_6 Depth=1			; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ## => This Inner Loop Header: Depth=2
	; CHECK-NEXT: movzbl (%eax,%esi,2), %ebx			; CHECK-NEXT: movzbl (%eax,%esi,2), %ebx
	; CHECK-NEXT: movb %bl, (%edx,%esi)			; CHECK-NEXT: movb %bl, (%edx,%esi)
	; CHECK-NEXT: incl %esi			; CHECK-NEXT: incl %esi
	; CHECK-NEXT: cmpl %edi, %esi			; CHECK-NEXT: cmpl %edi, %esi
	; CHECK-NEXT: jb LBB1_4			; CHECK-NEXT: jb LBB1_4
	; CHECK-NEXT: ## %bb.5: ## %bb9			; CHECK-NEXT: ## %bb.5: ## %bb9
	; CHECK-NEXT: ## in Loop: Header=BB1_6 Depth=1			; CHECK-NEXT: ## in Loop: Header=BB1_4 Depth=1
				; CHECK-NEXT: movl %edi, %ebx
	; CHECK-NEXT: incl %ecx			; CHECK-NEXT: incl %ecx
	; CHECK-NEXT: addl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: addl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: addl %edi, %edx			; CHECK-NEXT: addl %edi, %edx
	; CHECK-NEXT: cmpl %ebp, %ecx			; CHECK-NEXT: cmpl %ebp, %ecx
	; CHECK-NEXT: jne LBB1_6			; CHECK-NEXT: je LBB1_12
				; CHECK-NEXT: ## %bb.6: ## %bb7.preheader
				; CHECK-NEXT: ## in Loop: Header=BB1_4 Depth=1
				; CHECK-NEXT: xorl %esi, %esi
				; CHECK-NEXT: jmp LBB1_4
	; CHECK-NEXT: LBB1_12: ## %bb18.loopexit			; CHECK-NEXT: LBB1_12: ## %bb18.loopexit
	; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax ## 4-byte Reload			; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax ## 4-byte Reload
	; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx ## 4-byte Reload			; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx ## 4-byte Reload
	; CHECK-NEXT: addl %ecx, %eax			; CHECK-NEXT: addl %ecx, %eax
	; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill			; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill
	; CHECK-NEXT: cmpl $1, %ebp			; CHECK-NEXT: cmpl $1, %ebp
	; CHECK-NEXT: jbe LBB1_13			; CHECK-NEXT: jbe LBB1_13
	; CHECK-NEXT: ## %bb.7: ## %bb.nph5			; CHECK-NEXT: ## %bb.7: ## %bb.nph5
	; CHECK-NEXT: cmpl $2, {{[0-9]+}}(%esp)			; CHECK-NEXT: cmpl $2, %edi
	; CHECK-NEXT: jb LBB1_13			; CHECK-NEXT: jb LBB1_13
	; CHECK-NEXT: ## %bb.8: ## %bb.nph5.split			; CHECK-NEXT: ## %bb.8: ## %bb.nph5.split
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebp			; CHECK-NEXT: movl %edi, %ebp
	; CHECK-NEXT: shrl %ebp			; CHECK-NEXT: shrl %ebp
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: shrl %eax			; CHECK-NEXT: shrl %eax
	; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill			; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx ## 4-byte Reload			; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx ## 4-byte Reload
	; CHECK-NEXT: addl %eax, %ecx			; CHECK-NEXT: addl %eax, %ecx
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx
	; CHECK-NEXT: addl $2, %edx			; CHECK-NEXT: addl $2, %edx
	; CHECK-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill			; CHECK-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill
	; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx ## 4-byte Reload			; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx ## 4-byte Reload
	; CHECK-NEXT: addl %edx, %eax			; CHECK-NEXT: addl %edx, %eax
	; CHECK-NEXT: xorl %edx, %edx			; CHECK-NEXT: xorl %edx, %edx
	; CHECK-NEXT: xorl %edi, %edi			; CHECK-NEXT: xorl %ebx, %ebx
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: LBB1_9: ## %bb13			; CHECK-NEXT: LBB1_9: ## %bb13
	; CHECK-NEXT: ## =>This Loop Header: Depth=1			; CHECK-NEXT: ## =>This Loop Header: Depth=1
	; CHECK-NEXT: ## Child Loop BB1_10 Depth 2			; CHECK-NEXT: ## Child Loop BB1_10 Depth 2
	; CHECK-NEXT: movl %edi, %ebx			; CHECK-NEXT: movl %ebx, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill
	; CHECK-NEXT: andl $1, %ebx			; CHECK-NEXT: andl $1, %ebx
	; CHECK-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill			; CHECK-NEXT: movl %edx, (%esp) ## 4-byte Spill
	; CHECK-NEXT: addl %edx, %ebx			; CHECK-NEXT: addl %edx, %ebx
	; CHECK-NEXT: imull {{[0-9]+}}(%esp), %ebx			; CHECK-NEXT: imull {{[0-9]+}}(%esp), %ebx
	; CHECK-NEXT: addl {{[-0-9]+}}(%e{{[sb]}}p), %ebx ## 4-byte Folded Reload			; CHECK-NEXT: addl {{[-0-9]+}}(%e{{[sb]}}p), %ebx ## 4-byte Folded Reload
	; CHECK-NEXT: xorl %esi, %esi			; CHECK-NEXT: xorl %esi, %esi
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: LBB1_10: ## %bb14			; CHECK-NEXT: LBB1_10: ## %bb14
	; CHECK-NEXT: ## Parent Loop BB1_9 Depth=1			; CHECK-NEXT: ## Parent Loop BB1_9 Depth=1
	; CHECK-NEXT: ## => This Inner Loop Header: Depth=2			; CHECK-NEXT: ## => This Inner Loop Header: Depth=2
	; CHECK-NEXT: movzbl -2(%ebx,%esi,4), %edx			; CHECK-NEXT: movzbl -2(%ebx,%esi,4), %edx
	; CHECK-NEXT: movb %dl, (%eax,%esi)			; CHECK-NEXT: movb %dl, (%eax,%esi)
	; CHECK-NEXT: movzbl (%ebx,%esi,4), %edx			; CHECK-NEXT: movzbl (%ebx,%esi,4), %edx
	; CHECK-NEXT: movb %dl, (%ecx,%esi)			; CHECK-NEXT: movb %dl, (%ecx,%esi)
	; CHECK-NEXT: incl %esi			; CHECK-NEXT: incl %esi
	; CHECK-NEXT: cmpl %ebp, %esi			; CHECK-NEXT: cmpl %ebp, %esi
	; CHECK-NEXT: jb LBB1_10			; CHECK-NEXT: jb LBB1_10
	; CHECK-NEXT: ## %bb.11: ## %bb17			; CHECK-NEXT: ## %bb.11: ## %bb17
	; CHECK-NEXT: ## in Loop: Header=BB1_9 Depth=1			; CHECK-NEXT: ## in Loop: Header=BB1_9 Depth=1
	; CHECK-NEXT: incl %edi			; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ebx ## 4-byte Reload
				; CHECK-NEXT: incl %ebx
	; CHECK-NEXT: addl %ebp, %ecx			; CHECK-NEXT: addl %ebp, %ecx
	; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx ## 4-byte Reload			; CHECK-NEXT: movl (%esp), %edx ## 4-byte Reload
	; CHECK-NEXT: addl $2, %edx			; CHECK-NEXT: addl $2, %edx
	; CHECK-NEXT: addl %ebp, %eax			; CHECK-NEXT: addl %ebp, %eax
	; CHECK-NEXT: cmpl {{[-0-9]+}}(%e{{[sb]}}p), %edi ## 4-byte Folded Reload			; CHECK-NEXT: cmpl {{[-0-9]+}}(%e{{[sb]}}p), %ebx ## 4-byte Folded Reload
	; CHECK-NEXT: jb LBB1_9			; CHECK-NEXT: jb LBB1_9
	; CHECK-NEXT: LBB1_13: ## %bb20			; CHECK-NEXT: LBB1_13: ## %bb20
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi
	; CHECK-NEXT: cmpl $1, %edx			; CHECK-NEXT: cmpl $1, %esi
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebp			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebp
				; CHECK-NEXT: movl %edi, %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi
	; CHECK-NEXT: je LBB1_19			; CHECK-NEXT: je LBB1_19
	; CHECK-NEXT: ## %bb.14: ## %bb20			; CHECK-NEXT: ## %bb.14: ## %bb20
	; CHECK-NEXT: cmpl $3, %edx			; CHECK-NEXT: cmpl $3, %esi
	; CHECK-NEXT: jne LBB1_24			; CHECK-NEXT: jne LBB1_24
	; CHECK-NEXT: ## %bb.15: ## %bb22			; CHECK-NEXT: ## %bb.15: ## %bb22
	; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax ## 4-byte Reload			; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx ## 4-byte Reload
	; CHECK-NEXT: addl %eax, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Folded Spill			; CHECK-NEXT: addl %edx, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Folded Spill
	; CHECK-NEXT: testl %ebp, %ebp			; CHECK-NEXT: testl %ebp, %ebp
	; CHECK-NEXT: je LBB1_18			; CHECK-NEXT: je LBB1_18
	; CHECK-NEXT: ## %bb.16: ## %bb.nph			; CHECK-NEXT: ## %bb.16: ## %bb.nph
	; CHECK-NEXT: movl %ebp, %esi			; CHECK-NEXT: movl %ebp, %esi
	; CHECK-NEXT: leal 15(%ebp), %eax			; CHECK-NEXT: leal 15(%ebp), %eax
	; CHECK-NEXT: andl $-16, %eax			; CHECK-NEXT: andl $-16, %eax
	; CHECK-NEXT: imull {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: imull {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: leal 15(%ecx), %ebx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx
	; CHECK-NEXT: andl $-16, %ebx			; CHECK-NEXT: addl $15, %edx
	; CHECK-NEXT: addl %eax, %edi			; CHECK-NEXT: andl $-16, %edx
				; CHECK-NEXT: movl %edx, (%esp) ## 4-byte Spill
				; CHECK-NEXT: addl %eax, %ecx
	; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax ## 4-byte Reload			; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax ## 4-byte Reload
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx
	; CHECK-NEXT: leal (%edx,%eax), %ebp			; CHECK-NEXT: leal (%edx,%eax), %ebp
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: LBB1_17: ## %bb23			; CHECK-NEXT: LBB1_17: ## %bb23
	; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: subl $4, %esp			; CHECK-NEXT: subl $4, %esp
				; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebx
				; CHECK-NEXT: pushl %ebx
	; CHECK-NEXT: pushl %ecx			; CHECK-NEXT: pushl %ecx
	; CHECK-NEXT: pushl %edi
	; CHECK-NEXT: pushl %ebp			; CHECK-NEXT: pushl %ebp
				; CHECK-NEXT: movl %ecx, %edi
	; CHECK-NEXT: calll _memcpy			; CHECK-NEXT: calll _memcpy
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl %edi, %ecx
	; CHECK-NEXT: addl $16, %esp			; CHECK-NEXT: addl $16, %esp
	; CHECK-NEXT: addl %ecx, %ebp			; CHECK-NEXT: addl %ebx, %ebp
	; CHECK-NEXT: addl %ebx, %edi			; CHECK-NEXT: addl (%esp), %ecx ## 4-byte Folded Reload
	; CHECK-NEXT: decl %esi			; CHECK-NEXT: decl %esi
	; CHECK-NEXT: jne LBB1_17			; CHECK-NEXT: jne LBB1_17
	; CHECK-NEXT: LBB1_18: ## %bb26			; CHECK-NEXT: LBB1_18: ## %bb26
	; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax ## 4-byte Reload			; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax ## 4-byte Reload
	; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx ## 4-byte Reload			; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx ## 4-byte Reload
	; CHECK-NEXT: addl %ecx, %eax			; CHECK-NEXT: addl %ecx, %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx
	; CHECK-NEXT: addl %eax, %edx			; CHECK-NEXT: addl %eax, %edx
	; CHECK-NEXT: shrl %ecx			; CHECK-NEXT: shrl %ecx
	; CHECK-NEXT: subl $4, %esp			; CHECK-NEXT: subl $4, %esp
	; CHECK-NEXT: pushl %ecx			; CHECK-NEXT: pushl %ecx
	; CHECK-NEXT: pushl $128			; CHECK-NEXT: pushl $128
	; CHECK-NEXT: pushl %edx			; CHECK-NEXT: pushl %edx
	; CHECK-NEXT: jmp LBB1_23			; CHECK-NEXT: jmp LBB1_23
	; CHECK-NEXT: LBB1_19: ## %bb29			; CHECK-NEXT: LBB1_19: ## %bb29
	; CHECK-NEXT: testl %ebp, %ebp			; CHECK-NEXT: testl %ebp, %ebp
	; CHECK-NEXT: je LBB1_22			; CHECK-NEXT: je LBB1_22
	; CHECK-NEXT: ## %bb.20: ## %bb.nph11			; CHECK-NEXT: ## %bb.20: ## %bb.nph11
	; CHECK-NEXT: movl %ebp, %esi			; CHECK-NEXT: movl %ebp, %esi
	; CHECK-NEXT: leal 15(%ecx), %ebx			; CHECK-NEXT: movl %eax, %edi
	; CHECK-NEXT: andl $-16, %ebx			; CHECK-NEXT: addl $15, %eax
				; CHECK-NEXT: andl $-16, %eax
				; CHECK-NEXT: movl %eax, (%esp) ## 4-byte Spill
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebp			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebp
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: LBB1_21: ## %bb30			; CHECK-NEXT: LBB1_21: ## %bb30
	; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: subl $4, %esp			; CHECK-NEXT: subl $4, %esp
	; CHECK-NEXT: pushl %ecx
	; CHECK-NEXT: pushl %edi			; CHECK-NEXT: pushl %edi
				; CHECK-NEXT: pushl %ecx
	; CHECK-NEXT: pushl %ebp			; CHECK-NEXT: pushl %ebp
				; CHECK-NEXT: movl %ecx, %ebx
	; CHECK-NEXT: calll _memcpy			; CHECK-NEXT: calll _memcpy
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl %ebx, %ecx
	; CHECK-NEXT: addl $16, %esp			; CHECK-NEXT: addl $16, %esp
	; CHECK-NEXT: addl %ecx, %ebp			; CHECK-NEXT: addl %edi, %ebp
	; CHECK-NEXT: addl %ebx, %edi			; CHECK-NEXT: addl (%esp), %ecx ## 4-byte Folded Reload
	; CHECK-NEXT: decl %esi			; CHECK-NEXT: decl %esi
	; CHECK-NEXT: jne LBB1_21			; CHECK-NEXT: jne LBB1_21
	; CHECK-NEXT: LBB1_22: ## %bb33			; CHECK-NEXT: LBB1_22: ## %bb33
	; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax ## 4-byte Reload			; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax ## 4-byte Reload
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: addl %eax, %ecx			; CHECK-NEXT: addl %eax, %ecx
	; CHECK-NEXT: shrl %eax			; CHECK-NEXT: shrl %eax
	; CHECK-NEXT: subl $4, %esp			; CHECK-NEXT: subl $4, %esp

Revision Contents

Diff 334708

llvm/include/llvm/CodeGen/LiveIntervalUnion.h

llvm/lib/CodeGen/LiveIntervalUnion.cpp

llvm/lib/CodeGen/LiveRegMatrix.cpp

llvm/lib/CodeGen/RegAllocGreedy.cpp

llvm/lib/Target/AArch64/AArch64Subtarget.h

llvm/lib/Target/X86/X86Subtarget.h

llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll

llvm/test/CodeGen/X86/bug26810.ll

llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

llvm/test/CodeGen/X86/i128-mul.ll

llvm/test/CodeGen/X86/mmx-arith.ll

llvm/test/CodeGen/X86/optimize-max-0.ll
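According to the revision summary, the functional fix is in RegAllocGreedy.cpp: the interference query must collect interfering vregs before anyone iterates over them. The supporting change makes a not-yet-computed result *invalid* rather than merely *empty*, and that invalidation happens when the live reg matrix is invalidated. The snippet below is a minimal, self-contained sketch of that "invalidate, don't clear" pattern. It is not LLVM's `LiveIntervalUnion::Query`. `LiveRange`, `InterferenceQuery`, `collectInterferingVRegs`, `interferingVRegs`, `invalidate` and `isValid` are hypothetical names, and each vreg is assumed to occupy a single half-open slot range, whereas real live intervals consist of many segments.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified model: each virtual register occupies one
// half-open slot range [Start, End). Not an LLVM type.
struct LiveRange {
  unsigned VReg;
  unsigned Start, End;
};

// Hypothetical sketch of an interference query against already-assigned ranges.
class InterferenceQuery {
public:
  InterferenceQuery(const std::vector<LiveRange> &Assigned, LiveRange Query)
      : Assigned(Assigned), Query(Query) {}

  // Scans the assigned ranges once and caches every vreg overlapping Query.
  const std::vector<unsigned> &collectInterferingVRegs() {
    if (!Interfering) {
      Interfering.emplace();
      for (const LiveRange &LR : Assigned)
        if (LR.Start < Query.End && Query.Start < LR.End)
          Interfering->push_back(LR.VReg);
    }
    return *Interfering;
  }

  // Iteration entry point: asserts (in debug builds) instead of handing back
  // an empty list when the cache has not been computed since the last change.
  const std::vector<unsigned> &interferingVRegs() const {
    assert(Interfering && "call collectInterferingVRegs() before iterating");
    return *Interfering;
  }

  // Invoked when the assignment changes: mark the cache invalid rather than
  // clearing it, so stale state cannot masquerade as "no interference".
  void invalidate() { Interfering.reset(); }

  bool isValid() const { return Interfering.has_value(); }

private:
  const std::vector<LiveRange> &Assigned;
  LiveRange Query;
  std::optional<std::vector<unsigned>> Interfering;
};
```

With this design, a caller that forgets the collect step fails an assertion in a debug build, instead of treating an empty container as "nothing interferes". That is the kind of silent misuse the summary says the patch should make easier to discover. Using `std::optional` to separate "not computed" from "computed, empty" is one possible encoding under these assumptions. A separate validity flag would work as well.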
