This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/CodeGen/
-
CodeGen/
1
CriticalAntiDepBreaker.cpp
-
test/CodeGen/X86/
-
CodeGen/
-
X86/
1
pr27681.ll

Differential D20456

[CodeGen] Fix problem with X86 byte registers in CriticalAntiDepBreaker
ClosedPublic

Authored by mbodart on May 19 2016, 4:52 PM.

Download Raw Diff

Details

Reviewers

qcolombet
MatzeB
hfinkel

Commits

rG05aeeb5cf19e: [CodeGen] Fix problem with X86 byte registers in CriticalAntiDepBreaker
rL270935: [CodeGen] Fix problem with X86 byte registers in CriticalAntiDepBreaker

Summary

This fix addresses the failures reported in pr27681, pr27580 and pr27804, caused by a bug in CriticalAntiDepBreaker.cpp, which increased in exposure when I recently enabled the post-RA-scheduler for more X86 cpus.

At the heart of the issue is that MCSuperRegIterator does not exhaust all registers in the underlying container.
For example if starting with %CL it will encounter %CL, %CX, %ECX and %RCX, but not %CH. This isn't a direct problem,
but it leads to some subtle differences in the various register properties maintained by this optimization,
where %CH stands out as different.

In the case of pr27681, all %RCX subregs other than %CH get conservatively added to the KeepRegs set when we see
a %CL def tied to a use in PreScanInstruction. When we later see an %ECX def in ScanInstruction's def processing,
we skip state updates because of the %ECX presence in KeepRegs. This leaves %CH in an incorrect "free" state.

My fix is to ensure %CH gets added to KeepRegs as well, and to avoid choosing a KeepRegs register
when looking for a free register.

Any use of MCSuperRegIterator or MCRegAliasIterator is prone to this subtle issue. I briefly looked at the other
uses in this optimization, and it was not clear if they could be problematic, so I chose not to touch them.

Diff Detail

Event Timeline

mbodart updated this revision to Diff 57878.May 19 2016, 4:52 PM

mbodart retitled this revision from to [CodeGen] Fix problem with X86 byte registers in CriticalAntiDepBreaker.

mbodart updated this object.

mbodart added reviewers: qcolombet, hfinkel.

mbodart added a subscriber: llvm-commits.

spatel added a subscriber: spatel.May 19 2016, 5:32 PM

mbodart added a reviewer: MatzeB.May 20 2016, 9:18 AM

I don't understand the nature of the bug with the explanation given here (and I don't know much about this pass). Why do you need this double loop, would iterating over the register alias work as well? The double loop here also seems to requires quadratic runtime in the depth of the sub/super register graph.

Based on the comment in the source it is not clear to me why we have to include alias registers into the KeepRegs set that are not touched by the instruction.

I was also not familiar with this code, until diving in to fix pr27681.

KeepRegs was introduced in r83198 of PostRAScheduler.cpp back in 2009,
and the tied operand checks existed before then. In r212275 of
CriticalAntiDepBreaker.cpp (July 2014), and possibly other change
sets along the way, the use of KeepRegs and tied operands was tweaked
to fix pr20020.

From what I can surmise, the original intent of KeepRegs was to mark
registers as "hands off", when looking for a candidate anti-dependent
register def. So this change is muddying the water a little bit,
as I'm now also using it to say "hands off" when choosing a replacement
register. However, I did not see any other obvious solution.
Setting Classes[Reg] to -1 is another way to say "hands off"
for the purposes of choosing a replacement reg, but it's not clear
how that can be made to work in the current structure for this case.

There is an assumption in ScanInstruction that if a def'd Reg is in KeepRegs,
or is tied to a use, then we do not need to update the liveness info
(Def/KillIndices, KeepRegs, Classes, RegRefs) for it or any of its subregs,
nor set Classes to -1 for any of its superregs.

In the case of pr27681, this means those datastructures are not updated
for %CH when we see a def of %ECX, and ultimately %CH is incorrectly treated
as a suitable replacement register in findSuitableFreeRegister.

I don't know why tied operands are treated the way they are, but the bug
is not a direct consequence of tied operands. Rather, it is in the assumptions
being made about registers present in the KeepRegs set, and implicitly
their subregs.

As to the iterator question, yes it is quadratic, and no, MCRegAliasIterator
does not suffice. If you know of an iterator that would fill the need here,
great. But I couldn't find one. I'm not worried about the quadratic nature,
as the sizes here are very small.

Hi Mitch,

I haven’t looked at the patch but given your description, it sounds like using a proper tracking of the liveness (like LivePhysReg) would be the right way to go.
Does that sound reasonable?

Cheers,
-Quentin

Looking into the testcase this seems to be a problem with tracking which registers are actually free and usable, the algorithm somehow fails to recognize that %ch is not free when %ecx is used.

The KeepRegsSet on the other hand excludes some registers from renaming because that would change program semantics. This does not appear to be the right fix, the fact that the andb in the testcase is a tied operand seems accidental to me. For a proper fix we should have findSuitableFreeRegister() report false for %ch.

Some possible ways to proceed:

It would be good to have a minimal test case as a .mir file which just does "llc -run-pass post-RA-scheduler -verify-machineinstrs xxx.mir" that serves well as a testcase and should also make it realistic to single step through the algorithm with a debugger...
I wonder if isNewRegClobberedByRefs() should check subregisters instead of just if(... CheckOper.getReg() != NewReg) continue;
The We just went back in time and modified history; ... part also does not seem to respect subregisters/superregisters.

This revision now requires changes to proceed.May 23 2016, 8:50 PM

I wasn't familiar with .mir files, so thanks for pointing them out!
Unfortunately creating one directly from the original pr27681.ll file yields
a .mir file which incurs various parse and semantic errors when run through
llc, so it may take some effort to get a clean .mir file. Perhaps someone
with more experience than I can whip one out quickly?

Wrt respecting sub/super regs, there are several areas in this optimization
that look suspect to me. But that doesn't necessarily mean they are incorrect.
Maybe they are, maybe they aren't but it requires further study.
As just one example, right off the bat in StartBlock, %CL is found to be
live out. So %CL, and rightfully its superregs %CX, %ECX and %RCX are
also set to a live state, but %CH is not. It certainly appears odd that
%CH's superregs are live while %CH is not, but it's technically accurate.
And it illustrates that properties of a superreg do not necessarily apply
to its subregs.

For a proper fix we should have findSuitableFreeRegister() report false for %ch.

Wholeheartedly agree and that's what I'm trying to achieve.

The isNewRegClobberedByRefs suggestion is unfortunately insufficient.
The refs in this case are the references to the antidependent register %BL,
which does not include the instruction that defines %ECX.

I'm not familiar with LivePhysReg, so I don't know how difficult it would
be to apply here (well, both because of that unfamiliarity, and because I've
found it very difficult to associate a definitive meaning to the liveness
data structures maintained here). Fundamentally I think the shortcuts being
taken for tied operands, and registers already present in the KeepRegs set,
are a key piece of the problem. So even if LivePhysReg were employed,
those shortcuts could still lead to issues.

We need to recognize that the %ECX def clobbers %CH. Normally this appears
to be achieved in ScanInstruction by updating DefIndices[%CH] and Classes[%CH].
But that processing is skipped because of %ECX's presence in KeepRegs,
which in turn was caused by %CL having a tied instance.

I'll spend a little more time looking at those shortcuts to see if
a straightforward solution comes to mind, using DefIndices and Classes.
But if a major revamp comes into play, I likely won't have the time
to tackle it.

In D20456#438492, @mbodart wrote:

I wasn't familiar with .mir files, so thanks for pointing them out!
Unfortunately creating one directly from the original pr27681.ll file yields
a .mir file which incurs various parse and semantic errors when run through
llc, so it may take some effort to get a clean .mir file. Perhaps someone
with more experience than I can whip one out quickly?

That is unfortunate. .mir files are still a new thing and have many rough edges, nonetheless they are often the only way to produce a sane and stable testcase. It would be nice if you could file a bug with instructions on how to reproduce the problem.

Wrt respecting sub/super regs, there are several areas in this optimization
that look suspect to me. But that doesn't necessarily mean they are incorrect.
Maybe they are, maybe they aren't but it requires further study.
As just one example, right off the bat in StartBlock, %CL is found to be
live out. So %CL, and rightfully its superregs %CX, %ECX and %RCX are
also set to a live state, but %CH is not. It certainly appears odd that
%CH's superregs are live while %CH is not, but it's technically accurate.
And it illustrates that properties of a superreg do not necessarily apply
to its subregs.

For a proper fix we should have findSuitableFreeRegister() report false for %ch.

Wholeheartedly agree and that's what I'm trying to achieve.

The isNewRegClobberedByRefs suggestion is unfortunately insufficient.
The refs in this case are the references to the antidependent register %BL,
which does not include the instruction that defines %ECX.

I'm not familiar with LivePhysReg, so I don't know how difficult it would
be to apply here (well, both because of that unfamiliarity, and because I've
found it very difficult to associate a definitive meaning to the liveness
data structures maintained here). Fundamentally I think the shortcuts being
taken for tied operands, and registers already present in the KeepRegs set,
are a key piece of the problem. So even if LivePhysReg were employed,
those shortcuts could still lead to issues.

While I would strongly recommend LivePhysRegs for newly written code, I fear that it means rewriting (bigger?) parts of the pass here. Given how confusing the current code is we have to be careful not to introduce new bugs because we do not understand the current code...

We need to recognize that the %ECX def clobbers %CH. Normally this appears
to be achieved in ScanInstruction by updating DefIndices[%CH] and Classes[%CH].
But that processing is skipped because of %ECX's presence in KeepRegs,
which in turn was caused by %CL having a tied instance.

Ah that reasoning makes sense. What happens if you remove that KeepRegs check there? On first sight this just seems like a shortcut to me that we can do without...

I'll spend a little more time looking at those shortcuts to see if
a straightforward solution comes to mind, using DefIndices and Classes.
But if a major revamp comes into play, I likely won't have the time
to tackle it.

In D20456#438492, @mbodart wrote:

I'm not familiar with LivePhysReg, so I don't know how difficult it would
be to apply here (well, both because of that unfamiliarity, and because I've
found it very difficult to associate a definitive meaning to the liveness
data structures maintained here). Fundamentally I think the shortcuts being
taken for tied operands, and registers already present in the KeepRegs set,
are a key piece of the problem. So even if LivePhysReg were employed,
those shortcuts could still lead to issues.

I have to chime in here that I added the tied operand checks to fix PR20020 ( D4351 ):
http://llvm.org/bugs/show_bug.cgi?id=20020

This was one of my earliest commits. From the comments, you can see that I didn't understand how this pass works, but somehow everyone else also was and still is unsure of how this pass works!

So if nuking that part of the code (without reintroducing the bug it tried to fix of course) would help solve this bug, I'd fully support that.

Apparently I forgot to push submit!

test/CodeGen/X86/pr27681.ll
5	I would recommend to use a MIR test instead. That way we do not need to be (un)lucky with the register allocation. Moreover the test should be much smaller and easier to understand that way.

Hi,

(Still haven’t looked at the code!)

I am going to state the obvious but, given we have introduced regressions I am fine with a hacky fix as long as it is low risk. I do not have a good sense of risk at the moment, does anyone have a good feeling on that?
For the longer term solution, it sounds like we should really step back and take the time to clarify how the code is working and go for a proper redesign.

I don’t think we have a code owner for that pass, so if we should to go that direction, someone has to step up for the long term fix.

Cheers,
-Quentin

Please do not push the fix in the current form.

In D20456#439547, @qcolombet wrote:

I am going to state the obvious but, given we have introduced regressions I am fine with a hacky fix as long as it is low risk. I do not have a good sense of risk at the moment, does anyone have a good feeling on that?
For the longer term solution, it sounds like we should really step back and take the time to clarify how the code is working and go for a proper redesign.

Please do not push the fix in the current form. The fact that there was a tied operand involved seems accidental to me it very likely does not fix the real problem. I believe the only reason we are not seeing the problem more often is that subregister usage is so extremely rare on x86 (I think I only found a handful places in the whole of the llvm test-suite last time I looked).

I don’t think we have a code owner for that pass, so if we should to go that direction, someone has to step up for the long term fix.

The current code *DepBreaker.* code is bad, does liveness in untypical ways that makes it very hard to understand and I am pretty sure it was always broken. But maybe we just shouldn't enable it then before it is fixed?

I agree with Matthias, unless it wasn’t clear :).

Sanjay, I tried backing out your fix for pr20020.ll, but the test passes either way. So something else changed along the way.
This is a great argument for using a .mir test, and I'll pursue that.

I'm testing an alternative "fix" isolated to ScanInstruction. Instead of skipping the state updates when we see Reg in KeepRegs, I do all of the state updates except for removing Reg from KeepRegs. But I also heed the immediate continue, with no state updates, in the tied operand case. In particular this will reflect in DefIndices and Classes that %CH is modified by the %ECX def, and that's enough to prevent using %CH in findSuitableFreeRegister.

I did some more digging re the tied operand check, and I think the idea there is that a tied operand will still be live into the instruction, so apparently the author thought this meant the def need not be treated as a live range killer. Mucking with that assumption is not something I want to delve into with this particular change.

Matthias, this isn't just being enabled now. The X86 post scheduler had been enabled for atom, bonnell, btver2 and slm.
I agree with your general concern re the stability of the antidepence breaker, simply from paranoia after staring at the code.
But a complete overhaul is a longer term project.

For the short term I agree with Quentin that a small, safe, patch is reasonable.
But per one of Sanjay's bugzilla comments, it may also be reasonable to disable the antidepence breaker for X86.
We'd have to run some performance tests to verify that.

In D20456#439675, @mbodart wrote:

I did some more digging re the tied operand check, and I think the idea there is that a tied operand will still be live into the instruction, so apparently the author thought this meant the def need not be treated as a live range killer. Mucking with that assumption is not something I want to delve into with this particular change.

My understanding is that some things cannot be renamed legally and KeepRegs collects those:
For calls the ABI forces some values to some register, and for two-address code (the tied-operand case) we cannot simply rename the input because it has to match the output. That much of the code seems fine to me.

Matthias, this isn't just being enabled now. The X86 post scheduler had been enabled for atom, bonnell, btver2 and slm.
I agree with your general concern re the stability of the antidepence breaker, simply from paranoia after staring at the code.
But a complete overhaul is a longer term project.

For the short term I agree with Quentin that a small, safe, patch is reasonable.
But per one of Sanjay's bugzilla comments, it may also be reasonable to disable the antidepence breaker for X86.
We'd have to run some performance tests to verify that.

Yes having a fix for the issue at hand is perfectly fine, I am not trying to talk you into rewriting the pass. I was only objecting to the patch as it stands right now because adding the super/sub register iterator this way for a tied operand seems to fix the issue by accident not by design.

In D20456#439675, @mbodart wrote:

Sanjay, I tried backing out your fix for pr20020.ll, but the test passes either way. So something else changed along the way.

I confirmed that locally. There's no difference in the asm for that code after backing out the tied operand blob, so it's not just due to bogus CHECK lines in the test. I also confirmed that the original C program from PR20020 is not crashing anymore even without the changes from r212275.

In D20456#439854, @spatel wrote:

In D20456#439675, @mbodart wrote:

Sanjay, I tried backing out your fix for pr20020.ll, but the test passes either way. So something else changed along the way.

I confirmed that locally. There's no difference in the asm for that code after backing out the tied operand blob, so it's not just due to bogus CHECK lines in the test. I also confirmed that the original C program from PR20020 is not crashing anymore even without the changes from r212275.

From a high level the fix in your revision does make sense to me. We cannot just rename the input of a two-address instruction when we do not rename the output at the same time.

spatel mentioned this in D20649: [X86] Add a pattern that uses GR16_ABCD rather than GR32_ABCD to avoid falsely marking whole 32 bit register as live..May 26 2016, 8:23 AM

I uploaded the alternate fix I alluded to, which is all in ScanInstruction.
This retains the KeepReg setting from Sanjay's fix for pr20020, but performs the other liveness updates.
In particular, DefIndices[%CH] gets updated, and prevents %CH from being used as a replacement register.

Matthias, my comments about tied operands/liveness were for its use in ScanInstruction, which predates
Sanjay's use in PrescanInstruction. I also tested a slight variation of this new change to the effect:

bool Keep = KeepRegs.test(Reg);
if (!Keep && MI.isRegTiedToUseOperand(i))

continue;

This also resolves the issue with no other known fails. But I cannot say for certain whether this tweak
is appropriate, so I left it out of the change set. Simililary, I cannot say for certain that dropping my
earlier "alias closure" change is appropriate, but for simplicitly of the change set I dropped it.
If we eventually find ourselves needing more tweaks to the antidependence breaker, I would recommend
simply disabling it. For X86 at least, prior to my post scheduling change it was only being used
for a small number of -mcpu cases, so I would be surprised if there's any signficant performance impact.
And fixing these little issues piecemeal is an onion peeling time sink.

The new pr27681.mir test is my first ever .mir test, and the process by which it was created seems fragile.
I started with an llc IR dump, and deleted things until it successfully compiled and still showed the original
bug and the fix. But various tweaks like dropping -mcpu, or trying -mattr, prevented the original bug
from happening. In the end the test works, but I have no idea how coherent/robust the .mir file is.

Additionally, at least one other issue was uncovered. It varies depending on -mcpu/-mattr, but there
are cases where we choose %bh as the free register to replace %bl, and that doesn't actually break the
dependence! It's a perf issue though, not stability, so I don't want to fix it in this change set. But it's
the reason my test checks for %b instead of %bl.

This patch looks good enough to me now. I still have a question (see below) but that may be because of my limited understanding of the pass.

Thanks for the .mir test, it should already be more robust than the .ll one. I think there is room for simplifying it further though:

lib/CodeGen/CriticalAntiDepBreaker.cpp
282–283	I am surprised that we need to guard `KeepRegs.reset()` with an `if` here. But well it preserves some existing behavior so if it is somehow necessary to have the `if` here keep it.
test/CodeGen/X86/pr27681.mir
8–11 ↗	(On Diff #58656)	I tended to use the even shorter `define void @main() { ret void }` here.
16–22 ↗	(On Diff #58656)	This can probably be simplified. I think `alignment`, `exposesReturnsTwice`, `hasInlineAsm`, `isSSA`, `tracksSubRegLiveness`, `calleeSavedRegisters` have their default values and do not need to be mentioned.
26–38 ↗	(On Diff #58656)	Similarily here, I assume most of the info here can be omitted.
45–54 ↗	(On Diff #58656)	It looks like only `stack.1`, `stack.5` and `stack.6` are used. Maybe you can reduce and rename this to 3 stack objects?
60–70 ↗	(On Diff #58656)	do the contents of bb.0 actually matter for the test to work?
91 ↗	(On Diff #58656)	for stylistic reasons I would move `CHECK-LABEL` upwards and check for `CHECK-LABEL: name: main` next to the `name:` line.
96–114 ↗	(On Diff #58656)	Have you tried removing the lower part (bb.2, bb.3) and just ending bb.1 with RETQ to simplify the testcase further?

This revision is now accepted and ready to land.May 26 2016, 1:19 PM

Thanks for the .mir suggestions!

I had originally tried some of those, but llc choked.
I reduced the test a bit more, again limited by llc.

When I removed chunks of instructions from uninteresting blocks, either llc choked
or the problem did not reproduce. So I'll leave that code in. Perhaps more reduction
is feasible, but this test size seems reasonable.

mbodart added inline comments.May 26 2016, 3:20 PM

test/CodeGen/X86/pr27681.mir
9–12 ↗	(On Diff #58702)	Tried that, but llc want' to see a label corresponding to bb.0.entry.
61–71 ↗	(On Diff #58702)	When I tried removing instructions from bb.0, bb.2 and bb.3, either llc crashed, or the bug did not reproduce.
90 ↗	(On Diff #58702)	Just removing instructions from bb.2 caused the failure to not reproduce. I think, but haven't verified, that having %cl be live out is needed. I originally had the CHECK-LABEL up near main, but llc chokes on that.

MatzeB added inline comments.May 26 2016, 3:29 PM

test/CodeGen/X86/pr27681.mir
9–12 ↗	(On Diff #58702)	I did not know it referenced the IR blocks. But then my guess would be that you can rename 'bb.0.entry' to 'bb.0' to break this reference.
90 ↗	(On Diff #58702)	The problem is probably the different comment characters. Currently `#` is used to introduce comments in yaml but `;` is used for comments inside the code blocks.

mbodart added inline comments.May 26 2016, 3:37 PM

test/CodeGen/X86/pr27681.mir
9–12 ↗	(On Diff #58702)	Yup, that works. Will fix.
90 ↗	(On Diff #58702)	Interesting, very picky. I had tried using #, but I had the CHECK-LABEL line after the name line. Putting it before the name line works, so I'll do that.

Closed by commit rL270935: [CodeGen] Fix problem with X86 byte registers in CriticalAntiDepBreaker (authored by mbodart). · Explain WhyMay 26 2016, 4:15 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

CodeGen/

CriticalAntiDepBreaker.cpp

20 lines

test/

CodeGen/

X86/

pr27681.ll

143 lines

Diff 57878

lib/CodeGen/CriticalAntiDepBreaker.cpp

Show First 20 Lines • Show All 207 Lines • ▼ Show 20 Lines	for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
// Example: an x86 "xor %eax, %eax" will have one source operand tied to the		// Example: an x86 "xor %eax, %eax" will have one source operand tied to the
// def register but not the second (see PR20020 for details).		// def register but not the second (see PR20020 for details).
// FIXME: can this check be relaxed to account for undef uses		// FIXME: can this check be relaxed to account for undef uses
// of a register? In the above 'xor' example, the uses of %eax are undef, so		// of a register? In the above 'xor' example, the uses of %eax are undef, so
// earlier instructions could still replace %eax even though the 'xor'		// earlier instructions could still replace %eax even though the 'xor'
// itself can't be changed.		// itself can't be changed.
if (MI.isRegTiedToUseOperand(i) &&		if (MI.isRegTiedToUseOperand(i) &&
Classes[Reg] == reinterpret_cast<TargetRegisterClass *>(-1)) {		Classes[Reg] == reinterpret_cast<TargetRegisterClass *>(-1)) {
for (MCSubRegIterator SubRegs(Reg, TRI, /IncludeSelf=/true);		// Traverse the closure of all of Reg's aliased registers. This ensures,
SubRegs.isValid(); ++SubRegs) {		// on X86 for example, that we encounter both the high and low byte
		// register (e.g. %CL and %CH) when Reg is a byte register. While such
		// byte registers do not overlap, the presence in KeepRegs of any larger
		// containing register such as %ECX leads to a shortcut being taken in
		// ScanInstruction which may be unsafe for Reg's bytereg counterpart.
		for (MCSuperRegIterator SuperRegs(Reg, TRI, /IncludeSelf=/true);
		SuperRegs.isValid(); ++SuperRegs)
		for (MCSubRegIterator SubRegs(SuperRegs, TRI, /IncludeSelf=*/true);
		SubRegs.isValid(); ++SubRegs)
KeepRegs.set(*SubRegs);		KeepRegs.set(*SubRegs);
}		}
for (MCSuperRegIterator SuperRegs(Reg, TRI);
SuperRegs.isValid(); ++SuperRegs) {
KeepRegs.set(*SuperRegs);
}
}

if (MO.isUse() && Special) {		if (MO.isUse() && Special) {
if (!KeepRegs.test(Reg)) {		if (!KeepRegs.test(Reg)) {
for (MCSubRegIterator SubRegs(Reg, TRI, /IncludeSelf=/true);		for (MCSubRegIterator SubRegs(Reg, TRI, /IncludeSelf=/true);
SubRegs.isValid(); ++SubRegs)		SubRegs.isValid(); ++SubRegs)
KeepRegs.set(*SubRegs);		KeepRegs.set(*SubRegs);
}		}
}		}
Show All 38 Lines	for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
// reset the kill state, any restrictions, and references.		// reset the kill state, any restrictions, and references.
for (MCSubRegIterator SRI(Reg, TRI, true); SRI.isValid(); ++SRI) {		for (MCSubRegIterator SRI(Reg, TRI, true); SRI.isValid(); ++SRI) {
unsigned SubregReg = *SRI;		unsigned SubregReg = *SRI;
DefIndices[SubregReg] = Count;		DefIndices[SubregReg] = Count;
KillIndices[SubregReg] = ~0u;		KillIndices[SubregReg] = ~0u;
KeepRegs.reset(SubregReg);		KeepRegs.reset(SubregReg);
Classes[SubregReg] = nullptr;		Classes[SubregReg] = nullptr;
RegRefs.erase(SubregReg);		RegRefs.erase(SubregReg);
}		}
// Conservatively mark super-registers as unusable.		// Conservatively mark super-registers as unusable.
		MatzeBUnsubmitted Not Done Reply Inline Actions I am surprised that we need to guard `KeepRegs.reset()` with an `if` here. But well it preserves some existing behavior so if it is somehow necessary to have the `if` here keep it. MatzeB: I am surprised that we need to guard `KeepRegs.reset()` with an `if` here. But well it…
for (MCSuperRegIterator SR(Reg, TRI); SR.isValid(); ++SR)		for (MCSuperRegIterator SR(Reg, TRI); SR.isValid(); ++SR)
Classes[SR] = reinterpret_cast<TargetRegisterClass >(-1);		Classes[SR] = reinterpret_cast<TargetRegisterClass >(-1);
}		}
}		}
for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {		for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
MachineOperand &MO = MI.getOperand(i);		MachineOperand &MO = MI.getOperand(i);
if (!MO.isReg()) continue;		if (!MO.isReg()) continue;
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines	for (unsigned i = 0; i != Order.size(); ++i) {
// If NewReg is dead and NewReg's most recent def is not before		// If NewReg is dead and NewReg's most recent def is not before
// AntiDepReg's kill, it's safe to replace AntiDepReg with NewReg.		// AntiDepReg's kill, it's safe to replace AntiDepReg with NewReg.
assert(((KillIndices[AntiDepReg] == ~0u) != (DefIndices[AntiDepReg] == ~0u))		assert(((KillIndices[AntiDepReg] == ~0u) != (DefIndices[AntiDepReg] == ~0u))
&& "Kill and Def maps aren't consistent for AntiDepReg!");		&& "Kill and Def maps aren't consistent for AntiDepReg!");
assert(((KillIndices[NewReg] == ~0u) != (DefIndices[NewReg] == ~0u))		assert(((KillIndices[NewReg] == ~0u) != (DefIndices[NewReg] == ~0u))
&& "Kill and Def maps aren't consistent for NewReg!");		&& "Kill and Def maps aren't consistent for NewReg!");
if (KillIndices[NewReg] != ~0u \|\|		if (KillIndices[NewReg] != ~0u \|\|
Classes[NewReg] == reinterpret_cast<TargetRegisterClass *>(-1) \|\|		Classes[NewReg] == reinterpret_cast<TargetRegisterClass *>(-1) \|\|
		KeepRegs.test(NewReg) \|\|
KillIndices[AntiDepReg] > DefIndices[NewReg])		KillIndices[AntiDepReg] > DefIndices[NewReg])
continue;		continue;
// If NewReg overlaps any of the forbidden registers, we can't use it.		// If NewReg overlaps any of the forbidden registers, we can't use it.
bool Forbidden = false;		bool Forbidden = false;
for (SmallVectorImpl<unsigned>::iterator it = Forbid.begin(),		for (SmallVectorImpl<unsigned>::iterator it = Forbid.begin(),
ite = Forbid.end(); it != ite; ++it)		ite = Forbid.end(); it != ite; ++it)
if (TRI->regsOverlap(NewReg, *it)) {		if (TRI->regsOverlap(NewReg, *it)) {
Forbidden = true;		Forbidden = true;
▲ Show 20 Lines • Show All 270 Lines • Show Last 20 Lines

test/CodeGen/X86/pr27681.ll

				; RUN: llc < %s -mtriple=i386 -post-RA-scheduler=true \| FileCheck %s
				; Regression test for pr27681 (and pr27580, pr27804).
				; We're trying to verify that the antidependency breaker does not
				; erroneously choose an unavailable byte register to break a dependence.
				; The test is of course highly dependent on register allocation.
				qcolombetUnsubmitted Not Done Reply Inline Actions I would recommend to use a MIR test instead. That way we do not need to be (un)lucky with the register allocation. Moreover the test should be much smaller and easier to understand that way. qcolombet: I would recommend to use a MIR test instead. That way we do not need to be (un)lucky with the…
				@c = global i16 4, align 2
				@d = global i16 5, align 2
				@a = common global i32 0, align 4
				@i = common global i32 0, align 4
				@b = common global i16 0, align 2
				@e = common global i16 0, align 2
				@f = common global i16 0, align 2
				@g = common global i16 0, align 2
				@h = common global i64 0, align 8

				; CHECK-LABEL: @main
				define i32 @main() {
				entry:
				%.pr.i = load i32, i32* @a, align 4
				%tobool.i = icmp eq i32 %.pr.i, 0
				%0 = load i32, i32* @i, align 4
				%neg.i = xor i32 %0, -1
				%1 = load i16, i16* @g, align 2
				%2 = load i16, i16* @c, align 2
				%tobool13.i = icmp ne i16 %2, 0
				%3 = load i16, i16* @b, align 2
				%conv2.i = zext i16 %3 to i64
				br i1 %tobool.i, label %for.condthread-pre-split.us.preheader.i, label %entry.split.i

				for.condthread-pre-split.us.preheader.i: ; preds = %entry
				%4 = load i16, i16* @f, align 2
				%tobool1.i = icmp eq i16 %4, 0
				br i1 %tobool1.i, label %for.condthread-pre-split.us.i.us.preheader, label %for.condthread-pre-split.us.preheader.i.split

				for.condthread-pre-split.us.i.us.preheader: ; preds = %for.condthread-pre-split.us.preheader.i
				%e.promoted.i = load i16, i16* @e, align 2
				%d.promoted.i = load i16, i16* @d, align 2
				%.pre = load i64, i64* @h, align 8
				br label %for.condthread-pre-split.us.i.us

				; CHECK-LABEL: .LBB0_8:
				; CHECK: shrl %cl, %edi
				; CHECK: xorl %esi, %edi
				; CHECK: testl %edx, %edx
				; CHECK: setne %cl
				; CHECK: orb %bl, %cl
				; CHECK: movzbl %cl, %esi
				; CHECK: movl {{[0-9]+}}(%esp), %ecx
				; CHECK: sarl %cl, %edx
				; CHECK: testl %edx, %edx
				; CHECK: setne %cl
				; The original bug was using %ch instead of %bl in the following andb
				; instruction, not recognizing %ch was clobbered by the above reload to %ecx.
				; CHECK: andb %bl, %cl
				; CHECK-DAG: movl $0, %ebx
				; CHECK-DAG: cmpl $-1, %edi
				; CHECK: je .LBB0_10

				for.condthread-pre-split.us.i.us: ; preds = %for.condthread-pre-split.us.i.us.preheader, %cleanup49.thread.us.i.us
				%5 = phi i64 [ %and.us.i.us, %cleanup49.thread.us.i.us ], [ %.pre, %for.condthread-pre-split.us.i.us.preheader ]
				%add67.us.i.us = phi i16 [ %add.us.i.us, %cleanup49.thread.us.i.us ], [ %e.promoted.i, %for.condthread-pre-split.us.i.us.preheader ]
				%conv4265.us.i.us = phi i16 [ %conv42.us.i.us, %cleanup49.thread.us.i.us ], [ %d.promoted.i, %for.condthread-pre-split.us.i.us.preheader ]
				%j.0.ph.us.i.us = phi i32 [ %xor.us.i.us, %cleanup49.thread.us.i.us ], [ 0, %for.condthread-pre-split.us.i.us.preheader ]
				%conv4.us.i.us = zext i16 %conv4265.us.i.us to i32
				%mul.us.i.us = mul nsw i32 %conv4.us.i.us, %neg.i
				%tobool8.us.i.us = icmp ne i16 %conv4265.us.i.us, 0
				%6 = or i16 %conv4265.us.i.us, %1
				%7 = icmp ne i16 %6, 0
				%lor.ext.us.i.us = zext i1 %7 to i32
				%neg9.us.i.us = xor i32 %lor.ext.us.i.us, -1
				%not.tobool11.us.i.us = icmp ne i64 %5, 0
				%8 = and i1 %tobool13.i, %not.tobool11.us.i.us
				%mul14.us.i.us = select i1 %8, i32 %mul.us.i.us, i32 0
				%shr.us.i.us = lshr i32 %conv4.us.i.us, %mul14.us.i.us
				%xor.us.i.us = xor i32 %shr.us.i.us, %neg9.us.i.us
				%tobool18.us.i.us = icmp ne i32 %mul.us.i.us, 0
				%9 = or i1 %tobool8.us.i.us, %tobool18.us.i.us
				%lor.ext20.us.i.us = zext i1 %9 to i16
				%add.us.i.us = add i16 %lor.ext20.us.i.us, %conv4265.us.i.us
				%shr26.us.i.us = ashr i32 %mul.us.i.us, %0
				%tobool27.us.i.us = icmp ne i32 %shr26.us.i.us, 0
				%10 = and i1 %tobool8.us.i.us, %tobool27.us.i.us
				%tobool31.us.i.us = icmp eq i32 %xor.us.i.us, 0
				br i1 %tobool31.us.i.us, label %land.end38.us.i.us, label %land.rhs32.us.i.us

				land.rhs32.us.i.us: ; preds = %for.condthread-pre-split.us.i.us
				%tobool33.us.i.us = icmp ne i32 %j.0.ph.us.i.us, 0
				%11 = or i1 %tobool33.us.i.us, %not.tobool11.us.i.us
				br label %land.end38.us.i.us

				land.end38.us.i.us: ; preds = %land.rhs32.us.i.us, %for.condthread-pre-split.us.i.us
				%12 = phi i1 [ false, %for.condthread-pre-split.us.i.us ], [ %11, %land.rhs32.us.i.us ]
				%land.ext39.us.i.us = zext i1 %12 to i16
				%land.ext39.op.us.i.us = xor i16 %land.ext39.us.i.us, -1
				%conv42.us.i.us = select i1 %10, i16 0, i16 %land.ext39.op.us.i.us
				%not.cmp.us.i.us = icmp ult i16 %conv42.us.i.us, %2
				br i1 %not.cmp.us.i.us, label %fn1.exit, label %cleanup49.thread.us.i.us

				cleanup49.thread.us.i.us: ; preds = %land.end38.us.i.us
				%rem.us.i.us = srem i64 %conv2.i, %5
				%conv.mask.us.i.us = and i16 %add67.us.i.us, 255
				%conv3.us.i.us = zext i16 %conv.mask.us.i.us to i64
				%and.us.i.us = and i64 %rem.us.i.us, %conv3.us.i.us
				store i64 %and.us.i.us, i64* @h, align 8
				br label %for.condthread-pre-split.us.i.us

				for.condthread-pre-split.us.preheader.i.split: ; preds = %for.condthread-pre-split.us.preheader.i
				br i1 %tobool13.i, label %for.condthread-pre-split.us.i.us1.preheader, label %for.condthread-pre-split.us.i.preheader

				for.condthread-pre-split.us.i.preheader: ; preds = %for.condthread-pre-split.us.preheader.i.split
				br label %for.condthread-pre-split.us.i

				for.condthread-pre-split.us.i.us1.preheader: ; preds = %for.condthread-pre-split.us.preheader.i.split
				br label %for.condthread-pre-split.us.i.us1

				for.condthread-pre-split.us.i.us1: ; preds = %for.condthread-pre-split.us.i.us1.preheader, %for.condthread-pre-split.us.i.us1
				br label %for.condthread-pre-split.us.i.us1

				for.condthread-pre-split.us.i: ; preds = %for.condthread-pre-split.us.i.preheader, %for.condthread-pre-split.us.i
				br label %for.condthread-pre-split.us.i

				entry.split.i: ; preds = %entry
				br i1 %tobool13.i, label %for.cond.us72.i.preheader, label %for.cond.i.preheader

				for.cond.i.preheader: ; preds = %entry.split.i
				br label %for.cond.i

				for.cond.us72.i.preheader: ; preds = %entry.split.i
				br label %for.cond.us72.i

				for.cond.us72.i: ; preds = %for.cond.us72.i.preheader, %for.cond.us72.i
				br label %for.cond.us72.i

				for.cond.i: ; preds = %for.cond.i.preheader, %for.cond.i
				br label %for.cond.i

				fn1.exit: ; preds = %land.end38.us.i.us
				%conv42.us.i.us.lcssa = phi i16 [ %conv42.us.i.us, %land.end38.us.i.us ]
				%add.us.i.us.lcssa = phi i16 [ %add.us.i.us, %land.end38.us.i.us ]
				store i16 %conv42.us.i.us.lcssa, i16* @d, align 2
				store i16 %add.us.i.us.lcssa, i16* @e, align 2
				ret i32 0
				}