This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Fix problem with X86 byte registers in CriticalAntiDepBreaker
ClosedPublic

Authored by mbodart on May 19 2016, 4:52 PM.

Details

Summary

This fix addresses the failures reported in pr27681, pr27580 and pr27804, caused by a bug in CriticalAntiDepBreaker.cpp, which increased in exposure when I recently enabled the post-RA-scheduler for more X86 cpus.

At the heart of the issue is that MCSuperRegIterator does not exhaust all registers in the underlying container.
For example if starting with %CL it will encounter %CL, %CX, %ECX and %RCX, but not %CH. This isn't a direct problem,
but it leads to some subtle differences in the various register properties maintained by this optimization,
where %CH stands out as different.

In the case of pr27681, all %RCX subregs other than %CH get conservatively added to the KeepRegs set when we see
a %CL def tied to a use in PreScanInstruction. When we later see an %ECX def in ScanInstruction's def processing,
we skip state updates because of the %ECX presence in KeepRegs. This leaves %CH in an incorrect "free" state.

My fix is to ensure %CH gets added to KeepRegs as well, and to avoid choosing a KeepRegs register
when looking for a free register.

Any use of MCSuperRegIterator or MCRegAliasIterator is prone to this subtle issue. I briefly looked at the other
uses in this optimization, and it was not clear if they could be problematic, so I chose not to touch them.

Diff Detail

Repository
rL LLVM

Event Timeline

mbodart updated this revision to Diff 57878.May 19 2016, 4:52 PM
mbodart retitled this revision from to [CodeGen] Fix problem with X86 byte registers in CriticalAntiDepBreaker.
mbodart updated this object.
mbodart added reviewers: qcolombet, hfinkel.
mbodart added a subscriber: llvm-commits.
spatel added a subscriber: spatel.May 19 2016, 5:32 PM
MatzeB edited edge metadata.May 20 2016, 8:29 PM

I don't understand the nature of the bug with the explanation given here (and I don't know much about this pass). Why do you need this double loop, would iterating over the register alias work as well? The double loop here also seems to requires quadratic runtime in the depth of the sub/super register graph.

Based on the comment in the source it is not clear to me why we have to include alias registers into the KeepRegs set that are not touched by the instruction.

I was also not familiar with this code, until diving in to fix pr27681.

KeepRegs was introduced in r83198 of PostRAScheduler.cpp back in 2009,
and the tied operand checks existed before then. In r212275 of
CriticalAntiDepBreaker.cpp (July 2014), and possibly other change
sets along the way, the use of KeepRegs and tied operands was tweaked
to fix pr20020.

From what I can surmise, the original intent of KeepRegs was to mark
registers as "hands off", when looking for a candidate anti-dependent
register def. So this change is muddying the water a little bit,
as I'm now also using it to say "hands off" when choosing a replacement
register. However, I did not see any other obvious solution.
Setting Classes[Reg] to -1 is another way to say "hands off"
for the purposes of choosing a replacement reg, but it's not clear
how that can be made to work in the current structure for this case.

There is an assumption in ScanInstruction that if a def'd Reg is in KeepRegs,
or is tied to a use, then we do not need to update the liveness info
(Def/KillIndices, KeepRegs, Classes, RegRefs) for it or any of its subregs,
nor set Classes to -1 for any of its superregs.

In the case of pr27681, this means those datastructures are not updated
for %CH when we see a def of %ECX, and ultimately %CH is incorrectly treated
as a suitable replacement register in findSuitableFreeRegister.

I don't know why tied operands are treated the way they are, but the bug
is not a direct consequence of tied operands. Rather, it is in the assumptions
being made about registers present in the KeepRegs set, and implicitly
their subregs.

As to the iterator question, yes it is quadratic, and no, MCRegAliasIterator
does not suffice. If you know of an iterator that would fill the need here,
great. But I couldn't find one. I'm not worried about the quadratic nature,
as the sizes here are very small.

qcolombet edited edge metadata.May 23 2016, 5:13 PM
qcolombet added a subscriber: qcolombet.

Hi Mitch,

I haven’t looked at the patch but given your description, it sounds like using a proper tracking of the liveness (like LivePhysReg) would be the right way to go.
Does that sound reasonable?

Cheers,
-Quentin

MatzeB requested changes to this revision.May 23 2016, 8:50 PM
MatzeB edited edge metadata.

Looking into the testcase this seems to be a problem with tracking which registers are actually free and usable, the algorithm somehow fails to recognize that %ch is not free when %ecx is used.

The KeepRegsSet on the other hand excludes some registers from renaming because that would change program semantics. This does not appear to be the right fix, the fact that the andb in the testcase is a tied operand seems accidental to me. For a proper fix we should have findSuitableFreeRegister() report false for %ch.

Some possible ways to proceed:

  • It would be good to have a minimal test case as a .mir file which just does "llc -run-pass post-RA-scheduler -verify-machineinstrs xxx.mir" that serves well as a testcase and should also make it realistic to single step through the algorithm with a debugger...
  • I wonder if isNewRegClobberedByRefs() should check subregisters instead of just if(... CheckOper.getReg() != NewReg) continue;
  • The We just went back in time and modified history; ... part also does not seem to respect subregisters/superregisters.
This revision now requires changes to proceed.May 23 2016, 8:50 PM

I wasn't familiar with .mir files, so thanks for pointing them out!
Unfortunately creating one directly from the original pr27681.ll file yields
a .mir file which incurs various parse and semantic errors when run through
llc, so it may take some effort to get a clean .mir file. Perhaps someone
with more experience than I can whip one out quickly?

Wrt respecting sub/super regs, there are several areas in this optimization
that look suspect to me. But that doesn't necessarily mean they are incorrect.
Maybe they are, maybe they aren't but it requires further study.
As just one example, right off the bat in StartBlock, %CL is found to be
live out. So %CL, and rightfully its superregs %CX, %ECX and %RCX are
also set to a live state, but %CH is not. It certainly appears odd that
%CH's superregs are live while %CH is not, but it's technically accurate.
And it illustrates that properties of a superreg do not necessarily apply
to its subregs.

For a proper fix we should have findSuitableFreeRegister() report false for %ch.

Wholeheartedly agree and that's what I'm trying to achieve.

The isNewRegClobberedByRefs suggestion is unfortunately insufficient.
The refs in this case are the references to the antidependent register %BL,
which does not include the instruction that defines %ECX.

I'm not familiar with LivePhysReg, so I don't know how difficult it would
be to apply here (well, both because of that unfamiliarity, and because I've
found it very difficult to associate a definitive meaning to the liveness
data structures maintained here). Fundamentally I think the shortcuts being
taken for tied operands, and registers already present in the KeepRegs set,
are a key piece of the problem. So even if LivePhysReg were employed,
those shortcuts could still lead to issues.

We need to recognize that the %ECX def clobbers %CH. Normally this appears
to be achieved in ScanInstruction by updating DefIndices[%CH] and Classes[%CH].
But that processing is skipped because of %ECX's presence in KeepRegs,
which in turn was caused by %CL having a tied instance.

I'll spend a little more time looking at those shortcuts to see if
a straightforward solution comes to mind, using DefIndices and Classes.
But if a major revamp comes into play, I likely won't have the time
to tackle it.

I wasn't familiar with .mir files, so thanks for pointing them out!
Unfortunately creating one directly from the original pr27681.ll file yields
a .mir file which incurs various parse and semantic errors when run through
llc, so it may take some effort to get a clean .mir file. Perhaps someone
with more experience than I can whip one out quickly?

That is unfortunate. .mir files are still a new thing and have many rough edges, nonetheless they are often the only way to produce a sane and stable testcase. It would be nice if you could file a bug with instructions on how to reproduce the problem.

Wrt respecting sub/super regs, there are several areas in this optimization
that look suspect to me. But that doesn't necessarily mean they are incorrect.
Maybe they are, maybe they aren't but it requires further study.
As just one example, right off the bat in StartBlock, %CL is found to be
live out. So %CL, and rightfully its superregs %CX, %ECX and %RCX are
also set to a live state, but %CH is not. It certainly appears odd that
%CH's superregs are live while %CH is not, but it's technically accurate.
And it illustrates that properties of a superreg do not necessarily apply
to its subregs.

For a proper fix we should have findSuitableFreeRegister() report false for %ch.

Wholeheartedly agree and that's what I'm trying to achieve.

The isNewRegClobberedByRefs suggestion is unfortunately insufficient.
The refs in this case are the references to the antidependent register %BL,
which does not include the instruction that defines %ECX.

I'm not familiar with LivePhysReg, so I don't know how difficult it would
be to apply here (well, both because of that unfamiliarity, and because I've
found it very difficult to associate a definitive meaning to the liveness
data structures maintained here). Fundamentally I think the shortcuts being
taken for tied operands, and registers already present in the KeepRegs set,
are a key piece of the problem. So even if LivePhysReg were employed,
those shortcuts could still lead to issues.

While I would strongly recommend LivePhysRegs for newly written code, I fear that it means rewriting (bigger?) parts of the pass here. Given how confusing the current code is we have to be careful not to introduce new bugs because we do not understand the current code...

We need to recognize that the %ECX def clobbers %CH. Normally this appears
to be achieved in ScanInstruction by updating DefIndices[%CH] and Classes[%CH].
But that processing is skipped because of %ECX's presence in KeepRegs,
which in turn was caused by %CL having a tied instance.

Ah that reasoning makes sense. What happens if you remove that KeepRegs check there? On first sight this just seems like a shortcut to me that we can do without...

I'll spend a little more time looking at those shortcuts to see if
a straightforward solution comes to mind, using DefIndices and Classes.
But if a major revamp comes into play, I likely won't have the time
to tackle it.

I'm not familiar with LivePhysReg, so I don't know how difficult it would
be to apply here (well, both because of that unfamiliarity, and because I've
found it very difficult to associate a definitive meaning to the liveness
data structures maintained here). Fundamentally I think the shortcuts being
taken for tied operands, and registers already present in the KeepRegs set,
are a key piece of the problem. So even if LivePhysReg were employed,
those shortcuts could still lead to issues.

I have to chime in here that I added the tied operand checks to fix PR20020 ( D4351 ):
http://llvm.org/bugs/show_bug.cgi?id=20020

This was one of my earliest commits. From the comments, you can see that I didn't understand how this pass works, but somehow everyone else also was and still is unsure of how this pass works!

So if nuking that part of the code (without reintroducing the bug it tried to fix of course) would help solve this bug, I'd fully support that.

Apparently I forgot to push submit!

test/CodeGen/X86/pr27681.ll
5 ↗(On Diff #57878)

I would recommend to use a MIR test instead.
That way we do not need to be (un)lucky with the register allocation. Moreover the test should be much smaller and easier to understand that way.

Hi,

(Still haven’t looked at the code!)

I am going to state the obvious but, given we have introduced regressions I am fine with a hacky fix as long as it is low risk. I do not have a good sense of risk at the moment, does anyone have a good feeling on that?
For the longer term solution, it sounds like we should really step back and take the time to clarify how the code is working and go for a proper redesign.

I don’t think we have a code owner for that pass, so if we should to go that direction, someone has to step up for the long term fix.

Cheers,
-Quentin

Please do not push the fix in the current form.

I am going to state the obvious but, given we have introduced regressions I am fine with a hacky fix as long as it is low risk. I do not have a good sense of risk at the moment, does anyone have a good feeling on that?
For the longer term solution, it sounds like we should really step back and take the time to clarify how the code is working and go for a proper redesign.

Please do not push the fix in the current form. The fact that there was a tied operand involved seems accidental to me it very likely does not fix the real problem. I believe the only reason we are not seeing the problem more often is that subregister usage is so extremely rare on x86 (I think I only found a handful places in the whole of the llvm test-suite last time I looked).

I don’t think we have a code owner for that pass, so if we should to go that direction, someone has to step up for the long term fix.

The current code *DepBreaker.* code is bad, does liveness in untypical ways that makes it very hard to understand and I am pretty sure it was always broken. But maybe we just shouldn't enable it then before it is fixed?

I agree with Matthias, unless it wasn’t clear :).

Sanjay, I tried backing out your fix for pr20020.ll, but the test passes either way. So something else changed along the way.
This is a great argument for using a .mir test, and I'll pursue that.

I'm testing an alternative "fix" isolated to ScanInstruction. Instead of skipping the state updates when we see Reg in KeepRegs, I do all of the state updates except for removing Reg from KeepRegs. But I also heed the immediate continue, with no state updates, in the tied operand case. In particular this will reflect in DefIndices and Classes that %CH is modified by the %ECX def, and that's enough to prevent using %CH in findSuitableFreeRegister.

I did some more digging re the tied operand check, and I think the idea there is that a tied operand will still be live into the instruction, so apparently the author thought this meant the def need not be treated as a live range killer. Mucking with that assumption is not something I want to delve into with this particular change.

Matthias, this isn't just being enabled now. The X86 post scheduler had been enabled for atom, bonnell, btver2 and slm.
I agree with your general concern re the stability of the antidepence breaker, simply from paranoia after staring at the code.
But a complete overhaul is a longer term project.

For the short term I agree with Quentin that a small, safe, patch is reasonable.
But per one of Sanjay's bugzilla comments, it may also be reasonable to disable the antidepence breaker for X86.
We'd have to run some performance tests to verify that.

I did some more digging re the tied operand check, and I think the idea there is that a tied operand will still be live into the instruction, so apparently the author thought this meant the def need not be treated as a live range killer. Mucking with that assumption is not something I want to delve into with this particular change.

My understanding is that some things cannot be renamed legally and KeepRegs collects those:
For calls the ABI forces some values to some register, and for two-address code (the tied-operand case) we cannot simply rename the input because it has to match the output. That much of the code seems fine to me.

Matthias, this isn't just being enabled now. The X86 post scheduler had been enabled for atom, bonnell, btver2 and slm.
I agree with your general concern re the stability of the antidepence breaker, simply from paranoia after staring at the code.
But a complete overhaul is a longer term project.

For the short term I agree with Quentin that a small, safe, patch is reasonable.
But per one of Sanjay's bugzilla comments, it may also be reasonable to disable the antidepence breaker for X86.
We'd have to run some performance tests to verify that.

Yes having a fix for the issue at hand is perfectly fine, I am not trying to talk you into rewriting the pass. I was only objecting to the patch as it stands right now because adding the super/sub register iterator this way for a tied operand seems to fix the issue by accident not by design.

Sanjay, I tried backing out your fix for pr20020.ll, but the test passes either way. So something else changed along the way.

I confirmed that locally. There's no difference in the asm for that code after backing out the tied operand blob, so it's not just due to bogus CHECK lines in the test. I also confirmed that the original C program from PR20020 is not crashing anymore even without the changes from r212275.

Sanjay, I tried backing out your fix for pr20020.ll, but the test passes either way. So something else changed along the way.

I confirmed that locally. There's no difference in the asm for that code after backing out the tied operand blob, so it's not just due to bogus CHECK lines in the test. I also confirmed that the original C program from PR20020 is not crashing anymore even without the changes from r212275.

From a high level the fix in your revision does make sense to me. We cannot just rename the input of a two-address instruction when we do not rename the output at the same time.

mbodart updated this revision to Diff 58656.May 26 2016, 11:37 AM
mbodart edited edge metadata.

I uploaded the alternate fix I alluded to, which is all in ScanInstruction.
This retains the KeepReg setting from Sanjay's fix for pr20020, but performs the other liveness updates.
In particular, DefIndices[%CH] gets updated, and prevents %CH from being used as a replacement register.

Matthias, my comments about tied operands/liveness were for its use in ScanInstruction, which predates
Sanjay's use in PrescanInstruction. I also tested a slight variation of this new change to the effect:

bool Keep = KeepRegs.test(Reg);
if (!Keep && MI.isRegTiedToUseOperand(i))

continue;

This also resolves the issue with no other known fails. But I cannot say for certain whether this tweak
is appropriate, so I left it out of the change set. Simililary, I cannot say for certain that dropping my
earlier "alias closure" change is appropriate, but for simplicitly of the change set I dropped it.
If we eventually find ourselves needing more tweaks to the antidependence breaker, I would recommend
simply disabling it. For X86 at least, prior to my post scheduling change it was only being used
for a small number of -mcpu cases, so I would be surprised if there's any signficant performance impact.
And fixing these little issues piecemeal is an onion peeling time sink.

The new pr27681.mir test is my first ever .mir test, and the process by which it was created seems fragile.
I started with an llc IR dump, and deleted things until it successfully compiled and still showed the original
bug and the fix. But various tweaks like dropping -mcpu, or trying -mattr, prevented the original bug
from happening. In the end the test works, but I have no idea how coherent/robust the .mir file is.

Additionally, at least one other issue was uncovered. It varies depending on -mcpu/-mattr, but there
are cases where we choose %bh as the free register to replace %bl, and that doesn't actually break the
dependence! It's a perf issue though, not stability, so I don't want to fix it in this change set. But it's
the reason my test checks for %b instead of %bl.

MatzeB accepted this revision.May 26 2016, 1:19 PM
MatzeB edited edge metadata.

This patch looks good enough to me now. I still have a question (see below) but that may be because of my limited understanding of the pass.

Thanks for the .mir test, it should already be more robust than the .ll one. I think there is room for simplifying it further though:

lib/CodeGen/CriticalAntiDepBreaker.cpp
279–280 ↗(On Diff #58656)

I am surprised that we need to guard KeepRegs.reset() with an if here. But well it preserves some existing behavior so if it is somehow necessary to have the if here keep it.

test/CodeGen/X86/pr27681.mir
8–11 ↗(On Diff #58656)

I tended to use the even shorter define void @main() { ret void } here.

16–22 ↗(On Diff #58656)

This can probably be simplified. I think alignment, exposesReturnsTwice, hasInlineAsm, isSSA, tracksSubRegLiveness, calleeSavedRegisters have their default values and do not need to be mentioned.

26–38 ↗(On Diff #58656)

Similarily here, I assume most of the info here can be omitted.

45–54 ↗(On Diff #58656)

It looks like only stack.1, stack.5 and stack.6 are used. Maybe you can reduce and rename this to 3 stack objects?

60–70 ↗(On Diff #58656)

do the contents of bb.0 actually matter for the test to work?

91 ↗(On Diff #58656)

for stylistic reasons I would move CHECK-LABEL upwards and check for CHECK-LABEL: name: main next to the name: line.

96–114 ↗(On Diff #58656)

Have you tried removing the lower part (bb.2, bb.3) and just ending bb.1 with RETQ to simplify the testcase further?

This revision is now accepted and ready to land.May 26 2016, 1:19 PM
mbodart updated this revision to Diff 58702.May 26 2016, 3:17 PM
mbodart edited edge metadata.

Thanks for the .mir suggestions!

I had originally tried some of those, but llc choked.
I reduced the test a bit more, again limited by llc.

When I removed chunks of instructions from uninteresting blocks, either llc choked
or the problem did not reproduce. So I'll leave that code in. Perhaps more reduction
is feasible, but this test size seems reasonable.

mbodart added inline comments.May 26 2016, 3:20 PM
test/CodeGen/X86/pr27681.mir
9–12 ↗(On Diff #58702)

Tried that, but llc want' to see a label corresponding to bb.0.entry.

61–71 ↗(On Diff #58702)

When I tried removing instructions from bb.0, bb.2 and bb.3, either llc crashed, or the bug did not reproduce.

90 ↗(On Diff #58702)

Just removing instructions from bb.2 caused the failure to not reproduce.
I think, but haven't verified, that having %cl be live out is needed.

I originally had the CHECK-LABEL up near main, but llc chokes on that.

MatzeB added inline comments.May 26 2016, 3:29 PM
test/CodeGen/X86/pr27681.mir
9–12 ↗(On Diff #58702)

I did not know it referenced the IR blocks. But then my guess would be that you can rename 'bb.0.entry' to 'bb.0' to break this reference.

90 ↗(On Diff #58702)

The problem is probably the different comment characters. Currently # is used to introduce comments in yaml but ; is used for comments inside the code blocks.

mbodart added inline comments.May 26 2016, 3:37 PM
test/CodeGen/X86/pr27681.mir
9–12 ↗(On Diff #58702)

Yup, that works. Will fix.

90 ↗(On Diff #58702)

Interesting, very picky. I had tried using #, but I had the CHECK-LABEL line after the name line. Putting it before the name line works, so I'll do that.

This revision was automatically updated to reflect the committed changes.