This is an archive of the discontinued LLVM Phabricator instance.

critical-anti-dependency breaker: don't use reg def info from kill insts (PR20308)
ClosedPublic

Authored by spatel on Aug 19 2014, 4:58 PM.

Details

Summary

In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker caused a miscompile because it broke a WAR hazard using a register that it thinks is available based on info from a kill inst. We should never use any def/use info from a kill because they are really just nops.

This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again.

The test case is a reduced version of the code from the bug report.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 12686.Aug 19 2014, 4:58 PM
spatel retitled this revision from to critical-anti-dependency breaker: don't use reg def info from kill insts (PR20308).
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: hfinkel, atrick.
spatel added a subscriber: Unknown Object (MLST).
atrick accepted this revision.Aug 19 2014, 11:32 PM
atrick edited edge metadata.

I'm signing off on this because it seems reasonably safe, though might not be solving the underlying problem. It does need to be reviewed by someone else. Both Hal and Will Schmidt have worked on similar issues very recently so they will be more informed.

I have to say, the presence of the KILL instruction in your Pre-sched code looks fairly misleading if not wrong. It is certainly incorrect in terms of liveness. If the call's %RDI argument is not "undef" then who defines it? If it is live into the call, the KILL should not define it. I would take a closer look at MachineCopyPropagation and other CodeGen passes to determine where things went wrong.

%RAX<def> = MOV64rm %RIP, 1, %noreg, <ga:@Part_Class>[TF=5], %noreg; mem:LD8[GOT]
%RDI<def> = KILL %RBP
CALL64pcrel32 <ga:@Object_NewImage>[TF=6], <regmask>, %RSP<imp-use>, %RDI<imp-use>, %ESI<imp-use>, %EDX<imp-use>, %ECX<imp-use>, %R8<imp-use,kill>, %R9<imp-use,kill>, %RSP<imp-def>, %EAX<imp-def>
This revision is now accepted and ready to land.Aug 19 2014, 11:32 PM

Thanks for the prompt review, Andy. I've added Will Schmidt as a reviewer.

I agree that the output of MachineCopyPropagation looks wrong; yesterday was my first look at that code, so I didn't want to hold up this patch while investigating that. At the least, this patch makes PostRA slightly more efficient by not spending time processing kills.

If it's ok with everyone, I'll add a fixme comment in MachineCopyPropagation, open a new bug, and take a look. If anyone more familiar with that code can help, I'd certainly appreciate it.

wschmidt edited edge metadata.Aug 20 2014, 8:18 AM

Well, I think this is probably ok since KILL doesn't translate into something real that actually clobbers all or part of a register. So long as true register kills (clobbers) are honored by the anti-dependence breaker, I don't see a problem. KILL seems a bit like a crutch that should be used very sparingly; actual dependencies should be represented on the machine instructions.

Just FYI, I'm Bill Schmidt, not Will Schmidt. (I am the one you were looking for, but there are two of us. Will is willschm, and I'm wschmidt. We have a lottery each week to see which of us is the evil twin so we know how to act.)

MachineCopyPropagation appears to also be the root cause for:
http://llvm.org/bugs/show_bug.cgi?id=18663

This was patched in the AggressiveAntiDepBreaker by Will with:
http://llvm.org/viewvc/llvm-project?view=revision&revision=214429

I don't see a PR for MachineCopyPropagation yet...

Actually, you do need to add willschm as a reviewer, as he is the person who committed the patch in the previous comment. I don't immediately see how to add one myself, so will let you handle it... sorry, Phabricator n00b here.

Added willschm as reviewer because Abbott & Costello aren't available.

On 2nd reading of bug 18863, I see that it was left open to track the issue in MachineCopyPropagation. I'll add what I'm seeing in my test case there.

willschm edited edge metadata.Aug 20 2014, 9:10 AM

Changes look OK to me. I would defer final say/ technical commentary to Hal, (versus myself). Hal should have a better understanding of all the parts here than I do.

hfinkel edited edge metadata.Aug 20 2014, 10:04 AM

I think that we might still want to process Kill instructions that have a subregister/subregister relationship. These are the kinds of kills that the code is expecting, and there is logic to group these and rename them together. We probably don't want to completely disable that functionality. Does that makes sense?

When we commit a fix to this, please also revert r214429 (which was my related temporary fix to a subset of this problem).

P.S. In the future, please make sure to upload full-context diffs (see the instructions here: http://llvm.org/docs/Phabricator.html).

lib/CodeGen/CriticalAntiDepBreaker.cpp
95 ↗(On Diff #12686)

I think we could say this in a better way; how about something like this:

"Kill instructions can define registers, but are really nops, and there might be a real definition earlier that needs to be paired with uses dominated by this kill."

In D4977#16, @hfinkel wrote:

I think that we might still want to process Kill instructions that have a subregister/subregister relationship. These are the kinds of
kills that the code is expecting, and there is logic to group these and rename them together. We probably don't want to
completely disable that functionality. Does that makes sense?

Yes, that makes sense for the Aggressive antidep breaker. I hadn't looked at the Aggressive implementation until now.

I don't see any handling for kills in the Critical antidep breaker though. These 2 implementations seem quite different at first glance. I can add a note in the code comments for this patch to also consider the behavior in the Aggressive class.

When we commit a fix to this, please also revert r214429 (which was my related temporary fix to a subset of this problem).

Sure - I posted the testcase in this patch into bug 18663 with my basic analysis of what's happening in MCP.

Just to be clear, do you approve this patch (with some changes to the comments) as a temporary fix to avoid the miscompile?

P.S. In the future, please make sure to upload full-context diffs (see the instructions here: http://llvm.org/docs/Phabricator.html).

Sorry - will do.

hfinkel accepted this revision.Aug 20 2014, 10:37 AM
hfinkel edited edge metadata.

Yes, this looks good to me.

(Sorry for confusing the issue w/ my comments on the aggressive anti-dep breaker -- thanks for checking on the overlap).

spatel closed this revision.Aug 20 2014, 11:12 AM
spatel updated this revision to Diff 12709.

Closed by commit rL216114 (authored by @spatel).