This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAGBuilder] Flush PendingExports before creating INLINEASM_BR node for asm goto.
ClosedPublic

Authored by craig.topper on Mar 29 2019, 12:15 AM.

Details

Summary

Since INLINEASM_BR is a terminator we need to flush the pending exports before
emitting it. If we don't do this, a TokenFactor can be inserted between it and
the BR instruction emitted to finish the callbr lowering.

It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit
the TokenFactor before that.

Not sure if we should also flush PendingLoads for asm goto as well.

I'm also not sure how to directly write a test for this. I'm not sure how to get the TokenFactor to cause something to fail.

Diff Detail

Event Timeline

craig.topper created this revision.Mar 29 2019, 12:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2019, 12:15 AM
craig.topper edited the summary of this revision. (Show Details)Mar 29 2019, 12:16 AM
chandlerc requested changes to this revision.Mar 29 2019, 12:28 AM

While I think we should figure out how to test this, we shouldn't wait to land what seems like an important correctness fix on figuring out how to test. So mostly looking for a particular change below.

Maybe you can use synthetic asserts somewhere here to get bugpoint to help you reduce a test case if you have an application that manages to hit it?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7915–7917

While this predicate makes sense for why we would *semantically* need to flush loads, I think we need to flush them for INLINEASM_BR even if it doesn't say this because otherwise we will be at risk of disrupting the terminator layout.

This revision now requires changes to proceed.Mar 29 2019, 12:28 AM

I dont' have an application that managed to hit. I just happened to be trying to understand chains and the optimizations made in selectiondagbuilder and realized I didn't know what I was doing when I created INLINEASM_BR.

I'm not sure PendingLoads really need to be flushed. As long as nothing else between here and the branch flushes the pending loads we should be fine. Their chain outputs will just never be connected to anything. And their data output usages shoudl have already been collected by something ahead of the inlineasm_br or real br. If their data isn't used their dead anyway.

The reason invoke has a call to (void)getRoot() in the EHPad part of lowerInvokable is because at the time that was added the code basically did this for invoke

getControlRoot() to prepare to create the eh_label.
create eh_label and call DAG.setRoot()
getRoot() to create the call.

The problem was that getRoot() on the create call doesn't call DAG.getRoot() unless PendingLoads is empty. If pending loads isn't empty, it expects the root hasn't changed since the loads were created and that the loads themselves used that root in their input and so just creates a token factor of the load chain outputs and sets that as the root. So the call to getRoot to create the call overwrote the setRoot() call made by the eh_label creation causing the eh_label to be disconnected from the graph. Adding the getRoot call in front of the getControlRoot and eh_label creation fixed this by ensuring the PendingLoads was empty when the call was created. The code has been refactored several times since then and now the CallLoweringInfo object is created outside of lowerInvokable and getRoot is called there. So the (void)getRoot() inside lowerInvokable is probably unnecessary now. And the CLI.setChain(getRoot()) that occurs after creating the eh_label can just be CLI.setChain(DAG.getRoot()) since there are no pending loads to flush.

craig.topper requested review of this revision.Apr 8 2019, 1:41 PM

Ping

efriedma added inline comments.May 15 2019, 5:09 PM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7924

There's currently a call to getControlRoot() in visitCallBr. If I'm understanding correctly, the reason we would want to call it here, instead, is to make sure the relevant copytoreg nodes are scheduled before the INLINEASM_BR, instead of after it?

I guess the reason this isn't leading to practical problems is that in most cases, a CopyToReg doesn't actually correspond to any instruction? And in cases where it does, we get lucky and schedule it before the INLINEASM_BR even though though there's no edge that would enforce it?

If you can't construct a case where the end result of isel is actually different, it might make sense to construct a testcase that checks -debug-only=isel or something like that; not usually a fan of that, but it might be the best alternative here.

8292

Does this have any practical effect? Nothing can actually use the root after this point, I think. Not a big deal either way, though.

efriedma added inline comments.May 15 2019, 5:12 PM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8292

Err, actually, I guess visitCallBr uses the root? In that case, I'm surprised you haven't seen any issues where the inline asm is getting thrown away by instruction selection.

craig.topper marked an inline comment as done.May 15 2019, 5:30 PM
craig.topper added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8292

I think ResultValues.empty() will be true for callbr given the current restrictions on no outputs, but I just wanted to explicitly mention CallBr out of paranoia.

chandlerc accepted this revision.May 15 2019, 6:24 PM

LGTM as long as Eli is ok with the testing arriving after the fact.

I dont' have an application that managed to hit. I just happened to be trying to understand chains and the optimizations made in selectiondagbuilder and realized I didn't know what I was doing when I created INLINEASM_BR.

I'm not sure PendingLoads really need to be flushed. As long as nothing else between here and the branch flushes the pending loads we should be fine. Their chain outputs will just never be connected to anything. And their data output usages shoudl have already been collected by something ahead of the inlineasm_br or real br. If their data isn't used their dead anyway.

The reason invoke has a call to (void)getRoot() in the EHPad part of lowerInvokable is because at the time that was added the code basically did this for invoke

getControlRoot() to prepare to create the eh_label.
create eh_label and call DAG.setRoot()
getRoot() to create the call.

The problem was that getRoot() on the create call doesn't call DAG.getRoot() unless PendingLoads is empty. If pending loads isn't empty, it expects the root hasn't changed since the loads were created and that the loads themselves used that root in their input and so just creates a token factor of the load chain outputs and sets that as the root. So the call to getRoot to create the call overwrote the setRoot() call made by the eh_label creation causing the eh_label to be disconnected from the graph. Adding the getRoot call in front of the getControlRoot and eh_label creation fixed this by ensuring the PendingLoads was empty when the call was created. The code has been refactored several times since then and now the CallLoweringInfo object is created outside of lowerInvokable and getRoot is called there. So the (void)getRoot() inside lowerInvokable is probably unnecessary now. And the CLI.setChain(getRoot()) that occurs after creating the eh_label can just be CLI.setChain(DAG.getRoot()) since there are no pending loads to flush.

BTW, thanks for the awesome write up. Some of this maybe should go into a FIXME comment around that getRoot to document that maybe we should revisit it.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7924

+1

But I'm personally happy with this going in as-is, and working on a (somewhat iffy) testing strategy as a follow-up.

This revision is now accepted and ready to land.May 15 2019, 6:24 PM
This revision was automatically updated to reflect the committed changes.