This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Add a new G_INVOKE_REGION_START instruction to fix an EH bug
ClosedPublic

Authored by aemerson on Nov 13 2022, 1:59 AM.

Details

Summary

We currently have a bug where the legalizer, when dealing with phi operands,
may create instructions in the phi's incoming blocks at points which are effectively
dead due to a possible exception throw.

Say we have:

throwbb:
EH_LABEL
x0 = %callarg1
BL @may_throw_call
EH_LABEL
B returnbb

bb:
%v = phi i1 %true, throwbb, %false....

When legalizing we may need to widen the i1 %true value, and to do that we need
to create new extension instructions in the incoming block. Our insertion point
currently is the MBB::getFirstTerminator() which puts the IP before the unconditional
branch terminator in throwbb. These extensions may never be executed if the call
throws, and therefore we need to emit them before the call (but not too early, since
our new instruction may need values defined within throwbb as well).

throwbb:
EH_LABEL
x0 = %callarg1
BL @may_throw_call
EH_LABEL
%true = G_CONSTANT i32 1 ; <<<-- ruh'roh, this never executes if may_throw_call() throws!
B returnbb

bb:
%v = phi i32 %true, throwbb, %false....

To fix this, I've added two new instructions. The main idea is that G_INVOKE_REGION_START
is a terminator, which tries to model the fact that in the IR, the original invoke inst
is actually a terminator as well. By using that as the new insertion point, we
make sure to place new instructions on always executing paths.

Unfortunately we still need to make the legalizer use a new insertion point API
that I've added, since the existing getFirstTerminator() method does a reverse
walk up the block, and any non-terminator instructions cause it to bail out. To
avoid impacting compile time for all getFirstTerminator() uses, I've added a new
method that does a forward walk instead.

Diff Detail

Event Timeline

aemerson created this revision.Nov 13 2022, 1:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 13 2022, 1:59 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
aemerson requested review of this revision.Nov 13 2022, 1:59 AM
paquette added inline comments.Nov 15 2022, 3:04 PM
llvm/include/llvm/Target/GenericOpcodes.td
1278

It'd be useful to note in a comment that this is intended to be a terminator.

llvm/lib/CodeGen/MachineBasicBlock.cpp
259

find_if?

llvm/lib/CodeGen/MachineVerifier.cpp
836

MachineVerifier test?

Should the invoke just stay as a terminator instead? I'd hate to have more code placement mechanisms

llvm/docs/GlobalISel/GenericOpcode.rst
850–854

Don't see the point of adding an unused instruction

llvm/test/CodeGen/AArch64/GlobalISel/invoke-region.ll
61

Can you add a case with non-constant phi inputs

Should the invoke just stay as a terminator instead? I'd hate to have more code placement mechanisms

What do you mean stay as a terminator? Invoke is an IR only instruction that we no longer have after translation.

aemerson added inline comments.Nov 15 2022, 9:37 PM
llvm/lib/CodeGen/MachineVerifier.cpp
836

I couldn’t think of anything to verify since these have no operands or results

paquette added inline comments.Nov 16 2022, 10:12 AM
llvm/lib/CodeGen/MachineVerifier.cpp
836

probably just a test that shows that we don't get an error when this precedes a non-terminator

I don't know anything about exceptions, but looking at translateInvoke, I'm confused by several things.

Why do you need this new thing, instead of just looking for EH_LABEL? Why is this specially handling inlineasm such that it may turn off the label if the called operand can't throw as some kind of optimization? If it can't throw, wouldn't it optimize to use a regular call in the first place?

I don't know anything about exceptions, but looking at translateInvoke, I'm confused by several things.

Why do you need this new thing, instead of just looking for EH_LABEL? Why is this specially handling inlineasm such that it may turn off the label if the called operand can't throw as some kind of optimization? If it can't throw, wouldn't it optimize to use a regular call in the first place?

I’m not sure about the inline asm thing since I’m new to this code too, but semantically EH_LABEL isn’t a terminator, it’s a label used for computing exception table information. It seems cleaner to have a dedicated barrier instruction. The reason the legalizations insert at the first terminator is because that guarantees the instruction will execute. If you specifically look for an EH label then it’s leaking implementation details about instruction placement to the API client. Granted, having to use a forward walking ‘getFirstIteratorForward()’ isn’t ideal either.

Does G_INVOKE_REGION_START work with the terminators() iterator?

llvm/lib/CodeGen/MachineBasicBlock.cpp
259

Should we check for debug instructions + skip them here like in getFirstTerminator(), or is that not necessary?

aemerson added inline comments.Nov 16 2022, 3:08 PM
llvm/lib/CodeGen/MachineBasicBlock.cpp
259

I don't think it's necessary since debug instructions would never be terminators.

arsenm added inline comments.Nov 16 2022, 3:11 PM
llvm/lib/CodeGen/MachineBasicBlock.cpp
259

getFirstTerminator does check for them, so I'm not sure what the rule is for debug instructions

Does G_INVOKE_REGION_START work with the terminators() iterator?

It would not. terminators() uses getFirstTerminator() as the start iterator for the range, and therefore it only counts terminators which are adjacent instructions at the end of the block.

I'm open to other approaches to fix this, its just that a new terminator seemed like the least hacky approach, but it's not without it's own issues.

aemerson added inline comments.Nov 16 2022, 3:13 PM
llvm/lib/CodeGen/MachineBasicBlock.cpp
259

It's because getFirstTerminator() stops searching at the first non-debug-non-terminator instruction, walking up the block. That's how it determines whether something is the "first" terminator.

Unfortunately I can't really change that because it would make every call to it more expensive.

aemerson updated this revision to Diff 477557.Nov 23 2022, 10:31 AM

Address comments.

Forgot to remove the END instruction, I’ll do that and update.

aemerson updated this revision to Diff 477679.Nov 23 2022, 8:37 PM
aemerson retitled this revision from [GlobalISel] Add new G_INVOKE_REGION_START/END instructions to fix an EH bug to [GlobalISel] Add a new G_INVOKE_REGION_START instruction to fix an EH bug.
aemerson edited the summary of this revision. (Show Details)

Remove G_INVOKE_REGION_END

paquette accepted this revision.Dec 6 2022, 9:50 AM

I think this looks good at this point

This revision is now accepted and ready to land.Dec 6 2022, 9:50 AM
This revision was landed with ongoing or failed builds.Dec 7 2022, 10:29 AM
This revision was automatically updated to reflect the committed changes.