Page MenuHomePhabricator

[WebAssembly] Allow DBG_VALUE after terminator in MachineVerifier
AbandonedPublic

Authored by aheejin on May 2 2021, 7:42 PM.

Details

Summary

When a stackified variable has an associated DBG_VALUE instruction,
DebugFixup pass adds a DBG_VALUE instruction after the stackified
value's last use to clear the variable's debug range info. But when the
last use instruction is a terminator, it can cause a verification
failure (when run with -verify-machineinstrs) because there are no
instructions allowed after a terminator.

For example:

%myvar = ...
DBG_VALUE target-index(wasm-operand-stack), $noreg, !"myvar", ...
BR_IF 0, %myvar, ...
DBG_VALUE $noreg, $noreg, !"myvar", ...

In this test, %myvar is stackified, so the first DBG_VALUE
instruction's first operand has changed to wasm-operand-stack to
denote it. And an additional DBG_VALUE instruction is added after its
last use, BR_IF, to signal variable myvar is not in the operand
stack anymore. But because the DBG_VALUE instruction is added after
the BR_IF, a terminator, it fails MachineVerifier.

I experimented on whether we could add the DBG_VALUE before the
terminator in this kind of case, but it loses info in the resulting
debug info. If we do that in the example above, the result will be:

%myvar = ...
DBG_VALUE target-index(wasm-operand-stack), $noreg, !"myvar", ...
DBG_VALUE $noreg, $noreg, !"myvar", ...
BR_IF 0, %myvar, ...

Now the debug info for myvar has changed twice, rendering the first
DBG_VALUE meaningless. In this case this info is dropped because its
range is empty.

So this CL adds an exception for wasm in MachineVerifier's terminator
check. The reason this does not happen with other targets is, I think,
in general when DBG_VALUE has to be inserted right after an
instruction that writes to a register. If that register used to contain
a variable's value, that variable's DBG_VALUE has to be cleared. And
also if the newly written value is a value for another variable,
DBG_VALUE for that variable has to be added. But terminator
instructions don't generally write to registers, obviating the need to
add DBG_VALUEs right after them.
But wasm is special in this way, because uses, not defs, can change the
value stack location's contents. Our terminator instructions, such as
BR_IF don't write to registers but uses registers, which can be
stackified, changing the value stack and thus possibly invalidating some
variables' value in previous DBG_VALUEs, so they have to be cleared by
additional DBG_VALUEs after the terminator in that case.

I am not very familiar with how debug info works, so I might be
mistaken, and please correct me if so.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50175.

Diff Detail

Event Timeline

aheejin created this revision.May 2 2021, 7:42 PM
aheejin requested review of this revision.May 2 2021, 7:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2021, 7:42 PM
aheejin edited the summary of this revision. (Show Details)May 2 2021, 9:02 PM

Yes, this makes sense to me. Another way to put it is that our br_if has the side effect of forcing termination of a "register" lifetime, which is uncommon.

Does LLVM not emit code for any other stack machines where this could happen? x87? :)

The check is sadly not particularly elegant. I wonder if maintainers of common code would like this code better if instead it was moved into the else if below, purely to suppress the error. Makes the common logic slightly easier to follow?

aheejin updated this revision to Diff 342461.May 3 2021, 10:42 AM

ll file indentation fix

Is throw also a terminator?

llvm/lib/CodeGen/MachineVerifier.cpp
803

I guess the inclusion of FirstTerminator means that if there is a br_if followed by a br or whatever, then we only allow DBG_VALUE after the br_if which makes sense.
Would this miss the case where there is only one terminator (i.e. just a br, in which case it would also be the first terminator), or is FirstTerminator not set in that case?

aheejin edited the summary of this revision. (Show Details)May 3 2021, 7:35 PM

Yes, this makes sense to me. Another way to put it is that our br_if has the side effect of forcing termination of a "register" lifetime, which is uncommon.

Does LLVM not emit code for any other stack machines where this could happen? x87? :)

I don't know. I searched briefly but haven't figured out how debug info works in x87 yet. It might take some more time. I'd appreciate if you know any pointers.

The check is sadly not particularly elegant. I wonder if maintainers of common code would like this code better if instead it was moved into the else if below, purely to suppress the error. Makes the common logic slightly easier to follow?

Yeah it is hacky, and I'm not sure how to avoid it. I thought about merging the condition in the else part, but ended up not doing it because it makes the code less readable, and I also thought separating the hacky part as its own block might help others to read the code. But if you think it doesn't help reading please let me know.

aheejin added inline comments.May 3 2021, 8:11 PM
llvm/lib/CodeGen/MachineVerifier.cpp
803

FirstTerminator is set whenever we've seen a terminator within a BB. There can be multiple terminators; in our case br_if followed by br, and there will be more in other targets. LLVM's verification rule is after any terminator occurs within a BB, only other terminator instructions can come after that. Does this answer your question? I'm not sure if I understood your question well.

jmorse added a subscriber: jmorse.May 4 2021, 11:39 AM

Hi,

I'm not familiar with WebAssembly, but thought I'd describe how this would works in other backends -- it might inspire you to take a different route. Sorry if it doesn't apply to WebAssembly.

For the problem of:

But wasm is special in this way, because uses, not defs, can change the
value stack location's contents. Our terminator instructions, such as
BR_IF don't write to registers but uses registers, which can be
stackified, changing the value stack and thus possibly invalidating some
variables' value in previous DBG_VALUEs, so they have to be cleared by
additional DBG_VALUEs after the terminator in that case.

There's a function / class, DbgEntityHistoryCalculator, that would handle this for other architectures. It translates DBG_VALUE instructions into location lists described using MCSymbols. It starts a location range at the point where a DBG_VALUE occurs, and then terminates that range at:

  • Another DBG_VALUE for the same variable,
  • The end of the block,
  • When the relevant machine register is clobbered.

It seems to me that if the WebAssembly value stack can be tracked within a block in a similar way, then when a value is no longer present, DbgEntityHistoryCalculator could terminate the location range as if a register had been clobbered. X86 tail-calls are an example: they clobber registers, but we don't insert DBG_VALUE $noregs after them, because DbgEntityHistoryCalculator observes the clobber and terminates the range.

If WebAssembly can't use DbgEntityHistoryCalculator for some reason, then yes, something like this patch would be needed.

dschuff added inline comments.May 4 2021, 2:54 PM
llvm/lib/CodeGen/MachineVerifier.cpp
803

I think it answers my question.
What I was wondering was if we have only an unconditional branch followed by a DBG_VALUE (and no other terminator), that would be still invalid but would be allowed by this code.

Thanks, interesting about DbgEntityHistoryCalculator, I saw it but thought that it was just part of the way the translation from dbg_values to DWARF/other debuginfo was done in the platform-independent code. And to be clear, our debug info generation appears to actually be working correctly; it's only the MachineVerifier that is complaining.
I'll take a closer look at DbgEntityHistoryCalculator

aheejin abandoned this revision.EditedMay 11 2021, 11:26 PM

@jmorse Thank you very much for the info! I didn't know DbgEntityHistoryCalculator terminates debug value ranges at the end of a BB. That solves everything without this hacky stuff. I uploaded a new patch in D102309.