This is an archive of the discontinued LLVM Phabricator instance.

Differential D101736

[WebAssembly] Allow DBG_VALUE after terminator in MachineVerifier
Abandoned · Public

Authored by aheejin on May 2 2021, 7:42 PM.

Details

Reviewers

dschuff
aardappel
yurydelendik

Summary

When a stackified variable has an associated DBG_VALUE instruction, the
DebugFixup pass adds a DBG_VALUE instruction after the stackified
value's last use to clear the variable's debug range info. But when the
last use is a terminator, this can cause a verification failure (when
run with -verify-machineinstrs) because only other terminators are
allowed after a terminator.

For example:

%myvar = ...
DBG_VALUE target-index(wasm-operand-stack), $noreg, !"myvar", ...
BR_IF 0, %myvar, ...
DBG_VALUE $noreg, $noreg, !"myvar", ...

In this example, %myvar is stackified, so the first DBG_VALUE
instruction's first operand has been changed to wasm-operand-stack to
denote that. An additional DBG_VALUE instruction is added after its
last use, BR_IF, to signal that variable myvar is no longer on the
operand stack. But because this DBG_VALUE instruction is added after
the BR_IF, a terminator, it fails MachineVerifier.
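
A minimal sketch of why this sequence is rejected, using a toy model rather than LLVM itself. ToyInstr, firstViolation, summaryExample, and clearedBeforeTerminator are hypothetical names introduced only for illustration; the loop merely mirrors the shape of MachineVerifier's "non-terminator after the first terminator" check.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a machine instruction; not LLVM's MachineInstr.
struct ToyInstr {
  bool IsTerminator;
  bool IsDebugValue; // recorded for readability; the check below ignores it
};

// Returns the index of the first non-terminator that follows a terminator,
// or std::nullopt if the block obeys the rule. Debug instructions get no
// exemption, which is what makes the trailing DBG_VALUE a violation.
std::optional<std::size_t> firstViolation(const std::vector<ToyInstr> &Block) {
  bool SeenTerminator = false;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Block[I].IsTerminator)
      SeenTerminator = true;
    else if (SeenTerminator)
      return I;
  }
  return std::nullopt;
}

// The block from the summary: def, DBG_VALUE, BR_IF, DBG_VALUE $noreg.
std::vector<ToyInstr> summaryExample() {
  return {{false, false}, {false, true}, {true, false}, {false, true}};
}

// Same instructions with the clearing DBG_VALUE moved before BR_IF.
std::vector<ToyInstr> clearedBeforeTerminator() {
  return {{false, false}, {false, true}, {false, true}, {true, false}};
}

// Self-check tying the two layouts to the verifier rule.
void checkExamples() {
  assert(firstViolation(summaryExample()) == std::optional<std::size_t>(3));
  assert(!firstViolation(clearedBeforeTerminator()).has_value());
}
```

In this model the trailing DBG_VALUE at index 3 is the violation, while moving it before the branch satisfies the rule.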

I experimented with adding the DBG_VALUE before the terminator in this
kind of case, but that loses information in the resulting debug info.
If we do that in the example above, the result will be:

%myvar = ...
DBG_VALUE target-index(wasm-operand-stack), $noreg, !"myvar", ...
DBG_VALUE $noreg, $noreg, !"myvar", ...
BR_IF 0, %myvar, ...

Now the debug info for myvar changes twice in a row with no instruction
in between, rendering the first DBG_VALUE meaningless. Its range is
empty, so this info is dropped.

So this CL adds an exception for wasm in MachineVerifier's terminator
check. The reason this does not happen with other targets is, I think,
that in general a DBG_VALUE has to be inserted right after an
instruction only when that instruction writes to a register. If that
register used to contain a variable's value, that variable's DBG_VALUE
has to be cleared. And if the newly written value is the value of
another variable, a DBG_VALUE for that variable has to be added. But
terminator instructions don't generally write to registers, obviating
the need to add DBG_VALUEs right after them.
But wasm is special in this way, because uses, not defs, can change the
value stack location's contents. Our terminator instructions, such as
BR_IF, don't write to registers but use registers, which can be
stackified. Such a use changes the value stack and thus possibly
invalidates some variables' values in previous DBG_VALUEs, so they have
to be cleared by additional DBG_VALUEs after the terminator in that case.

I am not very familiar with how debug info works, so I might be
mistaken, and please correct me if so.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50175.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aheejin created this revision. May 2 2021, 7:42 PM

Herald added subscribers: wingo, ecnelises, sunfish and 3 others. May 2 2021, 7:42 PM

aheejin requested review of this revision. May 2 2021, 7:42 PM

Herald added a project: Restricted Project. May 2 2021, 7:42 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B102227: Diff 342301. May 2 2021, 8:34 PM

aheejin edited the summary of this revision. May 2 2021, 9:02 PM

Yes, this makes sense to me. Another way to put it is that our br_if has the side effect of forcing termination of a "register" lifetime, which is uncommon.

Does LLVM not emit code for any other stack machines where this could happen? x87? :)

The check is sadly not particularly elegant. I wonder if maintainers of common code would like this code better if instead it was moved into the else if below, purely to suppress the error. Makes the common logic slightly easier to follow?

ll file indentation fix

Harbormaster completed remote builds in B102333: Diff 342461. May 3 2021, 12:35 PM

Is throw also a terminator?

llvm/lib/CodeGen/MachineVerifier.cpp
803	I guess the inclusion of `FirstTerminator` means that if there is a `br_if` followed by a `br` or whatever, then we only allow DBG_VALUE after the `br_if` which makes sense. Would this miss the case where there is only one terminator (i.e. just a `br`, in which case it would also be the first terminator), or is `FirstTerminator` not set in that case?

aheejin edited the summary of this revision. May 3 2021, 7:35 PM

In D101736#2733694, @aardappel wrote:

Yes, this makes sense to me. Another way to put it is that our br_if has the side effect of forcing termination of a "register" lifetime, which is uncommon.

Does LLVM not emit code for any other stack machines where this could happen? x87? :)

I don't know. I searched briefly but haven't figured out how debug info works in x87 yet. It might take some more time. I'd appreciate any pointers if you know of some.

The check is sadly not particularly elegant. I wonder if maintainers of common code would like this code better if instead it was moved into the else if below, purely to suppress the error. Makes the common logic slightly easier to follow?

Yeah it is hacky, and I'm not sure how to avoid it. I thought about merging the condition in the else part, but ended up not doing it because it makes the code less readable, and I also thought separating the hacky part as its own block might help others to read the code. But if you think it doesn't help reading please let me know.

aheejin added inline comments. May 3 2021, 8:11 PM

llvm/lib/CodeGen/MachineVerifier.cpp
803	`FirstTerminator` is set whenever we've seen a terminator within a BB. There can be multiple terminators; in our case `br_if` followed by `br`, and there will be more in other targets. LLVM's verification rule is that once a terminator occurs within a BB, only other terminator instructions can come after it. Does this answer your question? I'm not sure I understood your question well.

Hi,

I'm not familiar with WebAssembly, but thought I'd describe how this would work in other backends -- it might inspire you to take a different route. Sorry if it doesn't apply to WebAssembly.

For the problem of:

But wasm is special in this way, because uses, not defs, can change the
value stack location's contents. Our terminator instructions, such as
BR_IF, don't write to registers but use registers, which can be
stackified. Such a use changes the value stack and thus possibly
invalidates some variables' values in previous DBG_VALUEs, so they have
to be cleared by additional DBG_VALUEs after the terminator in that case.

There's a function / class, DbgEntityHistoryCalculator, that would handle this for other architectures. It translates DBG_VALUE instructions into location lists described using MCSymbols. It starts a location range at the point where a DBG_VALUE occurs, and then terminates that range at:

Another DBG_VALUE for the same variable,
The end of the block,
When the relevant machine register is clobbered.

It seems to me that if the WebAssembly value stack can be tracked within a block in a similar way, then when a value is no longer present, DbgEntityHistoryCalculator could terminate the location range as if a register had been clobbered. X86 tail-calls are an example: they clobber registers, but we don't insert DBG_VALUE $noregs after them, because DbgEntityHistoryCalculator observes the clobber and terminates the range.

If WebAssembly can't use DbgEntityHistoryCalculator for some reason, then yes, something like this patch would be needed.
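
The three range-ending conditions listed above can be sketched with a toy model. This is an illustrative approximation only, not DbgEntityHistoryCalculator's actual implementation or API: Kind, ToyInstr, Range, OpenRange, NoReg, computeRanges, and the example builders are all hypothetical names, and instruction indices stand in for the MCSymbol labels the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy types; LLVM's real data structures differ.
enum class Kind { DbgValue, Clobber, Other };
constexpr int NoReg = -1; // stands in for $noreg

struct ToyInstr {
  Kind K;
  std::string Var; // variable described by a DbgValue, otherwise unused
  int Reg;         // register described (DbgValue) or overwritten (Clobber)
};

struct Range {
  std::string Var;
  std::size_t Begin; // index of the DBG_VALUE that opened the range
  std::size_t End;   // index at which the range stops (exclusive)
};

struct OpenRange {
  std::size_t Begin;
  int Reg;
};

std::vector<Range> computeRanges(const std::vector<ToyInstr> &Block) {
  std::map<std::string, OpenRange> Live;
  std::vector<Range> Out;
  auto Close = [&](const std::string &Var, std::size_t End) {
    auto It = Live.find(Var);
    if (It == Live.end())
      return;
    Out.push_back(Range{Var, It->second.Begin, End});
    Live.erase(It);
  };
  for (std::size_t I = 0; I < Block.size(); ++I) {
    const ToyInstr &MI = Block[I];
    if (MI.K == Kind::DbgValue) {
      // Condition 1: another DBG_VALUE for the same variable.
      Close(MI.Var, I);
      if (MI.Reg != NoReg)
        Live[MI.Var] = OpenRange{I, MI.Reg};
    } else if (MI.K == Kind::Clobber) {
      // Condition 3: the register holding the variable is overwritten.
      std::vector<std::string> Dead;
      for (const auto &Entry : Live)
        if (Entry.second.Reg == MI.Reg)
          Dead.push_back(Entry.first);
      for (const std::string &Var : Dead)
        Close(Var, I);
    }
  }
  // Condition 2: the end of the block closes whatever is still open.
  std::vector<std::string> Rest;
  for (const auto &Entry : Live)
    Rest.push_back(Entry.first);
  for (const std::string &Var : Rest)
    Close(Var, Block.size());
  return Out;
}

// def, DBG_VALUE myvar, terminator; no trailing clearing DBG_VALUE.
std::vector<ToyInstr> noTrailingClear() {
  return {{Kind::Other, "", NoReg},
          {Kind::DbgValue, "myvar", 0},
          {Kind::Other, "", NoReg}};
}

// DBG_VALUE x in register 5, then an instruction that overwrites register 5.
std::vector<ToyInstr> clobberExample() {
  return {{Kind::DbgValue, "x", 5},
          {Kind::Clobber, "", 5},
          {Kind::Other, "", NoReg}};
}

void checkExamples() {
  std::vector<Range> R = computeRanges(noTrailingClear());
  assert(R.size() == 1 && R[0].Var == "myvar");
  assert(R[0].Begin == 1 && R[0].End == 3);
  R = computeRanges(clobberExample());
  assert(R.size() == 1 && R[0].Begin == 0 && R[0].End == 1);
}
```

In this model, the range for myvar ends at the end of the block even though no clearing DBG_VALUE follows the terminator.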

dschuff added inline comments. May 4 2021, 2:54 PM

llvm/lib/CodeGen/MachineVerifier.cpp
803	I think it answers my question. What I was wondering was whether, if we have only an unconditional branch followed by a DBG_VALUE (and no other terminator), that would still be invalid but would be allowed by this code.
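
The case raised in this inline comment can be sketched in a toy model. Everything here is hypothetical (ToyInstr, firstViolationWithWasmException, and the example builders are not LLVM names); the early `continue` only approximates the early `return` in the patch, and ClearsVariable stands in for the "both operands are $noreg" test.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a machine instruction; not LLVM's MachineInstr.
struct ToyInstr {
  bool IsTerminator;
  bool IsDebugValue;
  bool ClearsVariable; // models "DBG_VALUE $noreg, $noreg, ..."
};

// Terminator check with the proposed exception: on wasm, a clearing
// DBG_VALUE is skipped once any terminator has been seen.
std::optional<std::size_t>
firstViolationWithWasmException(const std::vector<ToyInstr> &Block,
                                bool IsWasm) {
  bool SeenTerminator = false;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    const ToyInstr &MI = Block[I];
    if (IsWasm && SeenTerminator && MI.IsDebugValue && MI.ClearsVariable)
      continue; // the exception does not ask which terminator came before
    if (MI.IsTerminator)
      SeenTerminator = true;
    else if (SeenTerminator)
      return I;
  }
  return std::nullopt;
}

// A lone unconditional branch followed by a clearing DBG_VALUE.
std::vector<ToyInstr> loneBranchThenClear() {
  return {{true, false, false}, {false, true, true}};
}

// A terminator followed by a DBG_VALUE that does not clear anything.
std::vector<ToyInstr> branchThenNonClearingDbgValue() {
  return {{true, false, false}, {false, true, false}};
}

void checkExamples() {
  // Accepted on wasm, although an unconditional branch consumes no operand.
  assert(!firstViolationWithWasmException(loneBranchThenClear(), true));
  // Still rejected for other targets and for non-clearing DBG_VALUEs.
  assert(firstViolationWithWasmException(loneBranchThenClear(), false) ==
         std::optional<std::size_t>(1));
  assert(firstViolationWithWasmException(branchThenNonClearingDbgValue(),
                                         true) ==
         std::optional<std::size_t>(1));
}
```

Under these assumptions the exception is broader than strictly needed, which matches the concern above: it accepts a clearing DBG_VALUE after any terminator, not only after one that consumes a stackified operand.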

Thanks, interesting about DbgEntityHistoryCalculator, I saw it but thought that it was just part of the way the translation from dbg_values to DWARF/other debuginfo was done in the platform-independent code. And to be clear, our debug info generation appears to actually be working correctly; it's only the MachineVerifier that is complaining.
I'll take a closer look at DbgEntityHistoryCalculator

@jmorse Thank you very much for the info! I didn't know DbgEntityHistoryCalculator terminates debug value ranges at the end of a BB. That solves everything without this hacky stuff. I uploaded a new patch in D102309.

aheejin mentioned this in D102309: [WebAssembly] Omit DBG_VALUE after terminator. May 11 2021, 11:29 PM

Revision Contents

Path	Size
llvm/lib/CodeGen/MachineVerifier.cpp	12 lines
llvm/test/CodeGen/WebAssembly/stackified-debug.ll	29 lines

Diff 342461

llvm/lib/CodeGen/MachineVerifier.cpp

[788 lines not shown]
   if (Indexes && Indexes->hasIndex(*MI)) {
     SlotIndex idx = Indexes->getInstructionIndex(*MI);
     if (!(idx > lastIndex)) {
       report("Instruction index out of order", MI);
       errs() << "Last instruction was at " << lastIndex << '\n';
     }
     lastIndex = idx;
   }

+  // WebAssemblyDebugFixup pass can generate a DBG_VALUE instruction after a
+  // terminator, in case the terminator instruction consumes a stack operand.
+  // Consuming a stack operand changes the contents of the stack, so a DBG_VALUE
+  // instruction is necessary to terminate the associated variable's range.
+  // The DBG_VALUE instruction takes a form of
+  // DBG_VALUE $noreg, $noreg, !"variable", ...
+  if (TM->getTargetTriple().isWasm() && FirstTerminator && MI->isDebugValue() &&
+      MI->getOperand(0).isReg() && MI->getOperand(1).isReg() &&
+      !MI->getOperand(0).getReg().isValid() &&
+      !MI->getOperand(1).getReg().isValid())
+    return;
+
   // Ensure non-terminators don't follow terminators.
   if (MI->isTerminator()) {
     if (!FirstTerminator)
       FirstTerminator = MI;
   } else if (FirstTerminator) {
     report("Non-terminator instruction after the first terminator", MI);
     errs() << "First terminator was:\t" << *FirstTerminator;
   }
[2,382 lines not shown]

llvm/test/CodeGen/WebAssembly/stackified-debug.ll

-; RUN: llc < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs < %s | FileCheck %s

 ; Input C code:

 ; int i = input(); // Nested case
 ; int j = input(); // Trivial def-use.
 ; output(i, j);

 ; The ll below generates 330 lines of .S, so relevant parts that the
[27 lines not shown]
 ; CHECK: .int32 .Ltmp2-.Lfunc_begin0
 ; CHECK: .int16 4 # Loc expr size
 ; CHECK: .int8 237 # DW_OP_WASM_location
 ; CHECK: .int8 2 # 2
 ; CHECK: .int8 1 # 1
 ; CHECK: .int8 159 # DW_OP_stack_value

 source_filename = "stackified.c"
 target triple = "wasm32-unknown-unknown"

 define void @foo() !dbg !12 {
 entry:
   %call = call i32 @input(), !dbg !18
   call void @llvm.dbg.value(metadata i32 %call, metadata !16, metadata !DIExpression()), !dbg !19
   %call1 = call i32 @input(), !dbg !20
   call void @llvm.dbg.value(metadata i32 %call1, metadata !17, metadata !DIExpression()), !dbg !19
   call void @output(i32 %call, i32 %call1), !dbg !21
   ret void, !dbg !22
 }

+; DebugFixup pass adds a DBG_VALUE instruction to clear the debug value range of
+; "myvar" (!27) after BR_IF instruction in the entry BB. Even though it is
+; generally not allowed to have more instructions after a terminator, this
+; special case should not crash MachineVerifier.
+define void @dbg_value_after_terminator(i32 %a, i32 %b) !dbg !23 {
+entry:
+  %cmp = icmp ne i32 %a, %b, !dbg !25
+  call void @llvm.dbg.value(metadata i1 %cmp, metadata !27, metadata !DIExpression(DW_OP_LLVM_convert, 1, DW_ATE_unsigned, DW_OP_LLVM_convert, 8, DW_ATE_unsigned, DW_OP_stack_value)), !dbg !25
+  br i1 %cmp, label %bb.1, label %bb.0, !dbg !25
+
+bb.0: ; preds = %entry
+  unreachable
+
+bb.1: ; preds = %entry
+  ret void
+}
+
 declare i32 @input()
 declare !dbg !4 void @output(i32, i32)
 declare void @llvm.dbg.value(metadata, metadata, metadata)

 !llvm.dbg.cu = !{!0}
 !llvm.module.flags = !{!8, !9, !10}
 !llvm.ident = !{!11}

 !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.0 (https://github.com/llvm/llvm-project.git ed7aaf832444411ce93aa0443425ce401f5c7a8e)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !3, nameTableKind: None)
 !1 = !DIFile(filename: "stackified.c", directory: "C:\\stuff\\llvm-project")
[13 lines not shown]
 !15 = !{!16, !17}
 !16 = !DILocalVariable(name: "i", scope: !12, file: !1, line: 4, type: !7)
 !17 = !DILocalVariable(name: "j", scope: !12, file: !1, line: 5, type: !7)
 !18 = !DILocation(line: 4, column: 11, scope: !12)
 !19 = !DILocation(line: 0, scope: !12)
 !20 = !DILocation(line: 5, column: 11, scope: !12)
 !21 = !DILocation(line: 6, column: 3, scope: !12)
 !22 = !DILocation(line: 7, column: 1, scope: !12)
+!23 = distinct !DISubprogram(name: "dbg_value_after_terminator", scope: null, type: !24, spFlags: DISPFlagDefinition, unit: !0)
+!24 = !DISubroutineType(types: !2)
+!25 = !DILocation(line: 0, scope: !26)
+!26 = distinct !DILexicalBlock(scope: !23)
+!27 = !DILocalVariable(name: "myvar", scope: !26, type: !28)
+!28 = !DIBasicType(name: "bool", size: 8, encoding: DW_ATE_boolean)