This is an archive of the discontinued LLVM Phabricator instance.

[ARM] support symbolic expressions as branch target
AbandonedPublic

Authored by jcai19 on Feb 25 2021, 2:32 PM.

Details

Summary

Currently ARM backend validates the range of branch targets before the
layout of fragments is finalized. This causes build failure if symbolic
expressions are used, with the exception of a single symbolic value.
For example, "b.w ." works but "b.w . + 2" currently fails to
assemble. This fixes the issue by delaying this check (in
ARMAsmParser::validateInstruction) until the symbolic values are
resolved (in ARMAsmBackend::adjustFixupValue).

Link:
https://github.com/ClangBuiltLinux/linux/issues/1286

Diff Detail

Event Timeline

jcai19 created this revision.Feb 25 2021, 2:32 PM
jcai19 requested review of this revision.Feb 25 2021, 2:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 25 2021, 2:32 PM
jcai19 updated this revision to Diff 326551.Feb 25 2021, 4:43 PM

Update the test cases.

With this patch applied, I can _build_ an armv7 THUMB2 linux kernel w/ Clang's integrated assembler, which I haven't been able to previously.

This looks like the last blocker to ship Clang's integrated assembler for Linux kernel's in Android (not the last bug, but the last one I'd consider a hard blocker).

llvm/test/MC/ARM/thumb2-b.w-target.s
6–7

Does anyone know why GNU objdump and llvm-objdump differ in the byte ordering? Looks like a pre-existing bug, but man does it look strange.

10–11

can you add whitespace after the : so this is slightly more readable?

10–11

ie. to align the columns.

17–20

delete newlines

MaskRay added inline comments.
llvm/test/MC/ARM/thumb2-b.w-target.s
10

Delete encodings. Separations of concerns: they are tested by encoding/decoding tests.

You can add a label, then you can use CHECK-LABEL: <foo>: and CHECK-NEXT:.

jcai19 updated this revision to Diff 326572.Feb 25 2021, 6:17 PM

Remove encodings from the tests.

jcai19 marked 3 inline comments as done.Feb 25 2021, 6:22 PM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1079–1080

I cant help but wonder whether we should leave isSignedOffset as is, and modify decisions to check vs delay up into the relevant call site? All of these is<Something> checks for operands have nice symmetry, so it's curious to disturb it just for isSignedOffset.

llvm/test/MC/ARM/thumb2-branch-ranges.s
100–101

This checks that error is not printed in the output, but then checks that error: ... is? Or am I misunderstanding that? (I guess I don't understand the point of the CHECK-NOT).

jcai19 added inline comments.Feb 25 2021, 11:38 PM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1079–1080

That's what I first implemented: I modified the call site to not report errors if the target of a b.w instruction is not a constant expression. But looking at the code again, I realized (A) isSignedOffset is local to this file and most of its uses are for checking the range of different branch instructions, so hopefully modifying this function won't have too big an impact on other instructions, and (B) more importantly if it's safe to delay the check of single symbolic values (which is essentially a special case of symbolic expressions ), I don't see why we should not hold off the checks for more general symbolic expressions. This should also allow symbolic expressions in the other branch instructions that call this function. WDYT?

llvm/test/MC/ARM/thumb2-branch-ranges.s
100–101

Yes I have the same question. I copied this line from previous tests, but maybe I could remove this line?

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1079–1080

So isSignedOffset is attempting down casts of MCExpr to figure out whether the operand is a signed offset.

For the case we care about, looking at the various derived classes of MCExpr in llvm/include/llvm/MC/MCExpr.h, it looks like the case we care about is a MCBinaryExpression.

That case is not checked in isSignedOffset. I'm not sure that it should be, either, since the answer is "maybe, but I won't know until after layout."

The relevant call site is ARMAsmParser::validateInstruction, in the case for ARM::t2B. There's a comment above the case "Final range checking for Thumb unconditional branch instructions.". Your XFAIL test case added to llvm/test/MC/ARM/thumb2-branch-ranges.s demonstrates that such range check is _not_ final. Oh, the error is different and less developer friendly; the relocation itself fails to be encoded.

So I think it would be better to amend the call site to check whether isa<MCBinaryExpr>(static_cast<ARMOperand &>(*Operands[op]).getImm()), and skip the check. (It does seem that many other cases probably suffer something similar; different instructions eagerly assume they have MCConstantExpr operands and probably don't work when expressions are used as operands in assembler. But perhaps we could fix this up first for t2B or see if we can get it working, then clean up the rest.)

Then I'm curious whether we can call ARMAsmParser::validateInstruction for each instruction that has operands that we resolve post fragment layout? Then we'd have the nicer existing error message that "branch target out of range"? If not, I guess that's ok, since the relocation won't be encodeable, so garbage input will still fail to assemble, and at least we'll support asm expressions as operands.

jcai19 added inline comments.Feb 26 2021, 11:33 AM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1079–1080

So isSignedOffset is attempting down casts of MCExpr to figure out whether the operand is a signed offset.

For the case we care about, looking at the various derived classes of MCExpr in llvm/include/llvm/MC/MCExpr.h, it looks like the case we care about is a MCBinaryExpression.

I think MCBinaryExpression will cover cases like b.w . + (2f - 1b + 2) but not if we remove the parentheses.

That case is not checked in isSignedOffset. I'm not sure that it should be, either, since the answer is "maybe, but I won't know until after layout."

The relevant call site is ARMAsmParser::validateInstruction, in the case for ARM::t2B. There's a comment above the case "Final range checking for Thumb unconditional branch instructions.". Your XFAIL test case added to llvm/test/MC/ARM/thumb2-branch-ranges.s demonstrates that such range check is _not_ final. Oh, the error is different and less developer friendly; the relocation itself fails to be encoded.

So I think it would be better to amend the call site to check whether isa<MCBinaryExpr>(static_cast<ARMOperand &>(*Operands[op]).getImm()), and skip the check. (It does seem that many other cases probably suffer something similar; different instructions eagerly assume they have MCConstantExpr operands and probably don't work when expressions are used as operands in assembler. But perhaps we could fix this up first for t2B or see if we can get it working, then clean up the rest.)

Then I'm curious whether we can call ARMAsmParser::validateInstruction for each instruction that has operands that we resolve post fragment layout? Then we'd have the nicer existing error message that "branch target out of range"? If not, I guess that's ok, since the relocation won't be encodeable, so garbage input will still fail to assemble, and at least we'll support asm expressions as operands.

I have created https://reviews.llvm.org/D97568 like you suggested, which was also what I first implemented. I think the code is a bit redundant because we need to check if an expression is MCConstant expression twice, but maybe like you said, we could first fix one case and clean the rest up up bit by bit.

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1079–1080

I think MCBinaryExpression will cover cases like b.w . + (2f - 1b + 2) but not if we remove the parentheses.

Which derived class is it without parens in the expression? Maybe we should be checking for either?

jcai19 updated this revision to Diff 326841.Feb 26 2021, 4:40 PM

Skip range check of MCBinaryExpr instead of any non-constant expressions.

jcai19 added inline comments.Feb 26 2021, 4:41 PM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1079–1080

I double checked and it turned out they were still considered binary expression. I thought I verified but I'm apparently wrong. I've update the patch in case we want to revisit it in the future.

DavidSpickett added inline comments.Mar 1 2021, 8:57 AM
llvm/test/MC/ARM/thumb2-branch-ranges.s
100–101

My best guess is that the CHECK-NOT checks that none of the supporting assembly generates an error before we get the expected error from the relocations. https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-not-directive

I thought maybe this prevents mutliple CHECKs matching on the same error message. However we're checking line numbers, for exactly that reason. (even if Filecheck behaves that way, which I'm not sure it does)

Probably a case of belt and braces when it was first written but in hindsight not needed.

jcai19 updated this revision to Diff 327286.Mar 1 2021, 2:01 PM

Update test cases.

jcai19 added inline comments.Mar 1 2021, 2:07 PM
llvm/test/MC/ARM/thumb2-branch-ranges.s
100–101

Thanks for looking into it. I've removed this line for better clarity. Also unless someone thinks this change is more preferable, I'm closing it in favor of https://reviews.llvm.org/D97568, which is similar to this patch but is limited to b.w instruction. Will update tests there as well.

jcai19 abandoned this revision.Mar 1 2021, 2:07 PM