Currently ARM backend validates the range of branch targets before the
layout of fragments is finalized. This causes build failure if symbolic
expressions are used, with the exception of a single symbolic value.
For example, "b.w ." works but "b.w . + 2" currently fails to
assemble. This fixes the issue by delaying this check (in
ARMAsmParser::validateInstruction) until the symbolic values are
resolved (in ARMAsmBackend::adjustFixupValue).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
With this patch applied, I can _build_ an armv7 THUMB2 linux kernel w/ Clang's integrated assembler, which I haven't been able to previously.
This looks like the last blocker to ship Clang's integrated assembler for Linux kernel's in Android (not the last bug, but the last one I'd consider a hard blocker).
llvm/test/MC/ARM/thumb2-b.w-target.s | ||
---|---|---|
6–7 | Does anyone know why GNU objdump and llvm-objdump differ in the byte ordering? Looks like a pre-existing bug, but man does it look strange. | |
10–11 | can you add whitespace after the : so this is slightly more readable? | |
10–11 | ie. to align the columns. | |
17–20 | delete newlines |
llvm/test/MC/ARM/thumb2-b.w-target.s | ||
---|---|---|
10 | Delete encodings. Separations of concerns: they are tested by encoding/decoding tests. You can add a label, then you can use CHECK-LABEL: <foo>: and CHECK-NEXT:. |
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp | ||
---|---|---|
1079–1080 | I cant help but wonder whether we should leave isSignedOffset as is, and modify decisions to check vs delay up into the relevant call site? All of these is<Something> checks for operands have nice symmetry, so it's curious to disturb it just for isSignedOffset. | |
llvm/test/MC/ARM/thumb2-branch-ranges.s | ||
100–101 | This checks that error is not printed in the output, but then checks that error: ... is? Or am I misunderstanding that? (I guess I don't understand the point of the CHECK-NOT). |
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp | ||
---|---|---|
1079–1080 | That's what I first implemented: I modified the call site to not report errors if the target of a b.w instruction is not a constant expression. But looking at the code again, I realized (A) isSignedOffset is local to this file and most of its uses are for checking the range of different branch instructions, so hopefully modifying this function won't have too big an impact on other instructions, and (B) more importantly if it's safe to delay the check of single symbolic values (which is essentially a special case of symbolic expressions ), I don't see why we should not hold off the checks for more general symbolic expressions. This should also allow symbolic expressions in the other branch instructions that call this function. WDYT? | |
llvm/test/MC/ARM/thumb2-branch-ranges.s | ||
100–101 | Yes I have the same question. I copied this line from previous tests, but maybe I could remove this line? |
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp | ||
---|---|---|
1079–1080 | So isSignedOffset is attempting down casts of MCExpr to figure out whether the operand is a signed offset. For the case we care about, looking at the various derived classes of MCExpr in llvm/include/llvm/MC/MCExpr.h, it looks like the case we care about is a MCBinaryExpression. That case is not checked in isSignedOffset. I'm not sure that it should be, either, since the answer is "maybe, but I won't know until after layout." The relevant call site is ARMAsmParser::validateInstruction, in the case for ARM::t2B. There's a comment above the case "Final range checking for Thumb unconditional branch instructions.". Your XFAIL test case added to llvm/test/MC/ARM/thumb2-branch-ranges.s demonstrates that such range check is _not_ final. Oh, the error is different and less developer friendly; the relocation itself fails to be encoded. So I think it would be better to amend the call site to check whether isa<MCBinaryExpr>(static_cast<ARMOperand &>(*Operands[op]).getImm()), and skip the check. (It does seem that many other cases probably suffer something similar; different instructions eagerly assume they have MCConstantExpr operands and probably don't work when expressions are used as operands in assembler. But perhaps we could fix this up first for t2B or see if we can get it working, then clean up the rest.) Then I'm curious whether we can call ARMAsmParser::validateInstruction for each instruction that has operands that we resolve post fragment layout? Then we'd have the nicer existing error message that "branch target out of range"? If not, I guess that's ok, since the relocation won't be encodeable, so garbage input will still fail to assemble, and at least we'll support asm expressions as operands. |
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp | ||
---|---|---|
1079–1080 |
I think MCBinaryExpression will cover cases like b.w . + (2f - 1b + 2) but not if we remove the parentheses.
I have created https://reviews.llvm.org/D97568 like you suggested, which was also what I first implemented. I think the code is a bit redundant because we need to check if an expression is MCConstant expression twice, but maybe like you said, we could first fix one case and clean the rest up up bit by bit. |
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp | ||
---|---|---|
1079–1080 |
Which derived class is it without parens in the expression? Maybe we should be checking for either? |
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp | ||
---|---|---|
1079–1080 | I double checked and it turned out they were still considered binary expression. I thought I verified but I'm apparently wrong. I've update the patch in case we want to revisit it in the future. |
llvm/test/MC/ARM/thumb2-branch-ranges.s | ||
---|---|---|
100–101 | My best guess is that the CHECK-NOT checks that none of the supporting assembly generates an error before we get the expected error from the relocations. https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-not-directive I thought maybe this prevents mutliple CHECKs matching on the same error message. However we're checking line numbers, for exactly that reason. (even if Filecheck behaves that way, which I'm not sure it does) Probably a case of belt and braces when it was first written but in hindsight not needed. |
llvm/test/MC/ARM/thumb2-branch-ranges.s | ||
---|---|---|
100–101 | Thanks for looking into it. I've removed this line for better clarity. Also unless someone thinks this change is more preferable, I'm closing it in favor of https://reviews.llvm.org/D97568, which is similar to this patch but is limited to b.w instruction. Will update tests there as well. |
I cant help but wonder whether we should leave isSignedOffset as is, and modify decisions to check vs delay up into the relevant call site? All of these is<Something> checks for operands have nice symmetry, so it's curious to disturb it just for isSignedOffset.