This is an archive of the discontinued LLVM Phabricator instance.

[ARM] support symbolic expressions as branch target in b.w
ClosedPublic

Authored by jcai19 on Feb 26 2021, 11:25 AM.

Details

Summary

Currently ARM backend validates the range of branch targets before the
layout of fragments is finalized. This causes build failure if symbolic
expressions are used, with the exception of a single symbolic value.
For example, "b.w ." works but "b.w . + 2" currently fails to
assemble. This fixes the issue by delaying this check (in
ARMAsmParser::validateInstruction) of b.w instructions until the symbol
expressions are resolved (in ARMAsmBackend::adjustFixupValue).

Link:
https://github.com/ClangBuiltLinux/linux/issues/1286

Diff Detail

Event Timeline

jcai19 created this revision.Feb 26 2021, 11:25 AM
jcai19 requested review of this revision.Feb 26 2021, 11:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 26 2021, 11:25 AM

This is created based on the discussion at https://reviews.llvm.org/D97501#inline-914358.

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
7955

if you don't use the return value from dyn_cast, you should use isa.
https://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates

or rewrite this as two nested ifs:

if (auto foo = dyn_cast<...):
  if (!foo.isSignedOffset<...):
    return Error(...);

I made a very similar patch for RISC-V: D92293.

Patching isSignedOffset is fine since it fixes more instructions. It is also fine to restrict the lift on t2B currently as you don't have tests for others.

llvm/test/MC/ARM/thumb2-b.w-target.s
2

Use consistent comment marker in a test.

6

No need to say GNU objdump behavior. The specific llvm-objdump tests for the symbolization will do it.

12

start+0xfffff is undeeded detail. There should be specific llvm-objdump tests for the symbolization.

llvm/test/MC/ARM/thumb2-branch-ranges.s
101

[[@LINE]] is deprecated FileCheck syntax.

[[#@LINE+1]]:[[#]]: error: ...

jcai19 updated this revision to Diff 326848.Feb 26 2021, 4:58 PM
jcai19 marked 4 inline comments as done.

Skip range check of MCBinaryExpr in b.w instead of any non-constant expressions.

jcai19 marked an inline comment as done.Feb 26 2021, 4:59 PM
jcai19 updated this revision to Diff 327297.Mar 1 2021, 2:46 PM

Remove unncessary CHECK-NOT.

MaskRay accepted this revision.Mar 1 2021, 2:59 PM
MaskRay added inline comments.
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
7954

Add a period after a complete sentence.

This revision is now accepted and ready to land.Mar 1 2021, 2:59 PM
jcai19 updated this revision to Diff 327301.Mar 1 2021, 3:06 PM

Add period after a comment.

jcai19 marked an inline comment as done.Mar 1 2021, 3:07 PM
This revision was landed with ongoing or failed builds.Mar 1 2021, 5:42 PM
This revision was automatically updated to reflect the committed changes.
nickdesaulniers added inline comments.Mar 2 2021, 1:34 PM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
7967

It's definitely not great seeing this problem across multiple different instructions; in particular I suspect that t2Bcc is the same operation as t2B (branch), just with a condition code (cc) and smaller immediate.

I still think the callers should defer checks for MCBinaryExpr (as this patch does), but then seeing things like isThumbBranchTarget/isARMBranchTarget makes me nervous that we can't validate MCBinaryExpr operands until after adjusting fixups (very late) for many other instructions.

If MCSymbolRefExpr is just the . operator in assembler, and MCBinaryExpr is for expressions, I wonder if isUnsignedOffset, isSignedOffset, isLEOffset, and isThumbMemPC should all return true for MCBinaryExpr?

isARMBranchTarget/isThumbBranchTarget default to true (for example for MCSymbolRefExpr and MCBinaryExpr, but the list above do not. And that's curious to me. We can't know if these immediates satisfy some range constraints until after fixup; right now these methods are a mix of what they permit when expressions involving . are used.

nickdesaulniers added inline comments.Mar 2 2021, 1:52 PM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
7967

And part of that is that these isSignedOffset and friends all return bool, when for MCSymbolRefExpr and MCBinaryExpr operands the answer has a third possible state which is "maybe, but I won't know until adjusting fixups" which is one of the last things the assembler does, without rerunning instruction validation. isUnsignedOffset, isSignedOffset, isLEOffset, and isThumbMemPC all respond "no" conservatively; isARMBranchTarget/isThumbBranchTarget respond "sure" perhaps too eagerly. I suspect given a MCSymbolRefExpr or MCBinaryExpr, isARMBranchTarget/isThumbBranchTarget both respond true, when after fixups the answer perhaps should have been "no."

jcai19 added inline comments.Mar 2 2021, 5:57 PM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
7967

Agreed, I will take a look at the code and see if we could refactor it for a more permanent solution. Also . gets converted into a symbol like any regular label, so MCSymbolRefExpr covers any expressions with a single label or ..