This is an archive of the discontinued LLVM Phabricator instance.

[ARM] support symbolic expression as immediate in memory instructions
ClosedPublic

Authored by jcai19 on Mar 18 2021, 8:24 PM.

Details

Summary

Currently the ARM backend only accpets constant expressions as the
immediate operand in load and store instructions. This allows the
result of symbolic expressions to be used in memory instructions. For
example,

0:
.space 2048
strb r2, [r0, #(.-0b)]

would be assembled into the following instructions.

strb r2, [r0, #2048]

This only adds support to ldr, ldrb, str, and strb in arm mode to
address the build failure of Linux kernel for now, but should facilitate
adding support to similar instructions in the future if the need arises.

Link:
https://github.com/ClangBuiltLinux/linux/issues/1329

Diff Detail

Event Timeline

jcai19 created this revision.Mar 18 2021, 8:24 PM
jcai19 requested review of this revision.Mar 18 2021, 8:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 18 2021, 8:24 PM
jcai19 edited reviewers, added: nickdesaulniers; removed: nickdepinet.Mar 18 2021, 8:29 PM
jcai19 added a subscriber: nathanchance.

I'm not familiar with how fixups work so someone else should check that part. What you do with them looks fine but I don't know if those names are llvm internal or we have to match some published spec elsewhere.

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
3013

Here we check that Memory.OffsetImm is non zero (non null?) and we don't after. Does that check get done later anyway?

Ditto for the other functions that look like this one.

5949

Nit: remove the newline

llvm/test/MC/ARM/arm-memory-instructions-immediate.s
5

Unindent this or indent the .space line below.

Also could you add a comment describing the purpose? To make it clear this is looking at a specific kind of immediates.

The fixup code looks reasonable to me as it is mostly reusing the same logic as the pc-relative fixup. The names are internal to LLVM so there isn't any particular standard. I think this fixup would correspond to the relocation R_ARM_ABS12 https://github.com/ARM-software/abi-aa/blob/master/aaelf32/aaelf32.rst#relocation-codes so it may be worth using fixup_arm_ldst_abs_12 but that is only a suggestion. I wouldn't expect the assembler to use the relocation as the range is so short that any attempt to use it would likely result in an out of range error.

llvm/test/MC/ARM/arm-memory-instructions-immediate.s
12

Could we add some test cases for negative and zero offsets. It looks like this is handled in the fixup code but it would be good to have some tests to make sure it isn't broken in the future.

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
2950

else if (const auto *CE = ...)

(Prefer auto when using dyn_cast; the type info is otherwise mostly redundant).

3003

else if (const auto *CE =...

3013

other functions have

if (!Memory.OffsetImm)
     Inst.addOperand(MCOperand::createImm(0));

if addExpr doesn't handle their more specific cases.

3061

const auto *CE =...

3090–3091

const auto *CE = ...

3140–3141

const auto *CE =

3233–3234

const auto *CE

3245

const auto *CE

3262–3263

const auto *CE

5938–5951

const auto *CE

5947–5952

The tails of these two branches match, other than the second operand. Can you declare a local const MCExpr *, simply assign to it from both branches, then sink the call to push_back to be executed after both branches merge?

llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp
983–984

can you sink the subexpression MO.getExpr() into the argument list for MCFixup::create here, so that it matches below?

983–1018

seeing:

if !x:

foo()

else:

bar()

is a code smell. We should prefer:

if x:

bar()

else:

foo()

There seems like a fair amount of duplication between sub branches of these nested conditionals, at least handling an operand that is an expression. I wonder if we could reduce the duplication somehow?

1003–1004

What if we're in Thumb2 mode? I think we need a different fixup kind, like L991-996.

Do we need to increment MCNumCPRelocations?

jcai19 updated this revision to Diff 332010.Mar 19 2021, 3:09 PM
jcai19 marked 17 inline comments as done.

Address comments so far.

The fixup code looks reasonable to me as it is mostly reusing the same logic as the pc-relative fixup. The names are internal to LLVM so there isn't any particular standard. I think this fixup would correspond to the relocation R_ARM_ABS12 https://github.com/ARM-software/abi-aa/blob/master/aaelf32/aaelf32.rst#relocation-codes so it may be worth using fixup_arm_ldst_abs_12 but that is only a suggestion. I wouldn't expect the assembler to use the relocation as the range is so short that any attempt to use it would likely result in an out of range error.

Thanks for the suggestion, I've updated the name.

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
3013

Yes addExpr would take care of the non-null case. For the cases I could not call addExpr directly, I've handled the case separately.

llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp
1003–1004

What if we're in Thumb2 mode? I think we need a different fixup kind, like L991-996.

This patch is only intended for some memory instructions in arm mode. I will look into the thumb mode but do we care about it at this moment?

Do we need to increment MCNumCPRelocations?

The expressions of interest here are really those would produce immediate values, so I'm not sure if I should count them as relocation. WDYT?

llvm/test/MC/ARM/arm-memory-instructions-immediate.s
12

Ah yes. I did test negative numbers locally but forgot to add it here, thanks for the reminder.

jcai19 edited the summary of this revision. (Show Details)Mar 19 2021, 3:11 PM
llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp
983–984

I wonder why we aren't consistent about the use of getEncodingValue when assigning to Reg? Are these actually different values? (I recognize your change doesn't introduce this, just curious).

983–1018

(I'm tempted to go and pre-commit the previous change as that code smell is pervasive throughout this file).

Looks better, now I see:

if MO.isReg():
  foo()
else:
  if MO.isExpr():
    bar()
  else:
    baz()

should be

if MO.isReg():
  foo()
else if MO.isExpr():
  bar()
else: # MO.isImm
  baz()

Then it's obvious that the MCOperand is an immediate, and removes one level of unnecessary indentation.

1003–1004

I will look into the thumb mode but do we care about it at this moment?

We don't have a pressing need for THUMB2 support, but I worry that someone using clang may find the wrong fixup(relocation?) applied in the event they are assembling a THUMB2 source with a symbolic operand, due to this change. Perhaps a check for isThumb2(STI) then assert if thumb, or skip generating the fixup? (Or maybe another reviewer has thoughts?)

The expressions of interest here are really those would produce immediate values, so I'm not sure if I should count them as relocation.

Looks like it's just a stat (so probably doesn't really matter too much):

STATISTIC(MCNumCPRelocations, "Number of constant pool relocations created.");

Since it's an immediate, it's encoded in the instruction; not a constant pool. But seeing how it's incremented elsewhere throughout this TU, it looks like everytime we have an Expr MCOperand, we bump it. So I think you should here, too. Maybe other reviewers can help clarify?

Seeing how similar the users of MCNumCPRelocations makes me think there's a whole lot of other instruction formats with this issue that you're fixing. Though I think it's fine that we start with ldr/str.

llvm/test/MC/ARM/arm-memory-instructions-immediate.s
26

if you add .thumb you can repeat some tests for THUMB (I would then add .arm before the initial group of instructions to clearly delineate between ARM mode and THUMB. getAddrModeImm12OpValue has pre-existing checks for THUMB, so I suspect we should be able to add coverage for your additions with THUMB.

MaskRay added inline comments.Mar 22 2021, 10:52 AM
llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp
985
llvm/test/MC/ARM/arm-memory-instructions-immediate.s
2

You can omit <

jcai19 updated this revision to Diff 332391.Mar 22 2021, 12:08 PM
jcai19 marked 3 inline comments as done.

Address comments.

llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp
983–984

Yeah I had the same question, so I changed the original code a little bit to make it more obvious for others to notice. I wonder if this is a typo?

1003–1004

We don't have a pressing need for THUMB2 support, but I worry that someone using clang may find the wrong fixup(relocation?) applied in the event they are assembling a THUMB2 source with a symbolic operand, due to this change. Perhaps a check for isThumb2(STI) then assert if thumb, or skip generating the fixup? (Or maybe another reviewer has thoughts?)

I did some more digging and the thumb mode of these instructions are actually handled by different functions, so right now comping them in thumb mode (either with thumbv7 in the target triple or .thumb) will crash the compiler with assertion failures. I have been looking into supporting thumb mode a little bit, and found extra complexity: while the instructions supported by this patch (ldr/ldrb, str/strb) have one encoding in arm mode, they have multiple encodings in thumb mode, depending on the value of the immediate operand, or if it is (un)signed. Some of these encodings are 16 bits while the others 32. IAS associate each encoding with a different opcode and tries to decide the opcode before the fixups are resolved, so it is difficult to decide the precise encoding when expressions are present. One solution I could think of right now is to always match such instructions to the shortest encoding so they don't unnecessarily generate 32-bit instructions when 16-bit version can be used, at the cost of not being able to support expressions with a result outside of the range. WDYT? Also I think it may be easier to implement the thumb mode support in a different CL so we don't have to deal with two complications at the same time.

PS: I re-examined my CL for supporting expressions in b.w instructions (https://reviews.llvm.org/D97568). b.w has two encoding in thumb mode but both are 32-bit instructions and they do not look that differently, so IAS somehow has only one opcode for both of them (verified with --debug-only=asm-matcher). So b.w probably don't have this issue.

Looks like it's just a stat (so probably doesn't really matter too much):

STATISTIC(MCNumCPRelocations, "Number of constant pool relocations created.");

Since it's an immediate, it's encoded in the instruction; not a constant pool. But seeing how it's incremented elsewhere throughout this TU, it looks like everytime we have an Expr MCOperand, we bump it. So I think you should here, too. Maybe other reviewers can help clarify?

+1.

Seeing how similar the users of MCNumCPRelocations makes me think there's a whole lot of other instruction formats with this issue that you're fixing. Though I think it's fine that we start with ldr/str.

Yes there are other memory operations with similar issues but hopefully once we have this checked in it would be much easier to fix those.

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1099–1117

Something still really bugs me about these is* functions; perhaps I'm still hung up on D97568.

I understand that we can't validate immediates when expressions are used until post fixup; doing such validation of operands before fixups are performs seems like a major phase ordering issue in LLVM's assemblers.

So for a lot of these is* functions, if we don't have an immediate operand, we can't tell if it fits whatever constraints we're checking. This change makes these functions return true as in "maybe." Rather than return a type that's a tri-state for "definitely yes," "definitely no," or "can't tell until post fixups." Then other is* functions don't have that added logic, like isFPImm below, isLEOffset and isSignedOffset above, etc. I'm very concerned about isARMBranchTarget/isThumbBranchTarget.

I almost wonder if fixing the phase ordering in LLVM will fix this more generally for all instructions/operands or even additional ISAs. Perhaps let's hold onto this change, but see how feasible it would be to validate instructions post fixups in LLVM? Then we can keep these is functions more strict about only accepting immediate operands.

llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp
1003–1004

think it may be easier to implement the thumb mode support in a different CL so we don't have to deal with two complications at the same time.

Got it, thanks for the due diligence. Sounds good. Sounds like "relaxation" might be needed to efficiently select the encoding when expressions are used.

jcai19 added inline comments.Mar 22 2021, 5:52 PM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1099–1117

I almost wonder if fixing the phase ordering in LLVM will fix this more generally for all instructions/operands or even additional ISAs. Perhaps let's hold onto this change, but see how feasible it would be to validate instructions post fixups in LLVM? Then we can keep these is functions more strict about only accepting immediate operands.

I realized that resolving some of these fixups also depends on the chose encoding of the instructions between the symbols, i.e. whether it is 16 or 32 bits. So it may be difficult to switch the order of the two phases. Maybe we can add different an option to allow users to decide if size is a major concern, which would allow us to choose a better encoding. It's suboptimal but I suspect such code is not dominant in most code bases, so maybe it's acceptable. WDYT?

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1099–1117

It's not the size of the chosen instruction (relaxation) I'm concerned with. It's the lack of symmetry between considering whether an operand is a MCConstantExpr or not between these ARMAsmParser::is* methods.

So the relevant call chains look like:

AsmParser::Run ->
  AsmParser::parseStatement ->
    ARMAsmParser::MatchAndEmitInstruction ->
      ARMAsmParser::validateInstruction ->
        is*
  MCStreamer::Finish ->
    MCObjectStreamer::finishImpl ->
      MCAssembler::Finish -> 
        MCAssembler::layout ->
          ARMAsmBackend::applyFixup ->
            ARMAsmBackend::adjustFixupValue

Where the is* methods of ARMAsmParser assume they have immediates that can validated eagerly, which for expression operands we can't validate until after ARMAsmBackend::adjustFixupValue. ARMAsmBackend::adjustFixupValue does some checking/diagnostic emission, but I wonder if we can rerun ARMAsmParser::validateInstruction immediately upon applying a fixup?

I wonder whether the is* methods should have a "strict mode" check (boolean parameter, perhaps); in the case of !MCConstantExpr operands, we check once less strict (don't consider that the operand is not a MCConstantExpr a failure), then after fixups immediately call validateInstruction again but do consider !MCConstantExpr a failure. I don't know if it's possible to get the MCInst from the MCFixup so that ARMAsmBackend::adjustFixupValue could call ARMAsmParser::validateInstruction. Seems the MCAsmBackends and the backend parsers don't have references to one another....

I'm very concerned about isARMBranchTarget/isThumbBranchTarget.

Nevermind that comment, they optimistically return true in the case of !MCConstantExpr.

If not, I do like how ARMAsmParser::isAdrLabel has comments like

// If we have an immediate that's not a constant, treat it as a label
// reference needing a fixup.

I think adding those elsewhere in this CL would go a long way in clarifying the intent of these is* methods, maybe without strictly copying the part about "a label."

jcai19 added inline comments.Mar 25 2021, 12:04 PM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1099–1117

I see. Thanks for the clarification. Will investigate.

jcai19 updated this revision to Diff 335356.Apr 5 2021, 5:11 PM

Only accept MCExpr in the affected instructions.

jcai19 added inline comments.Apr 5 2021, 5:17 PM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1099–1117

Where the is* methods of ARMAsmParser assume they have immediates that can validated eagerly, which for expression operands we can't validate until after ARMAsmBackend::adjustFixupValue. ARMAsmBackend::adjustFixupValue does some checking/diagnostic emission, but I wonder if we can rerun ARMAsmParser::validateInstruction immediately upon applying a fixup?

I wonder whether the is* methods should have a "strict mode" check (boolean parameter, perhaps); in the case of !MCConstantExpr operands, we check once less strict (don't consider that the operand is not a MCConstantExpr a failure), then after fixups immediately call validateInstruction again but do consider !MCConstantExpr a failure. I don't know if it's possible to get the MCInst from the MCFixup so that ARMAsmBackend::adjustFixupValue could call ARMAsmParser::validateInstruction. Seems the MCAsmBackends and the backend parsers don't have references to one another....

I looked into the code, but couldn't find a way to convert the MCExpr in a MCFixup to a MCInst. It might be possible to call ARMAsmParser::validateInstruction after MCFixups are resolved, by adding the arguments (MCInst &Inst, const OperandVector &Ops) needed as additional members when creating MCFixups, but that would require significantly more changes to the code and raise memory usage. Also having two copies of code seemed to not be unique to ARM backend, for example, we have two copies of code to decide maximum NOP length in emitNop of X86MCInstLower.cpp and X86AsmBackend::getMaximumNopSize in . I wonder if there is any rationale behind the redundancy?

If not, I do like how ARMAsmParser::isAdrLabel has comments like

// If we have an immediate that's not a constant, treat it as a label
// reference needing a fixup.

I think adding those elsewhere in this CL would go a long way in clarifying the intent of these is* methods, maybe without strictly copying the part about "a label."

I limited this patch to only accept MCExpr in the affected instructions, and added comments.

nickdesaulniers accepted this revision.Apr 6 2021, 11:27 AM

Only accept MCExpr in the affected instructions.

Yeah, I think the conservative approach is the way to go here; that way your only functional change is the one you add tests for (ie. isMemImm12Offset). Another thing we could do to be more concise is simply return false if (!isa<MCConstExpr>(Memory.OffsetImm)) for the majority of cases, but return true for such a case in isMemImm12Offset. That would avoid the added dyn_casts and resulting indentation. But in effect it's the same as what you have, so it doesn't matter to me.

I'm ok with this, and thanks for taking the time to work on it. It would be good to collect another LGTM from another reviewer before submitting.

This revision is now accepted and ready to land.Apr 6 2021, 11:27 AM

I'm ok with this, and thanks for taking the time to work on it. It would be good to collect another LGTM from another reviewer before submitting.

Thanks for reviewing the patches! Will definitely wait for additional inputs before I submit.

I have a one major concern, and a number of minor nits that could easily be addressed prior to commit if necessary.

The major concern is that if the intent is to just permit expressions for Arm state LDR instructions (addition of one fixup) then a lot of code in the AsmParser looks like it is permitting expressions for Thumb and Thumb2 LDR instructions that wouldn't use the one new fixup. Can you double check the predicates to make sure that they apply to the instructions you want to permit expressions on? It is possible I've got something wrong though. If I'm right then where we would have had errors had someone use an expression then we may get an internal error or crash as the backend won't be able to handle the expression. My suggestion would be to take out from ARMAsmParser all the code permitting exceptions except for the ARM instructions that support the fixup. Adding support for expressions to all the ARM and Thumb instructions is possible but that would probably be better done over several patches. If I'm wrong please ignore me and sorry for the noise.

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1110–1113

This predicate is only called for the ThumbMemPC class (see ARMInstrThumb.td) if the intent is to allow expressions only in certain Arm load and store instructions then I think this may be letting through expressions where the backend won't support them. It is highly possible that there are no tests that fail because there may be no expect fail tests written.

1502–1507

I think this is Thumb2 t2ldr_pcrel_imm12_asmoperand in ARMInstrThumb2.td

3007

Looks like a few changes are just formatting (clang-format?) While it doesn't bother me personally, I have seen some people prefer that pure formatting changes are committed separately with a [NFC] tag as it minimises the diff when trying to bisect a problem. Maybe worth committing these bits first in a separate [NFC]

3058–3072

I think this comment makes more sense when it immediately precedes CE->getValue() / 4; on line 3062

3137–3143

Same as before, can the comment be moved to immediately before line 3141?

3234

To be consistent I suppose we ought to add the "// The lower two bits are always zero and as such are not encoded."

3263

Another opportunity for // The lower two bits are always zero and as such are not encoded.

llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp
991

One thing that could be confusing is that there is only an Arm fixup here, whereas in the else if (MO.isExpr()) { ... } there are both Arm and Thumb fixups.

If it is only possible to be here for an Arm Instruction, my apologies I haven't had time to double check the reference manual to see if Thumb has more restricted addressing modes than Arm, can we put a comment here? Possibly an assert that the STI is Arm state?

jcai19 updated this revision to Diff 336180.Apr 8 2021, 11:48 AM
jcai19 marked 5 inline comments as done.

Address comments.

I have a one major concern, and a number of minor nits that could easily be addressed prior to commit if necessary.

The major concern is that if the intent is to just permit expressions for Arm state LDR instructions (addition of one fixup) then a lot of code in the AsmParser looks like it is permitting expressions for Thumb and Thumb2 LDR instructions that wouldn't use the one new fixup. Can you double check the predicates to make sure that they apply to the instructions you want to permit expressions on? It is possible I've got something wrong though. If I'm right then where we would have had errors had someone use an expression then we may get an internal error or crash as the backend won't be able to handle the expression. My suggestion would be to take out from ARMAsmParser all the code permitting exceptions except for the ARM instructions that support the fixup. Adding support for expressions to all the ARM and Thumb instructions is possible but that would probably be better done over several patches. If I'm wrong please ignore me and sorry for the noise.

So the current patch only accepts symbolic expressions in isMemImm12Offset, so only instructions that call this function would be able to accept symbolic expressions, namely, ldr, ldrb, str and strb in ARM mode. All the other is* function would return false when symbolic expressions are spotted, including the thumb mode of the support instructions.

$ cat foo.s 
.syntax unified
.thumb
0:
.space 2048
1:
.space 20
2:
str r0, [r1, #(2b-1b)]

llvm-mc -triple=armv7a -filetype=obj foo.s -o ias.o
foo.s:8:1: error: invalid instruction, any one of the following would fix this:
str r0, [r1, #(2b-1b)]
^
foo.s:8:9: note: invalid operand for instruction
str r0, [r1, #(2b-1b)]
        ^
foo.s:8:1: note: instruction requires: arm-mode
str r0, [r1, #(2b-1b)]
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
1110–1113

This change here should not alter the semantics, i.e. only MCConstantExprs get through. Other types of MCExprs would trigger the following return false case.

1502–1507

Same as above.

jcai19 updated this revision to Diff 336182.Apr 8 2021, 11:52 AM

Fix a typo.

jcai19 marked an inline comment as not done.Apr 8 2021, 11:56 AM
jcai19 added inline comments.
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
3007

This was actually part of the automatic linting done by arc diff. It seems I I can't use it without accepting the linting?

peter.smith accepted this revision.Apr 9 2021, 1:26 AM

Thanks for the update LGTM. It sounds like you have all the cases covered. Apologies for the noise.

jcai19 marked an inline comment as not done.Apr 9 2021, 11:09 AM

Thanks for the update LGTM. It sounds like you have all the cases covered. Apologies for the noise.

No problem! Thank you for reviewing the patch!

This revision was landed with ongoing or failed builds.Apr 12 2021, 12:14 PM
This revision was automatically updated to reflect the committed changes.

This broke the ppc64le-sanitizer builderr: https://lab.llvm.org/buildbot/#/builders/19/builds/3565. Will fix it.