This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Support expression results as immediate values in mov
ClosedPublic

Authored by jcai19 on May 15 2020, 12:36 PM.

Details

Summary

This patch adds support of using the result of an expression as an
immediate value. For example,

0:
.skip 4
1:
mov x0, 1b - 0b

is assembled to

mov x0, #4

Currently it does not support expressions requiring relocation unless
explicitly specified. This fixes PR#45781.

Diff Detail

Event Timeline

jcai19 created this revision.May 15 2020, 12:36 PM
jcai19 planned changes to this revision.May 15 2020, 12:36 PM
jcai19 updated this revision to Diff 264755.May 18 2020, 4:46 PM

Support the cases when the result of subtraction between two labels are negative.

  • Testing that the expression is the subtraction of two SymbolRefs seems needlessly restrictive, could we accept any expression, and reject it in the AsmBackend if it can't be fully resolved or converted to a relocation?
  • Does this check that the value fits into a 16 bit immediate? The GNU assembler also seems to accept expressions which fit into a MOVZ/MOVN with a shift, though I can't see that being very useful for expressions involving symbol addresses.
  • This needs tests.
jcai19 updated this revision to Diff 264988.May 19 2020, 12:04 PM

Remove the restraction of supporting subtraction of two symbols only.

jcai19 retitled this revision from [AArch64] Add support of labels substraction to mov to [AArch64] Support expression results as immediate values in mov.May 19 2020, 12:06 PM
jcai19 edited the summary of this revision. (Show Details)
jcai19 added reviewers: peter.smith, ostannard.

Thank you for the comments! Please find my replies inlined.

  • Testing that the expression is the subtraction of two SymbolRefs seems needlessly restrictive, could we accept any expression, and reject it in the AsmBackend if it can't be fully resolved or converted to a relocation?

Sounds good. I have relaxed the restriction, but the patch still can't seem to deal with cases like mov reg, label. Will take a look at that.

  • Does this check that the value fits into a 16 bit immediate? The GNU assembler also seems to accept expressions which fit into a MOVZ/MOVN with a shift, though I can't see that being very useful for expressions involving symbol addresses.

I mistakenly thought the bound has been checked. Double checking it again though it seemed the check was missing. I have added it.

  • This needs tests.

Added. Will add more cases once the patch supports more label-based expressions as an immediate operand.

jcai19 updated this revision to Diff 264994.May 19 2020, 12:29 PM
jcai19 edited the summary of this revision. (Show Details)

Add a comment.

efriedma added inline comments.
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
974

What happens when Shift isn't zero?

1785

Why are we adding two operands here?

llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp
243

I'd prefer to explicitly compute a RefKind here, as opposed to inferring that a null RefKind has some specific meaning.

jcai19 updated this revision to Diff 266261.May 26 2020, 10:16 AM
jcai19 marked 2 inline comments as done.

Update based on comments.

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
974

Thank you for the comments. And sorry for the delay, I was trying to look into the cases where the value of Shift is not zero. However, even gcc could not support mov instructions with shift operand.

$ cat foo.s
mov x0, 0x1, lsl 16

$ aarch64-linux-gnu-gcc -c foo.s
foo.s: Assembler messages:
foo.s:1: Error: unexpected characters following instruction at operand 2 -- `mov x0,0x1,lsl 16'

So far the only cases I found with non-zero Shift is during instruction matching stage, when different values of Shift are tried out, and that should not affect this code. Are there additional cases that would cause Shift to not be zero?

jcai19 marked an inline comment as done.May 26 2020, 10:17 AM
jcai19 added inline comments.
llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp
243

We don't currently have a RefKind corresponding to zero. Should we add one?

jcai19 updated this revision to Diff 266264.May 26 2020, 10:27 AM

Fix a typo.

efriedma added inline comments.May 26 2020, 10:32 AM
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
974

This codepath triggers for constructs like "mov x0, 0x10000".

llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp
243

I haven't looked carefully at the implications, but I think we should wrap the expressions in an AArch64MCExpr so we can set an appropriate RefKind.

jcai19 marked an inline comment as done.May 26 2020, 1:28 PM
jcai19 added inline comments.
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
974

Thanks for the example. Since we do not know the result of symbol arithmetic at parsing time so if the result is out of [-0xffff, 0xfff] range, this patch wouldn't be able to handle the cases where Shift is not 0. GNU assembler can handle those cases, and I assume that is because GAS use a multi-iteration design so it can check whether the value is valid after the expression is evaluated, unlike the one-iteration design used by the IAS. I am not sure if there will be many use cases requiring the support of a bigger range. @ostannard previously mentioned

The GNU assembler also seems to accept expressions which fit into a MOVZ/MOVN with a shift, though I can't see that being very useful for expressions involving symbol addresses.

Maybe this is okay?

efriedma added inline comments.May 26 2020, 1:45 PM
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
974

It's probably okay.

Please make the code here actually verify that Shift is zero, though, instead of just assuming it's zero.

jcai19 updated this revision to Diff 266393.May 26 2020, 7:08 PM

Add the expression operand as AArch64MCExpr.

efriedma added inline comments.May 26 2020, 9:21 PM
llvm/test/MC/AArch64/mov-invalid-expression-as-immediate.s
9 ↗(On Diff #266393)

Please also add a testcase for something like mov x0, 0b, where the value can't be resolved in the assembler.

jcai19 marked 4 inline comments as done.May 27 2020, 9:24 AM
jcai19 added inline comments.
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
974

Thanks for the clarification.

llvm/test/MC/AArch64/mov-invalid-expression-as-immediate.s
9 ↗(On Diff #266393)

This is actually interesting. GAS assembled the following code

0:
.skip 4
1:
mov x0, 1b

into

mov x0, #0x4

With the proposed change, IAS would calculate the offset (4) correctly but discard it as it considers the expression 1b unresolved, and generate the following code.

mov x0, #0x0

What's the expected outcome here? Is subtraction between two symbols the only valid expressions we should accept? Thanks.

efriedma added inline comments.May 27 2020, 2:17 PM
llvm/test/MC/AArch64/mov-invalid-expression-as-immediate.s
9 ↗(On Diff #266393)

Converting the address of a label to constant without a relocation is clearly a bug; we shouldn't try to reproduce that behavior.

I think I'd prefer to be conservative and forbid expressions that would require a relocation in the object file, at least for now. (The user can spell it out with movz/movk if the actually want a relocation.)

jcai19 updated this revision to Diff 266950.May 28 2020, 10:58 AM
jcai19 marked an inline comment as done.
This comment was removed by jcai19.
jcai19 updated this revision to Diff 266961.May 28 2020, 11:16 AM

Revert to the implementation before the last version.

jcai19 marked an inline comment as done.May 28 2020, 11:23 AM
jcai19 added inline comments.
llvm/test/MC/AArch64/mov-invalid-expression-as-immediate.s
9 ↗(On Diff #266393)

It seems IAS currently replaces all the labels that it cannot fully resolve with 0, like the following example.

$ cat foo.s
movz x13, #:abs_g0_s:some_label

$ llvm-mc -triple=aarch64 -filetype=obj --debug-only=asm-matcher foo.s -o foo.o
$ aarch64-linux-gnu-objdump -d foo.o
foo.o: file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <.text>:

0:	d280000d 	mov	x13, #0x0                   	// #0

Adding the proposed check seemed to break all such cases. Shall we address that in a separate change?

efriedma added inline comments.May 28 2020, 12:00 PM
llvm/test/MC/AArch64/mov-invalid-expression-as-immediate.s
9 ↗(On Diff #266393)

It's not zero, it's a relocation. objdump -dr output:

0000000000000000 <.text>:
   0:   d280000d        mov     x13, #0x0                       // #0
                        0: R_AARCH64_MOVW_SABS_G0       some_label

I'm suggesting that we forbid emitting relocations specifically for the alias "mov", and allow them if someone requests them with "movz" etc.

jcai19 marked an inline comment as done.May 28 2020, 2:26 PM
jcai19 added inline comments.
llvm/test/MC/AArch64/mov-invalid-expression-as-immediate.s
9 ↗(On Diff #266393)

I see, thank you for the clarification. Double checking again the current patch also emits relocation for instructions like "mov x0, 1b" too. I tried to add the check at the beginning of AArch64AsmBackend::applyFixup as this was the first backend function IAS called after the expressions are evaluated, but it broke the movz example I mentioned in previous comments. Do you have any suggestions as to where we can add this check for mov alias specifically? Also why can't we emit relocations for mov if we allow movz to do so? I thought mov immediate was an alias of movz on 64-bit ARM. Thanks.

efriedma added inline comments.May 28 2020, 3:18 PM
llvm/test/MC/AArch64/mov-invalid-expression-as-immediate.s
9 ↗(On Diff #266393)

Do you have any suggestions as to where we can add this check for mov alias specifically?

Hmm... maybe we should actually go back to doing something more like what the earlier versions did, and not create an AArch64MCExpr.

Also why can't we emit relocations for mov if we allow movz to do so?

It's essentially a question of usability. I'm concerned about confusing users: someone tries to use mov to compute the address of a function, the assembler eats it, and they eventually get an obscure link error complaining the value doesn't fit into the relocation.

If the user explicitly writes :abs_g0_s:, we can assume they know what they're doing.

jcai19 updated this revision to Diff 267077.May 28 2020, 4:27 PM

Do not create AArch64MCExpr.

jcai19 edited the summary of this revision. (Show Details)May 28 2020, 4:29 PM
jcai19 marked an inline comment as done.
jcai19 added inline comments.
llvm/test/MC/AArch64/mov-invalid-expression-as-immediate.s
9 ↗(On Diff #266393)

Thank you for the all clarification! I have updated the patch to not create AAarch64MCExpr.

jcai19 edited the summary of this revision. (Show Details)May 28 2020, 4:30 PM
jcai19 updated this revision to Diff 267080.May 28 2020, 4:40 PM

Fix a typo.

efriedma added inline comments.May 28 2020, 5:15 PM
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
979

E can't be null here?

Maybe worth briefly explaining here what this allows.

llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp
246–260

Put the comment inside the if?

249

Can you make the error message specify the range we're looking for?

455

Please put parentheses around the "&&".

jcai19 updated this revision to Diff 267105.May 28 2020, 6:18 PM
jcai19 marked 5 inline comments as done.

Updates based on comments.

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
979

the function would have returned at its begining if E is null. Comment added.

jcai19 added a comment.Jun 2 2020, 2:19 PM

@efriedma Does the current version of the patch look like it is good to go? Thanks.

efriedma accepted this revision.Jun 8 2020, 5:13 PM

LGTM

This revision is now accepted and ready to land.Jun 8 2020, 5:13 PM
jcai19 added a comment.Jun 8 2020, 5:56 PM

LGTM

Thank you!

This revision was automatically updated to reflect the committed changes.
kristof.beyls added inline comments.Jun 9 2020, 2:01 AM
llvm/test/MC/AArch64/mov-expr-as-immediate.s
2

It seems that this test line has a race condition, and is causing bots to fail non-deterministically.
For example:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/30108/steps/test-check-all/logs/stdio
http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/6224/steps/check/logs/stdio

The error message I get on a local test run (which sometimes passes, sometimes fails) is per below.
I think this indicates that instead of using a pipe (|), there should be a semi-colon (or something similar if lit does not support semi-colons) between the run of llvm-mc and the run of llvm-objdump.
It seems that in some invocations, the pipe results in llvm-objdump getting run before llvm-mc is finished writing to temporary file %t.

llvm/test/MC/AArch64/mov-expression-as-immediate.s seems to have the same issue.

If the above seems correct, could you fix this?

kribey01@hw-a20-17:~/dev/llvm.org/build2$ ./bin/llvm-lit -sv ~/dev/llvm.org/llvm-project/llvm/test/MC/AArch64/mov-expression-as-immediate.s
FAIL: LLVM :: MC/AArch64/mov-expression-as-immediate.s (1 of 1)
******************** TEST 'LLVM :: MC/AArch64/mov-expression-as-immediate.s' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/kribey01/dev/llvm.org/build2/bin/llvm-mc -triple aarch64-none-linux-gnu /home/kribey01/dev/llvm.org/llvm-project/llvm/test/MC/AArch64/mov-expression-as-immediate.s -filetype=obj -o /home/kribey01/dev/llvm.org/build2/test/MC/AArch64/Output/mov-expression-as-immediate.s.tmp | /home/kribey01/dev/llvm.org/build2/bin/llvm-objdump -d /home/kribey01/dev/llvm.org/build2/test/MC/AArch64/Output/mov-expression-as-immediate.s.tmp | /home/kribey01/dev/llvm.org/build2/bin/FileCheck /home/kribey01/dev/llvm.org/llvm-project/llvm/test/MC/AArch64/mov-expression-as-immediate.s
--
Exit Code: 2

Command Output (stderr):
--
/home/kribey01/dev/llvm.org/build2/bin/llvm-objdump: error: '/home/kribey01/dev/llvm.org/build2/test/MC/AArch64/Output/mov-expression-as-immediate.s.tmp': The file was not recognized as a valid object file
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/kribey01/dev/llvm.org/build2/bin/FileCheck /home/kribey01/dev/llvm.org/llvm-project/llvm/test/MC/AArch64/mov-expression-as-immediate.s

--

********************
********************
Failed Tests (1):
  LLVM :: MC/AArch64/mov-expression-as-immediate.s

I ended up pushing a fix for the racy tests in 7e6f891df85

I ended up pushing a fix for the racy tests in 7e6f891df85

Thank you Kristof.