Page MenuHomePhabricator

[ARM] Fix so immediates and pc relative checks
Needs ReviewPublic

Authored by simonwallis2 on Jul 30 2020, 2:18 AM.

Details

Summary

ARM Constant Islands would not detect values that
could not be represented by SO immediates,
breaking only during the emission of object
files.

Diff Detail

Event Timeline

dnsampaio created this revision.Jul 30 2020, 2:18 AM
dnsampaio requested review of this revision.Jul 30 2020, 2:18 AM
This revision is now accepted and ready to land.Jul 30 2020, 6:36 AM
dmgreen added inline comments.Jul 30 2020, 6:47 AM
llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll
326 ↗(On Diff #281843)

I've not looked into any of the details yet, but what's happening with these changes? I wouldn't expect any of them to be hitting the limit on a adr.

dmgreen added inline comments.Jul 30 2020, 6:49 AM
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
1791–1792

I guess, should this be true?

dnsampaio marked 2 inline comments as done.Jul 30 2020, 8:40 AM
dnsampaio added inline comments.
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
1791–1792

Indeed, we are testing t1 coding here.

llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll
326 ↗(On Diff #281843)

Were consequence of not performing the reduction, due the error below.

dnsampaio updated this revision to Diff 281945.Jul 30 2020, 8:41 AM
dnsampaio marked 2 inline comments as done.

Fixed true -> false for isSOImm when checking for reducing to t1 instructions

simonwallis2 added inline comments.Aug 17 2020, 11:54 PM
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
1169

IsSoImm and NegOk arguments are in the wrong order.

Feel free to move on this patch, I'm in vacations.

simonwallis2 commandeered this revision.Aug 18 2020, 1:57 AM
simonwallis2 edited reviewers, added: dnsampaio; removed: simonwallis2.
This revision now requires review to proceed.Aug 18 2020, 1:57 AM

Fix order of arguments in 1 call to isCPEntryInRange()

Supply the optional final argument in another instance of call to isCPEntryInRange(),
for readability.

Thanks. That does fix the problems I was seeing in thumb2 codesize test cases.

I am still unsure about the PCOffset change. Is that for correctness or an optimisation? Or correctness specific to SoImm due to it's non-uniform nature? I would be fairly surprised if we were always getting the limit of a t2ADR wrong and it had not caused problems up to this point. Either way, can we add some tests for it? And if it is just an independent optimization, consider moving it to a separate patch.

llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
1010–1013

long int -> int64_t

simonwallis2 marked an inline comment as done.
simonwallis2 edited the summary of this revision. (Show Details)

After review feedback, I removed the change to pc offset estimates from this patch.
This patch is now just about fixing the implementation of SO immediates.

Rewrite long int as int64_t.

dmgreen added a comment.EditedAug 21 2020, 2:00 AM

Constant island pass is notorious for subtle bugs and being difficult to test.

I created this test case that breaks, by iterating over all values of SPACE.

I think this is why Diogo would have added the PCOffset handling. The pass is usually conservative about the ends of the offset ranges. Now there is a point where we need to be exact where the SOImm range changes from being 2byte addressable to 4byte addressable.

Dave

# RUN: llc --filetype=obj -start-before=arm-cp-islands -o - %s | \
# RUN: llvm-objdump --arch=armv8a --disassemble - | FileCheck %s

--- |
  target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
  target triple = "armv8.2a-arm-none-eabi"

  define dso_local i32 @main() #0 {
    ret i32 0
  }

  attributes #0 = { "frame-pointer"="all" }
  !4 = !{i32 210}

...
---
name:            main
alignment:       4
tracksRegLiveness: true
constants:
  - id:              0
    value:           half 0xH5440
    alignment:       2
machineFunctionInfo: {}
body:             |
  bb.0 (%ir-block.0):
    liveins: $lr

    $sp = frame-setup STMDB_UPD $sp, 14, $noreg, killed $r11, killed $lr
    frame-setup CFI_INSTRUCTION def_cfa_offset 8
    frame-setup CFI_INSTRUCTION offset $lr, -4
    frame-setup CFI_INSTRUCTION offset $r11, -8
    $r11 = frame-setup MOVr killed $sp, 14, $noreg, $noreg
    frame-setup CFI_INSTRUCTION def_cfa_register $r11
    $sp = frame-setup SUBri killed $sp, 80, 14, $noreg, $noreg

    renamable $r1 = LEApcrel %const.0, 14, $noreg
    renamable $r1 = LDRH killed renamable $r1, $noreg, 0, 14, $noreg :: (load 2 from constant-pool)
    INLINEASM &nop, 1, !4
    renamable $r0 = SPACE 1024, undef renamable $r0

    $sp = frame-destroy MOVr $r11, 14, $noreg, $noreg
    $sp = frame-destroy LDMIA_RET $sp, 14, $noreg, def $r11, def $pc, implicit killed $r0
...

Reinstated the change to pc offset.
This is necessary for ARM-state SO immediates.

Added a test case which tries to place a constant pool after a gap of a specified count of bytes,
or when that is not a valid offset it falls back to inserting the constant pool locally.
This test is invoked several times at significant offsets to demonstrate that the limits are correct.

Thanks. Nice test.

llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
286

We should also include ARM::LEApcrelJT, as they can use SoImm's too.

287

Why 4 and not 8? I think it's meant to be "2 instructions widths"

288

You needn't add t2LEApcrel here. There are lots of opcodes that could be pc-relative (as in initializeFunctionInfo). We only need to include the ones we are being careful about the adjustment - the SoImm codes.

simonwallis2 marked 2 inline comments as done.Aug 26 2020, 2:29 AM
simonwallis2 added inline comments.
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
287

Value of 4 was determined empirically.

getPCOffset: added handling of LEApcrelJT

dmgreen added inline comments.Aug 26 2020, 4:46 AM
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
287

You fill me with confidence. And would 8 empirically have problems too?

simonwallis2 added inline comments.Aug 26 2020, 5:17 AM
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
287

Values of 0 or 8 cause: error: out of range pc-relative fixup value
A value of 4 did not cause failures.

Why 4 and not 8?
It depends on where you start measuring. I have not pinpointed where,
but I believe that the base address from which the offset was calculated has already been adjusted by 4 bytes,
to take into account the width of the load instruction,
that is to say that it points to the address of the instruction after the load, rather than the load itself.

efriedma added inline comments.Aug 26 2020, 1:44 PM
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
287

From the ARM reference manual:

• When executing an ARM instruction, PC reads as the address of the current instruction plus 8.
• When executing a Thumb instruction, PC reads as the address of the current instruction plus 4.

So it sort of makes sense that we'd need to distinguish between ARM-mode and Thumb-mode... but it's suspicious that this is only getting applied to LEApcrel, and not all ARM-mode instructions.

1015

I'm a little concerned that using getSOImmVal is going to throw off the algorithm. The algorithm is based around there being some range of legal addresses, and moving constants into that range. The current code doesn't generate discontinuities like this, and I'm worried adding them will cause problems for convergence.

Can we split this change off from whatever changes are necessary for correctness? It should be conservatively correct to treat an soimm offset as "a multiple of 4 between -1020 and 1020".

simonwallis2 added inline comments.Sep 1 2020, 2:51 AM
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
1015

I'm a little concerned that using getSOImmVal is going to throw off the algorithm. The algorithm is based around there being some range of legal addresses, and moving constants into that range. The current code doesn't generate discontinuities like this, and I'm worried adding them will cause problems for convergence.

Can we split this change off from whatever changes are necessary for correctness?

I take on board your observations about a range of legal addresses.
Although I have not seen evidence of any problems for convergence with non-contiguous offsets,
they are not necessary for this bugfix.

1015

It should be conservatively correct to treat an soimm offset as "a multiple of 4 between -1020 and 1020".

The current treatment of an SoImm offset as "a multiple of 4 between -1020 and 1020" is part of the cause of this bug.

I will split this patch into smaller parts and resubmit.

dnsampaio added inline comments.Sep 30 2020, 11:47 PM
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp
287

Just as insight, I only applied to the one instruction there was a problem for the moment.
Following the debug information, you can see that the offset values that were computed in this pass were 4 bytes out from the code generated. I did not investigate if it also ocured in arm mode.

dnsampaio resigned from this revision.Sep 30 2020, 11:48 PM
dnsampaio removed a reviewer: dnsampaio.