Page MenuHomePhabricator

[Thumb] set code alignment for 16-bit load from constant pool
ClosedPublic

Authored by simonwallis2 on Jul 20 2020, 6:38 AM.

Details

Summary

[Thumb] set code alignment for 16-bit load from constant pool

LLVM miscompiles this code when compiling for a target with v8.2-A FP16 and the Thumb ISA at -O0:

extern void bar(__fp16 P5);

int main() {

__fp16 P5 = 1.96875;
bar(P5);

}

The code section containing main has 2 byte alignment.
It needs to have 4 byte alignment,
because the load literal instruction has an offset from the
load address with the low 2 bits zeroed.

I do not include a test case in this check-in.
llc and llvm-mc do not exhibit this bug. They do not set code section alignment
in the same manner as clang.

Diff Detail

Event Timeline

simonwallis2 created this revision.Jul 20 2020, 6:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2020, 6:38 AM

Hi, thanks for working on this.
What exactly are you trying to fix? From what I see from https://developer.arm.com/docs/ddi0597/h/simd-and-floating-point-instructions-alphabetic-order/vldr-immediate-load-simdfp-register-immediate
VLDR.16 s0,{pc}+0x16 requires only a alignment of 2 bytes, as it has only a single zero appended in the case of half (.16):

T1
Half-precision scalar (size == 01)
(Armv8.2)
VLDR{<c>}{<q>}.16 <Sd>, [<Rn> {, #{+/-}<imm>}]
esize = 8 << UInt(size);  add = (U == '1');
imm32 = if esize == 16 then ZeroExtend(imm8:'0', 32) else ZeroExtend(imm8:'00', 32);

Sorry, it seems I was looking the wrong instruction, it should be the label variant: vldr.16 s0, .LCPI0_0

So the correct instruction is: https://developer.arm.com/docs/ddi0597/h/simd-and-floating-point-instructions-alphabetic-order/vldr-literal-load-simdfp-register-literal

For the half-precision scalar variant: the assembler calculates the required value of the offset from the Align(PC, 4) value of the instruction to this label. Permitted values are multiples of 2 in the range -510 to 510.

The significant phrase is the Align(PC, 4) part.
The calculated value of the offset depends on the alignment of the VLDR.16 instruction.
That is why the code section needs to be 4-byte aligned.
If the code section is 2-byte aligned and the linker places the section at a non-4-byte aligned address, the offset will point to a different address.

Note that this bug is not restricted to loading 16-bit floating point literals using VLDR.16.
The same fault is displayed loading 16-bit short literals using LDRH.

Ok, this patch seems to be correct, but it would be nice to have a test.
You can use clang -mllvm -stop-before=arm-cp-islands -mllvm --simplify-mir to obtain a machine IR before the patch, and use llc -run-pass=arm-cp-islands to validate that the alignment for the function is set to 4.

Added MIR test, as suggested.

dnsampaio accepted this revision.Jul 22 2020, 1:42 AM

Thanks. LGTM.

This revision is now accepted and ready to land.Jul 22 2020, 1:42 AM
This revision was automatically updated to reflect the committed changes.