This is an archive of the discontinued LLVM Phabricator instance.

The code section containing main has 2-byte alignment.
It needs 4-byte alignment,
because the load-literal instruction's offset is relative to the
PC value with its low 2 bits zeroed, i.e. Align(PC, 4).

I have not included a test case in this check-in.
llc and llvm-mc do not exhibit this bug, because they do not set code section alignment
in the same way as clang.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

simonwallis2 created this revision. (Jul 20 2020, 6:38 AM)

Herald added a project: Restricted Project. (Jul 20 2020, 6:38 AM)

Herald added subscribers: llvm-commits, hiraditya.

Harbormaster failed remote builds in B64908: Diff 279219! (Jul 20 2020, 7:09 AM)

Hi, thanks for working on this.
What exactly are you trying to fix? From what I see in https://developer.arm.com/docs/ddi0597/h/simd-and-floating-point-instructions-alphabetic-order/vldr-immediate-load-simdfp-register-immediate
VLDR.16 s0, {pc}+0x16 requires only 2-byte alignment, because in the half-precision (.16) case only a single zero is appended to the immediate:

T1
Half-precision scalar (size == 01)
(Armv8.2)
VLDR{<c>}{<q>}.16 <Sd>, [<Rn> {, #{+/-}<imm>}]
esize = 8 << UInt(size);  add = (U == '1');
imm32 = if esize == 16 then ZeroExtend(imm8:'0', 32) else ZeroExtend(imm8:'00', 32);
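The reviewer's reading of the quoted pseudocode for the immediate form can be sketched in C++ as follows. The function name vldrImmediateOffset is hypothetical. The sketch decodes only the unsigned offset magnitude and ignores the U (add/subtract) bit and the base register. With size == 01 (half precision), imm8 gains one zero bit, so offsets are multiples of 2. The other sizes gain two zero bits.

```cpp
#include <cstdint>

// Sketch of the quoted pseudocode for VLDR (immediate), T1 encoding:
//   esize = 8 << UInt(size);
//   imm32 = if esize == 16 then ZeroExtend(imm8:'0', 32) else ZeroExtend(imm8:'00', 32);
// 'size' is the 2-bit size field (01 = half, 10 = single, 11 = double).
uint32_t vldrImmediateOffset(uint32_t size, uint32_t imm8) {
  const uint32_t esize = 8u << size;  // 16, 32 or 64 bits
  imm8 &= 0xFFu;                      // the immediate field is only 8 bits wide
  if (esize == 16)
    return imm8 << 1;                 // imm8:'0'  -> offsets are multiples of 2
  return imm8 << 2;                   // imm8:'00' -> offsets are multiples of 4
}
```

For example, the {pc}+0x16 offset above corresponds to imm8 = 0x0B with size = 01.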

Sorry, it seems I was looking at the wrong instruction; it should be the label variant: vldr.16 s0, .LCPI0_0

So the correct instruction is: https://developer.arm.com/docs/ddi0597/h/simd-and-floating-point-instructions-alphabetic-order/vldr-literal-load-simdfp-register-literal

For the half-precision scalar variant: the assembler calculates the required value of the offset from the Align(PC, 4) value of the instruction to this label. Permitted values are multiples of 2 in the range -510 to 510.
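Here is a minimal sketch of the offset calculation described in the quoted text. It assumes Thumb state, where reading the PC yields the instruction's address plus 4. The helpers alignDown4 and vldr16LiteralOffset are hypothetical names, not LLVM or Arm APIs. The function returns std::nullopt when the offset cannot be encoded. It requires C++17 for std::optional.

```cpp
#include <cstdint>
#include <optional>

// Align(x, 4) from the Arm pseudocode: round x down to a multiple of 4.
constexpr uint32_t alignDown4(uint32_t x) { return x & ~3u; }

// Offset an assembler would encode for "vldr.16 s0, label" placed at instrAddr.
// Assumption: Thumb state, so PC reads as instrAddr + 4.
std::optional<int32_t> vldr16LiteralOffset(uint32_t instrAddr, uint32_t label) {
  const uint32_t pc = instrAddr + 4;
  const int64_t offset =
      static_cast<int64_t>(label) - static_cast<int64_t>(alignDown4(pc));
  // Permitted values: multiples of 2 in the range -510 to 510.
  if (offset % 2 != 0 || offset < -510 || offset > 510)
    return std::nullopt;
  return static_cast<int32_t>(offset);
}
```

An instruction at 0x1000 and one at 0x1002 both get offset 12 to a literal at 0x1010, because Align(PC, 4) is 0x1004 in both cases.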

The significant phrase is the Align(PC, 4) part.
The calculated value of the offset depends on the alignment of the VLDR.16 instruction.
That is why the code section needs to be 4-byte aligned.
If the code section is 2-byte aligned and the linker places the section at an address that is not 4-byte aligned, the encoded offset will resolve to the wrong address, 2 bytes away from the literal.
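The failure can be reproduced with plain address arithmetic. The sketch assumes a hypothetical layout: the VLDR.16 sits at section offset 2 and its literal at section offset 16. It also assumes the Thumb-state rule that PC reads as the instruction address plus 4. The offset is fixed once at assembly time, as if the section started at a 4-byte-aligned address. The section is then placed at two different base addresses. The static_asserts check two results. At base 0x8000 the load hits the literal. At base 0x8002 it reads 2 bytes past the literal.

```cpp
#include <cstdint>

constexpr uint32_t alignDown4(uint32_t x) { return x & ~3u; }

// Address a Thumb VLDR.16 (literal) at instrAddr actually reads, given the
// offset imm baked into its encoding: Align(PC, 4) + imm, with PC = instrAddr + 4.
constexpr uint32_t literalTarget(uint32_t instrAddr, int32_t imm) {
  return alignDown4(instrAddr + 4) + static_cast<uint32_t>(imm);
}

// Hypothetical layout inside the code section.
constexpr uint32_t kInstrOff = 2;    // VLDR.16 at section offset 2
constexpr uint32_t kLiteralOff = 16; // its constant-pool entry at offset 16

// Offset resolved at assembly time, assuming the section starts at offset 0:
// 16 - Align(2 + 4, 4) = 12.
constexpr int32_t kEncodedImm =
    static_cast<int32_t>(kLiteralOff) - static_cast<int32_t>(alignDown4(kInstrOff + 4));

// True if the load still hits its literal when the section is placed at base.
constexpr bool loadsCorrectLiteral(uint32_t base) {
  return literalTarget(base + kInstrOff, kEncodedImm) == base + kLiteralOff;
}

static_assert(kEncodedImm == 12, "offset fixed at assembly time");
static_assert(loadsCorrectLiteral(0x8000), "4-byte-aligned placement is correct");
static_assert(!loadsCorrectLiteral(0x8002), "2-byte-aligned placement reads the wrong address");
```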

Note that this bug is not restricted to loading 16-bit floating-point literals with VLDR.16.
The same fault occurs when loading 16-bit short literals with LDRH.

OK, this patch seems correct, but it would be nice to have a test.
You can use clang -mllvm -stop-before=arm-cp-islands -mllvm --simplify-mir to obtain the machine IR (MIR) as it is just before the pass, and then use llc -run-pass=arm-cp-islands to check that the function's alignment is set to 4.

Added MIR test, as suggested.

Thanks. LGTM.

This revision is now accepted and ready to land. (Jul 22 2020, 1:42 AM)

Harbormaster completed remote builds in B65189: Diff 279717. (Jul 22 2020, 1:56 AM)

Closed by commit rG94e4e37d5564: [Thumb] set code alignment for 16-bit load from constant pool (authored by simonwallis2). (Jul 22 2020, 2:13 AM)

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                               Size
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp      6 lines
llvm/test/CodeGen/ARM/const-load-align-thumb.mir   59 lines

Diff 279730

llvm/lib/Target/ARM/ARMConstantIslandPass.cpp

(excerpt from ARMConstantIslands::doInitialConstPlacement(std::vector<MachineInstr*> &CPEMIs); unchanged lines before and after are omitted)

  ...
   const Align MaxAlign = MCP->getConstantPoolAlign();
   const unsigned MaxLogAlign = Log2(MaxAlign);

   // Mark the basic block as required by the const-pool.
   BB->setAlignment(MaxAlign);

   // The function needs to be as aligned as the basic blocks. The linker may
   // move functions around based on their alignment.
-  MF->ensureAlignment(BB->getAlignment());
+  // Special case: halfword literals still need word alignment on the function.
+  Align FuncAlign = MaxAlign;
+  if (MaxAlign == 2)
+    FuncAlign = Align(4);
+  MF->ensureAlignment(FuncAlign);

   // Order the entries in BB by descending alignment. That ensures correct
   // alignment of all entries as long as BB is sufficiently aligned. Keep
   // track of the insertion point for each alignment. We are going to bucket
   // sort the entries as they are created.
   SmallVector<MachineBasicBlock::iterator, 8> InsPoint(MaxLogAlign + 1,
                                                        BB->end());
  ...
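The function-alignment rule introduced by the diff can be restated outside LLVM as a standalone sketch. Here alignments are plain byte counts rather than llvm::Align, and alignFunctionForConstPool is a hypothetical name. The std::max models MachineFunction::ensureAlignment, which only raises the function's alignment when the requested value is larger.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the patched lines.
// currentFuncAlign: the function's alignment before the pass (bytes).
// maxCPAlign: the largest constant-pool entry alignment (bytes).
uint64_t alignFunctionForConstPool(uint64_t currentFuncAlign, uint64_t maxCPAlign) {
  uint64_t funcAlign = maxCPAlign;
  // Halfword literals (VLDR.16, LDRH) are addressed relative to Align(PC, 4),
  // so the function needs word alignment even though the entry is 2-byte aligned.
  if (maxCPAlign == 2)
    funcAlign = 4;
  // ensureAlignment never lowers an existing alignment.
  return std::max(currentFuncAlign, funcAlign);
}
```

Consider the MIR test's inputs: function alignment 2 and constant-pool alignment 2. For those, the sketch yields 4, which is what the test's CHECK-NEXT line expects. Before the patch, the function received BB->getAlignment(), which is MaxAlign, i.e. 2.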

llvm/test/CodeGen/ARM/const-load-align-thumb.mir

This file was added.

# RUN: llc -mtriple=arm-eabi -run-pass=arm-cp-islands %s -o - | FileCheck %s
--- |
  target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
  target triple = "thumbv8.2a-arm-none-eabi"

  define hidden i32 @main() {
  entry:
    %P5 = alloca half, align 2
    store half 0xH3FE0, half* %P5, align 2
    %0 = load half, half* %P5, align 2
    call void @z_bar(half %0)
    ret i32 0
  }

  declare dso_local void @z_bar(half)

...
---
name: main
alignment: 2
tracksRegLiveness: true
frameInfo:
  stackSize: 16
  maxAlignment: 4
  adjustsStack: true
  hasCalls: true
  maxCallFrameSize: 0
  localFrameSize: 2
stack:
  - { id: 0, name: P5, offset: -10, size: 2, alignment: 2, local-offset: -2 }
  - { id: 1, type: spill-slot, offset: -4, size: 4, alignment: 4, callee-saved-register: '$lr',
      callee-saved-restored: false }
  - { id: 2, type: spill-slot, offset: -8, size: 4, alignment: 4, callee-saved-register: '$r7' }
constants:
  - id: 0
    value: half 0xH3FE0
    alignment: 2
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $r7, $lr

    frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
    frame-setup CFI_INSTRUCTION def_cfa_offset 8
    frame-setup CFI_INSTRUCTION offset $lr, -4
    frame-setup CFI_INSTRUCTION offset $r7, -8
    $sp = frame-setup tSUBspi $sp, 2, 14 /* CC::al */, $noreg
    frame-setup CFI_INSTRUCTION def_cfa_offset 16
    renamable $s0 = VLDRH %const.0, 0, 14, $noreg :: (load 2 from constant-pool)
    VSTRH killed renamable $s0, $sp, 3, 14, $noreg :: (store 2 into %ir.P5)
    renamable $r0 = t2LDRHi12 $sp, 6, 14 /* CC::al */, $noreg :: (dereferenceable load 2 from %ir.P5)
    tBL 14 /* CC::al */, $noreg, @z_bar, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $r0, implicit-def $sp
    renamable $r0, dead $cpsr = tMOVi8 0, 14 /* CC::al */, $noreg
    $sp = frame-destroy tADDspi $sp, 2, 14 /* CC::al */, $noreg
    frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc, implicit killed $r0

  ; CHECK: name: main
  ; CHECK-NEXT: alignment: 4
...