This is an archive of the discontinued LLVM Phabricator instance.

LLVM targeting aarch64 doesn't correctly produce aligned accesses for non-aligned data at -O0/fast-isel (-mno-unaligned-access)
ClosedPublic

Authored by labrinea on Jun 10 2015, 8:31 AM.

Details

Summary

Clang doesn't correctly generate loads in the presence of the -mno-unaligned-access flag. At -O0 it produces:
$ clang --target=aarch64-arm-none-eabi -mno-unaligned-access -O0 -S test.c -o-

foo:
	stp	x29, x30, [sp, #-16]!
	mov	 x29, sp
	sub	sp, sp, #32             // =32
	adrp	x8, a1
	add	x8, x8, :lo12:a1
	movz	x2, #0x5
	sub	x9, x29, #8             // =8
	add	x10, sp, #16            // =16
	str	x0, [sp, #16]
	mov	 x0, x9
	mov	 x1, x10
	str	x8, [sp, #8]            // 8-byte Folded Spill
	bl	memcpy
	ldur	w11, [x29, #-7]
	ldr	x8, [sp, #8]            // 8-byte Folded Reload
	ldur	w12, [x8, #1]
	add	 w11, w12, w11
	stur	w11, [x8, #1]
	mov	 w0, w11
	mov	 sp, x29
	ldp	x29, x30, [sp], #16
	ret

At -O1 and above, it produces correct code, using ldrb's to access to non-aligned data:
clang --target=aarch64-arm-none-eabi -mno-unaligned-access -O1 -S test.c -o-

foo:
        adrp	x8, a1
	add	x8, x8, :lo12:a1
	ldrb	w9, [x8, #1]!
	ldrb	w10, [x8, #3]
	ldrb	w11, [x8, #2]
	ldrb	w12, [x8, #1]
	bfi	w11, w10, #8, #8
	lsr	x10, x0, #8
	bfi	w9, w12, #8, #8
	bfi	w9, w11, #16, #16
	add	 w0, w9, w10
	strb	 w0, [x8]
	lsr	w9, w0, #24
	lsr	w10, w0, #16
	lsr	w11, w0, #8
	strb	w9, [x8, #3]
	strb	w10, [x8, #2]
	strb	w11, [x8, #1]
	ret

The root cause seems to be in fast-isel not producing unaligned access correctly for -mno-unaligned-access:
clang --target=aarch64-arm-none-eabi -mno-unaligned-access -O1 -mllvm -fast-isel -S test.c -o-

foo:
        adrp	x8, a1
	add	x8, x8, :lo12:a1
	ldur	w9, [x8, #1]
	lsr	x10, x0, #8
	add	 w0, w9, w10
	stur	w0, [x8, #1]
	ret

Diff Detail

Repository
rL LLVM

Event Timeline

labrinea updated this revision to Diff 27443.Jun 10 2015, 8:31 AM
labrinea retitled this revision from to LLVM targeting aarch64 doesn't correctly produce aligned accesses for non-aligned data at -O0/fast-isel (-mno-unaligned-access).
labrinea updated this object.
labrinea edited the test plan for this revision. (Show Details)
labrinea added reviewers: t.p.northover, mcrosier.
labrinea set the repository for this revision to rL LLVM.
labrinea added a subscriber: Unknown Object (MLST).
mcrosier edited edge metadata.Jun 10 2015, 8:42 AM

Adding Juergen as he might want to fix this in a less conservative way.

mcrosier added inline comments.Jun 10 2015, 8:43 AM
test/CodeGen/AArch64/arm64-strict-align.ll
3–4

Adding -fast-isel to the run line means we're no longer testing the ISelDAG path. Please create a new RUN line for fast-isel.

labrinea updated this object.Jun 10 2015, 8:53 AM
labrinea edited edge metadata.
labrinea updated this object.
labrinea updated this object.Jun 10 2015, 9:02 AM
labrinea updated this revision to Diff 27459.EditedJun 10 2015, 11:15 AM

Created a new RUN line for fast-isel.

mcrosier accepted this revision.Jun 15 2015, 5:54 AM
mcrosier edited edge metadata.

Please address the comments about the test case and then commit.

test/CodeGen/AArch64/arm64-strict-align.ll
4

Please use the "CHECK-STRICT" prefix; the output is exactly the same for fast-isel and non-fast-isel. No need for the redundancy.

This revision is now accepted and ready to land.Jun 15 2015, 5:54 AM
labrinea updated this revision to Diff 27662.Jun 15 2015, 6:31 AM
labrinea edited edge metadata.

Removed redundant label from regression test.

LGTM. Feel free to commit.

This revision was automatically updated to reflect the committed changes.