This is an archive of the discontinued LLVM Phabricator instance.

[AlignmentFromAssumptions] getNewAlignmentDiff(): use getURemExpr()
ClosedPublic

Authored by cjld on Aug 22 2019, 1:12 AM.

Details

Summary

A better way to fix misaligned mov instructions.

The alignment-from-assumptions pass doesn't generate aligned mov instructions. An example is below:

// b.cc
#include <cstddef>
#include <stdint.h>

typedef long long index;

extern "C" index g_tid;
extern "C" index g_num;


void add3(float* __restrict__ a, float* __restrict__ b, float* __restrict__ c) {
    index n = 64*1024;
    index m = 16*1024;
    index k = 4*1024;
    index tid = g_tid;
    index num = g_num;
    __builtin_assume_aligned(a, 32);
    __builtin_assume_aligned(b, 32);
    __builtin_assume_aligned(c, 32);
    for (index i0=tid*k; i0<m; i0+=num*k)
        for (index i1=0; i1<n*m; i1+=m)
            for (index i2=0; i2<k; i2++)
                c[i1+i0+i2] = b[i0+i2] + a[i1+i0+i2];
}

Compile with clang ./b.cc -Ofast -march=native -std=c++14 -S -o b.s (on an Intel i7-7500U),
which yields:

// b.s
......
	vmovaps	-224(%rdi,%rbx,4), %ymm0
	vmovups	-192(%rdi,%rbx,4), %ymm1
	vmovups	-160(%rdi,%rbx,4), %ymm2
	vmovups	-128(%rdi,%rbx,4), %ymm3
	vaddps	-224(%rsi,%rbx,4), %ymm0, %ymm0
	vaddps	-192(%rsi,%rbx,4), %ymm1, %ymm1
	vaddps	-160(%rsi,%rbx,4), %ymm2, %ymm2
	vaddps	-128(%rsi,%rbx,4), %ymm3, %ymm3
	vmovaps	%ymm0, -224(%rdx,%rbx,4)
	vmovups	%ymm1, -192(%rdx,%rbx,4)
	vmovups	%ymm2, -160(%rdx,%rbx,4)
	vmovups	%ymm3, -128(%rdx,%rbx,4)
......

Expected:

// b.s
......
	vmovaps	-224(%rdi,%rbx,4), %ymm0
	vmovaps	-192(%rdi,%rbx,4), %ymm1
	vmovaps	-160(%rdi,%rbx,4), %ymm2
	vmovaps	-128(%rdi,%rbx,4), %ymm3
	vaddps	-224(%rsi,%rbx,4), %ymm0, %ymm0
	vaddps	-192(%rsi,%rbx,4), %ymm1, %ymm1
	vaddps	-160(%rsi,%rbx,4), %ymm2, %ymm2
	vaddps	-128(%rsi,%rbx,4), %ymm3, %ymm3
	vmovaps	%ymm0, -224(%rdx,%rbx,4)
	vmovaps	%ymm1, -192(%rdx,%rbx,4)
	vmovaps	%ymm2, -160(%rdx,%rbx,4)
	vmovaps	%ymm3, -128(%rdx,%rbx,4)
......

This is because the alignment-from-assumptions pass uses the wrong function to calculate the alignment.
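
As a rough illustration of the kind of alignment reasoning involved, the sketch below computes the alignment that can be proven for a pointer at a constant byte offset from a base with known alignment. It works on plain integers, the function name is hypothetical, and it is not the pass's actual implementation.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not an LLVM API.
// baseAlign must be a nonzero power of two.
// Returns the largest power-of-two alignment that is guaranteed for
// (base + byteOffset) when base is known to be aligned to baseAlign.
std::uint64_t provableAlignment(std::uint64_t baseAlign,
                                std::uint64_t byteOffset) {
    // Remainder of the offset with respect to the known alignment.
    std::uint64_t rem = byteOffset % baseAlign;
    if (rem == 0) {
        // Offset is an exact multiple: the alignment is preserved.
        return baseAlign;
    }
    // Otherwise only the lowest set bit of the remainder is guaranteed.
    return rem & (~rem + 1);
}
```

In add3, for instance, a byte offset of 4096 * 4 = 16384 (assuming 4-byte floats) from a 32-byte-aligned base is a multiple of 32, so the sketch returns 32, whereas an offset of 48 bytes would only guarantee 16.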

Diff Detail

Repository
rL LLVM

Event Timeline

cjld created this revision. Aug 22 2019, 1:12 AM
lebedev.ri retitled this revision from Better way to fix misaligned mov instruction to [AlignmentFromAssumptions] getNewAlignmentDiff(): use getURemExpr(). Aug 22 2019, 6:48 AM
lebedev.ri added a reviewer: hfinkel.

Thanks, this looks like a fix, although I'm surprised by its simplicity.

jdoerfert accepted this revision. Aug 22 2019, 9:38 AM

LGTM.

The original code tried to do the modulo computation (as per comment and the looks of it) but the operands of DiffUnitsSCEV = SE->getMinusSCEV(DiffAlign, DiffSCEV) were swapped.
Swapping them should yield the same result as using URem, but using URem is better, so this is fine.
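
To illustrate the remark about swapped operands, here is a minimal sketch on plain 64-bit integers rather than SCEV expressions. The helper names are hypothetical and are not part of LLVM. The point is only that subtracting in the wrong order yields the negated remainder rather than the remainder itself.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not LLVM APIs.

// Swapped order, analogous to getMinusSCEV(DiffAlign, DiffSCEV):
// computes (diff / align) * align - diff, which wraps around to
// the two's-complement negation of (diff % align).
std::uint64_t remSwapped(std::uint64_t diff, std::uint64_t align) {
    std::uint64_t diffAlign = (diff / align) * align;
    return diffAlign - diff;
}

// Intended order: diff - (diff / align) * align == diff % align.
std::uint64_t remViaMinus(std::uint64_t diff, std::uint64_t align) {
    std::uint64_t diffAlign = (diff / align) * align;
    return diff - diffAlign;
}

// Direct unsigned remainder, the integer analogue of a URem expression.
std::uint64_t remViaURem(std::uint64_t diff, std::uint64_t align) {
    return diff % align;
}
```

For diff = 100 and align = 32, the last two helpers return 4, while the swapped version returns the wrapped negation of 4. All three return 0 when diff is an exact multiple of align.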

This revision is now accepted and ready to land. Aug 22 2019, 9:38 AM

Requested by @cjld to commit on his behalf. (I will simplify the description a bit.)

This revision was automatically updated to reflect the committed changes.